Bios upgrade from BMC
patrick at stwcx.xyz
Fri Jan 31 11:22:18 AEDT 2020
On Thu, Jan 30, 2020 at 11:30:10AM -0500, Oskar Senft wrote:
> Hi Patrick
> Here are some thoughts:
> > > 1. Power off host server.
> > I will admit I don't know the Intel architecture well enough yet, but is
> > powering off the server prior to BIOS update actually required? Is the
> > BIOS NOR chip always mapped into a physical address and used, or is the
> > BIOS at some point loaded and resident? Are there stable points where
> > it is safe to perform an update? Can we monitor POST code to know when
> > the BIOS is completed?
> There are two issues:
> - The host may access the BIOS SPI flash at any time by making BIOS
> calls. UEFI variables are such an example. The problem is that the BIOS
> code that executes these requests does not handle cases at all where the
> BIOS SPI flash becomes inaccessible. This results in an immediate crash of
> the host.
> - With ME in operational mode, we cannot guarantee that ME would not
> attempt to read/write from the SPI flash while the host is running. I'm not
> sure if it's possible to put ME into recovery mode WHILE the host is
> running or if the host needs to be shut down for that.
> My understanding is that the only way to safely write the BIOS SPI flash
> from the BMC is to shut the host down and put ME into recovery mode.
> Alternatively, hold the host in full reset via RSMRST.
Good to know, thanks.
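
To make the ordering constraints above concrete, here is a minimal sketch of the flow as an ordered plan. The step names are hypothetical labels, not real OpenBMC targets or systemd units, and the steps after the image write (unbinding the driver, handing the flash back to the PCH, taking ME out of recovery) are my assumption about how the flow would unwind. The sketch also assumes the host is returned to its previous power state rather than always powered on.

```python
from enum import Enum
from typing import List


class PowerState(Enum):
    OFF = "off"
    ON = "on"


def plan_bios_update(previous_state: PowerState) -> List[str]:
    """Return ordered step names for a BMC-driven BIOS update.

    Every step name is a hypothetical label used for illustration only.
    """
    steps = []
    if previous_state is PowerState.ON:
        steps.append("power-off-host")      # host must not touch the flash
    steps += [
        "me-enter-recovery",                # keep ME away from the SPI flash
        "mux-select-bmc",                   # flip GPIO so the BMC owns the flash
        "bind-spi-driver",
        "write-image",
        "unbind-spi-driver",                # assumed unwind step
        "mux-select-pch",                   # PCH needs the flash to power on
        "me-exit-recovery",                 # assumed unwind step
    ]
    if previous_state is PowerState.ON:
        steps.append("power-on-host")       # restore, rather than force on
    return steps
```

Calling `plan_bios_update(PowerState.OFF)` yields a plan that never powers the host on, which matches the "go back to the previous power state" reading of the last step.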
> > > 2. Set ME/NM (Management Engine or Node Manager in x86) to recovery
> > > mode
> > Is this specific to the BIOS update path or is this something we should
> > do whenever the Host is powered off? In either case I guess you can
> > make it a dependency on the systemd unit file, but it seems like it
> > would be nice if it were able to be generically applied to all power
> > on/off paths.
> This question opens a can of worms. There are people who say that ME should
> always be run in recovery mode ...
Hah. I think it is worth answering whether the ME provides any useful
function when the server is powered off though. I don't know, but it
would potentially simplify the BIOS update flow if Host Off => ME in
recovery mode.
> > > 3. Flip GPIO to access SPI flash used by host.
> > > 4. Bind spi driver to access flash
> > This is another thing that seems like we could do generically on all
> > power on / power off paths? Any time the host isn't running we can hit
> > the GPIO to put ownership at the BMC. Is there any disadvantage to
> > that?
> Yes. You cannot turn the host on via a power button if the PCH cannot
> access the SPI flash. You'd have to catch that signal in the BMC and do the
> right thing.
> What's the advantage of having the BIOS SPI flash always connected to the
> BMC when the host is off? That seems to be making things more complicated
> to me.
It was just another simplification. Usually we have special user
utilities to steal the flash for the BMC, and we have the same logic in
the BIOS update path. Again, if Host Off => BIOS SPI owned by BMC, it
simplifies / eliminates logic.
> > Is the GPIO something unique to Facebook's machines or do most other
> > Intel machines have the same requirements?
> I'm not sure if it was explained what the GPIO does:
> Since the SPI flash can only have one master, a "mux" (it's really a
> digital switch, or a pair of digital switches) connects the SPI flash either
> to the PCH for access by the ME / host or to the BMC. The GPIO or pair of
> GPIOs is used to control the mux / bus switches.
> If the SPI flash is connected to the BMC, the ME / host cannot access it at
> all. As it turns out, the PCH needs to be able to read the SPI flash to be
> able to "turn on" the host.
Yep, I'm aware of the mux (on Facebook systems). I wasn't sure if this
was a common or typical Intel architecture feature or something we
specifically had on our Facebook systems.
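
As a way to picture the trade-off discussed above, here is a toy model of the mux. It is only a sketch: the class and function names are hypothetical, the GPIO levels are made up, and no real GPIO is touched. It shows why parking the flash on the BMC whenever the host is off means the BMC has to intercept the power button and hand the flash back first.

```python
from enum import Enum


class FlashOwner(Enum):
    PCH = 0   # hypothetical GPIO level: flash routed to the PCH (ME / host)
    BMC = 1   # hypothetical GPIO level: flash routed to the BMC


class SpiMux:
    """Toy model of the bus switch that routes the BIOS SPI flash."""

    def __init__(self):
        self.owner = FlashOwner.PCH

    def select(self, owner):
        self.owner = owner

    def host_can_power_on(self):
        # The PCH must be able to read the flash to turn the host on.
        return self.owner is FlashOwner.PCH

    def bmc_can_write(self):
        return self.owner is FlashOwner.BMC


def handle_power_button(mux):
    """What the BMC would need to do if it parked the flash on itself."""
    if not mux.host_can_power_on():
        mux.select(FlashOwner.PCH)  # hand the flash back before power on
    return mux.host_can_power_on()
```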
> > > 5. Flashcp image to device.
> > I don't think `flashcp` is used today, or at least not in my
> > recollection of the previous Witherspoon implementation. Is there any
> > advantage to it over `dd` to the raw mtdblock device?
> I'm new to this, too, and found this explanation:
> This question was asked in the context of erase, but it applies to writes,
The Stack Exchange answer here is referring to /dev/mtdN devices and not
/dev/mtdblockN devices (and I agree for plain mtd). mtdblock
specifically has the extra logic to deal with erasing and writing in
pages as appropriate.
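
To illustrate why the erase handling matters, here is a small simulation of NOR flash semantics. This is a toy model, not the kernel's mtd or mtdblock code: the 4096-byte erase size and all names are assumptions. Programming NOR can only clear bits (1 to 0), so writing over stale data without an erase yields the bitwise AND of old and new contents. A block-style write has to read the affected erase block, erase it, and program the merged contents back.

```python
ERASE_SIZE = 4096  # assumed erase-block size in bytes


class NorFlash:
    """Toy NOR flash: erase sets bytes to 0xFF, program can only clear bits."""

    def __init__(self, blocks):
        self.data = bytearray(b"\xff" * (blocks * ERASE_SIZE))

    def erase(self, block):
        start = block * ERASE_SIZE
        self.data[start:start + ERASE_SIZE] = b"\xff" * ERASE_SIZE

    def program(self, offset, payload):
        # Programming ANDs new data into the existing cell contents.
        for i, byte in enumerate(payload):
            self.data[offset + i] &= byte

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])


def write_with_erase(flash, offset, payload):
    """Read-modify-erase-program every erase block the payload touches."""
    end = offset + len(payload)
    first = offset // ERASE_SIZE
    last = (end - 1) // ERASE_SIZE
    for block in range(first, last + 1):
        start = block * ERASE_SIZE
        buf = bytearray(flash.read(start, ERASE_SIZE))
        lo = max(offset, start)
        hi = min(end, start + ERASE_SIZE)
        buf[lo - start:hi - start] = payload[lo - offset:hi - offset]
        flash.erase(block)
        flash.program(start, buf)
```

In this model, programming `0xF0` over a byte that already holds `0x0F` leaves `0x00`, while `write_with_erase` stores `0xF0` as intended.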
> > > 9. Power on server.
> > Doesn't seem like "power on" should be a side-effect of a BIOS update.
> > Is this intended to be "go back to the previous power state"?
> Having said all that, I was experimenting with pretty much the same flow
> but ended up with unreliable writes with individual bit flips. I'm pretty
> sure the HW is fine, since the original (AMI) stock firmware that comes
> with the board can do it just fine. This is with an Aspeed AST2500, a C620
> PCH and a Dediprog EM100 SPI flash emulator.
> I had even tried to change the SPI flash clock from the Aspeed down to the
> minimum, with no change :-/ I already hooked up a logic analyzer to see
> what's going on but haven't had a chance to investigate yet. Any ideas?
Sorry, I've got nothing except maybe the original code retries a bunch
to get past random flips? If you are seeing bit-flips even with the
Dediprog, are you sure the bus is any good? Did you solder on headers
to be able to affix the Dediprog? That might have changed the
capacitance enough to affect SPI activity.
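
For what it's worth, the retry idea can be sketched as a write, read-back, compare loop. This is purely illustrative: I don't know that the stock firmware does anything like this, and `write` and `read` are hypothetical callables standing in for whatever flash access path is in use. `FlakyFlash` is a test double that flips one bit on its first few writes.

```python
def write_verified(write, read, offset, image, max_attempts=3):
    """Write image, read it back, and retry on mismatch.

    `write(offset, data)` and `read(offset, length)` are hypothetical
    callables. Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        write(offset, image)
        if read(offset, len(image)) == image:
            return attempt
    raise IOError("flash verify failed after %d attempts" % max_attempts)


class FlakyFlash:
    """Test double: flips one bit on the first `bad_writes` writes."""

    def __init__(self, size, bad_writes):
        self.data = bytearray(size)
        self.bad_writes = bad_writes

    def write(self, offset, payload):
        buf = bytearray(payload)
        if self.bad_writes > 0 and buf:
            buf[0] ^= 0x01  # simulate a single bit flip
            self.bad_writes -= 1
        self.data[offset:offset + len(buf)] = buf

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])
```

Note that a loop like this only hides the symptom. If the flips come from signal integrity on the bus, the read-back itself may be unreliable, so a logged attempt count is worth keeping as a diagnostic.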