Bios upgrade from BMC
Oskar Senft
osk at google.com
Fri Jan 31 03:30:10 AEDT 2020
Hi Patrick
Here are some thoughts:
> 1. Power off host server.
>
> I will admit I don't know the Intel architecture well enough yet, but is
> powering off the server prior to BIOS update actually required? Is the
> BIOS NOR chip always mapped into a physical address and used, or is the
> BIOS at some point loaded and resident? Are there stable points where
> it is safe to perform an update? Can we monitor POST code to know when
> the BIOS is completed?
>
There are two issues:
- The host may access the BIOS SPI flash at any time by making BIOS
calls; accessing UEFI variables is one example. The BIOS code that executes
these requests does not handle the BIOS SPI flash becoming inaccessible at
all, so the result is an immediate crash of the host.
- With ME in operational mode, we cannot guarantee that ME would not
attempt to read/write from the SPI flash while the host is running. I'm not
sure if it's possible to put ME into recovery mode WHILE the host is
running or if the host needs to be shut down for that.
My understanding is that the only way to safely write the BIOS SPI flash
from the BMC is to shut the host down and put ME into recovery mode.
Alternatively, hold the host in full reset via RSMRST.
> > 2. Set ME/NM (Management engine or Node manager in x86) to recovery
> > mode
>
> Is this specific to the BIOS update path or is this something we should
> do whenever the Host is powered off? In either case I guess you can
> make it a dependency on the systemd unit file, but it seems like it
> would be nice if it were able to be generically applied to all power
> on/off paths.
>
This question opens a can of worms. There are people who say that ME should
always be run in recovery mode ...
> > 3. Flip GPIO to access SPI flash used by host.
> > 4. Bind spi driver to access flash
>
> This is another thing that seems like we could do generically on all
> power on / power off paths? Any time the host isn't running we can hit
> the GPIO to put ownership at the BMC. Is there any disadvantage to
> that?
>
Yes. You cannot turn the host on via a power button if the PCH cannot
access the SPI flash. You'd have to catch that signal in the BMC and do the
right thing.
What's the advantage of having the BIOS SPI flash always connected to the
BMC when the host is off? That seems to be making things more complicated
to me.
> Is the GPIO something unique to Facebook's machines or do most other
> Intel machines have the same requirements?
>
I'm not sure if it was explained what the GPIO does:
Since the SPI flash can only have one master, a "mux" (it's really a
digital switch, or a pair of digital switches) connects the SPI flash either
to the PCH for access by the ME / host or to the BMC. The GPIO or pair of
GPIOs is used to control the mux / bus switches.
If the SPI flash is connected to the BMC, the ME / host cannot access it at
all. As it turns out, the PCH needs to be able to read the SPI flash to be
able to "turn on" the host.
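To make the ownership constraint concrete, here is a toy Python model of the mux. The names (`Owner`, `FlashMux`, `handle_power_button`) are made up for illustration and are not an OpenBMC API; real code would drive the GPIO(s) instead of setting a field.

```python
from enum import Enum


class Owner(Enum):
    PCH = "pch"  # ME / host can reach the flash
    BMC = "bmc"  # only the BMC can reach the flash


class FlashMux:
    """Toy stand-in for the GPIO-controlled bus switch (hypothetical)."""

    def __init__(self):
        self.owner = Owner.PCH

    def select(self, owner):
        # Real hardware: drive the GPIO or pair of GPIOs here.
        self.owner = owner


def handle_power_button(mux, power_on):
    """Give the flash back to the PCH before letting the host power on."""
    if mux.owner is not Owner.PCH:
        mux.select(Owner.PCH)
    power_on()
```

This is only meant to show the ordering: if the BMC ever holds the flash while the host is off, something has to hand it back before the power-on request goes through.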
>
> > 5. Flashcp image to device.
>
> I don't think `flashcp` is used today, or at least not in my
> recollection of the previous Witherspoon implementation. Is there any
> advantage to it over `dd` to the raw mtdblock device?
>
I'm new to this, too, and found this explanation:
https://unix.stackexchange.com/questions/274217/how-is-erasing-mtd-with-dd-if-dev-zero-different-from-flash-eraseall
This question was asked in the context of erase, but it applies to writes,
too.
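As far as I understand it, the underlying point is that programming NOR flash can only clear bits (1 -> 0), so a region has to be erased back to all ones before new data is written; flashcp erases, writes and then verifies. A toy Python model of just that behaviour (no real MTD access, function names invented for the example):

```python
ERASED = 0xFF  # NOR flash reads back all ones after an erase


def program(cells, data):
    """Programming can only clear bits (1 -> 0), never set them."""
    return bytes(old & new for old, new in zip(cells, data))


def erase(cells):
    """Erasing resets every bit in the block to 1."""
    return bytes([ERASED] * len(cells))


old = bytes([0x0F, 0xF0])
new = bytes([0xAA, 0x55])

without_erase = program(old, new)      # zeros from the old contents survive
with_erase = program(erase(old), new)  # matches the new image
```

Here `without_erase` ends up as the bitwise AND of old and new contents, while `with_erase` equals `new`.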
> > 9. Power on server.
>
> Doesn't seem like "power on" should be a side-effect of a BIOS update.
> Is this intended to be "go back to the previous power state"?
>
+1
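A minimal Python sketch of "go back to the previous power state", assuming a hypothetical `host` object with `is_on()`, `power_off()` and `power_on()`, and a `write_image` callable that does the actual flash write and raises on failure:

```python
def update_bios(host, write_image):
    """Flash the BIOS, then return the host to the state it was in before."""
    was_on = host.is_on()
    if was_on:
        host.power_off()
    # If this raises, the host is deliberately left off.
    write_image()
    if was_on:
        host.power_on()
```

A host that was off before the update stays off afterwards; only a host that was running gets powered back on.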
Having said all that, I was experimenting with pretty much the same flow
but ended up with unreliable writes with individual bit flips. I'm pretty
sure the HW is fine, since the original (AMI) stock firmware that comes
with the board can do it just fine. This is with an Aspeed AST2500, a C620
PCH and a Dediprog EM100 SPI flash emulator.
I even tried lowering the Aspeed's SPI flash clock to the minimum, with no
change :-/ I have already hooked up a logic analyzer to see what's going on
but haven't had a chance to investigate yet. Any ideas?
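One thing that might help narrow it down is comparing a read-back against the image and looking at which bits flip and in which direction. A rough Python sketch (the helper name `bit_flips` is made up):

```python
def bit_flips(expected, actual):
    """Yield (offset, bit, direction) for every bit that differs."""
    for offset, (e, a) in enumerate(zip(expected, actual)):
        diff = e ^ a
        for bit in range(8):
            if diff & (1 << bit):
                direction = "1->0" if e & (1 << bit) else "0->1"
                yield offset, bit, direction
```

If the flips all go one way or cluster on particular bit positions, that may hint at something systematic rather than random noise.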
Oskar.