<div dir="ltr"><div dir="ltr">Hi Patrick<div><br></div><div>Here are some thoughts:</div><div><br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> 1. Power off host server.<br>
<br>I will admit I don't know the Intel architecture well enough yet, but is<br>
powering off the server prior to BIOS update actually required? Is the<br>
BIOS NOR chip always mapped into a physical address and used, or is the<br>
BIOS at some point loaded and resident? Are there stable points where<br>
it is safe to perform an update? Can we monitor POST code to know when<br>
the BIOS is completed?<br></blockquote><div>There are two issues:</div><div><ul><li>The host may access the BIOS SPI flash at any time by making BIOS calls; UEFI variable access is one example. The problem is that the BIOS code that executes these requests does not handle the case where the BIOS SPI flash becomes inaccessible at all. This results in an immediate crash of the host.</li><li>With ME in operational mode, we cannot guarantee that ME will not attempt to read from or write to the SPI flash while the host is running. I'm not sure if it's possible to put ME into recovery mode WHILE the host is running or if the host needs to be shut down for that.</li></ul></div><div>My understanding is that the only way to safely write the BIOS SPI flash from the BMC is to shut the host down and put ME into recovery mode. Alternatively, hold the host in full reset via RSMRST.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> 2. Set ME/NM (Management engine or Node manager in x86) to recovery mode<br>
<br>
Is this specific to the BIOS update path or is this something we should<br>
do whenever the Host is powered off? In either case I guess you can<br>
make it a dependency on the systemd unit file, but it seems like it<br>
would be nice if it were able to be generically applied to all power<br>
on/off paths.<br></blockquote><div>This question opens a can of worms. There are people who say that ME should always be run in recovery mode ...</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> 3. Flip GPIO to access SPI flash used by host.<br>
> 4. Bind spi driver to access flash<br>
<br>
This is another thing that seems like we could do generically on all<br>
power on / power off paths? Any time the host isn't running we can hit<br>
the GPIO to put ownership at the BMC. Is there any disadvantage to<br>
that?<br></blockquote><div>Yes. You cannot turn the host on via the power button if the PCH cannot access the SPI flash. You'd have to catch that signal in the BMC and do the right thing.</div><div><br></div><div>What's the advantage of having the BIOS SPI flash always connected to the BMC when the host is off? To me, that seems to make things more complicated.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Is the GPIO something unique to Facebook's machines or do most other<br>
Intel machines have the same requirements?<br></blockquote><div><div><br></div><div>I'm not sure if it was explained what the GPIO does:</div><div>Since the SPI flash can only have one master, a "mux" (it's really a digital switch, or a pair of digital switches) connects the SPI flash either to the PCH for access by the ME / host or to the BMC. The GPIO or pair of GPIOs is used to control the mux / bus switches.</div><div><br></div><div>If the SPI flash is connected to the BMC, the ME / host cannot access it at all. As it turns out, the PCH needs to be able to read the SPI flash to be able to "turn on" the host.</div><div></div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
> 5. Flashcp image to device.<br>
<br>
I don't think `flashcp` is used today, or at least not in my<br>
recollection of the previous Witherspoon implementation. Is there any<br>
advantage to it over `dd` to the raw mtdblock device?<br></blockquote><div>I'm new to this, too, and found this explanation: <a href="https://unix.stackexchange.com/questions/274217/how-is-erasing-mtd-with-dd-if-dev-zero-different-from-flash-eraseall">https://unix.stackexchange.com/questions/274217/how-is-erasing-mtd-with-dd-if-dev-zero-different-from-flash-eraseall</a></div><div><br></div><div>This question was asked in the context of erase, but it applies to writes, too.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> 9. Power on server.<br>
<br>
Doesn't seem like "power on" should be a side-effect of a BIOS update.<br>
Is this intended to be "go back to the previous power state"?<br></blockquote><div>+1<br></div><div><br></div><div>Having said all that, I was experimenting with pretty much the same flow but ended up with unreliable writes showing individual bit flips. I'm pretty sure the HW is fine, since the original (AMI) stock firmware that comes with the board can do it just fine. This is with an Aspeed AST2500, a C620 PCH and a Dediprog EM100 SPI flash emulator.</div><div><br></div><div>I even tried lowering the SPI flash clock on the Aspeed to the minimum, with no change :-/ I have already hooked up a logic analyzer to see what's going on but haven't had a chance to investigate yet. Any ideas?</div><div><br></div><div>Oskar.</div></div></div>