OpenBMC replacing AMI AST2500 BMC fw on Gigabyte MC12-LE0 - questions

Johannes Truschnigg johannes at truschnigg.info
Wed May 29 04:37:08 AEST 2024


Hi again! :)

On Tue, May 28, 2024 at 09:56:38AM +0930, Andrew Jeffery wrote:

> [...]
> > ```
> > root@grml ~ # /tmp/culvert probe
> > [*] failed to initialise devmem bridge: -1
> > [*] Accessing the BMC's AHB via the p2a bridge
> > debug:  Permissive
> >         Debug UART port: UART5
> > xdma:   Restricted
> >         BMC: Disabled
> >         VGA: Enabled
> >         XDMA on VGA: Enabled
> >         XDMA is constrained: Yes
> > p2a:    Permissive
> >         BMC: Disabled
> >         VGA: Enabled
> >         MMIO on VGA: Enabled
> >         [0x00000000 - 0x0fffffff]   Firmware: Writable
> >         [0x10000000 - 0x1fffffff]     SoC IO: Writable
> >         [0x20000000 - 0x2fffffff]  BMC Flash: Writable
> >         [0x30000000 - 0x3fffffff] Host Flash: Writable
> >         [0x40000000 - 0x5fffffff]   Reserved: Writable
> >         [0x60000000 - 0x7fffffff]   LPC Host: Writable
> >         [0x80000000 - 0xffffffff]       DRAM: Writable
> > ilpc:   Permissive
> >         SuperIO address: 0x2e
> > ```
> > 
> > (Before my mishap, whenever I tried to run that command, all but `ilpc` yielded
> > nothing, and even `ilpc` reported "Restricted" - none of which I know how to
> > interpret at all, by the way! :D)
> 
> So I have a bit of an educated guess here:
> 
> 1. For AST SoCs prior to the AST2600, disabling the hardware backdoors
> must be done in firmware
> 2. Your process above has corrupted u-boot
> 3. The corruption is such that u-boot fails prior to applying the
> backdoor mitigations (if that's where the mitigation was done to begin
> with - however it's where we do the mitigation in OpenBMC)
> 
> As for the culvert 'probe' output and the
> Permissive/Restricted/Disabled states:
> 
> - Permissive means there are no constraints on the bridge - culvert can
> read and write any AHB address
> - Restricted means some constraints apply to the bridge - either the
> address space is restricted (e.g. XDMA is constrained to the VGA
> reserved memory), or write access is disabled for some portion or all
> of the AHB address space (e.g. the P2A write filters for the listed
> regions)
> - Disabled means what it says on the tin, the BMC's AHB address space
> cannot be accessed via the bridge at hand, we cannot read or write.

Thanks for this summary and explanation. The "I know some of these words" is
strong in me still, esp. during the later paragraphs, but I think I have a
chance to get to understand more about how it all comes together eventually ;)
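
To keep the three states straight for myself, here is a minimal sketch that pulls the per-bridge state out of text shaped like the `culvert probe` output quoted above. The helper name `bridge_states` and the `STATE_MEANING` table are made up for illustration and are not part of culvert; the descriptions are paraphrased from the explanation above.

```python
import re

# Hypothetical helper, not part of culvert: it only parses text shaped like
# the `culvert probe` output quoted earlier in this mail.
STATE_MEANING = {
    "Permissive": "no constraints, any AHB address can be read and written",
    "Restricted": "the address range and/or write access is limited",
    "Disabled": "the BMC's AHB cannot be read or written via this bridge",
}


def bridge_states(probe_output: str) -> dict[str, str]:
    """Map each bridge name (debug, xdma, p2a, ilpc) to its reported state."""
    states = {}
    for line in probe_output.splitlines():
        # Top-level lines look like "p2a:    Permissive". Indented detail
        # lines such as "        BMC: Disabled" do not start with a word
        # character, so they are skipped.
        m = re.match(r"^(\w+):\s+(Permissive|Restricted|Disabled)\s*$", line)
        if m:
            states[m.group(1)] = m.group(2)
    return states


sample = """\
debug:  Permissive
        Debug UART port: UART5
xdma:   Restricted
        BMC: Disabled
p2a:    Permissive
        BMC: Disabled
ilpc:   Permissive
        SuperIO address: 0x2e
"""

for bridge, state in bridge_states(sample).items():
    print(f"{bridge}: {state} ({STATE_MEANING[state]})")
```

This only summarises what the tool already printed; it does not probe any hardware itself.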


> > So I guess I have at least six questions now:
> > 
> > 1.) What happened when the kernel called it quits, u-boot reloaded and decided
> > to format some of its flash?
> 
> Not sure, but you'd do well to boot a kernel that doesn't try to mount
> partitions from the flash.

Understood. Is that always the case for OpenBMC kernel images with default
config?


> > 3.) Can I restore the original firmware/SPI content on my board by any means
> > from the now running host OS? If so, what way would you suggest I try first?
> 
> > 4.) Does having `culvert` have this new level of access have any new advantages
> > or open possibilities that I should be aware of?
> 
> So one of the motivations for culvert is to reflash the BMC over the
> AHB bridges reported in the probe output. This works regardless of the
> state of the BMC firmware, so long as it hasn't disabled the hardware
> backdoors. You can try it with e.g.
> 
> ```
> # culvert -vv write firmware < $firmware_image
> ```
> 
> (you may want to experiment with `culvert -vv read firmware` first).
> 
> That said, experience in [1] suggests Gigabyte have introduced some
> gremlins that aren't accounted for by culvert, and that you might have
> more success with gigaflash.
> 
> [1]:
> https://github.com/amboar/culvert/issues/51#issuecomment-2129043859

So I did play around with this after I'd read the above github issue's latest
additions (how nice it was to learn that someone actually could use what
little info I had produced so far already! :)), and I concluded I would want
to try to get `culvert` to not flash back stock fw, but the OpenBMC phosphor
image I had prepared a while ago.

And indeed, it can be done, and I did it. I am just not exactly sure *how* or
*why* I could do it, but maybe you can figure that out with the information I
gathered.

First, I used `gigaflash` to dump BMC ROM to a file, and that worked somewhat
unreliably at first, to my surprise - the first `gigaflash_x64 -dump somefile`
attempt caused my host to hang, but it recovered to its original state (BMC
dead, live distro (grml amd64) booted OK) after a power cycle.

The subsequent attempts using a more up-to-date release of gigaflash worked,
and while fooling around with culvert and gigaflash (to check if gigaflash
always produced the same result and to find out if the switches that the
wrapper shell script provided by Gigabyte used to update fw actually made a
difference), I noticed that at some point, culvert suddenly could (correctly,
bit-by-bit identically!) dump the ROM, too!
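
For anyone wanting to repeat the "bit-by-bit identical" check, a minimal sketch of how two dump files can be compared by size and SHA-256 follows. The function names and the stand-in file names are made up for illustration; the demo writes small dummy files instead of touching real dumps.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a 64M dump need not fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dumps_identical(path_a: str, path_b: str) -> bool:
    """True if both files have the same size and the same SHA-256 digest."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return sha256_of(path_a) == sha256_of(path_b)


# Demo with stand-in files (hypothetical names, dummy contents).
with tempfile.TemporaryDirectory() as tmp:
    gigaflash_img = os.path.join(tmp, "gigaflash.img")
    culvert_img = os.path.join(tmp, "culvert.img")
    corrupt_img = os.path.join(tmp, "corrupt.img")
    for path, data in [
        (gigaflash_img, b"\xff" * 4096),
        (culvert_img, b"\xff" * 4096),
        (corrupt_img, b"\xff" * 4095 + b"\x00"),
    ]:
        with open(path, "wb") as f:
            f.write(data)
    print(dumps_identical(gigaflash_img, culvert_img))  # True
    print(dumps_identical(gigaflash_img, corrupt_img))  # False
```

Plain `cmp` or `sha256sum` on the host does the same job; the sketch just spells out what "identical" means here.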

For posterity, this is what it looked like when gigaflash successfully dumped
the image for me:

```
# Tools/gigaflash_x64 -dump test.img -cs 0 -2500
gigaflash v2.0.10
Failed to connect BMC, try to dump image!
--- Dump image from BMC...
Find ASPEED Device 1a03:2000 on 4:0.0 
MMIO Virtual Address: a4e84000 
Relocate IO Base: f000 
Found ASPEED Device 1a03:2500 rev. 41 
Static Memory Controller Information: 
CS0 Flash Type is SPI 
CS1 Flash Type is SPI 
CS2 Flash Type is SPI 
CS3 Flash Type is NOR 
CS4 Flash Type is NOR 
Boot CS is 0 
Option Information: 
CS: 0 
Flash Type: SPI 
[Warning] Don't AC OFF or Reboot System During BMC Firmware Update!! 
[SOCFLASH] Flash ID : 20ba20 
Find Flash Chip #1: Numonyx N25Q512 
Backup Flash Chip O.K.                 
--- Dump image finished
--- Wait 90 secs for BMC ready...
```


I tried to arrive at a minimal reproducer for what I had to do after a power
cut to get `culvert read firmware` to work, and in the end, it seems that one
needs to run

`gigaflash -dump some_filename -2500`

... once, and *only then* culvert can read the data as well.

A session log I kept when it successfully dumped looked like this:

```
[*] Found 5 registered bridge drivers
[*] Trying bridge driver l2a
[*] Failed to initialise L2A bridge: -95
[*] Trying bridge driver ilpc
[*] Probing ilpc
[*] Probing 0x2e for SuperIO
[*] Unlocking SuperIO: 0
[*] Selecting SuperIO device 2 (SUART1): 0
[*] Found device 2 selected: 0
[*] Selecting SuperIO device 12 (SUART4): 0
[*] Found device 12 selected: 0
[*] Locking SuperIO
[*] Found SuperIO device at 0x2e
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf7cffedc
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Trying bridge driver devmem
[*] failed to initialise devmem bridge: -1
[*] Trying bridge driver debug-uart
[*] Unrecognised argument list for debug interface (0)
[*] Trying bridge driver p2a
[*] Probing p2a
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf7cffedc
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Accessing the BMC's AHB via the p2a bridge
[*] Probing for SoC revision registers
[*] ahb_readl: 0x1e6e2004: 0xf7cffedc
[*] ahb_readl: 0x1e6e207c: 0x04030303
[*] Found revision 0x4030303
[*] Selected devicetree for SoC 'aspeed,ast2500'
[*] Found 15 registered drivers
[*] Processing devicetree node at /aliases
[*] Processing devicetree node at /memory@80000000
[*] Processing devicetree node at /ahb
[*] Processing devicetree node at /ahb/sram@1e720000
[*] Processing devicetree node at /ahb/bus-controller@1e600000
[*] Bound trace driver to /ahb/bus-controller@1e600000
[*] Processing devicetree node at /ahb/apb
[*] Processing devicetree node at /ahb/apb/spi@1e620000
[*] Bound sfc driver to /ahb/apb/spi@1e620000
[*] Processing devicetree node at /ahb/apb/spi@1e630000
[*] Bound sfc driver to /ahb/apb/spi@1e630000
[*] Processing devicetree node at /ahb/apb/spi@1e631000
[*] Bound sfc driver to /ahb/apb/spi@1e631000
[*] Processing devicetree node at /ahb/apb/memory-controller@1e6e0000
[*] Bound sdmc driver to /ahb/apb/memory-controller@1e6e0000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/clock
[*] Bound clk driver to /ahb/apb/syscon@1e6e2000/clock
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/strapping
[*] Bound strap driver to /ahb/apb/syscon@1e6e2000/strapping
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/superio
[*] Bound sioctl driver to /ahb/apb/syscon@1e6e2000/superio
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Bound bridge-controller driver to /ahb/apb/syscon@1e6e2000/bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Bound debugctl driver to /ahb/apb/syscon@1e6e2000/debug-bridge-controller
[*] Processing devicetree node at /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound pciectl driver to /ahb/apb/syscon@1e6e2000/pcie-bridge-controller
[*] Bound scu driver to /ahb/apb/syscon@1e6e2000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785000
[*] Bound wdt driver to /ahb/apb/watchdog@1e785000
[*] Processing devicetree node at /ahb/apb/watchdog@1e785020
[*] Bound wdt driver to /ahb/apb/watchdog@1e785020
[*] Processing devicetree node at /ahb/apb/watchdog@1e785040
[*] Bound wdt driver to /ahb/apb/watchdog@1e785040
[*] Processing devicetree node at /ahb/apb/serial@1e787000
[*] Bound vuart driver to /ahb/apb/serial@1e787000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000
[*] Processing devicetree node at /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound ilpcctl driver to /ahb/apb/lpc@1e789000/bridge-controller
[*] Bound uart-mux driver to /ahb/apb/lpc@1e789000
[*] Initialising flash controller
[*] fdt: Looking up device name 'fmc'
[*] fdt: Locating node with device path '/ahb/apb/spi@1e620000'
[*] ahb_readl: 0x1e6e2000: 0x00000001
[*] Initialised scu driver
[*] Initialised clk driver
[*] ahb_readl: 0x1e6e2070: 0xf120f287
[*] ahb_readl: 0x1e620010: 0x00002400
[*] ahb_readl: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620094: 0x00000000
[*] Initialised sfc driver
[*] Initialising flash chip
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x02020202
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Init status: 02
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x1020ba20
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Flash ID: 20.ba.20 (20ba20)
[*] LIBFLASH: Found chip Micron N25Qx512Ax size 64M erase granule: 4K
[*] LIBFLASH: Flash >16MB, enabling 4B mode...
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_readl: 0x20000000: 0x02020202
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000403
[*] ahb_writel: 0x1e620010: 0x00000407
[*] ahb_writel: 0x1e620010: 0x00000400
[*] LIBFLASH: Enabling controller 4B mode...
[*] ahb_readl: 0x1e620004: 0x00000701
[*] ahb_writel: 0x1e620010: 0x00002400
[*] ahb_writel: 0x1e620004: 0x00000701
[*] Write-protecting all chip-selects
[*] ahb_readl: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] Exfiltrating BMC flash to stdout

................................................................
[*] ahb_readl: 0x1e620000: 0x8007002a
[*] ahb_writel: 0x1e620000: 0x8007002a
[*] Unbound instance of driver uart-mux
[*] Unbound instance of driver ilpcctl
[*] Unbound instance of driver vuart
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver scu
[*] Unbound instance of driver pciectl
[*] Unbound instance of driver debugctl
[*] Unbound instance of driver bridge-controller
[*] Unbound instance of driver sioctl
[*] Unbound instance of driver strap
[*] Unbound instance of driver clk
[*] Unbound instance of driver sdmc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver sfc
[*] ahb_writel: 0x1e620010: 0x00002400
[*] Unbound instance of driver sfc
[*] Unbound instance of driver trace
```
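
As an aside on the "Flash >16MB, enabling 4B mode" line in the log above: three address bytes can only reach 2^24 bytes (16 MiB), so a 64M chip needs a fourth. The sketch below shows that arithmetic and splits the reported flash ID `20ba20` into its three bytes. The function names are made up for illustration, and reading the bytes as "manufacturer, then two device ID bytes" is my assumption based on the usual JEDEC ID layout, not something the log states.

```python
MIB = 1024 * 1024


def address_bytes_needed(flash_size_bytes: int) -> int:
    """Smallest number of address bytes that can reach every byte of the chip."""
    highest_address = flash_size_bytes - 1
    return max(1, (highest_address.bit_length() + 7) // 8)


def split_flash_id(flash_id: int) -> tuple[int, int, int]:
    """Split a 24-bit flash ID into its three bytes, most significant first.

    Assumption: the layout is (manufacturer, device ID byte 1, device ID
    byte 2), as is common for JEDEC-style IDs.
    """
    return (flash_id >> 16) & 0xFF, (flash_id >> 8) & 0xFF, flash_id & 0xFF


print(address_bytes_needed(16 * MIB))  # 3
print(address_bytes_needed(64 * MIB))  # 4
print(".".join(f"{b:02x}" for b in split_flash_id(0x20BA20)))  # 20.ba.20
```

The last line reproduces the `20.ba.20` formatting from the LIBFLASH output.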

I did also try to get mmiotrace, which you mentioned in the GH issue (I have
never worked with it before, but I am familiar with ftrace, so I am not 100%
but reasonably certain that I did not hold it wrong), to work, but I could not
make it emit any tracing data while either culvert or gigaflash were dumping
ROM. Only when (un)loading the `ast` driver, lots of tracing data could be
collected.

I do have strace capture of `gigaflash` running for the first time after a
reboot, but all the juicy bits seem to hide behind mmap() anyway, so I will
only provide it upon request.

Then it dawned on me that `culvert probe` before and after the gigaflash
"unlock" might hold the key. Diffing two runs' results (one before, when
culvert could not dump anything, and one after, when it very well could)
yielded this:

```
$ diff -u1 culvert_probe_initial culvert_probe_after_dump 
--- culvert_probe_initial
+++ culvert_probe_after_dump
@@ -121,4 +121,4 @@
 [*] ahb_readl: 0x1e6e2070: 0xf120f286
-[*] ahb_readl: 0x1e789100: 0x00000000
-ilpc:  Permissive
+[*] ahb_readl: 0x1e789100: 0x00000040
+ilpc:  Restricted
 [*] ahb_readl: 0x1e6e2070: 0xf120f286
```

I do not know how to interpret this, but here's hoping you can tell if this
might help solve the mystery also reported in the GH issue? :)
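
To make the comparison repeatable, here is a minimal sketch that extracts the `ahb_readl` lines from two probe logs and reports which registers changed and in which bit positions. The function names are made up for illustration, and the sample logs only contain the lines visible in the diff above. The sketch says nothing about what the changed bit means.

```python
import re

READL = re.compile(r"ahb_readl:\s+(0x[0-9a-fA-F]+):\s+(0x[0-9a-fA-F]+)")


def register_reads(log: str) -> dict[int, int]:
    """Return the last value read from each AHB address found in a log."""
    reads = {}
    for m in READL.finditer(log):
        reads[int(m.group(1), 16)] = int(m.group(2), 16)
    return reads


def changed_bits(before: int, after: int) -> list[int]:
    """List the bit positions (0 = least significant) that differ."""
    diff = before ^ after
    return [bit for bit in range(32) if (diff >> bit) & 1]


def compare_logs(before_log: str, after_log: str) -> dict[int, tuple[int, int, list[int]]]:
    """Map each changed address to (old value, new value, changed bits)."""
    before, after = register_reads(before_log), register_reads(after_log)
    result = {}
    for addr in sorted(before.keys() & after.keys()):
        if before[addr] != after[addr]:
            result[addr] = (before[addr], after[addr],
                            changed_bits(before[addr], after[addr]))
    return result


initial = """\
[*] ahb_readl: 0x1e6e2070: 0xf120f286
[*] ahb_readl: 0x1e789100: 0x00000000
"""
after_dump = """\
[*] ahb_readl: 0x1e6e2070: 0xf120f286
[*] ahb_readl: 0x1e789100: 0x00000040
"""

for addr, (old, new, bits) in compare_logs(initial, after_dump).items():
    print(f"0x{addr:08x}: 0x{old:08x} -> 0x{new:08x} (bits changed: {bits})")
# 0x1e789100: 0x00000000 -> 0x00000040 (bits changed: [6])
```

Note that a register read several times within one log is reduced to its last value here, which is good enough for a before/after comparison of two short probe runs.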


> > 5.) Suppose I can restore the BMC's original SPI content and behavior - what's
> > a recommended way to have the TFTP'd kernel boot into an OpenBMC rootfs
> > *without* having it store on the BMC's main storage/overwriting SPI?
> 
> If you're looking to deal with OpenBMC directly then this collection of
> patches from Patrick will probably help:
> 
> https://gerrit.openbmc.org/q/topic:%22no-rootfs%22

Thanks a bunch, this seems like a very useful reading list for once I have
sorted out my immediate trouble with the board/BMC resulting from what I am
about to detail at the end of this mail! ;)


> > [...]
> > 6.) Assuming this cannot be recovered in software - what are my chances of
> > identifying the SPI flash on my board as such, and re-writing its contents
> > using an affordable SPI programmer solution, given that I've never done
> > anything like this with hardware before? :^)
> 
> From the manual[2] I expect it's in the unmarked socket between the
> PCIe x4 (27), M.2 (28) and PCIe x16 (29) slots.

Thanks, I was afraid so - despite the diagram in the manual, the IC is not
socketed, so I guess I'll have to find someone who'd be able to desolder it for
me, or find a test clip that is compatible with that kind of "socket".


Anyhow, like I hinted at above, that's not the end of today's episode: I did
try whether I could get culvert to *write* my OpenBMC flash image from my root
prompt, and it readily complied on the first try (after I'd gotten it to dump
first). Unfortunately, I ran it with `-v -v` in effect, so LOTS of debug output
overwhelmed my generous scrollback buffer, and I could only preserve the last
few thousand lines it put out while writing and verifying flash. It looks a
lot like this:

```
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002400
.[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002403
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002400
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002403
[*] ahb_readl: 0x20000000: 0x02020202
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002400
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002403
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002400
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002403
[*] ahb_readl: 0x20000000: 0x03030303
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002400
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002403
[*] ahb_readl: 0x20000000: 0x00000000
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002400
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002403
[*] ahb_readl: 0x20000000: 0x81818181
[*] ahb_writel: 0x1e620010: 0x00002407
[*] ahb_writel: 0x1e620010: 0x00002400
.
[*] LIBFLASH: Verifying...
................................................................................................................................................................................................................................................................
[*] Performing SoC reset
[*] fdt: Looking up device name 'wdt2'
[*] fdt: Locating node with device path '/ahb/apb/watchdog@1e785020'
[*] ahb_readl: 0x1e78502c: 0x00000010
[*] wdt_readl:  base: 0x1e785020, reg: 0x0c, val: 0x00000010
[*] wdt_writel: base: 0x1e785020, reg: 0x0c, val: 0x00000010
[*] ahb_writel: 0x1e78502c: 0x00000010
[*] ahb_readl: 0x1e78502c: 0x00000010
[*] wdt_readl:  base: 0x1e785020, reg: 0x0c, val: 0x00000010
[*] wdt_writel: base: 0x1e785020, reg: 0x0c, val: 0x00000010
[*] ahb_writel: 0x1e78502c: 0x00000010
[*] wdt_writel: base: 0x1e785020, reg: 0x1c, val: 0x023ffffb
[*] ahb_writel: 0x1e78503c: 0x023ffffb
[*] ahb_readl: 0x1e78502c: 0x00000010
[*] wdt_readl:  base: 0x1e785020, reg: 0x0c, val: 0x00000010
[*] wdt_writel: base: 0x1e785020, reg: 0x04, val: 0x004c4b40
[*] ahb_writel: 0x1e785024: 0x004c4b40
[*] wdt_writel: base: 0x1e785020, reg: 0x08, val: 0x00004755
[*] ahb_writel: 0x1e785028: 0x00004755
[*] ahb_readl: 0x1e78502c: 0x00000010
[*] wdt_readl:  base: 0x1e785020, reg: 0x0c, val: 0x00000010
[*] wdt_writel: base: 0x1e785020, reg: 0x0c, val: 0x00000013
[*] ahb_writel: 0x1e78502c: 0x00000013
[*] Waiting 6000000 microseconds for watchdog timer to expire
[*] ahb_writel: 0x1e6e207c: 0x00000001
[*] wdt_writel: base: 0x1e785020, reg: 0x04, val: 0x00000000
[*] ahb_writel: 0x1e785024: 0x00000000
[*] Unbound instance of driver uart-mux
[*] Unbound instance of driver ilpcctl
[*] Unbound instance of driver vuart
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver wdt
[*] Unbound instance of driver scu
[*] Unbound instance of driver pciectl
[*] Unbound instance of driver debugctl
[*] Unbound instance of driver bridge-controller
[*] Unbound instance of driver sioctl
[*] Unbound instance of driver strap
[*] Unbound instance of driver clk
[*] Unbound instance of driver sdmc
[*] Unbound instance of driver sfc
[*] Unbound instance of driver sfc
[*] ahb_writel: 0x1e620010: 0x00002400
[*] Unbound instance of driver sfc
[*] Unbound instance of driver trace
/tmp/culvert -v -v write firmware <   190.48s user 69.45s system 97% cpu 4:25.94 total
```
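
A small aid for reading the watchdog part of that log: each `wdt_writel` line is followed by an `ahb_writel` to base plus register offset, and the value written to register 0x04 is 0x004c4b40, which is 5000000 in decimal. The sketch below does just that arithmetic. That register 0x04 holds the reload value and that the counter ticks at 1 MHz are both assumptions on my part (they would fit the "Waiting 6000000 microseconds" line, but the log does not say so); the names are made up for illustration.

```python
WDT2_BASE = 0x1E785020  # base address taken from the log above


def wdt_register_address(base: int, reg: int) -> int:
    """AHB address that a wdt_readl/wdt_writel on `reg` ends up touching."""
    return base + reg


def reload_seconds(reload_value: int, clock_hz: int = 1_000_000) -> float:
    """Timeout in seconds, ASSUMING the counter ticks at `clock_hz`."""
    return reload_value / clock_hz


print(hex(wdt_register_address(WDT2_BASE, 0x0C)))  # 0x1e78502c
print(hex(wdt_register_address(WDT2_BASE, 0x1C)))  # 0x1e78503c
print(0x004C4B40)                                  # 5000000
print(reload_seconds(0x004C4B40))                  # 5.0
```

Under those assumptions the watchdog would fire after 5 seconds, and culvert waits one second longer than that before carrying on.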

The result is both very encouraging and also disappointing, because as you
initially theorized, the BMC boots fine with OpenBMC flashed onto it - but none
of its host system management capabilities actually work. I do have the
vanilla OpenBMC web application available now, with DHCP, DNS, NTP et al.
working fine for the BMC, and I can log in via SSH, but all the peripherals the
BMC ought to be able to manage and hook into do not work at present.


The next step I would want to take is to find a way to revert to the stock BMC
fw with only OpenBMC running, since the host apparently cannot boot any more
in this state (same situation as with the stock BMC fw when u-boot had
initialized, but no BMC system was allowed to boot up, afaict - the power
button/contacts just do nothing). After that, I would like to establish a sane
(and hopefully easy) way to convert the board's BMC firmware from OpenBMC to
stock, and vice versa.

Once I have established a surefire and straightforward way to do what I have
done in such meandering and clumsy attempts, I would like to learn more about
how the "M" is actually put into this whole "BMC" thing, and see how far I can
take that. The stock fw has some interesting description files regarding i2c
configs that might come in handy, but I am just not educated enough (yet, I
hope) to make real sense of them :)

Can you perhaps offer me advice on how to flash arbitrary new SPI flash
contents from either OpenBMC's u-boot or an OpenBMC root shell, or what I would
need to look at in detail to learn how to do that?

As always, I am very grateful for anyone's advice and time. Thank you! :)

-- 
with best regards:
- Johannes Truschnigg ( johannes at truschnigg.info )

www:   https://johannes.truschnigg.info/