ipmitool fru write 0 - does not update "baseboard" FRU

Willy Tu wltu at google.com
Wed May 8 09:49:40 AEST 2024


I think we should make it so that if a different IPMI client tries to write
the Fru we prevent it and only allow one write at a time. I think the
fruCache for Read is mostly from some commands that is dealing with one
device ID multiple times. When reading subsequent ids... then it doesn't do
anything.

I think we will need more discussion on that topic since it will be a
larger refactor to make that work.

Sounds good. Please let me know how it goes.

Willy Tu

On Tue, May 7, 2024 at 10:32 AM Oskar Senft <osk at google.com> wrote:

> Hi Willy
>
> Thanks for your input!
>
> From what I can tell, the current implementation will fail in wondrous
> ways if there's more than one IPMI client trying to write FRU at the same
> time. The existing getFru guards against changing target devId between
> calls to not hand-out the same cache for different requests. However, this
> will clearly break when different IPMI clients attempt to write the same or
> different FRUs at the same time.
>
> We could argue whether that's a supported use case or if we just assume
> that'll never happen ... it does seem like quite a bit of an edge case,
> though.
>
> I do see it as an issue if there were multiple clients with only one
> writing, but others reading - that'll fail in similarly weird ways.
>
> I'm wondering: Do we want to have the fruCache for _reads_ at all? It
> seems actually quite wrong, since subsequent reads for the same FRU would
> always return the same result, even if the FRU changed through some other
> mechanism.
>
> Let me work on a fix that would use the cache only for writing and would
> keep it around until the timeout expired.
>
> Oskar.
>
> On Fri, May 3, 2024 at 11:38 AM Willy Tu <wltu at google.com> wrote:
>
>> Hi Oskar,
>>
>> > C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>> was called. Maybe we should keep that cache around until the timeout
>> expires?
>>
>> It seems like this is an issue of multiple clients taking to ipmid. In
>> the middle of writing... There is another client that is reading or
>> something
>> else and will reset the fruCache as you mentioned. In that case, I think
>> it may be best to refactor it out to use another getFru method... maybe
>> like getFruToWrite
>>
>> Maybe just something like this
>>
>> ```
>> std::vector<uint8_t> getFruToWrite(...){
>>   if (writeTimer->isRunning()) {
>>     return fruCacheForWrite;
>>   }
>>   auto [_, fruCacheForWrite] = getFru(...);
>>   return fruCacheForWrite;
>> }
>> ```
>>
>> Also need to change `writeFruCache` and such to use `fruCacheForWrite`
>> and such.
>>
>> Please let me know if that works for you. Feel free to reach out
>> internally for faster feedback.
>>
>> Willy Tu
>>
>> On Thu, May 2, 2024 at 11:32 AM Oskar Senft <osk at google.com> wrote:
>>
>>> Hi everyone
>>>
>>> tl;dr - Attempting "ipmitool fru write" with an input file that
>>> contains additional bytes beyond the actual FRU data does not actually
>>> update the FRU on OpenBMC at head w/ entity-manager.
>>>
>>> Details:
>>>
>>> I ran into an issue with updating the "baseboard" FRU (0), which is
>>> stored as /etc/fru/baseboard.fru.bin. However, I don't think this is
>>> specific to FRU 0 and could apply to other updateable FRUs in the same
>>> way.
>>>
>>>
>>> 1. Start off with a "minimal" /etc/fru/baseboard.fru.bin which just
>>> contains chassis type (that's required for entity-manager's fru-device
>>> to recognize the file.
>>>
>>> root at akita:~# ls -al /etc/fru/baseboard.fru.bin
>>> -rw-r--r--    1 root     root           512 Sep 21 05:21
>>> /etc/fru/baseboard.fru.bin
>>>
>>> root at akita:~# hexdump -C /etc/fru/baseboard.fru.bin
>>> 00000000  01 00 01 00 00 00 00 fe  01 01 17 c0 c0 c1 00 a6
>>> |................|
>>> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>> |................|
>>> *
>>> 00000200
>>>
>>> root at akita:~# ipmitool fru print 0
>>>  Chassis Type          : Rack Mount Chassis
>>>  Chassis Area Checksum : OK
>>>
>>>
>>> 2. Prepare a FRU file with additional data, e.g. with frugy:
>>>
>>> (frugy) osk at osk:~$ cat demo.yml
>>> ChassisInfo:
>>>   type: 23
>>>   part_number: '4711'
>>>   serial_number: '12345'
>>>
>>> (frugy) osk at osk:~$ frugy demo.yml -o demo.bin
>>>
>>> (frugy) osk at osk:~$ ls -al demo.bin
>>> -rw-r----- 1 osk primarygroup 24 May  2 13:51 demo.bin
>>>
>>> (frugy) osk at osk:~$ hexdump -C demo.bin
>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>> |............4711|
>>> 00000010  c5 31 32 33 34 35 c1 d0                           |.12345..|
>>> 00000018
>>>
>>> Note that frugy generates a minimal binary by default. However, other
>>> processes may not and instead produce a fixed size binary blob. This
>>> happens, e.g. when performing "ipmitool fru read" which returns the
>>> whole contents of the underlying storage. A subsequent "ipmitool fru
>>> write" with that blob will fail as explained here.
>>>
>>> To simulate this here, increase the file to 256 bytes:
>>>
>>> (frugy) osk at osk:~$ cp demo.bin demo-256.bin
>>> (frugy) osk at osk:~$ dd if=/dev/zero bs=1 seek=256 count=0 of=demo-256.bin
>>> 0+0 records in
>>> 0+0 records out
>>> 0 bytes copied, 5.1138e-05 s, 0.0 kB/s
>>>
>>> (frugy) osk at osk:~$ ls -al demo-256.bin
>>> -rw-r----- 1 osk primarygroup 256 May  2 13:57 demo-256.bin
>>>
>>> (frugy) osk at osk:~$ hexdump -C demo-256.bin
>>> 00000000  01 00 01 00 00 00 00 fe  01 02 17 c4 34 37 31 31
>>> |............4711|
>>> 00000010  c5 31 32 33 34 35 c1 d0  00 00 00 00 00 00 00 00
>>> |.12345..........|
>>> 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>>> |................|
>>> *
>>> 00000100
>>>
>>>
>>> 3. Write the newly generated FRU:
>>>
>>> root at akita:~# ipmitool fru write 0 demo-256.bin
>>> Fru Size         : 512 bytes
>>> Size to Write    : 256 bytes
>>>
>>> Wait ~10 seconds for the fru-device process to reload the contents.
>>>
>>>
>>> 4. Re-read the FRU contents:
>>>
>>> root at akita:~# ipmitool fru print 0
>>>  Chassis Type          : Rack Mount Chassis
>>>  Chassis Area Checksum : OK
>>>
>>>
>>> 5. For comparison, write only the minimal FRU:
>>>
>>> root at akita:~# ipmitool fru write 0 demo.bin
>>> Fru Size         : 512 bytes
>>> Size to Write    : 24 bytes
>>>
>>> Wait ~10 seconds.
>>>
>>> root at akita:~# ipmitool fru print 0
>>>  Chassis Type          : Rack Mount Chassis
>>>  Chassis Part Number   : 4711
>>>  Chassis Serial        : 12345
>>>  Chassis Area Checksum : OK
>>>
>>>
>>> I dug into this and found that this is caused by an interaction
>>> between
>>> https://github.com/openbmc/phosphor-host-ipmid/blob/master/dbus-sdr/storagecommands.cpp
>>> and
>>> https://github.com/openbmc/entity-manager/blob/master/src/fru_device.cpp
>>> .
>>>
>>> Here's what happens:
>>> - The "ipmitool fru write" request is handled by storagecommands.cpp
>>> ipmiStorageWriteFruData.
>>>
>>> - ipmiStorageWriteFruData receives the whole FRU blob from IPMI across
>>> several calls. On the initial call, it requests the current FRU blob
>>> via D-bus from fru-device and caches it in the process. It then
>>> overwrites sections of that cache with the received data from IPMI.
>>>
>>> - ipmiStorageWriteFruData uses information from the FRU header to
>>> check whether it received all the bytes that make up the new FRU. Note
>>> that this could be fewer bytes than the size of the cached blob. Once
>>> it receives all the bytes for the new FRU, it calls
>>> /xyz/openbmc_project/FruDevice WriteFru via D-Bus with the full FRU
>>> blob (i.e. the full cache with modifications on top). After that the
>>> cache is cleared.
>>>
>>> - The D-Bus call to WriteFru is handled by fru_device.cpp writeFRU. It
>>> first performs a sanity check and then writes the blob to the
>>> underlying device (or the /etc/fru/baseboard.fru.bin for FRU 0). It
>>> subsequently calls rescanBusses() which actually executes after about
>>> 1 second.
>>>
>>> - Meanwhile, "ipmitool fru write" continues to happily send additional
>>> bytes to ipmiStorageWriteFruData (as in step #3 above). Since its
>>> cache has just been cleared, it retrieves the current FRU from
>>> fru-device again. However, since that has not yet completed
>>> "rescanBusses", it actually received the original FRU again, not the
>>> modified version. The above cycle repeats, but this time the original
>>> FRU with additional modifications from the additional bytes.
>>>
>>> In the best case (if the new FRU data is larger than the original FRU
>>> data), the result is that the FRU did not get updated at all, since we
>>> effectively just wrote back the original FRU. However, if the new FRU
>>> data is smaller than the original FRU data, this may result in
>>> corrupted FRU data to be persisted.
>>>
>>>
>>> I was trying to figure out how to best fix that, but couldn't come up
>>> with a design that would still work. Some ideas:
>>>
>>> A)  I think what we're missing is feedback from fru-device to ipmid
>>> that the FRU write operation has actually completed and the FRU data
>>> was re-read. The ipmid should probably not accept any additional write
>>> requests until the previous write request has completed and the new
>>> FRU data is available.
>>>
>>> B) If ipmiStorageWriteFruData cannot detect the expected size of the
>>> FRU data, it relies on a timeout to eventually write the blob if no
>>> more data was received from IPMI. I wonder if this mechanism is "too
>>> clever" and we should simply always just wait for the timeout?
>>>
>>> C) ipmiStorageWriteFruData clears the cache immediately after WriteFru
>>> was called. Maybe we should keep that cache around until the timeout
>>> expires?
>>>
>>> Thoughts?
>>>
>>> Thanks
>>> Oskar.
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ozlabs.org/pipermail/openbmc/attachments/20240507/0c967844/attachment.htm>


More information about the openbmc mailing list