[RFC] Special handlers for post-codes
Amithash Prasad
amithash at meta.com
Fri Jul 18 07:00:51 AEST 2025
>> I don't think this needs to be solved immediately but "which processor
>> type is installed in a socket" is not necessarily fixed. For example,
>> AMD socket SP5[1] supports both "Genoa" and "Bergamo" processor variants,
>> which could require different post code handling. There is little
>> reason why a system with an SP5 socket couldn't have a BMC that should
>> be able to handle both Genoa and Bergamo chips.
Agree with the general direction. We would need to accomplish a couple of things before we get there.
1. We would need EM or other runtime detection of CPU types.
2. To extend your example, I would not be surprised if Genoa and Bergamo are more similar than different, so separate handler JSONs for them would largely duplicate each other. But we can do that optimization when we cross that bridge.
3. I believe there are post codes defined by system software (BIOS) vendors in addition to CPU vendors, and I would not be surprised if there are additional OEM-defined codes on top of those. This might need platform layer extensions, because I would be surprised if they are universally consistent.
At the moment, I think we can go with machine owners packaging the configuration in their meta-layer while the nuances are developed for CPU type detection along with handling of SW/OEM bits.
>> Please add a name and/or description field.
>> I don't think we should do conversion to decimal just to make it JSON-native; optimize for humans and (especially) reviewers.
Ah yes, this will make the configuration easier to review and more human readable. I will go ahead and push updates for this (add name and description fields, and convert primary/secondary to hex strings).
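To make the hex-string idea concrete, here is a minimal Python sketch of how such an entry could be read (the service itself is not written in Python). The exact field layout, the "0x.." string format, and the helper names parse_code and load_handlers are assumptions for illustration, not a settled format.

```python
import json

# Hypothetical entry using hex strings; the field layout is an assumption.
CONFIG_TEXT = """
[
  {
    "name": "example_handler",
    "description": "Illustrative entry only",
    "primary": ["0x7B"],
    "secondary": ["0xEA", "0x7B"]
  }
]
"""

def parse_code(text):
    """Convert a hex string such as "0x7B" to an integer."""
    if not isinstance(text, str) or not text.lower().startswith("0x"):
        raise ValueError(f"expected a hex string like '0x7B', got {text!r}")
    return int(text, 16)

def load_handlers(config_text):
    """Parse the JSON and convert primary/secondary codes to integers."""
    handlers = []
    for entry in json.loads(config_text):
        handlers.append({
            "name": entry["name"],
            "description": entry.get("description", ""),
            "primary": [parse_code(c) for c in entry["primary"]],
            "secondary": [parse_code(c) for c in entry.get("secondary", [])],
        })
    return handlers

handlers = load_handlers(CONFIG_TEXT)
```

The values 0x7B and 0xEA are simply the 123 and 234 from the original example written in hex, so a reviewer sees the same form that the processor documentation uses.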
>> I'd like to see a jsonschema validation of whatever the config ends up being
Yes. This will really help catch a lot of things at compile time.
>> You should also consider how multi-host systems, like yosemite4, might be handled here.
+1. I was thinking of extending the code to use magic fields in the configuration for the service to insert the "host" index.
Is there a general common approach taken for this by other services? I see EM uses $ to indicate variables.
Example:
```
"arguments": {
"POWER_RAIL": "/path/to/host$HOST",
```
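As a rough sketch of how that substitution could behave, the Python below expands $HOST in every string of an event definition. Python's string.Template happens to use the same $NAME syntax, which is why it is used here; the function name expand_host and the choice to leave unknown variables untouched are assumptions, not something the service does today.

```python
from string import Template

def expand_host(value, host_index):
    """Recursively replace $HOST in all strings inside a config value."""
    if isinstance(value, str):
        # safe_substitute leaves any other $VARIABLE untouched.
        return Template(value).safe_substitute(HOST=str(host_index))
    if isinstance(value, dict):
        return {k: expand_host(v, host_index) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_host(v, host_index) for v in value]
    return value

event = {"arguments": {"POWER_RAIL": "/path/to/host$HOST", "FAILURE_DATA": ""}}
expanded = expand_host(event, 2)
```

With this scheme a variable followed directly by letters, digits, or an underscore would need the braced form ${HOST} to be recognized.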
>> Why do we need to be able to trigger custom systemd services?
I was considering cases where certain platforms could do some platform-specific debug collection when they receive a certain post code.
Thanks,
Amithash
________________________________
From: Patrick Williams
Sent: Friday, May 30, 2025 6:17 AM
To: Amithash Prasad
Cc: LF/OpenBMC Mailing List; wangkuiying.wky at alibaba-inc.com; zhikui.ren at intel.com
Subject: Re: [RFC] Special handlers for post-codes
On Fri, May 30, 2025 at 02:02:20AM +0000, Amithash Prasad wrote:
> Hello,
>
> There are many occasions when a post code from a server actually means something is wrong. This is especially crucial if a boot failure occurs before the part of the system firmware capable of sending a SEL to the BMC is loaded. To support this, I am proposing enhancing phosphor-post-code-manager to support configurable special handling of post codes.
Thanks, this looks like interesting work. I know some processors that
have magic postcodes that mean things like memory training has failed.
How do you anticipate these configurations are managed? I see 3
options:
1. People add them to their meta-layer for a particular machine
and/or processor.
2. The configuration files are part of phosphor-post-code-manager
(and enabled via CompatibleHardware matching from entity-manager?).
3. The configuration is part of the entity-manager config instead.
My initial impression is that we have two different kinds of configs:
- Configuration that is entirely processor dependent; any system
using a particular processor version will have the same postcode
handling.
- Configuration that is vendor / BIOS / machine specific.
For configuration that is processor dependent, option (1) does
not seem like a good direction, since it means we're going to be
duplicating this work. I would lean towards option (2) here, but you
probably need a method to load multiple configs: "processor.json" and
"system.json".
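A minimal Python sketch of the "load multiple configs" idea, assuming each file holds a JSON list of handler entries and that a missing file is simply skipped. The function name load_handler_configs and the skip-if-missing policy are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path

def load_handler_configs(paths):
    """Concatenate handler lists from several JSON files, skipping missing ones."""
    handlers = []
    for path in paths:
        path = Path(path)
        if not path.exists():
            continue  # e.g. no system-specific config installed
        entries = json.loads(path.read_text())
        if not isinstance(entries, list):
            raise ValueError(f"{path}: expected a JSON list of handlers")
        handlers.extend(entries)
    return handlers

# Demo with throwaway files; real paths would come from the packaging.
with tempfile.TemporaryDirectory() as tmp:
    processor = Path(tmp) / "processor.json"
    processor.write_text(json.dumps([{"primary": [123]}]))
    system = Path(tmp) / "system.json"
    system.write_text(json.dumps([{"primary": [999]}]))
    missing = Path(tmp) / "oem.json"  # never created, so it is skipped
    merged = load_handler_configs([processor, system, missing])
```

Whether a system entry should override a processor entry for the same code, rather than both firing, is left open here.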
I don't think this needs to be solved immediately but "which processor
type is installed in a socket" is not necessarily fixed. For example,
AMD socket SP5[1] supports both "Genoa" and "Bergamo" processor variants,
which could require different post code handling. There is little
reason why a system with an SP5 socket couldn't have a BMC that should
be able to handle both Genoa and Bergamo chips.
>
> Example configuration:
> [
> {
Please add a name and/or description field.
> "primary": [123],
> "secondary": [234, 123],
This is a bit awkward to me; you should probably look at what
entity-manager does. People tend to think of postcodes as hex and not
decimal. I don't think we should do conversion to decimal just to make
it JSON-native; optimize for humans and (especially) reviewers.
> "targets": ["my_special.service"]
Why do we need to be able to trigger custom systemd services? This
isn't clear. To me, this starts to cause the configs to be system
specific, which is far less ideal.
I'd rather see some well-defined "actions" like "catastrophic failure
that requires a server reboot".
You should also consider how multi-host systems, like yosemite4, might
be handled here. We will have multiple instances of phosphor-post-code-manager
running, one for each host. If you do have systemd targets, they have
to be templated.
> },
> {
> "primary": [999],
> "targets": ["power_failure.service"],
> "event": {
> "name": "xyz.openbmc_project.State.Power.PowerRailFault",
> "arguments": {
> "POWER_RAIL": "MyDevice",
> "FAILURE_DATA": ""
> }
> }
> }
> ]
>
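On the multi-host point above: systemd template units are named "name@.service" and instantiated as "name@instance.service", so one way to template the targets is to fill in the host index per instance. The Python below is only a sketch; the function name instantiate_unit and the "power_failure@.service" unit are hypothetical.

```python
def instantiate_unit(template_unit, host_index):
    """Turn "name@.service" into "name@<host_index>.service"."""
    prefix, sep, suffix = template_unit.partition("@.")
    if not sep:
        return template_unit  # not a template unit; use as-is
    return f"{prefix}@{host_index}.{suffix}"

# Hypothetical targets: one templated, one plain.
targets = ["power_failure@.service", "my_special.service"]
host_targets = [instantiate_unit(t, 1) for t in targets]
```

Each per-host instance of the service would pass its own host index, so the same config file can be shared across hosts.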
I'd like to see a jsonschema validation of whatever the config ends up
being. We do that in at least entity-manager and sdbusplus if you need
examples (EM uses JSON for the schema, sdbusplus uses YAML).
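To give a feel for the kind of constraints such a schema might encode, here is a standard-library-only Python stand-in that hand-checks one handler entry. It is not a JSON Schema and not how entity-manager or sdbusplus do it; the rules themselves (a required name, hex-string codes, at least one of targets or event) are assumptions about where the format could end up.

```python
import re

# Assumed code format: "0x" followed by hex digits.
HEX_CODE = re.compile(r"^0x[0-9A-Fa-f]+$")

def validate_handler(entry):
    """Return a list of problems found in one handler entry (empty if none)."""
    problems = []
    name = entry.get("name")
    if not isinstance(name, str) or not name:
        problems.append("missing or empty 'name'")
    primary = entry.get("primary")
    if not isinstance(primary, list) or not primary:
        problems.append("'primary' must be a non-empty list")
    for key in ("primary", "secondary"):
        codes = entry.get(key, [])
        if not isinstance(codes, list):
            continue  # already reported for 'primary'; ignored for 'secondary'
        for code in codes:
            if not (isinstance(code, str) and HEX_CODE.match(code)):
                problems.append(f"{key}: {code!r} is not a hex string")
    if "targets" not in entry and "event" not in entry:
        problems.append("needs at least one of 'targets' or 'event'")
    return problems

good = {"name": "power_fault", "primary": ["0x3E7"],
        "targets": ["power_failure.service"]}
bad = {"primary": [123]}
good_problems = validate_handler(good)
bad_problems = validate_handler(bad)
```

A real schema would express the same rules declaratively (for example with "required" and "pattern" keywords) so they can be checked at build time.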
> I would love to get feedback before I continue down this path.
>
>
> Thanks,
>
> Amithash
[1]: https://en.wikipedia.org/wiki/Socket_SP5
--
Patrick Williams