[RFC] Special handlers for post-codes
Patrick Williams
patrick at stwcx.xyz
Fri May 30 23:17:44 AEST 2025
On Fri, May 30, 2025 at 02:02:20AM +0000, Amithash Prasad wrote:
> Hello,
>
> There are many occasions when a post code from a server actually means something is wrong. This is especially crucial if a boot failure occurs before the part of the system firmware capable of sending a SEL to the BMC is loaded. To support this, I am proposing enhancing phosphor-post-code-manager to support configurable special handling of post codes.
Thanks, this looks like interesting work. I know some processors that
have magic postcodes that mean things like memory training has failed.
How do you anticipate these configurations are managed? I see 3
options:
1. People add them to their meta-layer for a particular machine
and/or processor.
2. The configuration files are part of phosphor-post-code-manager
(and enabled via CompatibleHardware matching from entity-manager?).
3. The configuration is part of the entity-manager config instead.
My initial impression is that we have two different kinds of configs:
- Configuration that is entirely processor dependent; any system
using a particular processor version will have the same postcode
handling.
- Configuration that is vendor / BIOS / machine specific.
For configuration that is processor dependent, option (1) does
not seem like a good direction, since it means we're going to be
duplicating this work. I would lean towards option (2) here, but you
probably need a method to load multiple configs: "processor.json" and
"system.json".
I don't think this needs to be solved immediately, but "which processor
type is installed in a socket" is not necessarily fixed. For example,
AMD socket SP5[1] supports both "Genoa" and "Bergamo" processor variants,
which could require different postcode handling. There is little
reason a BMC on an SP5 system shouldn't be able to handle both Genoa
and Bergamo chips.
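As a rough sketch of the "processor.json" plus "system.json" idea: the
function name, the directory argument, and the "later files are appended
after earlier ones" policy are all assumptions for illustration, not
existing phosphor-post-code-manager behavior.

```python
import json
from pathlib import Path


def load_handlers(config_dir, names=("processor.json", "system.json")):
    """Concatenate the handler lists from every config file that exists.

    Hypothetical helper: files are read in the order given, missing files
    are skipped, and each file must contain a JSON list of handler entries.
    """
    handlers = []
    for name in names:
        path = Path(config_dir) / name
        if not path.is_file():
            continue  # e.g. a machine with no system-specific overrides
        with path.open() as f:
            entries = json.load(f)
        if not isinstance(entries, list):
            raise ValueError(f"{path}: expected a JSON list of handlers")
        handlers.extend(entries)
    return handlers
```

Whether a system-level entry should override a processor-level entry
for the same postcode, rather than simply being appended, is a policy
question this sketch leaves open.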
>
> Example configuration:
> [
> {
Please add a name and/or description field.
> "primary": [123],
> "secondary": [234, 123],
This is a bit awkward to me; you should probably look at what
entity-manager does. People tend to think of postcodes as hex and not
decimal. I don't think we should do conversion to decimal just to make
it JSON-native; optimize for humans and (especially) reviewers.
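To illustrate the hex point: one option is to write postcodes as hex
strings in the JSON and convert them when the config is loaded. This is
a minimal sketch that assumes a string form like "0x7B"; the helper name
is hypothetical and the exact convention should follow whatever
entity-manager already does.

```python
def parse_post_code(value):
    """Convert a config value to an integer postcode.

    Accepts a hex string ("0x7B" or "7B") or a plain JSON integer.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("booleans are not valid postcodes")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return int(value, 16)  # base 16 accepts an optional "0x" prefix
    raise TypeError(f"unsupported postcode value: {value!r}")


# The quoted example, rewritten with hex strings (123 == 0x7B, 234 == 0xEA).
entry = {"primary": ["0x7B"], "secondary": ["0xEA", "0x7B"]}
primary = [parse_post_code(v) for v in entry["primary"]]
secondary = [parse_post_code(v) for v in entry["secondary"]]
```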
> "targets": ["my_special.service"]
Why do we need to be able to trigger custom systemd services? This
isn't clear. To me, this starts to cause the configs to be system
specific, which is far less ideal.
I'd rather see some well-defined "actions" like "catastrophic failure
that requires a server reboot".
You should also consider how multi-host systems, like yosemite4, might
be handled here. We will have multiple instances of phosphor-post-code-manager
running, one for each host. If you do have systemd targets, they have
to be templated.
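For the multi-host case, a sketch of what "templated" could mean: the
config names a systemd template unit ("name@.service") and each
per-host instance fills in its own host index. The function name and
the choice to pass non-template units through unchanged are assumptions.

```python
def instantiate_target(template, host_index):
    """Turn "foo@.service" into "foo@<host_index>.service".

    Units without the "@." template marker are returned unchanged.
    """
    name, sep, suffix = template.partition("@.")
    if not sep:
        return template  # not a template unit
    return f"{name}@{host_index}.{suffix}"


# Hypothetical template name based on the quoted example config.
unit_for_host_2 = instantiate_target("my_special@.service", 2)
```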
> },
> {
> "primary": [999],
> "targets": ["power_failure.service"],
> "event": {
> "name": "xyz.openbmc_project.State.Power.PowerRailFault",
> "arguments": {
> "POWER_RAIL": "MyDevice",
> "FAILURE_DATA": ""
> }
> }
> }
> ]
>
I'd like to see a jsonschema validation of whatever the config ends up
being. We do that in at least entity-manager and sdbusplus if you need
examples (EM uses JSON for the schema, sdbusplus uses YAML).
> I would love to get feedback before I continue down this path.
>
>
> Thanks,
>
> Amithash
[1]: https://en.wikipedia.org/wiki/Socket_SP5
--
Patrick Williams