Enhanced sensor monitor

Patrick Williams patrick at stwcx.xyz
Mon Oct 3 06:11:42 AEDT 2022


On Tue, Sep 27, 2022 at 05:44:03PM +0800, George Liu wrote:
> Hi, all:
>   I am working at Inspur and we're investigating a feature related to
> sensor monitoring.
> 
>   As far as I know, OpenBMC currently triggers LED alarms only when
> FRU/VPD parsing fails or the FRU/VPD is not present. It lacks a way to
> trigger the corresponding sensor fault LED for the fault status
> (Warning/Critical) of sensors, including threshold-type sensors and
> discrete-type sensors.
> 
>   For threshold-type sensors, this function has been implemented in
> the Intel repository [1]. I think this should be a general function,
> and many companies have already implemented it downstream, so can we
> push this function upstream?
>   For discrete-type sensors, it is only implemented in the
> sensor.yaml [2] of phosphor-host-ipmid, and we found that only the
> present state is implemented. I think we need to improve the discrete
> function and support all types and offsets.

Shouldn't this be reported as an Event of some sort and have an action
based off the Event?  I thought we already had the ability for
phosphor-logging errors to affect LEDs.
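
To make the "report an Event, act on the Event" idea concrete, here is a
minimal Python sketch. Everything in it (the Event fields, the EventBus
class, the event and LED names) is hypothetical and chosen only for
illustration; it is not the phosphor-logging or LED manager API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Event:
    name: str    # hypothetical event identifier, e.g. "ThresholdCritical"
    source: str  # hypothetical sensor name that raised the event


class EventBus:
    """Maps event names to actions; reporters know nothing about LEDs."""

    def __init__(self) -> None:
        self._actions: Dict[str, List[Callable[[Event], None]]] = {}

    def on(self, name: str, action: Callable[[Event], None]) -> None:
        # Register an action to run whenever an event with this name is reported.
        self._actions.setdefault(name, []).append(action)

    def report(self, event: Event) -> None:
        # Run every action registered for this event name (none if unknown).
        for action in self._actions.get(event.name, []):
            action(event)


# Stand-in for LED state; a real system would talk to an LED service instead.
asserted_leds: Set[str] = set()

bus = EventBus()
bus.on("ThresholdCritical", lambda e: asserted_leds.add("fault_" + e.source))
bus.report(Event("ThresholdCritical", "fan0"))
```

The point of the split is that the sensor side only reports what happened,
and the LED behaviour hangs off the event rather than off the sensor.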

>   In addition: For the SEL function, the phosphor-sel-logger has
> implemented threshold-type sensor alarms and records SEL, and I hope
> to also integrate discrete-type functions, and be able to get all
> sensor information through `ipmitool sel elist`.
> 
>   So we currently have an idea, we hope to create a
> phosphor-sensor-monitor repository and implement the following
> functions:
>   1. Provide a PDI interface (eg:
> xyz.openbmc_project.Discrete.Sensor.Value) to record discrete states

I've previously written about "Discrete Sensors" here:
    https://lore.kernel.org/openbmc/YAl32I0oGFi5i7Cl@heinlein/

In my opinion a Dbus interface for "Discrete.Sensor" doesn't fit our
architecture.  As far as I can tell it is only relevant to IPMI and I don't
see any indication from Redfish of such a concept.  While modeling
everything as a "Discrete Sensor" might make the IPMI providers simpler,
it is an overall worse design.

>   2. Provide a way to monitor threshold sensor status -> trigger LED
> -> log SEL (the function of logging SEL has been implemented in
> phosphor-sel-logger, I hope the two repositories can be merged in the
> future)

I would definitely like to see a more converged event/error infrastructure.
The current "sel-logger" and the similarly constructed Redfish message
structure are, as I've previously remarked, something of a complex Rube
Goldberg machine:
    https://lore.kernel.org/openbmc/YhY9lX6a8RDGcY2K@heinlein/

>   3. Provide a way to monitor discrete sensor status
>       a. If it is the data on the Host side, trigger the PDI interface
> through the ipmiStorageAddSEL method of phosphor-host-ipmid -> trigger
> LED -> record SEL
>       b. If it is the data on the BMC side (eg: PSU, OCC, etc.), it
> should inherit this PDI interface in the respective daemon, and the
> phosphor-sensor-monitor only needs to monitor the property value of
> the PDI interface -> trigger LED -> record SEL
>   4. Flexible JSON configuration file, ideally, when adding or
> deleting sensors, you do not need to change the code, just update the
> JSON

I'm having a bit of trouble visualizing all of this, especially
considering what I've said above about Discrete Sensors.  We certainly
have a spectrum of real-code vs JSON-as-code in various implementations,
but I think we're generally moving away from JSON-as-code.  A simple
(Condition A -> Condition B) mapping is probably acceptable, but we
should not be coming up with another JSON-as-scripting-language.
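
As a rough illustration of where that line sits, the Python sketch below
treats the JSON purely as a lookup table from (sensor, status) to an LED,
with no expressions or control flow in the data. The key names ("sensor",
"status", "led") and all sensor and LED names are assumptions made up for
this example, not an existing OpenBMC schema.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical configuration: each entry is one plain condition -> action pair.
RULES_JSON = """
[
  {"sensor": "psu0_temp", "status": "Critical", "led": "psu0_fault"},
  {"sensor": "fan0_tach", "status": "Warning",  "led": "fan0_fault"}
]
"""

RuleTable = Dict[Tuple[str, str], str]


def load_rules(text: str) -> RuleTable:
    """Parse the JSON list into a (sensor, status) -> led dictionary."""
    return {(r["sensor"], r["status"]): r["led"] for r in json.loads(text)}


def led_for(rules: RuleTable, sensor: str, status: str) -> Optional[str]:
    """Return the LED to assert for this sensor status, or None if no rule."""
    return rules.get((sensor, status))


rules = load_rules(RULES_JSON)
```

Anything beyond this kind of table (comparisons, chained conditions,
timers) is where the logic arguably belongs in code rather than in JSON.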

>   There may be many situations here that we have not considered.
> Questions are welcome. If the current proposal is accepted, I will
> push a design document, thanks!
> 
> [1]: https://github.com/Intel-BMC/provingground/tree/master/callback-manager
> [2]: https://github.com/openbmc/phosphor-host-ipmid/blob/master/scripts/sensor-example.yaml
> 
> BRs
> George Liu

-- 
Patrick Williams