Proposal: configurable per-sensor error behavior in phosphor-hwmon
Matthew Barth
msbarth at linux.ibm.com
Tue Jul 16 00:45:32 AEST 2019
This is a great proposal; I just have a few concerns and notes.
On 7/12/19 5:27 PM, Kun Yi wrote:
> Hi there,
>
> Current phosphor-hwmon code is filled with preprocessor macros to branch
> error condition for sysfs reads, and it seems to me that adding a
> per-sensor configuration would solve two issues at least:
> 1. code can be greatly simplified
> 2. we can code more flexible sensor reading behavior
>
> Why 2) is needed: with many types of sensors that BMC controls, having
> a one-size-fits-all policy will always have cases that it can't handle.
> Each flaky sensor is flaky in its own way.
>
> Rough proposal on how this will work:
>
> add properties to each sensor group's configuration file:
>
> "error behavior": can be one of
> - always keep
> - remove from D-Bus on error
There is already a REMOVERCS device config file option that can be
configured to remove an individual sensor, or any sensor of the device,
when one of a given set of return codes occurs while attempting to read
the sensor.
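
As a rough illustration of that kind of matching, here is a minimal C++
sketch. It assumes the option value is a comma-separated list of numeric
return codes; parseReturnCodes and shouldRemove are hypothetical names
used only for this example, not functions taken from phosphor-hwmon.

```cpp
#include <exception>
#include <set>
#include <sstream>
#include <string>

// Parse a comma-separated list of return codes, e.g. "5,110".
// Assumption: this mirrors the shape of a REMOVERCS-style value.
// Tokens that are not numbers are skipped.
std::set<int> parseReturnCodes(const std::string& value)
{
    std::set<int> codes;
    std::stringstream stream(value);
    std::string token;
    while (std::getline(stream, token, ','))
    {
        try
        {
            codes.insert(std::stoi(token));
        }
        catch (const std::exception&)
        {
            // Ignore empty or malformed tokens.
        }
    }
    return codes;
}

// A sensor is removed when the read's return code appears in either the
// sensor-specific set or the device-wide set.
bool shouldRemove(const std::set<int>& sensorCodes,
                  const std::set<int>& deviceCodes, int returnCode)
{
    return sensorCodes.count(returnCode) > 0 ||
           deviceCodes.count(returnCode) > 0;
}
```

A proposed "error behavior" property could reuse the same lookup and
only change what happens once a match is found.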
>
> "error condition": can be combination of
> - certain sysfs return codes
REMOVERCS ties this error condition to the behavior of removing the
sensor from D-Bus. I'd be interested in how these behavior-to-condition
mappings would be expressed within the device's config file.
> - timeout
In the case of phosphor-hwmon, isn't a timeout condition similar to
error retries, since a timeout is presented as an ETIMEDOUT return code
on the sensor read?
> - invalid value
This is another area I'd be interested to hear more on: how would one go
about defining when a value is invalid? Or is this as simple as treating
negative values as invalid for a sensor that should always return a
positive value?
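
One simple reading of "invalid value" would be an optional per-sensor
range check. The sketch below is only an assumption about what such a
check might look like; ValidRange and isValidReading are hypothetical
names and nothing here exists in phosphor-hwmon today.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical per-sensor bounds; an unset bound means "no limit".
struct ValidRange
{
    std::optional<std::int64_t> min;
    std::optional<std::int64_t> max;
};

// Returns true when the raw reading falls inside the configured bounds.
bool isValidReading(std::int64_t raw, const ValidRange& range)
{
    if (range.min && raw < *range.min)
    {
        return false;
    }
    if (range.max && raw > *range.max)
    {
        return false;
    }
    return true;
}
```

With only min set to 0, this covers the "negative values are invalid"
case; anything more involved (for example, rejecting sudden jumps) would
need a different kind of rule.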
>
> "error retries": number of retries before declaring the sensor has an error
This would be great to have configurable per sensor. However, a possible
issue is that allowing too many retries could cause hwmon to take too
long, so the retry count should be capped or otherwise controlled
together with the delay between reads. Right now a sensor read may be
retried 10 times with a 100ms delay between attempts.
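
To illustrate the kind of cap I have in mind, here is a minimal C++
sketch. It assumes a per-sensor retry count and delay read from config
and clamps them so the total wait never exceeds the current 10 x 100ms
budget. RetryPolicy, clampPolicy, readWithRetry and the two limits are
hypothetical names and values chosen for this example, not existing
phosphor-hwmon code.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <functional>
#include <optional>
#include <thread>

// Hypothetical per-sensor retry settings, as they might come from config.
struct RetryPolicy
{
    int retries;
    std::chrono::milliseconds delay;
};

// Assumed ceilings, derived from the current 10 retries x 100ms behavior.
constexpr int maxRetries = 10;
constexpr std::chrono::milliseconds maxTotalDelay{1000};

// Clamp a configured policy so retries * delay stays within maxTotalDelay.
RetryPolicy clampPolicy(RetryPolicy policy)
{
    policy.retries = std::clamp(policy.retries, 0, maxRetries);
    policy.delay = std::clamp(policy.delay, std::chrono::milliseconds{0},
                              maxTotalDelay);
    if (policy.delay.count() > 0)
    {
        // Dividing two durations yields a plain count of how many fit.
        auto fit = maxTotalDelay / policy.delay;
        policy.retries = static_cast<int>(
            std::min<std::chrono::milliseconds::rep>(policy.retries, fit));
    }
    return policy;
}

// Call readOnce until it yields a value or the clamped retries run out.
std::optional<std::int64_t> readWithRetry(
    const std::function<std::optional<std::int64_t>()>& readOnce,
    RetryPolicy policy)
{
    policy = clampPolicy(policy);
    for (int attempt = 0;; ++attempt)
    {
        if (auto value = readOnce())
        {
            return value;
        }
        if (attempt >= policy.retries)
        {
            return std::nullopt;
        }
        std::this_thread::sleep_for(policy.delay);
    }
}
```

A sensor configured with 50 retries at 100ms would be clamped to 10
retries, and one configured with 10 retries at 500ms would be clamped to
2, so a single flaky sensor cannot stall the read loop for longer than
it can today.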
>
> Happy to hear any feedback.
>
> Regards,
> Kun