Proposal: configurable per-sensor error behavior in phosphor-hwmon

Matthew Barth msbarth at linux.ibm.com
Tue Jul 16 00:45:32 AEST 2019


This is a great proposal; I just have a few concerns/notes.

On 7/12/19 5:27 PM, Kun Yi wrote:
> Hi there,
> 
> Current phosphor-hwmon code is filled with preprocessor macros to branch 
> error condition for sysfs reads, and it seems to me that adding a 
> per-sensor configuration would solve two issues at least:
> 1. code can be greatly simplified
> 2. we can code more flexible sensor reading behavior
> 
> Why 2) is needed: with many types of sensors that BMC controls, having 
> a one-size-fits-all policy will always have cases that it can't handle. 
> Each flaky sensor is flaky in its own way.
> 
> Rough proposal on how this will work:
> 
> add properties to each sensor group's configuration file:
> 
> "error behavior": can be one of
> - always keep
> - remove from D-Bus on error
There is already a REMOVERCS device config file option that can be
configured to remove an individual sensor, or any sensor of the device,
when one of a given set of return codes occurs while reading the sensor.
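
As a rough illustration of the REMOVERCS idea (this is not the actual
phosphor-hwmon code), the sketch below parses a comma-separated list of
return codes and checks whether a failed read should remove the sensor.
The function names and the example values are assumptions made up for
this sketch.

```python
import errno


def parse_remove_rcs(value):
    """Parse a comma-separated list of return codes, e.g. "5,110"."""
    return {int(tok) for tok in value.split(",") if tok.strip()}


def should_remove(read_rc, device_rcs, sensor_rcs=frozenset()):
    """Return True if read_rc is listed for the device or for this sensor."""
    return read_rc in device_rcs or read_rc in sensor_rcs


# Hypothetical values: EIO for the whole device, ETIMEDOUT for one sensor.
device_rcs = parse_remove_rcs(str(errno.EIO))
sensor_rcs = parse_remove_rcs(str(errno.ETIMEDOUT))
```

With these example sets, a read failing with EIO or ETIMEDOUT would
remove the sensor, while any other return code would leave it in place.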
> 
> "error condition":  can be combination of
> - certain sysfs return codes
REMOVERCS combines this error condition with the behavior of removing the
sensor from D-Bus. I'd be interested in how these behavior-to-condition
mappings would be expressed within the device's config file.

> - timeout
In the case of phosphor-hwmon, isn't a timeout condition similar to error
retries, since a timeout condition is presented as an ETIMEDOUT return
code on the sensor?
> - invalid value
This is another area I'd be interested to hear more on. How would one go
about defining when a value is invalid? Or is this as simple as treating
negative values as invalid for a sensor that should always return a
positive value?
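
One possible reading of "invalid value", sketched here only to make the
question concrete: each sensor carries an optional valid range, and a
reading outside it counts as an error. The `ValidRange` type and its
fields are hypothetical, not part of the proposal or of phosphor-hwmon.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValidRange:
    """Hypothetical per-sensor bounds; None means unbounded on that side."""
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def is_valid(self, value):
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# The "negative values are invalid" case is just a lower bound of zero.
non_negative = ValidRange(minimum=0)
```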
> 
> "error retries": number of retries before declaring the sensor has an error
This would be great to have configurable per sensor. However, a possible
issue here is that allowing too many retries could cause hwmon to take too
long. So this should be capped or controlled in some way, together with
the delay between reads. Right now a sensor is allowed to be retried 10
times with a 100ms delay between each attempt.
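
To put a number on the concern: 10 retries with a 100ms delay can block
for about 1 second per failing sensor, before counting the time spent in
the reads themselves. The sketch below shows one way a per-sensor retry
count could be capped by a total delay budget. `MAX_TOTAL_DELAY_MS`,
`capped_retries`, and `read_with_retries` are names invented for this
illustration, not existing phosphor-hwmon code.

```python
import time

MAX_TOTAL_DELAY_MS = 1000  # assumed budget: 10 retries x 100 ms, as today


def capped_retries(retries, delay_ms, budget_ms=MAX_TOTAL_DELAY_MS):
    """Clamp the retry count so retries * delay_ms stays within budget_ms."""
    if delay_ms <= 0:
        return retries
    return min(retries, budget_ms // delay_ms)


def read_with_retries(read, retries, delay_ms, sleep=time.sleep):
    """Call read() until it succeeds or the capped retries run out.

    A timeout surfaces as an OSError (ETIMEDOUT) and is retried like any
    other failed read. The last error is re-raised once retries run out.
    """
    attempts_left = capped_retries(retries, delay_ms)
    while True:
        try:
            return read()
        except OSError:
            if attempts_left == 0:
                raise
            attempts_left -= 1
            sleep(delay_ms / 1000)
```

With this cap, a config asking for 50 retries at 100ms would be clamped
to 10, and one asking for 3 retries at 500ms would be clamped to 2.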
> 
> Happy to hear any feedback.
> 
> Regards,
> Kun

