Proposal: configurable per-sensor error behavior in phosphor-hwmon
Matt Spinler
mspinler at linux.ibm.com
Tue Jul 16 02:10:17 AEST 2019
On 7/15/2019 9:45 AM, Matthew Barth wrote:
> This is a great proposal, just a few concerns/notes.
>
> On 7/12/19 5:27 PM, Kun Yi wrote:
>> Hi there,
>>
>> Current phosphor-hwmon code is filled with preprocessor macros to
>> branch error condition for sysfs reads, and it seems to me that
>> adding a per-sensor configuration would solve two issues at least:
>> 1. code can be greatly simplified
>> 2. we can code more flexible sensor reading behavior
>>
>> Why 2) is needed: with many types of sensors that BMC controls,
>> having an one-size-fits-all policy will always have cases that it
>> can't handle. Each flaky sensor is flaky in its own way.
>>
>> Rough proposal on how this will work:
>>
>> add properties to each sensor group's configuration file:
>>
>> "error behavior": can be one of
>> - always keep
>> - remove from D-Bus on error
> There is a REMOVERCS device config file option that can be configured
> to remove an individual sensor or any sensor of the device when a
> given set of return codes occur when attempting to read the sensor.
>>
>> "error condition": can be combination of
>> - certain sysfs return codes
> REMOVERCS combines this error condition to the behavior of removing
> the sensor from Dbus. I'd be interested in how these types of
> bahavior-to-conditions will be mapped within the device's config file.
>
>> - timeout
> In the case of phosphor-hwmon, isnt a timeout condition similar to
> error retries since a timeout condition is presented as a ETIMEDOUT
> return code on the sensor.
>> - invalid value
> This is another area I'd be interested to hear more on, how would one
> go about defining when a value would be invalid? Or is this a simple,
> negative values are invalid for a sensor that should always return a
> positive value?
>>
>> "error retries": number of retries before declaring the sensor has an
>> error
> This would be great to have configurable per sensors, however a
> possible issue here would be allowing too many retries causing hwmon
> to take too long. So this should be capped or controlled in someway
> with the delay between reads as well. Right now a sensor is allowed to
> be retried 10x's with a 100ms delay between each attempt.
>>
>> Happy to hear any feedback.
>>
I like this idea. Hopefully the defaults can be kept the same as they
are today so users of
today's default settings wouldn't have to change their config files to
keep things the same.
Another thing we've been thinking about adding is the ability for
sensors to only be
read when the power is on, though there's still some invention
required as to how
to represent this state on D-Bus.
>> Regards,
>> Kun
>
More information about the openbmc
mailing list