Notion of Flakiness in Sensor Readings

Brad Bishop bradleyb at fuzziesquirrel.com
Tue Oct 16 12:14:24 AEDT 2018



> On Oct 12, 2018, at 5:32 PM, Patrick Venture <venture at google.com> wrote:
> 
> On Fri, Oct 12, 2018 at 2:28 PM Matt Spinler
> <mspinler at linux.vnet.ibm.com> wrote:
>> 
>> 
>> 
>> On 10/12/2018 4:23 PM, Patrick Venture wrote:
>>> On Fri, Oct 12, 2018 at 2:09 PM James Feist <james.feist at linux.intel.com> wrote:
>>>> On 10/12/2018 01:53 PM, Patrick Venture wrote:
>>>>> Currently, there are a few approaches in phosphor-hwmon on how to
>>>>> handle sensors being present sometimes, or flaky on reads.  We've
>>>>> talked in the past about how to handle a read failure that should be
>>>>> ignored.  A real failure that is flaky or temporary.
>>>>> 
>>>>> I was thinking of just adding a property to the Sensor.Value interface
>>>>> to report information.  Basically, someone reading the value needs to
>>>>> know if it's valid.  The idea of a range check for validity doesn't
>>>>> work as cleanly for this specific purpose, in my opinion.  The min/max
>>>>> can be used cleanly to determine the linearization though!
>>>>> 
>>>>> I was thinking perhaps a boolean, that you can check to see if the
>>>>> value should be used or trusted.  Then, I thought, perhaps, a enum
>>>>> with a series of states, starting with the states of "Valid',
>>>>> "Invalid."  Not being able to think of a third state, I fell back onto
>>>>> a boolean.
>> 
>> If the hwmon driver for your device provides a _fault attribute, and
>> that gets set, the code is already
>> set up to create an OperationalStatus interface on that sensor instance
>> and set the Functional
>> property to false.
> 
> I'd like to drop this such that the failure behavior can be returning
> a NaN double.

One benefit I see here is that everything is in the same DBus interface
(the same property even), but with OperationalStatus you have two and
thus more DBus calls, if you are polling?  Is that it or are there others?

> https://github.com/openbmc/phosphor-hwmon/blob/master/hwmonio.cpp#L127
> 
> For the drivers that don't have the fault attribute,

It seems like OperationalStatus could be extended to emulate easily here.

> and in the case where phosphor-host-ipmid doesn't check for that kind of
> failure.

So is there no value in consistency of interface?  A lot of things
can be ‘not working’ on a BMC besides sensors - do you want to model that
in the DBus API differently for each domain (which seems like the direction
you are headed)?

Further this model falls apart if you ever want something besides _working_
and _not_working_ for possible states.  I realize YAGNI, but 1 - I have seen that
in the past, 2 - it seems trivial to implement, and 3 - the down-side of
OperationalStatus is not apparent to me.

> 
>> 
>>>> For Valid / Invalid we've produced nan on D-Bus to know that the sensor
>>>> is in an invalid state (like power off for tach values). Although I'm
>>>> not sure how this would be produced using the int64 interface, for the
>>>> double std::nan will go over d-bus correctly.
>>> So if i just wait until we're using double, the issue basically
>>> resolves itself? :D :D
>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Patrick


More information about the openbmc mailing list