Notion of Flakiness in Sensor Readings
Matthew Barth
msbarth at linux.ibm.com
Tue Oct 16 00:39:57 AEDT 2018
On 10/12/2018 04:32 PM, Patrick Venture wrote:
> On Fri, Oct 12, 2018 at 2:28 PM Matt Spinler
> <mspinler at linux.vnet.ibm.com> wrote:
>>
>>
>>
>> On 10/12/2018 4:23 PM, Patrick Venture wrote:
>>> On Fri, Oct 12, 2018 at 2:09 PM James Feist <james.feist at linux.intel.com> wrote:
>>>> On 10/12/2018 01:53 PM, Patrick Venture wrote:
>>>>> Currently, there are a few approaches in phosphor-hwmon on how to
>>>>> handle sensors being present sometimes, or flaky on reads. We've
>>>>> talked in the past about how to handle a read failure that should be
>>>>> ignored. A real failure that is flaky or temporary.
>>>>>
>>>>> I was thinking of just adding a property to the Sensor.Value interface
>>>>> to report information. Basically, someone reading the value needs to
>>>>> know if it's valid. The idea of a range check for validity doesn't
>>>>> work as cleanly for this specific purpose, in my opinion. The min/max
>>>>> can be used cleanly to determine the linearization though!
>>>>>
>>>>> I was thinking perhaps a boolean, that you can check to see if the
>>>>> value should be used or trusted. Then, I thought, perhaps, a enum
>>>>> with a series of states, starting with the states of "Valid',
>>>>> "Invalid." Not being able to think of a third state, I fell back onto
>>>>> a boolean.
>>
>> If the hwmon driver for your device provides a _fault attribute, and
>> that gets set, the code is already
>> set up to create an OperationalStatus interface on that sensor instance
>> and set the Functional
>> property to false.
>
> I'd like to drop this such that the failure behavior can be returning
> a NaN double.
> https://github.com/openbmc/phosphor-hwmon/blob/master/hwmonio.cpp#L127
>
> For the drivers that don't have the fault attribute, and in the case
> where phosphor-host-ipmid doesn't check for that kind of failure.
>
IMHO, we should handle this by removing the sensor from dbus instead of
using a special value type if we're unable to use the OperationalStatus.
In either case, the code utilizing this sensor would need to
specifically handle this NaN value or handle having the sensor removed
from dbus. It seems more generically, to me, to have hwmon remove the
sensor from dbus instead of using a NaN double so this can be used
across other sensor drivers that may not return a double type.
>>
>>>> For Valid / Invalid we've produced nan on D-Bus to know that the sensor
>>>> is in an invalid state (like power off for tach values). Although I'm
>>>> not sure how this would be produced using the int64 interface, for the
>>>> double std::nan will go over d-bus correctly.
>>> So if i just wait until we're using double, the issue basically
>>> resolves itself? :D :D
>>>
>>>>> Thoughts?
>>>>>
>>>>> Patrick
>>>>>
>>
>
More information about the openbmc
mailing list