Notion of Flakiness in Sensor Readings

Patrick Venture venture at google.com
Tue Oct 16 13:15:33 AEDT 2018


On Mon, Oct 15, 2018 at 6:14 PM Brad Bishop <bradleyb at fuzziesquirrel.com> wrote:
>
>
>
> > On Oct 12, 2018, at 5:32 PM, Patrick Venture <venture at google.com> wrote:
> >
> > On Fri, Oct 12, 2018 at 2:28 PM Matt Spinler
> > <mspinler at linux.vnet.ibm.com> wrote:
> >>
> >>
> >>
> >> On 10/12/2018 4:23 PM, Patrick Venture wrote:
> >>> On Fri, Oct 12, 2018 at 2:09 PM James Feist <james.feist at linux.intel.com> wrote:
> >>>> On 10/12/2018 01:53 PM, Patrick Venture wrote:
> >>>>> Currently, there are a few approaches in phosphor-hwmon on how to
> >>>>> handle sensors being present sometimes, or flaky on reads.  We've
> >>>>> talked in the past about how to handle a read failure that should be
> >>>>> ignored.  A real failure that is flaky or temporary.
> >>>>>
> >>>>> I was thinking of just adding a property to the Sensor.Value interface
> >>>>> to report information.  Basically, someone reading the value needs to
> >>>>> know if it's valid.  The idea of a range check for validity doesn't
> >>>>> work as cleanly for this specific purpose, in my opinion.  The min/max
> >>>>> can be used cleanly to determine the linearization though!
> >>>>>
> >>>>> I was thinking perhaps a boolean, that you can check to see if the
> >>>>> value should be used or trusted.  Then, I thought, perhaps, a enum
> >>>>> with a series of states, starting with the states of "Valid',
> >>>>> "Invalid."  Not being able to think of a third state, I fell back onto
> >>>>> a boolean.
> >>
> >> If the hwmon driver for your device provides a _fault attribute, and
> >> that gets set, the code is already
> >> set up to create an OperationalStatus interface on that sensor instance
> >> and set the Functional
> >> property to false.
> >
> > I'd like to drop this such that the failure behavior can be returning
> > a NaN double.
>
> One benefit I see here is that everything is in the same DBus interface
> (the same property even), but with OperationalStatus you have two and
> thus more DBus calls, if you are polling?  Is that it or are there others?
>
> > https://github.com/openbmc/phosphor-hwmon/blob/master/hwmonio.cpp#L127
> >
> > For the drivers that don't have the fault attribute,
>
> It seems like OperationalStatus could be extended to emulate easily here.
>
> > and in the case where phosphor-host-ipmid doesn't check for that kind of
> > failure.
>
> So is there no value in consistency of interface?  A lot of things
> can be ‘not working’ on a BMC besides sensors - do you want to model that
> in the DBus API differently for each domain (which seems like the direction
> you are headed)?
>
> Further this model falls apart if you ever want something besides _working_
> and _not_working_ for possible states.  I realize YAGNI, but 1 - I have seen that
> in the past, 2 - it seems trivial to implement, and 3 - the down-side of
> OperationalStatus is not apparent to me.

I have no issue with OperationalStatus for this -- I believe Matt's
comment was that it's available gratis if the driver supports it.
Since this driver may not (doesn't, but i haven't checked recently), I
haven't dug back into that use case.

The idea of a sensor falling off and being added to the dbus every
other time the sensor needs to be read falls over for at least one key
reason: 1) You can't read the sensor on demand anymore.

I brought all this up because I want to get away from -errno patches,
and I want the error path to follow the natural state of things.
Reading a dbus double and seeing NaN() may be enough of an error, and
you can check OperationalStatus or other interfaces for more
information if desired.

>
> >
> >>
> >>>> For Valid / Invalid we've produced nan on D-Bus to know that the sensor
> >>>> is in an invalid state (like power off for tach values). Although I'm
> >>>> not sure how this would be produced using the int64 interface, for the
> >>>> double std::nan will go over d-bus correctly.
> >>> So if i just wait until we're using double, the issue basically
> >>> resolves itself? :D :D
> >>>
> >>>>> Thoughts?
> >>>>>
> >>>>> Patrick


More information about the openbmc mailing list