[PATCH linux dev-4.10] Revert "drivers/hwmon/occ: Add temperature fault attribute and VRM temp alarm"

Brad Bishop bradleyb at fuzziesquirrel.com
Thu Oct 5 13:01:36 AEDT 2017


> On Oct 4, 2017, at 11:04 AM, Eddie James <eajames at linux.vnet.ibm.com> wrote:
> 
> 
> 
> On 10/03/2017 11:38 PM, Joel Stanley wrote:
>> On Wed, Oct 4, 2017 at 4:44 AM, Eddie James <eajames at linux.vnet.ibm.com> wrote:
>>> From: "Edward A. James" <eajames at us.ibm.com>
>>> 
>>> This reverts commit e55423ee10a5057338d24383c00e813436a126ea.
>>> 
>>> Apologies for pushing this up so early... Userspace applications aren't
>>> ready for this change. The hwmon polling application cannot accept EGAIN
>>> yet, and we can't be returning apparent errors if the sensor is
>>> temporarily unavailable.
>> Can you go into some more detail here please.
>> 
>> It looks like these changes are within the hwmon ABI, so userspace
>> that supports hwmon devices shouldn't break.
> 
> Sure. With this change, the driver returns -EAGAIN when reading temperature sensors that show 0 temperature. Temperature 0 is how the OCC indicates that the sensor is not available at the moment. So, we should retry in a bit. But the hwmon application currently sees any negative errno as an error,

Actually the userspace retries for a configurable period on eio, etimedout, 
ebadmsg, eagain, and exnio.  It also exits cleanly on enoent.

If any of these errnos don’t go away after the configured number of retry
attempts, an error is logged.  Any other errnos and an error is logged
immediately. If this list needs improvement I’d love to hear about it.

ebadmsg and enxio are observed when i2c devices are unplugged with transfers
in various stages of flight.  They occur just before the driver is unbound
after the presence gpio toggle on these devices is noticed and processed
(killing the hwmon userspace daemon cleanly).

eio, etimedout seem to be bus type errors that appear somewhat infrequently,
but occur nevertheless.

we all know what eagain means…

It isn’t so much that the hwmon userspace doesn’t support this change,
its just the resulting behavior difference at the other end of the
hwmon abi <-> dbus api translation is not optimal, right now.

1 - Without this change, the hwmon userspace reads a value of 0 out of
the sysfs attribute and happily reports that as the temp at the DBus level.

With this change, the hwmon userspace would read the attribute, get an
error and retry for a bit.  After a number of retries, the error is
logged and the sensor is marked faulted at a dbus level.

The issue is the consumer of the sensor at the dbus level.  Today, the
dbus consumer happens to translate the sensor value of zero into the
desired behavior.  If we submit this change the sensor will be put
into faulted state and the consuming application doesn’t have support
for that yet.

We have open issues to enhance the application to support faulted sensors,
its just not implemented yet.

I hope this helps a decision to be made.

-brad


> and will not retry on -EAGAIN. So now we get errors when we don't want them (for now), basically.
> 
> Thanks,
> Eddie


More information about the openbmc mailing list