Dealing with a sensor which doesn't have valid reading until host is powered up

Guenter Roeck groeck at google.com
Tue Sep 1 09:54:27 AEST 2020


On Mon, Aug 31, 2020 at 3:09 PM Alex Qiu <xqiu at google.com> wrote:
>
> Hi James,
>
> I just came across this doc (https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html). It looks like it's a terrible idea for a hwmon driver to return EAGAIN to dbus-sensors. With that, I think the proper fix is also to use a different errno in our driver, and this caveat should probably be documented somewhere.
>
> Hi Guenter,
>
> Is it reasonable for hwmon drivers to return EAGAIN? Is it something that has special meaning and should be avoided in hwmon drivers?
>

Not sure how to relate the link above with -EAGAIN, but ... -EAGAIN
might trigger userspace to try again immediately, which would
potentially be quite bad. We had seen that effect at a previous
company, where it ended up overwhelming userspace. So I am not
entirely in favor of it. How about -ENODATA? That might make more
sense unless the problem is known to be a short term glitch.

Thanks,
Guenter
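
[Editorial sketch, not part of the original message.] The distinction
discussed above can be illustrated from the userspace side: treat EAGAIN as
"retry, but only after a bounded backoff" and ENODATA as "no valid reading
yet". This is a minimal sketch under stated assumptions, not how dbus-sensors
actually handles errors. ReadAction, classifyReadError, backoffDelay,
ReadResult and readHwmonValue are hypothetical names, and the 100 ms / 5 s
backoff bounds are arbitrary illustration values.

```cpp
#include <cerrno>
#include <chrono>
#include <cstdlib>
#include <fcntl.h>
#include <optional>
#include <string>
#include <unistd.h>

// Hypothetical policy names; not part of dbus-sensors or the hwmon ABI.
enum class ReadAction
{
    retryWithBackoff, // transient: try again, but never in a tight loop
    markUnavailable,  // no valid reading yet: keep polling at the normal rate
    reportError       // anything else: count it as a real read failure
};

// Map an errno from reading a hwmon sysfs attribute to a polling action.
inline ReadAction classifyReadError(int err)
{
    switch (err)
    {
        case EAGAIN:
        case EINTR:
            return ReadAction::retryWithBackoff;
        case ENODATA:
            return ReadAction::markUnavailable;
        default:
            return ReadAction::reportError;
    }
}

// Exponential backoff capped at maxDelay, so a driver that keeps returning
// EAGAIN cannot make the reader spin.
inline std::chrono::milliseconds backoffDelay(
    unsigned attempt,
    std::chrono::milliseconds base = std::chrono::milliseconds(100),
    std::chrono::milliseconds maxDelay = std::chrono::milliseconds(5000))
{
    std::chrono::milliseconds delay = base;
    for (unsigned i = 0; i < attempt && delay < maxDelay; ++i)
    {
        delay *= 2;
    }
    return delay < maxDelay ? delay : maxDelay;
}

struct ReadResult
{
    std::optional<long> value; // raw attribute value, set on success
    int err = 0;               // errno value on failure
};

// Read one attribute (for example a tempN_input file) with plain POSIX calls.
inline ReadResult readHwmonValue(const std::string& path)
{
    ReadResult result;
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0)
    {
        result.err = errno;
        return result;
    }
    char buf[32] = {};
    ssize_t n = ::read(fd, buf, sizeof(buf) - 1);
    int readErr = errno; // save before close() can overwrite it
    ::close(fd);
    if (n < 0)
    {
        result.err = readErr;
        return result;
    }
    char* end = nullptr;
    long parsed = std::strtol(buf, &end, 10);
    if (end == buf)
    {
        result.err = EINVAL; // nothing numeric was read
        return result;
    }
    result.value = parsed;
    return result;
}
```

A caller would pass the err field of a failed readHwmonValue() to
classifyReadError() and, for retryWithBackoff, wait backoffDelay(attempt)
before the next read instead of retrying immediately.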

> Thank you!
>
> - Alex Qiu
>
>
> On Mon, Aug 31, 2020 at 2:32 PM Alex Qiu <xqiu at google.com> wrote:
>>
>> Hi James,
>>
>> I think the BiosPost power state might not suffice, because the host needs to load firmware onto the device in order to enable the sensors at a certain stage in the OS boot, which is very close to boot completion.
>>
>> However, we can tolerate the fan being noisy before boot completion, and I believe the root cause of the issue is that the HwmonTempSensor freezes once the control flow hits boost::asio::async_read_until (https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempSensor.cpp#L92). Do you know if this function does something special with a file whose read can fail with errno EAGAIN? Based on that, replacing the errno in the driver with something other than EAGAIN also seems to be a viable fix.
>>
>> Thanks!
>>
>> - Alex Qiu
>>
>>
>> On Fri, Aug 28, 2020 at 10:54 AM James Feist <james.feist at linux.intel.com> wrote:
>>>
>>> On 8/28/2020 9:43 AM, Alex Qiu wrote:
>>> > Hi James,
>>> >
>>> > Thx for the reply! So right now, one thing is that the sensor is not
>>> > dependent on the power state of the host solely, but also dependent on
>>> > the boot progress of the host.
>>>
>>> Would the BiosPost power state not suffice?
>>>
>>> > And the more serious issue is that
>>> > returning EAGAIN from the driver freezes the sensor, which is what I'm
>>> > debugging right now. Do we have special treatment on errno returned by
>>> > the driver? Thx.
>>>
>>> I ran into a similar issue with the CPUSensor and this was my fix:
>>> https://github.com/openbmc/dbus-sensors/commit/c22b842bfa8cfe798d83f99fa7aa9f142278c21d#diff-ccbe0562fe1d501b4c1c42d967a02ea0
>>>
>>> I haven't hit this issue with hwmon sensor though.
>>>
>>> >
>>> > - Alex Qiu
>>> >
>>> >
>>> > On Fri, Aug 28, 2020 at 9:38 AM James Feist <james.feist at linux.intel.com> wrote:
>>> >
>>> >     On 8/27/2020 2:49 PM, Alex Qiu wrote:
>>> >      > Hi James,
>>> >      >
>>> >      > After some debugging, I realized that the code I pointed out earlier
>>> >      > wasn't the root cause. The update is that the HwmonTempSensor stops
>>> >      > updating after the hwmon driver returns EAGAIN as errno. I'll keep
>>> >      > debugging...
>>> >      >
>>> >      > - Alex Qiu
>>> >      >
>>> >      >
>>> >      > On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu at google.com> wrote:
>>> >      >
>>> >      >     Hi James and OpenBMC community,
>>> >      >
>>> >      >     We have a sensor for HwmonTempSensor which doesn't have a valid
>>> >      >     reading until the host is fully booted. Before it becomes alive
>>> >      >     and useful, it gets disabled in code
>>> >      >     (https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266)
>>> >      >     because of errors returned by the hwmon driver. Ideally, the
>>> >      >     thermal control loop should kick the fan to fail safe mode until no
>>> >      >     more errors are observed.
>>> >      >
>>> >      >     Any suggestions on how we should handle this kind of sensor properly?
>>> >
>>> >     For what it's worth, we use the PowerState property, which has options of
>>> >     power on or BiosPost, to disable scanning when the state is invalid:
>>> >     https://github.com/openbmc/dbus-sensors/blob/f27a55c775383a3fb1ac655f3eda785f6845f214/src/HwmonTempMain.cpp#L208
>>> >
>>> >
>>> >      >
>>> >      >     Thank you!
>>> >      >
>>> >      >     - Alex Qiu
>>> >      >
>>> >

