Dealing with a sensor which doesn't have valid reading until host is powered up
Alex Qiu
xqiu at google.com
Tue Sep 1 09:58:08 AEST 2020
Hi Guenter,
FYI, the boost::asio call freezes on EAGAIN even if the driver later recovers
to a normal state, which can be verified by reading the hwmon file directly.
I'm switching to ENODATA and ENOMSG. Thanks!
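
For the archive, here is a minimal C++ sketch of how a userspace reader could tell these errno values apart. It is only an illustration under assumptions, not how dbus-sensors or boost::asio actually handle them: ReadAction, classifyReadErrno, and readErrno are hypothetical names, and the policy attached to each errno is an assumption based on the discussion quoted below.

```cpp
#include <cerrno>
#include <unistd.h>

// Illustrative only: these names are hypothetical and not part of dbus-sensors.
enum class ReadAction
{
    RetryLater,      // transient: poll again on the next scheduled interval
    MarkUnavailable, // no valid reading yet (e.g. host not fully booted)
    ReportError      // anything else: treat as a real failure
};

// Map the errno from a failed hwmon read to a policy decision (assumed policy).
ReadAction classifyReadErrno(int err)
{
    switch (err)
    {
        case EAGAIN: // same value as EWOULDBLOCK on Linux
            return ReadAction::RetryLater;
        case ENODATA:
        case ENOMSG:
            return ReadAction::MarkUnavailable;
        default:
            return ReadAction::ReportError;
    }
}

// Perform one read() on fd; return 0 on success or the errno on failure.
int readErrno(int fd)
{
    char buf[64];
    errno = 0;
    ssize_t n = read(fd, buf, sizeof(buf));
    return n < 0 ? errno : 0;
}
```

A caller would pass the result of readErrno(fd) to classifyReadErrno() and, for RetryLater, wait for the next polling interval rather than retrying immediately.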
- Alex Qiu
On Mon, Aug 31, 2020 at 4:54 PM Guenter Roeck <groeck at google.com> wrote:
> On Mon, Aug 31, 2020 at 3:09 PM Alex Qiu <xqiu at google.com> wrote:
> >
> > Hi James,
> >
> > I just came across this doc (
> > https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/overview/posix/stream_descriptor.html).
> > It looks like it's a terrible idea for a hwmon driver to return EAGAIN to
> > dbus-sensors. Given that, I think the proper fix is also to use a different
> > errno in our driver, and this caveat should probably be documented
> > somewhere.
> >
> > Hi Guenter,
> >
> > Is it reasonable for hwmon drivers to return EAGAIN? Is it something
> > that has special meaning and should be avoided in hwmon drivers?
> >
>
> Not sure how to relate the link above with -EAGAIN, but ... -EAGAIN
> might trigger userspace to try again immediately, which would
> potentially be quite bad. We had seen that effect at a previous
> company, where it ended up overwhelming userspace. So I am not
> entirely in favor of it. How about -ENODATA? That might make more
> sense unless the problem is known to be a short-term glitch.
>
> Thanks,
> Guenter
>
> > Thank you!
> >
> > - Alex Qiu
> >
> >
> > On Mon, Aug 31, 2020 at 2:32 PM Alex Qiu <xqiu at google.com> wrote:
> >>
> >> Hi James,
> >>
> >> I think the BiosPost power state might not suffice, because the host needs
> >> to load firmware onto the device in order to enable the sensors at a
> >> certain stage in the OS boot, which is very close to boot completion.
> >>
> >> However, we can tolerate the fan being noisy before boot completion,
> >> and I believe the root cause of the issue is that HwmonTempSensor freezes
> >> once the control flow hits boost::asio::async_read_until (
> >> https://github.com/openbmc/dbus-sensors/blob/master/src/HwmonTempSensor.cpp#L92).
> >> Do you know if this function does something special with a file that
> >> can return errno EAGAIN? Based on that, replacing the errno in the driver
> >> with something other than EAGAIN also seems to be a viable fix.
> >>
> >> Thanks!
> >>
> >> - Alex Qiu
> >>
> >>
> >>
> >> On Fri, Aug 28, 2020 at 10:54 AM James Feist <james.feist at linux.intel.com> wrote:
> >>>
> >>> On 8/28/2020 9:43 AM, Alex Qiu wrote:
> >>> > Hi James,
> >>> >
> >>> > Thx for the reply! So right now, one thing is that the sensor is not
> >>> > dependent solely on the power state of the host, but also on
> >>> > the boot progress of the host.
> >>>
> >>> Would the BiosPost power state not suffice?
> >>>
> >>> > And the more serious issue is that
> >>> > returning EAGAIN from the driver freezes the sensor, which is what I'm
> >>> > debugging right now. Do we have special treatment for the errno returned
> >>> > by the driver? Thx.
> >>>
> >>> I ran into a similar issue with the CPUSensor and this was my fix:
> >>>
> >>> https://github.com/openbmc/dbus-sensors/commit/c22b842bfa8cfe798d83f99fa7aa9f142278c21d#diff-ccbe0562fe1d501b4c1c42d967a02ea0
> >>>
> >>> I haven't hit this issue with hwmon sensor though.
> >>>
> >>> >
> >>> > - Alex Qiu
> >>> >
> >>> >
> >>> > On Fri, Aug 28, 2020 at 9:38 AM James Feist <james.feist at linux.intel.com> wrote:
> >>> >
> >>> > On 8/27/2020 2:49 PM, Alex Qiu wrote:
> >>> > > Hi James,
> >>> > >
> >>> > > After some debugging, I realized that the code I pointed out earlier
> >>> > > wasn't the root cause. The update is that HwmonTempSensor stops
> >>> > > updating after the hwmon driver returns EAGAIN as errno. I'll keep
> >>> > > debugging...
> >>> > >
> >>> > > - Alex Qiu
> >>> > >
> >>> > >
> >>> > > On Tue, Aug 25, 2020 at 5:49 PM Alex Qiu <xqiu at google.com> wrote:
> >>> > >
> >>> > > Hi James and OpenBMC community,
> >>> > >
> >>> > > We have a sensor for HwmonTempSensor which doesn't have a valid
> >>> > > reading until the host is fully booted. Before it becomes alive
> >>> > > and useful, it gets disabled in code
> >>> > > (https://github.com/openbmc/dbus-sensors/blob/master/include/sensor.hpp#L266)
> >>> > > because of errors thrown up by the hwmon driver. Ideally, the
> >>> > > thermal control loop should kick the fan to fail safe mode until no
> >>> > > more errors are observed.
> >>> > >
> >>> > > Any suggestions on how we should handle this kind of sensor
> >>> > > properly?
> >>> >
> >>> > For what it's worth, we use the PowerState property, which has options of
> >>> > power on or BiosPost, to disable scanning when the state is invalid:
> >>> > https://github.com/openbmc/dbus-sensors/blob/f27a55c775383a3fb1ac655f3eda785f6845f214/src/HwmonTempMain.cpp#L208
> >>> >
> >>> >
> >>> > >
> >>> > > Thank you!
> >>> > >
> >>> > > - Alex Qiu
> >>> > >
> >>> >
>