phosphor-hwmon bottleneck potential

Patrick Venture venture at google.com
Wed May 10 11:46:33 AEST 2017


Just FYI, I've been working on the fan driver to improve its speed.  The
driver can be sped up a bit so it provides an answer sooner.  Interestingly,
and this may need some level of accounting even in your systems, I'm seeing
erroneous values periodically.  Out of, say, 10 readings of fan A in a row,
1 in 10 will be something super low.  So any controller you implement should
consider that a reading may be invalid even while the fan is still present
and functioning.  If I'm seeing bad numbers, I plan to re-check in a tight
loop if necessary, and if I get two bad readings in a row, maybe flag the
fan as bad or something along those lines.
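
Roughly the kind of check I have in mind -- the low threshold and the
strike count are placeholders, not final values:

    #include <cstdint>

    // Tracks consecutive suspicious readings for a single fan.
    class FanHealth
    {
      public:
        explicit FanHealth(int64_t lowThreshold, int maxStrikes = 2) :
            lowThreshold(lowThreshold), maxStrikes(maxStrikes)
        {}

        // Feed each new RPM reading in; returns true once the fan should
        // be flagged as bad (too many bad readings in a row).
        bool update(int64_t rpm)
        {
            if (rpm < lowThreshold)
            {
                ++strikes;
            }
            else
            {
                strikes = 0;  // a good reading clears the count
            }
            return strikes >= maxStrikes;
        }

      private:
        int64_t lowThreshold;
        int maxStrikes;
        int strikes = 0;
    };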

To avoid read conflicts or other issues, I'll be using phosphor-hwmon to do
the reads.  However, to avoid waiting on it, I'll be listening for the
property updates in a background thread, and having phosphor-hwmon not wait
the full 1 second.  The algorithm I'm using wasn't written by me but rather
by someone who understands the thermal side, and it's written to be a very
tight loop on controlling the fans.
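
The shared piece is basically a thread-safe cache of the latest values.  The
actual dbus signal handling is elided here; the background thread would just
call update() from its PropertiesChanged handler:

    #include <chrono>
    #include <map>
    #include <mutex>
    #include <string>

    // Most recent sensor values, written by the dbus listener thread and
    // read by the control loop(s).
    class SensorCache
    {
      public:
        struct Reading
        {
            double value;
            std::chrono::steady_clock::time_point when;
        };

        void update(const std::string& name, double value)
        {
            std::lock_guard<std::mutex> lock(mutex);
            cache[name] = {value, std::chrono::steady_clock::now()};
        }

        // Returns false if we've never heard from this sensor.
        bool get(const std::string& name, Reading& out) const
        {
            std::lock_guard<std::mutex> lock(mutex);
            auto it = cache.find(name);
            if (it == cache.end())
            {
                return false;
            }
            out = it->second;
            return true;
        }

      private:
        mutable std::mutex mutex;
        std::map<std::string, Reading> cache;
    };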

I haven't tried to make the wait time in phosphor-hwmon configurable yet.
It just hasn't been a priority, but I'd rather not maintain a hacked
version.  So hopefully in a couple of weeks I can submit it for review.

Since phosphor-hwmon doesn't allow writing non-fan speeds and I don't want
to write a conversion/calibration layer, I'll be writing directly to sysfs
as pwm.  The algorithm I'm using outputs in pwm, so that's the shortest
path.  However, I could add a configuration option in the fan control
itself for whether to update over dbus or sysfs and so on, such that it
could be handled either way.  I don't want the configuration to become too
cumbersome, though.
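
The sysfs side of that is just writing the value (0-255 on the fan
controllers I've looked at) to the pwm node -- the hwmon index and channel
below are made up, the real path is platform-specific:

    #include <fstream>
    #include <string>

    // Write a raw PWM value straight to a hwmon sysfs pwm node.
    bool writePwm(const std::string& path, unsigned int value)
    {
        std::ofstream file(path);
        if (!file.is_open())
        {
            return false;
        }
        file << value;
        return file.good();
    }

    // e.g. writePwm("/sys/class/hwmon/hwmon0/pwm1", 128);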

I think one of the biggest steps away from something we can easily share
will be the OEM commands I need.  I could put them in a separate daemon,
and maybe I will later (once it's all working), which would make it more
shareable.  The host needs to be able to send down thermal data that the
BMC can't read, as well as take over control from the algorithm.  I
mentioned the manual control, and y'all indicated it was something you
might require.  But since it's outside the scope of IPMI's standard set of
commands, can OEM commands be shared?
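
Just to illustrate the kind of data involved -- these fields and sizes are
not from any real command layout, purely a sketch of the two things the
host needs to be able to do:

    #include <cstdint>

    // Thermal reading the host pushes down because the BMC can't read it.
    struct HostThermalReading
    {
        uint8_t sensorId;      // which host-visible sensor this is
        int16_t temperatureC;  // the host-side reading
    } __attribute__((packed));

    // Host request to take over (or release) fan control.
    struct ManualControlRequest
    {
        uint8_t enable;  // nonzero: host takes over from the algorithm
        uint8_t pwm;     // requested PWM while under manual control
    } __attribute__((packed));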

I'm still designing my controller to use multiple threads, although no
longer for reading, because, yeah, it has one status register.  But one
background thread can maintain the latest fan speed updates received over
dbus.  Because phosphor-hwmon only responds to dbus messages once per loop,
I need to make my information retrieval asynchronous -- hence the
background thread listening for updates.  My loop(s) will just assume the
information they are reading is fresh.
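
If I do want the loops to sanity-check that assumption, it's just a
comparison against the cached timestamp -- the 2-second cutoff here is only
an example:

    #include <chrono>

    // Decide whether a cached reading (timestamped with steady_clock, as
    // in the SensorCache above) is recent enough to act on.
    bool isFresh(const std::chrono::steady_clock::time_point& when,
                 std::chrono::milliseconds maxAge = std::chrono::seconds(2))
    {
        return (std::chrono::steady_clock::now() - when) <= maxAge;
    }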

One thing I'm still working on designing is how to cleanly handle chassis
temperature differences.  We have requirements about the difference in
temperature between incoming and exiting air.  All the PID loops will run
and compute their goal PWM; I'll basically take the maximum of those, check
it against the specially listed thermal margin, and then try to drive the
fans there.  I'm not convinced I like it yet.
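
As a sketch of that combination step -- I'm treating the margin check as a
floor on the PWM here, which may not be how it ends up:

    #include <algorithm>
    #include <vector>

    // Take the highest PWM requested by any PID loop, then make sure it
    // at least satisfies whatever floor the inlet/outlet margin
    // requirement translates into.  marginFloorPwm is a placeholder.
    unsigned int combinePidOutputs(const std::vector<unsigned int>& pidPwms,
                                   unsigned int marginFloorPwm)
    {
        unsigned int pwm = 0;
        for (auto p : pidPwms)
        {
            pwm = std::max(pwm, p);
        }
        pwm = std::max(pwm, marginFloorPwm);
        return std::min(pwm, 255u);  // clamp to the hwmon pwm range
    }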

I hope to have something mostly done in the next two weeks.  It isn't built
on the fan-presence work, but if it's at that stage I'd like to show you
guys what it looks like and see if I can save you time, even if maybe not
all of it can be used -- or maybe we can come up with ways of splitting it
apart cleanly such that it's a matter of configuration or providing X, Y,
or Z.

Regards,
Patrick

On Fri, May 5, 2017 at 1:35 PM, Patrick Williams <patrick at stwcx.xyz> wrote:

> On Fri, May 05, 2017 at 11:37:06AM -0700, Nancy Yuen wrote:
> > 1. General design issue if time between reads of a sensor is dependent on
> > the number of sensors in the system.
>
> I don't see any general design issue with phosphor-hwmon or dbus being
> talked about here.  There is one specific driver that has a pretty large
> scaling factor no matter how you read sensor values out of it.  We've
> talked elsewhere in the thread about a potential driver change to
> improve it.  Most hwmon drivers do not have any issue.
>
> > And drivers can fail, due to misbehaving code or hardware.
>
> hwmon drivers should never block userspace forever.  If they do that is
> a serious driver bug.  We could be defensive against it by enhancing
> phosphor-hwmon to use non-blocking IO, assuming the kernel supports it,
> but that seems like a lot of code for a non-existent problem.
>
> Hardware failures are already handled by the hwmon drivers, reported
> back to phosphor-hwmon as errnos on read, and dealt with.
>
> > In this design, sensor reporting could be
> > significantly delayed if one sensor/driver were bad or misbehaving.
>
> You have no difference in the problem of one sensor reading not working
> in any of these three potential designs:
>     1. One big loop that reads each hwmon sysfs entry for the whole
>        system in sequence.
>     2. One big program with N threads that, with a stampeding herd,
>        attempt to read all hwmon sysfs at the same instant.
>     3. M processes with N hwmon sysfs reads in sequence.
>
> In any of these designs, if the driver delays your read for 8 seconds
> your data is delayed and stale.  I think our expectation is that a
> fan-control algorithm is using dbus signals to keep track of the most
> recent value and if it doesn't have up-to-date data by the time it wants
> to make a decision it would deal with it in whatever way it sees fit.
> Likely, either using old data or treating that sensor as in-error.
>
> If you chose design #2 and then expanded on it by adding a thermal
> control loop in yet another thread, you'd still have the exact same
> problems to deal with.  It just is now all in one process using shm to
> communicate the cached sensor values instead of using dbus between
> processes.
>
> --
> Patrick Williams
>