thermald for OpenBMC

Patrick Venture venture at google.com
Tue Apr 18 13:20:34 AEST 2017


Patrick,

>> I'm working on a thermal control loop that'll operate within the openbmc
>> framework(s) and wanted to provide a somewhat high level overview for
>> thoughts.

> We should connect you with Matt Spinler (mspinler) and Matt Barth
> (msbarth) on IRC.  They have been working on implementing the "IBM fan
> control algorithm" but I suspect there is a significant amount of
> overlap.  Our intention was that you'd be able to reuse our
> implementation and insert a different (low-level detailed) algorithm.

Definitely.  I know there's a Google algorithm we use for thermal control
that's based on proportional–integral–derivative (PID) control.  I'll ping
them on IRC to get a peek at their design, roadmap and timeline.  It's also
possible that, because of our data-center-specific configuration
requirements, plugging in a different low-level algorithm may take more
work.  But without seeing the design, it's impossible to say.
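To make the PID idea concrete, here is a minimal sketch of a PID loop that
turns a zone's thermal margin into a fan PWM target.  It is not the Google
algorithm.  It assumes margin is in degrees C (positive means headroom) and
that the output is a 0-100% PWM duty cycle.  The names (`PidController`,
`setpoint_margin`) and any gains you plug in are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PidController:
    """Textbook PID on thermal margin (hypothetical names and tuning)."""
    kp: float
    ki: float
    kd: float
    setpoint_margin: float       # desired headroom in degrees C (assumed)
    out_min: float = 0.0         # minimum PWM percent
    out_max: float = 100.0       # maximum PWM percent
    _integral: float = field(default=0.0, init=False)
    _prev_error: Optional[float] = field(default=None, init=False)

    def step(self, margin: float, dt: float) -> float:
        if dt <= 0:
            raise ValueError("dt must be positive")
        # Error is positive when headroom is below the setpoint, i.e. the
        # zone is hotter than desired and the fans should speed up.
        error = self.setpoint_margin - margin
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error

        candidate_integral = self._integral + error * dt
        output = (self.kp * error
                  + self.ki * candidate_integral
                  + self.kd * derivative)

        # Clamp to the PWM range and only keep the new integral when the
        # output is not saturated (conditional-integration anti-windup).
        if output > self.out_max:
            return self.out_max
        if output < self.out_min:
            return self.out_min
        self._integral = candidate_integral
        return output
```

As the margin shrinks below the setpoint the output rises; clamping plus
conditional integration keeps the integral term from winding up while the
fans are pinned at 0% or 100%.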

>> The general design is to have a daemon that reads fans and temperatures
>> from dbus (reaching out to phosphor-hwmon) as well as being able to
>> receive temperatures and other sensor information over an OEM IPMI
>> command.

> Sounds good.  This is how it is supposed to work.

Good.  I'll run some performance experiments to make sure that routing
everything through dbus is fast enough, but I'm sure it will be reasonably
quick.

> For the IPMI commands, the expectation would be that either the IPMI
> provider or an application fed by the IPMI provider for these OEM
> commands would implement the same xyz.openbmc_project.Sensor.Value
> interface as the phosphor-hwmon.  This way the thermal algorithm really
> doesn't need to know where the data comes from.

Right.  I just need to confirm the exact format of the information
required.  Discussions today indicated I'd be provided with the temperature
margins of the fastest and slowest devices (in terms of thermal response) in
each zone.  The YAML definition will need to indicate whether each sensor is
available to the BMC or is "outside."
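A rough sketch of the "doesn't need to know where the data comes from"
idea: a BMC-local sensor and a host-fed one expose the same read method, and
an `outside` flag stands in for the YAML attribute mentioned above.  These
are plain Python classes modelled loosely on
xyz.openbmc_project.Sensor.Value, not real D-Bus objects; `SensorValue`,
`HwmonSensor`, `HostReportedSensor` and `hottest` are hypothetical names.

```python
from typing import Protocol, Sequence


class SensorValue(Protocol):
    """Stand-in for an object exposing xyz.openbmc_project.Sensor.Value."""
    name: str
    outside: bool  # True if the reading comes from the host, not the BMC

    def value(self) -> float: ...


class HwmonSensor:
    """Hypothetical BMC-local sensor, e.g. one backed by phosphor-hwmon."""

    def __init__(self, name: str, reading: float) -> None:
        self.name = name
        self.outside = False
        self._reading = reading

    def value(self) -> float:
        return self._reading


class HostReportedSensor:
    """Hypothetical sensor whose value the host pushes over OEM IPMI."""

    def __init__(self, name: str, initial: float) -> None:
        self.name = name
        self.outside = True
        self._reading = initial

    def update(self, reading: float) -> None:
        # Would be called by the (hypothetical) OEM IPMI command handler.
        self._reading = reading

    def value(self) -> float:
        return self._reading


def hottest(sensors: Sequence[SensorValue]) -> float:
    # The control loop only sees the shared interface, never the source.
    return max(s.value() for s in sensors)
```

The loop code stays identical whether a zone's inputs are local, host-fed
or a mix of both.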

>> The system will support defining zones (yes, probably in YAML).  A zone
>> will have at least one exclusive fan, and at least one thermal sensor.
>> The thermal sensor can be shared.  There will be defaults provided in
>> this configuration to act as fallbacks.

> There is some code available to define zones via YAML.  Matt Spinler can
> point you to it.

Ok.
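Once the zone YAML is parsed, the rules from the proposal (each zone has at
least one fan of its own and at least one thermal sensor; sensors may be
shared) could be checked along these lines.  This reads a zone's fans as
exclusive to it (each fan belongs to exactly one zone), which is an
assumption.  `Zone`, `validate_zones` and `default_pwm` are hypothetical, and
the YAML schema itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Zone:
    name: str
    fans: List[str]             # fans owned exclusively by this zone
    sensors: List[str]          # sensors may appear in several zones
    default_pwm: float = 100.0  # hypothetical fallback if inputs go missing


def validate_zones(zones: List[Zone]) -> None:
    """Raise ValueError if the zone layout breaks the proposed rules."""
    fan_owner: Dict[str, str] = {}
    for zone in zones:
        if not zone.fans:
            raise ValueError(f"zone {zone.name!r} has no fans")
        if not zone.sensors:
            raise ValueError(f"zone {zone.name!r} has no thermal sensors")
        for fan in zone.fans:
            if fan in fan_owner:
                raise ValueError(
                    f"fan {fan!r} is in both {fan_owner[fan]!r} "
                    f"and {zone.name!r}"
                )
            fan_owner[fan] = zone.name
    # Sensors are deliberately not checked for uniqueness: they can be shared.
```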

>> The thermal loop will be margin based and attempt to drive the fans to
>> keep each zone within its operating temperature range.  Each zone will be
>> independently managed.

> This sounds very similar to their intended design as well.  For a zone
> there is a lower-threshold and an upper-threshold.  When the temperature
> is above the upper-threshold, the fan speed is increased, and when it is
> below the lower-threshold, the fan speed is decreased.  Again, the Matts
> can give you details on what the "IBM fan control algorithm" design is.

That's the basic idea.
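The lower/upper threshold scheme described above can be sketched as a
simple stepping controller with a hold band in between.  The step size and
PWM limits are hypothetical values, not taken from the IBM design, and
`next_fan_target` is an illustrative name.

```python
def next_fan_target(temp: float, current_pwm: float,
                    lower: float, upper: float,
                    step: float = 10.0,
                    pwm_min: float = 20.0, pwm_max: float = 100.0) -> float:
    """Return the new fan target for one zone (hypothetical step/limits)."""
    if lower >= upper:
        raise ValueError("lower threshold must be below upper threshold")
    if temp > upper:
        target = current_pwm + step   # too hot: speed the fans up
    elif temp < lower:
        target = current_pwm - step   # cool enough: slow the fans down
    else:
        target = current_pwm          # inside the band: hold steady
    return min(pwm_max, max(pwm_min, target))
```

The gap between the two thresholds acts as hysteresis, so the fans do not
hunt up and down around a single set point.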

>> Because not all thermal sensors can necessarily be read by the BMC, we
>> need a method of getting that information from the host.  From a previous
>> project, we have the notion of sending a controller the thermal margins
>> of devices whose temperature changes slowly and of those where it changes
>> quickly.

> Is this the Host->BMC via IPMI you mentioned earlier or does the BMC
> need to actively query the host in some cases?  Hopefully it is always
> one direction.

The plan is for Host->BMC only.  The host just periodically sends the BMC
thermal information for the sensors the BMC can't reach.
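On the BMC side, the host-pushed margins could be kept in a small cache
that goes stale if the host stops reporting, so the zone can drop back to
its configured fallback.  The timeout, the `HostMarginCache` class and the
choice to return the smaller of the fast/slow margins (worst-case headroom)
are all assumptions for illustration; the OEM IPMI command itself is not
modelled.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class HostMargins:
    fast: float         # margin (deg C) of the quickest-reacting device
    slow: float         # margin (deg C) of the slowest-reacting device
    received_at: float  # clock reading when the host report arrived


class HostMarginCache:
    """BMC-side store for margins the host pushes each cycle (Host->BMC)."""

    def __init__(self, stale_after: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._stale_after = stale_after  # hypothetical timeout in seconds
        self._clock = clock
        self._zones: Dict[str, HostMargins] = {}

    def on_host_report(self, zone: str, fast: float, slow: float) -> None:
        # Would be called from the (hypothetical) OEM IPMI command handler.
        self._zones[zone] = HostMargins(fast, slow, self._clock())

    def worst_margin(self, zone: str) -> Optional[float]:
        """Smallest fresh margin for the zone, or None if missing/stale."""
        entry = self._zones.get(zone)
        if entry is None:
            return None
        if self._clock() - entry.received_at > self._stale_after:
            return None  # caller should apply the zone's fallback default
        return min(entry.fast, entry.slow)
```

Because the data only ever flows Host->BMC, staleness is the BMC's only
signal that the host has stopped feeding it.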

I'm very interested in seeing the design doc, or any code that exists, and
especially a timeline.

Regards,
Patrick

On Mon, Apr 17, 2017 at 7:31 PM, Patrick Williams <patrick at stwcx.xyz> wrote:

> Patrick,
>
> On Mon, Apr 17, 2017 at 01:21:29PM -0700, Patrick Venture wrote:
> > I'm working on a thermal control loop that'll operate within the openbmc
> > framework(s) and wanted to provide a somewhat high level overview for
> > thoughts.
>
> We should connect you with Matt Spinler (mspinler) and Matt Barth
> (msbarth) on IRC.  They have been working on implementing the "IBM fan
> control algorithm" but I suspect there is a significant amount of
> overlap.  Our intention was that you'd be able to reuse our
> implementation and insert a different (low-level detailed) algorithm.
>
> > The general design is to have a daemon that reads fans and temperatures
> > from dbus (reaching out to phosphor-hwmon) as well as being able to
> > receive temperatures and other sensor information over an OEM IPMI
> > command.
>
> Sounds good.  This is how it is supposed to work.
>
> For the IPMI commands, the expectation would be that either the IPMI
> provider or an application fed by the IPMI provider for these OEM
> commands would implement the same xyz.openbmc_project.Sensor.Value
> interface as the phosphor-hwmon.  This way the thermal algorithm really
> doesn't need to know where the data comes from.
>
> > The system will support defining zones (yes, probably in YAML).  A zone
> > will have at least one exclusive fan, and at least one thermal sensor.
> > The thermal sensor can be shared.  There will be defaults provided in
> > this configuration to act as fallbacks.
>
> There is some code available to define zones via YAML.  Matt Spinler can
> point you to it.
>
> > The thermal loop will be margin based and attempt to drive the fans to
> > keep each zone within its operating temperature range.  Each zone will
> > be independently managed.
>
> This sounds very similar to their intended design as well.  For a zone
> there is a lower-threshold and an upper-threshold.  When the temperature
> is above the upper-threshold, the fan speed is increased, and when it is
> below the lower-threshold, the fan speed is decreased.  Again, the Matts
> can give you details on what the "IBM fan control algorithm" design is.
>
> > Because not all thermal sensors can necessarily be read by the BMC, we
> > need a method of getting that information from the host.  From a
> > previous project, we have the notion of sending a controller the thermal
> > margins of devices whose temperature changes slowly and of those where
> > it changes quickly.
>
> Is this the Host->BMC via IPMI you mentioned earlier or does the BMC
> need to actively query the host in some cases?  Hopefully it is always
> one direction.
>
> --
> Patrick Williams
>


More information about the openbmc mailing list