[Skiboot] [PATCH 2/2] fsp-sensor: rework device tree for sensors
Stewart Smith
stewart at linux.vnet.ibm.com
Tue Mar 8 14:54:15 AEDT 2016
Cédric Le Goater <clg at fr.ibm.com> writes:
> On 03/07/2016 08:04 AM, Stewart Smith wrote:
>> Cédric Le Goater <clg at fr.ibm.com> writes:
>>> The current code in OPAL exposing the FSP sensors in the device tree
>>> is very SPCN-centric which makes it difficult to add new sensors
>>> fitting with the ibmpowernv Linux driver. This patch proposes some
>>> improvements on the way the device tree is created.
>>>
>>> The logic behind the node creation is preserved. The DMA sensor buffer
>>> is parsed, looping on the PRS command modifiers and entries while
>>> nodes are being created under the "ibm,opal/sensors/" directory. The
>>> code now splits the creation under separate routines, one for each
>>> modifier, and use the same old pattern for names :
>>>
>>> <resource class name>#<index>-attribute/
>>>
>>> Each resource node is compatible with :
>>>
>>> "ibm,opal-sensor-<resource classname>"
>>>
>>> There is a mapping to be done between the attributes of a same
>>> resource and the PRS command used to collect them. This adds some
>>> complexity in the code when creating the node and when building a
>>> request for the FSP.
>>>
>>> For instance, the status of a FSP sensor which can be returned by one
>>> or more PRS command modifiers. For power supply and fans, we choose
>>> the PRS modifier (and not DATA) to return the AC_FAULTED bit. For the
>>> ambient temperature, there is no other choice than to use the DATA
>>> modifier. The status bits being :
>>>
>>> PRS PARAM/DATA
>>> Modifier Modifier
>>>
>>> 0x0010 ON SUPPORTED
>>> 0x0008 ON
>>>
>>> 0x0004 AC FAULTED EM ALERT
>>> 0x0002 FAULTED FAULTED
>>> 0x0001 PRESENT PRESENT
>>>
>>> we only keep bits[1-2] to reflect the fault status to Linux.
>>>
>>> Another significant change is that the power consumption is now
>>> reported for each power supply and not as a whole like before. A Tuleta
>>> can have up to four distinct power supplies so it seems an interesting
>>> resource to report independently.
>>>
>>> Currently, we handle the "power-supply", "cooling-fan" and "amb-temp"
>>> resource classes. More exist in the specs but they have not showed up
>>> on the Tuleta I used.
>>>
>>> Signed-off-by: Cédric Le Goater <clg at fr.ibm.com>
>>> ---
>>> hw/fsp/fsp-sensor.c | 489 +++++++++++++++++++++++++++++-----------------------
>>> 1 file changed, 278 insertions(+), 211 deletions(-)
>>
>> Sorry for the delay in looking at this closely...
>
> Thanks for doing so.
>
>> I notice a couple of things:
>> - we seem to go from amb-temp#1-data to amb-temp#24576-data -
>> intentional?
>
>
> Yes. The previous implementation used to increment an index (see
> routine get_index() and array prids[]) to identify the resource
> for which the device tree was exposing a sensor node. This is
> error prone and does not fit very well in the linux model.
>
> The hwmon Linux driver was changed last year to accommodate its
> algorithm for newer sensors (core and mem buffer temps) and it
> can now take any (unique) number. So the patch changes the index
> to the resource id given by the FSP (See struct sensor_header
> in the previous patch) which is unique for a given resource class.
>
> I could have used a '%x'. I will give a try, it would be a little
> better but for the naming we are stuck with :
>
> <resource class>#<id>-<attribute name>
>
> as the initial version of the firmware used this scheme in the
> device tree.
>
>
>
>> - IT seems that DTS sensors follow doc/device-tree/ibm,opal/sensors.txt
>> yet the fsp-sensors come out differently:
>> dts: type at addr
>> fsp: type#id-type
>>
>> This should likely be documented *why* at least.
>
> OK. I will. The reason of the change was a more flexible naming to
> expose more sensors. Getting rid of the initial naming was not
> possible at that time as firmwares had already been shipped. I would
> have clearly preferred to kill the initial version.
:)
What did we ship with the old one again? I think I investigated this
before, so it's probabyl somewhere in the mail archive at least...
We could probably phase it out though, on P9 we could not add the old
style, as all kernels running there will have support for new style, and
then it'll completely go in $time
>> - when applying this patch, I seem to get 4 more sensors!?
>
> which ones ? :)
> I have noticed that recent IBM firmwares (FW840) expose 2 more
> temperature sensors. So that adds a couple.
I was booting with same FSP firmware, so should have otherwise been
identical.... I've attached (compressed) device trees of before/after.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p86-before.dts.xz
Type: application/x-xz
Size: 39228 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/skiboot/attachments/20160308/43ab344a/attachment-0002.xz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p86-after.dts.xz
Type: application/x-xz
Size: 39320 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/skiboot/attachments/20160308/43ab344a/attachment-0003.xz>
-------------- next part --------------
> The patch above adds sensors for the each PSU which were gathered
> in one sensor before. However, the PSU faults were exposed in different
> sensors. The patches keeps that.
>
> So on a s822, you should see these changes,
>
> before patch :
>
> # ls -1 /proc/device-tree/ibm,opal/sensors | egrep 'amb-temp|power'
> amb-temp#1-data
> amb-temp#1-thrs
> amb-temp#2-data
> amb-temp#2-thrs
> amb-temp#3-data
> amb-temp#3-thrs
> power#1-data
> power-supply#1-faulted
> power-supply#2-faulted
>
> after :
>
> # ls -1 /proc/device-tree/ibm,opal/sensors | egrep 'amb-temp|power'
> amb-temp#24576-data
> amb-temp#24576-faulted
> amb-temp#24576-thrs
> amb-temp#24577-data
> amb-temp#24577-faulted
> amb-temp#24577-thrs
> amb-temp#24578-data
> amb-temp#24578-faulted
> amb-temp#24578-thrs
> power#4096-data
> power#4096-faulted
> power#4097-data
> power#4097-faulted
>
> on a s824, you would have 2 more PSUs:
>
> power#4098-data
> power#4098-faulted
> power#4099-data
> power#4099-faulted
>
> and finally, on a s824, the final result would be with the sensors command :
>
> # sensors
> ibmpowernv-isa-0000
> Adapter: ISA adapter
> fan1: 3520 RPM (min = 0 RPM)
> fan2: 3195 RPM (min = 0 RPM)
> fan3: 3515 RPM (min = 0 RPM)
> fan4: 3202 RPM (min = 0 RPM)
> fan5: 3461 RPM (min = 0 RPM)
> fan6: 3187 RPM (min = 0 RPM)
> fan7: 3488 RPM (min = 0 RPM)
> fan8: 3210 RPM (min = 0 RPM)
> fan9: 5008 RPM (min = 0 RPM)
> fan10: 5000 RPM (min = 0 RPM)
> fan11: 5000 RPM (min = 0 RPM)
> fan12: 5008 RPM (min = 0 RPM)
> temp1: +24.0°C (high = +0.0°C)
> temp2: +25.0°C (high = +0.0°C)
> temp3: +25.0°C (high = +0.0°C)
> Core 0-7: +50.0°C
> Core 8-15: +48.0°C
> Core 16-23: +49.0°C
> Core 24-31: +46.0°C
> Core 32-39: +41.0°C
> Core 40-47: +41.0°C
> Core 48-55: +42.0°C
> Core 56-63: +41.0°C
> Core 64-71: +40.0°C
> Core 72-79: +42.0°C
> Core 80-87: +38.0°C
> Core 88-95: +42.0°C
> Core 96-103: +47.0°C
> Core 104-111: +46.0°C
> Core 112-119: +47.0°C
> Core 120-127: +47.0°C
> power1: 169.00 W
> power2: 152.00 W
> power3: 159.00 W
> power4: 152.00 W
It seems that I do get extra faulted sensors:
amb-temp#24576-faulted {
amb-temp#24578-faulted {
amb-temp#24577-faulted {
which correspond to the amb-temp-data and thrs.
and for power:
$ xzcat p86-after.dts.xz |egrep 'power\#'
power#4096-data {
power#4097-data {
power#4097-faulted {
power#4096-faulted {
$ xzcat p86-before.dts.xz |egrep 'power\#'
power#1-data {
So... I think that looks about right? IF you agree, I'll merge these in today.
--
Stewart Smith
OPAL Architect, IBM.
More information about the Skiboot
mailing list