[Skiboot] [PATCH 2/2] fsp-sensor: rework device tree for sensors
Cédric Le Goater
clg at fr.ibm.com
Tue Mar 8 19:22:48 AEDT 2016
On 03/08/2016 04:54 AM, Stewart Smith wrote:
> Cédric Le Goater <clg at fr.ibm.com> writes:
>> On 03/07/2016 08:04 AM, Stewart Smith wrote:
>>> Cédric Le Goater <clg at fr.ibm.com> writes:
>>>> The current code in OPAL exposing the FSP sensors in the device tree
>>>> is very SPCN-centric which makes it difficult to add new sensors
>>>> fitting with the ibmpowernv Linux driver. This patch proposes some
>>>> improvements on the way the device tree is created.
>>>>
>>>> The logic behind the node creation is preserved. The DMA sensor buffer
>>>> is parsed, looping on the PRS command modifiers and entries while
>>>> nodes are being created under the "ibm,opal/sensors/" directory. The
>>>> code now splits the creation under separate routines, one for each
>>>> modifier, and use the same old pattern for names :
>>>>
>>>> <resource class name>#<index>-attribute/
>>>>
>>>> Each resource node is compatible with :
>>>>
>>>> "ibm,opal-sensor-<resource classname>"
>>>>
>>>> There is a mapping to be done between the attributes of a same
>>>> resource and the PRS command used to collect them. This adds some
>>>> complexity in the code when creating the node and when building a
>>>> request for the FSP.
>>>>
>>>> For instance, the status of a FSP sensor which can be returned by one
>>>> or more PRS command modifiers. For power supply and fans, we choose
>>>> the PRS modifier (and not DATA) to return the AC_FAULTED bit. For the
>>>> ambient temperature, there is no other choice than to use the DATA
>>>> modifier. The status bits being :
>>>>
>>>> PRS PARAM/DATA
>>>> Modifier Modifier
>>>>
>>>> 0x0010 ON SUPPORTED
>>>> 0x0008 ON
>>>>
>>>> 0x0004 AC FAULTED EM ALERT
>>>> 0x0002 FAULTED FAULTED
>>>> 0x0001 PRESENT PRESENT
>>>>
>>>> we only keep bits[1-2] to reflect the fault status to Linux.
>>>>
>>>> Another significant change is that the power consumption is now
>>>> reported for each power supply and not as a whole like before. A Tuleta
>>>> can have up to four distinct power supplies so it seems an interesting
>>>> resource to report independently.
>>>>
>>>> Currently, we handle the "power-supply", "cooling-fan" and "amb-temp"
>>>> resource classes. More exist in the specs but they have not showed up
>>>> on the Tuleta I used.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg at fr.ibm.com>
>>>> ---
>>>> hw/fsp/fsp-sensor.c | 489 +++++++++++++++++++++++++++++-----------------------
>>>> 1 file changed, 278 insertions(+), 211 deletions(-)
>>>
>>> Sorry for the delay in looking at this closely...
>>
>> Thanks for doing so.
>>
>>> I notice a couple of things:
>>> - we seem to go from amb-temp#1-data to amb-temp#24576-data -
>>> intentional?
>>
>>
>> Yes. The previous implementation used to increment an index (see
>> routine get_index() and array prids[]) to identify the resource
>> for which the device tree was exposing a sensor node. This is
>> error prone and does not fit very well in the linux model.
>>
>> The hwmon Linux driver was changed last year to accommodate its
>> algorithm for newer sensors (core and mem buffer temps) and it
>> can now take any (unique) number. So the patch changes the index
>> to the resource id given by the FSP (See struct sensor_header
>> in the previous patch) which is unique for a given resource class.
>>
>> I could have used a '%x'. I will give a try, it would be a little
>> better but for the naming we are stuck with :
>>
>> <resource class>#<id>-<attribute name>
>>
>> as the initial version of the firmware used this scheme in the
>> device tree.
>>
>>
>>
>>> - IT seems that DTS sensors follow doc/device-tree/ibm,opal/sensors.txt
>>> yet the fsp-sensors come out differently:
>>> dts: type at addr
>>> fsp: type#id-type
>>>
>>> This should likely be documented *why* at least.
>>
>> OK. I will. The reason of the change was a more flexible naming to
>> expose more sensors. Getting rid of the initial naming was not
>> possible at that time as firmwares had already been shipped. I would
>> have clearly preferred to kill the initial version.
>
> :)
>
> What did we ship with the old one again? I think I investigated this
> before, so it's probabyl somewhere in the mail archive at least...
>
> We could probably phase it out though, on P9 we could not add the old
> style, as all kernels running there will have support for new style, and
> then it'll completely go in $time
I will dig in history to see which branch we can cut, if possible.
>>> - when applying this patch, I seem to get 4 more sensors!?
>>
>> which ones ? :)
>
>> I have noticed that recent IBM firmwares (FW840) expose 2 more
>> temperature sensors. So that adds a couple.
>
> I was booting with same FSP firmware, so should have otherwise been
> identical.... I've attached (compressed) device trees of before/after.
>
>
>
>
>
>> The patch above adds sensors for the each PSU which were gathered
>> in one sensor before. However, the PSU faults were exposed in different
>> sensors. The patches keeps that.
>>
>> So on a s822, you should see these changes,
>>
>> before patch :
>>
>> # ls -1 /proc/device-tree/ibm,opal/sensors | egrep 'amb-temp|power'
>> amb-temp#1-data
>> amb-temp#1-thrs
>> amb-temp#2-data
>> amb-temp#2-thrs
>> amb-temp#3-data
>> amb-temp#3-thrs
>> power#1-data
>> power-supply#1-faulted
>> power-supply#2-faulted
>>
>> after :
>>
>> # ls -1 /proc/device-tree/ibm,opal/sensors | egrep 'amb-temp|power'
>> amb-temp#24576-data
>> amb-temp#24576-faulted
>> amb-temp#24576-thrs
>> amb-temp#24577-data
>> amb-temp#24577-faulted
>> amb-temp#24577-thrs
>> amb-temp#24578-data
>> amb-temp#24578-faulted
>> amb-temp#24578-thrs
>> power#4096-data
>> power#4096-faulted
>> power#4097-data
>> power#4097-faulted
>>
>> on a s824, you would have 2 more PSUs:
>>
>> power#4098-data
>> power#4098-faulted
>> power#4099-data
>> power#4099-faulted
>>
>> and finally, on a s824, the final result would be with the sensors command :
>>
>> # sensors
>> ibmpowernv-isa-0000
>> Adapter: ISA adapter
>> fan1: 3520 RPM (min = 0 RPM)
>> fan2: 3195 RPM (min = 0 RPM)
>> fan3: 3515 RPM (min = 0 RPM)
>> fan4: 3202 RPM (min = 0 RPM)
>> fan5: 3461 RPM (min = 0 RPM)
>> fan6: 3187 RPM (min = 0 RPM)
>> fan7: 3488 RPM (min = 0 RPM)
>> fan8: 3210 RPM (min = 0 RPM)
>> fan9: 5008 RPM (min = 0 RPM)
>> fan10: 5000 RPM (min = 0 RPM)
>> fan11: 5000 RPM (min = 0 RPM)
>> fan12: 5008 RPM (min = 0 RPM)
>> temp1: +24.0°C (high = +0.0°C)
>> temp2: +25.0°C (high = +0.0°C)
>> temp3: +25.0°C (high = +0.0°C)
>> Core 0-7: +50.0°C
>> Core 8-15: +48.0°C
>> Core 16-23: +49.0°C
>> Core 24-31: +46.0°C
>> Core 32-39: +41.0°C
>> Core 40-47: +41.0°C
>> Core 48-55: +42.0°C
>> Core 56-63: +41.0°C
>> Core 64-71: +40.0°C
>> Core 72-79: +42.0°C
>> Core 80-87: +38.0°C
>> Core 88-95: +42.0°C
>> Core 96-103: +47.0°C
>> Core 104-111: +46.0°C
>> Core 112-119: +47.0°C
>> Core 120-127: +47.0°C
>> power1: 169.00 W
>> power2: 152.00 W
>> power3: 159.00 W
>> power4: 152.00 W
>
> It seems that I do get extra faulted sensors:
> amb-temp#24576-faulted {
> amb-temp#24578-faulted {
> amb-temp#24577-faulted {
Yes. The patch is adding these nodes to be consistent with other sensors.
> which correspond to the amb-temp-data and thrs.
>
> and for power:
>
> $ xzcat p86-after.dts.xz |egrep 'power\#'
> power#4096-data {
> power#4097-data {
> power#4097-faulted {
> power#4096-faulted {
> $ xzcat p86-before.dts.xz |egrep 'power\#'
> power#1-data {
yes. the patch adds 2 distincts psus with a faulted node for each.
> So... I think that looks about right? IF you agree, I'll merge
> these in today.
Looks good.
Thanks,
C.
More information about the Skiboot
mailing list