[Skiboot] hwmon sensors cleanup

Oliver oohall at gmail.com
Fri Nov 24 16:51:13 AEDT 2017


Hi all,

Currently we provide a lot of sensors via the device-tree which are
plugged into the kernel's hwmon interface via the powernv-sensor
driver. Unfortunately a lot of these sensors are... not very useful.
For example, take the output of `sensors` on a (pass 1) romulus:

> Chip 0 VOLTDROOPCNTC04 0:    +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC05 4:    +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC06 8:    +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC07 12:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC12 16:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC13 20:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC14 24:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC15 28:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC16 32:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC17 36:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC18 40:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC19 44:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC20 48:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC21 52:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC22 56:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTC23 60:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC02 64:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC03 68:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC04 72:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC05 76:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC06 80:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC07 84:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC08 88:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC09 92:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC14 96:   +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC15 100:  +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC18 104:  +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC19 108:  +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC20 112:  +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC21 116:  +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC22 120:  +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTC23 124:  +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTQ0:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTQ1:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTQ2:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTQ3:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTQ4:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 0 VOLTDROOPCNTQ5:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTQ0:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTQ1:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTQ2:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTQ3:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTQ4:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)
> Chip 8 VOLTDROOPCNTQ5:       +0.00 V  (lowest =  +0.00 V, highest =  +0.00 V)

I don't think we should have a hwmon sensor for these. Maybe we can
use that information to add a secondary attribute to something?

> Chip 0 Vdd Remote Sense:     +6.51 V  (lowest =  +6.45 V, highest =  +8.31 V)
> Chip 0 Vdn Remote Sense:     +9.01 V  (lowest =  +9.01 V, highest =  +9.01 V)
> Chip 8 Vdd Remote Sense:     +8.04 V  (lowest =  +6.50 V, highest =  +8.07 V)
> Chip 8 Vdn Remote Sense:     +9.01 V  (lowest =  +9.01 V, highest =  +9.01 V)
Are these the point-of-load voltages?

> Chip 0 Vdd:                  +6.56 V  (lowest =  +6.56 V, highest =  +8.36 V)
> Chip 0 Vdn:                  +9.02 V  (lowest =  +9.02 V, highest =  +9.02 V)
> Chip 8 Vdd:                  +8.12 V  (lowest =  +6.67 V, highest =  +8.12 V)
> Chip 8 Vdn:                  +9.02 V  (lowest =  +9.02 V, highest =  +9.02 V)
The point-of-supply voltages?

Why are these duplicated?

> Core 0:                      +37.0°C
> Core 4:                      +37.0°C
> Core 8:                      +36.0°C
> Core 12:                     +37.0°C
> Core 16:                     +38.0°C
> Core 20:                     +37.0°C
> Core 24:                     +37.0°C
> Core 28:                     +37.0°C
> Core 32:                     +37.0°C
> Core 36:                     +37.0°C
> Core 40:                     +36.0°C
> Core 44:                     +36.0°C
> Core 48:                     +37.0°C
> Core 52:                     +37.0°C
> Core 56:                     +35.0°C
> Core 60:                     +35.0°C
> Core 64:                     +35.0°C
> Core 68:                     +37.0°C
> Core 72:                     +37.0°C
> Core 76:                     +39.0°C
> Core 80:                     +35.0°C
> Core 84:                     +34.0°C
> Core 88:                     +36.0°C
> Core 92:                     +36.0°C
> Core 96:                     +37.0°C
> Core 100:                    +38.0°C
> Core 104:                    +36.0°C
> Core 108:                    +37.0°C
> Core 112:                    +37.0°C
> Core 116:                    +35.0°C
> Core 120:                    +36.0°C
> Core 124:                    +37.0°C
> Chip 0 Core 0:               +36.0°C  (lowest = +34.0°C, highest = +44.0°C)
> Chip 0 Core 4:               +37.0°C  (lowest = +34.0°C, highest = +45.0°C)
> Chip 0 Core 8:               +35.0°C  (lowest = +34.0°C, highest = +45.0°C)
> Chip 0 Core 12:              +35.0°C  (lowest = +34.0°C, highest = +44.0°C)
> Chip 0 Core 16:              +37.0°C  (lowest = +35.0°C, highest = +47.0°C)
> Chip 0 Core 20:              +34.0°C  (lowest = +33.0°C, highest = +45.0°C)
> Chip 0 Core 24:              +36.0°C  (lowest = +34.0°C, highest = +45.0°C)
> Chip 0 Core 28:              +36.0°C  (lowest = +35.0°C, highest = +45.0°C)
> Chip 0 Core 32:              +37.0°C  (lowest = +35.0°C, highest = +45.0°C)
> Chip 0 Core 36:              +35.0°C  (lowest = +33.0°C, highest = +44.0°C)
> Chip 0 Core 40:              +35.0°C  (lowest = +33.0°C, highest = +43.0°C)
> Chip 0 Core 44:              +36.0°C  (lowest = +34.0°C, highest = +44.0°C)
> Chip 0 Core 48:              +36.0°C  (lowest = +35.0°C, highest = +45.0°C)
> Chip 0 Core 52:              +35.0°C  (lowest = +34.0°C, highest = +46.0°C)
> Chip 0 Core 56:              +33.0°C  (lowest = +32.0°C, highest = +43.0°C)
> Chip 0 Core 60:              +34.0°C  (lowest = +34.0°C, highest = +44.0°C)
> Chip 8 Core 64:              +35.0°C  (lowest = +32.0°C, highest = +42.0°C)
> Chip 8 Core 68:              +35.0°C  (lowest = +33.0°C, highest = +42.0°C)
> Chip 8 Core 72:              +36.0°C  (lowest = +34.0°C, highest = +43.0°C)
> Chip 8 Core 76:              +37.0°C  (lowest = +34.0°C, highest = +44.0°C)
> Chip 8 Core 80:              +35.0°C  (lowest = +33.0°C, highest = +43.0°C)
> Chip 8 Core 84:              +34.0°C  (lowest = +32.0°C, highest = +41.0°C)
> Chip 8 Core 88:              +35.0°C  (lowest = +31.0°C, highest = +42.0°C)
> Chip 8 Core 92:              +35.0°C  (lowest = +32.0°C, highest = +41.0°C)
> Chip 8 Core 96:              +36.0°C  (lowest = +33.0°C, highest = +43.0°C)
> Chip 8 Core 100:             +37.0°C  (lowest = +34.0°C, highest = +43.0°C)
> Chip 8 Core 104:             +36.0°C  (lowest = +33.0°C, highest = +42.0°C)
> Chip 8 Core 108:             +35.0°C  (lowest = +32.0°C, highest = +42.0°C)
> Chip 8 Core 112:             +34.0°C  (lowest = +29.0°C, highest = +41.0°C)
> Chip 8 Core 116:             +34.0°C  (lowest = +29.0°C, highest = +42.0°C)
> Chip 8 Core 120:             +36.0°C  (lowest = +33.0°C, highest = +43.0°C)
> Chip 8 Core 124:             +35.0°C  (lowest = +32.0°C, highest = +42.0°C)

So we have two sets of temperature sensors. One set is from measuring
the per-core DTS directly, and the other set is from the OCC measuring
the same per-core DTS. We should probably not be doubling up here.
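
To illustrate the overlap, here is a rough sketch that compares the core
numbers reported by the two sets. It only assumes the label formats shown
in the output above ("Core N:" and "Chip X Core N:"); `core_ids` is a
hypothetical helper, not something that exists in skiboot or lm-sensors.

```python
import re

# Label formats as printed by `sensors` above: one set uses "Core 4:",
# the other uses "Chip 0 Core 4:".
PLAIN = re.compile(r"^Core (\d+):")
CHIP = re.compile(r"^Chip \d+ Core (\d+):")


def core_ids(lines):
    """Return (plain, per_chip) sets of core numbers found in the labels."""
    plain, per_chip = set(), set()
    for line in lines:
        line = line.strip()
        m = PLAIN.match(line)
        if m:
            plain.add(int(m.group(1)))
            continue
        m = CHIP.match(line)
        if m:
            per_chip.add(int(m.group(1)))
    return plain, per_chip


sample = [
    "Core 0:                      +37.0°C",
    "Core 4:                      +37.0°C",
    "Chip 0 Core 0:               +36.0°C  (lowest = +34.0°C, highest = +44.0°C)",
    "Chip 0 Core 4:               +37.0°C  (lowest = +34.0°C, highest = +45.0°C)",
]
plain, per_chip = core_ids(sample)

# Cores that show up in both sets are reported twice.
duplicated = sorted(plain & per_chip)
print(duplicated)
```

Run against the full listing, every core number appears in both sets.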

> Chip 0 DIMM 0 :              +38.0°C  (lowest = +37.0°C, highest = +38.0°C)
> Chip 0 DIMM 1 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 2 :              +40.0°C  (lowest = +39.0°C, highest = +40.0°C)
> Chip 0 DIMM 3 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 4 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 5 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 6 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 7 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 8 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 9 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 10 :              +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 11 :              +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 12 :             +40.0°C  (lowest = +40.0°C, highest = +41.0°C)
> Chip 0 DIMM 13 :              +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 DIMM 14 :             +40.0°C  (lowest = +39.0°C, highest = +41.0°C)
> Chip 0 DIMM 15 :              +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 0 :              +36.0°C  (lowest = +35.0°C, highest = +36.0°C)
> Chip 8 DIMM 1 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 2 :              +36.0°C  (lowest = +35.0°C, highest = +36.0°C)
> Chip 8 DIMM 3 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 4 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 5 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 6 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 7 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 8 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 9 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 10 :              +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 11 :              +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 12 :             +35.0°C  (lowest = +35.0°C, highest = +35.0°C)
> Chip 8 DIMM 13 :              +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 DIMM 14 :             +35.0°C  (lowest = +35.0°C, highest = +36.0°C)
> Chip 8 DIMM 15 :              +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)

A lot of these are zero because the DIMM itself isn't populated. Witherspoon and
Zaius don't even have slots for 16 DIMMs on each socket, so why even report
them here?
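
As a rough heuristic for spotting these from userspace, the sketch below
drops any sensor whose current, lowest and highest readings are all zero.
This is only an assumption-laden illustration: an all-zero reading does not
prove the DIMM is absent, and `parse` and `looks_unused` are hypothetical
helpers written against the output format shown above.

```python
import re

NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def parse(line):
    """Split a `sensors` line into (label, [current, lowest, highest])."""
    label, sep, rest = line.partition(":")
    if not sep:
        return None
    values = [float(v) for v in NUMBER.findall(rest)]
    return label.strip(), values


def looks_unused(line):
    """True if current, lowest and highest are all exactly zero."""
    parsed = parse(line)
    if parsed is None:
        return False
    _, values = parsed
    return len(values) == 3 and all(v == 0.0 for v in values)


sample = [
    "Chip 0 DIMM 0 :              +38.0°C  (lowest = +37.0°C, highest = +38.0°C)",
    "Chip 0 DIMM 1 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)",
    "Chip 0 DIMM 2 :              +40.0°C  (lowest = +39.0°C, highest = +40.0°C)",
    "Chip 0 DIMM 3 :               +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)",
]

# Keep only the sensors that have reported something non-zero.
kept = [parse(line)[0] for line in sample if not looks_unused(line)]
print(kept)
```

A proper fix would be for firmware to not create the sensor in the first
place, rather than filtering after the fact like this.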

> Chip 0 Nest:                 +35.0°C  (lowest = +34.0°C, highest = +40.0°C)
> Chip 8 Nest:                 +37.0°C  (lowest = +35.0°C, highest = +42.0°C)

Could we report this as the chip overall temperature?

> Chip 0 GPU 0 :                +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 GPU 1 :                +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 GPU 2 :                +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 GPU 0 MEM:             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 GPU 1 MEM:             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 0 GPU 2 MEM:             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 GPU 0 :                +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 GPU 1 :                +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 GPU 2 :                +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 GPU 0 MEM:             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 GPU 1 MEM:             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
> Chip 8 GPU 2 MEM:             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)

I think that these should probably be Witherspoon-specific. If we have support

> Chip 0 TEMPVDD:              +40.0°C  (lowest = +39.0°C, highest = +45.0°C)
> Chip 8 TEMPVDD:              +38.0°C  (lowest = +37.0°C, highest = +42.0°C)

What is this even? Temperature of the Vdd regulator? What about Vdn?

> Chip 0 Memory:                0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> Chip 8 Memory:                0.00 W  (lowest =   0.00 W, highest =   0.00 W)
I'm not sure how you would even fill this out. Average DIMM temperature?

> Chip 8 :                     61.00 W  (lowest =  47.00 W, highest = 126.00 W)
> Chip 0 :                     48.00 W  (lowest =  45.00 W, highest = 129.00 W)
Seems useful.

> Chip 0 Vdd:                  13.00 W  (lowest =  10.00 W, highest =  93.00 W)
> Chip 0 Vdn:                  16.00 W  (lowest =  16.00 W, highest =  18.00 W)
> Chip 8 Vdd:                  26.00 W  (lowest =  12.00 W, highest =  90.00 W)
> Chip 8 Vdn:                  16.00 W  (lowest =  16.00 W, highest =  19.00 W)
This does make sense, but there isn't a whole lot of point to it.

> Chip 0 GPU:                   0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> Chip 8 GPU:                   0.00 W  (lowest =   0.00 W, highest =   0.00 W)
Where did this one come from?

> System:                       0.00 W  (lowest =   0.00 W, highest =   0.00 W)
This should probably not be zero.

> APSS 0 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 1 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 2 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 3 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 4 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 5 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 6 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 7 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 8 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 9 :                      0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 10 :                     0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 11 :                     0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 12 :                     0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 13 :                     0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 14 :                     0.00 W  (lowest =   0.00 W, highest =   0.00 W)
> APSS 15 :                     0.00 W  (lowest =   0.00 W, highest =   0.00 W)

Should these even be here? I don't believe pass2 romulus even has an
APSS so it shouldn't be appearing here. Really we should have more
useful names for each APSS channel.

> Chip 0 Vdd:                  +2.05 A  (lowest =  +1.36 A, highest = +11.64 A)
> Chip 0 Vdn:                  +1.85 A  (lowest =  +1.81 A, highest =  +2.05 A)
> Chip 8 Vdd:                  +3.00 A  (lowest =  +1.52 A, highest = +11.54 A)
> Chip 8 Vdn:                  +1.89 A  (lowest =  +1.82 A, highest =  +2.16 A)

Maybe we should move towards a white-listing approach rather than just
throwing in as much stuff as possible.
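
To make the idea concrete, a minimal sketch of label-based white-listing
follows. The patterns in `WHITELIST` are purely hypothetical examples picked
from the output above, not a proposed list, and `is_whitelisted` is an
illustrative helper rather than anything in skiboot or the kernel driver.

```python
from fnmatch import fnmatchcase

# Hypothetical whitelist of sensor labels (shell-style patterns).
# Anything not matched here would simply not be exported.
WHITELIST = [
    "Chip * Core *",
    "Chip * Nest",
    "System",
]


def is_whitelisted(label):
    """True if the label matches at least one whitelist pattern."""
    return any(fnmatchcase(label, pattern) for pattern in WHITELIST)


labels = [
    "Chip 0 Core 4",
    "Chip 0 Nest",
    "Chip 0 VOLTDROOPCNTC04 0",
    "APSS 3",
    "System",
]
allowed = [label for label in labels if is_whitelisted(label)]
print(allowed)
```

The point is that a new sensor stays hidden until someone decides it is
worth exposing, instead of appearing by default.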

Oliver

