[Skiboot] hwmon sensors cleanup
Oliver
oohall at gmail.com
Fri Nov 24 16:51:13 AEDT 2017
Hi all,
Currently we provide a lot of sensors via the device-tree which are
plugged into the kernel's hwmon interface via the powernv-sensor
driver. Unfortunately a lot of these sensors are... not very useful.
For example, take the output of `sensors` on a (pass 1) romulus:
> Chip 0 VOLTDROOPCNTC04 0: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC05 4: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC06 8: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC07 12: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC12 16: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC13 20: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC14 24: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC15 28: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC16 32: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC17 36: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC18 40: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC19 44: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC20 48: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC21 52: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC22 56: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTC23 60: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC02 64: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC03 68: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC04 72: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC05 76: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC06 80: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC07 84: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC08 88: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC09 92: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC14 96: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC15 100: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC18 104: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC19 108: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC20 112: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC21 116: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC22 120: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTC23 124: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTQ0: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTQ1: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTQ2: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTQ3: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTQ4: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 0 VOLTDROOPCNTQ5: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTQ0: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTQ1: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTQ2: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTQ3: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTQ4: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
> Chip 8 VOLTDROOPCNTQ5: +0.00 V (lowest = +0.00 V, highest = +0.00 V)
I don't think we should have a hwmon sensor for these. Maybe we can
use that information to add a secondary attribute to something?
> Chip 0 Vdd Remote Sense: +6.51 V (lowest = +6.45 V, highest = +8.31 V)
> Chip 0 Vdn Remote Sense: +9.01 V (lowest = +9.01 V, highest = +9.01 V)
> Chip 8 Vdd Remote Sense: +8.04 V (lowest = +6.50 V, highest = +8.07 V)
> Chip 8 Vdn Remote Sense: +9.01 V (lowest = +9.01 V, highest = +9.01 V)
Are these the point-of-load voltages?
> Chip 0 Vdd: +6.56 V (lowest = +6.56 V, highest = +8.36 V)
> Chip 0 Vdn: +9.02 V (lowest = +9.02 V, highest = +9.02 V)
> Chip 8 Vdd: +8.12 V (lowest = +6.67 V, highest = +8.12 V)
> Chip 8 Vdn: +9.02 V (lowest = +9.02 V, highest = +9.02 V)
The point-of-supply voltages? Why are these duplicated?
> Core 0: +37.0°C
> Core 4: +37.0°C
> Core 8: +36.0°C
> Core 12: +37.0°C
> Core 16: +38.0°C
> Core 20: +37.0°C
> Core 24: +37.0°C
> Core 28: +37.0°C
> Core 32: +37.0°C
> Core 36: +37.0°C
> Core 40: +36.0°C
> Core 44: +36.0°C
> Core 48: +37.0°C
> Core 52: +37.0°C
> Core 56: +35.0°C
> Core 60: +35.0°C
> Core 64: +35.0°C
> Core 68: +37.0°C
> Core 72: +37.0°C
> Core 76: +39.0°C
> Core 80: +35.0°C
> Core 84: +34.0°C
> Core 88: +36.0°C
> Core 92: +36.0°C
> Core 96: +37.0°C
> Core 100: +38.0°C
> Core 104: +36.0°C
> Core 108: +37.0°C
> Core 112: +37.0°C
> Core 116: +35.0°C
> Core 120: +36.0°C
> Core 124: +37.0°C
> Chip 0 Core 0: +36.0°C (lowest = +34.0°C, highest = +44.0°C)
> Chip 0 Core 4: +37.0°C (lowest = +34.0°C, highest = +45.0°C)
> Chip 0 Core 8: +35.0°C (lowest = +34.0°C, highest = +45.0°C)
> Chip 0 Core 12: +35.0°C (lowest = +34.0°C, highest = +44.0°C)
> Chip 0 Core 16: +37.0°C (lowest = +35.0°C, highest = +47.0°C)
> Chip 0 Core 20: +34.0°C (lowest = +33.0°C, highest = +45.0°C)
> Chip 0 Core 24: +36.0°C (lowest = +34.0°C, highest = +45.0°C)
> Chip 0 Core 28: +36.0°C (lowest = +35.0°C, highest = +45.0°C)
> Chip 0 Core 32: +37.0°C (lowest = +35.0°C, highest = +45.0°C)
> Chip 0 Core 36: +35.0°C (lowest = +33.0°C, highest = +44.0°C)
> Chip 0 Core 40: +35.0°C (lowest = +33.0°C, highest = +43.0°C)
> Chip 0 Core 44: +36.0°C (lowest = +34.0°C, highest = +44.0°C)
> Chip 0 Core 48: +36.0°C (lowest = +35.0°C, highest = +45.0°C)
> Chip 0 Core 52: +35.0°C (lowest = +34.0°C, highest = +46.0°C)
> Chip 0 Core 56: +33.0°C (lowest = +32.0°C, highest = +43.0°C)
> Chip 0 Core 60: +34.0°C (lowest = +34.0°C, highest = +44.0°C)
> Chip 8 Core 64: +35.0°C (lowest = +32.0°C, highest = +42.0°C)
> Chip 8 Core 68: +35.0°C (lowest = +33.0°C, highest = +42.0°C)
> Chip 8 Core 72: +36.0°C (lowest = +34.0°C, highest = +43.0°C)
> Chip 8 Core 76: +37.0°C (lowest = +34.0°C, highest = +44.0°C)
> Chip 8 Core 80: +35.0°C (lowest = +33.0°C, highest = +43.0°C)
> Chip 8 Core 84: +34.0°C (lowest = +32.0°C, highest = +41.0°C)
> Chip 8 Core 88: +35.0°C (lowest = +31.0°C, highest = +42.0°C)
> Chip 8 Core 92: +35.0°C (lowest = +32.0°C, highest = +41.0°C)
> Chip 8 Core 96: +36.0°C (lowest = +33.0°C, highest = +43.0°C)
> Chip 8 Core 100: +37.0°C (lowest = +34.0°C, highest = +43.0°C)
> Chip 8 Core 104: +36.0°C (lowest = +33.0°C, highest = +42.0°C)
> Chip 8 Core 108: +35.0°C (lowest = +32.0°C, highest = +42.0°C)
> Chip 8 Core 112: +34.0°C (lowest = +29.0°C, highest = +41.0°C)
> Chip 8 Core 116: +34.0°C (lowest = +29.0°C, highest = +42.0°C)
> Chip 8 Core 120: +36.0°C (lowest = +33.0°C, highest = +43.0°C)
> Chip 8 Core 124: +35.0°C (lowest = +32.0°C, highest = +42.0°C)
So we have two sets of temperature sensors. One set is from measuring
the per-core DTS directly, and the other set is from the OCC measuring
the same per-core DTS. We should probably not be doubling up here.
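To make the overlap concrete, here is a minimal Python sketch that groups sensor labels by core number and reports any core that shows up more than once. It assumes the trailing number in "Core N" and "Chip X Core N" refers to the same core, which matches the output quoted above but is not confirmed anywhere; `CORE_RE` and `duplicated_cores` are hypothetical names for illustration, not part of skiboot or lm-sensors.

```python
import re
from collections import defaultdict

# Assumption: "Core 4" and "Chip 0 Core 4" describe the same physical core.
CORE_RE = re.compile(r"(?:Chip \d+ )?Core (\d+)$")

def duplicated_cores(labels):
    """Map core number -> list of labels, for cores reported more than once."""
    by_core = defaultdict(list)
    for label in labels:
        label = label.strip()
        m = CORE_RE.match(label)
        if m:
            by_core[int(m.group(1))].append(label)
    # Keep only cores that have more than one sensor label.
    return {core: names for core, names in by_core.items() if len(names) > 1}

labels = ["Core 0", "Core 4", "Chip 0 Core 0", "Chip 0 Nest"]
print(duplicated_cores(labels))  # {0: ['Core 0', 'Chip 0 Core 0']}
```

Run over the full label list above, every core would be expected to appear twice, once per source.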
> Chip 0 DIMM 0 : +38.0°C (lowest = +37.0°C, highest = +38.0°C)
> Chip 0 DIMM 1 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 2 : +40.0°C (lowest = +39.0°C, highest = +40.0°C)
> Chip 0 DIMM 3 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 4 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 5 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 6 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 7 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 8 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 9 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 10 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 11 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 12 : +40.0°C (lowest = +40.0°C, highest = +41.0°C)
> Chip 0 DIMM 13 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 DIMM 14 : +40.0°C (lowest = +39.0°C, highest = +41.0°C)
> Chip 0 DIMM 15 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 0 : +36.0°C (lowest = +35.0°C, highest = +36.0°C)
> Chip 8 DIMM 1 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 2 : +36.0°C (lowest = +35.0°C, highest = +36.0°C)
> Chip 8 DIMM 3 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 4 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 5 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 6 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 7 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 8 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 9 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 10 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 11 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 12 : +35.0°C (lowest = +35.0°C, highest = +35.0°C)
> Chip 8 DIMM 13 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 DIMM 14 : +35.0°C (lowest = +35.0°C, highest = +36.0°C)
> Chip 8 DIMM 15 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
A lot of these are zero because the DIMM itself isn't populated.
Witherspoon and Zaius don't even have slots for 16 DIMMs on each
socket, so why even report them here?
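As a rough way of spotting these from userspace, the following Python sketch parses `sensors`-style lines like the ones quoted here and flags any sensor whose current, lowest and highest readings are all zero. The regular expression is an assumption derived from the quoted output format, the function names are hypothetical, and an all-zero reading is only a heuristic for "not populated", not proof of it.

```python
import re

# Matches lines such as:
#   "Chip 0 DIMM 1 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)"
# The pattern is an assumption based on the output quoted in this mail.
NUM = r"[+-]?\d+(?:\.\d+)?"
LINE_RE = re.compile(
    rf"^(?P<label>.+?)\s*:\s*(?P<cur>{NUM})\s*\S+\s*"
    rf"\(lowest = (?P<lo>{NUM})\s*\S+, highest = (?P<hi>{NUM})\s*\S+\)$"
)

def parse_line(line):
    """Return (label, current, lowest, highest), or None if the line has no min/max."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return (m["label"], float(m["cur"]), float(m["lo"]), float(m["hi"]))

def all_zero_sensors(lines):
    """Labels of sensors whose current, lowest and highest values are all zero."""
    flagged = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] == parsed[2] == parsed[3] == 0.0:
            flagged.append(parsed[0])
    return flagged

sample = [
    "Chip 0 DIMM 0 : +38.0°C (lowest = +37.0°C, highest = +38.0°C)",
    "Chip 0 DIMM 1 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)",
    "Core 0: +37.0°C",
]
print(all_zero_sensors(sample))  # ['Chip 0 DIMM 1']
```

Lines without a lowest/highest pair, such as the directly measured core temperatures, are skipped rather than flagged.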
> Chip 0 Nest: +35.0°C (lowest = +34.0°C, highest = +40.0°C)
> Chip 8 Nest: +37.0°C (lowest = +35.0°C, highest = +42.0°C)
Could we report this as the chip overall temperature?
> Chip 0 GPU 0 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 GPU 1 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 GPU 2 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 GPU 0 MEM: +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 GPU 1 MEM: +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 0 GPU 2 MEM: +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 GPU 0 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 GPU 1 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 GPU 2 : +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 GPU 0 MEM: +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 GPU 1 MEM: +0.0°C (lowest = +0.0°C, highest = +0.0°C)
> Chip 8 GPU 2 MEM: +0.0°C (lowest = +0.0°C, highest = +0.0°C)
I think that these should probably be Witherspoon-specific. If we have support
> Chip 0 TEMPVDD: +40.0°C (lowest = +39.0°C, highest = +45.0°C)
> Chip 8 TEMPVDD: +38.0°C (lowest = +37.0°C, highest = +42.0°C)
What is this even? Temperature of the Vdd regulator? What about Vdn?
> Chip 0 Memory: 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> Chip 8 Memory: 0.00 W (lowest = 0.00 W, highest = 0.00 W)
I'm not sure how you would even fill this out. Average DIMM temperature?
> Chip 8 : 61.00 W (lowest = 47.00 W, highest = 126.00 W)
> Chip 0 : 48.00 W (lowest = 45.00 W, highest = 129.00 W)
Seems useful.
> Chip 0 Vdd: 13.00 W (lowest = 10.00 W, highest = 93.00 W)
> Chip 0 Vdn: 16.00 W (lowest = 16.00 W, highest = 18.00 W)
> Chip 8 Vdd: 26.00 W (lowest = 12.00 W, highest = 90.00 W)
> Chip 8 Vdn: 16.00 W (lowest = 16.00 W, highest = 19.00 W)
This does make sense, but there isn't a whole lot of point to it.
> Chip 0 GPU: 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> Chip 8 GPU: 0.00 W (lowest = 0.00 W, highest = 0.00 W)
Where did this one come from?
> System: 0.00 W (lowest = 0.00 W, highest = 0.00 W)
This should probably not be zero.
> APSS 0 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 1 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 2 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 3 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 4 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 5 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 6 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 7 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 8 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 9 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 10 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 11 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 12 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 13 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 14 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
> APSS 15 : 0.00 W (lowest = 0.00 W, highest = 0.00 W)
Should these even be here? I don't believe pass2 romulus even has an
APSS, so they shouldn't be appearing here. If they do stay, we should
really have more useful names for each APSS channel.
> Chip 0 Vdd: +2.05 A (lowest = +1.36 A, highest = +11.64 A)
> Chip 0 Vdn: +1.85 A (lowest = +1.81 A, highest = +2.05 A)
> Chip 8 Vdd: +3.00 A (lowest = +1.52 A, highest = +11.54 A)
> Chip 8 Vdn: +1.89 A (lowest = +1.82 A, highest = +2.16 A)
Maybe we should move towards a white-listing approach rather than just
throwing in as much stuff as possible.
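A white-list could be as simple as a set of label patterns that a sensor must match before it is exported. The sketch below is only an illustration in Python, not skiboot code: the real filtering would presumably happen where the device-tree sensor nodes are created, and the pattern list here is a hypothetical example, not a proposal for the final set.

```python
from fnmatch import fnmatchcase

# Hypothetical white-list; the patterns are examples only.
ALLOWED_PATTERNS = [
    "Chip * Core *",
    "Chip * Nest",
    "Chip * Vdd",
    "Chip * Vdn",
    "System",
]

def is_whitelisted(label, patterns=ALLOWED_PATTERNS):
    """True if the sensor label matches at least one allowed pattern."""
    # fnmatchcase matches the whole string, case-sensitively.
    return any(fnmatchcase(label, p) for p in patterns)

labels = [
    "Chip 0 Core 4",
    "Chip 0 VOLTDROOPCNTC04 0",
    "Chip 0 Vdd Remote Sense",
    "Chip 8 Nest",
    "APSS 3",
]
print([l for l in labels if is_whitelisted(l)])  # ['Chip 0 Core 4', 'Chip 8 Nest']
```

The point of matching whole labels is that anything new has to be added deliberately instead of showing up by default.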
Oliver