[Help] How to debug 'specific sensors sometimes get no reading after AC power cycling the system' issue?

Jacky Lee (TPI) jacky.lee at flex.com
Wed Dec 4 15:44:03 AEDT 2024


Hi sir,

We have an Intel Birch Stream platform, and our BMC FW developers are porting OpenBMC to it with a DC-SCM module. The BMC chip is an ASPEED AST2600 and the RoT is an ASPEED AST1060.

We have an issue where specific sensors sometimes return no reading after an AC power cycle of the system. The failure rate is about 12%. Below is an example log:

CPU1_PVCCA_EHV | 00h | ok | 0.1 | 0.39 Amps
CPU1_PVCCA_EHV | 01h | ok | 0.1 | 2 Amps
CPU1_PVCCD0 | 02h | ok | 0.1 | 0.16 Amps
CPU1_PVCCD0 | 03h | ok | 0.1 | 2 Amps
CPU1_PVCCFA_EHV_ | 04h | ok | 0.1 | 3.90 Amps
CPU1_PVCCFA_EHV_ | 05h | ok | 0.1 | 29 Amps
CPU1_PVCCINF | 06h | ok | 0.1 | 1.25 Amps
CPU1_PVCCINF | 07h | ok | 0.1 | 17 Amps
CPU1_PVNN | 08h | ok | 0.1 | 0.08 Amps
CPU1_PVNN | 09h | ok | 0.1 | 1 Amps
CPU1_VCCIN | 0Ah | ok | 0.1 | 255 Amps
CPU2_PVCCA_EHV | 0Bh | ok | 0.1 | 0.39 Amps
CPU2_PVCCA_EHV | 0Ch | ok | 0.1 | 2 Amps
CPU2_PVCCD0 | 0Dh | ok | 0.1 | 0.31 Amps
CPU2_PVCCD0 | 0Eh | ok | 0.1 | 3 Amps
CPU2_PVCCFA_EHV_ | 0Fh | ok | 0.1 | 4.76 Amps
CPU2_PVCCFA_EHV_ | 10h | ok | 0.1 | 26 Amps
CPU2_PVCCINF | 11h | ok | 0.1 | 1.09 Amps
CPU2_PVCCINF | 12h | ok | 0.1 | 15 Amps
CPU2_PVNN | 13h | ok | 0.1 | 0.08 Amps
CPU2_PVNN | 14h | ok | 0.1 | 1 Amps
CPU2_VCCIN | 15h | ok | 0.1 | 255 Amps
FAN0_INLET_PWM | 16h | ok | 0.1 | 29.79 unspecifi
FAN0_OUTLET_PWM | 17h | ok | 0.1 | 29.79 unspecifi
FAN1_INLET_PWM | 18h | ok | 0.1 | 29.79 unspecifi
FAN1_OUTLET_PWM | 19h | ok | 0.1 | 29.79 unspecifi
FAN2_INLET_PWM | 1Ah | ok | 0.1 | 29.79 unspecifi
FAN2_OUTLET_PWM | 1Bh | ok | 0.1 | 29.79 unspecifi
FAN3_INLET_PWM | 1Ch | ok | 0.1 | 29.79 unspecifi
FAN3_OUTLET_PWM | 1Dh | ok | 0.1 | 29.79 unspecifi
FAN0_INLET_TACH | 1Eh | ok | 0.1 | 5292 RPM
FAN0_OUTLET_TACH | 1Fh | ok | 0.1 | 4508 RPM
FAN1_INLET_TACH | 20h | ok | 0.1 | 5390 RPM
FAN1_OUTLET_TACH | 21h | ok | 0.1 | 4508 RPM
FAN2_INLET_TACH | 22h | ok | 0.1 | 5390 RPM
FAN2_OUTLET_TACH | 23h | ok | 0.1 | 4606 RPM
FAN3_INLET_TACH | 24h | ok | 0.1 | 5390 RPM
FAN3_OUTLET_TACH | 25h | ok | 0.1 | 4606 RPM
CPU1_PVCCA_EHV | 26h | ok | 0.1 | 0 Watts
CPU1_PVCCA_EHV | 27h | ok | 0.1 | 0 Watts
CPU1_PVCCFA_EHV_ | 28h | ok | 0.1 | 59 Watts
CPU1_PVCCFA_EHV_ | 29h | ok | 0.1 | 47.20 Watts
CPU1_PVCCINF | 2Ah | ok | 0.1 | 11.80 Watts
CPU1_PVCCINF | 2Bh | ok | 0.1 | 11.80 Watts
CPU1_VCCIN | 2Ch | ok | 0.1 | 82.60 Watts
CPU1_VCCIN | 2Dh | ok | 0.1 | 70.80 Watts
CPU2_PVCCA_EHV | 2Eh | ok | 0.1 | 0 Watts
CPU2_PVCCA_EHV | 2Fh | ok | 0.1 | 0 Watts
CPU2_PVCCFA_EHV_ | 30h | ok | 0.1 | 47.20 Watts
CPU2_PVCCFA_EHV_ | 31h | ok | 0.1 | 47.20 Watts
CPU2_PVCCINF | 32h | ok | 0.1 | 11.80 Watts
CPU2_PVCCINF | 33h | ok | 0.1 | 11.80 Watts
CPU2_VCCIN | 34h | ok | 0.1 | 82.60 Watts
CPU2_VCCIN | 35h | ok | 0.1 | 70.80 Watts
Cpu_Power_Averag | 36h | ok | 0.1 | 124 Watts
Cpu_Power_Averag | 37h | ns | 0.1 | No Reading
Cpu_Power_Cap_CP | 38h | ok | 0.1 | 0 Watts
Cpu_Power_Cap_CP | 39h | ns | 0.1 | No Reading
Dimm_Power_Avera | 3Ah | ok | 0.1 | 300 Watts
Dimm_Power_Avera | 3Bh | ns | 0.1 | No Reading
Dimm_Power_Cap_C | 3Ch | ok | 0.1 | 0 Watts
CPU1_PVCCA_Contr | 3Eh | ok | 0.1 | 34 degrees C
CPU1_PVCCA_EHV | 3Fh | ok | 0.1 | 34 degrees C
CPU1_PVCCD0 | 40h | ok | 0.1 | 42 degrees C
CPU1_PVCCFA_Cont | 41h | ok | 0.1 | 43 degrees C
CPU1_PVCCFA_EHV_ | 42h | ok | 0.1 | 44 degrees C
CPU1_VCCIN | 43h | ok | 0.1 | 49 degrees C
CPU2_PVCCA_Contr | 44h | ok | 0.1 | 33 degrees C
CPU2_PVCCA_EHV | 45h | ok | 0.1 | 33 degrees C
CPU2_PVCCD0 | 46h | ok | 0.1 | 42 degrees C
CPU2_PVCCFA_Cont | 47h | ok | 0.1 | 42 degrees C
CPU2_PVCCFA_EHV_ | 48h | ok | 0.1 | 44 degrees C
CPU2_VCCIN | 49h | ok | 0.1 | 51 degrees C
DIMM_A1_CPU1 | 4Ah | ns | 0.1 | No Reading
DIMM_A1_CPU2 | 4Bh | ok | 0.1 | 36 degrees C
DIMM_A2_CPU1 | 4Ch | ns | 0.1 | No Reading
DIMM_A2_CPU2 | 4Dh | ok | 0.1 | 36 degrees C
DIMM_B1_CPU1 | 4Eh | ok | 0.1 | 36 degrees C
DIMM_B1_CPU2 | 4Fh | ok | 0.1 | 36 degrees C
DIMM_B2_CPU1 | 50h | ok | 0.1 | 36 degrees C
DIMM_B2_CPU2 | 51h | ok | 0.1 | 36 degrees C
DIMM_C1_CPU1 | 52h | ok | 0.1 | 35 degrees C
DIMM_C1_CPU2 | 53h | ok | 0.1 | 36 degrees C
DIMM_C2_CPU1 | 54h | ok | 0.1 | 35 degrees C
DIMM_C2_CPU2 | 55h | ok | 0.1 | 36 degrees C
DIMM_D1_CPU1 | 56h | ok | 0.1 | 34 degrees C
DIMM_D1_CPU2 | 57h | ok | 0.1 | 36 degrees C
DIMM_D2_CPU1 | 58h | ok | 0.1 | 34 degrees C
DIMM_D2_CPU2 | 59h | ok | 0.1 | 36 degrees C
DIMM_E1_CPU1 | 5Ah | ok | 0.1 | 34 degrees C
DIMM_E1_CPU2 | 5Bh | ok | 0.1 | 35 degrees C
DIMM_E2_CPU1 | 5Ch | ok | 0.1 | 34 degrees C
DIMM_E2_CPU2 | 5Dh | ok | 0.1 | 35 degrees C
DIMM_F1_CPU1 | 5Eh | ok | 0.1 | 32 degrees C
DIMM_F1_CPU2 | 5Fh | ok | 0.1 | 34 degrees C
DIMM_F2_CPU1 | 60h | ok | 0.1 | 32 degrees C
DIMM_F2_CPU2 | 61h | ok | 0.1 | 34 degrees C
DIMM_G1_CPU1 | 62h | ok | 0.1 | 37 degrees C
DIMM_G1_CPU2 | 63h | ok | 0.1 | 35 degrees C
DIMM_G2_CPU1 | 64h | ok | 0.1 | 37 degrees C
DIMM_G2_CPU2 | 65h | ok | 0.1 | 35 degrees C
DIMM_H1_CPU1 | 66h | ok | 0.1 | 37 degrees C
DIMM_H1_CPU2 | 67h | ok | 0.1 | 35 degrees C
DIMM_H2_CPU1 | 68h | ok | 0.1 | 37 degrees C
DIMM_H2_CPU2 | 69h | ok | 0.1 | 35 degrees C
DIMM_I1_CPU1 | 6Ah | ok | 0.1 | 36 degrees C
DIMM_I1_CPU2 | 6Bh | ok | 0.1 | 35 degrees C
DIMM_I2_CPU1 | 6Ch | ok | 0.1 | 36 degrees C
DIMM_I2_CPU2 | 6Dh | ok | 0.1 | 35 degrees C
DIMM_J1_CPU1 | 6Eh | ok | 0.1 | 35 degrees C
DIMM_J1_CPU2 | 6Fh | ok | 0.1 | 35 degrees C
DIMM_J2_CPU1 | 70h | ok | 0.1 | 35 degrees C
DIMM_J2_CPU2 | 71h | ok | 0.1 | 35 degrees C
DIMM_K1_CPU1 | 72h | ok | 0.1 | 35 degrees C
DIMM_K1_CPU2 | 73h | ok | 0.1 | 34 degrees C
DIMM_K2_CPU1 | 74h | ok | 0.1 | 35 degrees C
DIMM_K2_CPU2 | 75h | ok | 0.1 | 34 degrees C
DIMM_L1_CPU1 | 76h | ok | 0.1 | 35 degrees C
DIMM_L1_CPU2 | 77h | ok | 0.1 | 34 degrees C
DIMM_L2_CPU1 | 78h | ok | 0.1 | 35 degrees C
DIMM_L2_CPU2 | 79h | ok | 0.1 | 34 degrees C
DTS_CPU1 | 7Ah | ok | 0.1 | 57 degrees C
DTS_CPU2 | 7Bh | ns | 0.1 | No Reading
Die_CPU1 | 7Ch | ok | 0.1 | 57 degrees C
Die_CPU2 | 7Dh | ns | 0.1 | No Reading
T_DBB_U44 | 7Eh | ok | 0.1 | 28 degrees C
T_DCSCMB_U91 | 7Fh | ok | 0.1 | 30 degrees C
T_FIOB_U1 | 80h | ok | 0.1 | 30 degrees C
T_MB_U30 | 81h | ok | 0.1 | 40 degrees C
T_MB_U31 | 82h | ok | 0.1 | 39 degrees C
T_MB_U32 | 83h | ok | 0.1 | 29 degrees C
T_MB_U33 | 84h | ok | 0.1 | 29 degrees C
T_NVME_E3S_1 | 85h | ok | 0.1 | 26.89 degrees C
T_NVME_E3S_2 | 86h | ok | 0.1 | 26.89 degrees C
T_NVME_E3S_3 | 87h | ok | 0.1 | 26.89 degrees C
T_NVME_E3S_4 | 88h | ok | 0.1 | 26.89 degrees C
T_NVME_E3S_5 | 89h | ok | 0.1 | 26.89 degrees C
T_NVME_E3S_6 | 8Ah | ok | 0.1 | 26.89 degrees C
T_NVME_E3S_7 | 8Bh | ok | 0.1 | 27.89 degrees C
T_NVME_E3S_8 | 8Ch | ok | 0.1 | 27.89 degrees C
T_NVME_M2_0 | 8Dh | ok | 0.1 | 44.82 degrees C
T_NVME_M2_1 | 8Eh | ok | 0.1 | 45.82 degrees C
T_PDB_U10 | 8Fh | ok | 0.1 | 41 degrees C
T_PDB_U11 | 90h | ok | 0.1 | 41 degrees C
CPU1_PVCCA_EHV | 91h | ok | 0.1 | 11.80 Volts
CPU1_PVCCA_EHV | 92h | ok | 0.1 | 2 Volts
CPU1_PVCCD0 | 93h | ok | 0.1 | 1 Volts
CPU1_PVCCD1 | 94h | ok | 0.1 | 1 Volts
CPU1_PVCCD | 95h | ok | 0.1 | 11.80 Volts
CPU1_PVCCFA_EHV_ | 96h | ok | 0.1 | 11.80 Volts
CPU1_PVCCFA_EHV_ | 97h | ok | 0.1 | 2 Volts
CPU1_PVCCINF | 98h | ok | 0.1 | 1 Volts
CPU1_PVNN | 99h | ok | 0.1 | 1 Volts
CPU1_VCCIN | 9Ah | ok | 0.1 | 2 Volts
CPU2_PVCCA_EHV | 9Bh | ok | 0.1 | 11.80 Volts
CPU2_PVCCA_EHV | 9Ch | ok | 0.1 | 2 Volts
CPU2_PVCCD0 | 9Dh | ok | 0.1 | 1 Volts
CPU2_PVCCD1 | 9Eh | ok | 0.1 | 1 Volts
CPU2_PVCCD | 9Fh | ok | 0.1 | 11.80 Volts
CPU2_PVCCFA_EHV_ | A0h | ok | 0.1 | 11.80 Volts
CPU2_PVCCFA_EHV_ | A1h | ok | 0.1 | 2 Volts
CPU2_PVCCINF | A2h | ok | 0.1 | 1 Volts
CPU2_PVNN | A3h | ok | 0.1 | 1 Volts
CPU2_VCCIN | A4h | ok | 0.1 | 2 Volts
V_DCSCMB_P1V05_U | A5h | ok | 0.1 | 1.05 Volts
V_DCSCMB_P1V0 | A6h | ok | 0.1 | 1.00 Volts
V_DCSCMB_P3V3_RG | A7h | ok | 0.1 | 3.29 Volts
V_DCSCMB_P3V3_ST | A8h | ok | 0.1 | 3.29 Volts
V_DCSCMB_P12V_AU | A9h | ok | 0.1 | 12.20 Volts
V_HPM_P1V0_AUX | AAh | ok | 0.1 | 0.99 Volts
V_HPM_P1V1_AUX | ABh | ok | 0.1 | 1.09 Volts
V_HPM_P1V2_MAX10 | ACh | ok | 0.1 | 1.20 Volts
V_HPM_P1V8_AUX | ADh | ok | 0.1 | 1.78 Volts
V_HPM_P2V5_MAX10 | AEh | ok | 0.1 | 2.47 Volts
V_HPM_P3V3 | AFh | ok | 0.1 | 3.27 Volts
V_HPM_P3V3_AUX | B0h | ok | 0.1 | 3.27 Volts
V_HPM_P5V_AUX | B1h | ok | 0.1 | 2.79 Volts
V_HPM_P12V | B2h | ok | 0.1 | 12.18 Volts
V_HPM_P12V_AUX | B3h | ok | 0.1 | 12.18 Volts
V_HPM_P12V_STBY | B4h | ok | 0.1 | 11.92 Volts
V_HPM_PVCC3V3_AU | B5h | ok | 0.1 | 3.27 Volts

Our EE believes this is not a HW issue and has asked our BMC FW developers to debug it. We have also tried swapping the CPU1/CPU2 locations and the DIMM modules, but the issue stays with the slot, not with the CPU or DIMM itself. Also, once the issue occurs, it persists until the system is AC power cycled again.

Because this issue only occurs after an AC power cycle and cannot be reproduced with DC power cycling, during which the BMC does not reboot its firmware OS, we suspect it may be caused by a BMC firmware issue. However, we don't know how to debug it from the BMC firmware side, or even from the console log. We would appreciate any directions on how to debug it. Thank you.

BTW, the OS running on the system is Rocky Linux 9.4, and the sensor list was captured from the OS through ipmitool during the test.
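
To make it easier to compare captures across AC cycles, here is a minimal sketch of how the failing sensors can be extracted from a capture like the one above. It assumes each line has five `|`-separated fields (name, sensor number, status, a fourth field that is assumed to be the entity ID, and the reading), as in the log in this mail. The function names `parse_sdr_elist` and `no_reading_sensors` are only illustrative and are not part of ipmitool or OpenBMC.

```python
def parse_sdr_elist(text):
    """Parse sensor-list lines into (name, sensor_id, status, reading) tuples.

    Assumes the five-column, '|'-separated layout shown in the log above.
    """
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        # The fourth column is assumed to be the entity ID and is not used here.
        name, sensor_id, status, _entity, reading = fields
        rows.append((name, sensor_id, status, reading))
    return rows


def no_reading_sensors(text):
    """Return (name, sensor_id) for every sensor reported as 'ns' or 'No Reading'."""
    return [
        (name, sensor_id)
        for name, sensor_id, status, reading in parse_sdr_elist(text)
        if status == "ns" or reading == "No Reading"
    ]


if __name__ == "__main__":
    # Three lines copied from the example log in this mail.
    sample = """\
DIMM_A1_CPU1 | 4Ah | ns | 0.1 | No Reading
DIMM_A1_CPU2 | 4Bh | ok | 0.1 | 36 degrees C
DTS_CPU2 | 7Bh | ns | 0.1 | No Reading
"""
    for name, sensor_id in no_reading_sensors(sample):
        print(f"{sensor_id} {name}")
```

Running this on the capture from each AC cycle and comparing the resulting lists would show whether the same sensors fail every time or whether the set changes from cycle to cycle.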



Best regards,
Jacky Lee
2F, No.6, Sec.1, Jhongsing Rd., Wugu
Township, New Taipei 248, Taiwan (R.O.C.)
Tel(TW): 886-2-89771415
Fax(TW): 886-2-89769773
E-mail: Jacky.Lee at flex.com
