[Help] How to debug 'sometimes the specific sensors get no reading after AC cycle the system' issue?

Bills, Jason M jason.m.bills at linux.intel.com
Tue Dec 10 04:08:37 AEDT 2024



On 12/3/2024 9:44 PM, Jacky Lee (TPI) wrote:
> Hi sir,
> 
> We have an Intel Birch Stream platform, and our BMC FW developers are 
> bringing up OpenBMC on it with a DC-SCM module. The BMC chip is an 
> ASPEED AST2600 and the RoT is an ASPEED AST1060.
> 
> We have an issue where specific sensors sometimes report no reading 
> after an AC power cycle of the system. The failure rate is about 12%. 
> Below is an example log (the failing sensors are marked with asterisks):
> 
> CPU1_PVCCA_EHV | 00h | ok | 0.1 | 0.39 Amps
> CPU1_PVCCA_EHV | 01h | ok | 0.1 | 2 Amps
> CPU1_PVCCD0 | 02h | ok | 0.1 | 0.16 Amps
> CPU1_PVCCD0 | 03h | ok | 0.1 | 2 Amps
> CPU1_PVCCFA_EHV_ | 04h | ok | 0.1 | 3.90 Amps
> CPU1_PVCCFA_EHV_ | 05h | ok | 0.1 | 29 Amps
> CPU1_PVCCINF | 06h | ok | 0.1 | 1.25 Amps
> CPU1_PVCCINF | 07h | ok | 0.1 | 17 Amps
> CPU1_PVNN | 08h | ok | 0.1 | 0.08 Amps
> CPU1_PVNN | 09h | ok | 0.1 | 1 Amps
> CPU1_VCCIN | 0Ah | ok | 0.1 | 255 Amps
> CPU2_PVCCA_EHV | 0Bh | ok | 0.1 | 0.39 Amps
> CPU2_PVCCA_EHV | 0Ch | ok | 0.1 | 2 Amps
> CPU2_PVCCD0 | 0Dh | ok | 0.1 | 0.31 Amps
> CPU2_PVCCD0 | 0Eh | ok | 0.1 | 3 Amps
> CPU2_PVCCFA_EHV_ | 0Fh | ok | 0.1 | 4.76 Amps
> CPU2_PVCCFA_EHV_ | 10h | ok | 0.1 | 26 Amps
> CPU2_PVCCINF | 11h | ok | 0.1 | 1.09 Amps
> CPU2_PVCCINF | 12h | ok | 0.1 | 15 Amps
> CPU2_PVNN | 13h | ok | 0.1 | 0.08 Amps
> CPU2_PVNN | 14h | ok | 0.1 | 1 Amps
> CPU2_VCCIN | 15h | ok | 0.1 | 255 Amps
> FAN0_INLET_PWM | 16h | ok | 0.1 | 29.79 unspecifi
> FAN0_OUTLET_PWM | 17h | ok | 0.1 | 29.79 unspecifi
> FAN1_INLET_PWM | 18h | ok | 0.1 | 29.79 unspecifi
> FAN1_OUTLET_PWM | 19h | ok | 0.1 | 29.79 unspecifi
> FAN2_INLET_PWM | 1Ah | ok | 0.1 | 29.79 unspecifi
> FAN2_OUTLET_PWM | 1Bh | ok | 0.1 | 29.79 unspecifi
> FAN3_INLET_PWM | 1Ch | ok | 0.1 | 29.79 unspecifi
> FAN3_OUTLET_PWM | 1Dh | ok | 0.1 | 29.79 unspecifi
> FAN0_INLET_TACH | 1Eh | ok | 0.1 | 5292 RPM
> FAN0_OUTLET_TACH | 1Fh | ok | 0.1 | 4508 RPM
> FAN1_INLET_TACH | 20h | ok | 0.1 | 5390 RPM
> FAN1_OUTLET_TACH | 21h | ok | 0.1 | 4508 RPM
> FAN2_INLET_TACH | 22h | ok | 0.1 | 5390 RPM
> FAN2_OUTLET_TACH | 23h | ok | 0.1 | 4606 RPM
> FAN3_INLET_TACH | 24h | ok | 0.1 | 5390 RPM
> FAN3_OUTLET_TACH | 25h | ok | 0.1 | 4606 RPM
> CPU1_PVCCA_EHV | 26h | ok | 0.1 | 0 Watts
> CPU1_PVCCA_EHV | 27h | ok | 0.1 | 0 Watts
> CPU1_PVCCFA_EHV_ | 28h | ok | 0.1 | 59 Watts
> CPU1_PVCCFA_EHV_ | 29h | ok | 0.1 | 47.20 Watts
> CPU1_PVCCINF | 2Ah | ok | 0.1 | 11.80 Watts
> CPU1_PVCCINF | 2Bh | ok | 0.1 | 11.80 Watts
> CPU1_VCCIN | 2Ch | ok | 0.1 | 82.60 Watts
> CPU1_VCCIN | 2Dh | ok | 0.1 | 70.80 Watts
> CPU2_PVCCA_EHV | 2Eh | ok | 0.1 | 0 Watts
> CPU2_PVCCA_EHV | 2Fh | ok | 0.1 | 0 Watts
> CPU2_PVCCFA_EHV_ | 30h | ok | 0.1 | 47.20 Watts
> CPU2_PVCCFA_EHV_ | 31h | ok | 0.1 | 47.20 Watts
> CPU2_PVCCINF | 32h | ok | 0.1 | 11.80 Watts
> CPU2_PVCCINF | 33h | ok | 0.1 | 11.80 Watts
> CPU2_VCCIN | 34h | ok | 0.1 | 82.60 Watts
> CPU2_VCCIN | 35h | ok | 0.1 | 70.80 Watts
> Cpu_Power_Averag | 36h | ok | 0.1 | 124 Watts
> *Cpu_Power_Averag | 37h | ns | 0.1 | No Reading*
> Cpu_Power_Cap_CP | 38h | ok | 0.1 | 0 Watts
> *Cpu_Power_Cap_CP | 39h | ns | 0.1 | No Reading*
> Dimm_Power_Avera | 3Ah | ok | 0.1 | 300 Watts
> *Dimm_Power_Avera | 3Bh | ns | 0.1 | No Reading*
> Dimm_Power_Cap_C | 3Ch | ok | 0.1 | 0 Watts
> CPU1_PVCCA_Contr | 3Eh | ok | 0.1 | 34 degrees C
> CPU1_PVCCA_EHV | 3Fh | ok | 0.1 | 34 degrees C
> CPU1_PVCCD0 | 40h | ok | 0.1 | 42 degrees C
> CPU1_PVCCFA_Cont | 41h | ok | 0.1 | 43 degrees C
> CPU1_PVCCFA_EHV_ | 42h | ok | 0.1 | 44 degrees C
> CPU1_VCCIN | 43h | ok | 0.1 | 49 degrees C
> CPU2_PVCCA_Contr | 44h | ok | 0.1 | 33 degrees C
> CPU2_PVCCA_EHV | 45h | ok | 0.1 | 33 degrees C
> CPU2_PVCCD0 | 46h | ok | 0.1 | 42 degrees C
> CPU2_PVCCFA_Cont | 47h | ok | 0.1 | 42 degrees C
> CPU2_PVCCFA_EHV_ | 48h | ok | 0.1 | 44 degrees C
> CPU2_VCCIN | 49h | ok | 0.1 | 51 degrees C
> *DIMM_A1_CPU1 | 4Ah | ns | 0.1 | No Reading*
> DIMM_A1_CPU2 | 4Bh | ok | 0.1 | 36 degrees C
> *DIMM_A2_CPU1 | 4Ch | ns | 0.1 | No Reading*
> DIMM_A2_CPU2 | 4Dh | ok | 0.1 | 36 degrees C
> DIMM_B1_CPU1 | 4Eh | ok | 0.1 | 36 degrees C
> DIMM_B1_CPU2 | 4Fh | ok | 0.1 | 36 degrees C
> DIMM_B2_CPU1 | 50h | ok | 0.1 | 36 degrees C
> DIMM_B2_CPU2 | 51h | ok | 0.1 | 36 degrees C
> DIMM_C1_CPU1 | 52h | ok | 0.1 | 35 degrees C
> DIMM_C1_CPU2 | 53h | ok | 0.1 | 36 degrees C
> DIMM_C2_CPU1 | 54h | ok | 0.1 | 35 degrees C
> DIMM_C2_CPU2 | 55h | ok | 0.1 | 36 degrees C
> DIMM_D1_CPU1 | 56h | ok | 0.1 | 34 degrees C
> DIMM_D1_CPU2 | 57h | ok | 0.1 | 36 degrees C
> DIMM_D2_CPU1 | 58h | ok | 0.1 | 34 degrees C
> DIMM_D2_CPU2 | 59h | ok | 0.1 | 36 degrees C
> DIMM_E1_CPU1 | 5Ah | ok | 0.1 | 34 degrees C
> DIMM_E1_CPU2 | 5Bh | ok | 0.1 | 35 degrees C
> DIMM_E2_CPU1 | 5Ch | ok | 0.1 | 34 degrees C
> DIMM_E2_CPU2 | 5Dh | ok | 0.1 | 35 degrees C
> DIMM_F1_CPU1 | 5Eh | ok | 0.1 | 32 degrees C
> DIMM_F1_CPU2 | 5Fh | ok | 0.1 | 34 degrees C
> DIMM_F2_CPU1 | 60h | ok | 0.1 | 32 degrees C
> DIMM_F2_CPU2 | 61h | ok | 0.1 | 34 degrees C
> DIMM_G1_CPU1 | 62h | ok | 0.1 | 37 degrees C
> DIMM_G1_CPU2 | 63h | ok | 0.1 | 35 degrees C
> DIMM_G2_CPU1 | 64h | ok | 0.1 | 37 degrees C
> DIMM_G2_CPU2 | 65h | ok | 0.1 | 35 degrees C
> DIMM_H1_CPU1 | 66h | ok | 0.1 | 37 degrees C
> DIMM_H1_CPU2 | 67h | ok | 0.1 | 35 degrees C
> DIMM_H2_CPU1 | 68h | ok | 0.1 | 37 degrees C
> DIMM_H2_CPU2 | 69h | ok | 0.1 | 35 degrees C
> DIMM_I1_CPU1 | 6Ah | ok | 0.1 | 36 degrees C
> DIMM_I1_CPU2 | 6Bh | ok | 0.1 | 35 degrees C
> DIMM_I2_CPU1 | 6Ch | ok | 0.1 | 36 degrees C
> DIMM_I2_CPU2 | 6Dh | ok | 0.1 | 35 degrees C
> DIMM_J1_CPU1 | 6Eh | ok | 0.1 | 35 degrees C
> DIMM_J1_CPU2 | 6Fh | ok | 0.1 | 35 degrees C
> DIMM_J2_CPU1 | 70h | ok | 0.1 | 35 degrees C
> DIMM_J2_CPU2 | 71h | ok | 0.1 | 35 degrees C
> DIMM_K1_CPU1 | 72h | ok | 0.1 | 35 degrees C
> DIMM_K1_CPU2 | 73h | ok | 0.1 | 34 degrees C
> DIMM_K2_CPU1 | 74h | ok | 0.1 | 35 degrees C
> DIMM_K2_CPU2 | 75h | ok | 0.1 | 34 degrees C
> DIMM_L1_CPU1 | 76h | ok | 0.1 | 35 degrees C
> DIMM_L1_CPU2 | 77h | ok | 0.1 | 34 degrees C
> DIMM_L2_CPU1 | 78h | ok | 0.1 | 35 degrees C
> DIMM_L2_CPU2 | 79h | ok | 0.1 | 34 degrees C
> DTS_CPU1 | 7Ah | ok | 0.1 | 57 degrees C
> *DTS_CPU2 | 7Bh | ns | 0.1 | No Reading*
> Die_CPU1 | 7Ch | ok | 0.1 | 57 degrees C
> *Die_CPU2 | 7Dh | ns | 0.1 | No Reading*
> T_DBB_U44 | 7Eh | ok | 0.1 | 28 degrees C
> T_DCSCMB_U91 | 7Fh | ok | 0.1 | 30 degrees C
> T_FIOB_U1 | 80h | ok | 0.1 | 30 degrees C
> T_MB_U30 | 81h | ok | 0.1 | 40 degrees C
> T_MB_U31 | 82h | ok | 0.1 | 39 degrees C
> T_MB_U32 | 83h | ok | 0.1 | 29 degrees C
> T_MB_U33 | 84h | ok | 0.1 | 29 degrees C
> T_NVME_E3S_1 | 85h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_2 | 86h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_3 | 87h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_4 | 88h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_5 | 89h | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_6 | 8Ah | ok | 0.1 | 26.89 degrees C
> T_NVME_E3S_7 | 8Bh | ok | 0.1 | 27.89 degrees C
> T_NVME_E3S_8 | 8Ch | ok | 0.1 | 27.89 degrees C
> T_NVME_M2_0 | 8Dh | ok | 0.1 | 44.82 degrees C
> T_NVME_M2_1 | 8Eh | ok | 0.1 | 45.82 degrees C
> T_PDB_U10 | 8Fh | ok | 0.1 | 41 degrees C
> T_PDB_U11 | 90h | ok | 0.1 | 41 degrees C
> CPU1_PVCCA_EHV | 91h | ok | 0.1 | 11.80 Volts
> CPU1_PVCCA_EHV | 92h | ok | 0.1 | 2 Volts
> CPU1_PVCCD0 | 93h | ok | 0.1 | 1 Volts
> CPU1_PVCCD1 | 94h | ok | 0.1 | 1 Volts
> CPU1_PVCCD | 95h | ok | 0.1 | 11.80 Volts
> CPU1_PVCCFA_EHV_ | 96h | ok | 0.1 | 11.80 Volts
> CPU1_PVCCFA_EHV_ | 97h | ok | 0.1 | 2 Volts
> CPU1_PVCCINF | 98h | ok | 0.1 | 1 Volts
> CPU1_PVNN | 99h | ok | 0.1 | 1 Volts
> CPU1_VCCIN | 9Ah | ok | 0.1 | 2 Volts
> CPU2_PVCCA_EHV | 9Bh | ok | 0.1 | 11.80 Volts
> CPU2_PVCCA_EHV | 9Ch | ok | 0.1 | 2 Volts
> CPU2_PVCCD0 | 9Dh | ok | 0.1 | 1 Volts
> CPU2_PVCCD1 | 9Eh | ok | 0.1 | 1 Volts
> CPU2_PVCCD | 9Fh | ok | 0.1 | 11.80 Volts
> CPU2_PVCCFA_EHV_ | A0h | ok | 0.1 | 11.80 Volts
> CPU2_PVCCFA_EHV_ | A1h | ok | 0.1 | 2 Volts
> CPU2_PVCCINF | A2h | ok | 0.1 | 1 Volts
> CPU2_PVNN | A3h | ok | 0.1 | 1 Volts
> CPU2_VCCIN | A4h | ok | 0.1 | 2 Volts
> V_DCSCMB_P1V05_U | A5h | ok | 0.1 | 1.05 Volts
> V_DCSCMB_P1V0 | A6h | ok | 0.1 | 1.00 Volts
> V_DCSCMB_P3V3_RG | A7h | ok | 0.1 | 3.29 Volts
> V_DCSCMB_P3V3_ST | A8h | ok | 0.1 | 3.29 Volts
> V_DCSCMB_P12V_AU | A9h | ok | 0.1 | 12.20 Volts
> V_HPM_P1V0_AUX | AAh | ok | 0.1 | 0.99 Volts
> V_HPM_P1V1_AUX | ABh | ok | 0.1 | 1.09 Volts
> V_HPM_P1V2_MAX10 | ACh | ok | 0.1 | 1.20 Volts
> V_HPM_P1V8_AUX | ADh | ok | 0.1 | 1.78 Volts
> V_HPM_P2V5_MAX10 | AEh | ok | 0.1 | 2.47 Volts
> V_HPM_P3V3 | AFh | ok | 0.1 | 3.27 Volts
> V_HPM_P3V3_AUX | B0h | ok | 0.1 | 3.27 Volts
> V_HPM_P5V_AUX | B1h | ok | 0.1 | 2.79 Volts
> V_HPM_P12V | B2h | ok | 0.1 | 12.18 Volts
> V_HPM_P12V_AUX | B3h | ok | 0.1 | 12.18 Volts
> V_HPM_P12V_STBY | B4h | ok | 0.1 | 11.92 Volts
> V_HPM_PVCC3V3_AU | B5h | ok | 0.1 | 3.27 Volts
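
Since the listing above is long, a small script can pull out only the rows with status `ns`, which makes captures from different AC cycles easier to compare. The following is a minimal sketch, assuming the capture keeps the `name | number | status | entity | reading` layout shown above (it looks like `ipmitool sdr elist` output, but the post does not name the exact subcommand). The function names `parse_rows` and `no_reading` are hypothetical, and the sample rows are copied from the log above.

```python
import re

# One row of the listing: name | sensor number | status | entity | reading
ROW = re.compile(
    r"^(?P<name>[^|]+?)\s*\|\s*(?P<num>[0-9A-Fa-f]{2})h\s*\|\s*(?P<status>\w+)\s*\|"
    r"\s*(?P<entity>[^|]+?)\s*\|\s*(?P<reading>.+?)\s*$"
)


def parse_rows(text):
    """Parse a sensor listing into a list of dicts, skipping unrelated lines."""
    rows = []
    for line in text.splitlines():
        # Drop mail quoting ("> ") and the "*...*" emphasis markers, if present.
        line = line.strip().lstrip(">").strip().strip("*").strip()
        match = ROW.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


def no_reading(rows):
    """Return only the rows whose status column is 'ns'."""
    return [row for row in rows if row["status"] == "ns"]


# Sample rows copied from the log above.
SAMPLE = """\
> Die_CPU1 | 7Ch | ok | 0.1 | 57 degrees C
> *Die_CPU2 | 7Dh | ns | 0.1 | No Reading*
> DIMM_A1_CPU2 | 4Bh | ok | 0.1 | 36 degrees C
> *DIMM_A1_CPU1 | 4Ah | ns | 0.1 | No Reading*
"""

if __name__ == "__main__":
    for row in no_reading(parse_rows(SAMPLE)):
        print(f'{row["num"]}h  {row["name"]}')
```

Running the same filter on a capture taken after every AC cycle would show whether the same sensor numbers fail each time or whether the failing set varies.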
> 
> Our EE believes this is not a HW issue and has asked our BMC FW 
> developers to debug it. We have also tried swapping the CPU1/CPU2 
> locations and the DIMM modules, but the issue stays with the slot, 
> not with the CPU or DIMM itself. Also, once the issue occurs, it 
> persists until the system is AC power cycled again.
> 
> This issue only occurs after an AC power cycle. It cannot be 
> reproduced with DC power cycling, during which the BMC firmware does 
> not reboot, so we suspect a BMC firmware issue. However, we do not 
> know how to debug it from the BMC firmware side, even with the console 
> log. Could you give us some direction on debugging it? Thank you.
> 
> BTW, the OS we use on the system is Rocky Linux 9.4, and the sensor 
> list was captured from the OS through ipmitool during the test.
> 
> Best regards,
> *Jacky Lee*

Hi Jacky,

For issues related to Intel platforms, you can directly reach out to 
your Intel support representative for assistance.

Thanks,
-Jason



