Wedge400 (AST2520) OpenBMC stuck at reboot
rentao.bupt at gmail.com
Thu Sep 22 08:08:42 AEST 2022
Recently I noticed a few Wedge400 (AST2520A2) units getting stuck after the
"reboot" command. The problem is hard to reproduce (it affects roughly 1 in
1,000 units), but once it happens, I have to power cycle the chassis to
recover OpenBMC.
I checked aspeed_wdt.c and manually played with the watchdog registers, but
everything looks normal to me. Has anyone hit a similar error before?
Any suggestions on which area I should look into?
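To make manual register checks easier to compare across units, here is a
minimal sketch that decodes a raw WDT_CTRL value (offset 0x0C) into named
fields. The bit layout is an assumption taken from the definitions in the
Linux driver (drivers/watchdog/aspeed_wdt.c) and should be verified against
the AST2520 datasheet; the helper name decode_wdt_ctrl and the sample value
are hypothetical and not read from an affected unit.

```python
# Assumed WDT_CTRL bit layout, following the definitions in the Linux
# aspeed_wdt.c driver. Verify against the datasheet before relying on it.
WDT_CTRL_FLAGS = {
    0: "ENABLE",
    1: "RESET_SYSTEM",
    2: "WDT_INTR",
    3: "WDT_EXT",
    4: "1MHZ_CLK",
    7: "BOOT_SECONDARY",
}

# Bits 6:5 select the reset mode (assumed encoding).
RESET_MODES = {0: "SOC", 1: "FULL_CHIP", 2: "ARM_CPU", 3: "RESERVED"}


def decode_wdt_ctrl(value):
    """Return the set flag names and the reset mode for a WDT_CTRL value."""
    flags = [
        name
        for bit, name in sorted(WDT_CTRL_FLAGS.items())
        if value & (1 << bit)
    ]
    reset_mode = RESET_MODES[(value >> 5) & 0x3]
    return {"flags": flags, "reset_mode": reset_mode}


if __name__ == "__main__":
    # 0x33 is an illustrative value only.
    print(decode_wdt_ctrl(0x33))
```

Decoding the value captured on a hung unit next to the value from a healthy
unit makes it easier to spot a difference in the enable bit or reset mode.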
Below are the last few lines of logs before OpenBMC hangs:
INIT: Sending processes configured via /etc/inittab the TERM signal
Stopping OpenBSD Secure Shell server: sshdstopped /usr/sbin/sshd (pid 7397 1189)
Stopping ntpd: done
stopping rsyslogd ... done
Stopping random number generator daemon.
Deconfiguring network interfaces... done.
Sending all processes the TERM signal...
rackmond: Got request exit[ 528.383133] watchdog: watchdog0: watchdog did not stop!
Sending all processes the KILL signal...
Unmounting remote filesystems...
Unmounting local filesystems...
Rebooting... [ 529.725009] reboot: Restarting system
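Because kernel messages are interleaved with userspace output on the console
(see the "rackmond" and "Rebooting..." lines above), a small script can help
pull out only the timestamped kernel messages when comparing captures from
several units. This is a rough sketch; the function name kernel_messages is
hypothetical, and the regular expression assumes the usual "[ seconds.micros]"
printk timestamp format.

```python
import re

# Matches a printk timestamp such as "[ 528.383133]" anywhere in a line,
# followed by the kernel message text.
KMSG = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")


def kernel_messages(console_text):
    """Return (timestamp_seconds, message) tuples found in console output."""
    found = []
    for line in console_text.splitlines():
        match = KMSG.search(line)
        if match:
            found.append((float(match.group(1)), match.group(2).strip()))
    return found


# Lines copied from the console log above.
SAMPLE = """\
rackmond: Got request exit[ 528.383133] watchdog: watchdog0: watchdog did not stop!
Sending all processes the KILL signal...
Rebooting... [ 529.725009] reboot: Restarting system
"""

if __name__ == "__main__":
    for timestamp, message in kernel_messages(SAMPLE):
        print(f"{timestamp:.6f} {message}")
```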