Wedge400 (AST2520) OpenBMC stuck at reboot

Lei Yu yulei.sh at bytedance.com
Thu Sep 22 12:10:01 AEST 2022


We hit a similar, but not identical, BMC hang.
It occurs while running a host DC cycle test. When it happens:
1. The BMC hangs and the ASPEED heartbeat is off.
2. If wdt2 is enabled, it fires, the ASPEED chip resets, and the BMC
reboots from the second flash.
3. If wdt2 is disabled, the BMC just hangs and we have to power
cycle the chassis.
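
Since the outcome depends on whether wdt2 is armed, it can help to record the
watchdog state on each unit before starting the cycle test. Below is a minimal
sketch that reads the standard Linux watchdog sysfs attributes. It assumes the
kernel is built with CONFIG_WATCHDOG_SYSFS, and that wdt1/wdt2 show up as
watchdog0/watchdog1; that mapping is an assumption and depends on the device
tree, so check the "identity" attribute on your platform. The helper names
read_watchdog and is_armed are made up for illustration.

```python
from pathlib import Path

# Attributes exposed under /sys/class/watchdog/<dev>/ when the kernel
# is built with CONFIG_WATCHDOG_SYSFS.
ATTRS = ("identity", "state", "timeout", "timeleft", "bootstatus")


def read_watchdog(name, root="/sys/class/watchdog"):
    """Return the sysfs attributes of one watchdog device as a dict.

    Attributes that are missing or unreadable are reported as None.
    """
    base = Path(root) / name
    info = {}
    for attr in ATTRS:
        try:
            info[attr] = (base / attr).read_text().strip()
        except OSError:
            info[attr] = None
    return info


def is_armed(info):
    """True if the kernel reports the watchdog as active."""
    return info.get("state") == "active"


if __name__ == "__main__":
    # Assumption: watchdog0 is wdt1 and watchdog1 is wdt2 on this board.
    for dev in ("watchdog0", "watchdog1"):
        info = read_watchdog(dev)
        print(dev, "armed" if is_armed(info) else "not armed", info)
```

Note that "state" only reflects what the kernel watchdog core believes. A
watchdog armed by the bootloader and never opened from userspace may not be
reported as active, so treat the output as a hint rather than as proof.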

We have not found the root cause, but it is likely related to this patch:
https://lore.kernel.org/openbmc/20201221223225.14723-2-jae.hyun.yoo@linux.intel.com/
With the patch reverted, we can no longer reproduce the issue.

On Thu, Sep 22, 2022 at 6:09 AM Tao Ren <rentao.bupt at gmail.com> wrote:
>
> Hi there,
>
> Recently I noticed a few Wedge400 (AST2520A2) units stuck after "reboot"
> command. It's hard to reproduce (affecting ~1 out of 1,000 units), but
> once it happens, I have to power cycle the chassis to recover OpenBMC.
>
> I checked aspeed_wdt.c and manually played with watchdog registers, but
> everything looks normal to me. Has anyone hit a similar error before?
> Any suggestions on which area I should look into?
>
> Below are the last few lines of logs before OpenBMC hangs:
>
> bmc-oob login:
> INIT: Sending processes configured via /etc/inittab the TERM signal
> Stopping OpenBSD Secure Shell server: sshdstopped /usr/sbin/sshd (pid 7397 1189)
> Stopping ntpd: done
> stopping rsyslogd ... done
> Stopping random number generator daemon.
> Deconfiguring network interfaces... done.
> Sending all processes the TERM signal...
> rackmond[1747]: Got request exit[  528.383133] watchdog: watchdog0: watchdog did not stop!
> Sending all processes the KILL signal...
> Unmounting remote filesystems...
> Deactivating swap...
> Unmounting local filesystems...
> Rebooting... [  529.725009] reboot: Restarting system
>
>
> Cheers,
>
> Tao



-- 
BRs,
Lei YU

