Wedge400 (AST2520) OpenBMC stuck at reboot

Tao Ren rentao.bupt at gmail.com
Thu Sep 22 16:21:24 AEST 2022


Hi Lei,

Thank you for the quick response! The symptom is quite similar to my
Wedge400 problem, but CONFIG_VIDEO_ASPEED is not enabled in my kconfig,
so my hang might be caused by a different component (or components).
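To double-check that an option such as CONFIG_VIDEO_ASPEED is really absent from the running kernel (not just from a local kconfig), a minimal sketch like the one below could be used. It assumes the kernel was built with CONFIG_IKCONFIG_PROC so that /proc/config.gz exists; otherwise, point it at the build's .config file. The helper names parse_kconfig and kconfig_value are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable, Optional


def parse_kconfig(lines: Iterable[str], option: str) -> Optional[str]:
    """Return the value of `option` ('y', 'm', '"string"', ...), or None if unset/absent."""
    for raw in lines:
        line = raw.strip()
        # Enabled options appear as CONFIG_FOO=y / =m / =value.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
        # Disabled options appear as a comment: "# CONFIG_FOO is not set".
        if line == f"# {option} is not set":
            return None
    return None


def kconfig_value(option: str, config_path: str = "/proc/config.gz") -> Optional[str]:
    """Look up `option` in a kernel config file (gzip-compressed or plain text)."""
    path = Path(config_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as f:
        return parse_kconfig(f, option)
```

For example, `kconfig_value("CONFIG_VIDEO_ASPEED")` should return None on a kernel where the option is disabled.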


Cheers,

Tao

On Thu, Sep 22, 2022 at 10:10:01AM +0800, Lei Yu wrote:
> We hit a similar but distinct issue where the BMC gets stuck.
> It occurs when running the host DC cycle test. When the issue occurs:
> 1. The BMC hangs, and the ASPEED heartbeat is off.
> 2. If wdt2 is enabled, wdt2 fires, and the ASPEED chip resets and
> reboots from the second flash.
> 3. If wdt2 is disabled, the BMC just hangs and we have to power
> cycle the chassis.
> 
> We could not find the root cause, but it is likely related to this patch:
> https://lore.kernel.org/openbmc/20201221223225.14723-2-jae.hyun.yoo@linux.intel.com/
> After reverting the patch, we can no longer reproduce the issue.
> 
> On Thu, Sep 22, 2022 at 6:09 AM Tao Ren <rentao.bupt at gmail.com> wrote:
> >
> > Hi there,
> >
> > Recently I noticed a few Wedge400 (AST2520A2) units getting stuck after
> > the "reboot" command. It's hard to reproduce (affecting ~1 out of 1,000
> > units), but once it happens, I have to power cycle the chassis to recover
> > OpenBMC.
> >
> > I checked aspeed_wdt.c and manually played with the watchdog registers,
> > but everything looks normal to me. Has anyone hit a similar error before?
> > Any suggestions on which area I should look into?
> >
> > Below are the last few lines of logs before OpenBMC hangs:
> >
> > bmc-oob login:
> > INIT: Sending processes configured via /etc/inittab the TERM signal
> > Stopping OpenBSD Secure Shell server: sshdstopped /usr/sbin/sshd (pid 7397 1189)
> > Stopping ntpd: done
> > stopping rsyslogd ... done
> > Stopping random number generator daemon.
> > Deconfiguring network interfaces... done.
> > Sending all processes the TERM signal...
> > rackmond[1747]: Got request exit[  528.383133] watchdog: watchdog0: watchdog did not stop!
> > Sending all processes the KILL signal...
> > Unmounting remote filesystems...
> > Deactivating swap...
> > Unmounting local filesystems...
> > Rebooting... [  529.725009] reboot: Restarting system
> >
> >
> > Cheers,
> >
> > Tao
> 
> 
> 
> -- 
> BRs,
> Lei YU
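
The "watchdog: watchdog0: watchdog did not stop!" line in the log suggests that watchdog0 was still active when its device file was closed during shutdown, so recovery from a hung reboot would depend on that watchdog actually firing. One read-only way to inspect the watchdog before issuing "reboot" is through sysfs, because opening /dev/watchdog0 itself would start the watchdog. The sketch below assumes the kernel is built with CONFIG_WATCHDOG_SYSFS and that the device of interest is watchdog0; read_watchdog_state and WATCHDOG_ATTRS are hypothetical names, and some attributes (e.g. timeleft) exist only if the driver supports them.

```python
from pathlib import Path
from typing import Dict

# Attributes under /sys/class/watchdog/watchdogN (CONFIG_WATCHDOG_SYSFS).
# Assumption: not every driver exposes all of them (timeleft is optional).
WATCHDOG_ATTRS = ("identity", "state", "timeout", "timeleft", "bootstatus", "nowayout")


def read_watchdog_state(dev_dir: str = "/sys/class/watchdog/watchdog0") -> Dict[str, str]:
    """Read watchdog sysfs attributes without opening the /dev/watchdogN device.

    Attributes that are missing or unreadable are silently skipped, so the
    result may be empty if sysfs support is not compiled in.
    """
    base = Path(dev_dir)
    result: Dict[str, str] = {}
    for name in WATCHDOG_ATTRS:
        try:
            result[name] = (base / name).read_text().strip()
        except OSError:
            continue
    return result
```

Running it just before "reboot" and checking that state reads "active" with a sane timeout would at least confirm that the watchdog was armed when the hang happened.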

