The Power9 host booting problem with OpenBMC kernel 5.7.x

Alexander A. Filippov a.filippov at yadro.com
Wed Aug 12 23:59:32 AEST 2020


On Wed, Aug 12, 2020 at 08:56:16AM +0000, Joel Stanley wrote:
> Thanks for the response. I've merged the two threads, and I have a
> candidate for a fix.
> 
> On Tue, 11 Aug 2020 at 18:33, Alexander A. Filippov
> <a.filippov at yadro.com> wrote:
> > With kernel 5.8 the host is still not booting.
> > I've checked on both machines and they have very different results:
> >  - On the machine with two CPUs the issue is still reproduced.
> >    I see no difference, neither in the behavior, nor in the logs.
> >  - On the machine with one CPU the failure happens due to the PNOR flash.
> >    It looks like this:
> 
> >
> > I've noticed that kernel 5.8 detects the flash device incorrectly:
> > mx25l51245g instead of mx66l51235f.
> > It happens on both machines and I don't understand why it leads to the problems
> > on only one of them.
> 
> I found upstream v5.8 has a regression in the spi-nor driver on
> aspeed. I've put a revert of the patch that caused the regression on
> the list, but it requires some more investigation to find a proper
> fix:
> 
>  https://patchwork.ozlabs.org/project/openbmc/patch/20200812035847.2352277-1-joel@jms.id.au/
> 

Yes, this solves the problem with the flash devices.
They are still reported under the other model name, but they work properly.


> On Tue, 11 Aug 2020 at 11:54, Artem Senichev <artemsen at gmail.com> wrote:
> > > My guess is it's something to do with the timekeeping, irq or rcu
> > > code. All areas of complexity!
> > >
> >
> > We had similar behaviour on P8 when we tried to use ColdFire FSI:
> > https://github.com/openbmc/openbmc/issues/3433
> >
> > In this issue, htop shows 100% load of one CPU on the host and it is not an OS
> > task. Looks like FSI doesn't stop working and fully loads one core.
> 
> I think we have an issue with the irq polarity of the vuart device.
> Did you notice an excessive number of lpc_serirq interrupts on the
> host (check /proc/interrupts)?

You are right, the lpc_serirq_mux1 count is 183507008 right after the host OS has booted.
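
For anyone who wants to repeat the check on the host, here is a rough sketch
of how the count can be pulled out of /proc/interrupts. The helper name
irq_total is made up for illustration, and it assumes the usual layout where
the per-CPU counters directly follow the IRQ number:

```shell
# Hypothetical helper (not part of any OpenBMC or host tooling).
# Sums the per-CPU counters of every /proc/interrupts line matching NAME.
# Usage: irq_total NAME [FILE]   (FILE defaults to /proc/interrupts)
irq_total() {
    awk -v name="$1" '
        $0 ~ name {
            # Field 1 is the IRQ number; the per-CPU counters follow it
            # and end at the first non-numeric field (the irqchip name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
        }
        END { printf "%.0f\n", total }
    ' "${2:-/proc/interrupts}"
}

# Example, run on the host after boot:
#   irq_total lpc_serirq
```

Calling it twice a few seconds apart shows whether the counter is still
climbing, which is a better hint of an interrupt storm than a single reading.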

> 
> Try doing this on your BMC before booting your host:
> 
> root@bmc:~# echo 0 > /sys/devices/platform/ahb/ahb:apb/1e787000.serial/sirq_polarity
>

Yes, after this both hosts work properly.
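
In case it helps others trying the same workaround, below is a small sketch
that wraps the write and prints the value before and after. The sysfs path is
the one quoted above and may differ on other platforms or kernels; the
function name set_sirq_polarity is only illustrative. Since this is a runtime
sysfs write, it has to be repeated after every BMC reboot until the setting is
made permanent.

```shell
# Path taken from this thread; adjust it if your platform differs.
SIRQ_POLARITY=/sys/devices/platform/ahb/ahb:apb/1e787000.serial/sirq_polarity

# Hypothetical helper: write VALUE to the attribute and show the change.
# Usage: set_sirq_polarity VALUE [FILE]   (FILE defaults to $SIRQ_POLARITY)
set_sirq_polarity() {
    value=$1
    file=${2:-$SIRQ_POLARITY}
    # Bail out if the attribute is missing or cannot be written.
    if [ ! -w "$file" ]; then
        echo "sirq_polarity not found or not writable: $file" >&2
        return 1
    fi
    echo "before: $(cat "$file")"
    echo "$value" > "$file"
    echo "after:  $(cat "$file")"
}

# Example, run on the BMC before booting the host:
#   set_sirq_polarity 0
```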

Thanks for your help.

> If that fixes it we can make a change to the device tree to make the
> setting permanent.
> 
> Cheers,
> 
> Joel

--
Regards,
Alexander

