[Skiboot] [PATCH] fast-reboot: move pci_reset error handling into fast-reboot code

Sat Feb 3 17:16:28 AEDT 2018

* Nicholas Piggin <npiggin at gmail.com> [2018-02-03 15:36:19]:

> On Fri, 2 Feb 2018 23:35:05 +0530
> Vaidyanathan Srinivasan <svaidy at linux.vnet.ibm.com> wrote:
> 
> > * Nicholas Piggin <npiggin at gmail.com> [2018-02-02 15:46:24]:
> > 
> > > pci_reset() currently does a platform reboot if it fails. It
> > > should not know about fast-reboot at this level, so instead have
> > > it return an error, and the fast reboot caller will do the
> > > platform reboot.
> > > 
> > > The code essentially does the same thing, but flexibility is
> > > improved. Ideally the fast reboot code should perform pci_reset
> > > and all such fail-able operations before the CPU resets itself
> > > and destroys its own stack. That's not the case now, but that
> > > should be the goal.
> > > 
> > > Signed-off-by: Nicholas Piggin <npiggin at gmail.com>  
> > 
> > Acked-by: Vaidyanathan Srinivasan <svaidy at linux.vnet.ibm.com>
> > 
> > > ---
> > >  core/fast-reboot.c | 13 ++++++++++++-
> > >  core/pci.c         | 11 +++++------
> > >  include/pci.h      |  2 +-
> > >  3 files changed, 18 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/core/fast-reboot.c b/core/fast-reboot.c
> > > index 1c76c0891..382e781ae 100644
> > > --- a/core/fast-reboot.c
> > > +++ b/core/fast-reboot.c
> > > @@ -348,7 +348,18 @@ void __noreturn fast_reboot_entry(void)
> > >  	}
> > >  
> > >  	/* Remove all PCI devices */
> > > -	pci_reset();
> > > +	if (pci_reset()) {
> > > +		prlog(PR_NOTICE, "RESET: Fast reboot failed to reset PCI\n");
> > > +
> > > +		/*
> > > +		 * Can't return to caller here because we're past no-return.
> > > +		 * Attempt an IPL here which is what the caller would do.
> > > +		 */
> > > +		if (platform.cec_reboot)
> > > +			platform.cec_reboot();
> > > +		for (;;)
> > > +			;
> > > +	}  
> > 
> > The code change improves error handling.
> > 
> > 
> > >  
> > >  	ipmi_set_fw_progress_sensor(IPMI_FW_PCI_INIT);
> > >  
> > > diff --git a/core/pci.c b/core/pci.c
> > > index 0809521f8..494a33a45 100644
> > > --- a/core/pci.c
> > > +++ b/core/pci.c
> > > @@ -1668,7 +1668,7 @@ static void __pci_reset(struct list_head *list)
> > >  	}
> > >  }
> > >  
> > > -void pci_reset(void)
> > > +int64_t pci_reset(void)
> > >  {
> > >  	unsigned int i;
> > >  	struct pci_slot *slot;
> > > @@ -1695,11 +1695,9 @@ void pci_reset(void)
> > >  				rc = slot->ops.run_sm(slot);
> > >  			}
> > >  			if (rc < 0) {
> > > -				PCIERR(phb, 0, "Complete reset failed, aborting"
> > > -				               "fast reboot (rc=%lld)\n", rc);
> > > -				if (platform.cec_reboot)
> > > -					platform.cec_reboot();
> > > -				while (true) {}
> > > +				PCIERR(phb, 0, "Complete reset failed "
> > > +				               "(rc=%lld)\n", rc);
> > > +				return rc;  
> > 
> > However...
> > 
> > If we abort at the PCI link that fails and return at this point, we
> > are giving up too early. If subsequent links can get re-enabled then
> > fast-reboot can be made to work.
> > 
> > [  141.208697990,7] PHB#0003[0:3]: LINK: Link is stable
> > [  141.208763671,7] PHB#0003[0:3]: LINK: Card [15b3:1013] Degraded Retry:disabled
> > [  141.208832155,7] PHB#0003[0:3]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3
> > [  141.208872220,7] PHB#0003[0:3]: LINK: Width Train:x08 PHB:x16 DEV:x16 *
> > [  141.214051722,3] PCI-SLOT-0000000000000003 Invalid state 00000000
> > [  141.215181558,3] PHB#0003:00:00.0 Complete reset failed (rc=-6)
> > [  141.216418848,5] RESET: Fast reboot failed to reset PCI
> > [  141.217638534,6] IPMI: sending chassis control request 0x03
> > [  141.217676559,6] BT: seq 0x1c netfn 0x00 cmd 0x02: Message sent to host
> > [  141.482616079,7] MBOX-FLASH: Adjusting the window
> > [  141.482676812,7] LPC-MBOX: Sending BMC interrupt
> > [  142.246839786,7] MBOX-FLASH: Adjusting the window
> > [  142.247414703,7] LPC-MBOX: Sending BMC interrupt
> > [  142.349862574,6] BT: seq 0x1c netfn 0x00 cmd 0x02: IPMI MSG done
> > [  143.015709854,6] STB: BOOTKERNEL verified
> > [  143.015768123,7] LPC-MBOX: Sending BMC interrupt
> > [  143.118218291,7] blocklevel_read: 0x0        0x31c03b48      0x30
> > [  143.118283317,7] blocklevel_raw_read: 0x0    0x31c03b48      0x30
> > [  143.118352801,7] MBOX-FLASH: Adjusting the window
> > [  143.118382662,7] LPC-MBOX: Sending BMC interrupt
> > [  143.220835377,7] FFS: Partition map size: 0x1000
> > [  143.221135902,7] blocklevel_read: 0x0        0x30a3f218      0x1000
> > [  143.221166425,7] blocklevel_raw_read: 0x0    0x30a3f218      0x1000
> > [  143.223081918,7] FLASH: No ROOTFS partition
> > 
> > On one of the failing system where one of the PHB link does not train,
> > I am able to skip by continuing this loop and successfully complete
> > the fast-reset.
> > 
> > In this specific case there are no devices attached to that PHB link.
> > We need a way to understand why the link reset fails and skip it if
> > this is not due to failing hardware and proceed with reset/train of
> > all other links and complete the loop.
> 
> Yes, we need to get someone to understand what is happening with the
> PCI device.
> 
> Did you get to the bottom of why we're getting the state machine into
> an invalid state there? Presumably that should not happen even if we
> are getting errors back from the hardware it should not go into unknown
> state?

Not yet, I have two PHB bus/slot that gets us to error our at
different stages.  I do not yet understand the state machine.  Let me
get so more test runs and try to understand and report.

> > 
> > diff --git a/core/pci.c b/core/pci.c
> > index 0809521f..f708b1a0 100644
> > --- a/core/pci.c
> > +++ b/core/pci.c
> > @@ -1695,11 +1695,8 @@ void pci_reset(void)
> >                                 rc = slot->ops.run_sm(slot);
> >                         }
> >                         if (rc < 0) {
> > -                               PCIERR(phb, 0, "Complete reset failed, aborting"
> > +                               PCIERR(phb, 0, "Complete reset failed, CONTINUING "
> >                                                "fast reboot (rc=%lld)\n", rc);
> > -                               if (platform.cec_reboot)
> > -                                       platform.cec_reboot();
> > -                               while (true) {}
> >                         }
> > 
> > The above hack makes fast-reboot work well on one of the test system.
> > We need a way to avoid treating all PCI reset fails as fatal and fallback
> > to full IPL.
> > 
> > I am using fast-reboot for runtime PState (OCC) reload and related
> > tests and hence making fast-reboot work is less than ideal scenarios
> > will be very helpful :)
> 
> Of course for production systems, default always needs to err on the side of
> caution and IPL if we hit some condition we don't understand. But I don't
> think it would be a problem to add an nvram config as an unsupported option
> to ignore as many errors as possible in fast reboot.

Good idea.  We can add a nvram setting to continue-on-error. Or
override the "experimental-fast-reset=feeling-lucky" with something
risky :)

experimental-fast-reset=feeling-adventurous ?

> Are you just using fast reboot to speed up testing cycles?

The use case is as follows: we can provide runtime parameter to OCC
for system test purpose.  OCC needs a soft reboot to make the settings
take effect and init various PState tables.  OPAL+Linux need to
re-read this and this is where fast-reboot helps.

When we fallback to full IPL, OCC is reloaded by hostboot and loses
full context and goes back to power-on settings and hence blows away
the runtime test parameters.  We are indeed building a chain of
soft-reboot at OCC level and then at OPAL+Linux level to test runtime
parameters to OCC and drive those from Linux using tools. 

--Vaidy