[Skiboot] [PATCH] hw/phb3: improve handling of PHB init failure

Gavin Shan gwshan at linux.vnet.ibm.com
Fri Oct 21 17:27:55 AEDT 2016


On Fri, Oct 21, 2016 at 04:51:46PM +1100, Andrew Donnellan wrote:
>PHB initialisation in phb3_init_hw() can fail for a number of reasons, such
>as DLP reset timeout, root complex config space init failure, etc. When
>this occurs, an error message is printed to the console and the PHB's state
>is set to PHB3_STATE_BROKEN.
>
>However, currently both phb3_create() and phb3_creset() will continue on
>trying to set up the PHB or reset it respectively. This doesn't seem like a
>particularly good idea.
>
>Make phb3_init_hw() return OPAL_HARDWARE on failure, and make phb3_creset()
>and phb3_create() abort appropriately on failure.
>

[cc stable tag removed]

>Signed-off-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
>

Andrew, I remove the stable label to avoid it's parsed to inappropriate mail
address when you're going to reply this thread :)

>---
>
>Gavin: in phb3_create() is returning before the platform.pci_setup_phb()
>call the right thing to do in this case?
>---
> hw/phb3.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
>diff --git a/hw/phb3.c b/hw/phb3.c
>index 1c09ffe..cdfc38a 100644
>--- a/hw/phb3.c
>+++ b/hw/phb3.c
>@@ -38,7 +38,7 @@
> /* Enable this to disable error interrupts for debug purposes */
> #undef DISABLE_ERR_INTS
>
>-static void phb3_init_hw(struct phb3 *p, bool first_init);
>+static int64_t phb3_init_hw(struct phb3 *p, bool first_init);
>
> #define PHBDBG(p, fmt, a...)	prlog(PR_DEBUG, "PHB#%04x: " fmt, \
> 				      (p)->phb.opal_id, ## a)
>@@ -2481,6 +2481,7 @@ static int64_t phb3_creset(struct pci_slot *slot)
> {
> 	struct phb3 *p = phb_to_phb3(slot->phb);
> 	uint64_t cqsts, val;
>+	int64_t rc;
>
> 	switch (slot->state) {
> 	case PHB3_SLOT_NORMAL:
>@@ -2545,7 +2546,9 @@ static int64_t phb3_creset(struct pci_slot *slot)
> 		 */
> 		p->flags &= ~PHB3_AIB_FENCED;
> 		p->flags &= ~PHB3_CAPP_RECOVERY;
>-		phb3_init_hw(p, false);
>+		rc = phb3_init_hw(p, false);
>+		if (rc)
>+			goto error;

There are two users: recovery on fenced PHB or frozen PE behind root
bus or root port [A], kdump kernel [B]. For [A], the change looks ok.
For [B], the PHB has been seen by kernel but it is left in uninitialized
state. It's likely not working when kernel tries to do PCI probing,
memory transaction, DMA etc. if I'm correct enough. 

>
> 		pci_slot_set_state(slot, PHB3_SLOT_CRESET_FRESET);
> 		return pci_slot_set_sm_timeout(slot, msecs_to_tb(100));
>@@ -4023,7 +4026,7 @@ static int64_t phb3_fixup_pec_inits(struct phb3 *p)
> 	return 0;
> }
>
>-static void phb3_init_hw(struct phb3 *p, bool first_init)
>+static int64_t phb3_init_hw(struct phb3 *p, bool first_init)
> {
> 	uint64_t val;
>
>@@ -4221,11 +4224,12 @@ static void phb3_init_hw(struct phb3 *p, bool first_init)
>
> 	PHBDBG(p, "Initialization complete\n");
>
>-	return;
>+	return OPAL_SUCCESS;
>
>  failed:
> 	PHBERR(p, "Initialization failed\n");
> 	p->state = PHB3_STATE_BROKEN;
>+	return OPAL_HARDWARE;
> }
>
> static void phb3_allocate_tables(struct phb3 *p)
>@@ -4413,6 +4417,7 @@ static void phb3_create(struct dt_node *np)
> 	struct proc_chip *chip;
> 	int opal_id;
> 	char *path;
>+	int64_t rc;
>
> 	assert(p);
>
>@@ -4531,7 +4536,9 @@ static void phb3_create(struct dt_node *np)
> 	register_irq_source(&phb3_lsi_irq_ops, p, p->base_lsi, 8);
>
> 	/* Get the HW up and running */
>-	phb3_init_hw(p, true);
>+	rc = phb3_init_hw(p, true);
>+	if (rc)
>+		return;

I'm not sure about fast-reboot. We're experiencing the code path
when skiboot boots. Ahead of this point, the PHB has been exposed
to upper layer (skiboot/core/pci.c). It means the upper layer can
start passing transactions to the PHB which might be in unitialized
state. It turns to be same question as above :-)

>
> 	/* Load capp microcode into capp unit */
> 	capp_load_ucode(p);
>-- 
>Andrew Donnellan              OzLabs, ADL Canberra
>andrew.donnellan at au1.ibm.com  IBM Australia Limited
>



More information about the Skiboot mailing list