[Skiboot] [PATCH] hw/phb3: improve handling of PHB init failure

Andrew Donnellan andrew.donnellan at au1.ibm.com
Fri Oct 21 18:07:47 AEDT 2016


On 21/10/16 17:27, Gavin Shan wrote:
>> diff --git a/hw/phb3.c b/hw/phb3.c
>> index 1c09ffe..cdfc38a 100644
>> --- a/hw/phb3.c
>> +++ b/hw/phb3.c
>> @@ -38,7 +38,7 @@
>> /* Enable this to disable error interrupts for debug purposes */
>> #undef DISABLE_ERR_INTS
>>
>> -static void phb3_init_hw(struct phb3 *p, bool first_init);
>> +static int64_t phb3_init_hw(struct phb3 *p, bool first_init);
>>
>> #define PHBDBG(p, fmt, a...)	prlog(PR_DEBUG, "PHB#%04x: " fmt, \
>> 				      (p)->phb.opal_id, ## a)
>> @@ -2481,6 +2481,7 @@ static int64_t phb3_creset(struct pci_slot *slot)
>> {
>> 	struct phb3 *p = phb_to_phb3(slot->phb);
>> 	uint64_t cqsts, val;
>> +	int64_t rc;
>>
>> 	switch (slot->state) {
>> 	case PHB3_SLOT_NORMAL:
>> @@ -2545,7 +2546,9 @@ static int64_t phb3_creset(struct pci_slot *slot)
>> 		 */
>> 		p->flags &= ~PHB3_AIB_FENCED;
>> 		p->flags &= ~PHB3_CAPP_RECOVERY;
>> -		phb3_init_hw(p, false);
>> +		rc = phb3_init_hw(p, false);
>> +		if (rc)
>> +			goto error;
>
> There are two users: recovery on fenced PHB or frozen PE behind root
> bus or root port [A], kdump kernel [B]. For [A], the change looks ok.
> For [B], the PHB has been seen by kernel but it is left in uninitialized
> state. It's likely not working when kernel tries to do PCI probing,
> memory transaction, DMA etc. if I'm correct enough.

This should leave the PHB state in PHB3_STATE_BROKEN, so a lot of things 
will fail and return OPAL_HARDWARE.

I'm not sure what the problem is here though... without this change, the 
PHB is *still* not initialised properly...

>
>>
>> 		pci_slot_set_state(slot, PHB3_SLOT_CRESET_FRESET);
>> 		return pci_slot_set_sm_timeout(slot, msecs_to_tb(100));
>> @@ -4023,7 +4026,7 @@ static int64_t phb3_fixup_pec_inits(struct phb3 *p)
>> 	return 0;
>> }
>>
>> -static void phb3_init_hw(struct phb3 *p, bool first_init)
>> +static int64_t phb3_init_hw(struct phb3 *p, bool first_init)
>> {
>> 	uint64_t val;
>>
>> @@ -4221,11 +4224,12 @@ static void phb3_init_hw(struct phb3 *p, bool first_init)
>>
>> 	PHBDBG(p, "Initialization complete\n");
>>
>> -	return;
>> +	return OPAL_SUCCESS;
>>
>>  failed:
>> 	PHBERR(p, "Initialization failed\n");
>> 	p->state = PHB3_STATE_BROKEN;
>> +	return OPAL_HARDWARE;
>> }
>>
>> static void phb3_allocate_tables(struct phb3 *p)
>> @@ -4413,6 +4417,7 @@ static void phb3_create(struct dt_node *np)
>> 	struct proc_chip *chip;
>> 	int opal_id;
>> 	char *path;
>> +	int64_t rc;
>>
>> 	assert(p);
>>
>> @@ -4531,7 +4536,9 @@ static void phb3_create(struct dt_node *np)
>> 	register_irq_source(&phb3_lsi_irq_ops, p, p->base_lsi, 8);
>>
>> 	/* Get the HW up and running */
>> -	phb3_init_hw(p, true);
>> +	rc = phb3_init_hw(p, true);
>> +	if (rc)
>> +		return;
>
> I'm not sure about fast-reboot. We're experiencing the code path
> when skiboot boots. Ahead of this point, the PHB has been exposed
> to upper layer (skiboot/core/pci.c). It means the upper layer can
> start passing transactions to the PHB which might be in unitialized
> state. It turns to be same question as above :-)

Hmm, I hadn't thought about fast reboot just yet...

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan at au1.ibm.com  IBM Australia Limited



More information about the Skiboot mailing list