[Skiboot] [PATCH v2 4/5] hw/phb3: add host sync notifier to trigger creset/CAPP disable on kexec

Mon Jan 16 19:30:21 AEDT 2017

On 16/01/17 10:51, Gavin Shan wrote:
>> +	switch (slot->state) {
>> +	case PHB3_SLOT_NORMAL:
>> +		lock(&capi_lock);
>> +		rc = (chip->capp_phb3_attached_mask & (1 << p->index)) ?
>> +			OPAL_PHB_CAPI_MODE_CAPI :
>> +			OPAL_PHB_CAPI_MODE_PCIE;
>> +		unlock(&capi_lock);
>> +
>> +		if (rc == OPAL_PHB_CAPI_MODE_PCIE)
>> +			return true;
>> +
>> +		PHBINF(p, "PHB in CAPI mode, resetting\n");
>> +		p->flags &= ~PHB3_CAPP_RECOVERY;
>> +		phb3_creset(slot);
>> +		return false;
>> +	default:
>> +		rc = slot->ops.poll(slot);
>> +		return rc == OPAL_SUCCESS;
>
> The code seems incorrect. In opal_sync_host_reboot(), OPAL_BUSY_EVENT
> is returned to Linux if any notifier returns false. Linux delays 10ms
> and call opal_sync_host_reboot() again, meaning all notifiers will be
> triggered (including phb3_host_sync_reset()) again.
>
> When error is returned from slot->ops.poll(slot), the PHB might be put
> to non_PHB3_SLOT_NORMAL state. false is returned on the error. This
> function is called again and jump to default case. It's obviously not
> what we want. So the code would be:
>
> 		return rc <= OPAL_SUCCESS;

Good catch, will investigate further tomorrow.

>
> Anyway, I think opal_sync_host_reboot() needs enhancement to avoid
> triggering all notifier on failure from any of them. It's something
> out of scope though.

opal_sync_host_reboot() wasn't exactly designed with "failure" in mind - 
the reason I describe this as an abuse of the host sync notifier system 
is that the other notifiers just flush buffers rather than doing 
anything complex like this.

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan at au1.ibm.com  IBM Australia Limited