[Skiboot] [PATCH v2 2/5] fast-reboot: creset PHBs on fast reboot

Mon Jan 16 19:32:16 AEDT 2017

On 16/01/17 10:20, Gavin Shan wrote:
> On Fri, Jan 13, 2017 at 04:09:38PM +1100, Andrew Donnellan wrote:
>> On fast reboot, perform a creset of all PHBs. This ensures that any PHBs
>> that are fenced will be working after the reboot.
>>
>> A later patch will disable CAPI mode during cresets - as such, PHBs in CAPI
>> mode will return to regular PCIe mode during a fast reboot.
>>
>> Signed-off-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
>>
>> ---
>>
>> This slows down fast reboot a bit. Could we do this in parallel? Should we
>> limit it only to fenced/CAPI mode PHBs? (Resetting all PHBs might make
>> things slightly more reliable, but probably only slightly...)
>> ---
>> core/pci.c | 20 ++++++++++++++++++++
>> 1 file changed, 20 insertions(+)
>>
>> diff --git a/core/pci.c b/core/pci.c
>> index 05fd80f..bef42b4 100644
>> --- a/core/pci.c
>> +++ b/core/pci.c
>> @@ -1497,6 +1497,7 @@ static void __pci_reset(struct list_head *list)
>> void pci_reset(void)
>> {
>> 	unsigned int i;
>> +	struct pci_slot *slot;
>>
>> 	prlog(PR_NOTICE, "PCI: Clearing all devices...\n");
>>
>> @@ -1508,6 +1509,25 @@ void pci_reset(void)
>> 		if (!phb)
>> 			continue;
>> 		__pci_reset(&phb->devices);
>
> PCINOTICE(phb, 0, ...) indicates the message applies to root port,
> not PHB.

OK, will fix

>
>> +
>> +		slot = phb->slot;
>> +		if (!slot || !slot->ops.creset) {
>> +			PCINOTICE(phb, 0, "Can't do complete reset\n");
>> +		} else {
>> +			int rc = slot->ops.creset(slot);
>
> slot->ops.create() returns error or delay whose data type is int64_t.

Good catch, will fix

>
>> +			while (rc > 0) {
>> +				time_wait(rc);
>> +				rc = slot->ops.poll(slot);
>> +			}
>> +			if (rc < 0) {
>> +				PCIERR(phb, 0, "Complete reset failed, aborting"
>> +				       	       "fast reboot\n");
>
> It might be worthy to print @rc.

Will add.

>
>> +				if (platform.cec_reboot)
>> +					platform.cec_reboot();
>> +				while (true) {}
>
> Rebooting the system in failing case seems not good idea. To
> remove the failing PHB to avoid probing on it afterwards would
> be better.

Hmm, I'm not entirely sure here - I see your point, though on the other 
hand, if a PHB has been left in a bad state somehow, and a user attempts 
to reboot, they're probably going to want a full IPL. Stewart, any thoughts?

Thanks for the review!

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan at au1.ibm.com  IBM Australia Limited