[PATCH] powerpc/pseries: Avoid context switch in EEH reset if required

Gavin Shan gwshan at linux.vnet.ibm.com
Fri Jan 23 14:50:29 AEDT 2015

On Wed, Jan 21, 2015 at 10:53:38AM +1100, Gavin Shan wrote:
>On Wed, Jan 21, 2015 at 09:56:07AM +1100, Gavin Shan wrote:
>>On Tue, Jan 20, 2015 at 10:28:16AM +0100, Benjamin Herrenschmidt wrote:
>>>On Mon, 2015-01-19 at 09:47 +1100, Gavin Shan wrote:
>>>> On pseries platform, the EEH reset backend pseries_eeh_reset() can
>>>> be called in atomic context as follows. For this case, we should
>>>> call udelay() instead of msleep() to avoid context switching.
>>>>      drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
>>>>      drivers/pci/pci.c::pci_set_pcie_reset_state()
>>>>      arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
>>>>      arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()
>>>It's not acceptable to introduce multi-millisecond delays at interrupt
>>>time. In fact, we should generally not use udelay in such context.
>>>I understand that this is an exceptional error handling case but it's
>>>still not right.
>>Yes, I agree it's unsafe to udelay for multi-milliseconds as the queued
>>works in atomic context is expected to be completed as soon as possible.
>>>Are there many other users of pci_set_pcie_reset_state() at interrupt
>>>time ? Can we have a discussion with the PCI folks as to whether that
>>>should be legal or not ?
>>>I'm tempted to require that it's made illegal.
>>Currently, there are 2 drivers calling this function: IPR and misc/genwqe.
>>Also, VFIO would call this function for IBM and Mellanox adapters in PowerKVM
>>repository. For now, IPR driver is the only one call this function in atomic
>>Sure, I'll send one email to confirm with PCI folks. I guess it's illegal
>>to call pci_set_pcie_reset_state() in atomic context. If it's the case,
>>I'm afraid Wendy has to change IPR driver to replace the reset timer with
>>something else (e.g. workqueue).
>Another way is to drop the hold/settle delays for pcibios_set_pcie_reset_state()
>and IPR relies on the timer interval to cover them. Wendy, could you please
>let me know if it would work for you or not?
>    Start reset timer;
>    Timer expires, assert the reset. Restart the timer with assert delay;
>    Timer expires, deassert the reset. Restart the timer with settle delay;
>    Timer expires, ready for subsequent works;

Brian King is the author of pci_set_pcie_reset_state() and as he said, it was
expected to be called in atomic context. So I'm going to come up with another
approach as above, caller of pci_set_pcie_reset_state() should take corresponding 
delays as what IPR driver is doing.

Messages from Brian for reference:

| The API has changed. I wrote the pci_set_pcie_reset_state API originally.
| When this API was put in place initially, it was perfectly legal to call it
| from an atomic context. Can you clarify why we have to have the delay in the
| pci_set_pcie_reset_state function? Shouldn't it be the responsibility of the
| callers to ensure a proper delay is used? This was always the case until
| recently.

So please ignore this patch and I'll send another one, which is implemented in
above approach.


>>>> Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
>>>> Tested-by: Wen Xiong<wenxiong at linux.vnet.ibm.com>
>>>> ---
>>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> index a6c7e19..67623a3 100644
>>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>>>>   */
>>>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>  {
>>>> -	int config_addr;
>>>> -	int ret;
>>>> +	int config_addr, delay, ret;
>>>>  	/* Figure out PE address */
>>>>  	config_addr = pe->config_addr;
>>>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>  	/* We need reset hold or settlement delay */
>>>>  	if (option == EEH_RESET_FUNDAMENTAL ||
>>>>  	    option == EEH_RESET_HOT)
>>>> -		msleep(EEH_PE_RST_HOLD_TIME);
>>>> +		delay = EEH_PE_RST_HOLD_TIME;
>>>> +	else
>>>> +		delay = EEH_PE_RST_SETTLE_TIME;
>>>> +
>>>> +	if (in_atomic())
>>>> +		udelay(delay * 1000);
>>>>  	else
>>>> -		msleep(EEH_PE_RST_SETTLE_TIME);
>>>> +		msleep(delay);
>>>>  	return ret;
>>>>  }

More information about the Linuxppc-dev mailing list