[PATCH v2 09/30] cxlflash: Fix to stop interrupt processing on remove

Matthew R. Ochs mrochs at linux.vnet.ibm.com
Tue Sep 22 07:58:53 AEST 2015


> On Sep 21, 2015, at 6:33 AM, Tomas Henzl <thenzl at redhat.com> wrote:
> On 19.9.2015 01:26, Matthew R. Ochs wrote:
>>> On Sep 18, 2015, at 6:59 AM, Tomas Henzl <thenzl at redhat.com> wrote:
>>> On 17.9.2015 19:16, Matthew R. Ochs wrote:
>>>>> On Sep 17, 2015, at 7:38 AM, Tomas Henzl <thenzl at redhat.com> wrote:
>>>>> 
>>>>> On 16.9.2015 18:53, Matthew R. Ochs wrote:
>>>>>> Interrupt processing can run in parallel to a remove operation. This
>>>>>> can lead to a condition where the interrupt handler is processing with
>>>>>> memory that has been freed.
>>>>>> 
>>>>>> To avoid processing an interrupt while memory may be yanked, check for
>>>>>> removal while in the interrupt handler. Bail when removal is imminent.
>>>>>> 
>>>>>> Signed-off-by: Matthew R. Ochs <mrochs at linux.vnet.ibm.com>
>>>>>> Signed-off-by: Manoj N. Kumar <manoj at linux.vnet.ibm.com>
>>>>>> ---
>>>>>> drivers/scsi/cxlflash/common.h |  2 ++
>>>>>> drivers/scsi/cxlflash/main.c   | 21 +++++++++++++++------
>>>>>> 2 files changed, 17 insertions(+), 6 deletions(-)
>>>>>> 
>>>>>> diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
>>>>>> index 1abe4e0..03d2cc6 100644
>>>>>> --- a/drivers/scsi/cxlflash/common.h
>>>>>> +++ b/drivers/scsi/cxlflash/common.h
>>>>>> @@ -103,6 +103,8 @@ struct cxlflash_cfg {
>>>>>> 	enum cxlflash_lr_state lr_state;
>>>>>> 	int lr_port;
>>>>>> 
>>>>>> +	atomic_t remove_active;
>>>>>> +
>>>>>> 	struct cxl_afu *cxl_afu;
>>>>>> 
>>>>>> 	struct pci_pool *cxlflash_cmd_pool;
>>>>>> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
>>>>>> index 6e85c77..89ee648 100644
>>>>>> --- a/drivers/scsi/cxlflash/main.c
>>>>>> +++ b/drivers/scsi/cxlflash/main.c
>>>>>> @@ -892,6 +892,7 @@ static void cxlflash_remove(struct pci_dev *pdev)
>>>>>> 	spin_unlock_irqrestore(&cfg->tmf_waitq.lock, lock_flags);
>>>>>> 
>>>>>> 	cfg->state = STATE_FAILTERM;
>>>>>> +	atomic_inc(&cfg->remove_active);
>>>>> Hi Matthew,
>>>>> you could just call term_afu at this point, this way you don't
>>>>> need an additional check in all irq functions.
>>>>> Cheers,
>>>>> Tomas
>>>> Hi Tomas,
>>>> 
>>>> We actually do call term_afu() a few lines down from here. I don't follow
>>>> how moving it here would help things.
>>> When you disable ints sooner (that is what term_afu does ?) you'll get no
>>> more ints later isn't this what you want?
>> Correct, that's what we want.
>> 
>>>> The reason for the atomic was to provide something lightweight that we
>>>> could check _inside_ the processing loop for the read-response queue
>>>> handler. A check outside that loop doesn't really provide much in terms
>>>> of closing or narrowing down the window of when freed memory can be
>>>> accessed.
>>>> 
>>>> As David Laight correctly pointed out, this approach does not completely
>>>> close the window. We'd need something heavier to fully protect (e.g. a lock
>>>> to wrap around the entire loop). I will look into adding this in a future cycle
>>>> when I can adequately test.
>>> term_afu calls free_irq and this function
>>> does not return until any executing interrupts for have completed.
>>> This is the sync mechanism you need, it's lightweight
>>> (does not add an additional check to your irq functions)
>>> and closes the race window completely.
>> Thanks for clarifying!
>> 
>> I looked at this closer and you are correct, free_irq() guarantees not
>> to return until the interrupt handler has completed. The current location
>> of term_afu() is appropriate as the memory that the handler touches is
>> not freed until the very end [by free_mem() and scsi_host_put()] of the
>> remove. Thus we can simply ignore this patch (I'll remove it in a v3).
> 
> OK. In some future patch please reorganize the remove function so,
> that it follows the template described in Documentation/PCI/pci.txt :
> 	Disable the device from generating IRQs
> 	Release the IRQ (free_irq())
> 	Stop all DMA activity
> 	Release DMA buffers (both streaming and coherent)
> 	Unregister from other subsystems (e.g. scsi or netdev)
> 	Release MMIO/IOP resources
> 	Disable the device

Will do.



More information about the Linuxppc-dev mailing list