[PATCH RFC v4 02/21] PCI: Fix race condition in pci_enable/disable_device()

Sergey Miroshnichenko s.miroshnichenko at yadro.com
Thu Mar 28 04:11:51 AEDT 2019


On 3/26/19 10:00 PM, Bjorn Helgaas wrote:
> [+cc Srinath, Marta, LKML]
> 
> On Mon, Mar 11, 2019 at 04:31:03PM +0300, Sergey Miroshnichenko wrote:
>>  CPU0                                      CPU1
>>
>>  pci_enable_device_mem()                   pci_enable_device_mem()
>>    pci_enable_bridge()                       pci_enable_bridge()
>>      pci_is_enabled()
>>        return false;
>>      atomic_inc_return(enable_cnt)
>>      Start actual enabling the bridge
>>      ...                                       pci_is_enabled()
>>      ...                                         return true;
>>      ...                                   Start memory requests <-- FAIL
>>      ...
>>      Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
>>
>> This patch protects the pci_enable/disable_device() and pci_enable_bridge()
>> with mutexes.
> 
> This is a subtle issue that we've tried to fix before, but we've never
> had a satisfactory solution, so I hope you've figured out the right
> fix.
> 
> I'll include some links to previous discussion.  This patch is very
> similar to [2], which we didn't actually apply.  We did apply the
> patch from [3] as 40f11adc7cd9 ("PCI: Avoid race while enabling
> upstream bridges"), but it caused the regressions reported in [4,5],
> so we reverted it with 0f50a49e3008 ("Revert "PCI: Avoid race while
> enabling upstream bridges"").
> 

Thanks for the links, I wasn't aware of these discussions and patches!

On PowerNV this issue is partially hidden by db2173198b95 ("powerpc/powernv/pci: Work
around races in PCI bridge enabling"), and on x86 BIOS pre-initializes all the bridges, so
it doesn't reproduce until hotplugging in a hotplugged bridge.

This patch is indeed similar to 40f11adc7cd9 ("PCI: Avoid race while enabling upstream
bridges"), but instead of a single static mutex it adds per-device mutexes and prevents
the dev->enable_cnt from incrementing too early. So it's not needed anymore to carefully
select a moment safe enough to enable the device.

Serge

> I think the underlying design problem is that we have a driver for
> device B calling pci_enable_device(), and it is changing the state of
> device A (an upstream bridge).  The model generally is that a driver
> should only touch the device it is bound to.
> 
> It's tricky to get the locking right when several children of device A
> all need to operate on A.
> 
> That's all to say I'll have to think carefully about this particular
> patch, so I'll go on to the others and come back to this one.
> 
> Bjorn
> 
> [1] https://lore.kernel.org/linux-pci/1494256190-28993-1-git-send-email-srinath.mannam@broadcom.com/T/#u
>     [RFC PATCH] pci: Concurrency issue in NVMe Init through PCIe switch
> 
> [2] https://lore.kernel.org/linux-pci/1496135297-19680-1-git-send-email-srinath.mannam@broadcom.com/T/#u
>     [RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch
> 
> [3] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
>     [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
> 
> [4] https://lore.kernel.org/linux-pci/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
>     [PATCH bisected regression in 4.14] PCI: fix race while enabling upstream bridges concurrently
> 
> [5] https://lore.kernel.org/linux-wireless/04c9b578-693c-1dc6-9f0f-904580231b21@kernel.dk/T/#u
>     iwlwifi firmware load broken in current -git
> 
> [6] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
>     [RFC PATCH] nvme: avoid race-conditions when enabling devices
> 
>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko at yadro.com>
>> ---
>>  drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
>>  drivers/pci/probe.c |  1 +
>>  include/linux/pci.h |  1 +
>>  3 files changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index f006068be209..895201d4c9e6 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>  	struct pci_dev *bridge;
>>  	int retval;
>>  
>> +	mutex_lock(&dev->enable_mutex);
>> +
>>  	bridge = pci_upstream_bridge(dev);
>>  	if (bridge)
>>  		pci_enable_bridge(bridge);
>> @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>  	if (pci_is_enabled(dev)) {
>>  		if (!dev->is_busmaster)
>>  			pci_set_master(dev);
>> +		mutex_unlock(&dev->enable_mutex);
>>  		return;
>>  	}
>>  
>> @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
>>  		pci_err(dev, "Error enabling bridge (%d), continuing\n",
>>  			retval);
>>  	pci_set_master(dev);
>> +	mutex_unlock(&dev->enable_mutex);
>>  }
>>  
>>  static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>>  {
>>  	struct pci_dev *bridge;
>> +	/* Enable-locking of bridges is performed within the pci_enable_bridge() */
>> +	bool need_lock = !dev->subordinate;
>>  	int err;
>>  	int i, bars = 0;
>>  
>> @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>>  		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
>>  	}
>>  
>> -	if (atomic_inc_return(&dev->enable_cnt) > 1)
>> +	if (need_lock)
>> +		mutex_lock(&dev->enable_mutex);
>> +	if (pci_is_enabled(dev)) {
>> +		if (need_lock)
>> +			mutex_unlock(&dev->enable_mutex);
>>  		return 0;		/* already enabled */
>> +	}
>>  
>>  	bridge = pci_upstream_bridge(dev);
>>  	if (bridge)
>> @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
>>  			bars |= (1 << i);
>>  
>>  	err = do_pci_enable_device(dev, bars);
>> -	if (err < 0)
>> -		atomic_dec(&dev->enable_cnt);
>> +	if (err >= 0)
>> +		atomic_inc(&dev->enable_cnt);
>> +	if (need_lock)
>> +		mutex_unlock(&dev->enable_mutex);
>>  	return err;
>>  }
>>  
>> @@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev)
>>  	if (dr)
>>  		dr->enabled = 0;
>>  
>> +	mutex_lock(&dev->enable_mutex);
>>  	dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
>>  		      "disabling already-disabled device");
>>  
>> -	if (atomic_dec_return(&dev->enable_cnt) != 0)
>> +	if (atomic_dec_return(&dev->enable_cnt) != 0) {
>> +		mutex_unlock(&dev->enable_mutex);
>>  		return;
>> +	}
>>  
>>  	do_pci_disable_device(dev);
>>  
>>  	dev->is_busmaster = 0;
>> +
>> +	mutex_unlock(&dev->enable_mutex);
>>  }
>>  EXPORT_SYMBOL(pci_disable_device);
>>  
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 2ec0df04e0dc..977a127ce791 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
>>  	INIT_LIST_HEAD(&dev->bus_list);
>>  	dev->dev.type = &pci_dev_type;
>>  	dev->bus = pci_bus_get(bus);
>> +	mutex_init(&dev->enable_mutex);
>>  
>>  	return dev;
>>  }
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 77448215ef5b..cb2760a31fe2 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -419,6 +419,7 @@ struct pci_dev {
>>  	unsigned int	no_vf_scan:1;		/* Don't scan for VFs after IOV enablement */
>>  	pci_dev_flags_t dev_flags;
>>  	atomic_t	enable_cnt;	/* pci_enable_device has been called */
>> +	struct mutex	enable_mutex;
>>  
>>  	u32		saved_config_space[16]; /* Config space saved at suspend time */
>>  	struct hlist_head saved_cap_space;
>> -- 
>> 2.20.1
>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20190327/7688ede2/attachment-0001.sig>


More information about the Linuxppc-dev mailing list