[PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()

Marta Rybczynska mrybczyn at kalray.eu
Thu Aug 22 22:37:59 AEST 2019



----- On 16 Aug, 2019, at 18:50, Sergey Miroshnichenko s.miroshnichenko at yadro.com wrote:

> This is a yet another approach to fix an old [1-2] concurrency issue, when:
> - two or more devices are being hot-added into a bridge which was
>   initially empty;
> - a bridge with two or more devices is being hot-added;
> - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
> 
> The problem is that a bridge is reported as enabled before the MEM/IO bits
> are actually written to the PCI_COMMAND register, so another driver thread
> starts memory requests through the not-yet-enabled bridge:
> 
> CPU0                                        CPU1
> 
> pci_enable_device_mem()                     pci_enable_device_mem()
>   pci_enable_bridge()                         pci_enable_bridge()
>     pci_is_enabled()
>       return false;
>     atomic_inc_return(enable_cnt)
>     Start actual enabling the bridge
>     ...                                         pci_is_enabled()
>     ...                                           return true;
>     ...                                     Start memory requests <-- FAIL
>     ...
>     Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
> 
> Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
> while enabling upstream bridges"), but adding a per-device mutexes and
> preventing the dev->enable_cnt from from incrementing early.
> 
> CC: Srinath Mannam <srinath.mannam at broadcom.com>
> CC: Marta Rybczynska <mrybczyn at kalray.eu>
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko at yadro.com>
> 
> [1]
> https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
>    [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
> 
> [2]
> https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
>    [RFC PATCH] nvme: avoid race-conditions when enabling devices
> ---
> drivers/pci/pci.c   | 26 ++++++++++++++++++++++----
> drivers/pci/probe.c |  1 +
> include/linux/pci.h |  1 +
> 3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1b27b5af3d55..e7f8c354e644 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
> 	struct pci_dev *bridge;
> 	int retval;
> 
> +	mutex_lock(&dev->enable_mutex);
> +
> 	bridge = pci_upstream_bridge(dev);
> 	if (bridge)
> 		pci_enable_bridge(bridge);
> @@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
> 	if (pci_is_enabled(dev)) {
> 		if (!dev->is_busmaster)
> 			pci_set_master(dev);
> +		mutex_unlock(&dev->enable_mutex);
> 		return;
> 	}
> 

This code is used by numerous drivers and when we've seen that issue I was wondering
if there are some use-cases when this (or pci_disable_device) is called with interrupts
disabled. It seems that it shouldn't be, but a BUG_ON or error when someone calls
it this way would be helpful when debugging.

Marta


More information about the Linuxppc-dev mailing list