[PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
Marta Rybczynska
mrybczyn at kalray.eu
Thu Aug 22 22:37:59 AEST 2019
----- On 16 Aug, 2019, at 18:50, Sergey Miroshnichenko s.miroshnichenko at yadro.com wrote:
> This is yet another approach to fix an old [1-2] concurrency issue, which occurs when:
> - two or more devices are being hot-added into a bridge which was
> initially empty;
> - a bridge with two or more devices is being hot-added;
> - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.
>
> The problem is that a bridge is reported as enabled before the MEM/IO bits
> are actually written to the PCI_COMMAND register, so another driver thread
> starts memory requests through the not-yet-enabled bridge:
>
>   CPU0                                       CPU1
>
>   pci_enable_device_mem()                    pci_enable_device_mem()
>     pci_enable_bridge()                        pci_enable_bridge()
>       pci_is_enabled()
>         return false;
>       atomic_inc_return(enable_cnt)
>       Start actual enabling the bridge
>       ...                                        pci_is_enabled()
>       ...                                          return true;
>       ...                                      Start memory requests <-- FAIL
>       ...
>       Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
>
> Protect pci_enable/disable_device() and pci_enable_bridge(). This is
> similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
> while enabling upstream bridges"), but adds per-device mutexes and
> prevents dev->enable_cnt from being incremented early.
>
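To make the ordering concrete, here is a minimal user-space sketch of the idea described above. It is a model only, not the patch itself: struct model_dev, model_enable_bridge() and model_run() are hypothetical names, a pthread mutex stands in for dev->enable_mutex, and a plain integer stands in for the PCI_COMMAND register. The point it tries to show is that the enable count becomes visible only after the MEMORY bit is set, so a second thread that finds the bridge "enabled" can safely start memory requests.

```c
#include <pthread.h>
#include <stddef.h>

#define MODEL_COMMAND_MEMORY 0x2   /* mirrors the PCI_COMMAND_MEMORY bit */
#define MODEL_MAX_THREADS    16

/* Hypothetical stand-in for the relevant fields of struct pci_dev. */
struct model_dev {
	pthread_mutex_t enable_mutex;  /* models dev->enable_mutex */
	int enable_cnt;                /* models dev->enable_cnt, guarded by the mutex */
	unsigned int command;          /* models the PCI_COMMAND register */
	struct model_dev *upstream;    /* models pci_upstream_bridge(dev) */
};

struct model_worker {
	struct model_dev *bridge;
	int saw_disabled_bridge;
};

/* Enable the upstream chain first, then this bridge, all under its mutex. */
void model_enable_bridge(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->enable_mutex);

	if (dev->upstream)
		model_enable_bridge(dev->upstream);

	if (dev->enable_cnt > 0) {
		/* Already enabled: the MEMORY bit is guaranteed to be set. */
		pthread_mutex_unlock(&dev->enable_mutex);
		return;
	}

	/* A slow config-space write would sit here in real hardware. */
	dev->command |= MODEL_COMMAND_MEMORY;
	dev->enable_cnt++;  /* count is raised only after the bit is set */

	pthread_mutex_unlock(&dev->enable_mutex);
}

/* Each worker models a driver thread enabling a device behind the bridge. */
static void *model_worker_fn(void *arg)
{
	struct model_worker *w = arg;

	model_enable_bridge(w->bridge);
	/* "Memory request": the bridge must be decoding memory by now. */
	w->saw_disabled_bridge = !(w->bridge->command & MODEL_COMMAND_MEMORY);
	return NULL;
}

/* Returns how many workers found the bridge not yet decoding memory. */
int model_run(int nthreads)
{
	struct model_dev bridge = { .enable_cnt = 0, .command = 0, .upstream = NULL };
	struct model_worker w[MODEL_MAX_THREADS];
	pthread_t tid[MODEL_MAX_THREADS];
	int i, created = 0, failures = 0;

	if (nthreads > MODEL_MAX_THREADS)
		nthreads = MODEL_MAX_THREADS;

	pthread_mutex_init(&bridge.enable_mutex, NULL);

	for (i = 0; i < nthreads; i++) {
		w[i].bridge = &bridge;
		w[i].saw_disabled_bridge = 0;
		if (pthread_create(&tid[created], NULL, model_worker_fn, &w[i]))
			break;
		created++;
	}
	for (i = 0; i < created; i++) {
		pthread_join(tid[i], NULL);
		failures += w[i].saw_disabled_bridge;
	}

	pthread_mutex_destroy(&bridge.enable_mutex);
	return failures;
}
```

With the mutex held across both the check and the register write, model_run() should report no worker that saw a disabled bridge. The lock order in the model is always child first, then upstream, which is what keeps the recursive call from deadlocking.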
> CC: Srinath Mannam <srinath.mannam at broadcom.com>
> CC: Marta Rybczynska <mrybczyn at kalray.eu>
> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko at yadro.com>
>
> [1]
> https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u
> [RFC PATCH v3] pci: Concurrency issue during pci enable bridge
>
> [2]
> https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u
> [RFC PATCH] nvme: avoid race-conditions when enabling devices
> ---
> drivers/pci/pci.c | 26 ++++++++++++++++++++++----
> drivers/pci/probe.c | 1 +
> include/linux/pci.h | 1 +
> 3 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 1b27b5af3d55..e7f8c354e644 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
>      struct pci_dev *bridge;
>      int retval;
>
> +    mutex_lock(&dev->enable_mutex);
> +
>      bridge = pci_upstream_bridge(dev);
>      if (bridge)
>          pci_enable_bridge(bridge);
> @@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>      if (pci_is_enabled(dev)) {
>          if (!dev->is_busmaster)
>              pci_set_master(dev);
> +        mutex_unlock(&dev->enable_mutex);
>          return;
>      }
>
This code is used by numerous drivers, and when we saw this issue I wondered
whether there are use cases where this (or pci_disable_device()) is called with
interrupts disabled. It seems that it shouldn't be, but a BUG_ON or an error
when someone calls it this way would help with debugging.
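
As a rough illustration of the kind of check I have in mind, here is a user-space sketch. Everything in it is hypothetical: the sketch_* names are made up, and a plain flag stands in for the CPU interrupt state that the kernel would query with irqs_disabled(). In the kernel itself, mutex_lock() already calls might_sleep(), which reports such callers when CONFIG_DEBUG_ATOMIC_SLEEP is set, so an explicit check would mainly help on kernels built without that option.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the interrupt state of the current CPU.
 * In the kernel this would be queried with irqs_disabled().
 */
static bool sketch_irqs_off;

void sketch_local_irq_disable(void)
{
	sketch_irqs_off = true;
}

void sketch_local_irq_enable(void)
{
	sketch_irqs_off = false;
}

/*
 * Model of an enable path that takes a sleeping lock: refuse to run
 * and report the caller instead of proceeding with interrupts off.
 */
int sketch_enable_device(int *enable_cnt)
{
	if (sketch_irqs_off) {
		fprintf(stderr, "%s: called with interrupts disabled\n", __func__);
		return -EINVAL;
	}

	(*enable_cnt)++;
	return 0;
}
```

Returning an error keeps the system running, while a BUG_ON-style check would stop at the offending caller; which of the two is preferable is open for discussion.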
Marta