cxl: fix nested locking hang during EEH hotplug
patch-notifications at ellerman.id.au
Mon Feb 27 21:11:01 AEDT 2017
On Mon, 2017-02-06 at 01:07:17 UTC, Andrew Donnellan wrote:
> Commit 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU
> not configured") introduced a rwsem to fix an invalid memory access that
> occurred when someone attempts to access the config space of an AFU on a
> vPHB whilst the AFU is deconfigured, such as during EEH recovery.
> It turns out that it's possible to run into a nested locking issue when EEH
> recovery fails and a full device hotplug is required.
> cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on
> configured_rwsem. When EEH recovery fails, the EEH code calls
> pci_hp_remove_devices() to remove the device, which in turn calls
> cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries
> to grab the writer lock that's already held.
> Standard rwsem semantics don't express what we really want to do here and
> don't allow for nested locking. Fix this by replacing the rwsem with an
> atomic_t which we can control more finely. Allow the AFU to be locked
> multiple times so long as there are no readers.
> Fixes: 14a3ae34bfd0 ("cxl: Prevent read/write to AFU config space while AFU not configured")
> Cc: stable at vger.kernel.org # v4.9+
> Signed-off-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
> Acked-by: Frederic Barrat <fbarrat at linux.vnet.ibm.com>
Applied to powerpc next, thanks.
More information about the Linuxppc-dev