[PATCH v6 25/37] cxlflash: Fix to prevent EEH recovery failure

Tomas Henzl thenzl at redhat.com
Sat Oct 24 00:54:10 AEDT 2015


On 21.10.2015 22:14, Matthew R. Ochs wrote:
> The process_sense() routine can perform a read capacity which
> can take some time to complete. If an EEH occurs while waiting
> on the read capacity, the EEH handler will wait to obtain the
> context's mutex in order to put the context in an error state.
> The EEH handler will sit and wait until the context is free,
> but this wait can potentially last forever (deadlock) if the
> scsi_execute() that performs the read capacity experiences a
> timeout and calls into the reset callback. When that occurs,
> the reset callback sees that the device is already being reset
> and waits for the reset to complete. This leaves the two threads
> waiting on each other.
>
> To address this issue, make the context unavailable to new,
> non-system-owned threads and release the context while calling
> into process_sense(). After returning from process_sense() the
> context mutex is reacquired and the context is made available
> again. The context can be safely moved to the error state if
> needed during the unavailable window as no other threads will
> hold its reference.
>
> Signed-off-by: Matthew R. Ochs <mrochs at linux.vnet.ibm.com>
> Signed-off-by: Manoj N. Kumar <manoj at linux.vnet.ibm.com>
> Reviewed-by: Brian King <brking at linux.vnet.ibm.com>
> Reviewed-by: Daniel Axtens <dja at axtens.net>
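
For anyone reading along in the archive, the approach described in the
commit message can be sketched in user space roughly as below. This is
not the cxlflash code. It is a minimal pthread model, and every name in
it (struct ctx, ctx_get(), ctx_put(), eeh_handler(),
slow_process_sense(), process_sense_unlocked(), demo()) is a
hypothetical stand-in chosen for illustration. The slow operation is
modeled as something that cannot finish until the error handler has
run, which is the situation that would deadlock if the context mutex
were still held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-context structure. */
struct ctx {
	pthread_mutex_t mutex;
	bool unavail;       /* true: new non-system threads are refused */
	bool err_recovery;  /* set by the error handler */
	bool user_refused;  /* records that a user lookup was refused */
};

/* Take the context. Returns 0 with the mutex held, or -1 if refused. */
static int ctx_get(struct ctx *c, bool system_owned)
{
	pthread_mutex_lock(&c->mutex);
	if (c->unavail && !system_owned) {
		pthread_mutex_unlock(&c->mutex);
		return -1;
	}
	return 0;
}

static void ctx_put(struct ctx *c)
{
	pthread_mutex_unlock(&c->mutex);
}

/* Models the error handler: a system-owned thread that needs the mutex. */
static void *eeh_handler(void *arg)
{
	struct ctx *c = arg;

	if (ctx_get(c, true) == 0) {
		c->err_recovery = true;
		ctx_put(c);
	}
	return NULL;
}

/*
 * Models a slow operation that only completes once the error handler
 * has finished. If the caller still held c->mutex, the join below
 * would never return.
 */
static void slow_process_sense(struct ctx *c)
{
	pthread_t t;

	/* A new non-system lookup during the window is turned away. */
	c->user_refused = (ctx_get(c, false) != 0);

	pthread_create(&t, NULL, eeh_handler, c);
	pthread_join(t, NULL);
}

/* Caller holds c->mutex on entry and again on return. */
static void process_sense_unlocked(struct ctx *c)
{
	c->unavail = true;               /* close the door to new users */
	pthread_mutex_unlock(&c->mutex); /* let the error handler in */
	slow_process_sense(c);
	pthread_mutex_lock(&c->mutex);   /* reacquire before continuing */
	c->unavail = false;              /* context is available again */
}

/* Walks through the sequence once; returns 0 if it behaved as expected. */
static int demo(void)
{
	struct ctx c = { .mutex = PTHREAD_MUTEX_INITIALIZER };

	assert(ctx_get(&c, false) == 0);
	process_sense_unlocked(&c);
	assert(c.user_refused && c.err_recovery && !c.unavail);
	ctx_put(&c);
	return 0;
}
```

Calling demo() from a main() built with -pthread runs the sequence
once. Setting the unavailable flag before dropping the mutex is what
keeps new user threads out during the window, so the handler can move
the context to the error state without anyone else holding it.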

Reviewed-by: Tomas Henzl <thenzl at redhat.com>

Tomas


