[PATCH v4 25/32] cxlflash: Fix to prevent EEH recovery failure
Daniel Axtens
dja at axtens.net
Tue Sep 29 11:25:50 AEST 2015
"Matthew R. Ochs" <mrochs at linux.vnet.ibm.com> writes:
> The process_sense() routine can perform a read capacity which
> can take some time to complete. If an EEH occurs while waiting
> on the read capacity, the EEH handler is unable to obtain the
> context's mutex in order to put the context in an error state.
> The EEH handler will sit and wait until the context is free,
> but this wait can last longer than the EEH handler tolerates,
> leading to a failed recovery.
I'm not quite clear on what you mean by the EEH handler timing
out. AFAIK there's nothing in eehd or the EEH core that times out if a
driver doesn't respond; indeed, it's pretty easy to hang eehd with a
misbehaving driver.

Are you referring to your own internal timeouts, i.e.
cxlflash_wait_for_pci_err_recovery() and anything else that uses
CXLFLASH_PCI_ERROR_RECOVERY_TIMEOUT?
Regards,
Daniel
>
> To address this issue, make the context unavailable to new,
> non-system owned threads and release the context while calling
> into process_sense(). After returning from process_sense() the
> context mutex is reacquired and the context is made available
> again. The context can be safely moved to the error state if
> needed during the unavailable window as no other threads will
> hold its reference.
>
> Signed-off-by: Matthew R. Ochs <mrochs at linux.vnet.ibm.com>
> Signed-off-by: Manoj N. Kumar <manoj at linux.vnet.ibm.com>
> Reviewed-by: Brian King <brking at linux.vnet.ibm.com>
> ---
> drivers/scsi/cxlflash/superpipe.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
> index a6316f5..7283e83 100644
> --- a/drivers/scsi/cxlflash/superpipe.c
> +++ b/drivers/scsi/cxlflash/superpipe.c
> @@ -1787,12 +1787,21 @@ static int cxlflash_disk_verify(struct scsi_device *sdev,
>  	 * inquiry (i.e. the Unit attention is due to the WWN changing).
>  	 */
>  	if (verify->hint & DK_CXLFLASH_VERIFY_HINT_SENSE) {
> +		/* Can't hold mutex across process_sense/read_cap16,
> +		 * since we could have an intervening EEH event.
> +		 */
> +		ctxi->unavail = true;
> +		mutex_unlock(&ctxi->mutex);
>  		rc = process_sense(sdev, verify);
>  		if (unlikely(rc)) {
>  			dev_err(dev, "%s: Failed to validate sense data (%d)\n",
>  				__func__, rc);
> +			mutex_lock(&ctxi->mutex);
> +			ctxi->unavail = false;
>  			goto out;
>  		}
> +		mutex_lock(&ctxi->mutex);
> +		ctxi->unavail = false;
>  	}
>
>  	switch (gli->mode) {
> --
> 2.1.0
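
For reference, the locking pattern the patch describes (flag the context
unavailable, drop the mutex around the slow call, retake it, clear the
flag) can be sketched in userspace with pthreads. This is only an
illustration under stated assumptions, not the driver's code: struct ctx,
ctx_get(), ctx_mark_error(), slow_op_with_error() and ctx_verify() are
hypothetical names, and the "error event" is simulated from inside the
slow operation rather than by a real EEH handler.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's context structure. */
struct ctx {
	pthread_mutex_t mutex;
	bool unavail;        /* set while the owner has dropped the mutex */
	bool err_state;      /* set by the recovery path */
	bool lookup_refused; /* demo only: records a refused lookup */
};

/* Lookup path for ordinary users: returns the context locked, or NULL
 * if it is currently marked unavailable.
 */
static struct ctx *ctx_get(struct ctx *c)
{
	pthread_mutex_lock(&c->mutex);
	if (c->unavail) {
		pthread_mutex_unlock(&c->mutex);
		return NULL;
	}
	return c;
}

/* Recovery path: needs the mutex only briefly to flag the error. */
static void ctx_mark_error(struct ctx *c)
{
	pthread_mutex_lock(&c->mutex);
	c->err_state = true;
	pthread_mutex_unlock(&c->mutex);
}

/* Stand-in for process_sense(): pretend an error event and a competing
 * lookup both arrive while the slow operation is in flight.
 */
static int slow_op_with_error(struct ctx *c)
{
	ctx_mark_error(c);
	c->lookup_refused = (ctx_get(c) == NULL);
	return 0;
}

/* Called with c->mutex held; returns with it held again. */
static int ctx_verify(struct ctx *c, int (*slow_op)(struct ctx *))
{
	int rc;

	c->unavail = true;               /* keep new users out */
	pthread_mutex_unlock(&c->mutex); /* let the recovery path in */

	rc = slow_op(c);

	pthread_mutex_lock(&c->mutex);
	c->unavail = false;
	return rc;
}
```

In this sketch the recovery path never waits on the slow operation,
because the mutex is free for the whole window, while ordinary lookups
are turned away by the unavail flag until ctx_verify() retakes the lock.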