[PATCH v2 1/3] powerpc/eeh: Ignore error handlers in eeh_pe_reset_and_recover()
David Gibson
david at gibson.dropbear.id.au
Tue Apr 26 15:29:59 AEST 2016
On Fri, Apr 22, 2016 at 11:28:02PM +1000, Gavin Shan wrote:
> The function eeh_pe_reset_and_recover() is used to recover from EEH
> errors when a passthrough device is transferred to the guest and
> back, meaning the device's driver is vfio-pci or none.
> When the driver is vfio-pci, which provides only the error_detected()
> error handler, the handler simply stops the guest, which is not the
> expected behaviour. On the other hand, no error handlers will
> be called if we don't have a bound driver.
>
> This ignores all error handlers provided by the device driver in
> eeh_pe_reset_and_recover() to avoid the exceptional behaviour.
>
> Fixes: 5cfb20b9 ("powerpc/eeh: Emulate EEH recovery for VFIO devices")
> Cc: stable at vger.kernel.org #v3.18+
> Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
> Reviewed-by: Russell Currey <ruscur at russell.cc>
> ---
> arch/powerpc/kernel/eeh_driver.c | 11 +----------
> 1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index fb6207d..1c7d703 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -552,7 +552,7 @@ static int eeh_clear_pe_frozen_state(struct eeh_pe *pe,
>
> int eeh_pe_reset_and_recover(struct eeh_pe *pe)
> {
> - int result, ret;
> + int ret;
>
> /* Bail if the PE is being recovered */
> if (pe->state & EEH_PE_RECOVERING)
> @@ -564,9 +564,6 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
> /* Save states */
> eeh_pe_dev_traverse(pe, eeh_dev_save_state, NULL);
>
> - /* Report error */
> - eeh_pe_dev_traverse(pe, eeh_report_error, &result);

Ok, so after chatting to Gavin, I've made sense of this. The basic
thing here is that eeh_pe_reset_and_recover() should be discarding any
errors from before the reset, not reporting them - the whole point is
that we know things have gone bad, and we want to clear back to a good
state.

> /* Issue reset */
> ret = eeh_reset_pe(pe);
> if (ret) {
> @@ -581,15 +578,9 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
> return ret;
> }
>
> - /* Notify completion of reset */
> - eeh_pe_dev_traverse(pe, eeh_report_reset, &result);

However, it's not clear if removing the report of a reset makes sense.
There are no current users of reset notification IIUC, but if we're
going to remove the reset reporting, we should put that in a separate
patch with its own justification, and remove the other caller as well.

> /* Restore device state */
> eeh_pe_dev_traverse(pe, eeh_dev_restore_state, NULL);
>
> - /* Resume */
> - eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);

And I'm not sure if it makes sense to remove the resume notification either.

> /* Clear recovery mode */
> eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
>
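
For illustration only, the control flow that the patch leaves behind can
be modelled in userspace roughly as below. This is a minimal sketch, not
the kernel code: struct mock_pe, MOCK_PE_RECOVERING, mock_reset() and
mock_reset_and_recover() are hypothetical stand-ins invented here, the
step that clears the frozen state is omitted, and returning 0 when a
recovery is already in progress is an assumption. The point it shows is
that state is saved, the reset is issued, state is restored and the
recovery flag is cleared, with no driver error handler called anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for EEH_PE_RECOVERING; not the kernel's value. */
#define MOCK_PE_RECOVERING 0x1

/* Hypothetical, heavily simplified model of a PE with one device. */
struct mock_pe {
	int state;       /* bit flags; MOCK_PE_RECOVERING while recovering */
	int cfg;         /* stands in for the device's config space */
	int saved_cfg;   /* snapshot taken before the reset */
	int reset_fails; /* non-zero makes the mock reset fail */
};

/* Mock reset: wipes the device state, or fails on request. */
static int mock_reset(struct mock_pe *pe)
{
	if (pe->reset_fails)
		return -5; /* arbitrary error code for the sketch */
	pe->cfg = 0;
	return 0;
}

/*
 * Model of the flow after the patch: save, reset, restore, clear flag.
 * No error, reset or resume notification is delivered to any driver.
 */
int mock_reset_and_recover(struct mock_pe *pe)
{
	int ret;

	assert(pe != NULL);

	/* Bail if the PE is being recovered (return value assumed). */
	if (pe->state & MOCK_PE_RECOVERING)
		return 0;

	/* Put the PE into recovery mode */
	pe->state |= MOCK_PE_RECOVERING;

	/* Save state */
	pe->saved_cfg = pe->cfg;

	/* Issue reset; on failure leave recovery mode and report it */
	ret = mock_reset(pe);
	if (ret) {
		pe->state &= ~MOCK_PE_RECOVERING;
		return ret;
	}

	/* Restore device state */
	pe->cfg = pe->saved_cfg;

	/* Clear recovery mode */
	pe->state &= ~MOCK_PE_RECOVERING;

	return 0;
}
```

In this model a caller sees the device state come back unchanged after a
successful call, and sees the recovery flag cleared on both the success
and the failure path. Whether reset and resume notifications should also
be dropped is a separate question that the sketch does not settle.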
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson