[PATCH 2/8] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag

Sam Bobroff sbobroff at linux.ibm.com
Mon Apr 8 16:50:36 AEST 2019


On Wed, Mar 20, 2019 at 05:02:57PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 20/03/2019 13:58, Sam Bobroff wrote:
> > The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the
> > use of driver callbacks in drivers that have been bound part way
> > through the recovery process. This is necessary to prevent later stage
> > handlers from being called when the earlier stage handlers haven't,
> > which can be confusing for drivers.
> 
> The flag is used from eeh_pe_report()->eeh_pe_report_edev which is
> called many times from eeh_handle_normal_event() (and you clear the flag
> here unconditionally) and once from eeh_handle_special_event() - so this
> is actually the only case now when the flag matters. Is my understanding
> correct? Also is not clearing the flag correct in that case? I do not
> quite understand eeh_handle_normal_event vs. eeh_handle_special_event
> business though.

I'm not sure I fully understand your question, but here's the situation:

* EEH is detected on a PCI device that has no driver bound but there is
  a driver that COULD bind.
* eeh_handle_normal_event() follows the "EEH: Reset with hotplug
  activity" path because the device doesn't (currently) have a driver
  that supports EEH.
* eeh_reset_device() removes the device (pci_hp_remove_devices()).
* eeh_reset_device() re-discovers the device with pci_hp_add_devices().
* As part of re-discovery the PCI subsystem will bind the available driver.
* eeh_handle_normal_event() calls eeh_report_resume() (via eeh_pe_report()).

If the (newly bound) driver has a resume() handler, then
eeh_report_resume() will call it and AFAIK this will cause a problem for
some drivers because their error_detected() handler wasn't called first.

The fix for this is to have eeh_add_device_late() set EEH_DEV_NO_HANDLER
so that we can detect that the device has been added DURING recovery,
and avoid calling it's handlers later.

I see what you mean about the eeh_handle_special_event() case, where
EEH_DEV_NO_HANDLER isn't cleared before calling eeh_pe_report(), and I
think it's a bug! I'll fix it in the next version.

(Cleaning up that flag is on my list. I don't think it's a very good
solution.)

> > 
> > However, the flag is set for all devices that are added after boot
> > time and only cleared at the end of the EEH recovery process. This
> > results in hot plugged devices erroneously having the flag set during
> > the first recovery after they are added (causing their driver's
> > handlers to be incorrectly ignored).
> > 
> > To remedy this, clear the flag at the beginning of recovery
> > processing. The flag is still cleared at the end of recovery
> > processing, although it is no longer really necessary.
> 
> Then may be remove that redundant clearing?

I don't really mind either way; clearing it when we are finished with
recovery seems "cleaner" to me but it doesn't have any function. (In
any case, I think I will eventually want to remove it.)

> > 
> > Signed-off-by: Sam Bobroff <sbobroff at linux.ibm.com>
> > ---
> >  arch/powerpc/kernel/eeh_driver.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> > index 6f3ee30565dd..4c34b9901f15 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
> >  		result = PCI_ERS_RESULT_DISCONNECT;
> >  	}
> >  
> > +	eeh_for_each_pe(pe, tmp_pe)
> > +		eeh_pe_for_each_dev(tmp_pe, edev, tmp)
> > +			edev->mode &= ~EEH_DEV_NO_HANDLER;
> > +
> >  	/* Walk the various device drivers attached to this slot through
> >  	 * a reset sequence, giving each an opportunity to do what it needs
> >  	 * to accomplish the reset.  Each child gets a report of the
> > 
> 
> -- 
> Alexey
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20190408/0bc4981b/attachment.sig>


More information about the Linuxppc-dev mailing list