[PATCH] powerpc/eeh: Delay slot presence check once driver is notified about the pci error.
Mahesh J Salgaonkar
mahesh at linux.ibm.com
Thu Nov 25 16:34:23 AEDT 2021
On 2021-11-24 22:57:13 Wed, Oliver O'Halloran wrote:
> On Wed, Nov 24, 2021 at 7:45 PM Mahesh J Salgaonkar
> <mahesh at linux.ibm.com> wrote:
> >
> > No, it doesn't. We will still do a presence check before the recovery
> > process starts. This patch moves the check to after the driver has been
> > notified to stop active I/O operations. If the presence check finds the
> > device isn't present, we skip EEH recovery. However, on a surprise
> > hotplug, the user will see the EEH messages on the console before the
> > kernel finds there is nothing to recover.
>
> Suppressing the spurious EEH messages was part of why I added that
> check in the first place. If you want to defer the presence check
> until later you should move the stack trace printing, etc to after
> we've confirmed there are still devices present.
That will help suppress the spurious EEH messages.
> Considering the motivation for this patch is to avoid spurious
> warnings from the driver I don't think printing spurious EEH
> messages is much of an improvement.
Agree.
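A minimal user-space model of the ordering discussed above: drivers are told to stop I/O first, the slot presence check runs next, and the EEH report (console messages, stack trace) is emitted only once a device is confirmed present. All names here (struct slot_model, notify_drivers(), emit_eeh_report(), handle_eeh_event()) are hypothetical and are not kernel EEH APIs; the report is modelled as a counter rather than real console output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one EEH event on a single slot (not kernel code). */
struct slot_model {
	bool present;		/* what the slot presence check will report */
	int drivers_notified;	/* times drivers were told to stop I/O */
	int reports_emitted;	/* stands in for EEH console messages/stack trace */
};

enum recovery_result { RECOVERY_SKIPPED, RECOVERY_STARTED };

static void notify_drivers(struct slot_model *s)
{
	s->drivers_notified++;
}

static void emit_eeh_report(struct slot_model *s)
{
	s->reports_emitted++;
}

/* Quiesce drivers first, then check presence, and only emit the EEH
 * report once a device is confirmed to still be there. */
static enum recovery_result handle_eeh_event(struct slot_model *s)
{
	assert(s != NULL);
	notify_drivers(s);
	if (!s->present)
		return RECOVERY_SKIPPED;	/* surprise hot-unplug: stay quiet */
	emit_eeh_report(s);
	/* reset and resume of the devices would follow here */
	return RECOVERY_STARTED;
}
```

With present set to false, handle_eeh_event() still notifies the drivers once but emits no report, which is the combination of "notify first" and "print only after presence is confirmed".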
>
> The other option would be returning an error from the pseries hotplug
> driver. IIRC that's what pnv_php / OPAL does if the PHB is fenced and
> we can't check the slot presence state.
Yeah. I can change rpaphp_get_sensor_state() to use the
rtas_get_sensor_fast() variant, which returns immediately with an
error on an extended busy status. That way we don't need to move the
slot presence check at all. I tested that and it does fix the problem,
but I wasn't sure whether it would have any implications for hotplug
driver behaviour. If pnv_php / OPAL does the same thing, then this is
a cleaner way to fix the issue. Let me send out a patch that fixes the
pseries hotplug driver instead.
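A user-space sketch of the difference between a sensor read that keeps retrying on extended busy and one that fails fast. The names (fake_fw_get_sensor(), read_sensor_blocking(), read_sensor_fast()) are hypothetical and only model the behaviour described above; they are not the kernel's RTAS helpers, and the -EBUSY / -ETIMEDOUT return values are assumptions for the model. The 9900..9905 range mirrors RTAS_EXTENDED_DELAY_MIN/MAX.

```c
#include <assert.h>
#include <errno.h>

/* Extended-delay status range, mirroring RTAS_EXTENDED_DELAY_MIN/MAX
 * (9900..9905) in the powerpc RTAS headers. */
#define EXT_DELAY_MIN 9900
#define EXT_DELAY_MAX 9905

/* Hypothetical model state: while "fenced" is set, the firmware never
 * reports a sensor value and keeps asking the caller to retry later. */
static int fenced = 1;
static int fw_calls;

/* Hypothetical stand-in for the firmware get-sensor-state call. */
static int fake_fw_get_sensor(int *state)
{
	fw_calls++;
	if (fenced)
		return EXT_DELAY_MIN + 1;	/* "busy, retry after a delay" */
	*state = 1;				/* 1 = adapter present */
	return 0;
}

static int is_ext_delay(int rc)
{
	return rc >= EXT_DELAY_MIN && rc <= EXT_DELAY_MAX;
}

/* Blocking style: retry for as long as firmware reports extended busy.
 * A real caller would sleep between attempts and has no retry cap;
 * max_tries exists only so this model terminates. */
static int read_sensor_blocking(int *state, int max_tries)
{
	int tries = 0;
	int rc;

	assert(max_tries > 0);
	do {
		rc = fake_fw_get_sensor(state);
	} while (is_ext_delay(rc) && ++tries < max_tries);

	return is_ext_delay(rc) ? -ETIMEDOUT : rc;
}

/* Fail-fast style: a single attempt; extended busy becomes an error
 * that the caller (e.g. the EEH presence check) can act on at once. */
static int read_sensor_fast(int *state)
{
	int rc = fake_fw_get_sensor(state);

	return is_ext_delay(rc) ? -EBUSY : rc;
}
```

While the model is fenced, read_sensor_blocking(&state, 1000) makes 1000 firmware calls before giving up, whereas read_sensor_fast(&state) makes one call and returns -EBUSY, so the presence check can report an error instead of stalling.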
Thanks,
-Mahesh.
--
Mahesh J Salgaonkar
More information about the Linuxppc-dev mailing list