[PATCH] powerpc/eeh: Delay slot presence check once driver is notified about the pci error.

Mahesh J Salgaonkar mahesh at linux.ibm.com
Thu Nov 25 16:34:23 AEDT 2021


On 2021-11-24 22:57:13 Wed, Oliver O'Halloran wrote:
> On Wed, Nov 24, 2021 at 7:45 PM Mahesh J Salgaonkar
> <mahesh at linux.ibm.com> wrote:
> >
> > No it doesn't. We will still do a presence check before the recovery
> > process starts. This patch moves the check after notifying the driver to
> > stop active I/O operations. If a presence check finds the device isn't
> > present, we will skip the EEH recovery. However, on a surprise hotplug,
> > the user will see the EEH messages on the console before it finds there
> > is nothing to recover.
> 
> Suppressing the spurious EEH messages was part of why I added that
> check in the first place. If you want to defer the presence check
> until later you should move the stack trace printing, etc to after
> we've confirmed there are still devices present. Considering the

That will help suppressing the spurious EEH messages.

> motivation for this patch is to avoid spurious warnings from the
> driver I don't think printing spurious EEH messages is much of an
> improvement.

Agree.

> 
> The other option would be returning an error from the pseries hotplug
> driver. IIRC that's what pnv_php / OPAL does if the PHB is fenced and
> we can't check the slot presence state.

Yeah. I can change rpaphp_get_sensor_state() to use
rtas_get_sensor_fast() variant which will return immediately with an
error on extended busy error. That way we don't need to move the slot
presence check at all. I did test that and it does fix the problem. But
I wasn't sure if that would have any implications on hotplug driver
behaviour. If pnv_php / OPAL does the same thing then this would be a
cleaner approach to fix this issue. Let me send out the patch with this
other option to fix the pseries hotplug driver instead.

Thanks,
-Mahesh.

-- 
Mahesh J Salgaonkar


More information about the Linuxppc-dev mailing list