[PATCH v2 2/7] powerpc/kernel: Add uevents in EEH error/resume

Russell Currey ruscur at russell.cc
Tue Dec 19 15:59:36 AEDT 2017


On Mon, 2017-12-18 at 22:50 -0600, Bjorn Helgaas wrote:
> [+cc Keith, Gabriele, Dongdong]
> 
> On Mon, Dec 18, 2017 at 04:38:03PM -0600, Bryant G. Ly wrote:
> > Devices can go offline when an EEH error is reported. This patch
> > emits a change uevent on the kernel object to let udev know of the
> > error. When the device resumes, another change uevent reports the
> > device as online. EEH events are thus better propagated to user
> > space for devices on the powerpc arch.
> 
> I'm on vacation and can't review this in detail, but I wonder if you
> can compare this with the uevents we emit for DPC, AER, and hotplug
> events (if any).  I hope we don't end up with userspace having to be
> aware of the differences between EEH, DPC, AER, etc.
> 
> From a very quick look, I only see a few uevents even mentioned in
> drivers/pci: KOBJ_ADD in __pci_hp_register() and KOBJ_CHANGE in the
> SR-IOV code.  I'm worried that we're missing some important uevents
> in the PCI core.  That's not an argument against what you're doing here;
> it just would be nice to fill in any missing pieces in the core also,
> and hopefully make them consistent with these EEH events.

I don't think this needs to be particularly complex. Could we get away
with events for when devices do the following?

- begin recovery
- successfully recover
- fail recovery

It might be worthwhile sorting out some consistent, non-EEH-specific
naming, and then other device error recovery systems can do the same
later.
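
As a rough illustration of the three events above, here is a minimal
userspace sketch of how each recovery state could map to a generic set of
uevent environment strings. Everything named here is an assumption for
discussion, not something taken from the patch: the enum, the helper
recovery_uevent_env(), and the ERROR_EVENT / DEVICE_ONLINE keys are all
hypothetical. In the kernel, an array like this would presumably be passed
to kobject_uevent_env() with KOBJ_CHANGE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, subsystem-neutral recovery states (not from the patch). */
enum recovery_event {
	RECOVERY_BEGIN,
	RECOVERY_SUCCESS,
	RECOVERY_FAILED,
};

/*
 * Fill envp with a NULL-terminated list of "KEY=value" strings describing
 * the event. The key names are illustrative only. The caller must provide
 * room for at least three pointers. Returns 0 on success, -1 if the event
 * is not recognised.
 */
static int recovery_uevent_env(enum recovery_event ev,
			       const char *envp[], size_t n)
{
	assert(n >= 3);

	switch (ev) {
	case RECOVERY_BEGIN:
		envp[0] = "ERROR_EVENT=BEGIN_RECOVERY";
		envp[1] = "DEVICE_ONLINE=0";
		break;
	case RECOVERY_SUCCESS:
		envp[0] = "ERROR_EVENT=SUCCESSFUL_RECOVERY";
		envp[1] = "DEVICE_ONLINE=1";
		break;
	case RECOVERY_FAILED:
		envp[0] = "ERROR_EVENT=FAILED_RECOVERY";
		envp[1] = "DEVICE_ONLINE=0";
		break;
	default:
		return -1;
	}

	envp[2] = NULL;
	return 0;
}
```

Keeping the keys free of any EEH-specific wording means a udev rule could
match on them without caring which recovery mechanism raised the event.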

- Russell

> 
> > Signed-off-by: Bryant G. Ly <bryantly at linux.vnet.ibm.com>
> > Signed-off-by: Juan J. Alvarez <jjalvare at linux.vnet.ibm.com>

