[PATCH v2 2/7] powerpc/kernel: Add uevents in EEH error/resume

Juan Alvarez jjalvare at linux.vnet.ibm.com
Thu Dec 21 14:04:10 AEDT 2017


On 12/18/17 10:59 PM, Russell Currey wrote:

> On Mon, 2017-12-18 at 22:50 -0600, Bjorn Helgaas wrote:
>> [+cc Keith, Gabriele, Dongdong]
>>
>> On Mon, Dec 18, 2017 at 04:38:03PM -0600, Bryant G. Ly wrote:
>>> Devices can go offline when EEH is reported. This patch adds
>>> a change to the kernel object and lets udev know of error.
>>> When device resumes a change is also set reporting device as
>>> online. Therefore, EEH events are better propagated to user
>>> space for devices in powerpc arch.
>> I'm on vacation and can't review this in detail, but I wonder if you
>> can compare this with the uevents we emit for DPC, AER, and hotplug
>> events (if any).  I hope we don't end up with userspace having to be
>> aware of the differences between EEH, DPC, AER, etc.
>>
>> From a very quick look, I only see a few uevents even mentioned in
>> drivers/pci: KOBJ_ADD in __pci_hp_register() and KOBJ_CHANGE in the
>> SR-IOV code.  I'm worried that we're missing some important uevents
>> in
>> the PCI core.  

The only place where I see the KOBJ_REMOVE being used is when the device is 
removed in pci_destroy_dev -> device_del whic will be called implicitly
in permanent failure path of EEH code

>> That's not an argument against what you're doing here;
>> it just would be nice to fill in any missing pieces in the core also,
>> and hopefully make them consistent with these EEH events.
> I don't think this needs to be particularly complex, could we get away
> with events for when devices do the following?
>
> - begin recovery
> - successfully recover
> - fail recovery

If there are no objections in the on going review of this patch 
I can change them to these names:

  - BEGIN_RECOVERY
  - SUCCESSFUL_RECOVERY
  - FAILED_RECOVERY

>
> It might be worthwhile sorting out some consistent, non-EEH-specific
> naming, and then other device error recovery systems can do the same
> later.
>
Do you have a more consistent naming in mind for these events? 

- Juan



More information about the Linuxppc-dev mailing list