[PATCH] powerpc/pseries/eeh: Fix get PE state translation
Narayana Murty N
nnmlinux at linux.ibm.com
Thu Dec 12 18:57:05 AEDT 2024
On 22/11/24 1:26 AM, Ritesh Harjani (IBM) wrote:
> Vaibhav Jain <vaibhav at linux.ibm.com> writes:
>
>> Hi Ritesh,
>>
>> Thanks for looking into this patch. My responses on behalf of Narayana
>> below:
>>
>> "Ritesh Harjani (IBM)" <ritesh.list at gmail.com> writes:
>>
>>> Narayana Murty N <nnmlinux at linux.ibm.com> writes:
>>>
>>>> The PE Reset State "0" obtained from RTAS calls
>>>> ibm_read_slot_reset_[state|state2] indicates that
>>>> the Reset is deactivated and the PE is not in the MMIO
>>>> Stopped or DMA Stopped state.
>>>>
>>>> With PE Reset State "0", the MMIO and DMA is allowed for
>>>> the PE.
>>> Looking at the PAPR spec - I do agree that it states the same. i.e.
>>> The "0" Initial PE state means the "Not Reset", "Load/Store allowed" &
>>> "DMA allowed" (Normal Operations).
>>>
>>>> The function pseries_eeh_get_state() is currently
>>>> not indicating that to the caller because of which the
>>>> drivers are unable to resume the MMIO and DMA activity.
>>> It's new to me, but could you help explain the user visible effect
>>> of what gets broken. Since this looks like pseries_eeh_get_state() has
>>> always been like this when it got first implemented.
>>> Is there also a unit test somewhere which you are testing?
>> Without this patch a userspace process performing VFIO EEH-Recovery wont
>> get the correct indication that EEH recovery is completed. Test code at
>> [2] has an example test case that uses VFIO to inject an EEH error on to
>> a pci-device and then waits on it to reach 'EEH_PE_STATE_NORMAL' state
>> . That state is never reached without this patch.
>>
>> [2] :
>> https://github.com/nnmwebmin/vfio-ppc-tests/commit/006d8fdc41a4
>>
> Right. Thanks for helping with that test code. It's much clearer now. So
> after the error inject and/or the PE hot reset, the PE is never reaching
> it's normal state. That is due to this kernel bug in the pseries EEH
> handling, where it fails to advertise the MMIO & DMA enabled capability
> flag back to the caller. This therefore can cause the userspace VFIO
> driver to incorrectly assume that MMIO/DMA operations cannot be done.
>
>>> IIUC eeh_pe_get_state() was implemented[1] for supporting EEH for VFIO PCI
>>> devices. i.e. the VFIO_EEH_PE_GET_STATE operation of VFIO EEH PE ioctl op
>>> uses pseries_eeh_get_state() helper to query PE state on pseries LPAR.
>>> So are you suggesting that EEH functionality for VFIO PCI device was
>>> never enabled/tested before on pseries?
>> VFIO-EEH had been broken for pseries for a quite some time and was
>> recently fixed in kernel. So this issue was probably not discovered
>> until recently when we started testing with userspace VFIO.
>>
> ohk right, then maybe we might have started testing it after the eeh
> error inject op was implemented for pseries here [1].
>
> [1]: https://lore.kernel.org/linuxppc-dev/20240909140220.529333-1-nnmlinux@linux.ibm.com/#t
>
>>> [1]: https://lore.kernel.org/all/1402364517-28561-3-git-send-email-gwshan@linux.vnet.ibm.com/
>>>
>>> Checking the powernv side of implementation I do see that it does
>>> enables the EEH_STATE_[MMIO|DMA]_ENABLED flags in the result mask for
>>> the callers. So doing the same for pseries eeh get state implementation
>>> does look like the right thing to do here IMO.
>>>
>>>> The patch fixes that by reflecting what is actually allowed.
>>> You say this is "fixes" so I am also assuming you are also looking for
>>> stable backports of this? If yes - could you please also add the "Fixes"
>>> tag and cc stable?
>> Yes, agree will re-send adding the fixes tag.
>>
> Yes and maybe let's also add some more context & information to the
> commit message from this discussion.
>
> -ritesh
yes Ritesh, added fixes tag and send it to the stable branch also.
>
>>> -ritesh
>>>
>>>> Signed-off-by: Narayana Murty N <nnmlinux at linux.ibm.com>
>>>> ---
>>>> arch/powerpc/platforms/pseries/eeh_pseries.c | 6 ++++--
>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> index 1893f66371fa..b12ef382fec7 100644
>>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> @@ -580,8 +580,10 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *delay)
>>>>
>>>> switch(rets[0]) {
>>>> case 0:
>>>> - result = EEH_STATE_MMIO_ACTIVE |
>>>> - EEH_STATE_DMA_ACTIVE;
>>>> + result = EEH_STATE_MMIO_ACTIVE |
>>>> + EEH_STATE_DMA_ACTIVE |
>>>> + EEH_STATE_MMIO_ENABLED |
>>>> + EEH_STATE_DMA_ENABLED;
>>>> break;
>>>> case 1:
>>>> result = EEH_STATE_RESET_ACTIVE |
>>>> --
>>>> 2.45.2
>> --
>> Cheers
>> ~ Vaibhav
Thank you Ritesh and Vaibhav for reviewing the patch. Please find the
changes addressed in
https://lore.kernel.org/all/20241212075044.10563-1-nnmlinux@linux.ibm.com/.
Regards,
Narayana Murty N
More information about the Linuxppc-dev
mailing list