[PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
Alexey Kardashevskiy
aik at ozlabs.ru
Tue Nov 20 17:55:41 AEDT 2018
On 20/11/2018 14:51, Sam Bobroff wrote:
> On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
>> Alexey Kardashevskiy <aik at ozlabs.ru> writes:
>>
>>> The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
>>> skiboot's NPU driver does not touch the pci_error_type parameter so
>>> it might have garbage but the powernv code analyzes it nevertheless.
>>>
>>> This initializes pcierr and fstate to zero in all call sites.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>>> ---
>>
>> Can we tag this with a Fixes? And seems like it should probably go to
>> stable, or can we not trigger this path on older kernels?
>>
>> cheers
>
> Hmm, it's triggered by use on an NPU PE so that would be any kernel that
> can run on P8 or later (AFAIK).
>
> It looks like the issue was present earlier, but the code was last
> touched when it was moved, in...
>
> 40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")
>
> ... which was back in v4.1.
The original commit is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c41a7f3f7593fe57
2013-06-20 13:21:09 +0800
===
powerpc/eeh: I/O chip EEH state retrieval
The patch adds I/O chip backend to retrieve the state for the
indicated PE. While the PE state is temperarily unavailable,
the upper layer (powernv platform) should return default delay
(1 second).
===
This was 3.10-rc5 (this is what that sha1's Makefile has).
But this was not the issue till skiboot decided not to zero these
parameters so "Fixes:" should point to what?
>
> Sam.
>
>>> Without this, this happens:
>>>
>>> pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
>>> EEH: PHB#6 failure detected, location: N/A
>>> CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-le_f5a7bb7_aikATfstn1-p1 torvalds#106
>>> Call Trace:
>>> [c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4 (unreliable)
>>> [c000003fea9dfa00] [c000000000038480] eeh_dev_check_failure+0x1f0/0x5f0
>>> [c000003fea9dfaa0] [c0000000000a2768] pnv_pci_read_config+0x128/0x160
>>> [c000003fea9dfae0] [c0000000005d2b0c] pci_bus_read_config_dword+0x9c/0xf0
>>> [c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
>>> [c000003fea9dfbc0] [c0000000005e0730] pci_dev_save_and_disable+0x70/0xa0
>>> [c000003fea9dfbf0] [c0000000005e4078] pci_try_reset_function+0x48/0xc0
>>> [c000003fea9dfc20] [c00800001cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
>>> [c000003fea9dfcf0] [c00800001ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
>>> [c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
>>> [c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
>>> [c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
>>> [c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
>>> EEH: Detected error on PHB#6
>>> EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 fail
>>> ures.
>>> EEH: Notify device drivers to shutdown
>>> EEH: Beginning: 'error_detected(IO frozen)'
>>> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
>>> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
>>> EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
>>> EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
>>> EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
>>> EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
>>> EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
>>> EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can recover'
>>> EEH: Collect temporary log
>>> pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
>>> EEH: Reset without hotplug activity
>>> iommu: Removing device 0006:00:01.0 from group 4
>>> iommu: Removing device 0006:00:01.1 from group 4
>>> iommu: Removing device 0006:00:02.0 from group 4
>>> iommu: Removing device 0006:00:02.1 from group 4
>>> pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
>>> pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
>>> EEH: Sleep 5s ahead of partial hotplug
>>> pci 0004:04 : [PE# 00] Setting up window#0 0..3fffffff pg=1000
>>> pci 0004:05 : [PE# 18] Setting up window#0 0..3fffffff pg=1000
>>> pci 0004:06 : [PE# 30] Setting up window#0 0..3fffffff pg=1000
>>> pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff pg=1000
>>> pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff pg=1000
>>> pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff pg=1000
>>> EEH: Beginning: 'slot_reset'
>>> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
>>> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
>>> EEH: Finished:'slot_reset' with aggregate recovery state:'none'
>>> EEH: Notify device driver to resume
>>> EEH: Beginning: 'resume'
>>> EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
>>> EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
>>> EEH: Finished:'resume'
>>> EEH: Recovery successful.
>>> ---
>>> arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
>>> arch/powerpc/platforms/powernv/pci-ioda.c | 4 ++--
>>> arch/powerpc/platforms/powernv/pci.c | 4 ++--
>>> 3 files changed, 8 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> index abc0be7..f380789 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>> @@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
>>> static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
>>> {
>>> struct pnv_phb *phb = pe->phb->private_data;
>>> - u8 fstate;
>>> - __be16 pcierr;
>>> + u8 fstate = 0;
>>> + __be16 pcierr = 0;
>>> s64 rc;
>>> int result = 0;
>>>
>>> @@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
>>> static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
>>> {
>>> struct pnv_phb *phb = pe->phb->private_data;
>>> - u8 fstate;
>>> - __be16 pcierr;
>>> + u8 fstate = 0;
>>> + __be16 pcierr = 0;
>>> s64 rc;
>>> int result;
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index dd80744..72b5cc0 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct pnv_phb *phb, int pe_no, int opt)
>>> static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
>>> {
>>> struct pnv_ioda_pe *slave, *pe;
>>> - u8 fstate, state;
>>> - __be16 pcierr;
>>> + u8 fstate = 0, state;
>>> + __be16 pcierr = 0;
>>> s64 rc;
>>>
>>> /* Sanity check on PE number */
>>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>>> index 13aef23..db230a35 100644
>>> --- a/arch/powerpc/platforms/powernv/pci.c
>>> +++ b/arch/powerpc/platforms/powernv/pci.c
>>> @@ -602,8 +602,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
>>> static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
>>> {
>>> struct pnv_phb *phb = pdn->phb->private_data;
>>> - u8 fstate;
>>> - __be16 pcierr;
>>> + u8 fstate = 0;
>>> + __be16 pcierr = 0;
>>> unsigned int pe_no;
>>> s64 rc;
>>>
>>> --
>>> 2.17.1
>>
--
Alexey
More information about the Linuxppc-dev
mailing list