[PATCH kernel] powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status

Sam Bobroff sbobroff at linux.ibm.com
Tue Nov 20 14:51:24 AEDT 2018


On Tue, Nov 20, 2018 at 01:51:06PM +1100, Michael Ellerman wrote:
> Alexey Kardashevskiy <aik at ozlabs.ru> writes:
> 
> > The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
> > skiboot's NPU driver does not touch the pci_error_type parameter so
> > it might have garbage but the powernv code analyzes it nevertheless.
> >
> > This initializes pcierr and fstate to zero in all call sites.
> >
> > Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> > ---
> 
> Can we tag this with a Fixes? And seems like it should probably go to
> stable, or can we not trigger this path on older kernels?
> 
> cheers

Hmm, it's triggered by use on an NPU PE, so that would be any kernel that
can run on P8 or later (AFAIK).

It looks like the issue was present earlier, but the code was last
touched when it was moved, in...

40ae5f693f6a ("powerpc/powernv: Drop PHB operation get_state()")

... which was back in v4.1.

Sam.

> > Without this, this happens:
> >
> > pnv_eeh_get_phb_diag: Failure -7 getting PHB#6 diag-data
> > EEH: PHB#6 failure detected, location: N/A
> > CPU: 23 PID: 5939 Comm: qemu-system-ppc Not tainted 4.19.0-le_f5a7bb7_aikATfstn1-p1 #106
> > Call Trace:
> > [c000003fea9df9c0] [c000000000a990ec] dump_stack+0xb0/0xf4 (unreliable)
> > [c000003fea9dfa00] [c000000000038480] eeh_dev_check_failure+0x1f0/0x5f0
> > [c000003fea9dfaa0] [c0000000000a2768] pnv_pci_read_config+0x128/0x160
> > [c000003fea9dfae0] [c0000000005d2b0c] pci_bus_read_config_dword+0x9c/0xf0
> > [c000003fea9dfb40] [c0000000005df3d4] pci_save_state+0x64/0x250
> > [c000003fea9dfbc0] [c0000000005e0730] pci_dev_save_and_disable+0x70/0xa0
> > [c000003fea9dfbf0] [c0000000005e4078] pci_try_reset_function+0x48/0xc0
> > [c000003fea9dfc20] [c00800001cbc1b1c] vfio_pci_ioctl+0x334/0xea0 [vfio_pci]
> > [c000003fea9dfcf0] [c00800001ca9046c] vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> > [c000003fea9dfd10] [c00000000039fd84] do_vfs_ioctl+0xd4/0xa00
> > [c000003fea9dfdb0] [c0000000003a07b4] ksys_ioctl+0x104/0x120
> > [c000003fea9dfe00] [c0000000003a07f8] sys_ioctl+0x28/0x80
> > [c000003fea9dfe20] [c00000000000b3a4] system_call+0x5c/0x70
> > EEH: Detected error on PHB#6
> > EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
> > EEH: Notify device drivers to shutdown
> > EEH: Beginning: 'error_detected(IO frozen)'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: PE#c (PCI 0006:00:01.0): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#c (PCI 0006:00:01.0): vfio-pci driver reports: 'can recover'
> > EEH: PE#c (PCI 0006:00:01.1): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#c (PCI 0006:00:01.1): vfio-pci driver reports: 'can recover'
> > EEH: PE#b (PCI 0006:00:02.0): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#b (PCI 0006:00:02.0): vfio-pci driver reports: 'can recover'
> > EEH: PE#b (PCI 0006:00:02.1): Invoking vfio-pci->error_detected(IO frozen)
> > EEH: PE#b (PCI 0006:00:02.1): vfio-pci driver reports: 'can recover'
> > EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'can recover'
> > EEH: Collect temporary log
> > pnv_pci_dump_phb_diag_data: Unrecognized ioType 0
> > EEH: Reset without hotplug activity
> > iommu: Removing device 0006:00:01.0 from group 4
> > iommu: Removing device 0006:00:01.1 from group 4
> > iommu: Removing device 0006:00:02.0 from group 4
> > iommu: Removing device 0006:00:02.1 from group 4
> > pnv_ioda_freeze_pe: Failure -7 freezing PHB#6-PE#0
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x0 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x1 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x8 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x9 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x10 (-7)
> > pnv_eeh_restore_config: Can't reinit PCI dev 0x11 (-7)
> > EEH: Sleep 5s ahead of partial hotplug
> > pci 0004:04     : [PE# 00] Setting up window#0 0..3fffffff pg=1000
> > pci 0004:05     : [PE# 18] Setting up window#0 0..3fffffff pg=1000
> > pci 0004:06     : [PE# 30] Setting up window#0 0..3fffffff pg=1000
> > pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fffffff pg=1000
> > pci 0006:00:01.0: [PE# 0c] Setting up window 0..3fffffff pg=1000
> > pci 0006:00:02.0: [PE# 0b] Setting up window 0..3fffffff pg=1000
> > EEH: Beginning: 'slot_reset'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: Finished:'slot_reset' with aggregate recovery state:'none'
> > EEH: Notify device driver to resume
> > EEH: Beginning: 'resume'
> > EEH: PE#d (PCI 0006:00:00.0): not actionable (1,1,0)
> > EEH: PE#d (PCI 0006:00:00.1): not actionable (1,1,0)
> > EEH: Finished:'resume'
> > EEH: Recovery successful.
> > ---
> >  arch/powerpc/platforms/powernv/eeh-powernv.c | 8 ++++----
> >  arch/powerpc/platforms/powernv/pci-ioda.c    | 4 ++--
> >  arch/powerpc/platforms/powernv/pci.c         | 4 ++--
> >  3 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > index abc0be7..f380789 100644
> > --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> > +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> > @@ -564,8 +564,8 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
> >  static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
> >  {
> >  	struct pnv_phb *phb = pe->phb->private_data;
> > -	u8 fstate;
> > -	__be16 pcierr;
> > +	u8 fstate = 0;
> > +	__be16 pcierr = 0;
> >  	s64 rc;
> >  	int result = 0;
> >  
> > @@ -603,8 +603,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
> >  static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
> >  {
> >  	struct pnv_phb *phb = pe->phb->private_data;
> > -	u8 fstate;
> > -	__be16 pcierr;
> > +	u8 fstate = 0;
> > +	__be16 pcierr = 0;
> >  	s64 rc;
> >  	int result;
> >  
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index dd80744..72b5cc0 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -604,8 +604,8 @@ static int pnv_ioda_unfreeze_pe(struct pnv_phb *phb, int pe_no, int opt)
> >  static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
> >  {
> >  	struct pnv_ioda_pe *slave, *pe;
> > -	u8 fstate, state;
> > -	__be16 pcierr;
> > +	u8 fstate = 0, state;
> > +	__be16 pcierr = 0;
> >  	s64 rc;
> >  
> >  	/* Sanity check on PE number */
> > diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> > index 13aef23..db230a35 100644
> > --- a/arch/powerpc/platforms/powernv/pci.c
> > +++ b/arch/powerpc/platforms/powernv/pci.c
> > @@ -602,8 +602,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
> >  static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
> >  {
> >  	struct pnv_phb *phb = pdn->phb->private_data;
> > -	u8	fstate;
> > -	__be16	pcierr;
> > +	u8	fstate = 0;
> > +	__be16	pcierr = 0;
> >  	unsigned int pe_no;
> >  	s64	rc;
> >  
> > -- 
> > 2.17.1
> 
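The failure mode described in the commit message can be modelled in a small userspace C sketch. It rests on stated assumptions and is not the kernel or skiboot code: fake_npu_freeze_status() is a hypothetical stand-in for a firmware backend that, like skiboot's NPU driver, never writes the error-type out-parameter; uint16_t stands in for __be16; and 0 is assumed to mean "no error". The sketch shows that whatever the caller's variable held before the call survives it. That is why zero-initializing pcierr and fstate at each call site prevents stale stack data from being analyzed.

```c
#include <stdint.h>

/*
 * Hypothetical firmware stand-in (not the real OPAL call): like the NPU
 * path described above, it reports a freeze state but never writes
 * *pci_error_type. uint16_t stands in for the kernel's __be16.
 */
static int64_t fake_npu_freeze_status(uint8_t *freeze_state,
				      uint16_t *pci_error_type)
{
	*freeze_state = 0;	/* report "not frozen" */
	(void)pci_error_type;	/* deliberately left untouched */
	return 0;		/* success */
}

/*
 * Models a caller. 'stack_contents' stands in for whatever the local
 * variable held before the call; the return value is the pcierr the
 * caller would go on to analyze.
 */
static uint16_t pcierr_seen_by_caller(uint16_t stack_contents)
{
	uint8_t fstate = 0;
	uint16_t pcierr = stack_contents;

	if (fake_npu_freeze_status(&fstate, &pcierr) != 0)
		return 0;

	/* The backend never wrote pcierr, so the prior value survives. */
	return pcierr;
}

/*
 * The pattern the patch applies: start from 0, so an untouched field
 * reads as "no error" (an assumption of this sketch) instead of
 * leftover stack data.
 */
static uint16_t pcierr_with_zero_init(void)
{
	return pcierr_seen_by_caller(0);
}
```

For example, pcierr_seen_by_caller(0xBEEF) returns 0xBEEF, which is garbage the caller would then interpret, while pcierr_with_zero_init() returns 0. Because 0 has the same representation in big- and little-endian byte order, a plain "= 0" initializer is correct for a __be16 without any byte swapping.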


More information about the Linuxppc-dev mailing list