[PATCH kernel] powerpc/ioda/npu2: Call hot reset skiboot hook when disabling NPU

Alexey Kardashevskiy aik at ozlabs.ru
Sat Jul 14 21:34:50 AEST 2018


On Thu, 12 Jul 2018 11:38:34 +1000
Alistair Popple <alistair at popple.id.au> wrote:

> Hi Alexey,
> 
> On Wednesday, 11 July 2018 7:45:10 PM AEST Alexey Kardashevskiy wrote:
> > On Thu,  7 Jun 2018 17:06:07 +1000
> > Alexey Kardashevskiy <aik at ozlabs.ru> wrote:
> >   
> > > This brings the NPU2 into a safe mode in which it does not throw an HMI
> > > if the GPU coherent memory goes away.
> 
> It might be helpful if you could describe the problem and what you are
> trying to solve in a bit more depth. Assuming the memory was online, how are
> you offlining it?

Fair enough. I am offlining it by simply killing a guest, which triggers
a GPU PCI reset. Before this patch, the PCI reset would trigger an HMI:
the PTEs were still present in both the QEMU and the guest page tables,
which caused prefetching and thus killed the host.


> If the memory has been online, merely fencing/hot-resetting the
> NVLink is likely not sufficient, as you also need to flush caches prior to
> taking the links down.

I'd expect the guest driver to take care of this. If that is not enough
and I need to issue some other MMIO access (in addition to the ATS/TLB
invalidation, which I'll add anyway), then what is it?


> 
> - Alistair
> 
> > > Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>  
> > 
> > 
> > Anyone, ping?
> > 
> >   
> > > ---
> > > 
> > > The main aim of this is NVLink2 pass-through; it helps a lot there.
> > > 
> > > 
> > > ---
> > >  arch/powerpc/platforms/powernv/pci-ioda.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > index 66c2804..29f798c 100644
> > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > @@ -3797,6 +3797,16 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
> > >  		pnv_ioda_release_pe(pe);
> > >  }
> > >  
> > > +static void pnv_npu_disable_device(struct pci_dev *pdev)
> > > +{
> > > +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> > > +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> > > +
> > > +	if (eehpe && eeh_ops && eeh_ops->reset) {
> > > +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> > > +	}
> > > +}
> > > +
> > >  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> > >  {
> > >  	struct pnv_phb *phb = hose->private_data;
> > > @@ -3841,6 +3851,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
> > >  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> > >  	.dma_set_mask		= pnv_npu_dma_set_mask,
> > >  	.shutdown		= pnv_pci_ioda_shutdown,
> > > +	.disable_device		= pnv_npu_disable_device,
> > >  };
> > >  
> > >  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {  
> > 
> > 
> > 
> > --
> > Alexey
> >   
> 
> 
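For readers unfamiliar with the hook, below is a rough userspace C sketch of how the .disable_device callback added by the patch gets used. The *_stub types, eeh_ops_model, fake_reset() and MODEL_RESET_HOT are hypothetical stand-ins for struct pci_dev, struct eeh_dev, struct eeh_pe, eeh_ops and EEH_RESET_HOT; the numeric value of MODEL_RESET_HOT is arbitrary. The sketch assumes that the generic powerpc disable path (pcibios_disable_device()) calls the PHB's controller_ops.disable_device only when it is non-NULL, so only PHBs using pnv_npu_ioda_controller_ops get the hot reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for EEH_RESET_HOT; the value is arbitrary. */
#define MODEL_RESET_HOT 1

/* Hypothetical userspace stand-ins for the kernel structures. */
struct eeh_pe_stub { int last_reset; int resets; };
struct eeh_dev_stub { struct eeh_pe_stub *pe; };
struct pci_dev_stub { struct eeh_dev_stub *edev; };

struct eeh_ops_stub {
	int (*reset)(struct eeh_pe_stub *pe, int option);
};

struct controller_ops_stub {
	void (*disable_device)(struct pci_dev_stub *pdev);
};

/* Plays the role of the global eeh_ops pointer; may be NULL. */
static const struct eeh_ops_stub *eeh_ops_model;

/* Fake platform reset: records which reset was requested. */
static int fake_reset(struct eeh_pe_stub *pe, int option)
{
	pe->last_reset = option;
	pe->resets++;
	return 0;
}

/* Same NULL checks as pnv_npu_disable_device() in the patch;
 * pdev->edev stands in for pci_dev_to_eeh_dev(pdev). */
static void npu_disable_device_model(struct pci_dev_stub *pdev)
{
	struct eeh_dev_stub *edev = pdev->edev;
	struct eeh_pe_stub *pe = edev ? edev->pe : NULL;

	if (pe && eeh_ops_model && eeh_ops_model->reset)
		eeh_ops_model->reset(pe, MODEL_RESET_HOT);
}

/* Generic caller: the hook is optional in the ops table. */
static void disable_device_model(const struct controller_ops_stub *ops,
				 struct pci_dev_stub *pdev)
{
	if (ops->disable_device)
		ops->disable_device(pdev);
}
```

Every pointer on the path is checked, so a device without EEH data, a platform without eeh_ops->reset, or a PHB whose ops table has no disable_device is simply left alone rather than oopsing.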



--
Alexey


More information about the Linuxppc-dev mailing list