[Skiboot] [PATCH skiboot] npu2: Allow ATSD for LPAR other than 0
Alistair Popple
alistair at popple.id.au
Wed Dec 12 13:07:45 AEDT 2018
On Wednesday, 12 December 2018 11:52:48 AM AEDT Alexey Kardashevskiy wrote:
> On 12/12/2018 11:31, Alistair Popple wrote:
> > On Tuesday, 11 December 2018 6:07:41 PM AEDT Stewart Smith wrote:
> >> Alexey Kardashevskiy <aik at ozlabs.ru> writes:
> >>> On 11/12/2018 13:53, Alistair Popple wrote:
> >>>> On Wednesday, 5 December 2018 3:52:22 PM AEDT Alexey Kardashevskiy
wrote:
> >>>>> Each XTS MMIO ATSD# register is accompanied by another register -
> >>>>> XTS MMIO ATSD0 LPARID# - which controls LPID filtering for ATSD
> >>>>> transactions.
> >>>>>
> >>>>> When a host system passes a GPU through to a guest, we need to enable
> >>>>> some ATSD for an LPAR. At the moment the host assigns one ATSD to
> >>>>> a NVLink bridge and this maps it to an LPAR when GPU is assigned to
> >>>>> the LPAR. The link number is used for an ATSD index.
> >>>>>
> >>>>> ATSD6&7 stay mapped to the host (LPAR=0) all the time which seems to
> >>>>> be
> >>>>> acceptable price for the simplicity.
> >>>>>
> >>>>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> >>>>> ---
> >>>>>
> >>>>> include/npu2-regs.h | 2 ++
> >>>>> hw/npu2.c | 22 ++++++++++++++++++----
> >>>>> 2 files changed, 20 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/include/npu2-regs.h b/include/npu2-regs.h
> >>>>> index 8273b2b..ae5e225 100644
> >>>>> --- a/include/npu2-regs.h
> >>>>> +++ b/include/npu2-regs.h
> >>>>> @@ -547,6 +547,8 @@ void npu2_scom_write(uint64_t gcid, uint64_t
> >>>>> scom_base,
> >>>>>
> >>>>> #define NPU2_XTS_MMIO_ATSD5_LPARID
NPU2_REG_OFFSET(NPU2_STACK_MISC,
> >>>>>
> >>>>> NPU2_BLOCK_XTS, 0x128) #define
> >>>>> NPU2_XTS_MMIO_ATSD6_LPARID NPU2_REG_OFFSET(NPU2_STACK_MISC,
> >>>>> NPU2_BLOCK_XTS, 0x130) #define
> >>>>> NPU2_XTS_MMIO_ATSD7_LPARID NPU2_REG_OFFSET(NPU2_STACK_MISC,
> >>>>> NPU2_BLOCK_XTS, 0x138) +#define NPU2_XTS_MMIO_ATSD_MSR_HV
> >
> > PPC_BIT(51)
> >
> >>>>> +#define NPU2_XTS_MMIO_ATSD_LPARID PPC_BITMASK(52,63)
> >>>>>
> >>>>> #define NPU2_XTS_BDF_MAP NPU2_REG_OFFSET(NPU2_STACK_MISC,
> >>>>
> >>>> NPU2_BLOCK_XTS,
> >>>>
> >>>>> 0x4000) #define NPU2_XTS_BDF_MAP_VALID PPC_BIT(0)
> >>>>>
> >>>>> #define NPU2_XTS_BDF_MAP_UNFILT PPC_BIT(1)
> >>>>>
> >>>>> diff --git a/hw/npu2.c b/hw/npu2.c
> >>>>> index 41563b4..767306f 100644
> >>>>> --- a/hw/npu2.c
> >>>>> +++ b/hw/npu2.c
> >>>>> @@ -2255,9 +2255,14 @@ static int opal_npu_map_lpar(uint64_t phb_id,
> >>>>> uint64_t bdf, uint64_t lparid, struct phb *phb = pci_get_phb(phb_id);
> >>>>>
> >>>>> struct npu2 *p;
> >>>>> struct npu2_dev *ndev = NULL;
> >>>>>
> >>>>> - uint64_t xts_bdf_lpar, rc = OPAL_SUCCESS;
> >>>>> + uint64_t xts_bdf_lpar, atsd_lpar, rc = OPAL_SUCCESS;
> >>>>>
> >>>>> int i;
> >>>>> int id;
> >>>>>
> >>>>> + static uint64_t atsd_lpar_regs[] = {
> >>>>> + NPU2_XTS_MMIO_ATSD0_LPARID, NPU2_XTS_MMIO_ATSD1_LPARID,
> >>>>> + NPU2_XTS_MMIO_ATSD2_LPARID, NPU2_XTS_MMIO_ATSD3_LPARID,
> >>>>> + NPU2_XTS_MMIO_ATSD4_LPARID, NPU2_XTS_MMIO_ATSD5_LPARID,
> >>>>> + NPU2_XTS_MMIO_ATSD6_LPARID, NPU2_XTS_MMIO_ATSD7_LPARID };
> >>>>>
> >>>>> if (!phb || phb->phb_type != phb_type_npu_v2)
> >>>>>
> >>>>> return OPAL_PARAMETER;
> >>>>>
> >>>>> @@ -2297,11 +2302,20 @@ static int opal_npu_map_lpar(uint64_t phb_id,
> >>>>> uint64_t bdf, uint64_t lparid, xts_bdf_lpar =
> >>>>> SETFIELD(NPU2_XTS_BDF_MAP_LPARID, xts_bdf_lpar, lparid); xts_bdf_lpar
> >>>>> =
> >>>>> SETFIELD(NPU2_XTS_BDF_MAP_LPARSHORT, xts_bdf_lpar, id);
> >>>>>
> >>>>> - /* Need to find an NVLink to send the ATSDs for this device over */
> >>>>> + /*
> >>>>> + * Need to find an NVLink to send the ATSDs for this device over.
> >>>>> + * Also, the host allocates an ATSD per NVLink, enable filtering
> >>>>> now.
> >>>>> + */
> >>>>> + atsd_lpar = SETFIELD(NPU2_XTS_MMIO_ATSD_LPARID, 0, lparid);
> >>>>> + if (!lparid)
> >>>>> + atsd_lpar = SETFIELD(NPU2_XTS_MMIO_ATSD_MSR_HV, atsd_lpar, 1);
> >>>>> +
> >>>>>
> >>>>> for (i = 0; i < p->total_devices; i++) {
> >>>>>
> >>>>> if (p->devices[i].nvlink.gpu_bdfn == bdf) {
> >>>>>
> >>>>> - ndev = &p->devices[i];
> >>>>> - break;
> >>>>> + if (!ndev)
> >>>>> + ndev = &p->devices[i];
> >>>>> + if (i < ARRAY_SIZE(atsd_lpar_regs))
> >>>>> + npu2_write(p, atsd_lpar_regs[i], atsd_lpar);
> >>>>
> >>>> I'm not sure I really like this as ATSD resources should be allocated
> >>>> and
> >>>> passed through to guests by the hypervisor independently of the NVLink
> >>>> devices themselves, and a single ATSD register can serve more than one
> >>>> device.>
> >>>
> >>> Sure it can serve more, and all passed ATSDs are assembled in one vPHB
> >>> property and the guest then does not make distinction between them. I
> >>> just pass maximum 6 of 8 ATSDs but until recently it was just a single
> >>> ATSD per PHB for all devices and yet nobody complained.
> >>>
> >>> The problem here is that I need to mmap() ATSD to QEMU, then map them to
> >>> KVM and then present this somehow via the device tree.
> >>>
> >>> Now, I can do mmap() on a fd such as a VFIO device fd or a IOMMU
> >>> container fd. If I choose the dev fd, VFIO in QEMU can mmap it, store
> >>> the pointer in the QEMU PCI device, and SPAPR vPHB can put atsd property
> >>> in vPHB. If I choose the container fd, then VFIO in QEMU can mmap it but
> >>> SPAPR vPHB has no concept of IOMMU/VFIO groups/containers and I'd rather
> >>> leave like this.
> >>>
> >>>> This also creates an implicit assumption that the order of NVLink
> >>>> devices
> >>>> in p->devices[] matches ATSD register numbers in some way that the
> >>>> hypervisor assumes, which seems like it would be especially brittle as
> >>>> devices[] is essentially unordered.
> >>>
> >>> The order does not matter here, all that matters is that the order does
> >>> not change - any ATSD works with any NVLink bridge on that NPU PHB.
> >>>
> >>>> How does the hypervisor know which of the 8 ATSD registers has been
> >>>> assigned to the LPARID when calling opal_npu_map_lpar?
> >>>
> >>> One ATSD register is exposed to the userspace via mmap() on a
> >>> "corresponding" NVLink bridge VFIO device fd, register index == link
> >>> index. An NVLink device belongs to an IOMMU group. A KVM pointer (i.e.
> >>> LPID) is set to the IOMMU group from an ioctl() called by QEMU.
> >
> > Right, but doesn't the hypervisor have to know which of the 8 ATSD
> > registers to pass through to mmap to the guest? Only one of them gets
> > mapped to the LPAR in opal_npu_map_lpar(), so how do you work out which
> > one to mmap?
>
> You got me there :) It is "ibm,npu-link-index" in the hypervisor/vfio
> but nothing says the index from npu2::devices[index] is the same thing,
> I need this conversion here as "index" is not visible to the host system :-/
Yeah, that is what I was getting at. You could just use dev->link_index (which
is the same as "ibm,npu-link-index") instead of npu2::devices[index]. That
would make me somewhat happier. Theorectically there is nothing in HW that
guarantees there will be a matching ATSD register for every npu-link-index,
but in practice there is and it looks likely that won't change in future so it
should be ok.
> >>>> It seems like you might need a
> >>>> seperate OPAL call for this along the lines of
> >>>> npu2_map_mmio_atsd(atsd_index, lparid).
> >>>
> >>> I do not see what it buys us. I need an allocator on the host or skiboot
> >>> anyway, either static or dynamic, dynamic seems excessive as the
> >>> devices[] order or the number of ATSDs do not change.
> >>
> >> So, of course I spot discussion after having hit the merge
> >> button[1]. Maybe I should care/look closer at all the NPU2 things?
> >
> > I could of course review things in a more timely and consistent manner :-)
> >
> >> (it'd help if there was a sane way to test things that I could plug into
> >> op-test)
> >>
> >> Any recommendation what I should do from here, is keeping the patch okay?
> >
> > I thought it was ok but after re-reading the discussion I'm less convinced
> > again.
>
> Yeah, Stewart needs to remove the patch for now.
>
> > - Alistair
> >
> >> [1] not an actual button.
More information about the Skiboot
mailing list