[RFC PATCH v2] uacce: Add uacce_ctrl misc device

Fri Jan 29 21:09:03 AEDT 2021

> From: Song Bao Hua (Barry Song)
> Sent: Tuesday, January 26, 2021 9:27 AM
> 
> > -----Original Message-----
> > From: Jason Gunthorpe [mailto:jgg at ziepe.ca]
> > Sent: Tuesday, January 26, 2021 2:13 PM
> > To: Song Bao Hua (Barry Song) <song.bao.hua at hisilicon.com>
> > Cc: Wangzhou (B) <wangzhou1 at hisilicon.com>; Greg Kroah-Hartman
> > <gregkh at linuxfoundation.org>; Arnd Bergmann <arnd at arndb.de>;
> > Zhangfei Gao <zhangfei.gao at linaro.org>;
> > linux-accelerators at lists.ozlabs.org;
> > linux-kernel at vger.kernel.org; iommu at lists.linux-foundation.org;
> > linux-mm at kvack.org; Liguozhu (Kenneth) <liguozhu at hisilicon.com>;
> > chensihang (A) <chensihang1 at hisilicon.com>
> > Subject: Re: [RFC PATCH v2] uacce: Add uacce_ctrl misc device
> >
> > On Mon, Jan 25, 2021 at 11:35:22PM +0000, Song Bao Hua (Barry Song) wrote:
> >
> > > > On Mon, Jan 25, 2021 at 10:21:14PM +0000, Song Bao Hua (Barry Song) wrote:
> > > > > > While mlock can certainly prevent swapping out, it won't
> > > > > > be able to stop pages from moving due to:
> > > > > > * memory compaction in alloc_pages()
> > > > > > * making huge pages
> > > > > > * NUMA balancing
> > > > > > * memory compaction in CMA
> > > >
> > > > Enabling those things is a major reason to have an SVA device in
> > > > the first place; providing a SW API to turn it all off seems like
> > > > the wrong direction.
> > >
> > > I wouldn't say this is a major reason to have SVA. If we read the
> > > history of SVA and the papers, the major reasons given for SVA are
> > > easy programming, thanks to data structure sharing between CPU and
> > > device, and process space isolation in the device. SVA also claims
> > > to support zero-copy, while zero-copy doesn't necessarily depend on
> > > SVA.
> >
> > Once you have to explicitly make system calls to declare memory under
> > IO, you lose all of that.
> >
> > Since you've asked the app to be explicit about the DMAs it intends to
> > do, there is not really much reason to use SVA for those DMAs anymore.
> 
> Let's look at a non-SVA case. Without SVA, we can have a memory pool
> backed by hugetlb or pinned pages, and the app can allocate memory
> from this pool and get stable I/O performance on that memory. But the
> device has its own separate page table which is not bound to this
> process, so it lacks the protection of process space isolation. Plus,
> the CPU and the device use different addresses.
> 
> Now move to the SVA case. We can still have a memory pool backed by
> hugetlb or pinned pages, and the app can allocate memory from this
> pool since the pool is mapped into the address space of the process.
> We still get stable I/O performance since the memory is always
> resident. But in this case, the device uses the page table of the
> process, with full permission control. The CPU and the device also
> use the same addresses and can possibly enjoy easy programming if the
> HW supports it.
> 
> SVA is not doomed to work with I/O page faults only. If we have
> SVA+pin, we get both shared addresses and stable I/O latency.
> 

Isn't it like a traditional MAP_DMA API (which implies pinning) plus
specifying the cpu_va of the memory pool as the iova?
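
To make the comparison concrete, here is a rough sketch of that idea. It
assumes VFIO type1's VFIO_IOMMU_MAP_DMA as the "traditional MAP_DMA API"
(uacce has no such interface in this patch); fill_identity_map() and
map_pool() are made-up helper names, and container_fd is assumed to be
an already configured VFIO container:

```c
/* Sketch only: VFIO type1 is used as a stand-in MAP_DMA API. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Describe a mapping whose IOVA equals the CPU virtual address. */
void fill_identity_map(struct vfio_iommu_type1_dma_map *map,
		       void *pool, size_t len)
{
	memset(map, 0, sizeof(*map));
	map->argsz = sizeof(*map);
	map->flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	map->vaddr = (uintptr_t)pool;
	map->iova  = (uintptr_t)pool;	/* iova == cpu_va */
	map->size  = len;
}

/* container_fd: an already configured VFIO container (assumed). */
int map_pool(int container_fd, void *pool, size_t len)
{
	struct vfio_iommu_type1_dma_map map;

	fill_identity_map(&map, pool, len);
	/* The kernel pins the pages backing [vaddr, vaddr + size). */
	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```

With iova == vaddr the device and the CPU use the same address for the
pool, and the pages stay pinned for as long as the mapping exists.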

Thanks
Kevin