[PATCH kernel v4 2/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page
David Gibson
david at gibson.dropbear.id.au
Fri Jul 6 15:13:26 AEST 2018
On Thu, Jul 05, 2018 at 06:01:33PM +1000, Alexey Kardashevskiy wrote:
> A VM which has:
> - a DMA capable device passed through to it (eg. network card);
> - running a malicious kernel that ignores H_PUT_TCE failure;
> - capability of using IOMMU pages bigger that physical pages
> can create an IOMMU mapping that exposes (for example) 16MB of
> the host physical memory to the device when only 64K was allocated to
> the VM.
>
> The remaining 16MB - 64K will be some other content of host memory,
> possibly including pages of the VM, but also pages of host kernel memory,
> host programs or other VMs.
>
> The attacking VM does not control the location of the page it can map,
> and is only allowed to map as many pages as it has pages of RAM.
>
> We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
> an IOMMU page is contained in the physical page so the PCI hardware won't
> get access to unassigned host memory; however this check is missing in
> the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
> did not hit this yet as the very first time when the mapping happens
> we do not have tbl::it_userspace allocated yet and fall back to
> the userspace which in turn calls VFIO IOMMU driver, this fails and
> the guest does not retry,
>
> This stores the smallest preregistered page size in the preregistered
> region descriptor and changes the mm_iommu_xxx API to check this against
> the IOMMU page size. This calculates maximum page size as a minimum of
> the natural region alignment and compound page size.
>
> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
Reviewed-by: David Gibson <david at gibson.dropbear.id.au>
It's certainly better than what we have, though a couple of comments
still:
[snip]
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index abb4364..11e1029 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -27,6 +27,7 @@ struct mm_iommu_table_group_mem_t {
> struct rcu_head rcu;
> unsigned long used;
> atomic64_t mapped;
> + unsigned int pageshift;
> u64 ua; /* userspace address */
> u64 entries; /* number of entries in hpas[] */
> u64 *hpas; /* vmalloc'ed */
> @@ -125,7 +126,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> {
> struct mm_iommu_table_group_mem_t *mem;
> long i, j, ret = 0, locked_entries = 0;
> - struct page *page = NULL;
> + unsigned int pageshift;
> + struct page *page = NULL, *head = NULL;
>
> mutex_lock(&mem_list_mutex);
>
> @@ -159,6 +161,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> goto unlock_exit;
> }
>
> + mem->pageshift = __builtin_ctzl(ua | (entries << PAGE_SHIFT));
This could definitely do with a comment saying what this is trying to
calculate.
Explicitly calling a _builtin_ is also kinda horrid. I wrote my
sample thinking of qemu where there's a standard and widely used
ctz64(). Not sure if there's something else we should be using in kernelspace.
> mem->hpas = vzalloc(array_size(entries, sizeof(mem->hpas[0])));
> if (!mem->hpas) {
> kfree(mem);
> @@ -199,9 +202,17 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
> }
> }
> populate:
> + pageshift = PAGE_SHIFT;
> + if (PageCompound(page))
> + pageshift += compound_order(compound_head(page));
So, as I said in reply to the earlier version, I'm not 100% sure there
isn't some way a compound_page could end up mapped unaligned in
userspace (with smaller userspace mappings of a larger underlying
physical page). A WARN_ON() and fallback to assuming pageshift =
PAGE_SHIFT in that case would probably be a good idea.
> + mem->pageshift = min(mem->pageshift, pageshift);
> mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
> }
>
> + /* We have an incomplete huge page, default to PAGE_SHIFT */
> + if (head)
> + mem->pageshift = PAGE_SHIFT;
> +
> atomic64_set(&mem->mapped, 1);
> mem->used = 1;
> mem->ua = ua;
> @@ -349,7 +360,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
> EXPORT_SYMBOL_GPL(mm_iommu_find);
>
> long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> - unsigned long ua, unsigned long *hpa)
> + unsigned long ua, unsigned int pageshift, unsigned long *hpa)
> {
> const long entry = (ua - mem->ua) >> PAGE_SHIFT;
> u64 *va = &mem->hpas[entry];
> @@ -357,6 +368,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> if (entry >= mem->entries)
> return -EFAULT;
>
> + if (pageshift > mem->pageshift)
> + return -EFAULT;
> +
> *hpa = *va | (ua & ~PAGE_MASK);
>
> return 0;
> @@ -364,7 +378,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
> EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
>
> long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
> - unsigned long ua, unsigned long *hpa)
> + unsigned long ua, unsigned int pageshift, unsigned long *hpa)
> {
> const long entry = (ua - mem->ua) >> PAGE_SHIFT;
> void *va = &mem->hpas[entry];
> @@ -373,6 +387,9 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
> if (entry >= mem->entries)
> return -EFAULT;
>
> + if (pageshift > mem->pageshift)
> + return -EFAULT;
> +
> pa = (void *) vmalloc_to_phys(va);
> if (!pa)
> return -EFAULT;
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 2da5f05..7cd63b0 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
> if (!mem)
> return -EINVAL;
>
> - ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
> + ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa);
> if (ret)
> return -EINVAL;
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20180706/2566bbad/attachment-0001.sig>
More information about the Linuxppc-dev
mailing list