[PATCH v1 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

Tue Jun 2 05:05:35 AEST 2020

On Mon, Jun 01, 2020 at 05:25:18PM +0530, Bharata B Rao wrote:
> On Sat, May 30, 2020 at 07:27:50PM -0700, Ram Pai wrote:
> > H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly
> > called H_SVM_PAGE_IN for all secure pages.
> 
> I don't think that is quite true. HV doesn't assume anything about
> secure pages by itself.

Yes. Currently, it does not assume anything about secure pages.  But I am
proposing that it should consider all pages (except the shared pages) as
secure pages, when H_SVM_INIT_DONE is called.

In other words, HV should treat all pages; except shared pages, as
secure pages once H_SVM_INIT_DONE is called. And this includes pages
added subsequently through memory hotplug.

Yes. the Ultravisor can explicitly request the HV to move the pages
individually.  But that will slow down the transition too significantly.
It takes above 20min to transition them, for a SVM of size 100G.

With this proposed enhancement, the switch completes in a few seconds.

> 
> > These GFNs continue to be
> > normal GFNs associated with normal PFNs; when infact, these GFNs should
> > have been secure GFNs associated with device PFNs.
> 
> Transition to secure state is driven by SVM/UV and HV just responds to
> hcalls by issuing appropriate uvcalls. SVM/UV is in the best position to
> determine the required pages that need to be moved into secure side.
> HV just responds to it and tracks such pages as device private pages.
> 
> If SVM/UV doesn't get in all the pages to secure side by the time
> of H_SVM_INIT_DONE, the remaining pages are just normal (shared or
> otherwise) pages as far as HV is concerned.  Why should HV assume that
> SVM/UV didn't ask for a few pages and hence push those pages during
> H_SVM_INIT_DONE?

By definition, SVM is a VM backed by secure pages.
Hence all pages(except shared) must turn secure when a VM switches to SVM.

UV is interested in only a certain pages for the VM, which it will
request explicitly through H_SVM_PAGE_IN.  All other pages, need not
be paged-in through UV_PAGE_IN.  They just need to be switched to
device-pages.

> 
> I think UV should drive the movement of pages into secure side both
> of boot-time SVM memory and hot-plugged memory. HV does memslot
> registration uvcall when new memory is plugged in, UV should explicitly
> get the required pages in at that time instead of expecting HV to drive
> the same.
> 
> > +static int uv_migrate_mem_slot(struct kvm *kvm,
> > +		const struct kvm_memory_slot *memslot)
> > +{
> > +	unsigned long gfn = memslot->base_gfn;
> > +	unsigned long end;
> > +	bool downgrade = false;
> > +	struct vm_area_struct *vma;
> > +	int i, ret = 0;
> > +	unsigned long start = gfn_to_hva(kvm, gfn);
> > +
> > +	if (kvm_is_error_hva(start))
> > +		return H_STATE;
> > +
> > +	end = start + (memslot->npages << PAGE_SHIFT);
> > +
> > +	down_write(&kvm->mm->mmap_sem);
> > +
> > +	mutex_lock(&kvm->arch.uvmem_lock);
> > +	vma = find_vma_intersection(kvm->mm, start, end);
> > +	if (!vma || vma->vm_start > start || vma->vm_end < end) {
> > +		ret = H_STATE;
> > +		goto out_unlock;
> > +	}
> > +
> > +	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > +			  MADV_UNMERGEABLE, &vma->vm_flags);
> > +	downgrade_write(&kvm->mm->mmap_sem);
> > +	downgrade = true;
> > +	if (ret) {
> > +		ret = H_STATE;
> > +		goto out_unlock;
> > +	}
> > +
> > +	for (i = 0; i < memslot->npages; i++, ++gfn) {
> > +		/* skip paged-in pages and shared pages */
> > +		if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL) ||
> > +			kvmppc_gfn_is_uvmem_shared(gfn, kvm))
> > +			continue;
> > +
> > +		start = gfn_to_hva(kvm, gfn);
> > +		end = start + (1UL << PAGE_SHIFT);
> > +		ret = kvmppc_svm_migrate_page(vma, start, end,
> > +			(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> > +
> > +		if (ret)
> > +			goto out_unlock;
> > +	}
> 
> Is there a guarantee that the vma you got for the start address remains
> valid for all the addresses till end in a memslot? If not, you should
> re-get the vma for the current address in each iteration I suppose.

mm->mmap_sem  is the semaphore that guards the vma. right?  If that
semaphore is held, can the vma change?

Thanks for your comments,
RP