[PATCH v3 22/33] KVM: PPC: Book3S HV: Handle page fault for a nested guest

David Gibson david at gibson.dropbear.id.au
Fri Oct 5 12:46:12 AEST 2018


On Thu, Oct 04, 2018 at 07:21:20PM +1000, Paul Mackerras wrote:
> On Wed, Oct 03, 2018 at 03:39:13PM +1000, David Gibson wrote:
> > On Tue, Oct 02, 2018 at 09:31:21PM +1000, Paul Mackerras wrote:
> > > From: Suraj Jitindar Singh <sjitindarsingh at gmail.com>
> > > @@ -367,7 +367,9 @@ struct kvmppc_pte {
> > >  	bool may_write		: 1;
> > >  	bool may_execute	: 1;
> > >  	unsigned long wimg;
> > > +	unsigned long rc;
> > >  	u8 page_size;		/* MMU_PAGE_xxx */
> > > +	u16 page_shift;
> > 
> > It's a bit ugly that this has both page_size and page_shift, which is
> > redundant information AFAICT.  Also, why does page_shift need to be
> > u16?  A u8 already allows shifts up to 255, and 2^255 bytes is much
> > more than our supported address space, let alone a plausible page size.
> 
> These values are all essentially function outputs, so I don't think
> it's ugly to have the same information in different forms.  I actually
> don't like using the MMU_PAGE_xxx values, because the information in
> the mmu_psize_defs[] array depends on the MMU mode of the host, but
> KVM needs to be able to work with guests in both MMU modes.  More
> generally I don't think it's a good idea that the KVM <-> guest
> interface depends so much on what the host firmware tells us about the
> physical machine we're on.  Thus I'm trying to move away from using
> MMU_PAGE_xxx values and mmu_psize_defs[] in KVM code.

Fair enough.
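
To illustrate why the shift alone is enough, here is a minimal userspace
sketch, not the kernel code.  The struct and helper names are
hypothetical stand-ins for the output fields of struct kvmppc_pte; the
point is only that the page size and in-page offset follow from a u8
shift with no lookup in mmu_psize_defs[].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant output field of struct kvmppc_pte. */
struct pte_out {
	uint8_t page_shift;	/* log2 of the page size in bytes */
};

/* Page size in bytes, derived from the shift alone. */
static uint64_t pte_page_size(const struct pte_out *p)
{
	assert(p->page_shift < 64);	/* shifting a 64-bit value by >= 64 is undefined */
	return UINT64_C(1) << p->page_shift;
}

/* Offset of addr within the page described by p. */
static uint64_t pte_page_offset(const struct pte_out *p, uint64_t addr)
{
	return addr & (pte_page_size(p) - 1);
}
```

With page_shift = 16 this gives a 64k page, and the offset of 0x12345
within it is 0x2345.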

> I'll change the type to u8.
> 
> > > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > > index bd06a95..ee6f493 100644
> > > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> > > @@ -29,43 +29,16 @@
> > >   */
> > >  static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 };
> > >  
> > > -/*
> > > - * Used to walk a partition or process table radix tree in guest memory
> > > - * Note: We exploit the fact that a partition table and a process
> > > - * table have the same layout, a partition-scoped page table and a
> > > - * process-scoped page table have the same layout, and the 2nd
> > > - * doubleword of a partition table entry has the same layout as
> > > - * the PTCR register.
> > > - */
> > > -int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr,
> > > -				     struct kvmppc_pte *gpte, u64 table,
> > > -				     int table_index, u64 *pte_ret_p)
> > > +int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr,
> > > +			       struct kvmppc_pte *gpte, u64 root,
> > > +			       u64 *pte_ret_p)
> > >  {
> > >  	struct kvm *kvm = vcpu->kvm;
> > >  	int ret, level, ps;
> > > -	unsigned long ptbl, root;
> > > -	unsigned long rts, bits, offset;
> > > -	unsigned long size, index;
> > > -	struct prtb_entry entry;
> > > +	unsigned long rts, bits, offset, index;
> > >  	u64 pte, base, gpa;
> > >  	__be64 rpte;
> > >  
> > > -	if ((table & PRTS_MASK) > 24)
> > > -		return -EINVAL;
> > > -	size = 1ul << ((table & PRTS_MASK) + 12);
> > > -
> > > -	/* Is the table big enough to contain this entry? */
> > > -	if ((table_index * sizeof(entry)) >= size)
> > > -		return -EINVAL;
> > > -
> > > -	/* Read the table to find the root of the radix tree */
> > > -	ptbl = (table & PRTB_MASK) + (table_index * sizeof(entry));
> > > -	ret = kvm_read_guest(kvm, ptbl, &entry, sizeof(entry));
> > > -	if (ret)
> > > -		return ret;
> > > -
> > > -	/* Root is stored in the first double word */
> > > -	root = be64_to_cpu(entry.prtb0);
> > 
> > This refactoring somewhat obscures the changes directly relevant to
> > the nested guest handling.  Ideally it would be nice to fold some of
> > this into the earlier reworkings.
> 
> True, but given the rapidly approaching merge window, I'm not inclined
> to rework it.

Yeah, ok.

> 
> > > +	if (ret) {
> > > +		/* We didn't find a pte */
> > > +		if (ret == -EINVAL) {
> > > +			/* Unsupported mmu config */
> > > +			flags |= DSISR_UNSUPP_MMU;
> > > +		} else if (ret == -ENOENT) {
> > > +			/* No translation found */
> > > +			flags |= DSISR_NOHPTE;
> > > +		} else if (ret == -EFAULT) {
> > > +			/* Couldn't access L1 real address */
> > > +			flags |= DSISR_PRTABLE_FAULT;
> > > +			vcpu->arch.fault_gpa = fault_addr;
> > > +		} else {
> > > +			/* Unknown error */
> > > +			return ret;
> > > +		}
> > > +		goto resume_host;
> > 
> > This is effectively forwarding the fault to L1, yes?  In which case a
> > different name might be better than the ambiguous "resume_host".
> 
> I'll change it to "forward_to_l1".

Thanks.
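
As an aside, the errno-to-flag mapping in that hunk could be read as a
small pure function.  A rough sketch, assuming nothing beyond what the
quoted code shows: the SK_* bits are placeholders and not the real DSISR
values, the function name is made up, and the fault_gpa update for the
-EFAULT case is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bits for illustration only; not the real DSISR_* values. */
#define SK_UNSUPP_MMU		(1u << 0)
#define SK_NOHPTE		(1u << 1)
#define SK_PRTABLE_FAULT	(1u << 2)

/*
 * Translate a failed tree walk into fault flags for L1.
 * Returns true if the error should be forwarded to L1, false if it is
 * an unknown error that the caller should return as-is.
 */
static bool walk_error_to_flags(int ret, unsigned int *flags)
{
	assert(flags != NULL);

	switch (ret) {
	case -EINVAL:		/* Unsupported mmu config */
		*flags |= SK_UNSUPP_MMU;
		return true;
	case -ENOENT:		/* No translation found */
		*flags |= SK_NOHPTE;
		return true;
	case -EFAULT:		/* Couldn't access L1 real address */
		*flags |= SK_PRTABLE_FAULT;
		return true;
	default:		/* Unknown error */
		return false;
	}
}
```

A caller would branch to the forward_to_l1 label when this returns true
and return ret otherwise.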

> 
> Paul.
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson