[Lguest] PAE and lguest
Rusty Russell
rusty at rustcorp.com.au
Thu May 28 17:23:47 EST 2009
On Thu, 28 May 2009 02:18:33 am Matias Zabaljauregui wrote:
> Rusty, some updated news about pae patch:
> > That's excellent news! I'd be happy to apply it with the NX support
> > caveat (we can, if we have to, just disable it in the guest bootup code
> > somehow for the moment), but I too would like to know what the issue is
> > with sysenter.
>
> 1) mapping the switcher with PAGE_KERNEL_EXEC both in host and guest
> solves the need for noexec=off in host kernel params
Oops, we should have been doing that anyway of course!
Good catch,
> 2) trapping and emulating rdmsr and wrmsr solved the rest of the problems I
> had. when kernel is compiled with PAE, set_nx() reads and write EFER.NXE
> and this was generating a GPF not handled by lguest.
Right, or we can use paravirt ops in the guest to override them, which I'd
prefer if we can.
> I'd like to know if you think this is an acceptable solution regarding NX
> for THIS version of PAE patch: I just clear NX feature from cpuid when
> guest invokes function 80000001, so guest won't even try to use NX bit.
Yep, that makes sense. I've done it for other bits in the past.
> My main concern now (beside all the bugs you guys will find, so actually
> that's not a concern anymore :) ) is performance. Just one example :
>
> +void guest_set_pmd(struct lguest *lg, unsigned long pmdp, u32 idx)
> +{
> + guest_pagetable_clear_all(&lg->cpus[0]);
> +}
Correctness first, performance second. And it's usually best to have some
benchmark to indicate whether the extra complexity is worth it.
> Finally, Rusty I'd like to know your opinion about Jeremy's suggestion
> in relation with hypercall names used:
> (This is not a PAE subject exclusively)
> http://ozlabs.org/pipermail/lguest/2009-May/001502.html
He raises a good point. We should rename the existing hcall in a patch which
precedes PAE support.
> --- a/arch/x86/lguest/boot.c
> +++ b/arch/x86/lguest/boot.c
> @@ -375,8 +375,12 @@ static void lguest_cpuid(unsigned int *ax, unsigned
> int *bx, case 1: /* Basic feature request. */
> /* We only allow kernel to see SSE3, CMPXCHG16B and SSSE3 */
> *cx &= 0x00002201;
> - /* SSE, SSE2, FXSR, MMX, CMOV, CMPXCHG8B, TSC, FPU. */
> + /* SSE, SSE2, FXSR, MMX, CMOV, CMPXCHG8B, TSC, FPU, PAE. */
> +#ifdef CONFIG_X86_PAE
> + *dx &= 0x07808151;
> +#else
> *dx &= 0x07808111;
> +#endif
Since PAE is a compile-time requirement for Linux, I think we can just
advertise it all the time and avoid the #ifdef here.
> @@ -395,6 +399,13 @@ static void lguest_cpuid(unsigned int *ax, unsigned
> int *bx, if (*ax > 0x80000008)
> *ax = 0x80000008;
> break;
> +#ifdef CONFIG_X86_PAE
> + case 0x80000001:
> + /* Here we should fix nx cap depending on host. */
> + /* For this version of PAE, we just clear NX bit. */
> + *dx &= ~(1 << 20);
> + break;
> +#endif
Probably don't need #ifdef here either. Linux only looks for it when PAE is
enabled, but that's not really a concern here.
> static void lguest_set_pte_at(struct mm_struct *mm, unsigned long addr,
> pte_t *ptep, pte_t pteval)
> {
> - *ptep = pteval;
> + native_set_pte(ptep, pteval);
> lguest_pte_update(mm, addr, ptep);
> }
Many of these are really nice cleanups: a separate patch for them would help
simplify the transition as well.
> index a3d3cba..8f63845 100644
> --- a/drivers/lguest/Kconfig
> +++ b/drivers/lguest/Kconfig
> @@ -1,6 +1,6 @@
> config LGUEST
> tristate "Linux hypervisor example code"
> - depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
> + depends on X86_32 && EXPERIMENTAL && FUTEX
Hmm, I don't think we depend on futex any more. But we do depend on
eventfd, so I've fixed that in one of my patches.
> @@ -171,7 +233,7 @@ static void release_pte(pte_t pte)
> /* Remember that get_user_pages_fast() took a reference to the page, in
> * get_pfn()? We have to put it back now. */
> if (pte_flags(pte) & _PAGE_PRESENT)
> - put_page(pfn_to_page(pte_pfn(pte)));
> + put_page(pte_page(pte));
> }
> /*:*/
>
Another nice cleanup.
> /*H:450 If we chase down the release_pgd() code, it looks like this: */
> -static void release_pgd(struct lguest *lg, pgd_t *spgd)
> +static void release_pgd(pgd_t *spgd)
> {
This one too!
> @@ -752,21 +1087,21 @@ static __init void
> populate_switcher_pte_page(unsigned int cpu,
>
> /* The first entries are easy: they map the Switcher code. */
> for (i = 0; i < pages; i++) {
> - pte[i] = mk_pte(switcher_page[i],
> - __pgprot(_PAGE_PRESENT|_PAGE_ACCESSED));
> + native_set_pte(&pte[i], mk_pte(switcher_page[i],
> + __pgprot(_PAGE_PRESENT|_PAGE_ACCESSED)));
> }
>
> /* The only other thing we map is this CPU's pair of pages. */
> i = pages + cpu*2;
>
> /* First page (Guest registers) is writable from the Guest */
> - pte[i] = pfn_pte(page_to_pfn(switcher_page[i]),
> - __pgprot(_PAGE_PRESENT|_PAGE_ACCESSED|_PAGE_RW));
> + native_set_pte(&pte[i], pfn_pte(page_to_pfn(switcher_page[i]),
> + __pgprot(_PAGE_PRESENT|_PAGE_ACCESSED|_PAGE_RW)));
>
> /* The second page contains the "struct lguest_ro_state", and is
> * read-only. */
> - pte[i+1] = pfn_pte(page_to_pfn(switcher_page[i+1]),
> - __pgprot(_PAGE_PRESENT|_PAGE_ACCESSED));
> + native_set_pte(&pte[i+1],pfn_pte(page_to_pfn(switcher_page[i+1]),
> + __pgprot(_PAGE_PRESENT|_PAGE_ACCESSED)) );
> }
More good cleanups...
Nice to have a cleanup patch which does all the preparation work (or maybe
several patches), then the one which actually allows PAE.
Thanks!
Rusty.
More information about the Lguest
mailing list