KVM guests freeze under upstream kernel

joserz at linux.vnet.ibm.com joserz at linux.vnet.ibm.com
Wed Jul 26 23:18:48 AEST 2017


On Thu, Jul 20, 2017 at 10:18:18PM -0300, joserz at linux.vnet.ibm.com wrote:
> On Thu, Jul 20, 2017 at 03:21:59PM +1000, Paul Mackerras wrote:
> > On Thu, Jul 20, 2017 at 12:02:23AM -0300, joserz at linux.vnet.ibm.com wrote:
> > > On Thu, Jul 20, 2017 at 09:42:50AM +1000, Benjamin Herrenschmidt wrote:
> > > > On Wed, 2017-07-19 at 16:46 -0300, joserz at linux.vnet.ibm.com wrote:
> > > > > Hello!
> > > > > 
> > > > > We're not able to boot any KVM guest using upstream kernel (cb8c65ccff7f77d0285f1b126c72d37b2572c865 - 4.13.0-rc1+).
> > > > > After reaching the SLOF initial counting, the guest simply freezes:
> > > > 
> > > > Can you send your .config?
> > > 
> > > Sure,
> > > 
> > > Answering Michael as well:
> > > 
> > > It's a P9 with RHEL kernel 4.11.0-10.el7a.ppc64le installed. The problem
> > > was noticed with kernel > 4.13 (I'm currently running 4.13.0-rc1+).
> > > 
> > > QEMU is https://github.com/dgibson/qemu (ppc-for-2.10) but I gave the
> > > default packaged Qemu a try.
> > > 
> > > For the guest, I tried both a vanilla Ubuntu 17.04 and the host kernel.
> > > But they never had a chance to run, since the freeze happened in SLOF.
> > > 
> > > Note that with the 4.11.0-10.el7a.ppc64le kernel it works fine
> > > (for any of these QEMU/guest setups). With 4.13.0-rc1 it runs after
> > > reverting the referred commit.
> > 
> > Is the host kernel running in radix mode?
> 
> yes
> 
> > 
> > Did you check the host kernel logs for any oops messages?
> 
> dmesg was clean, but after some time waiting (I had left QEMU running in
> another terminal) I got the oops below (after rebooting the host I
> couldn't reproduce it again).
> 
> Another test that I did was:
> Compile with transparent huge pages disabled: KVM works fine
> Compile with transparent huge pages enabled: doesn't work
>   + disabling it in /sys/kernel/mm/transparent_hugepage: doesn't work
> 
> Just out of my own curiosity I made this small change:
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include
> index c0737c8..f94a3b6 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -80,7 +80,7 @@
>  
>  #define _PAGE_SOFT_DIRTY       _RPAGE_SW3 /* software: software dirty tracking */
>  #define _PAGE_SPECIAL          _RPAGE_SW2 /* software: special page */
> -#define _PAGE_DEVMAP           _RPAGE_SW1 /* software: ZONE_DEVICE page */
> +#define _PAGE_DEVMAP           _RPAGE_RSV3
>  #define __HAVE_ARCH_PTE_DEVMAP
> 
> and it works. I chose _RPAGE_RSV3 because it uses the same value that
> x86 uses (0x0400000000000000UL), but I don't know if it could have any
> side effects.
> 

Does this change make any sense to you people?
I didn't see any side effects, except that device-backed memory will have
a bigger address space in transparent huge pages, IF I understand that
correctly.

If so I can send a patch with this change.

Thank you!!


