HPT allocation failures on POWER8 KVM hosts

Roman Bolshakov r.bolshakov at yadro.com
Fri Dec 13 11:33:05 AEDT 2019


On Mon, Nov 18, 2019 at 02:42:42PM +0300, Roman Bolshakov wrote:
> On Mon, Nov 18, 2019 at 01:02:00PM +1100, Daniel Axtens wrote:
> > Hi Roman,
> > 
> > > We're running a lot of KVM virtual machines on POWER8 hosts and
> > > sometimes new VMs can't be started because CMA region fragmentation
> > > leaves no contiguous region available for the HPT.
> > >
> > > The issue is covered in the LWN article: https://lwn.net/Articles/684611/
> > > The article points out that you raised the problem at LSFMM 2016. However, I
> > > couldn't find a follow-up article on the issue.
> > >
> > > Looking at the kernel commit log, I've identified a few commits that
> > > might reduce CMA fragmentation and overcome the HPT allocation failure:
> > >   - bd2e75633c801 ("dma-contiguous: use fallback alloc_pages for single pages")
> > >   - 678e174c4c16a ("powerpc/mm/iommu: allow migration of cma allocated
> > >     pages during mm_iommu_do_alloc")
> > >   - 9a4e9f3b2d739 ("mm: update get_user_pages_longterm to migrate pages allocated from
> > >     CMA region")
> > >   - d7fefcc8de914 ("mm/cma: add PF flag to force non cma alloc")
> > >
> > > Are there any other commits that address the issue? What is the first
> > > kernel version that shouldn't have the HPT allocation problem due to CMA
> > > fragmentation?
> > 
> > I've had some success increasing the CMA allocation with the
> > kvm_cma_resv_ratio boot parameter - see
> > arch/powerpc/kvm/book3s_hv_builtin.c
> > 
> > The default is 5%. In a support case at a former job we had a customer
> > who increased this to, I think, 7 or 8% and saw the symptoms subside
> > dramatically.
> > 
> 
> Hi Daniel,
> 
> Thank you, I'll try increasing kvm_cma_resv_ratio for now, but even a 5%
> CMA reserve should be more than enough, given that the HPT is sized at
> 1/128th of the VM's maximum memory.
> 
> For a 16GB RAM VM without a balloon device, only 128MB is going to be
> reserved for the HPT from CMA. So a 5% CMA reserve should allow
> provisioning VMs with over 1.5TB of RAM on a 256GB RAM host. In other
> words, the default CMA reserve allows overprovisioning roughly 6 times
> more memory for VMs than is present on the host.
> 
> We rarely add a balloon device and sometimes don't add one at all.
> Therefore I'm still looking for commits that would help avoid the issue
> with the default CMA reserve.
> 

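For reference, here is the arithmetic above as a minimal Python sketch.
It only illustrates the sizing, not how the kernel actually carves out
the reserve; the 1/128 HPT sizing and the 5% default for
kvm_cma_resv_ratio are from the discussion, and the 256GB host is just
the example figure (kvm_cma_resv_ratio=N on the kernel command line
reserves N% of host RAM for the HPT CMA pool):

  def cma_reserve_gb(host_ram_gb, resv_ratio_percent=5):
      # kvm_cma_resv_ratio=N reserves N% of host RAM for HPT allocations
      return host_ram_gb * resv_ratio_percent / 100.0

  def hpt_size_gb(vm_ram_gb):
      # HPT is sized at 1/128th of the VM's maximum memory
      return vm_ram_gb / 128.0

  reserve = cma_reserve_gb(256)   # 12.8 GB with the default 5%
  print(hpt_size_gb(16))          # 0.125 GB, i.e. 128MB for a 16GB VM
  print(reserve * 128)            # ~1638 GB of VM memory the reserve can back
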
FWIW, I have noticed the following. My host has 4 NUMA nodes with 4 CPUs
per node; only one of the nodes has CMA pages, and only two of the nodes
have memory according to /proc/zoneinfo. The error can be reliably
reproduced if I attempt to place vCPUs on the node with CMA pages.
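
In case it's useful, this is a rough sketch of how I check which nodes
have CMA pages, by summing the per-zone nr_free_cma counters from
/proc/zoneinfo. Treat the parsing as an approximation, and the 64K base
page size is an assumption about this host:

  import re
  from collections import defaultdict

  PAGE_SIZE_KB = 64  # assumed 64K base pages on this host

  free_cma_pages = defaultdict(int)
  node = None
  with open("/proc/zoneinfo") as f:
      for line in f:
          m = re.match(r"Node (\d+), zone", line)
          if m:
              node = int(m.group(1))
          elif node is not None and line.strip().startswith("nr_free_cma"):
              free_cma_pages[node] += int(line.split()[-1])

  for n in sorted(free_cma_pages):
      mb = free_cma_pages[n] * PAGE_SIZE_KB // 1024
      print("node %d: %d MB free CMA" % (n, mb))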

Roman

