[PATCH] powerpc/mm: Handle page table allocation failures

Tue May 14 16:40:43 AEST 2019

"Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:
> This fix the below crash that arise due to not handling page table allocation
> failures while allocating hugetlb page table.
>
>  BUG: Kernel NULL pointer dereference at 0x0000001c
>  Faulting instruction address: 0xc000000001d1e58c
>  Oops: Kernel access of bad area, sig: 11 [#1]
>  LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>
>  CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: G        W  O      5.1.0-next-20190507-autotest #1
>  NIP:  c000000001d1e58c LR: c000000001d1e54c CTR: 0000000000000000
>  REGS: c000000004937890 TRAP: 0300   Tainted: G        W  O       (5.1.0-next-20190507-autotest)
>  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22424822  XER: 00000000
>  CFAR: c00000000183e9e0 DAR: 000000000000001c DSISR: 40000000 IRQMASK: 0
>  GPR00: c000000001901a80 c000000004937b20 c000000003938700 0000000000000000
>  GPR04: 0000000000400cc0 000000000003efff 000000027966e000 c000000003ba8700
>  GPR08: c000000003ba8700 000000000d601125 c000000003ba8700 0000000080000000
>  GPR12: 0000000022424822 c00000001ecae280 0000000000000000 0000000000000000
>  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>  GPR20: 0000000000000018 c0000000039e2d30 c0000000039e2d28 c0000002762da460
>  GPR24: 000000000000001c 0000000000000000 0000000000000001 c000000001901a80
>  GPR28: 0000000000400cc0 0000000000000000 0000000000000000 0000000000400cc0
>  NIP [c000000001d1e58c] kmem_cache_alloc+0xbc/0x5a0
>  LR [c000000001d1e54c] kmem_cache_alloc+0x7c/0x5a0
>  Call Trace:
>   [c000000001c91150] __pud_alloc+0x160/0x200 (unreliable)
>   [c000000001901a80] huge_pte_alloc+0x580/0x950
>   [c000000001cf7910] hugetlb_fault+0x9a0/0x1250
>   [c000000001c94a80] handle_mm_fault+0x490/0x4a0
>   [c0000000018d529c] __do_page_fault+0x77c/0x1f00
>   [c0000000018d6a48] do_page_fault+0x28/0x50
>   [c00000000183b0d4] handle_page_fault+0x18/0x38
>
> Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
> Reported-by: Sachin Sant <sachinp at linux.vnet.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
> ---
>
> Note: I did add a recent commit for the Fixes tag. But in reality we never checked for page table
> allocation failure there. If we want to go to that old commit, then we may need.

If we never checked for failure in that path, is there some reason we've
only just noticed the crashes? Are we just testing under memory pressure
more effectively than we used to?

cheers