[PATCH] powerpc/mm: Handle page table allocation failures

Tue May 14 19:33:02 AEST 2019

> On 14-May-2019, at 12:10 PM, Michael Ellerman <mpe at ellerman.id.au> wrote:
> 
> "Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:
>> This fixes the crash below, which arises from not handling page table
>> allocation failures while allocating the hugetlb page table.
>> 
>> BUG: Kernel NULL pointer dereference at 0x0000001c
>> Faulting instruction address: 0xc000000001d1e58c
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> 
>> CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: G        W  O      5.1.0-next-20190507-autotest #1
>> NIP:  c000000001d1e58c LR: c000000001d1e54c CTR: 0000000000000000
>> REGS: c000000004937890 TRAP: 0300   Tainted: G        W  O       (5.1.0-next-20190507-autotest)
>> MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22424822  XER: 00000000
>> CFAR: c00000000183e9e0 DAR: 000000000000001c DSISR: 40000000 IRQMASK: 0
>> GPR00: c000000001901a80 c000000004937b20 c000000003938700 0000000000000000
>> GPR04: 0000000000400cc0 000000000003efff 000000027966e000 c000000003ba8700
>> GPR08: c000000003ba8700 000000000d601125 c000000003ba8700 0000000080000000
>> GPR12: 0000000022424822 c00000001ecae280 0000000000000000 0000000000000000
>> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR20: 0000000000000018 c0000000039e2d30 c0000000039e2d28 c0000002762da460
>> GPR24: 000000000000001c 0000000000000000 0000000000000001 c000000001901a80
>> GPR28: 0000000000400cc0 0000000000000000 0000000000000000 0000000000400cc0
>> NIP [c000000001d1e58c] kmem_cache_alloc+0xbc/0x5a0
>> LR [c000000001d1e54c] kmem_cache_alloc+0x7c/0x5a0
>> Call Trace:
>>  [c000000001c91150] __pud_alloc+0x160/0x200 (unreliable)
>>  [c000000001901a80] huge_pte_alloc+0x580/0x950
>>  [c000000001cf7910] hugetlb_fault+0x9a0/0x1250
>>  [c000000001c94a80] handle_mm_fault+0x490/0x4a0
>>  [c0000000018d529c] __do_page_fault+0x77c/0x1f00
>>  [c0000000018d6a48] do_page_fault+0x28/0x50
>>  [c00000000183b0d4] handle_page_fault+0x18/0x38
>> 
>> Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
>> Reported-by: Sachin Sant <sachinp at linux.vnet.ibm.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
>> ---
>> 
>> Note: I did add a recent commit for the Fixes tag. But in reality we never checked for page table
>> allocation failure there. If we want to go to that old commit, then we may need.
> 
> If we never checked for failure in that path, is there some reason we've
> only just noticed the crashes? Are we just testing under memory pressure
> more effectively than we used to?
> 
Actually, the reported crash seems to be due to commit 723f268f19:

723f268f19 - powerpc/mm: cleanup ifdef mess in add_huge_page_size()

Reverting that commit allows the test case to run correctly without a crash.

Thanks
-Sachin