[PATCH] powerpc/mm: make NULL pointer deferences explicit on bad page faults.

Fri Dec 14 11:57:26 AEDT 2018

Hi Christophe,

You know it's the trivial patches that are going to get lots of review
comments :)

Christophe Leroy <christophe.leroy at c-s.fr> writes:
> As several other arches including x86, this patch makes it explicit
> that a bad page fault is a NULL pointer dereference when the fault
> address is lower than PAGE_SIZE

I'm being pedantic, but it's not necessarily a NULL pointer dereference.
It might just be a direct access to a low address, eg:

 char *p = 0x100;
 *p = 0;

That's not a NULL pointer dereference.

But other arches do print this so I guess it's OK to add, and in most
cases it will be an actual NULL pointer dereference.

I wonder though if we should use 4096 rather than PAGE_SIZE, given
that's the actual value other arches are using. We support 256K pages on
some systems, which is getting quite large.

> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index d51cf5f4e45e..501a1eadb3e9 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -631,13 +631,16 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
>  	switch (TRAP(regs)) {
>  	case 0x300:
>  	case 0x380:
> -		printk(KERN_ALERT "Unable to handle kernel paging request for "
> -			"data at address 0x%08lx\n", regs->dar);
> +		pr_alert("Unable to handle kernel %s for data at address 0x%08lx\n",
> +			 regs->dar < PAGE_SIZE ? "NULL pointer dereference" :
> +						 "paging request",
> +			 regs->dar);

This is now too long I think, with printk time you get:

[ 1096.450711] Unable to handle kernel NULL pointer dereference for data at address 0x00000000

Which is 93 columns. It's true on many systems it doesn't really matter
any more, but it would still be good if it was shorter.

I like that on x86 they prefix it with "BUG:", just to avoid any confusion.

What if we had for the NULL pointer case:

  BUG: Kernel NULL pointer dereference at 0x00000000

And for the normal case:

  BUG: Unable to handle kernel data access at 0x00000000

Note on the very next line we print:
  Faulting instruction address: 0xc000000000795cc8

So there should be no confusion about whether "at" refers to the data
address or the instruction address.

>  	case 0x400:
>  	case 0x480:
> -		printk(KERN_ALERT "Unable to handle kernel paging request for "
> -			"instruction fetch\n");
> +		pr_alert("Unable to handle kernel %s for instruction fetch\n",
> +			 regs->nip < PAGE_SIZE ? "NULL pointer dereference" :
> +						 "paging request");

I don't really like using "NULL pointer dereference" here, that
terminology makes me think of a load/store, I think it confuses things
rather than making it clearer.

What about:

  BUG: Unable to handle kernel instruction fetch at 0x00000000

>  		break;
>  	case 0x600:
>  		printk(KERN_ALERT "Unable to handle kernel paging request for "

It would be good to clean up these other cases as well. They seem to be
trying to use the "page request for" terminology which leads to them
being very wordy. I assume that was done to help people grepping kernel
logs for errors, but I think we should not worry about that if we have
the "BUG:" prefix.

So we have:
	printk(KERN_ALERT "Unable to handle kernel paging request for "
		"unaligned access at address 0x%08lx\n", regs->dar);

What about:

  BUG: Unable to handle kernel unaligned access at 0x00000000

And:
	printk(KERN_ALERT "Unable to handle kernel paging request for "
		"unknown fault\n");

What about:

  BUG: Unable to handle unknown paging fault at 0x00000000

Thoughts?

cheers