Powerpc64: Fix up oops when debugging programs with CONFIG_RELOCATABLE=y

Tue Apr 25 11:24:13 AEST 2017

On Tue, Feb 07, 2017 at 10:35:52AM +0800, Liu Hailong wrote:
> From: LiuHailong <liu.hailong6 at zte.com.cn>
> 
> Debug interrupts can be taken during a regular program or during a
> standard interrupt; the EA of the instruction causing the interrupt is
> saved in DSRR0.
> The kernel checks whether this value is within [interrupt_base_book3e,
> __end_interrupts].
> However, when the kernel is built with CONFIG_RELOCATABLE, it can't get
> the EAs of those labels with LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
> and LOAD_REG_IMMEDIATE(r15,__end_interrupts), which causes problems
> later.
> At the same time, r2 (the TOC pointer) is not usable here, so
> LOAD_REG_ADDR() doesn't work either. So we use name@got to get the EAs
> of the two labels directly.
> This patch fixes the problem and removes the oops seen when
> single-stepping a program under gdb.
> 
> The test program, test.c, is as follows:
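> #include <stdio.h>
> #include <unistd.h>	/* access(), F_OK */
> 
> int main(int argc, char *argv[])
> {
> 	/* Fails with -1 if the perf_event sysctl file does not exist. */
> 	if (access("/proc/sys/kernel/perf_event_paranoid", F_OK) == -1)
> 		printf("Kernel doesn't have perf_event support\n");
> 	return 0;
> }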
> 
> Steps to reproduce the bug, for example:
>  1) ./gdb ./test
>  2) (gdb) b access
>  3) (gdb) r
>  4) (gdb) s
> 
> This triggers the oops, which looks like:
> (gdb) s
> Single stepping Oops: Exception in kernel mode, sig: 5 [#2]
> PREEMPT CoreNet Generic
> Modules linked in:
> CPU: 0 PID: 1135 Comm: test Tainted: G    D    Linux (none) 4.9.5 #79
> task: c000000079199580 ti: c00000007ffc4000 task.ti: c000000074064000
> NIP: c00000000001a1e4 LR: 000000001000103c CTR: 000000001000100c
> REGS: c00000007ffc7cf0 TRAP: 0d08   Tainted: G   D  (Linux (none) 4.9.5)
> MSR: 0000000080021000 <CE,ME>  CR: 24000442  XER: 00000000
> SOFTE: 1

I apologize for not getting to this earlier...

Does it really produce an oops, rather than a hang?  It looks like
without this fix, control would go to kernel_dbg_exc, which is a
branch-to-self.  Do you have other changes in your tree that affect
this?

If so, have you tested the patch on an unmodified top-of-tree kernel?  I
can't test this myself, as I don't currently have hardware and QEMU
doesn't emulate the booke debug registers.

That said, the patch looks correct, and the bug is even worse if it's a
hang rather than merely noisily killing the debugged process.  It should
go to stable for 4.4+ (when support for relocatable e500 was added) and
probably to Linus this week (though I'd feel more comfortable knowing it
got testing on the current tree).  OTOH, I believe this bug will only
trigger if a relocation actually happened, which on e500 is an unusual
case outside of a kdump crash kernel, since the kernel is normally loaded
at zero.  But maybe you've got a different use case for relocatable?

-Scott