[PATCH] powerpc/livepatch: Fix livepatch stack access

Thu Sep 21 21:53:50 AEST 2017

On Thursday 21 September 2017 04:30 PM, Balbir Singh wrote:
> On Thu, Sep 21, 2017 at 8:02 PM, Michael Ellerman <mpe at ellerman.id.au> wrote:
>> Kamalesh Babulal <kamalesh at linux.vnet.ibm.com> writes:
>>
>>> While running stress test with livepatch module loaded, kernel
>>> bug was triggered.
>>>
>>> cpu 0x5: Vector: 400 (Instruction Access) at [c0000000eb9d3b60]
>>>     pc: c0000000eb9d3e30
>>>     lr: c0000000eb9d3e30
>>>     sp: c0000000eb9d3de0
>>>    msr: 800000001280b033
>>>   current = 0xc0000000dbd38700
>>>   paca    = 0xc00000000fe01400   softe: 0        irq_happened: 0x01
>>>     pid   = 8618, comm = make
>>> Linux version 4.13.0+ (root at ubuntu) (gcc version 6.3.0 20170406 (Ubuntu 6.3.0-12ubuntu2)) #1 SMP Wed Sep 13 03:49:27 EDT 2017
>>>
>>> 5:mon> t
>>> [c0000000eb9d3de0] c0000000eb9d3e30 (unreliable)
>>> [c0000000eb9d3e30] c000000000008ab4 hardware_interrupt_common+0x114/0x120
>>>  --- Exception: 501 (Hardware Interrupt) at c000000000053040 livepatch_handler+0x4c/0x74
>>> [c0000000eb9d4120] 0000000057ac6e9d (unreliable)
>>> [d0000000089d9f78] 2e0965747962382e
>>> SP (965747962342e09) is in userspace
>>>
>>> When an interrupt is served in between the livepatch_handler execution,
>>> there are chances of the livepatch_stack/task task getting corrupted.
>>
>> Ouch. That's pretty broken by me.
>>
>
> I was worried more about preemption as I said in the review comment earlier,
> this is new. It looks like we restored the wrong r1 on returning from
> the interrupt
> context?

Yes, consider the example in the mail thread. Where the r0 is temporary 
used to hold the stack pointer value before loading r1 with the 
livepatch_sp (current->thread_info + 24). An interrupt occurs while the 
livepatch stack is being setup to store or restore the LR/TOC value of 
the calling function.

do_IRQ -> call_do_irq's first instruction mflr r0, over-writes the stack 
value with LR value. Consecutive instruction referring to r1 in 
call_do_irq are actually referring to the livepatch_sp instead of task 
stack. On the return to livepatch_handler, stack is restored from r0 
value (which has been over-written with LR value in call_do_irq). Any 
access therefore made to the stack is to a wrong address.

> It would be nice to see any pt_regs changes due to the interrupt.
> Did the interrupt handling code called something that needed live-patching?
>

AFAIK, the livepatch_sp value for softirq_ctx/hardirq_ctx are 
initialized, as it's part of thread_info. Am I missing something here.

>
>>> Fix the corruption by using r11 register for livepatch stack manipulation,
>>> instead of shuffling task stack and livepatch stack into r1 register.
>>> Using r11 register also avoids disabling/enabling irq's while setting
>>> up the livepatch stack.
>>

The r11 is restored before calling the livepatch_handler by ftrace_caller:

         /* Load CTR with the possibly modified NIP */
         mtctr   r15

         /* Restore gprs */
         REST_GPR(0,r1)
         REST_10GPRS(2,r1)
         REST_10GPRS(12,r1)
         REST_10GPRS(22,r1)

         [...]

#ifdef CONFIG_LIVEPATCH
         /*
          * Based on the cmpd or cmpdi above, if the NIP was altered and 
we're
          * not on a kprobe/jprobe, then handle livepatch.
          */
         bne-    livepatch_handler
#endif

-- 
cheers,
Kamalesh.