Traceback due to 'powerpc/mm: Fix kernel RAM protection...' when running ppc image in qemu

Guenter Roeck linux at roeck-us.net
Mon Sep 25 02:05:15 AEST 2017


On 09/21/2017 11:44 AM, Christophe LEROY wrote:
> 
> 
> On 20/09/2017 at 05:45, Guenter Roeck wrote:
>> On 09/19/2017 08:05 PM, Michael Ellerman wrote:
>>> Guenter Roeck <linux at roeck-us.net> writes:
>>>
>>>> Hi,
>>>>
>>>> I see the following traceback when running an SMP image based on
>>>> 85xx/mpc85xx_cds_defconfig in qemu.
>>>>
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 0 PID: 1 at kernel/smp.c:416 smp_call_function_many+0xcc/0x2fc
>>>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc1-00009-g0666f56 #1
>>>> task: cf830000 task.stack: cf82e000
>>>> NIP:  c00a93c8 LR: c00a9634 CTR: 00000001
>>>> REGS: cf82fde0 TRAP: 0700   Not tainted  (4.14.0-rc1-00009-g0666f56)
>>>> MSR:  00021000 <CE,ME>  CR: 24000082  XER: 00000000
>>>>
>>>> GPR00: c00a9634 cf82fe90 cf830000 c050ad3c c0015a54 00000000 00000001 00000001
>>>> GPR08: 00000001 00000000 00000000 cf82e000 24000084 00000000 c0003150 00000000
>>>> GPR16: 00000000 00000000 00000000 00000000 00000000 00000001 00000000 c0510000
>>>> GPR24: 00000000 c0015a54 00000000 c050ad3c c051823c c050ad3c 00000025 00000000
>>>> NIP [c00a93c8] smp_call_function_many+0xcc/0x2fc
>>>> LR [c00a9634] smp_call_function+0x3c/0x50
>>>> Call Trace:
>>>> [cf82fe90] [00000010] 0x10 (unreliable)
>>>> [cf82fed0] [c00a9634] smp_call_function+0x3c/0x50
>>>> [cf82fee0] [c0015d2c] flush_tlb_kernel_range+0x20/0x38
>>>> [cf82fef0] [c001524c] mark_initmem_nx+0x154/0x16c
>>>> [cf82ff20] [c001484c] free_initmem+0x20/0x4c
>>>> [cf82ff30] [c000316c] kernel_init+0x1c/0x108
>>>> [cf82ff40] [c000f3a8] ret_from_kernel_thread+0x5c/0x64
>>>> Instruction dump:
>>>> 7c0803a6 7d808120 38210040 4e800020 3d20c052 812981a0 2f890000 40beffac
>>>> 3d20c051 8929ac64 2f890000 40beff9c <0fe00000> 4bffff94 7fc3f378 7f64db78
>>>> ---[ end trace 7da7bdcf8b15ddb3 ]---
>>>
>>> Thanks.
>>>
>>> I guess the system still runs OK otherwise, you're just seeing the warning?
>>>
>> Yes, though I am not sure if that is because there is only one active CPU (there is
>> still only one if I say "-smp 4" on the qemu command line).
>>
>>>> A complete log is available at:
>>>> http://kerneltests.org/builders/qemu-ppc-master/builds/814/steps/qemubuildcommand/logs/stdio
>>>>
>>>> Bisect points to commit 3184cc4b6f6a1dc0 ("powerpc/mm: Fix kernel RAM protection
>>>> after freeing unused memory on PPC32"). Bisect log is attached. A quick look
>>>> suggests that mark_initmem_nx() is called with interrupts disabled, which
>>>> triggers the traceback.
>>>
>>> Hmm. Yes the MSR says you have interrupts disabled (EE missing).
>>>
>>> But I don't see why. start_kernel() did local_irq_enable(), so I don't
>>> understand why we got to mark_initmem_nx() with them disabled. I'll hope
>>> that Christophe has some idea.
>>>
>> Good question. I only see this with one of 9 ppc emulations, with 85xx/mpc85xx_cds_defconfig
>> +CONFIG_DEVTMPFS=y +CONFIG_SMP=y. Maybe there is a platform specific init function
>> which leaves interrupts disabled. Question is which one that might be.
>>
> 
> Unfortunately no, I have no idea. My three platforms (860, 885 and 8321) are not SMPs so that warning would not appear, but I added a WARN_ON(1) just before calling mark_initmem_nx(), and I can confirm that MSR has EE set on all three at that time.
> 

You should still be able to compile and run an SMP kernel. mpc85xx_cds_defconfig
without CONFIG_SMP=y does not show the warning either.

Turns out interrupts are disabled in change_page_attr(), called by mark_initmem_nx().
change_page_attr() calls flush_tlb_kernel_range() with interrupts disabled.
This only happens if CONFIG_PPC_MMU_NOHASH=y.
Given that, I would expect this to be seen with every 32-bit ppc build which has
both CONFIG_SMP=y and CONFIG_PPC_MMU_NOHASH=y.

Maybe the problem was really introduced with commit e611939fc8ec1 ("powerpc/mm: Ensure
change_page_attr() doesn't invalidate pinned TLBs"). From the context it appears that
flush_tlb_kernel_range() should not be called with interrupts disabled.
Indeed, moving flush_tlb_kernel_range() outside the irq disabled code fixes
the problem for me.
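
For illustration, here is a minimal user-space sketch of the reordering
described above. It is not the kernel code and not the actual patch: every
model_* name is hypothetical, the "interrupts enabled" state is just a flag,
and the flush stand-in only counts a warning when it is called with that flag
cleared, which is a simplification of the check that fires in
smp_call_function_many().

```c
/*
 * User-space model only. None of these helpers are the kernel's real
 * implementations; all model_* names are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool model_irqs_enabled = true; /* stand-in for MSR[EE] */
static int model_warnings;             /* counts simulated warnings */

/* Stand-in for local_irq_save(): remember the old state, then disable. */
static unsigned long model_irq_save(void)
{
    unsigned long flags = model_irqs_enabled ? 1UL : 0UL;

    model_irqs_enabled = false;
    return flags;
}

/* Stand-in for local_irq_restore(): put the saved state back. */
static void model_irq_restore(unsigned long flags)
{
    model_irqs_enabled = (flags != 0);
}

/*
 * Stand-in for flush_tlb_kernel_range() on an SMP build: the cross-CPU
 * call behind it is assumed to warn when interrupts are disabled.
 */
static void model_flush_tlb_kernel_range(unsigned long start,
                                         unsigned long end)
{
    assert(start <= end);
    if (!model_irqs_enabled) {
        model_warnings++;
        fprintf(stderr, "WARNING: flush with interrupts disabled\n");
    }
}

/* Shape reported above: flush issued inside the irq-disabled region. */
static int model_change_attr_flush_inside(unsigned long start,
                                          unsigned long end)
{
    unsigned long flags = model_irq_save();

    /* ... per-page attribute updates would happen here ... */
    model_flush_tlb_kernel_range(start, end);
    model_irq_restore(flags);
    return 0;
}

/* Reordered shape: restore interrupts first, then flush. */
static int model_change_attr_flush_outside(unsigned long start,
                                           unsigned long end)
{
    unsigned long flags = model_irq_save();

    /* ... per-page attribute updates would happen here ... */
    model_irq_restore(flags);
    model_flush_tlb_kernel_range(start, end);
    return 0;
}
```

Calling model_change_attr_flush_inside() increments model_warnings, while
model_change_attr_flush_outside() leaves it unchanged. Both leave the
modelled interrupt state as they found it.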

Thanks,
Guenter

> So as you suggest, there must be some platform-specific code leaving the interrupts disabled.
> 
> Christophe
> 
> 
>> Guenter
> 


