[PATCH v3] powerpc: setup_64: set up PACA earlier to avoid kcov problems
Daniel Axtens
dja at axtens.net
Sat Mar 7 00:30:10 AEDT 2020
Andrew Donnellan <ajd at linux.ibm.com> writes:
> On 6/3/20 6:30 pm, Daniel Axtens wrote:
>> kcov instrumentation is collected by the __sanitizer_cov_trace_pc hook in
>> kernel/kcov.c. The compiler inserts these hooks into every basic block
>> unless kcov is disabled for that file.
>>
>> We then have a deep call-chain:
>> - __sanitizer_cov_trace_pc calls check_kcov_mode()
>> - check_kcov_mode() (kernel/kcov.c) calls in_task()
>> - in_task() (include/linux/preempt.h) calls preempt_count()
>> - preempt_count() (include/asm-generic/preempt.h) calls
>>   current_thread_info()
>> - because powerpc has THREAD_INFO_IN_TASK, current_thread_info()
>>   (include/linux/thread_info.h) is defined to 'current'
>> - current (arch/powerpc/include/asm/current.h) is defined to
>>   get_current()
>> - get_current (same file) loads from an offset from r13
>> - arch/powerpc/include/asm/paca.h makes r13 a register variable
>>   called local_paca - it is the PACA for the current CPU, so
>>   this has the effect of loading the current task from PACA
>> - get_current returns the current task from PACA
>> - current_thread_info returns the task cast to a thread_info
>> - preempt_count dereferences the thread_info to load preempt_count
>> - that value is used by in_task and so on up the chain
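
To make the chain above concrete, here is a rough userspace model of it.
This is only a sketch: the struct layouts, the mask value and every
model_ name are assumptions made for illustration, not the kernel's
definitions, and r13 is replaced by an ordinary global pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only. Struct layouts, the mask value and the model_
 * names are assumptions made for illustration, not kernel definitions.
 */
struct thread_info {
	int preempt_count;
};

struct task_struct {
	struct thread_info thread_info;	/* first member, so the cast below is valid */
	int kcov_mode;
};

struct paca_struct {
	struct task_struct *current_task;	/* stand-in for the PACA's current-task slot */
};

#define MODEL_KCOV_MODE_TRACE_PC 2
#define MODEL_NON_TASK_MASK 0x1fff00	/* hypothetical: any interrupt-context bit */

/* On powerpc64 this pointer lives in r13; here it is an ordinary global. */
struct paca_struct *local_paca;
unsigned long model_trace_hits;

struct task_struct *model_get_current(void)
{
	/* The kernel has no such check; it only makes the model fail loudly. */
	assert(local_paca != NULL);
	return local_paca->current_task;	/* the load at an offset from r13 */
}

struct thread_info *model_current_thread_info(void)
{
	return (struct thread_info *)model_get_current();
}

int model_preempt_count(void)
{
	return model_current_thread_info()->preempt_count;
}

bool model_in_task(void)
{
	return (model_preempt_count() & MODEL_NON_TASK_MASK) == 0;
}

bool model_check_kcov_mode(int needed_mode)
{
	if (!model_in_task())
		return false;
	return model_get_current()->kcov_mode == needed_mode;
}

/* Stand-in for the hook the compiler inserts into every basic block. */
void model_sanitizer_cov_trace_pc(void)
{
	if (model_check_kcov_mode(MODEL_KCOV_MODE_TRACE_PC))
		model_trace_hits++;
}
```

The point of the model is that every call to the hook ends in two
dependent loads through local_paca. If that pointer holds garbage, as
r13 does before the PACA is set up, the very first instrumented basic
block dereferences it.
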
>>
>> The problem is:
>>
>> - kcov instrumentation is enabled for arch/powerpc/kernel/dt_cpu_ftrs.c
>>
>> - even if it were not, dt_cpu_ftrs_init calls generic dt parsing code
>>   which should definitely have instrumentation enabled.
>>
>> - setup_64.c calls dt_cpu_ftrs_init before it sets up a PACA.
>>
>> - If we don't set up a paca, r13 will contain unpredictable data.
>>
>> - In a zImage compiled with kcov and KASAN, we see r13 containing a value
>>   that leads to dereferencing invalid memory (something like
>>   912a72603d420015).
>>
>> - Weirdly, the same kernel as a vmlinux loaded directly by qemu does not
>>   crash. Investigating with gdb, it seems that in the vmlinux boot case,
>>   r13 is near enough to zero that we just happen to be able to read that
>>   part of memory (we're operating with translation off at this point) and
>>   the current pointer also happens to land in readable memory and
>>   everything just works.
>>
>> - PACA setup refers to CPU features - setup_paca() looks at
>>   early_cpu_has_feature(CPU_FTR_HVMODE)
>>
>> There's no generic kill switch for kcov (as far as I can tell), and we
>> don't want to have to turn off instrumentation in the generic dt parsing
>> code (which lives outside arch/powerpc/) just because we don't have a real
>> paca or task yet.
>>
>> So:
>> - change the test when setting up a PACA to consider the actual value of
>>   the MSR rather than the CPU feature.
>>
>> - move the PACA setup to before the cpu feature parsing.
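
A small model of why the two changes go together, again as a sketch
under stated assumptions: the struct, the enum and all model_ names are
hypothetical, and the MSR_HV bit position (bit 60) is my reading of the
kernel headers rather than something stated in this thread. The
feature-based test only gives the right answer after the features have
been parsed, whereas the MSR-based test is valid from the start, which
is what allows the PACA setup to move first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the hypervisor-state bit in the MSR. */
#define MODEL_MSR_HV (UINT64_C(1) << 60)

/* Hypothetical model of the early-boot state relevant to setup_paca(). */
struct model_cpu {
	uint64_t msr;		/* what mfmsr() would return */
	bool features_parsed;	/* set by the dt_cpu_ftrs_init() stand-in */
	bool ftr_hvmode;	/* stand-in for CPU_FTR_HVMODE, valid once parsed */
};

enum model_paca_spr { MODEL_SPR_GUEST, MODEL_SPR_HV };

/* Stand-in for the feature parsing that used to run before PACA setup. */
void model_dt_cpu_ftrs_init(struct model_cpu *cpu)
{
	cpu->ftr_hvmode = (cpu->msr & MODEL_MSR_HV) != 0;
	cpu->features_parsed = true;
}

/* Old test: depends on the CPU features having been parsed already. */
enum model_paca_spr model_pick_spr_by_feature(const struct model_cpu *cpu)
{
	return (cpu->features_parsed && cpu->ftr_hvmode) ? MODEL_SPR_HV
							  : MODEL_SPR_GUEST;
}

/* New test: reads the MSR directly, so it works before any parsing. */
enum model_paca_spr model_pick_spr_by_msr(const struct model_cpu *cpu)
{
	return (cpu->msr & MODEL_MSR_HV) ? MODEL_SPR_HV : MODEL_SPR_GUEST;
}

/* Walks the fixed ordering: PACA decision first, feature parsing second. */
void model_early_setup(struct model_cpu *cpu, enum model_paca_spr *chosen)
{
	*chosen = model_pick_spr_by_msr(cpu);	/* safe before any parsing */
	model_dt_cpu_ftrs_init(cpu);
	/* Once the features are parsed, both tests agree. */
	assert(model_pick_spr_by_feature(cpu) == *chosen);
}
```

With an unparsed model_cpu whose msr has the HV bit set, the
feature-based helper picks the guest SPR while the MSR-based one picks
the HV SPR, so simply reordering the calls without changing the test
would have picked the wrong register on a host.
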
>>
>> Translations get switched on once we leave early_setup, so I think we'd
>> already catch any other cases where the PACA or task aren't set up.
>>
>> Boot tested on a P9 guest and host.
>>
>> Fixes: fb0b0a73b223 ("powerpc: Enable kcov")
>> Cc: Andrew Donnellan <ajd at linux.ibm.com>
>> Suggested-by: Michael Ellerman <mpe at ellerman.id.au>
>> Signed-off-by: Daniel Axtens <dja at axtens.net>
>>
>> ---
>>
>> Regarding moving the comment about printk()-safety:
>> I am about 75% sure that the thing that makes printk() safe is the PACA,
>> not the CPU features. That's what commit 24d9649574fb ("[POWERPC] Document
>> when printk is useable") seems to indicate, but as someone wise recently
>> told me, "bootstrapping is hard", so I may be totally wrong.
>>
>> v3: Update comment, thanks Christophe Leroy.
>> Remove a comment in dt_cpu_ftrs.c that is no longer accurate - thanks
>> Andrew. I think we want to retain all the code still, but I'm open to
>> being told otherwise.
>
> Thanks for doing that.
>
> This patch and the justification don't seem obviously wrong, and it is
> snowpatch-clean.
>
> Reviewed-by: Andrew Donnellan <ajd at linux.ibm.com>
>
> (Is it worth cc'ing this to stable in case there are other situations we
> haven't foreseen where we hit the unpredictable r13 data? Few people use
> kcov...)

I did briefly consider it, but I didn't think it met the stable
criteria:

| It must fix a real bug that bothers people (not a, “This could be a
| problem...” type thing).

On reflection it's a real bug (boot hang), it bothers me, and presumably
also you due to the syzkaller interaction, and I am led to believe we
are both people, so I guess I'll do a v4 with cc: stable. Thanks!

Regards,
Daniel

>
> --
> Andrew Donnellan OzLabs, ADL Canberra
> ajd at linux.ibm.com IBM Australia Limited