[PATCH] powerpc/irq: Increase stack_overflow detection limit when KASAN is enabled

Christophe Leroy christophe.leroy at csgroup.eu
Tue May 31 16:39:19 AEST 2022



Le 31/05/2022 à 08:21, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy at csgroup.eu> writes:
>> When KASAN is enabled, as shown by the Oops below, the 2k limit is not
>> enough to allow stack dump after a stack overflow detection when
>> CONFIG_DEBUG_STACKOVERFLOW is selected:
>>
>> 	do_IRQ: stack overflow: 1984
>> 	CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1
>> 	Call Trace:
>> 	Oops: Kernel stack overflow, sig: 11 [#1]
>> 	BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
>> 	Modules linked in: sr_mod cdrom radeon(+) ohci_pci(+) hwmon i2c_algo_bit drm_ttm_helper ttm drm_dp_helper snd_aoa_i2sbus snd_aoa_soundbus snd_pcm ehci_pci snd_timer ohci_hcd snd ssb ehci_hcd 8250_pci soundcore drm_kms_helper pcmcia 8250 pcmcia_core syscopyarea usbcore sysfillrect 8250_base sysimgblt serial_mctrl_gpio fb_sys_fops usb_common pkcs8_key_parser fuse drm drm_panel_orientation_quirks configfs
>> 	CPU: 0 PID: 126 Comm: systemd-udevd Not tainted 5.18.0-gentoo-PMacG4 #1
>> 	NIP:  c02e5558 LR: c07eb3bc CTR: c07f46a8
>> 	REGS: e7fe9f50 TRAP: 0000   Not tainted  (5.18.0-gentoo-PMacG4)
>> 	MSR:  00001032 <ME,IR,DR,RI>  CR: 44a14824  XER: 20000000
>>
>> 	GPR00: c07eb3bc eaa1c000 c26baea0 eaa1c0a0 00000008 00000000 c07eb3bc eaa1c010
>> 	GPR08: eaa1c0a8 04f3f3f3 f1f1f1f1 c07f4c84 44a14824 0080f7e4 00000005 00000010
>> 	GPR16: 00000025 eaa1c154 eaa1c158 c0dbad64 00000020 fd543810 eaa1c0a0 eaa1c29e
>> 	GPR24: c0dbad44 c0db8740 05ffffff fd543802 eaa1c150 c0c9a3c0 eaa1c0a0 c0c9a3c0
>> 	NIP [c02e5558] kasan_check_range+0xc/0x2b4
>> 	LR [c07eb3bc] format_decode+0x80/0x604
>> 	Call Trace:
>> 	[eaa1c000] [c07eb3bc] format_decode+0x80/0x604 (unreliable)
>> 	[eaa1c070] [c07f4dac] vsnprintf+0x128/0x938
>> 	[eaa1c110] [c07f5788] sprintf+0xa0/0xc0
>> 	[eaa1c180] [c0154c1c] __sprint_symbol.constprop.0+0x170/0x198
>> 	[eaa1c230] [c07ee71c] symbol_string+0xf8/0x260
>> 	[eaa1c430] [c07f46d0] pointer+0x15c/0x710
>> 	[eaa1c4b0] [c07f4fbc] vsnprintf+0x338/0x938
>> 	[eaa1c550] [c00e8fa0] vprintk_store+0x2a8/0x678
>> 	[eaa1c690] [c00e94e4] vprintk_emit+0x174/0x378
>> 	[eaa1c6d0] [c00ea008] _printk+0x9c/0xc0
>> 	[eaa1c750] [c000ca94] show_stack+0x21c/0x260
>> 	[eaa1c7a0] [c07d0bd4] dump_stack_lvl+0x60/0x90
>> 	[eaa1c7c0] [c0009234] __do_IRQ+0x170/0x174
>> 	[eaa1c800] [c0009258] do_IRQ+0x20/0x34
>> 	[eaa1c820] [c00045b4] HardwareInterrupt_virt+0x108/0x10c
> 
> Is this actually caused by KASAN? There's no stack frames in there that
> are KASAN related AFAICS.

Yes but enabling KASAN often increases the size of any functions.

And by the way here you have NIP in kasan_check_range()

But I can try to perform some more tests.

> 
> Seems like the 2K limit is never going to be enough even if KASAN is not
> enabled. Presumably we just haven't noticed because we don't trigger the
> check unless KASAN is enabled.

I think what trigger the Oops really is VMAP_STACK. Without VMAP_STACK 
we just silently overwrite other memory.

> 
>> ...
>>
>> Increase the limit to 3k when KASAN is enabled.
>>
>> While at it remove the 'inline' keywork for check_stack_overflow().
>> This function is called only once so it will be inlined regardless.
> 
> I'd rather that was a separate change, in case it has some unintended
> affect.

ok

> 
>> Reported-by: Erhard Furtner <erhard_f at mailbox.org>
>> Cc: Arnd Bergmann <arnd at arndb.de>
>> Signed-off-by: Christophe Leroy <christophe.leroy at csgroup.eu>
>> ---
>>   arch/powerpc/kernel/irq.c | 16 ++++++++++------
>>   1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
>> index 873e6dffb868..5ff4cf69fc2f 100644
>> --- a/arch/powerpc/kernel/irq.c
>> +++ b/arch/powerpc/kernel/irq.c
>> @@ -53,6 +53,7 @@
>>   #include <linux/vmalloc.h>
>>   #include <linux/pgtable.h>
>>   #include <linux/static_call.h>
>> +#include <linux/sizes.h>
>>   
>>   #include <linux/uaccess.h>
>>   #include <asm/interrupt.h>
>> @@ -184,7 +185,7 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
>>   	return sum;
>>   }
>>   
>> -static inline void check_stack_overflow(void)
>> +static void check_stack_overflow(void)
>>   {
>>   	long sp;
>>   
>> @@ -193,11 +194,14 @@ static inline void check_stack_overflow(void)
>>
> 
> Wouldn't it be cleaner to just do:
> 
> #ifdef CONFIG_KASAN
> #define STACK_CHECK_LIMIT (3 * 1024)
> #else
> #define STACK_CHECK_LIMIT (2 * 1024)
> #endif

Well, as you think 2k is not enough even without KASAN, then we should 
just increase it to 3k ?

In the meantime I was thinking about moving the test into __do_irq(), so 
that it will be done on IRQ stack. That would ease things unless we 
overfill the IRQ stack itself.

Because even if we put a detection limit at 3 or 4k, as the detection is 
asynchronous we still have a risk that the stack filling be much more 
than the limit and still be unable to perform the stack dump within the 
remaining stack space.

> 
>>   	sp = current_stack_pointer & (THREAD_SIZE - 1);
>>   
>> -	/* check for stack overflow: is there less than 2KB free? */
>> -	if (unlikely(sp < 2048)) {
>   
> +	if (unlikely(sp < STACK_CHECK_LIMIT)) {
>   
> And then the code could stay as it is?
> 
> cheers
> 
>> -		pr_err("do_IRQ: stack overflow: %ld\n", sp);
>> -		dump_stack();
>> -	}
>> +	/* check for stack overflow: is there less than 2/3KB free? */
>> +	if (!IS_ENABLED(KASAN) && likely(sp >= SZ_2K))
>> +		return;
>> +	if (IS_ENABLED(KASAN) && likely(sp >= SZ_2K + SZ_1K))
>> +		return;
>> +
>> +	pr_err("do_IRQ: stack overflow: %ld\n", sp);
>> +	dump_stack();
>>   }


More information about the Linuxppc-dev mailing list