OVERFLOW_KUNIT_TEST fails with BUG: KASAN: stack-out-of-bounds in string_nocheck+0x168/0x1c8 (kernel 6.11-rc2, PowerMac G4 DP)

Ivan Orlov ivan.orlov0322 at gmail.com
Thu Aug 15 09:26:40 AEST 2024


On 8/14/24 21:38, 'Erhard Furtner' via KUnit Development wrote:
> On Mon, 12 Aug 2024 11:54:11 -0700
> Kees Cook <kees at kernel.org> wrote:
> 
>> On Fri, Aug 09, 2024 at 11:15:37PM +0200, Erhard Furtner wrote:
>>> Greetings!
>>>
>>> When KASAN is enabled the Overflow KUnit test fails:
>>>
>>> [...]
>>>      ok 16 shift_nonsense_test
>>>      # overflow_allocation_test: 11 allocation overflow tests finished
>>> ==================================================================
>>> BUG: KASAN: stack-out-of-bounds in string_nocheck+0x168/0x1c8
>>> Read of size 1 at addr c976be40 by task kunit_try_catch/1843
>>>
>>> CPU: 0 UID: 0 PID: 1843 Comm: kunit_try_catch Tainted: G                 N 6.11.0-rc2-PMacG4 #1
>>> Tainted: [N]=TEST
>>> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
>>> Call Trace:
>>> [c992fb80] [c16651c0] dump_stack_lvl+0x80/0xac (unreliable)
>>> [c992fba0] [c04e0420] print_report+0xdc/0x504
>>> [c992fc00] [c04e01d8] kasan_report+0xf8/0x108
>>> [c992fc80] [c16ae4c8] string_nocheck+0x168/0x1c8
>>> [c992fcf0] [c16b37a4] string+0xa8/0xbc
>>> [c992fd60] [c16b8134] vsnprintf+0x868/0x1750
>>> [c992fdf0] [c0b8490c] kvasprintf+0xa4/0x13c
>>> [c992fe60] [c0b84c3c] kasprintf+0xb4/0xc8
>>> [c992fed0] [c0f4c954] module_remove_driver+0x1f0/0x2fc
>>> [c992ff00] [c0f21628] bus_remove_driver+0x1d0/0x240
>>> [c992ff30] [bfd0cd40] kunit_put_resource+0x128/0x134 [kunit]
>>> [c992ff50] [bfd0a120] kunit_cleanup+0x140/0x144 [kunit]
>>> [c992ff90] [bfd10d64] kunit_generic_run_threadfn_adapter+0xf8/0x148 [kunit]
>>> [c992ffc0] [c00f57e0] kthread+0x36c/0x37c
>>> [c992fff0] [c0028304] start_kernel_thread+0x10/0x14
>>>
>>> The buggy address belongs to the physical page:
>>> page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0x976b
>>> flags: 0x0(zone=0)
>>> raw: 00000000 00000000 eef2bb10 00000000 00000000 00000000 ffffffff 00000000
>>> raw: 00000000
>>> page dumped because: kasan: bad access detected
>>>
>>> Memory state around the buggy address:
>>>   c976bd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>   c976bd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> c976be00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 04 f2
>>>                                     ^
>>>   c976be80: 00 04 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>>>   c976bf00: 00 00 f1 f1 f1 f1 00 f3 f3 f3 00 00 00 00 00 00
>>> ==================================================================
>>> Disabling lock debugging due to kernel taint
>>>      not ok 17 overflow_allocation_test
>>>      # overflow_size_helpers_test: 43 overflow size helper tests finished
>>>      ok 18 overflow_size_helpers_test
>>>      # overflows_type_test: 378 overflows_type() tests finished
>>>      ok 19 overflows_type_test
>>>      # same_type_test: 0 __same_type() tests finished
>>>      ok 20 same_type_test
>>>      # castable_to_type_test: 75 castable_to_type() tests finished
>>>      ok 21 castable_to_type_test
>>>      ok 22 DEFINE_FLEX_test
>>> # overflow: pass:21 fail:1 skip:0 total:22
>>> # Totals: pass:21 fail:1 skip:0 total:22
>>> not ok 1 overflow
>>>
>>>
>>> This is reproducible on my machine and always happens when running the test via 'modprobe -v overflow_kunit'. With KASAN disabled (but KFENCE enabled), overflow_allocation_test passes.
>>
>> Hmm, this implies some kind of corruption is sneaking in and the kunit
>> resource freeing code is exploding. I don't immediately see the problem,
>> though.
> 
> This is not the first memory corruption I have hit on ppc32 (https://lore.kernel.org/all/20240811165230.91DCFA0660@freki.localdomain/), but it does not seem related.
> 
> I just did a kernel build with overflow_kunit built in, so that it runs at boot. This way I don't get the "BUG: KASAN: stack-out-of-bounds in string_nocheck+0x168/0x1c8", either on the PowerMac or on QEMU: run directly at boot, overflow_kunit just passes. As soon as I build it as a module and modprobe it later, I hit the issue. Strange...
> 
> This hints that the stack corruption might be caused not by the test itself but by another process.
> 
> Regards,
> Erhard
> 

Hi Erhard and Kees,

On my QEMU setup, running "modprobe -v overflow_kunit" with KASAN and
KFENCE enabled triggers the following KASAN report:

[   52.574541] BUG: KASAN: stack-out-of-bounds in string+0x2a0/0x320
[   52.574541] Read of size 1 at addr ffffc900010d7d88 by task systemd-udevd/144
[   52.574541]
[   52.574541] CPU: 11 UID: 0 PID: 144 Comm: systemd-udevd Tainted: G                 N 6.11.0-rc2-00319-g1fcd5c59a7f8 #83
[   52.574541] Tainted: [N]=TEST
[   52.574541] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   52.574541] Call Trace:
[   52.574541]  <TASK>
[   52.574541]  dump_stack_lvl+0x55/0x70
[   52.574541]  print_report+0xcb/0x620
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? string+0x2a0/0x320
[   52.574541]  kasan_report+0xc5/0x100
[   52.574541]  ? string+0x2a0/0x320
[   52.574541]  string+0x2a0/0x320
[   52.574541]  ? __pfx_string+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  vsnprintf+0x809/0x1600
[   52.574541]  ? __pfx_vsnprintf+0x10/0x10
[   52.574541]  ? kasan_save_stack+0x24/0x50
[   52.574541]  ? __kasan_kmalloc+0xaa/0xb0
[   52.574541]  ? uevent_show+0x127/0x300
[   52.574541]  ? dev_attr_show+0x41/0xc0
[   52.574541]  ? sysfs_kf_seq_show+0x213/0x400
[   52.574541]  ? seq_read_iter+0x404/0x1070
[   52.574541]  ? vfs_read+0x642/0x8f0
[   52.574541]  add_uevent_var+0x135/0x2e0
[   52.574541]  ? __kmalloc_node_noprof+0x1bc/0x3a0
[   52.574541]  ? seq_read_iter+0x67d/0x1070
[   52.574541]  ? __pfx_add_uevent_var+0x10/0x10
[   52.574541]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   52.574541]  ? stack_trace_save+0x8f/0xc0
[   52.574541]  ? __pfx_stack_trace_save+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? stack_depot_save_flags+0x2e/0x710
[   52.574541]  dev_uevent+0x166/0x6a0
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? __pfx_dev_uevent+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? kasan_unpoison+0x27/0x60
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? __kasan_slab_alloc+0x4d/0x90
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? __kmalloc_cache_noprof+0x100/0x2b0
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? kasan_save_track+0x14/0x30
[   52.574541]  uevent_show+0x183/0x300
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? mutex_lock+0x8c/0xe0
[   52.574541]  ? __pfx_dev_attr_show+0x10/0x10
[   52.574541]  dev_attr_show+0x41/0xc0
[   52.574541]  sysfs_kf_seq_show+0x213/0x400
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  seq_read_iter+0x404/0x1070
[   52.574541]  vfs_read+0x642/0x8f0
[   52.574541]  ? __pfx_vfs_read+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? __do_sys_newfstatat+0x86/0xd0
[   52.574541]  ? __pfx___do_sys_newfstatat+0x10/0x10
[   52.574541]  ksys_read+0xec/0x1c0
[   52.574541]  ? __pfx_ksys_read+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  do_syscall_64+0xa6/0x1a0
[   52.574541]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   52.574541] RIP: 0033:0x7fcf58ddf7e2
[   52.574541] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 8a b4 0c 00 e8 a5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[   52.574541] RSP: 002b:00007ffd98a30d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   52.574541] RAX: ffffffffffffffda RBX: 0000000000001018 RCX: 00007fcf58ddf7e2
[   52.574541] RDX: 0000000000001018 RSI: 00005586e442f2a0 RDI: 000000000000000c
[   52.574541] RBP: 00005586e442f2a0 R08: 0000000000000000 R09: 00005586e442f2a0
[   52.574541] R10: 00007fcf58ee5d10 R11: 0000000000000246 R12: 000000000000000c
[   52.574541] R13: 0000000000001017 R14: 0000000000000002 R15: 00007ffd98a30db0
[   52.574541]  </TASK>
[   52.574541]
[   52.574541] The buggy address belongs to the virtual mapping at
[   52.574541]  [ffffc900010d0000, ffffc900010d9000) created by:
[   52.574541]  kernel_clone+0xb9/0x6c0
[   52.574541]
[   52.574541] The buggy address belongs to the physical page:
[   52.574541] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xa9df
[   52.574541] flags: 0x100000000000000(node=0|zone=1)
[   52.574541] raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000
[   52.574541] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   52.574541] page dumped because: kasan: bad access detected
[   52.574541]
[   52.574541] Memory state around the buggy address:
[   52.574541]  ffffc900010d7c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1
[   52.574541]  ffffc900010d7d00: f1 f1 f1 f1 f1 04 f2 00 f2 f2 f2 00 00 00 f3 f3
[   52.574541] >ffffc900010d7d80: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
[   52.574541]                       ^
[   52.574541]  ffffc900010d7e00: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
[   52.574541]  ffffc900010d7e80: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   52.574541] ==================================================================
[   52.600667] Disabling lock debugging due to kernel taint


And it looks like I found the root cause (lib/overflow_kunit.c +671):
...
static void overflow_allocation_test(struct kunit *test)
{
        const char device_name[] = "overflow-test";
...

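As you can see, the device name is defined as a local (stack) array,
which means that it no longer exists once 'overflow_allocation_test'
returns. The name is handed to the test device when it is registered,
and that device outlives the function: it is only torn down later during
KUnit cleanup (kunit_cleanup() in the PowerMac trace), and udev can read
its uevent in the meantime (dev_uevent() in the QEMU trace). Both paths
then read the name from a dead stack frame. This patch: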

diff --git a/lib/overflow_kunit.c b/lib/overflow_kunit.c
index f314a0c15a6d..fa7ca8c94eee 100644
--- a/lib/overflow_kunit.c
+++ b/lib/overflow_kunit.c
@@ -668,7 +668,7 @@ DEFINE_TEST_ALLOC(devm_kzalloc,  devm_kfree, 1, 1, 0);

 static void overflow_allocation_test(struct kunit *test)
 {
-	const char device_name[] = "overflow-test";
+	static const char device_name[] = "overflow-test";
 	struct device *dev;
 	int count = 0;


This seems to fix the problem; I can no longer reproduce it.

I will send the proper patch tomorrow.

Good night!

-- 
Kind regards,
Ivan Orlov


More information about the Linuxppc-dev mailing list