Recurring data races when machine at idle e.g. "BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_sched_do_timer" and several data races at boot (G4 DP, kernel v6.6)

Erhard Furtner erhard_f at mailbox.org
Tue Nov 7 08:52:41 AEDT 2023


Greetings!

I did a test run with kernel v6.6 on my G4 DP with KCSAN_EARLY_ENABLE=y, KCSAN_REPORT_ONCE_IN_MS=12000, KCSAN_STRICT=y, KCSAN_WEAK_MEMORY=y, CONFIG_KCSAN_VERBOSE=y hoping to get some hints about what might cause https://bugzilla.kernel.org/show_bug.cgi?id=215389 The G4 was operated and also output was gathered via serial console.

When the G4 has finished booting and is idle I get races like these every few minutes, sometimes more often without even using the machine:
==================================================================
BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_sched_do_timer

write to 0xc1a3f0c4 of 4 bytes by interrupt on cpu 1:
 tick_sched_do_timer+0x170/0x1c0
 tick_sched_timer+0x88/0x224
 __hrtimer_run_queues+0x5d8/0x7d4
 hrtimer_interrupt+0x234/0x4d8
 timer_interrupt+0xec/0x24c
 Decrementer_virt+0x108/0x10c
 default_idle_call+0x13c/0x274
 do_idle+0x100/0x188
 cpu_startup_entry+0x34/0x38
 start_secondary+0x5ac/0xb58
 0x3334

no locks held by swapper/1/0.
irq event stamp: 148094222
hardirqs last  enabled at (148094221): [<c12fe03c>] default_idle_call+0x13c/0x274
hardirqs last disabled at (148094222): [<c00156a0>] interrupt_async_enter_prepare+0xf0/0x138
softirqs last  enabled at (147912796): [<c13107a0>] __do_softirq+0x490/0x4f8
softirqs last disabled at (147912785): [<c000c658>] do_softirq_own_stack+0x58/0xa4

read to 0xc1a3f0c4 of 4 bytes by task 0 on cpu 0:
 tick_nohz_idle_stop_tick+0x1f0/0x6f4
 do_idle+0xfc/0x188
 cpu_startup_entry+0x34/0x38
 rest_init+0x184/0x24c
 arch_post_acpi_subsys_init+0x0/0xc
 console_on_rootfs+0x0/0x98
 0x3610

no locks held by swapper/0/0.
irq event stamp: 149549446
hardirqs last  enabled at (149549445): [<c12fe03c>] default_idle_call+0x13c/0x274
hardirqs last disabled at (149549446): [<c011591c>] do_idle+0xd0/0x188
softirqs last  enabled at (149492862): [<c13107a0>] __do_softirq+0x490/0x4f8
softirqs last disabled at (149492845): [<c000c658>] do_softirq_own_stack+0x58/0xa4

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.6.0-PMacG4-dirty #19
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
==================================================================


Also I get "BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event" and "BUG: KCSAN: data-race in tick_nohz_next_event / tick_sched_do_timer" frequently when not actively using the machine (see dmesg).

During boot I regularly get rcu related races like this:
==================================================================
BUG: KCSAN: data-race in rcu_all_qs / rcu_disable_urgency_upon_qs

write (marked) to 0xeedc7805 of 1 bytes by interrupt on cpu 1:
 rcu_disable_urgency_upon_qs+0x4c/0x10c
 rcu_report_qs_rdp+0x490/0x4e0
 rcu_core+0x810/0xe74
 rcu_core_si+0x20/0x48
 __do_softirq+0x16c/0x4f8
 do_softirq_own_stack+0x58/0xa4
 ct_irq_enter+0x9c/0xc0
 __irq_exit_rcu+0x148/0x184
 irq_exit+0x10/0x60
 interrupt_async_exit_prepare.isra.0+0x18/0x34
 timer_interrupt+0x124/0x24c
 Decrementer_virt+0x108/0x10c
 0xeedc7805
 0x0
 rcu_all_qs+0x6c/0x268
 __cond_resched+0x1c/0x7c
 __kmem_cache_alloc_node+0x324/0x4a4
 kmalloc_trace+0x40/0x150
 ttm_range_man_alloc+0xc0/0x224 [ttm]
 ttm_resource_alloc+0x9c/0x124 [ttm]
 ttm_bo_mem_space+0x114/0x3a4 [ttm]
 ttm_bo_validate+0x108/0x39c [ttm]
 ttm_bo_init_reserved+0x2c8/0x310 [ttm]
 ttm_bo_init_validate+0xb4/0x1cc [ttm]
 radeon_bo_create+0x220/0x3ec [radeon]
 radeon_gem_object_create+0xe4/0x2dc [radeon]
 radeon_gem_create_ioctl+0xd4/0x20c [radeon]
 drm_ioctl_kernel+0x150/0x270
 drm_ioctl+0x328/0x6d0
 radeon_drm_ioctl+0x94/0x174 [radeon]
 sys_ioctl+0x338/0x1144
 system_call_exception+0x10c/0x1f0
 ret_from_syscall+0x0/0x2c

read to 0xeedc7805 of 1 bytes by task 331 on cpu 1:
 rcu_all_qs+0x6c/0x268
 __cond_resched+0x1c/0x7c
 __kmem_cache_alloc_node+0x324/0x4a4
 kmalloc_trace+0x40/0x150
 ttm_range_man_alloc+0xc0/0x224 [ttm]
 ttm_resource_alloc+0x9c/0x124 [ttm]
 ttm_bo_mem_space+0x114/0x3a4 [ttm]
 ttm_bo_validate+0x108/0x39c [ttm]
 ttm_bo_init_reserved+0x2c8/0x310 [ttm]
 ttm_bo_init_validate+0xb4/0x1cc [ttm]
 radeon_bo_create+0x220/0x3ec [radeon]
 radeon_gem_object_create+0xe4/0x2dc [radeon]
 radeon_gem_create_ioctl+0xd4/0x20c [radeon]
 drm_ioctl_kernel+0x150/0x270
 drm_ioctl+0x328/0x6d0
 radeon_drm_ioctl+0x94/0x174 [radeon]
 sys_ioctl+0x338/0x1144
 system_call_exception+0x10c/0x1f0
 ret_from_syscall+0x0/0x2c

3 locks held by X/331:
 #0: c5ac004c (&rdev->exclusive_lock){.+.+}-{3:3}, at: radeon_gem_create_ioctl+0x68/0x20c [radeon]
 #1: c5ac1ea8 (&rdev->pm.mclk_lock){.+.+}-{3:3}, at: radeon_bo_create+0x1f0/0x3ec [radeon]
 #2: c7b4855c (reservation_ww_class_mutex){+.+.}-{3:3}, at: ww_mutex_trylock+0x22c/0x53c
irq event stamp: 483150
hardirqs last  enabled at (483149): [<c017df18>] note_gp_changes+0x2b4/0x2e4
hardirqs last disabled at (483150): [<c130ee58>] _raw_spin_lock_irqsave+0xd8/0xdc
softirqs last  enabled at (482937): [<c13107a0>] __do_softirq+0x490/0x4f8
softirqs last disabled at (483144): [<c000c658>] do_softirq_own_stack+0x58/0xa4

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 331 Comm: X Not tainted 6.6.0-PMacG4-dirty #19
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
==================================================================


This one also occurs most of the time:
==================================================================
BUG: KCSAN: data-race in copyout / interrupt_async_enter_prepare

read to 0xc26420dc of 4 bytes by task 1 on cpu 1:
 interrupt_async_enter_prepare+0x78/0x138
 do_IRQ+0x18/0x40
 HardwareInterrupt_virt+0x108/0x10c
 0xc26420dc
 0xa9041d9f
 copyout+0x64/0x130
 _copy_to_iter+0xc8/0x894
 get_random_bytes_user+0x23c/0x260
 sys_getrandom+0xd0/0x16c
 system_call_exception+0x10c/0x1f0
 ret_from_syscall+0x0/0x2c

write to 0xc26420dc of 4 bytes by task 1 on cpu 1:
 copyout+0x64/0x130
 _copy_to_iter+0xc8/0x894
 get_random_bytes_user+0x23c/0x260
 sys_getrandom+0xd0/0x16c
 system_call_exception+0x10c/0x1f0
 ret_from_syscall+0x0/0x2c

no locks held by systemd/1.
irq event stamp: 1056783
hardirqs last  enabled at (1056782): [<c0c095c0>] crng_make_state+0x2f8/0x490
hardirqs last disabled at (1056783): [<c000bc80>] interrupt_async_enter_prepare+0xf0/0x138
softirqs last  enabled at (1056741): [<c13107a0>] __do_softirq+0x490/0x4f8
softirqs last disabled at (1056712): [<c000c658>] do_softirq_own_stack+0x58/0xa4

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1 Comm: systemd Not tainted 6.6.0-PMacG4-dirty #19
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
==================================================================

Some data about the G4 DP:
0000:00:0b.0 Host bridge: Apple Inc. UniNorth 2 AGP
0000:00:10.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV350 [Radeon 9550/9600/X1050 Series]
0001:00:0b.0 Host bridge: Apple Inc. UniNorth 2 PCI
0001:00:12.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
0001:00:12.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
0001:00:12.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)
0001:00:13.0 Mass storage controller: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller (rev 02)
0001:00:15.0 Serial controller: MosChip Semiconductor Technology Ltd. PCI 9865 Multi-I/O Controller
0001:00:15.1 Serial controller: MosChip Semiconductor Technology Ltd. PCI 9865 Multi-I/O Controller
0001:00:15.2 Parallel controller: Illegal Vendor ID Device 9865
0001:00:16.0 Network controller: Broadcom Inc. and subsidiaries BCM4306 802.11b/g Wireless LAN Controller (rev 02)
0001:00:17.0 Unassigned class [ff00]: Apple Inc. KeyLargo Mac I/O (rev 03)
0001:00:18.0 USB controller: Apple Inc. KeyLargo USB
0001:00:19.0 USB controller: Apple Inc. KeyLargo USB
0001:00:1b.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
0001:00:1b.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
0001:00:1b.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)
0002:00:0b.0 Host bridge: Apple Inc. UniNorth 2 Internal PCI
0002:00:0d.0 Unassigned class [ff00]: Apple Inc. UniNorth 2 ATA/100
0002:00:0e.0 FireWire (IEEE 1394): Apple Inc. UniNorth 2 FireWire (rev 01)
0002:00:0f.0 Ethernet controller: Apple Inc. UniNorth 2 GMAC (Sun GEM)

Please find full dmesg and 6.6 kernel .config attached.

Regards,
Erhard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config_660-van_g4
Type: application/octet-stream
Size: 113920 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20231106/9c8189c1/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg_660_g4_09
Type: application/octet-stream
Size: 72390 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20231106/9c8189c1/attachment-0003.obj>


More information about the Linuxppc-dev mailing list