UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:137:13

Mathieu Malaterre malat at debian.org
Wed Jun 13 21:41:56 AEST 2018


On Wed, Jun 13, 2018 at 10:43 AM Mathieu Malaterre <malat at debian.org> wrote:
>
> On Wed, Jun 13, 2018 at 3:43 AM Michael Ellerman <mpe at ellerman.id.au> wrote:
> >
> > Mathieu Malaterre <malat at debian.org> writes:
> >
> > > Hi there,
> > >
> > > I have a reproducible UBSAN appearing in dmesg after a while on my G4
> > > (*). Could anyone suggest a way to diagnose the actual root issue here
> > > (or is it just a false positive) ?
> >
> > It looks like a real overflow, I guess the question is why are we seeing it.
> >
> > The first thing to work out would be what exactly is overflowing.
> >
> > Is it in here?
> >
> >         cfqg_stats_update_completion(cfqq->cfqg, rq->start_time_ns,
> >                                      rq->io_start_time_ns, rq->cmd_flags);
> >
> >
> > If so that would suggest something is taking multiple hours to complete,
> > which seems unlikely. Is time going backward?
>
> There is also something suspicious in the kern.log file:
>
> Jun 12 20:09:04 debian kernel: [    5.504182]
> ================================================================================
> Jun 12 20:09:04 debian kernel: [    5.508945] UBSAN: Undefined
> behaviour in ../drivers/rtc/rtc-lib.c:87:22
> Jun 12 20:09:04 debian kernel: [    5.513658] signed integer overflow:
> Jun 12 20:09:04 debian kernel: [    5.518211] 1193024 * 3600 cannot be
> represented in type 'int'
> Jun 12 20:09:04 debian kernel: [    5.522866] CPU: 0 PID: 1 Comm:
> swapper Not tainted 4.17.0+ #1
> Jun 12 20:09:04 debian kernel: [    5.527567] Call Trace:
> Jun 12 20:09:04 debian kernel: [    5.532200] [df4e7b00] [c0481074]
> ubsan_epilogue+0x18/0x4c (unreliable)
> Jun 12 20:09:04 debian kernel: [    5.537019] [df4e7b10] [c0481a14]
> handle_overflow+0xbc/0xdc
> Jun 12 20:09:04 debian kernel: [    5.541832] [df4e7b90] [c060d698]
> rtc_time64_to_tm+0x344/0x388
> Jun 12 20:09:04 debian kernel: [    5.546655] [df4e7bd0] [c001076c]
> rtc_generic_get_time+0x2c/0x40
> Jun 12 20:09:04 debian kernel: [    5.551477] [df4e7be0] [c06113d4]
> __rtc_read_time+0x70/0x13c
> Jun 12 20:09:04 debian kernel: [    5.556288] [df4e7c00] [c061150c]
> rtc_read_time+0x6c/0x130
> Jun 12 20:09:04 debian kernel: [    5.561088] [df4e7c30] [c061271c]
> __rtc_read_alarm+0x34/0x684
> Jun 12 20:09:04 debian kernel: [    5.565884] [df4e7ce0] [c060f234]
> rtc_device_register+0x88/0x218
> Jun 12 20:09:04 debian kernel: [    5.570695] [df4e7d40] [c060f428]
> devm_rtc_device_register+0x64/0xc4
> Jun 12 20:09:04 debian kernel: [    5.575528] [df4e7d60] [c09d15d4]
> generic_rtc_probe+0x50/0x78
> Jun 12 20:09:04 debian kernel: [    5.580359] [df4e7d70] [c055e4a4]
> platform_drv_probe+0xa8/0x128
> Jun 12 20:09:04 debian kernel: [    5.585210] [df4e7d90] [c0559d28]
> driver_probe_device+0x354/0x6fc
> Jun 12 20:09:04 debian kernel: [    5.590064] [df4e7dd0] [c055a270]
> __driver_attach+0x1a0/0x22c
> Jun 12 20:09:04 debian kernel: [    5.594917] [df4e7df0] [c0555b70]
> bus_for_each_dev+0x84/0xdc
> Jun 12 20:09:04 debian kernel: [    5.599750] [df4e7e20] [c0558420]
> bus_add_driver+0x188/0x348
> Jun 12 20:09:04 debian kernel: [    5.604584] [df4e7e40] [c055b7b4]
> driver_register+0xa0/0x18c
> Jun 12 20:09:04 debian kernel: [    5.609433] [df4e7e50] [c055e950]
> __platform_driver_probe+0x8c/0x198
> Jun 12 20:09:04 debian kernel: [    5.614330] [df4e7e70] [c0005800]
> do_one_initcall+0x64/0x280
> Jun 12 20:09:04 debian kernel: [    5.619237] [df4e7ee0] [c0997c04]
> kernel_init_freeable+0x3a4/0x444
> Jun 12 20:09:04 debian kernel: [    5.624145] [df4e7f30] [c00049f8]
> kernel_init+0x24/0x118
> Jun 12 20:09:04 debian kernel: [    5.629029] [df4e7f40] [c001b1c4]
> ret_from_kernel_thread+0x14/0x1c
> Jun 12 20:09:04 debian kernel: [    5.633878]
> ================================================================================
>
>
> Grep-ing all leads to:
>
> $ grep  "cannot be represented" kern.log | colrm 1 45|sort -u
>  1193022 * 3600 cannot be represented in type 'int'
>  1193024 * 3600 cannot be represented in type 'int'
>  1193032 * 3600 cannot be represented in type 'int'
>  1193033 * 3600 cannot be represented in type 'int'
>  1193034 * 3600 cannot be represented in type 'int'
>  1193035 * 3600 cannot be represented in type 'int'
>
> How come tm_hour can store a value of 1193035 ?

It appears that I am getting a negative value for  time64_t :

[    5.495470] devices_kset: Moving rtc-generic to end of list
[    5.502584]  mm DBG: TIME64_T -2068738348
[    5.507205] WARNING: CPU: 0 PID: 1 at ../drivers/rtc/rtc-lib.c:60
rtc_time64_to_tm+0x48/0x38c
[    5.511843] Modules linked in:
[    5.516306] CPU: 0 PID: 1 Comm: swapper Not tainted 4.17.0+ #6
[    5.520879] NIP:  c060d498 LR: c060d494 CTR: c00d49dc
[    5.525467] REGS: df4e7ad0 TRAP: 0700   Not tainted  (4.17.0+)
[    5.530115] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 88008282  XER: 00000000
[    5.534865]
               GPR00: c060d494 df4e7b80 df4e8000 00000019 00000001
0000032e 00000400 00000000
               GPR08: 00000003 00000001 c0aa4b38 00000000 24008882
00000000 c09dfc4c c09dfc28
               GPR16: 00000690 c0aedc6c c0aee7f8 c0aed13c 00000054
c0ae75cc c0aea2a4 00000000
               GPR24: c0d70000 c0a85850 df4e7c5c df4e7c5c df4e7ce8
de519040 df4e7c5c ffffffff
[    5.559346] NIP [c060d498] rtc_time64_to_tm+0x48/0x38c
[    5.564356] LR [c060d494] rtc_time64_to_tm+0x44/0x38c
[    5.569382] Call Trace:
[    5.574354] [df4e7b80] [c060d494] rtc_time64_to_tm+0x44/0x38c (unreliable)
[    5.579525] [df4e7bd0] [c001076c] rtc_generic_get_time+0x2c/0x40
[    5.584689] [df4e7be0] [c06114d4] __rtc_read_time+0x70/0x13c
[    5.589845] [df4e7c00] [c061160c] rtc_read_time+0x6c/0x130
[    5.594993] [df4e7c30] [c061281c] __rtc_read_alarm+0x34/0x684
[    5.600151] [df4e7ce0] [c060f334] rtc_device_register+0x88/0x218
[    5.605343] [df4e7d40] [c060f528] devm_rtc_device_register+0x64/0xc4
[    5.610584] [df4e7d60] [c09d25a8] generic_rtc_probe+0x50/0x78
[    5.615833] [df4e7d70] [c055e5a0] platform_drv_probe+0xa8/0x128
[    5.621106] [df4e7d90] [c0559e24] driver_probe_device+0x354/0x6fc
[    5.626387] [df4e7dd0] [c055a36c] __driver_attach+0x1a0/0x22c
[    5.631663] [df4e7df0] [c0555c6c] bus_for_each_dev+0x84/0xdc
[    5.636853] [df4e7e20] [c055851c] bus_add_driver+0x188/0x348
[    5.641906] [df4e7e40] [c055b8b0] driver_register+0xa0/0x18c
[    5.646850] [df4e7e50] [c055ea4c] __platform_driver_probe+0x8c/0x198
[    5.651841] [df4e7e70] [c0005800] do_one_initcall+0x64/0x280
[    5.656805] [df4e7ee0] [c0998bd8] kernel_init_freeable+0x3a4/0x444
[    5.661752] [df4e7f30] [c00049f8] kernel_init+0x24/0x118
[    5.666596] [df4e7f40] [c001b1c4] ret_from_kernel_thread+0x14/0x1c
[    5.671348] Instruction dump:
[    5.675944] 90010054 bf010030 9181002c 7c7f1b78 7cba2b78 90810018
7c651b78 3c60c093
[    5.680655] 7c862378 38636340 4bac3c1d 57e90ffe <0f090000> 3ca00001
80810018 7fe3fb78
[    5.685436] ---[ end trace 57955f2c4e7598fd ]---
[    5.690118]  mm DBG: SECS 4294904148
[    5.694648]  mm DBG: HOUR 1193028


> > cheers
> >
> > > (*)
> > > [41877.514338] ================================================================================
> > > [41877.514364] UBSAN: Undefined behaviour in
> > > ../include/linux/percpu_counter.h:137:13
> > > [41877.514373] signed integer overflow:
> > > [41877.514378] 9223352809007201260 + 41997676517838 cannot be
> > > represented in type 'long long int'
> > > [41877.514389] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
> > > [41877.514394] Call Trace:
> > > [41877.514411] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
> > > [41877.514422] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
> > > [41877.514437] [dffeddc0] [c043aaa8] cfq_completed_request+0x560/0x1234
> > > [41877.514446] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
> > > [41877.514460] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
> > > [41877.514469] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
> > > [41877.514482] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
> > > [41877.514496] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
> > > [41877.514508] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> > > [41877.514520] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> > > [41877.514533] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
> > > [41877.514541] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
> > > [41877.514549] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
> > >                    LR = arch_cpu_idle+0x30/0x78
> > > [41877.514558] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
> > > [41877.514570] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
> > > [41877.514577] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
> > > [41877.514585] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
> > > [41877.514592] [c0ce5ff0] [00003444] 0x3444
> > > [41877.514597] ================================================================================
> > > [41886.390210] ================================================================================
> > > [41886.390236] UBSAN: Undefined behaviour in
> > > ../include/linux/percpu_counter.h:137:13
> > > [41886.390245] signed integer overflow:
> > > [41886.390250] 9223366156262940402 + 42006563339289 cannot be
> > > represented in type 'long long int'
> > > [41886.390260] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #54
> > > [41886.390265] Call Trace:
> > > [41886.390282] [dffedd30] [c047a5f8] ubsan_epilogue+0x18/0x4c (unreliable)
> > > [41886.390293] [dffedd40] [c047af98] handle_overflow+0xbc/0xdc
> > > [41886.390309] [dffeddc0] [c043a8c4] cfq_completed_request+0x37c/0x1234
> > > [41886.390317] [dffede40] [c03f595c] __blk_put_request+0xb0/0x2dc
> > > [41886.390331] [dffede80] [c05aa41c] scsi_end_request+0x19c/0x344
> > > [41886.390340] [dffedeb0] [c05abba0] scsi_io_completion+0x4b4/0x854
> > > [41886.390353] [dffedf10] [c040604c] blk_done_softirq+0xe4/0x1e0
> > > [41886.390367] [dffedf60] [c07eef84] __do_softirq+0x16c/0x5f0
> > > [41886.390379] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> > > [41886.390391] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> > > [41886.390404] [c0ce5e80] [c0009a2c] do_IRQ+0x98/0x1a0
> > > [41886.390411] [c0ce5eb0] [c001b93c] ret_from_except+0x0/0x14
> > > [41886.390420] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
> > >                    LR = arch_cpu_idle+0x30/0x78
> > > [41886.390429] [c0ce5f70] [c0ce4000] 0xc0ce4000 (unreliable)
> > > [41886.390441] [c0ce5f80] [c00a3928] do_idle+0xc4/0x158
> > > [41886.390449] [c0ce5fb0] [c00a3b74] cpu_startup_entry+0x24/0x28
> > > [41886.390457] [c0ce5fc0] [c0988820] start_kernel+0x47c/0x490
> > > [41886.390463] [c0ce5ff0] [00003444] 0x3444
> > > [41886.390468] ================================================================================


More information about the Linuxppc-dev mailing list