Question: handling early hotplug interrupts
Daniel Henrique Barboza
danielhb at linux.vnet.ibm.com
Wed Aug 30 09:53:20 AEST 2017
Hi Ben,
On 08/29/2017 06:55 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2017-08-29 at 17:43 -0300, Daniel Henrique Barboza wrote:
>> Hi,
>>
>> This is a scenario I've been facing when working on early device
>> hotplug in QEMU. When a device is added, an IRQ pulse is fired to warn
>> the guest of the event, then the kernel fetches it by calling
>> 'check_exception' and handles it. If the hotplug is done too early
>> (before SLOF, for example), the pulse is ignored and the hotplug event
>> is left unchecked in the events queue.
>>
>> One solution would be to pulse the hotplug queue interrupt after CAS,
>> when we are sure that the hotplug queue is negotiated. However, this
>> panics the kernel with sig 11 kernel access of bad area, which suggests
>> that the kernel wasn't quite ready to handle it.
> That's not right. This is a bug that needs fixing. The interrupt should
> be masked anyway but still.
>
> Tell us more about the crash (backtrace etc...) this definitely needs
> fixing.
This is the backtrace using a 4.13.0-rc3 guest:
---------
[ 0.008913] Unable to handle kernel paging request for data at address 0x00000100
[ 0.008989] Faulting instruction address: 0xc00000000012c318
[ 0.009046] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.009092] SMP NR_CPUS=1024
[ 0.009092] NUMA
[ 0.009128] pSeries
[ 0.009173] Modules linked in:
[ 0.009210] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3+ #1
[ 0.009268] task: c0000000feb02580 task.stack: c0000000fe108000
[ 0.009325] NIP: c00000000012c318 LR: c00000000012c9c4 CTR: 0000000000000000
[ 0.009394] REGS: c0000000fffef910 TRAP: 0380 Not tainted (4.13.0-rc3+)
[ 0.009450] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>
[ 0.009454] CR: 28000822 XER: 20000000
[ 0.009554] CFAR: c00000000012c9c0 SOFTE: 0
[ 0.009554] GPR00: c00000000012c9c4 c0000000fffefb90 c00000000141f100 0000000000000400
[ 0.009554] GPR04: 0000000000000000 c0000000fe1851c0 0000000000000000 00000000fee60000
[ 0.009554] GPR08: 0000000fffffffe1 0000000000000000 0000000000000001 0000000002001001
[ 0.009554] GPR12: 0000000000000040 c00000000fd80000 c00000000000db58 0000000000000000
[ 0.009554] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.009554] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
[ 0.009554] GPR24: 0000000000000002 0000000000000013 c0000000fe14bc00 0000000000000400
[ 0.009554] GPR28: 0000000000000400 0000000000000000 c0000000fe1851c0 0000000000000001
[ 0.010121] NIP [c00000000012c318] __queue_work+0x48/0x640
[ 0.010168] LR [c00000000012c9c4] queue_work_on+0xb4/0xf0
[ 0.010213] Call Trace:
[ 0.010239] [c0000000fffefb90] [c00000000000db58] kernel_init+0x8/0x160 (unreliable)
[ 0.010308] [c0000000fffefc70] [c00000000012c9c4] queue_work_on+0xb4/0xf0
[ 0.010368] [c0000000fffefcb0] [c0000000000c4608] queue_hotplug_event+0xd8/0x150
[ 0.010435] [c0000000fffefd00] [c0000000000c30d0] ras_hotplug_interrupt+0x140/0x190
[ 0.010505] [c0000000fffefd90] [c00000000018c8b0] __handle_irq_event_percpu+0x90/0x310
[ 0.010573] [c0000000fffefe50] [c00000000018cb6c] handle_irq_event_percpu+0x3c/0x90
[ 0.010642] [c0000000fffefe90] [c00000000018cc24] handle_irq_event+0x64/0xc0
[ 0.010710] [c0000000fffefec0] [c0000000001928b0] handle_fasteoi_irq+0xc0/0x230
[ 0.010779] [c0000000fffefef0] [c00000000018ae14] generic_handle_irq+0x54/0x80
[ 0.010847] [c0000000fffeff20] [c0000000000189f0] __do_irq+0x90/0x210
[ 0.010904] [c0000000fffeff90] [c00000000002e730] call_do_irq+0x14/0x24
[ 0.010961] [c0000000fe10b640] [c000000000018c10] do_IRQ+0xa0/0x130
[ 0.011021] [c0000000fe10b6a0] [c000000000008c58] hardware_interrupt_common+0x158/0x160
[ 0.011090] --- interrupt: 501 at __replay_interrupt+0x38/0x3c
[ 0.011090] LR = arch_local_irq_restore+0x74/0x90
[ 0.011179] [c0000000fe10b990] [c0000000fe10b9e0] 0xc0000000fe10b9e0 (unreliable)
[ 0.011249] [c0000000fe10b9b0] [c000000000b967fc] _raw_spin_unlock_irqrestore+0x4c/0xb0
[ 0.011316] [c0000000fe10b9e0] [c00000000018ff50] __setup_irq+0x630/0x9e0
[ 0.011374] [c0000000fe10ba90] [c00000000019054c] request_threaded_irq+0x13c/0x250
[ 0.011441] [c0000000fe10baf0] [c0000000000c2cd0] request_event_sources_irqs+0x100/0x180
[ 0.011511] [c0000000fe10bc10] [c000000000eceda8] __machine_initcall_pseries_init_ras_IRQ+0xc4/0x12c
[ 0.011591] [c0000000fe10bc40] [c00000000000d8c8] do_one_initcall+0x68/0x1e0
[ 0.011659] [c0000000fe10bd00] [c000000000eb4484] kernel_init_freeable+0x284/0x370
[ 0.011725] [c0000000fe10bdc0] [c00000000000db7c] kernel_init+0x2c/0x160
[ 0.011782] [c0000000fe10be30] [c00000000000bc9c] ret_from_kernel_thread+0x5c/0xc0
[ 0.011848] Instruction dump:
[ 0.011885] fbc1fff0 f8010010 f821ff21 7c7c1b78 7c9d2378 7cbe2b78 787b0020 60000000
[ 0.011955] 60000000 892d028a 2fa90000 409e04bc <813d0100> 75290001 408204c0 3d2061c8
[ 0.012026] ---[ end trace e0b4d36daf3f8b2a ]---
[ 0.013850]
[ 2.013962] Kernel panic - not syncing: Fatal exception in interrupt
-------------
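
A note on reading the oops above: the faulting instruction is the word
marked <813d0100> in the instruction dump. The sketch below decodes it
as a PowerPC D-form load and recomputes the effective address. The
struct and function names are made up for illustration; only the
instruction word, the GPR29 value and the fault address come from the
log. If the decode is right, the load is "lwz r9,0x100(r29)" with
r29 = 0, which would mean a NULL pointer was passed down to
__queue_work(), probably because the work was queued before its
workqueue had been set up. That last part is an inference from the
trace, not something the log states.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a PowerPC D-form instruction such as "lwz RT,D(RA)". */
struct dform {
    unsigned opcode; /* primary opcode (top 6 bits); 32 is lwz */
    unsigned rt;     /* target register number */
    unsigned ra;     /* base register number */
    int16_t d;       /* signed 16-bit displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct dform decode_dform(uint32_t insn)
{
    struct dform f;

    f.opcode = insn >> 26;
    f.rt = (insn >> 21) & 0x1f;
    f.ra = (insn >> 16) & 0x1f;
    f.d = (int16_t)(insn & 0xffff);
    return f;
}

/*
 * Effective address of a D-form load given the value held in RA.
 * An RA field of 0 means "no base register", not GPR0.
 */
static uint64_t dform_ea(struct dform f, uint64_t ra_value)
{
    uint64_t base = (f.ra == 0) ? 0 : ra_value;

    return base + (uint64_t)(int64_t)f.d;
}

/* Check the decode against the values printed in the oops. */
static void explain_oops(void)
{
    const uint32_t faulting_insn = 0x813d0100u; /* <813d0100> in the dump */
    const uint64_t gpr29 = 0;                   /* second word of the GPR28 row */
    struct dform f = decode_dform(faulting_insn);

    assert(f.opcode == 32);                /* lwz */
    assert(dform_ea(f, gpr29) == 0x100);   /* "data at address 0x00000100" */
    printf("lwz r%u,%d(r%u) -> EA 0x%llx\n", f.rt, (int)f.d, f.ra,
           (unsigned long long)dform_ea(f, gpr29));
}
```

Calling explain_oops() from a small main() is enough to try it. The
call trace also shows the handler running from inside
request_threaded_irq(), i.e. as soon as the interrupt was set up by
the init_ras_IRQ initcall.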
To reproduce it, I hacked the QEMU code to fire a pulse on the hotplug
queue right after CAS. However, it can also be reproduced without
changing QEMU by simply hotplugging a CPU/LMB after CAS using device_add.
[adding dgibson in CC in case he wants to comment]
Thanks,
Daniel
>
>> In my experiments using upstream 4.13 I saw that there is a 'safe time'
>> to pulse the queue, sometime after CAS and before mounting the root fs,
>> but I wasn't able to pinpoint it. From QEMU's perspective, the last hcall
>> done (an h_set_mode) is still too early to pulse it and the kernel
>> panics. Looking at the kernel source I saw that the IRQ handling is
>> initiated quite early in the init process.
>>
>> So my question (ok, actually 2 questions):
>>
>> - Is my analysis correct? Is there an unsafe time to fire an IRQ pulse
>> before CAS that can break the kernel or am I overlooking/doing something
>> wrong?
>> - Is there a reliable way to know when the kernel can safely handle the
>> hotplug interrupt?
> So I don't think that's the right approach. Virtual interrupts are edge
> sensitive and we will potentially lose them if they occur early. I
> think what needs to happen is:
>
> - Fix whatever's causing the above crash
>
> and
>
> - The hotplug code should check for pending events (check_exception ?)
> at boot time to enqueue whatever's there. It needs to do that after
> unmasking the interrupt and in a way that is protected from races with
> said interrupt.
>
> Cheers,
> Ben.
>
>
>> Thanks,
>>
>>
>> Daniel
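
For reference, here is a rough userspace model of the two points Ben
lists above: have everything the handler needs in place before the
interrupt can fire, then scan for pending events once at boot, under
the same lock the handler uses. Nothing here is kernel code. Every
name (fw_queue, drain_events, guest_init_hotplug and so on) is
hypothetical, the array stands in for the event queue that the guest
reads with check_exception, and the pthread mutex stands in for
whatever locking the real handler would use (in the kernel this would
more likely be a spinlock taken with interrupts disabled).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 8

/* Hypothetical stand-in for the hypervisor-side hotplug event queue. */
static int fw_queue[MAX_EVENTS];
static size_t fw_head, fw_tail;

/* Guest-side state. */
static bool irq_unmasked;     /* has the guest set up the interrupt yet? */
static bool workqueue_ready;  /* does the handler have somewhere to queue work? */
static int handled[MAX_EVENTS];
static size_t n_handled;
static pthread_mutex_t drain_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Dequeue every pending event. The lock keeps the boot-time scan and
 * the interrupt handler from picking up the same event twice.
 */
static void drain_events(void)
{
    pthread_mutex_lock(&drain_lock);
    while (fw_head != fw_tail) {
        /* Queuing work with no workqueue is the crash in the oops. */
        assert(workqueue_ready);
        handled[n_handled++] = fw_queue[fw_head++];
    }
    pthread_mutex_unlock(&drain_lock);
}

/*
 * Hypervisor side: queue the event, then pulse the interrupt. The
 * pulse is edge triggered, so it is simply lost while masked. The
 * event itself stays in the queue.
 */
static void hypervisor_hotplug(int event)
{
    assert(fw_tail < MAX_EVENTS);
    fw_queue[fw_tail++] = event;
    if (irq_unmasked)
        drain_events(); /* models the interrupt handler running */
}

/* Guest boot path, in the order suggested in the thread. */
static void guest_init_hotplug(void)
{
    workqueue_ready = true; /* 1. set up what the handler depends on */
    irq_unmasked = true;    /* 2. only then unmask the interrupt */
    drain_events();         /* 3. pick up events whose pulse was lost */
}
```

With this ordering, an event added before guest_init_hotplug() runs is
picked up by the boot-time scan, and events added afterwards go
through the handler path as usual.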