Question: handling early hotplug interrupts
Benjamin Herrenschmidt
benh at au1.ibm.com
Wed Aug 30 07:55:00 AEST 2017
On Tue, 2017-08-29 at 17:43 -0300, Daniel Henrique Barboza wrote:
> Hi,
>
> This is a scenario I've been facing when working in early device
> hotplugs in QEMU. When a device is added, an IRQ pulse is fired to warn
> the guest of the event, then the kernel fetches it by calling
> 'check_exception' and handles it. If the hotplug is done too early
> (before SLOF, for example), the pulse is ignored and the hotplug event
> is left unchecked in the events queue.
>
> One solution would be to pulse the hotplug queue interrupt after CAS,
> when we are sure that the hotplug queue is negotiated. However, this
> panics the kernel with sig 11 kernel access of bad area, which suggests
> that the kernel wasn't quite ready to handle it.
That's not right; this is a bug that needs fixing. The interrupt should
be masked at that point anyway, but still.
Tell us more about the crash (backtrace, etc.); this definitely needs
fixing.
> In my experiments using upstream 4.13 I saw that there is a 'safe time'
> to pulse the queue, sometime after CAS and before mounting the root fs,
> but I wasn't able to pinpoint it. From QEMU perspective, the last hcall
> done (an h_set_mode) is still too early to pulse it and the kernel
> panics. Looking at the kernel source I saw that the IRQ handling is
> initiated quite early in the init process.
>
> So my question (ok, actually 2 questions):
>
> - Is my analysis correct? Is there an unsafe time to fire an IRQ pulse
> before CAS that can break the kernel or am I overlooking/doing something
> wrong?
> - Is there a reliable way to know when the kernel can safely handle the
> hotplug interrupt?
So I don't think that's the right approach. Virtual interrupts are edge
sensitive and we will potentially lose them if they occur early. I
think what needs to happen is:
- Fix whatever's causing the above crash
and
- The hotplug code should check for pending events (check_exception ?)
at boot time to enqueue whatever's there. It needs to do that after
unmasking the interrupt and in a way that is protected from races with
said interrupt.
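
The second point can be sketched as a small user-space model in C. This is
only an illustration under stated assumptions, not kernel code: every name
below (check_exception_model, host_hotplug, hotplug_init and so on) is
hypothetical, the event queue is a plain array standing in for whatever
check_exception would return, and the lock is a simple atomic flag where
real code would presumably use a kernel lock with interrupts disabled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 8

/* Hypothetical model of the host-side hotplug event queue. */
static int pending[MAX_EVENTS];
static size_t pending_count;

/* Guest-side state (all names are illustrative assumptions). */
static bool irq_unmasked;                         /* is the hotplug IRQ unmasked? */
static atomic_flag drain_lock = ATOMIC_FLAG_INIT; /* serialises queue draining */
static int handled_count;                         /* events processed so far */

/* Stand-in for check_exception: pop one queued event, false if none left. */
static bool check_exception_model(int *event)
{
    if (pending_count == 0)
        return false;
    *event = pending[0];
    for (size_t i = 1; i < pending_count; i++)
        pending[i - 1] = pending[i];
    pending_count--;
    return true;
}

/* Drain every queued event under the lock, so the boot-time check and
 * the interrupt handler can never process the same event twice. */
static void drain_hotplug_events(void)
{
    int event;

    while (atomic_flag_test_and_set(&drain_lock))
        ; /* spin until the other drainer is done */
    while (check_exception_model(&event))
        handled_count++;
    atomic_flag_clear(&drain_lock);
}

/* Interrupt handler: same drain path as the boot-time check. */
static void hotplug_irq_handler(void)
{
    drain_hotplug_events();
}

/* Host side: queue an event and pulse. The pulse is edge-like, so it is
 * simply lost while the guest still has the interrupt masked. */
static void host_hotplug(int event)
{
    assert(pending_count < MAX_EVENTS);
    pending[pending_count++] = event;
    if (irq_unmasked)
        hotplug_irq_handler();
}

/* Boot-time init: unmask first, then pick up whatever was queued early. */
static void hotplug_init(void)
{
    irq_unmasked = true;
    drain_hotplug_events();
}

/* Two early hotplugs (pulses lost), init, then one late hotplug.
 * Returns the number of events the guest ended up handling. */
int simulate_early_hotplug(void)
{
    pending_count = 0;
    handled_count = 0;
    irq_unmasked = false;

    host_hotplug(1);   /* too early: pulse lost, event stays queued */
    host_hotplug(2);   /* too early: pulse lost, event stays queued */
    hotplug_init();    /* unmask, then drain the two early events */
    host_hotplug(3);   /* normal path: handled via the interrupt */
    return handled_count;
}
```

In this model the two early events are only picked up because hotplug_init
drains the queue after unmasking. Doing the drain before unmasking would
leave a window in which an event could arrive with its pulse lost and
nobody left to check for it.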
Cheers,
Ben.
>
> Thanks,
>
>
> Daniel