powerpc/cell/axon-msi: fix MSI after kexec

Arnd Bergmann arnd at arndb.de
Sat Dec 13 06:19:50 EST 2008


Commit d015fe995 'powerpc/cell/axon-msi: Retry on missing interrupt'
has turned a rare failure to kexec on QS22 into a reproducible
error, which we have now analysed.

The problem is that after a kexec, the MSIC hardware still points
into the middle of the old ring buffer. When the new kernel boots,
we set up the ring buffer again, but not our offset into it. On
older kernels, this caused a storm of thousands of spurious
interrupts after a kexec, which were usually dropped silently.

With the new code, we wait for each of these interrupts to become
valid and eventually time out. If more interrupts arrive and we
time out on those as well, this goes on indefinitely, which
eventually leads to a hard crash.

The solution in this patch is to read the current write offset
from the MSIC when reinitializing it and to use that as our
initial read offset. With this change, MSI works correctly after
a kexec.

Reported-by: Dirk Herrendoerfer <d.herrendoerfer at de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd at arndb.de>
---

Please apply when Dirk and Michael have given their Ack.
Should we have it in 2.6.28? Not sure if going from 'works sometimes'
to 'works never' counts as a regression. Most users won't be impacted,
because they don't use kexec on QS22.
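
For illustration only, here is a minimal user-space sketch of the
offset mismatch described above. It is not the driver code: all
sim_* names and the sizes are made up for the example, and it
assumes that the freshly allocated driver state starts with a read
offset of zero.

```c
#include <stdint.h>

/* Hypothetical sizes for illustration, not the driver's real values. */
#define SIM_FIFO_SIZE		0x1000u	/* ring size in bytes, power of two */
#define SIM_FIFO_SIZE_MASK	(SIM_FIFO_SIZE - 1u)
#define SIM_ENTRY_SIZE		0x10u	/* bytes per ring entry */

/* Stand-in for the MSIC, which keeps its write offset across a kexec. */
struct sim_msic_hw {
	uint32_t write_offset;
};

/* Stand-in for the driver state, which is rebuilt by the new kernel. */
struct sim_msic {
	uint32_t read_offset;
};

/* The hardware appends one entry and advances its write offset. */
static void sim_hw_post_msi(struct sim_msic_hw *hw)
{
	hw->write_offset = (hw->write_offset + SIM_ENTRY_SIZE)
				& SIM_FIFO_SIZE_MASK;
}

/* Assumed old behaviour: the new kernel starts reading at offset zero. */
static void sim_probe_old(struct sim_msic *msic,
			  const struct sim_msic_hw *hw)
{
	(void)hw;	/* the hardware offset is ignored */
	msic->read_offset = 0;
}

/* Behaviour modelled on this patch: start where the hardware writes next. */
static void sim_probe_fixed(struct sim_msic *msic,
			    const struct sim_msic_hw *hw)
{
	msic->read_offset = hw->write_offset & SIM_FIFO_SIZE_MASK;
}

/* Entries that look pending, i.e. the distance from read to write offset. */
static uint32_t sim_pending_entries(const struct sim_msic *msic,
				    const struct sim_msic_hw *hw)
{
	return ((hw->write_offset - msic->read_offset)
			& SIM_FIFO_SIZE_MASK) / SIM_ENTRY_SIZE;
}
```

If the old kernel left the hardware five entries into the ring,
sim_probe_old() makes five stale entries look pending, while
sim_probe_fixed() leaves none.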

diff --git a/arch/powerpc/platforms/cell/axon_msi.c b/arch/powerpc/platforms/cell/axon_msi.c
index 442cf36..548fa4e 100644
--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -413,6 +422,9 @@ static int axon_msi_probe(struct of_device *device,
 			MSIC_CTRL_IRQ_ENABLE | MSIC_CTRL_ENABLE |
 			MSIC_CTRL_FIFO_SIZE);
 
+	msic->read_offset = dcr_read(msic->dcr_host, MSIC_WRITE_OFFSET_REG)
+				& MSIC_FIFO_SIZE_MASK;
+
 	device->dev.platform_data = msic;
 
 	ppc_md.setup_msi_irqs = axon_msi_setup_msi_irqs;


