[Cbe-oss-dev] [PATCH 3/3] spufs context switch - fix interrupt routing

Thu Apr 24 05:58:03 EST 2008

On Wed, 2008-04-23 at 14:08 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2008-04-22 at 23:00 -0300, Luke Browning wrote:
> > 
> > Interrupt routing must be programmed when mfc queue is in a quiescent state, 
> > either empty or stopped with no pending interrupts.  Otherwise, multiple
> > cpus could be interrupted for a given spu.  This is problematic for several
> > reasons.  The state of the mfc queue, the dma restart, needs to be performed
> > after virtual memory operations take place.  There is nothing to prevent 
> > concurrent exception handling in the slih and the thread level.  Also,
> > there is only one set of fields for dealing with exceptions in the csa, so
> > you could have exception data being over-written while it is being processed.
> > Many things could go wrong. 
> > 
> > Signed-Off-By: Luke Browning <lukebrowning at us.ibm.com>
> 
> While I agree that it's more "sane" to only change the routing while the
> MFC is quiescent, I fail to understand the problems you claim it would
> trigger if not.
> 
> First, the interrupt handler cannot be run on more than one CPU at once,
> that's guaranteed by the interrupt core.
> 
> Then, there is definitely something that prevents concurrent handling of
> exceptions though I don't totally understand what you mean by "slih" vs.
> "thread level" here. 
> 
> Finally, spurious interrupts shouldn't be much of a problem as long as
> the MFC provides correct status bits anyway.
> 
> I also don't see how one would overwrite exception data as only hash
> misses can write there and there cannot be two pending at once. What do
> I miss here ?

From 770bc074cc4ef45c450eb172f994e8a1425a3666 Mon Sep 17 00:00:00 2001
From: Jeremy Kerr <jk at ozlabs.org>
Date: Fri, 4 Apr 2008 17:55:28 +1100
Subject: [PATCH] [POWERPC] cell: Fix lost interrupts due to fasteoi handler

We may currently lose interrupts during SPE context switch, as we alter
the INT_Route register. Because the IIC uses a per-thread priority
status, changing the interrupt routing to a different thread means that
the IRQ is no longer masked by the priority status, so we end up with
two fasteoi IRQ handlers executing for the one irq_desc. The fasteoi
handler doesn't handle multiple IRQs, so drops the second one.

---
Maybe I misunderstood, but I thought Jeremy was saying that two processors
were being interrupted at virtually the same time and that the second
interrupt was dropped because the first one had not completed.

Jeremy's code changes are designed to make the interrupt handling
re-entrant by synchronizing the execution of the second level interrupt
handlers (slihs) spu_irq_class_0, spu_irq_class_1, and spu_irq_class_2.
These callouts are made sequentially from the same cpu, one right after
the other, because each handler just records the current exception data
in the csa and performs a thread wakeup.  But the exceptions are not
really handled yet.  The controlling thread needs to run, perform some
virtual memory operations, and perform a dma restart for the exception
to be truly handled, and this takes a lot of time as the thread needs to
be scheduled.  Therefore, if an spu is generating multiple exceptions at
virtually the same time, we have a problem: the second callout will
overwrite the exception data recorded for the first exception, since
there is only one set of fields in the csa.  Also, the exception data is
pulled out of the csa without any locking, so you could end up with a
mix of exception data, at least theoretically.

Anyway, this is just what I have observed through code review.  I don't
think the right answer is to batch up interrupts.  I think it is better
to just avoid the problem by not rerouting interrupts while the mfc is
active.

Regards,
Luke