[Cbe-oss-dev] [PATCH 3/3] spufs context switch - fix interrupt routing

Benjamin Herrenschmidt benh at kernel.crashing.org
Thu Apr 24 08:18:33 EST 2008


On Wed, 2008-04-23 at 16:58 -0300, Luke Browning wrote:
> > I also don't see how one would overwrite exception data as only hash
> > misses can write there and there cannot be two pending at once. What am
> > I missing here?
> 
> From 770bc074cc4ef45c450eb172f994e8a1425a3666 Mon Sep 17 00:00:00 2001
> From: Jeremy Kerr <jk at ozlabs.org>
> Date: Fri, 4 Apr 2008 17:55:28 +1100
> Subject: [PATCH] [POWERPC] cell: Fix lost interrupts due to fasteoi handler

 .../...

Well, that's orthogonal :-) That's a bug I found with Jeremy that
affects affinity setting for any interrupt, and it needed to be fixed
(which it was). It wasn't per se incorrect to re-route the interrupt;
there was a bug, and it has been fixed.

> ---
> Maybe I misunderstood, but I thought Jeremy was saying that two processors
> were being interrupted at virtually the same time and the second interrupt
> was dropped because the first one had not completed.

Well, it's more like the second interrupt happens right after the first
one, but on a different CPU, so close that the core hasn't yet cleared
IRQ_INPROGRESS.

Two class 1 interrupts can happen so close together that they basically
look like one interrupt to two processors, but again, that isn't a
problem per se. The register lock makes sure only one of them fetches
the pending status.
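To make the "only one of them fetches" point concrete, here is a minimal sketch. It assumes a hypothetical spu_class1_regs structure and uses a C11 atomic exchange as a stand-in for "read the status bits and clear them under the register lock". The real spufs code uses MMIO accessors and a spinlock, not this.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of an SPU's class 1 interrupt status. In the real
 * code the status lives in an MMIO register and is read and cleared
 * while holding the register lock; here an atomic stands in for both. */
typedef struct {
    _Atomic uint64_t stat;   /* pending class 1 status bits */
} spu_class1_regs;

/* Hardware side (modelled): raise a status bit. Bits accumulate until
 * a handler fetches them. */
static void hw_raise(spu_class1_regs *r, uint64_t bit)
{
    atomic_fetch_or(&r->stat, bit);
}

/* Handler side: atomically fetch and clear every pending bit. If two
 * CPUs race here, each bit is returned to exactly one of them. */
static uint64_t fetch_and_clear(spu_class1_regs *r)
{
    return atomic_exchange(&r->stat, 0);
}
```

If both CPUs enter the handler for what is really one event, the first call returns the bits and the second returns 0, so the second CPU simply has nothing to do.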

But due to the edge nature of the IIC messages, it's important that if
the second one fires just after the first one's handler has read the
pending bits, the handler gets called again, as a new bit might have
been set in between.

The bug was that we didn't do that, so if they were close enough -and-
moved to a different CPU, we would "lose" the second one.

> Jeremy's code changes are designed to make the interrupt handling
> re-entrant by synchronizing the execution of the second level interrupt
> handlers (slihs) spu_irq_class_0, spu_irq_class_1, and spu_irq_class_2.

I don't think so. All the code change does is properly note that the
interrupt re-occurred while marked IN_PROGRESS and re-call the handler
when that was the case. That's it. It just makes sure we don't lose
any.
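A simplified, single-threaded model of that behaviour, assuming a hypothetical irq_model descriptor whose two bools stand in for kernel status bits such as IRQ_INPROGRESS and IRQ_PENDING (and for the descriptor lock that protects them). An edge that arrives while another CPU is running the handler only sets pending; the CPU already in the handler loops until pending stays clear. This is an illustration in the spirit of the kernel's edge-interrupt flow, not the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-interrupt descriptor. In the kernel the
 * equivalent flags live in the irq descriptor and are protected by its
 * lock; this single-threaded model keeps them as plain bools. */
struct irq_model {
    bool in_progress;   /* a CPU is currently running the handler */
    bool pending;       /* an edge arrived while in_progress was set */
    int  handled;       /* number of times the handler body ran */
};

/* Hook used to simulate another edge arriving during the handler. */
typedef void (*arrival_hook)(struct irq_model *d);

/* Called when the interrupt fires on some CPU. */
static void irq_fire(struct irq_model *d, arrival_hook during_handler)
{
    if (d->in_progress) {
        /* Another CPU is already in the handler: don't drop this edge,
         * record it so that CPU calls the handler again. */
        d->pending = true;
        return;
    }
    d->in_progress = true;
    do {
        d->pending = false;
        d->handled++;               /* the real handler would run here */
        if (during_handler != NULL) {
            arrival_hook h = during_handler;
            during_handler = NULL;  /* inject the second edge only once */
            h(d);
        }
    } while (d->pending);
    d->in_progress = false;
}
```

Without the pending check, the second edge would just return and handled would stay at 1, which is exactly the lost interrupt described above.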

> These callouts are made sequentially from the same cpu one after the
> other almost immediately as each handler just records the current
> exception data in the csa and performs a thread wakeup.  But the
> exceptions are not really handled yet.  The controlling thread needs to
> run, perform some virtual memory operations, and perform a dma restart
> for the exception to be truly handled and this takes a lot of time as
> the thread needs to be scheduled.  Therefore, if an spu is generating
> multiple exceptions at virtually the same time, we have a problem as the
> second callout will overwrite the exception data presented with the
> first exception.

It shouldn't, since it shouldn't be the same type of interrupt. Only
the hash miss can write to the CSA, not the segment miss. The only case
I know of where two interrupts happen back to back is a segment miss
followed by a hash miss for the same address.

There should never be two hash misses unless there is a restart in
between.
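The invariant in the last two paragraphs can be modelled as a single CSA slot. This is a minimal sketch with hypothetical names (csa_fault, record_fault, handle_and_restart), not spufs code. Segment misses never touch the slot, a hash miss fills it, and only the controlling thread's restart empties it. record_fault returns false if a second hash miss ever tried to overwrite an unhandled one, which per the argument above cannot happen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CSA fields the class 1 handler fills on a
 * hash miss. Field names are illustrative only. */
struct csa_fault {
    bool     valid;   /* fault recorded, waiting for the controlling thread */
    uint64_t dar;     /* faulting effective address */
};

enum mfc_fault { SEGMENT_MISS, HASH_MISS };

/* Interrupt side. Returns false if a hash miss would overwrite an
 * unhandled one (a second hash miss with no restart in between). */
static bool record_fault(struct csa_fault *csa, enum mfc_fault kind,
                         uint64_t dar)
{
    if (kind == SEGMENT_MISS)
        return true;          /* does not write to the CSA */
    if (csa->valid)
        return false;         /* should never happen */
    csa->valid = true;
    csa->dar = dar;
    return true;
}

/* Controlling thread: resolve the fault, then restart the DMA. Only
 * after this can the MFC raise another hash miss. */
static void handle_and_restart(struct csa_fault *csa)
{
    csa->valid = false;
}
```

The back-to-back case from the mail (segment miss, then hash miss for the same address) records exactly one entry, so nothing is overwritten.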

Ben.
