PPC405EX based irq flooding with USB-OTG and usbserial device

Hunter Cobbs hunter.cobbs at gmail.com
Sat May 23 22:44:26 EST 2009


Egads!  Forgot to respond to the list!

My git checkout failed last night, so I'm downloading the resource CD, but I
can tell you what I did before I get the actual patch done, and you can tell
me if my logic is sound.

The first thing I thought when I saw this was: WHY use IRQ-based methods to
access a USB controller that has internal DMA transfers?  I tried in vain to
enable DMA through the driver's module parameters (I dug up how to pass
module parameters to built-in drivers from an old 2.2-series kernel
discussion).  So then I put on my boots and started slogging through the
driver.
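For reference, 2.6 kernels take parameters of built-in code on the kernel
command line as <module>.<param>=<value> (the kernel-parameters
documentation uses usbcore.blinkenlights=1 as its example).  The sketch
below only illustrates that naming convention in user-space C:
find_builtin_param() is a hypothetical helper, not the kernel's own parser,
and dwc_otg.dma_enable in the usage example is an assumed name, not a
parameter confirmed for this driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find "<module>.<param>=<value>" as a whole token on
 * a kernel-style command line and copy <value> into out.  Returns 1 if the
 * parameter is present, 0 otherwise.  Illustrates the naming convention
 * only; it is not the kernel's own parameter parser. */
static int find_builtin_param(const char *cmdline, const char *module,
                              const char *param, char *out, size_t outlen)
{
    char key[128];
    int n = snprintf(key, sizeof key, "%s.%s=", module, param);

    assert(n > 0 && (size_t)n < sizeof key);  /* key must fit in the buffer */
    assert(outlen > 0);

    for (const char *p = cmdline; (p = strstr(p, key)) != NULL; p++) {
        /* Accept only matches that start a space-separated token. */
        if (p == cmdline || p[-1] == ' ') {
            const char *val = p + n;
            size_t len = strcspn(val, " ");   /* value ends at next space */

            if (len >= outlen)
                len = outlen - 1;             /* truncate to fit out */
            memcpy(out, val, len);
            out[len] = '\0';
            return 1;
        }
    }
    return 0;
}
```

With a command line such as "console=ttyS0,115200 dwc_otg.dma_enable=1
root=/dev/nfs", looking up module "dwc_otg", parameter "dma_enable" copies
"1" into the output buffer; a token like "xdwc_otg.dma_enable=1" is not
treated as a match.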

Getting frustrated with that line of attack, I turned up the verbosity on
the kernel compile and noticed a warning in the dwc_otg compilation:
specifically, a left shift and a right shift go out of bounds of the
variables being shifted.  The only place this occurs is in a section of code
wrapped with DMA_64BIT, which made absolutely no sense, because the DMA
controller on the 405EX is only 32 bits wide.  On tracking this define down,
I found that someone had assumed that the 44x and the 405EX/r all have the
same DMA controller.  That is incorrect: they have the same control register
definitions (the offset of 1 due to the MSBit being reserved and the
register being in big-endian mode); however, the 44x is 64 bits and the 405
is 32 bits.  So I broke the DMA control down into two areas: data width and
control register offsets.
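The split between data width and register layout can be sketched as
follows.  This is a minimal user-space illustration, not Hunter's patch:
HYP_DMA_ADDR_BITS, hyp_dma_addr_t, hyp_upper_32() and hyp_lower_32() are
hypothetical names.  It shows why a DMA_64BIT-style path warns on a 32-bit
build: in C, shifting a 32-bit value by 32 or more is undefined, which gcc
reports as "shift count >= width of type".  The two-step 16-bit shift
mirrors the trick used by the kernel's upper_32_bits() macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical knob for the *data width* only.  The register layout
 * (offsets, reserved MSBit), which the 44x and 405EX share, would be
 * selected by a separate define.  Per the text: 44x = 64, 405EX/r = 32. */
#ifndef HYP_DMA_ADDR_BITS
#define HYP_DMA_ADDR_BITS 32
#endif

#if HYP_DMA_ADDR_BITS == 64
typedef uint64_t hyp_dma_addr_t;
#else
typedef uint32_t hyp_dma_addr_t;
#endif

/* Upper 32 bits of a DMA address.  Two 16-bit shifts never exceed the
 * width of a 32-bit type, whereas a single ">> 32" on a 32-bit value is
 * undefined behaviour and triggers the out-of-bounds shift warning. */
static uint32_t hyp_upper_32(hyp_dma_addr_t addr)
{
    return (uint32_t)((addr >> 16) >> 16);
}

/* Lower 32 bits: a plain truncating conversion, valid for both widths. */
static uint32_t hyp_lower_32(hyp_dma_addr_t addr)
{
    return (uint32_t)addr;
}
```

With the default 32-bit setting, hyp_upper_32() always yields 0 and no
oversized shift is compiled; building with -DHYP_DMA_ADDR_BITS=64 changes
only the data width, leaving register offsets to their own definitions.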

When this still didn't fix the problem, I found yet another section, wrapped
in yet another define, that can force you to operate in slave (IRQ) mode
only.  When I searched out that define (DWC_SLAVE, I believe), I found it in
the dwc_otg Makefile.
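How a build-time define can override a runtime request for DMA can be
sketched like this.  DWC_SLAVE is the define named above; hyp_pick_mode(),
enum hyp_xfer_mode and its values are hypothetical and do not come from the
dwc_otg sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transfer modes: slave = CPU/IRQ-driven, DMA = internal DMA. */
enum hyp_xfer_mode { HYP_MODE_SLAVE, HYP_MODE_DMA };

/* Choose a transfer mode from hardware capability and a runtime request
 * (e.g. a module parameter).  A compile-time DWC_SLAVE wins over both. */
static enum hyp_xfer_mode hyp_pick_mode(bool hw_has_dma, bool dma_requested)
{
#ifdef DWC_SLAVE
    /* Built with -DDWC_SLAVE: slave mode no matter what is requested. */
    (void)hw_has_dma;
    (void)dma_requested;
    return HYP_MODE_SLAVE;
#else
    return (hw_has_dma && dma_requested) ? HYP_MODE_DMA : HYP_MODE_SLAVE;
#endif
}
```

Built without DWC_SLAVE, hyp_pick_mode(true, true) returns HYP_MODE_DMA.
Adding -DDWC_SLAVE to the compiler flags, as a Makefile can, makes every
call return HYP_MODE_SLAVE, which is consistent with module parameters alone
being unable to turn DMA on.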

Correcting both of these has enabled full DMA access to the USB controller,
and I'm doing much better with my Sierra Wireless dev kit.

On Sat, May 23, 2009 at 7:11 AM, Chuck Meade <chuckmeade at mindspring.com> wrote:

> Hunter Cobbs wrote:
> > Hello everyone,
> >
> > This is my first post to the PPC dev list as my company has just started
> > developing a new project based on Linux.  The good news is, this post is
> > not debug-related as much as it is an introduction and query while I
> > download the latest DENX kernel (only place I know that has the DWC_OTG
> > driver).
> >
> > I've been working with a Kilauea dev board and have had lots of trouble
> > when I plug in a Sierra Wireless modem dev kit on the USB.  It goes fine
> > until I actually try to communicate (pppd or minicom) with the little
> > bugger, and then my IRQs go through the roof.  And they only calm back
> > down after I shut down my communication channel.
> >
> > I've solved this issue with our board, and was wondering if it has since
> > been fixed (I'm running 2.6.25-DENX).  I don't want to waste the board's
> > time with a patch that is no longer necessary.
> >
> > --
> > Hunter Cobbs
>
> Hello Hunter,
>
> It would absolutely *not* be a waste of anyone's time.  I for one would
> like to see how you solved this.  I am dealing with the same problem, with
> the same setup.
>
> The underlying cause for this problem is the PPC405EX CPU's erratum USBO_9.
> The USB 2.0 PING protocol is supposed to handle a PING transaction in the
> hardware -- note that in USB 2.0, a PING is the method used by the sender
> to determine if it can send.  If I remember correctly, erratum USBO_9 is
> caused when a NAK response from the PING transaction is handled not in
> hardware, but instead as an interrupt in software, and that NAK leads to a
> lot of processing.  In the 2.6.25 Denx Linux tree that I used, that
> processing ends up trying to restart the channel, restart the send, which
> leads to yet another PING/NAK sequence, yet another interrupt...
>
> The end result is that you get over 100,000 interrupts (with significant
> interrupt handling logic) per second, and the target can't do anything
> else.  I was able to get this interrupt count by looking at
> /proc/interrupts, then causing this problem for 20 seconds, then pulling
> out the USB modem physically (mine is on an ExpressCard) to stop the
> interrupt storm, then checking /proc/interrupts again.  Averaged over
> 100,000 ints/sec.
>
> In contact with AMCC, they told us they are not respinning the CPU (at
> least not at this time) to fix this erratum.
>
> I have tried to solve the problem as suggested by the erratum, by not
> allowing the NAK interrupt handling to *directly* cause a retry of the
> send, but rather to wait until the next SOF interrupt (start of
> microframe, which happens 8,000 times per sec) to restart it.  "Breaking
> the chain" like this does allow the board to proceed, but I think it is
> suboptimal, or at least unfortunate.
>
> One painful side effect of this workaround is that you cannot disable the
> 8,000 SOF interrupts/second, or at least some of them, since they are
> being used now for another purpose -- recovery from the erratum.
>
> The 8,000 SOF ints being handled per second do cause a measurable drain on
> the CPU.  In some cursory testing we see a 10% slowdown of certain
> transactions in lmbench.
>
> So please send me your patch for the dwc_otg driver.  I am very interested
> in what you did, and if it perhaps is a better solution for the problem we
> both are seeing than what I implemented.
>
> Thanks in advance,
> Chuck
>
>

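Chuck's description of the erratum USBO_9 loop and his SOF-deferred
workaround can be compared with a toy interrupt counter.  This is a
hypothetical model, not the dwc_otg code and not measured data: the struct,
hyp_simulate() and its parameters are made up, and the number of PING/NAK
round trips per 125 us microframe is an assumed value chosen only to land in
the same order of magnitude as the storm he reports.

```c
#include <assert.h>
#include <stdbool.h>

/* Interrupt totals produced by the toy model. */
struct hyp_irq_counts {
    unsigned long nak_irqs;   /* NAK interrupts from PING transactions */
    unsigned long sof_irqs;   /* start-of-microframe interrupts */
};

/* Simulate `uframes` microframes.  The device NAKs every PING until
 * microframe `busy_uframes`, then accepts.  nak_rounds_per_uframe is an
 * assumed count of back-to-back PING/NAK rounds per microframe when the
 * NAK handler restarts the channel immediately. */
static struct hyp_irq_counts
hyp_simulate(unsigned uframes, unsigned busy_uframes,
             unsigned nak_rounds_per_uframe, bool defer_to_sof)
{
    struct hyp_irq_counts c = {0, 0};
    bool retry_pending = true;            /* a transfer is waiting to go */

    for (unsigned u = 0; u < uframes; u++) {
        bool device_ready = (u >= busy_uframes);

        if (defer_to_sof) {
            c.sof_irqs++;                 /* SOF interrupt must stay enabled */
            if (retry_pending) {
                if (device_ready)
                    retry_pending = false;  /* PING accepted: send proceeds */
                else
                    c.nak_irqs++;           /* one PING/NAK, retry waits for next SOF */
            }
        } else if (retry_pending) {
            if (device_ready)
                retry_pending = false;
            else
                /* Each NAK interrupt restarts the channel at once, causing
                 * the next PING/NAK and the next interrupt. */
                c.nak_irqs += nak_rounds_per_uframe;
        }
    }
    return c;
}
```

Over one second (8,000 microframes) with the device busy throughout and an
assumed 13 rounds per microframe, the immediate-retry model yields 104,000
NAK interrupts, while the deferred model yields 8,000 NAK plus 8,000 SOF
interrupts.  The deferred model also keeps taking the 8,000 SOF interrupts
after the device becomes ready, which is the ongoing cost Chuck describes.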

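Chuck's measurement method (snapshot /proc/interrupts, reproduce the storm
for 20 seconds, snapshot again) amounts to dividing the count delta by the
interval.  A minimal user-space sketch, assuming the usual
"<irq>: <count per CPU> <controller> ... <name>" line layout and a
controller name that does not start with a digit; hyp_irq_line_total(),
hyp_irq_rate() and the sample counts are hypothetical, not Chuck's data.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counts on one /proc/interrupts line, e.g.
 * " 30:    1234567   UIC1  Level     dwc_otg".  Returns -1 if the line
 * has no ':'.  Only the numeric columns after the ':' are summed; the
 * controller and device name columns that follow are ignored. */
static long long hyp_irq_line_total(const char *line)
{
    const char *p = strchr(line, ':');
    long long total = 0;

    if (p == NULL)
        return -1;
    p++;                                  /* skip the ':' after the IRQ label */
    for (;;) {
        char *end;

        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;                        /* reached the controller/name columns */
        total += strtoll(p, &end, 10);
        p = end;
    }
    return total;
}

/* Average interrupt rate between two snapshots taken `seconds` apart. */
static double hyp_irq_rate(long long before, long long after, double seconds)
{
    assert(seconds > 0.0 && after >= before);
    return (double)(after - before) / seconds;
}
```

For example, a count that rises by 2,000,000 across a 20-second window
works out to 100,000 interrupts per second.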
-- 
Hunter Cobbs


More information about the Linuxppc-dev mailing list