usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

Tue May 10 17:23:59 AEST 2016

On Tuesday 10 May 2016 08:37:52 Benjamin Herrenschmidt wrote:
> On Mon, 2016-05-09 at 17:08 +0200, Arnd Bergmann wrote:
> > 
> > Unfortunately, I don't see any way this could be done in MIPS specific
> > code: There is typically a byteswap between the internal bus and the PCI
> > bus on big-endian MIPS systems, so the PCI MMIO ends up being little-endian,
> 
> Ugh ... not exactly, re-watch my talk on the matter :-) While there is
> a specific lane wiring to preserve byte addresss, in the end it's the
> end device itself that is either BE or LE. Regardless of any "bus
> endianness".

I found your slides on

http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-ref-big-little-endian-herrenschmidt.odp

but there are at least two more twists that you completely missed here:

- Some architectures (e.g. ARMv5 "BE32" mode in IXP4xx, surely some others)
  do not implement big-endian mode by wiring up the data lines between the
  bus and the CPU differently between big- and little-endian mode like
  powerpc and armv7 "BE8" do, but instead they swizzle the *address* lines
  on 8-bit and 16-bit addresses. The effect of that is that normal RAM
  accesses work as expected both ways, and devices that are accessed using
  32-bit MMIO ops never need any byteswap (you actually get "native
  endian") while MMIO with 8 and 16 bit width does something completely
  unexpected and touches the wrong register. Having an explicit byteswap
  on the PCI host bridge gets you the expected addresses again for 8-bit
  cycles but it also means that readl()/writel() again need to swap the
  data.

- Some other architectures (e.g. Broadcom MIPS) apparently are even fancier
  and use a strapping pin on the SoC flips the endianess of the CPU core
  at the same time as all the peripheral MMIO registers, with the intention
  of never requiring any byte swaps. I believe they are implemented careful
  enough to actually get this right, but it confuses the heck out of
  Linux drivers that don't expect this.

> > which matches the expected behavior of readl/writel. However, drivers
> > for non-PCI devices often use the same readl/writel accessors because
> > that is how it's done on ARMv6/ARMv7.
> 
> Even then, you can have on-SoC (non-PCI) devices that also have a
> different endianness from the main CPU. How does it work on ARM for
> example ? The device endianness should be fixed, regardless of the
> endianness of the core, no ?

ARMv6/v7 is uses BE8 mode like powerpc: each peripheral is fixed-endian
and you have to know what it is. Only Freescale managed to put identical
IP blocks on various (powerpc-derived) SoCs and have a subset of them
treat the access as little-endian while others remain big-endian, so all
those drivers now require runtime detection.

> > Doing it hardcoded by architecture is just the simplest way to deal
> > with it, working on the assumption that nothing actually needs the
> > runtime detection that Ben suggested. 
> 
> No, it's not an archicture problem. It's a problem specific to that one
> SoC that the device was synthetized to be a certain endian while it was
> synthetized differently on another SoC... that also happens to be a
> different architecture. But doesn't have to.
> 
> For example, we had in the past cases of both LE and BE EHCI
> implementations on the same architecture (PowerPC).

I understand this, but from what I see in this history of this particular
driver, all ARM and PowerPC implementations chose to use LE registers for
DWC2 because the normal approach for these is to not mess with endianess,
while presumably all MIPS users of the same block wired up the endian-select
line of the IP block to match that of the CPU core, again because it's
what you are expected to do on a MIPS based SoC.

So hardcoding it per architecture would make an assumption based on
the mindset of the SoC designers rather than strict technical differences,
and that can fail as soon as someone does things differently on any of
them (see the Freescale example), but I still think it's the easiest
workaround for backporting to stable kernels. A revert of the original
patch would be even easier, but that would break the one big-endian
MIPS machine we know about.

> > Detecting the endianess of the
> > device is probably the best future-proof solution, but it's also
> > considerably more work to do in the driver, and comes with a
> > tiny runtime overhead.
> 
> The runtime overhead is probably non-measurable compared with the cost
> of the actual MMIOs.

Right. The code size increase is probably measurable (but still small),
the runtime overhead is not.

	Arnd