[Cbe-oss-dev] [RFC 1/9] AXON - common driver core

Fri Dec 29 04:09:26 EST 2006

On Tuesday 26 December 2006 21:00, Arnd Bergmann wrote:
> On Friday 22 December 2006 17:38, Jean-Christophe Dubois wrote:
> > On Thursday 21 December 2006 22:13, Arnd Bergmann wrote:
> > > > +#include <axon_utils.h>
> > > > +#include <axon_io.h>
> > > > +#include <axon_dbg.h>
> > > > +#include <axon_defs.h>
> > > > +#include <axon_bitops.h>
> > > > +#include <axon_dma_ops_dma_aux.h>
> > >
> > > [style] Your own headers should move to a more appropriate place.
> > > Ideally, this is the local directory, with '#include "axon_foo.h"'
> > > lines. If that doesn't work, put them into include/asm.
> >
> > This can't be "include/asm" because it need to be available to all PCI-E
> > host architecture. So it need to be something "common". I didn't want to
> > go an pollute "include/linux" directly so I am actually seeking advice on
> > where architecture independent header files related to the Axon should
> > go.
>
> Yes, good point.
>
> This depends a lot in what way the header is shared. The interesting cases
> are:
>
> 1. only in one directory: easy, all headers go to that dir
> 2. arch specific, kernel only, but multiple locations: should
>    go to include/asm.
> 3. kernel only, but multiple architectures, few headers: include/linux
> 4. kernel only, multiple architectures, many headers: include/linux/foo/
>    or include/foo/.
> 5. shared with user space, linux specific: include/linux/{,foo/}, but
>    needs to be announced/discussed on linux-kernel at vger.kernel.org.
> 6. shared with user space and other OSs: include/foo, definitely needs
>    discussion.
>
> I guess that we're talking about 3 or 4, so first put them into
> include/linux/axon_*.h.

I guess 4 will do for the files in axon/include at this time. So what would be 
preffered? include/linux/axon/ or include/axon/ ?

Then I have a few files for user space in axon/usr/include ... 

> > > > +static int      axon_aux_dma_is_broken = 1;
> > > > +
> > > > +module_param(axon_aux_dma_is_broken, int, 0644);
> > > > +MODULE_PARM_DESC(axon_aux_dma_is_broken,
> > > > +		 "On FCAB the AuxDMA is broken and can't cross 256 bytes bondaries
> > > > without corrupting data. If you know your AuxDMA is not broken you
> > > > can set this parameter to 0");
> > >
> > > This should be detected from a device tree property of the dma
> > > device.
> >
> > On FCAB, the device tree is very minimal and doesn't have that much in
> > it. I am not in control of the device tree. It is provided by IBM and it
> > sounds like the FCAB device tree is stuck forever at that point. I'll see
> > if I can come up with my own device tree using DTC and the flattened
> > device tree.
>
> That's an interesting idea, using DTC should vastly simplify the round-trip
> time even for other devices where the firmware is not final. If you need
> kernel patches to integrate the flat device tree in this, please share them
> so we can use the same mechanism.

I'll try to get time to work on this. Is there a platform (not necessarily 
Cell) where the DTC is used/integrated yet?

> In the long run, is FCAB relevant at all? My understanding was that this
> was a prototype produced in extreme small quantities.

This is true. There must be 10 or so FCAB out there but I don't want to throw 
them to the pile yet. Beside the CAB still doesn't have a final usable device 
tree.

> > > Using 'volatile' in a variable declaration is practically always
> > > a bug. Why do you think you need it here?
> >
> > This could be a pointer accross the PCI-E bus to the other side memory
> > space. It has to be volatile when/if you were to write to it. Actually it
> > is assigned but not used so I guess I could remove the "volatile" thing
> > even if it sounds correct to me.
>
> If it goes across a PCI{,-E} bus, it needs to be __iomem, not volatile,
> and you can't dereference it at all, except with functions like ioread32,
> readl or in_be32 (depending on your needs). These functions internally
> cast to volatile as needed, so declaring the variable itself volatile
> is pointless and can only cause the compiler to skip some optimizations.
>
> readl/writel should be the right abstractions here, they also do
> the correct endianess conversion.
>
> > > > +	volatile u8    *p_dma_channel;
> > >
> > > volatile seems wrong here.
> >
> > it is a pointer to an IO space where live the DMA control registers. It
> > can be on the PLB or on DCR depending on the DMA (Aux_DMA, DMAX).
>
> s/volatile u8/u8 __iomem/ again. Since this is on PLB directly, not on PCI,
> you should use in_be32/out_be32 and friends to access them.

DMAX registers are accessed by the host too as they live on PLB. The host 
controls one channel for itself in our driver.

> > > __raw_writel and __raw_readl are somewhat underspecified, so I'd
> > > rather avoid them. out_be32 and in_be32 are probably what you
> > > want.
> >
> > remember, it also work on LE platform ...
>
> Exactly. __raw_writel accesses the data in the native endianess.
> {in,out}_{be,le}{16,32,64} encode the endianess of the register, so
> the CPU can do the right conversion.
>
> It may still get tricky if the registers themselves appear different
> depending on whether they are accessed though PLB5 or PCIe mappings.
> In that case, you should probably abstract them to use either out_be32
> (on PLB) or writel (on PCIe).

OK, I'll try to rethink the low level accessor layer ..

> With the current code, I expect that you get the endianess wrong if
> you plug the CAB card into a big-endian PCIe host.

I should try this in my PCI-E G5 system sometime.

> > > This is probably part of the comments you removed, but why do you need
> > > to map anything physically? This should all be taken care of by
> > > the firmware.
> >
> > On the PCI side you (typically) have 2 x 128 MB bar to map 1TB Axon
> > space. So you need to "allocate" bars/PIM, set them to target the
> > requested PLB address ... It is not a  1 to 1 mapping. It is a lot more
> > restricted over PCI-E (from the host).
>
> You mean you need to remap the register area at run time in order to access
> the whole space? What limits the BAR size on PCIe? I guess we'll have
> to live with this hack then, but it looks like a potentially huge
> performance bottleneck.

Yes, you might have to remap the PIM if you are trying to do PIO from the 
host. As I said you typically get 2 128MB BARs that you can further split in 
2 64MB window each. There is little hope to get anything able to map the all 
thing as most host BIOS will not allocate a lot of space to PCI-E 
enumeration. Most of the time there is not more than 2GB for all IOs 
including the GPU board. It seems that so far most BIOSes will fail for a BAR 
> 512MB.

Note that you don't have to remap that often if you try to use the DMAX engine 
to do most transfers. The DMAX access things from the PLB and can access the 
full 64bits address space.

> > Well we are not sure how the public Axon spec will be but in the one that
> > was provided to us, this is how things were defined. So we preffer to
> > stick to the spec definition and come up with translation rules rather
> > than hard to reconcile values between the spec and the code.
> >
> > I guess this is a choice. Is it really unacceptable?
>
> IMHO, it's fine to stay as close as possible to the spec if that one
> is available and you can have a comment pointing to the original
> definition next to it.

In theory this spec should come. The Axon IP owner is working on it last I 
heard. For now, I can't put any comment pointing to any spec.

> If your source code has to serve as a replacement for the actual
> specification, I would prefer it to be readable instead.
>
> > > You can't hardcode these numbers, that immediately breaks with
> > > newer kernels. Instead, read the numbers from the device tree.
> >
> > Remember that I still don't have a "new" OF tree and for the time being I
> > need to support the CAB/FCAB I have. As soon as I get the new device tree
> > and I can move to 2.6.19 or later this could go I guess (assuming I come
> > up with a solution for FCAB).
>
> ok
>
> > > Same for the offsets. The kernel must not care about them.
> >
> > Same reason ...
> >
> > In addition the host side of thing (opteron based for example) has no OF
> > tree to play with. So things need to be hardcoded a bit. Note that these
> > are Axon internal addresses and should always be valid (from the host or
> > the Cell).
>
> Interesting point. I guess on the host side, the device is uniquely
> identified at run time though its PCI ID, so having the offsets there would
> be common practice and we can't do much better than that.

Yes from the host you just see one PCI device with 3 BARs. Then you have to 
play with some PCI registers to reach the rest of the internal/PLB space.

> > > > +static inline void
> > > > +dma_aux_write(struct axon_dma_ops_dma_aux_t *p_axon_dma_ops_dma_aux,
> > > > +	      u32 reg, u32 value)
> > > > +{
> > > > +	axon_writel(value, p_axon_dma_ops_dma_aux->p_regs + reg);
> > > > +	wmb();
> > > > +}
> > >
> > > better use out_be32
> >
> > Think LE ..
>
> Ok, in this case I was referring to the wmb(), which is already
> implied by out_be32. axon_writel() itself then should be either
> out_be32 or writel, depending on whether it is compiled to go
> to PLB or PCI mappings.

I'll think about this...

> 	Arnd <><