[Cbe-oss-dev] [patch 05/16] powerpc and x86_64: Initial port of PCIe endpoint device driver for Axon

Jesse P Arroyo arroyoj at us.ibm.com
Wed Mar 12 02:17:48 EST 2008


Arnd, thank you for your comments.  The review and constructive
criticism is very much appreciated.

Arnd Bergmann <arnd.bergmann at de.ibm.com> wrote on 03/11/2008 01:24:09 AM:
> > +const uint32_t chnl_status_mask[] =
> > +   {0x88000880, 0x40800440, 0x20080220, 0x10008110};
> 
> For mostly historical reasons, we tend to use types like 'u32'
> in the kernel instead of the standard uint32_t.
> 
Will fix.  All uint32_t types are carry over that need cleanup.

> > +inline void dmax_get_base_ctl(struct dmax_desc *desc)
> 
> static inline, please
Will fix.

> > +static inline int dmax_get_next_desc_index(struct dmax_chn *chan_p)
> > +{
> > +   u32 head = chan_p->desc_head;
> > +   chan_p->desc_head++;
> > +   chan_p->desc_head = chan_p->desc_head % chan_p->desc_ring_size;
> > +   return head;
> > +}
> 
> This code seems to lack locking around the access. If you use the trick
> I described above, you can avoid adding locking by using atomic updates,
> which you can't easily do in your algorithm.
>
Currently calls to any descriptor get/return functions are only made under
our per-DMA channel spin lock.  One other thing to note is that there is a
case in the driver today where we actually move the head pointer 
backwards.
This is done when we can't get all of the descriptors we need for the
descriptor list build.  The problem comes that we don't know how many we
need to satisfy a build given the two lists the user has passed us, and
assuming the worst case up front can prevent more DMAs from being issued
than we want.

> > +/**
> > + * dmax_seq_plb5 - get the plb5 addr to write the seequnce number via 
DMA
> > + * @chan_p:
> > + *
> > + */
> > +static u64 dmax_seq_plb5(struct dmax_chn *chan_p)
> > +{
> > +   int chan_num = chan_p->channel_num;
> > +
> > +   u64 local_d2d_plb5 = local_phys_to_plb5(chan_p->dev->axon_priv,
> > +         chan_p->dev->axon_priv->local_d2d_dma_handle);
> > +   return local_d2d_plb5 + offsetof(struct axon_d2d,
> > +               dma_channel_seq_complete[chan_num]);
> > +}
> 
> I believe we have discussed this in the past, but I may have forgotten
> your arguments. AFAICS, all the conversion between PCI, PLB and IOIF
> addresses should be handled by the dma api, if you provide an 
appropriate
> dma_mapping_ops implementation.
> 
The new driver architecture that Brett Bolen has been working on will make
dealing with this a little more straightforward, I think, so we have 
delayed
those type of changes until then.

> Since this is just a simple lookup table, why not write it in a more
> compact way, as
> 
>    static const char *plb_error_table[13] = {
>       [0] = "Parity Error on address for a write",
>       [1] = "Parity Error on BE for a write",
>       ...
>    };
>    printk(KERN_ERR "PLB5: %s\n", plb_error_table[RS_ERROR(pse)];
> 
Makes sense.

> > +      rc = 1;
> > +   }
> > +
> > +   return rc;
> > +}
> 
> Always use negative errno.h constants to indicate an error, or 0 for
> success.
> 
Thought I caught all of those... drat.

> You can use queue_delayed_work() to write this in a simpler way, it
> will set up a timer itself for calling the work function.
> 
makes sense.

> > +static inline int dmax_select_chnl(struct dmax *dmax_p, unsigned 
> long flags)
> 
> the flags argument is unused, is that on purpose?
Legacy parameter - we used to base the channel selection on flags from the
requested transfer (client type, etc). 

> 
> > +/**
> > + * xfer_add_interrupt - enable a completion interrupt for the given 
xfer
> > + * @xfer:
> > + *
> > + */
> > +static void xfer_add_interrupt(struct dmax_xfer *xfer)
> > +{
> > +#ifdef CONFIG_TB_AXON_PCI
> > +   /*
> > +    * we set up a dummy descriptor for this case
> > +    * move the source address to the right spot
> > +    */
> > +   u32 addr_lo = (xfer->chan_p->plb5_int_addr) & 0xFFFFFFFF;
> > +   xfer->last_desc->src_addr_lo = cpu_to_be32(addr_lo);
> > +#else
> > +   /* just tag the last descriptor with a BIE */
> > +   xfer->last_desc->ccw |= cpu_to_be32(BIE);
> > +#endif
> > +}
> 
> For functions that behave in different ways on PCI and on the BIF,
> you may need a callback to do the right stuff in the hardware specific
> driver.
> 
> In this case, I don't understand what the difference is about, can you
> explain?
> 
For each requested DMA we have a descriptor list that looks like:
[ Data descriptor 0 ]
[ Data descriptor 1 ]
.
[ Data descriptor n ]
[ Sequence @ update descriptor]

For the DMAs that are sourced by the driver for the Cell attached as a
southbridge, the last descriptor uses the native DMAx barrier interrupt.
This code turns that Barrier Interrupt Enable bit on in the final desc.

For the DMAs that are sourced by the driver for the Cell attached as a
PCI endpoint, we add another descriptor to the end of that list at build
time.
[ Interrupt generation descriptor]
This descriptor will write the UTL ISSR register to generate a PCI int.
For this driver, not all DMAs require an interrupt, so we maintatin two
locations in memory for each channel to DMA from.  One that has the 
correct bit set to cause the interrupt, and one that just has a 0 value.
This enables use to chain DMAs together and set/unset interrupts on
per transfer basis, and not need to add an extra descriptor later.


> > +static void dmax_link_ring_descriptors(struct dmax *dmax_p,
> > +               struct dmax_chn *chan_p,
> > +               int chan_num)
> > +{
> > +   struct dmax_desc *desc;
> > +   int num_descs = chan_p->desc_ring_size;
> > +   u64 base_plb5_addr = dmax_phys_to_plb5(chan_p->dev, 
chan_p->desc_da);
> > +   u32 base_plb5_addr_high = cpu_to_be32(base_plb5_addr >> 32);
> > +   u32 base_plb5_addr_low = base_plb5_addr & 0xFFFFFFFF;
> 
> This looks wrong: If you do cpu_to_be32() on the upper half, why
> not on the lower half here?
> 
The upper value is reused for each descriptor, so we do cpu_to_be32 once.
The lower is calculated per-descriptor in the loop just below the above.
        for ( i = 0; i < num_descs - 1; i++ ) {
                desc = &chan_p->descs[i];
                desc->next_sg_addr_hi = base_plb5_addr_high;
                desc->next_sg_addr_lo =
                        cpu_to_be32( base_plb5_addr_low +
                        ((i+1) * sizeof(struct dmax_desc)));
        }


> Why is a DMA for apnet different from the other DMAs? Can you move
> this function to the apnet driver and call axon_request_dma() from
> there?
> 
For the new architecture driver, this is an already planned change to
have a unified interface that does not assume the client type.

> > +/**
> > + *  @struct  struct dma_sg_list
> > + * This structure contains the SG list for a DMA transfer
> > + */
> > +struct dma_sg_list {
> > +   void *start_desc;   /* Starting descriptor */
> > +   void *last_desc;   /* Final descriptor */
> > +   u64 count;      /* Total bytes to transfer*/
> > +   u64 num_descs;
> > +
> > +   struct axon_dma_list *src;    /* Map of source addresses*/
> > +   struct axon_dma_list *dest;    /* Map of destination addresses*/
> > +};
> 
> Why do you define a new sg_list instead of using the one from
> linux/scatterlist.h?
This is an internal tracking structure used during the build of the
hardware descriptor list.  The client passes us two of the struct
axon_dma_lists, which have multiple elements, and there is no implied
alignment between the two.  The start_desc and last_desc also have
uses for tracking, cleanup, and debug per transfer.
The start_desc can be eliminated, and both start_desc and last_desc
should be of type struct dmax_desc * rather than void *

> 
> > +/**
> > + * dmax_find_channel -
> > + * @irq:
> > + *
> > + */
> 
> Why do you need this function? You currently pass down the struct dmax
> into the interrupt handler, why not pass a channel specific structure
> instead?
> 
Noted and on the list - this was cleanup that hasn't happened yet.

> > +   dmax_p->signature = dmax_signature;
> > +   dmax_p->index = dmax_driver.dma_engines;
> > +   dmax_p->dev = &(ofdev->dev);
> > +   dmax_p->chn_shift = 0x0;
> > +   sprintf((char *)&dmax_p->name[0], "dmax%d", 
dmax_driver.dma_engines);
> > +   dmax_p->local_phys_plb5_addr = DMAX_BE_XDR_BASE;
> 
> DMAX_BE_XDR_BASE should not be hardcoded here. On Open Firmware, all the
> addresses are encoded in the device tree.
>
Will fix.

> > +   dmax_driver.dma_engines++;
> > +
> > +   tmp = ofdev->node->full_name;
> > +   ret = of_address_to_resource(ofdev->node, 0, &r);
> > +   if (!ret) {
> > +      dmax_p->plb5_base_addr =
> > +         (__u8 *)ioremap(r.start, r.end - r.start + 1);
> 
> You can use of_iomap() to do both of_address_to_resource and ioremap
> in one call. The typecast is both unnecessary and incorrect, as
> plb5_base_addr needs to be an __iomem pointer.
> 
Will fix this - more carry over code from a earlier kernel (2.6.20?)

> 
> > +
> > +#define DMAX_0   0
> > +#define DMAX_1   1
> > +#define DMAX_2   2
> > +#define DMAX_3   3
> > +#define NUM_DMAX_CHAN   4
> 
> These definitions look like they should go into the dmax driver, not a
> header file, because they do not define an interface between two 
components
> of the kernel.
> 
This is not an interface, just a set of HW definitions.

The dmax_hw.h file is used by:axon_dmax.c, axon_dmax_sb.c, axon_dmax_pci.c
Once we move these drivers to the (fixed) new architecture, this will be]
cleaned up.

> > +#define DMAX_DDR0_BASE   0x0000000000000000
> > +#define DMAX_DDR1_BASE   0x2000002000000000
> > +
> > +#define DMAX_BE_XDR_BASE 0x6000006000000000
> > +#define DMAX_BE_XDR_BASE_HI 0x60000060
> 
> This does not look like an interface at all, but rather like something
> that is hardware specific. In case of the PCI driver, it might go into
> the driver implementation, while on the cell blade, it needs to come
> from the device tree.
That is correct.

> 
> > +#define DMAX_IRQ_0   119
> > +#define DMAX_IRQ_1   120
> > +#define DMAX_IRQ_2   121
> > +#define DMAX_IRQ_3   122
> > +#define DMAX_IRQ_0_ERR   123
> > +#define DMAX_IRQ_1_ERR   124
> > +#define DMAX_IRQ_2_ERR   125
> > +#define DMAX_IRQ_3_ERR   126
> > +#define DMAX_SLAVE_ERR   127
> > +#define DMAX_NUM_IRQ      9
> 
> These obviously have no place in a header file whatsoever. Interrupt
> numbers are specific to the physical interrupt controller and do
> not carry a significance in the device driver at all, they get remapped
> into virtual interrupt numbers.
> 
Ugh - we don't use those, and I haven't cleaned up that header file.

> > +/*
> > + * DMAX Descriptor - MUST by 64 bytes long
> > + */
> > +
> > +struct dmax_desc{
> > +   uint32_t ccw;        /* 0x0 DMA Channel Control Word */
> > +   uint32_t count_control;     /* 0x4 Count and Control */
> > +   uint32_t data_stride;     /* 0x8 Data Stride */
> > +   uint32_t reserved;     /* 0xc */
> > +   uint32_t src_addr_hi;     /* 0x10 Source Address High */
> > +   uint32_t src_addr_lo;     /* 0x14 Source Address Low */
> > +   uint32_t dest_addr_hi;     /* 0x18 Destination Address High */
> > +   uint32_t dest_addr_lo;     /* 0x1c Destination Address Lo */
> > +   uint32_t next_sg_addr_hi; /* 0x20 Linked Next Scatter/Gather
> > +                 Descriptor Address High */
> > +   uint32_t next_sg_addr_lo; /* 0x24 Linked Next Scatter/Gather
> > +                 Descriptor Address Lo */
> > +   uint32_t reserved2;     /* 0x28 */
> > +   uint32_t reserved3;     /* 0x2c */
> > +   void    *next;        /* 0x30 Must be cleared to reserved
> > +                 prior to transfer */
> > +   uint32_t reserved5;     /* 0x38 */
> > +   uint32_t reserved6;     /* 0x3c */
> > +};
> 
> If you build on a 32 bit host, the structure is 60 bytes long, not 64.
> Since this is hardware specific, you should have some comment on
> it, and use __be32 or __le32 data types to show the endianess.
> 
Will fix this - we don't even use the next pointer anymore.
Those should just be two u32s, really all __be32.
At this point, I don't believe that any of the driver set claims to
work in a 32 bit kernel.

> These definitions should go into the driver, like many more of the
> definitions below these.
OK - if there are hardware definitions that used by both southbridge
and PCI flavors of the DMAx driver in different files, where should these
go.

> > +#define DESC_BARRIER (CE | BAR)
> > +
> > +#ifdef CONFIG_TB_AXON_PCI
> > +#define LAST_DESC_BARRIER DESC_BARRIER
> > +#else
> > +#define LAST_DESC_BARRIER (CE | BAR | BIE)
> > +#endif
> 
> This #ifdef hurts portability, maybe you can express this value
> as a member in the device structure.
> 
That can certainly be done.  This goes along with the differences
in the descriptor lists that I mentioned above.


> > +static inline void tbtime_get(union tbtime_t *tb)
> > +{
> > +   tb->tb = get_cycles();
> > +}
> 
> What's the point in this abstraction? I can see no advantage over just
> using the well-known get_cycles function directly in your driver.
>
That was something that went through a number of iterations, starting
back in the 2.6.16 driver code.  Eventually the code converged, and
this was just never fully factored out.  It will be.









More information about the cbe-oss-dev mailing list