[Cbe-oss-dev] [patch 04/16] Common Core of the Axon driver

Arnd Bergmann arnd at arndb.de
Sat May 26 00:54:58 EST 2007


On Friday 25 May 2007, Jean-Christophe Dubois wrote:
> On Friday 25 May 2007 02:13:10 Arnd Bergmann wrote:
> > On Thursday 24 May 2007, jdubois at mc.com wrote:

> This core part of the Axon driver can be seen as a "nexus" driver that 
> attaches to all the hardware resources, abstracts them to a common API 
> regardless of which side (PCI/BEI) of the Axon you are on, and provides 
> a set of more elaborate/common services to "client" drivers, so that 
> these clients don't have to deal with hardware details (PCI or BEI) so 
> much. These client drivers (like Ethernet over PCI, soft reset, buffer 
> driver ...) are not attached directly to the hardware resources but just 
> share them (MBX, DMA, mapping, ...) through the nexus Axon driver. This 
> allows more than one "client" driver to use DMA, MBX, ... from the kernel.

Ok, I remember now how we discussed this when you posted the patches
the first time. We certainly need to have logic in there to make up
for the missing abstraction on the PCI interface.

AFAICS, what your nexus does is that on BEI it basically collects all the
information about the Axon from the various device nodes in the firmware
device tree and provides a layer that exposes the bare registers the way
you have them on the PCIe interface, as a single device.

While that's a workable solution for this problem, I think doing it
the opposite way is cleaner: All the knowledge that you have now
duplicated in the device tree and in the nexus (to make it work
on PCIe) should go into the pci_driver implementation. That, in turn,
creates the low-level devices (mbx, dma) in the Linux device tree.

Then you have a driver for each of these that can either bind to
the of_device that is already there on BEI, or to the device created
as a child of the PCI device node.

These drivers can register both an of_platform_driver and a driver
for whichever bus you attach the child devices to, but the actual
code in there would already be bus-agnostic.
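
To illustrate what I mean (just a rough sketch against the current
interfaces, all names made up), the shared core plus the two thin
front ends could look like this:

#include <linux/module.h>
#include <linux/init.h>
#include <linux/platform_device.h>
#include <asm/of_platform.h>

/* all of the actual mailbox logic lives here, bus-agnostic */
static int axon_mbx_setup(struct device *dev)
{
	dev_info(dev, "axon mailbox ready\n");
	return 0;
}

/* front end binding to the of_device that is already there on BEI */
static int axon_mbx_of_probe(struct of_device *op,
			     const struct of_device_id *match)
{
	return axon_mbx_setup(&op->dev);
}

static struct of_device_id axon_mbx_of_match[] = {
	{ .compatible = "ibm,axon-mbx", },	/* made-up string */
	{},
};

static struct of_platform_driver axon_mbx_of_driver = {
	.name		= "axon-mbx",
	.match_table	= axon_mbx_of_match,
	.probe		= axon_mbx_of_probe,
};

/* front end binding to the device created by the axon PCI driver */
static int axon_mbx_plat_probe(struct platform_device *pdev)
{
	return axon_mbx_setup(&pdev->dev);
}

static struct platform_driver axon_mbx_plat_driver = {
	.probe	= axon_mbx_plat_probe,
	.driver	= { .name = "axon-mbx", },
};

static int __init axon_mbx_module_init(void)
{
	int ret;

	ret = of_register_platform_driver(&axon_mbx_of_driver);
	if (ret)
		return ret;

	ret = platform_driver_register(&axon_mbx_plat_driver);
	if (ret)
		of_unregister_platform_driver(&axon_mbx_of_driver);
	return ret;
}
module_init(axon_mbx_module_init);

The probe functions stay trivial, and everything interesting is shared.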

> Now when I load a "client" driver in the kernel, I need to notify it about 
> all the Axon nexus devices available so far. Along the same lines, if an 
> Axon "nexus" device is added to or removed from the kernel, I need to 
> notify all the already registered "client" drivers that an Axon "nexus" 
> device has appeared or gone away.
>
> > > +static struct list_head axon_list;
> > > +static struct list_head driver_list;
> >
> > You shouldn't need any global lists like these, drivers and
> > devices are maintained by the driver core in Linux.
> 
> There is nothing I could find for the "nexus" driver concept with attached 
> "client" drivers. I need to store this kind of info somewhere.

It's not the exact concept, but close enough. Your registration
interface is in essence a 'struct bus_type' lookalike. Instead
of doing your own, you should use an existing one, which gives you
all the features of regular Linux drivers. E.g. you can find the
devices in sysfs, add attributes to them, and have autoloading
of driver modules.

You don't even need to create your own bus_type, if you
simply add a 'struct platform_device' for each of the child devices
in the axon PCI driver.
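
Something like this in the probe function of the PCI driver (untested,
resource setup omitted):

#include <linux/pci.h>
#include <linux/platform_device.h>

static int __devinit axon_pci_probe(struct pci_dev *dev,
				    const struct pci_device_id *id)
{
	struct platform_device *mbx;
	int ret;

	mbx = platform_device_alloc("axon-mbx", 0);
	if (!mbx)
		return -ENOMEM;

	/* make it show up below the PCI device in sysfs */
	mbx->dev.parent = &dev->dev;

	/* register ranges and interrupts would get passed down
	 * with platform_device_add_resources() here */

	ret = platform_device_add(mbx);
	if (ret)
		platform_device_put(mbx);
	return ret;
}

and the same again for the dma and any other sub-device.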

> > > +struct axon_mbox_t *axon_peer_mbox_get(axon_t * p_axon)
> > > +{
> > > +	return p_axon->peer_mbox_get(p_axon);
> > > +}
> > > +
> > > +EXPORT_SYMBOL(axon_peer_mbox_get);
> >
> > Why not EXPORT_SYMBOL_GPL? Shouldn't this be in the mailbox driver?
> 
> OK, I'll make everything EXPORT_SYMBOL_GPL.
> 
> About the location, it needs to be "common". There are two types of 
> mailboxes: the Axon hardware MBX on the Cell side and the software 
> emulation on the host side. This code allows pointing to the peer MBX 
> regardless of the MBX type.

I would think that you rather have two different mailbox drivers, which
export the same symbols. It means that you can't load them both in the
same kernel, but AFAICS that is not a requirement, since only one of the
two devices is present at any given time.

For the users, you then simply have interfaces like axon_mailbox_write().
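
I.e. a single prototype in a shared header, implemented and exported
by whichever module is loaded, roughly (names made up):

#include <linux/module.h>
#include <asm/io.h>

/* in a shared header: the one interface both drivers implement */
int axon_mailbox_write(unsigned int mbox, u32 value);

/* in the hardware driver, Cell side; the host-side software
 * emulation exports the very same symbol from its own module */
static void __iomem *mbx_regs;	/* mapped at probe time */

int axon_mailbox_write(unsigned int mbox, u32 value)
{
	writel(value, mbx_regs + 4 * mbox);
	return 0;
}
EXPORT_SYMBOL_GPL(axon_mailbox_write);

The module loader then makes sure the users get whichever
implementation is present.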

> > What are the drivers that you are iterating over? I would think that
> > you should iterate over the child devices instead.
> 
> Here I am iterating over the "client" drivers to tell them the ongoing soft 
> reset is done and they can start using the "nexus" driver again, knowing 
> that the "remote" side has been completely reset.

You only need to do that for the PCI driver, right? If so, you can have
a function in the pci driver that removes all its children, does the
reset, and then adds the children back when they become available again.
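
In a sketch (assuming the platform_device children from above):

#include <linux/platform_device.h>

struct axon_pci_priv {			/* hypothetical driver state */
	struct platform_device *mbx;
	struct platform_device *dma;
};

static void axon_pci_soft_reset(struct axon_pci_priv *priv)
{
	/* unregistering unbinds the client drivers cleanly */
	platform_device_unregister(priv->mbx);
	platform_device_unregister(priv->dma);

	/* ... perform the actual soft reset here ... */

	/* then recreate the children the same way the probe
	 * function did, so the client drivers get bound again */
}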
 
> > > +#if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,9)
> >
> > huh?
> 
> Support for RH4 on the host ...

Ok. The more conventional way to do this is to have a header
file that provides macros emulating the new interfaces on top
of older kernels. That lets you avoid having different code
in the C implementation.
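
E.g. something like this in a private compat header (a real example,
DEFINE_SPINLOCK only appeared around 2.6.11 and is missing on RHEL4's
2.6.9 kernel):

/* axon_compat.h */
#include <linux/version.h>
#include <linux/spinlock.h>

#ifndef DEFINE_SPINLOCK
#define DEFINE_SPINLOCK(x)	spinlock_t x = SPIN_LOCK_UNLOCKED
#endif

Then the .c files can use the current interfaces unconditionally.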

> > > +	p_dma_pkt->addr =
> > > +	    dma_pool_alloc(p_dma_req->p_dma_pool, GFP_ATOMIC,
> > > +			   &p_dma_pkt->cpu_bus_addr);
> >
> > Do you plan to port this to an ISA based system? Seriously, on a modern
> > system you don't want to use dma_pool interfaces. Just use the dma-mapping
> > API.
> 
> This is to allocate the command packets for the DMA. It seemed appropriate ... 
> For the data transferred by the DMA we are using the dma-mapping API. Do you 
> think I should create/manage my own pool of DMA command packets?

Sorry, I was wrong here. There are only three users in the current
linux source tree, but now that I looked it up again, it does seem
to be exactly what you need. Sorry for the noise.
 
> > Don't add new files to procfs, it's already messy enough.
> > Instead add attributes with one value per file to the sysfs
> > representation of your device. Of course, that requires the
> > use of device abstraction in the first place.
> 
> I'll look at the sysfs thing. This is for debug help.

If it's only for debugging, the easiest way might be to split it
out to a separate patch, and not submit that for inclusion.
If you would still like to have it upstream, debugfs is the
right place to put stuff like this.

procfs and sysfs both require that you never change the format
of the file, which would break user space applications relying
on it. debugfs is meant for ad-hoc information exchange.
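
The interface is also much simpler than procfs, e.g. exporting a
counter takes two calls (sketch, names made up):

#include <linux/init.h>
#include <linux/types.h>
#include <linux/debugfs.h>

static struct dentry *axon_dbg_dir;
static u32 axon_mbx_interrupts;	/* incremented in the irq handler */

static int __init axon_debugfs_init(void)
{
	axon_dbg_dir = debugfs_create_dir("axon", NULL);
	debugfs_create_u32("mbx_interrupts", 0444, axon_dbg_dir,
			   &axon_mbx_interrupts);
	return 0;
}

and the files show up under /sys/kernel/debug/axon/ when debugfs
is mounted.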
 
> > > +static void axon_mbox_hw_stop_delayed_work(axon_mbox_hw_t *p_axon_mbox_hw)
> > > +{
> > > +	if (!p_axon_mbox_hw->irq_poller_wq)
> > > +		return;
> > > +
> > > +	cancel_rearming_delayed_workqueue
> > > +	    (p_axon_mbox_hw->irq_poller_wq, &p_axon_mbox_hw->irq_poller);
> > > +
> > > +	destroy_workqueue(p_axon_mbox_hw->irq_poller_wq);
> > > +
> > > +	p_axon_mbox_hw->irq_poller_wq = NULL;
> > > +}
> >
> > You shouldn't destroy the workqueu when stopping a work item in it.
> > Creating a workqueue is a very heavyweight operation that you should
> > not do during normal operation. Do you really need your own workqueue?
> 
> This is not a "normal" operation. The MBX driver is going away, so I need 
> to clean up everything.

Ok, so it's probably just named in a confusing way. Maybe use
something like axon_mbox_hw_shutdown_wq.

> >
> > What does this do that can't be done by the firmware?
> 
> This is common code used on the host and on the Cell. On the host you get 
> 2 x 128MB PCI BARs to map the 1TB Axon memory map, so you need to reprogram 
> the mapping on the fly depending on your target address on the PLB when 
> doing PIO. We don't do much PIO (we try to use the DMA as much as possible) 
> but when required you need the support.

Right, I remember now that you had already explained this before.
 
> > > +#define AXON_READ(size,type) \
> > > +static int axon_pio_pci_read##type(axon_pio_t* p_axon_pio, \
> > > +		plb_addr_t src, u##size * p_data) \
> > > +{ \
> > > +	int ret = 0; \
> > > +	axon_pio_pci_t* p_axon_pio_pci = p_axon_pio->context; \
> > > +	void* src_cpu_addr; \
> > > + \
> > > +	src_cpu_addr = do_ioremap(p_axon_pio_pci, src, size / 8); \
> > > +	if (src_cpu_addr != NULL) { \
> > > +		*p_data = read##type(src_cpu_addr); \
> > > +		do_iounmap(p_axon_pio_pci, src, src_cpu_addr); \
> > > +	} else { \
> > > +		dbg_err("Unable to map IO space located on the bus " \
> > > +			"0x%p (%lld bytes)\n", src_cpu_addr, \
> > > +			(unsigned long long)size / 8); \
> > > +		ret = -EIO; \
> > > +	} \
> > > + \
> > > +	return ret; \
> > > +}
> >
> > Why do you need to access the I/O space? Are there registers that are not
> > in memory BARs? Note that on powerpc, we ioremap the whole I/O space
> > at boot time, so you can simply call the out{b,w,l} macros to write into
> > them, and on x86, you don't ioremap I/O ports in the first place.
> 
> See above. You need to reprogram the PIM to target the requested address 
> on the PCI host.

Do you ever need to run on a 32 bit host? If not, I guess you can simply ioremap
the whole 256MB BAR, and then only reprogram the PIM, but not the kernel
page tables.
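
I.e. map once at probe time and make the per-access path just a window
reprogram plus a readl, along these lines (pure sketch, the PIM helper
stands in for your existing reprogramming code):

#include <linux/pci.h>
#include <asm/io.h>

static void __iomem *axon_bar;	/* whole BAR, mapped once */

static int axon_map_bar(struct pci_dev *dev, int bar)
{
	axon_bar = pci_iomap(dev, bar, 0);
	return axon_bar ? 0 : -ENOMEM;
}

static u32 axon_pio_read32(axon_pio_pci_t *pio, plb_addr_t src)
{
	/* retarget the PIM window at src, as you do today;
	 * axon_set_pim() is hypothetical */
	axon_set_pim(pio, src);

	/* no more ioremap/iounmap in the fast path */
	return readl(axon_bar + (src & (AXON_BAR_SIZE - 1)));
}

with AXON_BAR_SIZE being the 256MB window size.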

	Arnd <><


