[RFC/PATCH 0/16] Ops based MSI Implementation

Michael Ellerman michael at ellerman.id.au
Mon Jan 29 11:59:03 EST 2007


On Sun, 2007-01-28 at 16:38 -0700, Eric W. Biederman wrote:
> Benjamin Herrenschmidt <benh at kernel.crashing.org> writes:
> 
> >> Because it mixes concerns that do not need to be mixed, and it complicates
> >> the code.  The hypervisor has no need to understand how a hardware
> >> device is built, and how it's registers operate.  It just needs to
> >> know that the given hardware device will generate an msi message on
> >> the bus.
> >
> > I'd be happy for you to go explain your view of what an hypervisor
> > should or should not do to IBM HV architects :-) 
> 
> Sure think you can setup a meeting, or give me an email introduction to them.
> 
> > But in the meantime,
> > that's how they have defined it and how it's been implemented and how we
> > have to support it. And I have a strong feeling that they won't be the
> > only ones to do it that way (I'd like to be proven wrong tho).
> 
> No the general linux kernel does not have to support it, and if we don't
> I suspect that message would get back fairly clearly to the IBM HV architects.

OT, but: No, it wouldn't. It would just play into the hands of people
who think Linux is immature, unpredictable and risky. IBM has another
UNIX remember.

> I haven't been watching closely but I haven heard any rumors on the x86
> side that they are looking in that direction.
> 
> >> Right, so some way needs to be found to cope with that situation.
> >> Likely that involves bypassing all of the code that talks directly to
> >> the hardware for MSI.
> >
> > Which can be done by having the alloc() and free() hooks do all the work
> > provide they aren't done per-msi but per-call like in Michael's
> > approach. That is, in the MSI-X case, alloc is called once for all of
> > the MSI-X requested.
> >
> > I understand that this conflicts with your idea of requesting new MSI-X
> > on the fly but I don't think that trying to add/remove MSI-X that way is
> > a sane approach anyway. If you are concerned about HW problems, I think
> > by doing so, you'll indeed hit them hard.
> 
> That isn't even the reason it is that way.  It is because allocating
> 4096 irqs in a single vector is a bad idea, and because it requires you
> to pass type information of what kind of msi you are dealing with to the
> lower levels in an allocation routine that make it bad idea.  Because
> if you don't consider the IBM HV it provides not benefit and just puts
> unnecessary loops, and type information in architecture code.

I'm not sure what the issue with 4096 irqs is.

As far as passing type information to the alloc routine, it's only there
_if_ the alloc routine needs it.

If you'd prefer we could not pass the type explicitly to the alloc
routine, but rather just have it sitting in the msi_info ... which is
exactly what the current code does, the type is stored in the msi_desc.

What we could do is move the msi_msg into the msix_entry struct, then we
could do alloc like below and remove the need for setup_msi_msg:

int arch_setup_msi_irqs(struct pci_dev *pdev, int num,
                       struct msix_entry *entries)
{
        int i;

        for (i = 0; i < num; i++) {
       		int irq, ret;
       		irq = create_irq();
       		if (irq < 0)
               		return irq;

       		set_irq_msi(irq, desc);
       		ret = msi_compose_msg(dev, irq, &entries[i].msg);
       		if (ret < 0) {
               		destroy_irq(irq);
               		return ret;
       		}

                entries[i].vector = irq;

       		set_irq_chip_and_handler_name(irq, &msi_chip, handle_edge_irq, "edge");
        }
}

Which is almost exactly the same as the current code, except it's inside
a for loop - and it saves the msg and vector to be written later by the
enable hook - which will basically be write_msi_msg() in a loop.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20070129/aa740202/attachment.pgp>


More information about the Linuxppc-dev mailing list