[RFC/PATCH 0/16] Ops based MSI Implementation

Benjamin Herrenschmidt benh at kernel.crashing.org
Mon Jan 29 10:17:39 EST 2007


> Because it mixes concerns that do not need to be mixed, and it complicates
> the code.  The hypervisor has no need to understand how a hardware
> device is built, and how it's registers operate.  It just needs to
> know that the given hardware device will generate an msi message on
> the bus.

I'd be happy for you to go explain your view of what an hypervisor
should or should not do to IBM HV architects :-) But in the meantime,
that's how they have defined it and how it's been implemented and how we
have to support it. And I have a strong feeling that they won't be the
only ones to do it that way (I'd like to be proven wrong tho).

> Right, so some way needs to be found to cope with that situation.
> Likely that involves bypassing all of the code that talks directly to
> the hardware for MSI.

Which can be done by having the alloc() and free() hooks do all the work
provide they aren't done per-msi but per-call like in Michael's
approach. That is, in the MSI-X case, alloc is called once for all of
the MSI-X requested.

I understand that this conflicts with your idea of requesting new MSI-X
on the fly but I don't think that trying to add/remove MSI-X that way is
a sane approach anyway. If you are concerned about HW problems, I think
by doing so, you'll indeed hit them hard.

A driver who wants to modulate should really allocate all the MSI-X it
can possibly need and then enable/disable depending on its needs, I
don't trust hardware to behave properly if the stuff is reconfigured
while active.

> But importantly the hooks are at a whole different layer of the code
> and most likely at a completely different granularity.  You don't have
> per bus hypervisor support do you?
> 
> So as I see it that is a different layer and should be treated differently.

Well, I think that treating them differently will on the contrary
complicate the matter :-)

Now, as I said, I agree that Michael's current ops definition might
benefit from some changes.

I do agree for example that we might want to rework a bit what is done
in the area of the ->setup_msi_msg. An option is to remove it and
instead have the backend ->enable() hook be the one figuring out the
message and calling a low level -raw- helper rather than having a
generic raw helper hook directly in ->enable and itself then use
->setup_msi_msg as a lower level hook to get the message.

Since we need a low level raw helper to writeout the message
address/data anyway (for use by set_affinity among others), by doing so,
we avoid duplication.

Ben.





More information about the Linuxppc-dev mailing list