[RFC/PATCH 0/16] Ops based MSI Implementation

Mon Jan 29 10:38:28 EST 2007

Benjamin Herrenschmidt <benh at kernel.crashing.org> writes:

>> Because it mixes concerns that do not need to be mixed, and it complicates
>> the code.  The hypervisor has no need to understand how a hardware
>> device is built, and how it's registers operate.  It just needs to
>> know that the given hardware device will generate an msi message on
>> the bus.
>
> I'd be happy for you to go explain your view of what an hypervisor
> should or should not do to IBM HV architects :-) 

Sure think you can setup a meeting, or give me an email introduction to them.

> But in the meantime,
> that's how they have defined it and how it's been implemented and how we
> have to support it. And I have a strong feeling that they won't be the
> only ones to do it that way (I'd like to be proven wrong tho).

No the general linux kernel does not have to support it, and if we don't
I suspect that message would get back fairly clearly to the IBM HV architects.

I haven't been watching closely but I haven heard any rumors on the x86
side that they are looking in that direction.

>> Right, so some way needs to be found to cope with that situation.
>> Likely that involves bypassing all of the code that talks directly to
>> the hardware for MSI.
>
> Which can be done by having the alloc() and free() hooks do all the work
> provide they aren't done per-msi but per-call like in Michael's
> approach. That is, in the MSI-X case, alloc is called once for all of
> the MSI-X requested.
>
> I understand that this conflicts with your idea of requesting new MSI-X
> on the fly but I don't think that trying to add/remove MSI-X that way is
> a sane approach anyway. If you are concerned about HW problems, I think
> by doing so, you'll indeed hit them hard.

That isn't even the reason it is that way.  It is because allocating
4096 irqs in a single vector is a bad idea, and because it requires you
to pass type information of what kind of msi you are dealing with to the
lower levels in an allocation routine that make it bad idea.  Because
if you don't consider the IBM HV it provides not benefit and just puts
unnecessary loops, and type information in architecture code.

Face it.  Trying to make the allocation routine serve for both the
raw and the HV case unmodified is a layering violation.

>> But importantly the hooks are at a whole different layer of the code
>> and most likely at a completely different granularity.  You don't have
>> per bus hypervisor support do you?
>> 
>> So as I see it that is a different layer and should be treated differently.
>
> Well, I think that treating them differently will on the contrary
> complicate the matter :-)

It is tying unrelated concerns together in msi_ops and I am opposed to
that.

> Now, as I said, I agree that Michael's current ops definition might
> benefit from some changes.
>
> I do agree for example that we might want to rework a bit what is done
> in the area of the ->setup_msi_msg. An option is to remove it and
> instead have the backend ->enable() hook be the one figuring out the
> message and calling a low level -raw- helper rather than having a
> generic raw helper hook directly in ->enable and itself then use
> ->setup_msi_msg as a lower level hook to get the message.
>
> Since we need a low level raw helper to writeout the message
> address/data anyway (for use by set_affinity among others), by doing so,
> we avoid duplication.

Exactly.

Eric