[RFC/PATCH 0/16] Ops based MSI Implementation

Benjamin Herrenschmidt benh at kernel.crashing.org
Mon Jan 29 08:17:28 EST 2007


> Yes of some form.  Although only needing 2 ops instead of 6 is still
> simpler.  Until we can agree on a point where the ops lookup is
> generic I don't see the point in placing it in generic code.

At least 4 actually, since we would need suspend/resume as well.
 
> In addition I am extremely uncomfortable with making the interface to
> the architecture any wider than we need it to be, as refactoring code
> across multiple architectures is hard as usually the developer does
> not have the hardware to touch all of the code that is touched.

The interface to the arch in our model is the function to get ops :-)
Most "normal" backends would just "plug" those ops with the provided
raw_ functions.

> The argument that we need to support what the RTAS is doing to support
> other hypervisors seems to be a fallacy.  What the RTAS is doing is
> not sane from a hardware standpoint, so I do not expect it from other
> virtualized/hypervisor style environments. 
>
> If the hardware provides capabilities to isolate the MSI messages
> properly it does not need to prevent us from touching the msi setup
> registers. 

It does isolate and it doesn't -prevent- config space access. However,
in order to enable MSIs, we have to configure the device -and- the IRQ
controller on the bus on which the device sits on, that is, to obtain
vectors from the HV, configure the controller to receive MSIs from that
device and route them to us, etc...., and the only API the HV provides
for doing so is that RTAS function that configures both in one call.

I don't see what's fundamentally wrong with that approach.

>  If the hardware does not isolate the MSI messages properly
> there is another problem.  Especially in the context of MSI-X where
> the registers can be in the middle of any mmio bar I do not see a sane
> way of keeping us from touching the hardware directly in the first
> place.

They are not blocked as I said above, at least not for most devices,
(though the controller/receiver side is). However, we don't have an API
to get the address/value to write into the device, nor to
configure/enable MSIs in the PIC. The only API we have is basically
called "change-msi" which can be use to enable MSI, MSI-X or disable
them (though we can provide how many we want to enable out of what is
requested by the device... we can't enable sparse MSI-X though, we can
only enable the N first ones).

> However it is quite likely that supporting the RTAS is not going to 
> require much code to support.  So I don't see an argument against not
> supporting the RTAS.

It would imply 2 or 3 more hooks at the toplevel... so we are going from
your 2 initial hooks to 4 (bcs we need to hook suspend/resume), now to 6
or 7.... 

> There is the additional problem in all of this that our interface for
> MSI-X to the drivers is quite likely the wrong interface.  I believe
> we will want to incrementally allocate more irqs at run time as there
> are work queues or the like which can be attached to them.  We can get
> there with the current vector allocator by freeing and reallocating
> all of the msi-x irqs when the driver wants more so the current
> interface will suffice but it is far from optimal.

Our hypervisor will not unfortunately let us do that. We can only use
RTAS "change-msi" to allocate more/less MSIs and that is disruptive of
the device function (we might lose pending interrupts when doing so, in
fact, in the initial HV interface definition, we could only do that with
the device actually disabled in the command register !).

In general, I'd rather have the device pre-allocate the MSI-X it needs,
though it can later on decide to use more or less.

Ben.





More information about the Linuxppc-dev mailing list