[Skiboot] [PATCH 1/3] hw/psi-p9: Mask OPAL-owned LSIs without handlers

Tue Sep 3 16:40:03 AEST 2019

On Tue, Sep 3, 2019 at 1:59 AM Cédric Le Goater <clg at kaod.org> wrote:
>
> On 30/08/2019 06:30, Oliver O'Halloran wrote:
> > Some versions of Swift have the TPM interrupt line of the second chip
> > pulled up instead of down.
>
> Is this intentional or a bogus default latch ?

I think it was a schematic error. As far as I know the PSI is
unchanged and on Witherspoon the TPM interrupt pin on the 2nd chip is
tied to ground. It might be a logic bug, but considering it only
affects one of the chips I don't think so.

> The reset of the PSI controller does not seem to have any effect on the level.
>
> Does it stay high after an EOI ?

Yes, that's the problem. The EOI just clears the P bit and the PSI
immediately sets it again since the interrupt condition is active.

> we could check the level value in the "LSI Interrupt Level Register (0x19)"
> reg and complain in the OPAL logs if an HW default value is insane.
>
> > This causes the PSI's external (TPM) interrupt
> > to constantly re-fire since it's an LSI. There's nothing that can be done
> > to clear the underlying interrupt condition so we need to mask it.
>
> It seems that we don't use the P9 external interrupt in OPAL. We could
> not advertise such interrupts in the DT and Linux will ignore them.
>
> But it might be interesting to log that something unexpected is occurring
> and this is probably why you chose this solution.

We don't currently, but that seems like more of an oversight than
anything. We just need to add the DT interrupt properties. That said,
if we do that the IRQ would be targeted at Linux rather than OPAL,
so...

> > The problem isn't really specific to the external interrupt and will
> > occur for any of the PSI interrupts that don't have a "real" handler
> > (FSP, global error, and sometimes the external) so we should mask the
> > IRQ in OPAL rather than allowing it to re-fire.
>
> ok. this fits well the current need to stop an interrupt from firing and
> log what happened.
>
> > If an of the IRQs are directed at Linux then masking them is handled by
> > Linux.
>
> On P9, we can now (un)mask from Linux the interrupts handled by OPAL using
> the ESB pages. I don't think this will improve what we are trying to do here.

We can, and IIRC Linux will sometimes mask and unmask interrupts
directed at OPAL since it might wan to migrate them. I'd rather not
rely on the OS doing anything for us when the interrupt is supposed to
be handled by OPAL.

> > Cc: Cédric Le Goater <clg at kaod.org>
> > Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
> > ---
> >  hw/psi.c | 33 +++++++++++++++++++++++++++------
> >  1 file changed, 27 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/psi.c b/hw/psi.c
> > index a74c105ff0ec..d4b20f1f015f 100644
> > --- a/hw/psi.c
> > +++ b/hw/psi.c
> > @@ -509,6 +509,28 @@ static const struct irq_source_ops psi_p8_irq_ops = {
> >       .name = psi_p8_irq_name,
> >  };
> >
> > +static void psi_p9_mask_unhandled_irq(struct irq_source *is, int isn)
> > +{
> > +     struct psi *psi = is->data;
> > +     uint32_t idx = isn - psi->interrupt;
> > +     char *name = is->ops->name ? is->ops->name(is, isn) : NULL;
> > +
> > +     prerror("PSI: Masking unhandled LSI %d, %s!\n",
> > +                     idx, name ? name : "<unknown>");
> > +     free(name);
>
> The strdup() in the irq_name() handler is ugly. That's another problem.
>
> > +
> > +     /*
> > +      * All the PSI interrupts are LSIs and will be constantly re-fired
> > +      * unless the underlying interrupt condition is cleared. This can
> > +      * happen fast enough to hard-lockup a thread so any unhandled IRQs
> > +      * need to be masked. We can do that by storing to the 0xD00-0xDFF
> > +      * range which sets PQ=01 for that ESB.
> > +      *
> > +      * NB: The PSI only supports 4K ESB pages
>
> on P9, yes. P10 will use 64k ESB pages.

Is the PSI hardware going to be the same or are we going to have to
re-write it all similar to what happened from P8 -> P9?

> > +      */
> > +     out_be64(psi->esb_mmio + 0x1000 * idx + 0xD00, 0);
>
> Please use :
>
>         struct xive_src *s = container_of(is, struct xive_src, is);
>
>         out_be64(psi->esb_mmio + (1ull << s->esb_shift) * idx + 0xD00, 0)

The xive_src structure is internal to xive.c. Suppose we could move it
out into xive.h, but it would probably be cleaner if we put a ->mask()
into the interrupt source ops.

> We need to introduce a set of defines for the PQ loads :
>
> #define XIVE_ESB_STORE_EOI      0x400 /* Store */
> #define XIVE_ESB_LOAD_EOI       0x000 /* Load */
> #define XIVE_ESB_GET            0x800 /* Load */
> #define XIVE_ESB_SET_PQ_00      0xc00 /* Load */
> #define XIVE_ESB_SET_PQ_01      0xd00 /* Load */
> #define XIVE_ESB_SET_PQ_10      0xe00 /* Load */
> #define XIVE_ESB_SET_PQ_11      0xf00 /* Load */
>
> It can come in a later patch.

Makes sense, but you can send that one since you're doing a XIVE
rework anyway ;)

Oliver