[Skiboot] [PATCH 1/3] hw/psi-p9: Mask OPAL-owned LSIs without handlers

Cédric Le Goater clg at kaod.org
Tue Sep 3 01:59:24 AEST 2019


On 30/08/2019 06:30, Oliver O'Halloran wrote:
> Some versions of Swift have the TPM interrupt line of the second chip
> pulled up instead of down. 

Is this intentional or a bogus default latch ? 

The reset of the PSI controller does not seem to have any effect on the level.

Does it stay high after an EOI ? 

we could check the level value in the "LSI Interrupt Level Register (0x19)" 
reg and complain in the OPAL logs if an HW default value is insane.   

> This causes the PSI's external (TPM) interrupt
> to constantly re-fire since it's an LSI. There's nothing that can be done
> to clear the underlying interrupt condition so we need to mask it.

It seems that we don't use the P9 external interrupt in OPAL. We could 
not advertise such interrupts in the DT and Linux will ignore them.

But it might be interesting to log that something unexpected is occurring
and this is probably why you chose this solution.

> The problem isn't really specific to the external interrupt and will
> occur for any of the PSI interrupts that don't have a "real" handler
> (FSP, global error, and sometimes the external) so we should mask the
> IRQ in OPAL rather than allowing it to re-fire.

ok. this fits well the current need to stop an interrupt from firing and 
log what happened. 

> If an of the IRQs are directed at Linux then masking them is handled by
> Linux.

On P9, we can now (un)mask from Linux the interrupts handled by OPAL using 
the ESB pages. I don't think this will improve what we are trying to do here. 

> Cc: Cédric Le Goater <clg at kaod.org>
> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
> ---
>  hw/psi.c | 33 +++++++++++++++++++++++++++------
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/psi.c b/hw/psi.c
> index a74c105ff0ec..d4b20f1f015f 100644
> --- a/hw/psi.c
> +++ b/hw/psi.c
> @@ -509,6 +509,28 @@ static const struct irq_source_ops psi_p8_irq_ops = {
>  	.name = psi_p8_irq_name,
>  };
>  
> +static void psi_p9_mask_unhandled_irq(struct irq_source *is, int isn)
> +{
> +	struct psi *psi = is->data;
> +	uint32_t idx = isn - psi->interrupt;
> +	char *name = is->ops->name ? is->ops->name(is, isn) : NULL;
> +
> +	prerror("PSI: Masking unhandled LSI %d, %s!\n",
> +			idx, name ? name : "<unknown>");
> +	free(name);

The strdup() in the irq_name() handler is ugly. That's another problem.

> +
> +	/*
> +	 * All the PSI interrupts are LSIs and will be constantly re-fired
> +	 * unless the underlying interrupt condition is cleared. This can
> +	 * happen fast enough to hard-lockup a thread so any unhandled IRQs
> +	 * need to be masked. We can do that by storing to the 0xD00-0xDFF
> +	 * range which sets PQ=01 for that ESB.
> +	 *
> +	 * NB: The PSI only supports 4K ESB pages

on P9, yes. P10 will use 64k ESB pages.

> +	 */
> +	out_be64(psi->esb_mmio + 0x1000 * idx + 0xD00, 0);

Please use :

	struct xive_src *s = container_of(is, struct xive_src, is);

	out_be64(psi->esb_mmio + (1ull << s->esb_shift) * idx + 0xD00, 0)


We need to introduce a set of defines for the PQ loads :

#define XIVE_ESB_STORE_EOI	0x400 /* Store */
#define XIVE_ESB_LOAD_EOI	0x000 /* Load */
#define XIVE_ESB_GET		0x800 /* Load */
#define XIVE_ESB_SET_PQ_00	0xc00 /* Load */
#define XIVE_ESB_SET_PQ_01	0xd00 /* Load */
#define XIVE_ESB_SET_PQ_10	0xe00 /* Load */
#define XIVE_ESB_SET_PQ_11	0xf00 /* Load */

It can come in a later patch.


> +}
> +
>  static void psihb_p9_interrupt(struct irq_source *is, uint32_t isn)
>  {
>  	struct psi *psi = is->data;
> @@ -521,21 +543,17 @@ static void psihb_p9_interrupt(struct irq_source *is, uint32_t isn)
>  	case P9_PSI_IRQ_OCC:
>  		occ_p9_interrupt(psi->chip_id);
>  		break;
> -	case P9_PSI_IRQ_FSI:
> -		printf("PSI: FSI irq received\n");
> -		break;
>  	case P9_PSI_IRQ_LPCHC:
>  		lpc_interrupt(psi->chip_id);
>  		break;
>  	case P9_PSI_IRQ_LOCAL_ERR:
>  		prd_psi_interrupt(psi->chip_id);
>  		break;
> -	case P9_PSI_IRQ_GLOBAL_ERR:
> -		printf("PSI: Global error irq received\n");
> -		break;
>  	case P9_PSI_IRQ_EXTERNAL:
>  		if (platform.external_irq)
>  			platform.external_irq(psi->chip_id);

I don't know any P9 platform using the external_irq() handler. So this should 
be moved to the 'default' case to simply log an error instead.

> +		else
> +			psi_p9_mask_unhandled_irq(is, isn);
>  		break;
>  	case P9_PSI_IRQ_LPC_SIRQ0:
>  	case P9_PSI_IRQ_LPC_SIRQ1:
> @@ -553,6 +571,9 @@ static void psihb_p9_interrupt(struct irq_source *is, uint32_t isn)
>  	case P9_PSI_IRQ_PSU:
>  		p9_sbe_interrupt(psi->chip_id);
>  		break;
> +
> +	default:
> +		psi_p9_mask_unhandled_irq(is, isn);
>  	}
>  }
>  
> 



More information about the Skiboot mailing list