[Skiboot-stable] [PATCH] phb4/5: Escalate page-level TCE kills

Oliver O'Halloran oohall at gmail.com
Fri Aug 27 01:47:08 AEST 2021


On Thu, Aug 26, 2021 at 1:09 AM Frederic Barrat <fbarrat at linux.ibm.com> wrote:
>
> An hw issue was found on P10 (HW560152) where a page-level TCE kill
> can be dropped if there are enough TCE kill requests already being
> processed. The net effect is that data integrity is not
> guaranteed.

Hmm, what is the actual problem? Is there a race between when the bit
in TCE_KILL says there's a free queue slot and when one actually becomes
available? If so, how big is that race window?

> The circumvention is to stay away from page-level kills
> and escalate those to PE kills, which hurts performance.

Understatement.

> It also affects P9.

lol


>
> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
> ---
>  hw/phb4.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/hw/phb4.c b/hw/phb4.c
> index 79083d4a..ddaa18f8 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -1051,6 +1051,14 @@ static int64_t phb4_tce_kill(struct phb *phb, uint32_t kill_type,
>         uint64_t val;
>         int64_t rc;
>
> +       /*
> +        * HW560152: a page-level kill can be dropped if the
> +        *       processing queue is backed-up, which can cause data
> +        *       integrity issues
> +        */
> +       if (kill_type == OPAL_PCI_TCE_KILL_PAGES)
> +               kill_type = OPAL_PCI_TCE_KILL_PE;
> +
>         sync();
>         switch(kill_type) {
>         case OPAL_PCI_TCE_KILL_PAGES:
> --
> 2.31.1
>
> --
> Skiboot-stable mailing list
> Skiboot-stable at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot-stable
