[Skiboot-stable] [PATCH] phb4/5: Escalate page-level TCE kills
Oliver O'Halloran
oohall at gmail.com
Fri Aug 27 01:47:08 AEST 2021
On Thu, Aug 26, 2021 at 1:09 AM Frederic Barrat <fbarrat at linux.ibm.com> wrote:
>
> An hw issue was found on P10 (HW560152) where a page-level TCE kill
> can be dropped if there are enough TCE kill requests already being
> processed. The net effect is that data integrity is not
> guaranteed.
Hmm, what is the actual problem? Is there a race between when the bit
in TCE_KILL says there's a free queue slot and when one actually becomes
available? If so, how big is that race window?
> The circumvention is to stay away from page-level kills
> and escalate those to PE kills. Which hurts performance.
understatement
> It also affects P9.
lol
>
> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
> ---
> hw/phb4.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/hw/phb4.c b/hw/phb4.c
> index 79083d4a..ddaa18f8 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -1051,6 +1051,14 @@ static int64_t phb4_tce_kill(struct phb *phb, uint32_t kill_type,
>  	uint64_t val;
>  	int64_t rc;
>
> +	/*
> +	 * HW560152: a page-level kill can be dropped if the
> +	 * processing queue is backed-up, which can cause data
> +	 * integrity issues
> +	 */
> +	if (kill_type == OPAL_PCI_TCE_KILL_PAGES)
> +		kill_type = OPAL_PCI_TCE_KILL_PE;
> +
>  	sync();
>  	switch(kill_type) {
>  	case OPAL_PCI_TCE_KILL_PAGES:
> --
> 2.31.1
>
> --
> Skiboot-stable mailing list
> Skiboot-stable at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot-stable
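
For anyone reading along without the tree handy, the escalation in the
patch boils down to something like the standalone sketch below. This is
not skiboot code: the enum and the escalate_kill_type() helper are
hypothetical stand-ins for the OPAL_PCI_TCE_KILL_* constants and the
check added to phb4_tce_kill(), and the enum values are illustrative
only.

```c
#include <assert.h>

/* Hypothetical stand-ins for the OPAL_PCI_TCE_KILL_* kill types. */
enum tce_kill_type {
	TCE_KILL_PAGES,	/* invalidate individual pages */
	TCE_KILL_PE,	/* invalidate everything cached for one PE */
	TCE_KILL_ALL,	/* invalidate the whole TCE cache */
};

/*
 * Hypothetical helper mirroring the check added by the patch: a
 * page-level kill is never issued as such, it is widened to a PE-level
 * kill. PE and ALL kills pass through unchanged.
 */
static enum tce_kill_type escalate_kill_type(enum tce_kill_type kill_type)
{
	/* Precondition: only the three kill types above are valid. */
	assert(kill_type <= TCE_KILL_ALL);

	if (kill_type == TCE_KILL_PAGES)
		return TCE_KILL_PE;
	return kill_type;
}
```

The cost is visible from the shape of it: once the type is rewritten,
the page-level branch of the switch in the patched function can no
longer be reached, so every single-page invalidation ends up flushing
all cached TCEs for that PE.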