[Skiboot-stable] [PATCH] phb4/5: Escalate page-level TCE kills
Frederic Barrat
fbarrat at linux.ibm.com
Fri Aug 27 02:53:35 AEST 2021
On 26/08/2021 17:47, Oliver O'Halloran wrote:
> On Thu, Aug 26, 2021 at 1:09 AM Frederic Barrat <fbarrat at linux.ibm.com> wrote:
>>
>> An hw issue was found on P10 (HW560152) where a page-level TCE kill
>> can be dropped if there are enough TCE kill requests already being
>> processed. The net effect is that data integrity is not
>> guaranteed.
>
> Hmm, what is the actual problem? Is there a race between when the bit
> in TCE_KILL says there's a free queue slot and when one actually comes
> available? If so, how big is that race window?
Not quite. We need to have the queue backed up with a mix of PE-level
and page-level kills. And if there's the right sequence of those in the
queue, then a page level kill is dropped.
Fred
>> The circumvention is to stay away from page-level kills
>> and escalate those to PE kills. Which hurts performance.
>
> understatement
>
>> It also affects P9.
>
> lol
>
>
>>
>> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
>> ---
>> hw/phb4.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/hw/phb4.c b/hw/phb4.c
>> index 79083d4a..ddaa18f8 100644
>> --- a/hw/phb4.c
>> +++ b/hw/phb4.c
>> @@ -1051,6 +1051,14 @@ static int64_t phb4_tce_kill(struct phb *phb, uint32_t kill_type,
>> uint64_t val;
>> int64_t rc;
>>
>> + /*
>> + * HW560152: a page-level kill can be dropped if the
>> + * processing queue is backed-up, which can cause data
>> + * integrity issues
>> + */
>> + if (kill_type == OPAL_PCI_TCE_KILL_PAGES)
>> + kill_type = OPAL_PCI_TCE_KILL_PE;
>> +
>> sync();
>> switch(kill_type) {
>> case OPAL_PCI_TCE_KILL_PAGES:
>> --
>> 2.31.1
>>
>> --
>> Skiboot-stable mailing list
>> Skiboot-stable at lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/skiboot-stable
More information about the Skiboot-stable
mailing list