[PATCH] powerpc/64s: Fix THP PMD collapse serialisation

Nicholas Piggin npiggin at gmail.com
Mon Jun 3 17:33:02 AEST 2019


Aneesh Kumar K.V wrote on June 3, 2019 4:43 pm:
> On 6/3/19 11:35 AM, Nicholas Piggin wrote:
>> Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion
>> in pte helpers") changed the actual bitwise tests in pte_access_permitted
>> by using pte_write() and pte_present() helpers rather than raw bitwise
>> testing _PAGE_WRITE and _PAGE_PRESENT bits.
>> 
>> The pte_present change now returns true for ptes which are !_PAGE_PRESENT
>> and _PAGE_INVALID, which is the combination used by pmdp_invalidate to
>> synchronize access from lock-free lookups. pte_access_permitted is used by
>> pmd_access_permitted, so allowing GUP lock free access to proceed with
>> such PTEs breaks this synchronisation.
>> 
>> This bug has been observed on HPT host, with random crashes and corruption
>> in guests, usually together with bad PMD messages in the host.
>> 
>> Fix this by adding an explicit check in pmd_access_permitted, and
>> documenting the condition explicitly.
>> 
>> The pte_write() change should be okay, and would prevent GUP from falling
>> back to the slow path when encountering savedwrite ptes, which matches
>> what x86 (that does not implement savedwrite) does.
>> 
>> Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
>> Cc: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
>> Cc: Christophe Leroy <christophe.leroy at c-s.fr>
>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>> ---
>>   arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ++++++++++++++++++-
>>   arch/powerpc/mm/book3s64/pgtable.c           |  3 +++
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index 7dede2e34b70..aaa72aa1b765 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -1092,7 +1092,24 @@ static inline int pmd_protnone(pmd_t pmd)
>>   #define pmd_access_permitted pmd_access_permitted
>>   static inline bool pmd_access_permitted(pmd_t pmd, bool write)
>>   {
>> -	return pte_access_permitted(pmd_pte(pmd), write);
>> +	pte_t pte = pmd_pte(pmd);
>> +	unsigned long pteval = pte_val(pte);
>> +
>> +	/*
>> +	 * pmdp_invalidate sets this combination (that is not caught by
>> +	 * !pte_present() check in pte_access_permitted), to prevent
>> +	 * lock-free lookups, as part of the serialize_against_pte_lookup()
>> +	 * synchronisation.
>> +	 *
>> +	 * This check inadvertently catches the case where the PTE's hardware
>> +	 * PRESENT bit is cleared while TLB is flushed, to work around
>> +	 * hardware TLB issues. This is suboptimal, but should not be hit
>> +	 * frequently and should be harmless.
>> +	 */
>> +	if ((pteval & _PAGE_INVALID) && !(pteval & _PAGE_PRESENT))
>> +		return false;
>> +
>> +	return pte_access_permitted(pte, write);
>>   }
>>   
> 
> 
> You need to do something similar for other lockless page table walks,
> such as find_linux_pte.

Yeah, good point, as discussed offline. I was going to make that a
separate patch, since it would have a different Fixes: tag. I have not
been able to trigger any bugs caused by it, whereas the bug fixed by
this patch hits reliably in about 10 minutes or less.

Maybe the race window is just a lot smaller or the function is
less frequently used?
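
For anyone following along, here is a rough userspace sketch of the
condition the patch adds. It is not the kernel code: the SKETCH_* bit
positions and the sketch_* helpers are made-up names for illustration,
and the access check is simplified to present + write only (the real
pte_access_permitted checks more than that). It only models the
behaviour described in the changelog, where pte_present() is true for
an entry that is !_PAGE_PRESENT but _PAGE_INVALID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_PAGE_PRESENT	(1ULL << 63)
#define SKETCH_PAGE_INVALID	(1ULL << 61)
#define SKETCH_PAGE_WRITE	(1ULL << 1)

/*
 * Models pte_present() as described in the changelog: true when either
 * PRESENT or INVALID is set, so an invalidated entry still counts as
 * "present".
 */
static bool sketch_pte_present(uint64_t pteval)
{
	return (pteval & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_INVALID)) != 0;
}

/* Simplified stand-in for pte_access_permitted(): present + write only. */
static bool sketch_pte_access_permitted(uint64_t pteval, bool write)
{
	if (!sketch_pte_present(pteval))
		return false;
	if (write && !(pteval & SKETCH_PAGE_WRITE))
		return false;
	return true;
}

/*
 * Stand-in for the patched pmd_access_permitted(): reject the
 * INVALID && !PRESENT combination before the generic check, so a
 * lock-free lookup does not proceed on an invalidated PMD.
 */
static bool sketch_pmd_access_permitted(uint64_t pteval, bool write)
{
	if ((pteval & SKETCH_PAGE_INVALID) && !(pteval & SKETCH_PAGE_PRESENT))
		return false;

	return sketch_pte_access_permitted(pteval, write);
}
```

With an entry that has only the INVALID and WRITE bits set, the
simplified pte-level check lets the access through, while the
pmd-level check with the extra test refuses it. That is the case the
patch is meant to close.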

Thanks,
Nick



