[PATCH v2 1/2] mm/migrate_device.c: Copy pte dirty bit to page

Wed Aug 17 11:49:03 AEST 2022

Peter Xu <peterx at redhat.com> writes:

> On Tue, Aug 16, 2022 at 04:10:29PM +0800, huang ying wrote:
>> > @@ -193,11 +194,10 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>> >                         bool anon_exclusive;
>> >                         pte_t swp_pte;
>> >
>> > +                       flush_cache_page(vma, addr, pte_pfn(*ptep));
>> > +                       pte = ptep_clear_flush(vma, addr, ptep);
>>
>> Although I think it's possible to batch the TLB flushing just before
>> unlocking PTL.  The current code looks correct.
>
> If we're with unconditionally ptep_clear_flush(), does it mean we should
> probably drop the "unmapped" and the last flush_tlb_range() already since
> they'll be redundant?

This patch does that, unless I missed something?

> If that'll need to be dropped, it looks indeed better to still keep the
> batch to me but just move it earlier (before unlock iiuc then it'll be
> safe), then we can keep using ptep_get_and_clear() afaiu but keep "pte"
> updated.

I think we would also need to check should_defer_flush(). Looking at
try_to_unmap_one() there is this comment:

			if (should_defer_flush(mm, flags) && !anon_exclusive) {
				/*
				 * We clear the PTE but do not flush so potentially
				 * a remote CPU could still be writing to the folio.
				 * If the entry was previously clean then the
				 * architecture must guarantee that a clear->dirty
				 * transition on a cached TLB entry is written through
				 * and traps if the PTE is unmapped.
				 */

And as I understand it we'd need the same guarantee here. Given
try_to_migrate_one() doesn't do batched TLB flushes either I'd rather
keep the code as consistent as possible between
migrate_vma_collect_pmd() and try_to_migrate_one(). I could look at
introducing TLB flushing for both in some future patch series.

 - Alistair

> Thanks,