[PATCH 3/3] mm: optimise pte dirty/accessed bit setting by demand based pte insertion

Nicholas Piggin npiggin at gmail.com
Tue Sep 18 03:53:37 AEST 2018


On Wed, 5 Sep 2018 07:29:51 -0700
Guenter Roeck <linux at roeck-us.net> wrote:

> Hi,
> 
> On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote:
> > Similarly to the previous patch, this tries to optimise dirty/accessed
> > bits in ptes to avoid access costs of hardware setting them.
> >   
> 
> This patch results in silent nios2 boot failures, silent meaning that
> the boot stalls.

Okay I just got back to looking at this. The reason for the hang is
I think a bug in the nios2 TLB code, but maybe other archs have similar
issues.

In case of a missing / !present Linux pte, nios2 installs a TLB entry
with no permissions via its fast TLB exception handler (software TLB
fill). Then it relies on that causing a TLB permission exception in a
slower handler that calls handle_mm_fault to set the Linux pte and
flushes the old TLB. Then the fast exception handler will find the new
Linux pte.

With this patch, nios2 has a case where handle_mm_fault does not flush
the old TLB, which results in the TLB permission exception continually
being retried.

What happens now is that fault paths like do_read_fault will install a
Linux pte with the young bit clear and return. That will cause nios2 to
fault again but this time go down the bottom of handle_pte_fault and to
the access flags update with the young bit set. The young bit is seen to
be different, so that causes ptep_set_access_flags to do a TLB flush and
that finally allows the fast TLB handler to fire and pick up the new
Linux pte.

With this patch, the young bit is set in the first handle_mm_fault, so
the second handle_mm_fault no longer sees the ptes are different and
does not flush the TLB. The spurious fault handler also does not flush
them unless FAULT_FLAG_WRITE is set.

What nios2 should do is invalidate the TLB in update_mmu_cache. What it
*really* should do is install the new TLB entry, I have some patches to
make that work in qemu I can submit. But I would like to try getting
these dirty/accessed bit optimisation in 4.20, so I will send a simple
path to just do the TLB invalidate that could go in Andrew's git tree.

Is that agreeable with the nios2 maintainers?

Thanks,
Nick


More information about the Linuxppc-dev mailing list