Unsafe pte_update() in do_page_fault() (4xx and Book-E)
Eugene Surovegin
ebs at ebshome.net
Fri Mar 3 07:26:34 EST 2006
Hi!
For the last couple of days I have been debugging rare
swap_dup: Bad swap file entry 0x00000080
errors in my custom 2.4 kernel running on a 405GPr system.
My current theory is that this error is caused by the special lazy
dcache/icache flush handling on 4xx and BookE. Because this code in my
2.4 was actually a backport from 2.6, I think we have a problem in
current 2.6 as well.
Here is what I think happens. On 4xx/BookE we use the execute bit to
defer the dcache to icache flush: in do_page_fault() we flush the page
when the execute trap triggers and set the _PAGE_HWEXEC bit in the PTE.
Unfortunately, we don't lock this PTE, and it's possible that after
the pte_present() check but _before_ the pte_update() call this
particular page is evicted from memory, e.g. because of extreme memory
pressure (of course, I'm assuming preemption is enabled).
If this happens, pte_update() sets the _PAGE_HWEXEC bit in the
just-cleared PTE. Some time later, another page fault happens for this
page, but because of that set bit, the pte_none() test in
handle_pte_fault() fails, and we continue along the wrong path,
thinking that this PTE was swapped out to the swap file. This triggers
the swap_dup error I mentioned at the beginning.
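
To make the sequence above concrete, here is a small user-space model
of it. This is only a sketch, not kernel code: classify() and
racy_exec_fault() are made-up names, the PAGE_PRESENT value is
assumed for illustration, and the "reclaim" is simulated by clearing
the PTE by hand inside the unlocked window.

```c
#include <stdint.h>

#define PAGE_PRESENT 0x002u /* assumed value, for illustration only */
#define PAGE_HWEXEC  0x200u /* value quoted in this mail for 405GPr */

enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_SWAP };

/* Rough model of the decision made on a later fault: an all-zero PTE
 * is "none", a PTE with the present bit is mapped, and anything else
 * is treated as a swap entry. */
static enum pte_kind classify(uint32_t pte)
{
    if (pte == 0)
        return PTE_NONE;
    if (pte & PAGE_PRESENT)
        return PTE_PRESENT;
    return PTE_SWAP;
}

/* Model of the unlocked check-then-update. The page is "reclaimed"
 * (PTE cleared) between the present check and the bit update. */
static uint32_t racy_exec_fault(uint32_t *ptep)
{
    int present = (*ptep & PAGE_PRESENT) != 0; /* pte_present() check */

    *ptep = 0;                  /* simulated eviction in the window */

    if (present)
        *ptep |= PAGE_HWEXEC;   /* update lands on a cleared PTE */
    return *ptep;
}
```

Starting from a PTE that has only the present bit set,
racy_exec_fault() leaves just the exec bit behind, and classify() then
reports that leftover value as a swap entry rather than as "none".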
_PAGE_HWEXEC is 0x200 on 405GPr, and because the swap entry is the PTE
shifted 2 bits to the right, we get that "0x00000080" value.
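
The arithmetic can be checked with a few lines of C. The helper name
and the SWAP_ENTRY_SHIFT macro are made up for this sketch; only the
0x200 bit value and the 2-bit shift come from the description above.

```c
#include <stdint.h>

#define PAGE_HWEXEC      0x200u /* exec bit value on 405GPr, per this mail */
#define SWAP_ENTRY_SHIFT 2      /* swap entry = PTE >> 2, per this mail */

/* Hypothetical helper: the value reported when a stale PTE is
 * interpreted as a swap entry. */
static uint32_t pte_to_swap_value(uint32_t pte)
{
    return pte >> SWAP_ENTRY_SHIFT;
}
```

pte_to_swap_value(PAGE_HWEXEC) evaluates to 0x80, which matches the
entry printed in the swap_dup message.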
Paul, does my theory make any sense? I cannot test 2.6 on our hw. So
far, after I added additional page_table_lock locking to my 2.4 in
do_page_fault(), I haven't seen these errors, but it's too early to be
100% sure :).
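
The idea behind the extra locking, reduced to a user-space sketch:
take the lock, re-check that the PTE is still present, and only then
set the exec bit. This is not the actual 2.4 change. The spinlock is a
plain C11 atomic_flag standing in for page_table_lock, the function
names are invented, and the PAGE_PRESENT value is assumed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_PRESENT 0x002u /* assumed value, for illustration only */
#define PAGE_HWEXEC  0x200u /* value quoted in this mail for 405GPr */

/* Stand-in for page_table_lock in this model. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void lock_table(void)
{
    while (atomic_flag_test_and_set_explicit(&table_lock,
                                             memory_order_acquire))
        ; /* spin until the lock is free */
}

static void unlock_table(void)
{
    atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Set the exec bit only if the PTE is still present, with the check
 * and the update done under the same lock. Returns true if the bit
 * was set, false if the page had gone away in the meantime. */
static bool mark_exec_locked(uint32_t *ptep)
{
    bool updated = false;

    lock_table();
    if (*ptep & PAGE_PRESENT) {
        *ptep |= PAGE_HWEXEC;
        updated = true;
    }
    unlock_table();
    return updated;
}
```

With the check and the update under one lock, a PTE that was cleared
before the lock was taken stays all-zero, so a later pte_none() test
would still succeed. This only helps if whoever clears the PTE takes
the same lock.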
I'll make a patch for 2.6 if you think my analysis is correct.
--
Eugene