hash table

Benjamin Herrenschmidt benh at kernel.crashing.org
Wed Dec 10 11:02:52 EST 2003

> I can see a race between the find_linux_pte() and the use of ptep in
> __hash_page. Another CPU can come in during that window and deallocate
> the PTE, can't it? One solution for this is to set _PAGE_BUSY in
> find_linux_pte() atomically during lookup. There's even more subtle
> races in the sense that the tree is walked while someone might update it
> underneath of the lookup, but maybe they can be ignored?

Yup, this race is on my list already ;)

I want to move find_linux_pte down into __hash_page anyway, but that's
not how to fix this race.

AFAIK, the only race is (very unlikely but definitely there) if we free
a PTE page on one CPU while we are in hash_page() on another CPU.

Paulus proposed a fix for this which consists of delaying the actual
freeing of PTE pages: we gather them into a list and free them either
after a given threshold is reached, or after a while at idle time.

When we actually go to free them, we use an IPI to synchronize with the
other CPUs, making sure none of them is in hash_page(). By that point
we will already have cleared the pmd entries, so we know no CPU can
walk down to those PTE pages on a subsequent hash_page().
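The batching scheme described above can be sketched in plain C. This is only an illustration of the idea, not the actual ppc64 code: the names (`pte_freelist`, `pte_free_defer`, `PTE_FREELIST_MAX`) are hypothetical, and the IPI broadcast that waits for every CPU to leave hash_page() is replaced by a stub.

```c
#include <assert.h>
#include <stdlib.h>

#define PTE_FREELIST_MAX 32  /* hypothetical batching threshold */

struct pte_freelist {
    void *pages[PTE_FREELIST_MAX];
    int count;
};

static int ipi_syncs;  /* counts the (stubbed) IPI round-trips */

/* Stand-in for the IPI broadcast that returns only once no other CPU
 * can still be inside hash_page(). */
static void sync_other_cpus(void)
{
    ipi_syncs++;
}

/* Called instead of freeing a PTE page immediately.  The pmd entry is
 * assumed to have been cleared already, so no new walker can reach the
 * page; we only have to wait out walkers that entered before that. */
static void pte_free_defer(struct pte_freelist *fl, void *page)
{
    fl->pages[fl->count++] = page;
    if (fl->count == PTE_FREELIST_MAX) {
        sync_other_cpus();          /* nobody is in hash_page() now */
        for (int i = 0; i < fl->count; i++)
            free(fl->pages[i]);     /* safe: no stale walkers remain */
        fl->count = 0;
    }
}
```

The key property is that the expensive synchronization cost is paid once per batch rather than once per freed page.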

>Also two minor comments:
> * in pte_update, use _PAGE_BUSY instead of hardcoded 0x0800? Would
> increase readability a little.

Yah, maybe. I didn't feel like adding another argument to the asm
statement (I hate that syntax), but you are probably right ;)
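To illustrate the readability point, here is a portable C stand-in for the lwarx/stdcx. loop in pte_update, written with the named constant instead of a literal 0x0800 (the quote above confirms 0x0800 is the _PAGE_BUSY value; the rest of this sketch, including the GCC `__atomic` builtins in place of real ppc64 asm, is my own simplification):

```c
#include <assert.h>

#define _PAGE_BUSY 0x0800UL  /* instead of a bare 0x0800 in the asm */

/* Simplified stand-in for pte_update: spin while another CPU holds the
 * PTE busy, then atomically clear the requested bits and return the
 * old value.  The real code does this with lwarx/stdcx. */
static unsigned long pte_update(unsigned long *ptep, unsigned long clr)
{
    unsigned long old;
    for (;;) {
        old = __atomic_load_n(ptep, __ATOMIC_RELAXED);
        if (old & _PAGE_BUSY)
            continue;               /* PTE is locked; retry */
        if (__atomic_compare_exchange_n(ptep, &old, old & ~clr, 0,
                                        __ATOMIC_ACQ_REL,
                                        __ATOMIC_RELAXED))
            return old;
    }
}
```

In the real inline asm the same effect means passing `_PAGE_BUSY` in as an extra input operand, which is exactly the syntax annoyance being complained about.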

> * in __hash_page / htab_wrong_access: There's no check for failed stdcx.

That's normal: the only point of that stdcx. is to avoid leaving a
dangling reservation. I don't care whether it succeeds, since the value
I'm writing back is the original value, intact.
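The point about the dangling reservation can be made concrete with an illustrative fragment (PowerPC assembly sketch, not the actual __hash_page code; register choices and labels are invented):

```
htab_wrong_access:
        /* r30 = original PTE value loaded earlier with ldarx,
           r31 = address of the PTE */
        stdcx.  r30,0,r31       /* store it back UNCHANGED: this exists
                                   only to cancel the reservation left
                                   by the earlier ldarx */
        /* no bne- retry loop here: whether the store-conditional
           succeeds or fails, the PTE holds its original value */
        b       bail
```

Without that stdcx., the reservation from the ldarx would survive into whatever code runs next, and an unrelated stdcx. there could succeed spuriously.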

Thanks for your comments,

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/
