[MOCKUP] x86/mm: Lightweight lazy mm refcounting
Nicholas Piggin
npiggin at gmail.com
Fri Dec 4 09:13:40 AEDT 2020
Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm:
> On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:
>
>> power: same as ARM, except that the loop may be rather larger since
>> the systems are bigger. But I imagine it's still faster than Nick's
>> approach -- a cmpxchg to a remote cacheline should still be faster than
>> an IPI shootdown.
>
> While a single atomic might be cheaper than an IPI, the comparison
> doesn't work out nicely. You do the xchg() on every unlazy, while the
> IPI would be once per process exit.
>
> So over the life of the process, it might do very many unlazies, adding
> up to a total cost far in excess of what the single IPI would've been.
Yeah, this is the concern. I looked at things that add cost to the
idle switch code, and it gets hard to justify the scalability improvement
when you slow these fundamental things down even a bit.
I still think working on the assumption that IPIs = scary expensive
might not be correct. An individual IPI is expensive, but you only issue
one when you've left a lazy mm on another CPU, which just isn't that often.
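
To make the trade-off concrete, here is a minimal sketch of the cost
comparison. The struct, the function names and any figures plugged into
them are hypothetical, not measurements or kernel code; it only shows that
the refcounting cost scales with the number of unlazies over the life of
the process, while the shootdown cost scales with the number of CPUs still
holding the mm lazily at exit.

```c
#include <stdint.h>

/* Hypothetical cost model; the values are placeholders, not measurements. */
struct lazy_mm_costs {
	uint64_t atomic_ns;	/* assumed cost of one atomic on a remote cacheline */
	uint64_t ipi_ns;	/* assumed cost of one IPI */
};

/* Refcounting approach: one atomic is paid on every unlazy. */
static uint64_t refcount_total_ns(const struct lazy_mm_costs *c,
				  uint64_t unlazies)
{
	return unlazies * c->atomic_ns;
}

/* Shootdown approach: IPIs are paid once, at process exit, and only
 * for CPUs that still hold the mm lazily at that point. */
static uint64_t shootdown_total_ns(const struct lazy_mm_costs *c,
				   uint64_t cpus_lazy_at_exit)
{
	return cpus_lazy_at_exit * c->ipi_ns;
}
```

With an assumed 100 ns atomic and 2000 ns IPI, 1000 unlazies would cost
100000 ns in total against 4000 ns for two IPIs at exit. The break-even
point moves with whatever the real costs turn out to be.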
Thanks,
Nick