[RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Fri Jul 17 01:34:52 AEST 2020
----- On Jul 16, 2020, at 7:00 AM, Peter Zijlstra peterz at infradead.org wrote:
> On Thu, Jul 16, 2020 at 08:03:36PM +1000, Nicholas Piggin wrote:
>> Excerpts from Peter Zijlstra's message of July 16, 2020 6:50 pm:
>> > On Wed, Jul 15, 2020 at 10:18:20PM -0700, Andy Lutomirski wrote:
>> >> > On Jul 15, 2020, at 9:15 PM, Nicholas Piggin <npiggin at gmail.com> wrote:
>
>> >> But I’m wondering if all this deferred sync stuff is wrong. In the
>> >> brave new world of io_uring and such, perhaps kernel accesses matter
>> >> too. Heck, even:
>> >
>> > IIRC the membarrier SYNC_CORE use-case is about user-space
>> > self-modifying code.
>> >
>> > Userspace re-uses a text address and needs to SYNC_CORE before it can be
>> > sure the old text is forgotten. Nothing the kernel does matters there.
>> >
>> > I suppose the manpage could be more clear there.
>>
>> True, but memory ordering of kernel stores from kernel threads for
>> regular mem barrier is the concern here.
>>
>> Does io_uring update completion queue from kernel thread or interrupt,
>> for example? If it does, then membarrier will not order such stores
>> with user memory accesses.
>
> So we're talking about regular membarrier() then? Not the SYNC_CORE
> variant per-se.
>
> Even there, I'll argue we don't care, but perhaps Mathieu has a
> different opinion.

I agree with Peter that we don't care about accesses to user-space
memory performed concurrently with membarrier.

What we'd care about in terms of accesses to user-space memory from the
kernel is something that would be clearly ordered as happening before
or after the membarrier call, for instance a read(2) followed by
membarrier(2) after the read returns, or a read(2) issued after return
from membarrier(2).

The other scenario we'd care about is with the compiler
barrier paired with membarrier: e.g. read(2) returns, compiler barrier,
followed by a store. Or load, compiler barrier, followed by write(2).

All those scenarios imply before/after ordering wrt either membarrier or
the compiler barrier.

I notice that io_uring has a "completion" queue.
Let's try to come up with realistic usage scenarios.

So the dependency chain would be provided by e.g.:

* Infrequent read / Frequent write, communicating read completion through variable X

  wait for io_uring read request completion -> membarrier -> store X=1

  with matching

  load from X (waiting for X==1) -> asm volatile ("" ::: "memory") -> submit io_uring write request

or this other scenario:

* Frequent read / Infrequent write, communicating read completion through variable X

  load from X (waiting for X==1) -> membarrier -> submit io_uring write request

  with matching

  wait for io_uring read request completion -> asm volatile ("" ::: "memory") -> store X=1
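
As a rough illustration only, the first scenario (infrequent read /
frequent write) could be sketched in C as below. The io_uring steps are
replaced by hypothetical stand-in functions (wait_for_read_completion
and submit_write_request are made-up names that just bump counters), and
MEMBARRIER_CMD_GLOBAL is an assumption picked because it needs no prior
registration; real code would more likely use a registered expedited
command. This is not taken from any patch in this series.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Compiler-only barrier: emits no instruction, only prevents the
 * compiler from moving memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int x;                /* the variable X from the scenario */
static int reads_completed;  /* hypothetical stand-in for an io_uring read completion */
static int writes_submitted; /* hypothetical stand-in for an io_uring write submission */

/* Thin wrapper: glibc has historically not exported membarrier(). */
static int sys_membarrier(int cmd, unsigned int flags)
{
    return (int)syscall(__NR_membarrier, cmd, flags, 0);
}

/* Hypothetical placeholders for the io_uring operations. */
static void wait_for_read_completion(void) { reads_completed++; }
static void submit_write_request(void)     { writes_submitted++; }

/* Infrequent side: read completion -> membarrier -> store X=1.
 * Returns 0 on success, -1 if membarrier is unavailable, in which case
 * X is not published because the pairing would not hold. */
static int slow_side_publish(void)
{
    wait_for_read_completion();
    if (sys_membarrier(MEMBARRIER_CMD_GLOBAL, 0) < 0)
        return -1;
    __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
    return 0;
}

/* Frequent side: load X -> compiler barrier -> submit write.
 * Returns 1 if the write was submitted, 0 if X was not yet set. */
static int fast_side_try_consume(void)
{
    if (__atomic_load_n(&x, __ATOMIC_RELAXED) != 1)
        return 0;
    barrier();
    submit_write_request();
    return 1;
}
```

The second scenario would be the mirror image under the same
assumptions: the membarrier call moves to the side that loads X, and
the side that stores X keeps only the compiler barrier.
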
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com