[PATCH v3 2/6] x86/uaccess: Avoid barrier_nospec() in 64-bit __get_user()

Linus Torvalds torvalds at linux-foundation.org
Fri Nov 22 09:16:12 AEDT 2024


On Thu, 21 Nov 2024 at 13:40, Josh Poimboeuf <jpoimboe at kernel.org> wrote:
>
> The profile is showing futex_get_value_locked():

Ahh.

> That has several callers, so we can probably just use get_user() there?

Yeah, that's the simplest thing. That thing isn't even some inline
function, so the real cost is the call.

That said, exactly because it's not inlined, and calls are expensive,
and this is apparently really critical, we can just do it with the
full "unsafe_get_user()" model.

It's not so complicated. The attached patch is untested, but I did
check that it generates almost perfect code:

    mov    %gs:0x0,%rax                 # current
    incl   0x1a9c(%rax)                 # current->pagefault_disabled++
    movabs $0x123456789abcdef,%rcx      # magic virtual address size
    cmp    %rsi,%rcx                    # address masking
    sbb    %rcx,%rcx
    or     %rsi,%rcx
    stac                                # enable user space access
    mov    (%rcx),%ecx                  # get the value
    clac                                # disable user space access
    decl   0x1a9c(%rax)                 # current->pagefault_disabled--
    mov    %ecx,(%rdi)                  # save the value
    xor    %eax,%eax                    # return 0
    ret

(with the error case for the page fault all out-of-line).
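
For readers who do not read x86 assembly every day, the cmp/sbb/or
triple in the listing can be modelled in plain C roughly as below. This
is only a user-space sketch: the function name and the "limit"
parameter are made up for illustration and are not taken from the
attached patch. The idea is that an address above the limit is turned
into all-ones, which then faults, with no conditional branch and no
speculation fence.

```c
#include <stdint.h>

/*
 * Hypothetical model of the "address masking" step:
 *
 *   cmp %rsi,%rcx   -> carry is set when limit < addr (unsigned)
 *   sbb %rcx,%rcx   -> rcx = 0 - carry, i.e. 0 or all-ones
 *   or  %rsi,%rcx   -> rcx = addr | mask
 *
 * "limit" stands in for the constant loaded by the movabs.
 */
static inline uint64_t sketch_mask_user_address(uint64_t addr, uint64_t limit)
{
	/* 0 when addr is in range, all-ones when it is above the limit */
	uint64_t mask = (uint64_t)0 - (uint64_t)(addr > limit);

	return addr | mask;
}
```

An in-range address comes back unchanged, while anything above the
limit collapses to the same all-ones value, so a mispredicted range
check never gets to pick an arbitrary kernel address.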

So this should be _faster_ than the old __get_user(), because while
the address masking is not needed, it's cheaper than the function call
used to be and the error handling is better.
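
The overall shape of that sequence (bump the counter, mask, do the
access inline, drop the counter, store the result, with the failure
path kept out of line) can be mimicked in user space roughly as
follows. Everything here is a hypothetical model for illustration: the
struct, the fake "user memory" array and the function names are
assumptions, the bounds check merely stands in for the page fault, and
the real code would bracket the load with stac/clac. It is not the
attached patch.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_USER_WORDS 16u	/* size of the pretend user memory */

/* Hypothetical stand-in for the task; models the per-task counter. */
struct model_task {
	int pagefault_disable_count;
};

static uint32_t model_user_mem[MODEL_USER_WORDS];

/* Out-of-range indexes become all-ones, like cmp/sbb/or on addresses. */
static uint64_t model_mask(uint64_t idx, uint64_t max)
{
	return idx | ((uint64_t)0 - (uint64_t)(idx > max));
}

static int model_futex_get_value(struct model_task *t, uint32_t *dest,
				 uint64_t idx)
{
	uint32_t val;

	t->pagefault_disable_count++;
	idx = model_mask(idx, MODEL_USER_WORDS - 1);
	if (idx >= MODEL_USER_WORDS)	/* stands in for the page fault */
		goto efault;
	val = model_user_mem[idx];	/* the inline user access */
	t->pagefault_disable_count--;
	*dest = val;
	return 0;

efault:
	/* cold path: rebalance the counter and report the failure */
	t->pagefault_disable_count--;
	return -EFAULT;
}
```

The point of the layout is that the success path is straight-line code
with no call, and the destination is only written when the access
succeeded.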

If you can test this and verify that it actually helps, I'll take it as
a patch. Consider it signed-off after testing.

> Also, is there any harm in speeding up __get_user()?  It still has ~80
> callers and it's likely to be slowing down things we don't know about.

How would you speed it up?  We definitely can't replace the fence with
addressing tricks. So we can't just replace it with "get_user()",
because of those horrid architecture-specific kernel uses.

Now, we could possibly say "just remove the fence in __get_user()
entirely", but that would involve moving it to access_ok().

And then it wouldn't actually speed anything up (except the horrid
architecture-specific kernel uses that then don't call access_ok() at
all - and we don't care about *those*).

               Linus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.diff
Type: text/x-patch
Size: 1128 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20241121/4841c61d/attachment.bin>

