[PATCH v3 2/6] x86/uaccess: Avoid barrier_nospec() in 64-bit __get_user()

Linus Torvalds torvalds at linux-foundation.org
Fri Nov 22 12:02:06 AEDT 2024


On Thu, 21 Nov 2024 at 16:12, Josh Poimboeuf <jpoimboe at kernel.org> wrote:
>
> The asm looks good, but the C exploded a bit... why not just have an
> inline get_user()?

That was originally one of my goals for the "unsafe" ones - if done
right, they'd be the proper building blocks for a get_user(), and we'd
only really need one single good implementation.

But it really does blow up the code size quite noticeably. The old
sign-based thing wasn't quite so bad, and was one of the main reasons
I really didn't like having to switch to the big constant thing, but
with the constant, the "get_user()" part is basically 27 bytes per
location.

The uninlined call is 5 bytes.

(And both then have the error handling - the inlined one can use the
asm goto to *maybe* make up for some of it because it avoids tests,
but I suspect it ends up being pretty similar in the end).

So I'm not really sure inlining makes sense - except if you have code
that you've profiled.

Part of the problem is that you can't really just make an inline
function. You need to make one for each size. Which we've done, but it
gets real messy real quick. I don't want to have yet another "switch
(sizeof..)" thing.

Or you'd make it a macro (which makes dealing with different types
easy), but then it would have to be a statement expression to return
the error, and to take advantage of that exception handling error
handling gets messed up real quick too.

End result: the

        if (can_do_masked_user_access())
                from = masked_user_access_begin(from);
        else if (!user_read_access_begin(from, sizeof(*from)))
                goto enable_and_error;

        unsafe_get_user(val, from, Efault);
        user_access_end();

pattern is very good for generating fine code, but it's rather nasty
to encapsulate as one single macro somewhere. It really ends up having
those two error cases: the one that just returns the error, and the
one that has to do user_access_end().

[ Time passes ]

Ugh. I tried it. It looks like this:

#define inlined_get_user(res, ptr) ({                           \
        __label__ fail2, fail1;                                 \
        __auto_type __up = (ptr);                               \
        int __ret = 0;                                          \
        if (can_do_masked_user_access())                        \
                __up = masked_user_access_begin(__up);          \
        else if (!user_read_access_begin(__up, sizeof(*__up)))  \
                goto fail1;                                     \
        unsafe_get_user(res, ptr, fail2);                       \
        user_access_end();                                      \
        if (0) {                                                \
fail2:  user_access_end();                                      \
fail1:  __ret = -EFAULT;                                        \
        }                                                       \
        __ret; })

and maybe that works. It generated ok code in this case, where I did

  int futex_get_value_locked(u32 *dest, u32 __user *from)
  {
        int ret;

        pagefault_disable();
        ret = inlined_get_user(*dest, from);
        pagefault_enable();
        return ret;
  }

but I'm not convinced it's better than open-coding it. We have some
ugly macros in the kernel, but that may be one of the ugliest I've
ever written.

> > How would you speed it up?
>
> I'm not sure if you saw the example code snippet I posted up-thread,
> here it is below.

Oh, ok, no I hadn't seen that one.

Yeah, it looks like that would work. What a horrendous crock. But your
point that it would find the nasty __get_user() cases is nicely made.

             Linus


More information about the Linuxppc-dev mailing list