[PATCH] x86/uaccess: Avoid barrier_nospec() in copy_from_user()

Andrew Cooper andrew.cooper3 at citrix.com
Sun Oct 13 04:23:46 AEDT 2024


On 12/10/2024 4:44 pm, Linus Torvalds wrote:
> On Sat, 12 Oct 2024 at 01:49, Andrew Cooper <andrew.cooper3 at citrix.com> wrote:
>> You do realise mask_user_address() is unsafe under speculation on AMD
>> systems?
> That had *better* not be true.

Yeah I'd prefer it wasn't true either.
>> Had the mask_user_address() patch been put for review, this feedback
>> would have been given then.
> That's BS.
>
> Why? Look at commit b19b74bc99b1 ("x86/mm: Rework address range check
> in get_user() and put_user()").

That looks like 3 Intel tags and 0 AMD tags.

But ok, I didn't spot this one, and it looks unsafe too.  It was not
reviewed by anyone who had a reasonable expectation of knowing AMD's
microarchitectural behaviour.

Previously, the STAC protected against bad prediction of the JAE and
prevented dereferencing the pointer if it was greater than TASK_SIZE.

Importantly for the issue at hand, the calculation against TASK_SIZE
excluded the whole non-canonical region.

> This mask_user_address() thing is how we've been doing a regular
> get/put_user() for the last 18 months. It's *exactly* the same
> pattern:
>
>         mov %rax, %rdx
>         sar $63, %rdx
>         or %rdx, %rax
>
> ie we saturate the sign bit.

This logic is asymmetric.

For an address in the upper half (canonical or non-canonical), it ORs
with -1 and fully replaces the prior address.

For an address in the lower half (canonical or non-canonical), it leaves
the value intact, as either canonical or non-canonical.

Then the pointer is architecturally dereferenced, relying on catching
#PF/#GP for the slow path.  Architecturally, this is safe.


Micro-architecturally though, AMD CPUs use bit 47, not 63, in the TLB
lookup.  This behaviour dates from the K8, and is exposed somewhat in
the virt extensions.

When userspace passes in a non-canonical pointer in the low half of the
address space but with bit 47 set, it will be considered a high-half
pointer when sent for TLB lookup, and the pagetables say it's a
supervisor mapping, so the memory access will be permitted to go ahead
speculatively.  Only later does the pipeline realise the address was
non-canonical and raise #GP.

This lets userspace directly target and load anything cacheable in the
kernel mappings.  It's not as easy to exploit as Meltdown on Intel, but
it is known behaviour, and has been the subject of academic work for 4
years.

~Andrew

