[PATCH] powerpc/fault: fix wrong KUAP fault for IO_URING

Zorro Lang zlang at redhat.com
Fri Jan 29 17:52:20 AEDT 2021


On Thu, Jan 28, 2021 at 03:44:21PM +0100, Christophe Leroy wrote:
> 
> 
> On 28/01/2021 at 15:42, Jens Axboe wrote:
> > On 1/28/21 6:52 AM, Zorro Lang wrote:
> > > On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
> > > > On 1/27/21 8:13 PM, Zorro Lang wrote:
> > > > > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
> > > > > > Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
> > > > > > > On 1/27/21 9:38 AM, Christophe Leroy wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > On 27/01/2021 at 15:56, Zorro Lang wrote:
> > > > > > > > > > On powerpc, an io_uring test hit the KUAP fault below in __do_page_fault.
> > > > > > > > > > The failing source line is:
> > > > > > > > > 
> > > > > > > > >     if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
> > > > > > > > >         return SIGSEGV;
> > > > > > > > > 
> > > > > > > > > > is_user is based on user_mode(regs) only. This is not suitable for
> > > > > > > > > > io_uring, where the helper thread can assume the user app identity
> > > > > > > > > > and could perform this fault just fine. So switch to using mm to
> > > > > > > > > > decide whether this is valid or not.
> > > > > > > > 
> > > > > > > > > I don't understand why testing is_user would be an issue. KUAP's purpose
> > > > > > > > > is to block any disallowed access from kernel to user memory
> > > > > > > > > (equivalent to SMAP on x86). So it really must be based on the MSR_PR
> > > > > > > > > bit, which is what is_user provides.
> > > > > > > > 
> > > > > > > > > If the kernel access is legitimate, the kernel should have opened
> > > > > > > > > userspace access, and then you shouldn't get this "Bug: Read fault
> > > > > > > > > blocked by KUAP!".
> > > > > > > > 
> > > > > > > > > As far as I understand, the fault occurs in
> > > > > > > > > iov_iter_fault_in_readable(), which calls fault_in_pages_readable(),
> > > > > > > > > and fault_in_pages_readable() uses __get_user(), so it is a legitimate
> > > > > > > > > access and you really should not get a KUAP fault.
> > > > > > > > 
> > > > > > > > > So the problem is somewhere else. I think your proposed patch just
> > > > > > > > > hides the problem; it doesn't fix it.
> > > > > > > 
> > > > > > > If we do kthread_use_mm(), can we agree that the user access is valid?
> > > > > > 
> > > > > > > Yeah, the io_uring code is fine, provided it uses the uaccess primitives
> > > > > > > like any other kernel code. It's looking more like an arch/powerpc bug.
> > > > > > 
> > > > > > > We should be able to copy to/from user space, and including faults, if
> > > > > > > that's been done and the new mm assigned. Because it really should be.
> > > > > > > If SMAP was a problem on x86, we would have seen it long ago.
> > > > > > > 
> > > > > > > I'm assuming this may be breakage related to the recent uaccess changes
> > > > > > > related to set_fs and friends? Or maybe recent changes on the powerpc
> > > > > > > side?
> > > > > > > 
> > > > > > > Zorro, did 5.10 work?
> > > > > > 
> > > > > > Would be interesting to know.
> > > > > 
> > > > > > Sure, Nick and Jens. Which 5.10 rc version do you want me to test? Or any
> > > > > > particular git commit (as HEAD) from the 5.10 cycle?
> > > > 
> > > > I forget which versions had what series of this, but 5.10 final - and if
> > > > that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
> > > > 5.10 definitely has them.
> > > 
> > > > I just built Linux v5.10 with the same .config file and ran the same test.
> > > > v5.10 (HEAD=2c85ebc57b Linux 5.10) does not reproduce this bug:
> > > 
> > > # ./check generic/013 generic/051
> > > FSTYP         -- xfs (non-debug)
> > > PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
> > > MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
> > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
> > > 
> > > generic/013 138s ...  77s
> > > generic/051 103s ...  143s
> > > Ran: generic/013 generic/051
> > > Passed all 2 tests
> > 
> > Thanks for testing that, so I think it's safe to conclude that there's a
> > regression in powerpc fault handling for kthreads that use
> > kthread_use_mm in this release. A bisect would definitely find it, but
> > might be pointless if Christophe or Nick already have an idea of what it
> > is.
> > 
> 
> I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops.

I have just uploaded the vmlinux and .config files. The vmlinux is too big to
attach, so I uploaded it to my Google Drive and am sharing the links below:

config file: https://drive.google.com/file/d/1pMszboxdjbMPqSNXnMH-1UCZC-vtDnI9/view?usp=sharing
vmlinux: https://drive.google.com/file/d/1s7g2eBPAFFV61aM2dO9bvVTERGQ9mLYk/view?usp=sharing

I used the latest upstream mainline Linux; the HEAD commit is:
76c057c84d (HEAD -> master, origin/master, origin/HEAD) Merge branch 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

The test failed on this kernel as follows:

# dmesg
[   96.200296] ------------[ cut here ]------------
[   96.200304] Bug: Read fault blocked by KUAP!
[   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
[   96.200323] Modules linked in: bonding rfkill sunrpc pseries_rng uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks ip_tables xfs libcrc32c sd_mod t10_pi xts ibmvscsi ibmveth scsi_transport_srp vmx_crypto
[   96.200372] CPU: 3 PID: 1876 Comm: io_wqe_worker-0 Tainted: G        W         5.11.0-rc5+ #5
[   96.200380] NIP:  c00000000008f8a0 LR: c00000000008f89c CTR: 0000000000000000
[   96.200386] REGS: c00000000d3aafd0 TRAP: 0700   Tainted: G        W          (5.11.0-rc5+)
[   96.200393] MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 48082204  XER: 00000005
[   96.200416] CFAR: c00000000015ddac IRQMASK: 1 
               GPR00: c00000000008f89c c00000000d3ab270 c000000002116900 0000000000000020 
               GPR04: c000000001bec250 0000000000000001 00000001fbb80000 0000000000000027 
               GPR08: 0000000000000001 0000000000000000 c000000020fbba00 0000000000000001 
               GPR12: 0000000000002000 c00000001ecaae00 c00000000019dae8 c000000008d48040 
               GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
               GPR20: c0000000012d9650 fcffffffffffffff c000000002164018 c000000001262da0 
               GPR24: 0000000000000000 0000000000000000 0000000000000300 c000000010c27b80 
               GPR28: 0000000000200000 0000000000000000 0000010007ffce60 c00000000d3ab470 
[   96.200521] NIP [c00000000008f8a0] bad_kernel_fault+0x180/0x310
[   96.200528] LR [c00000000008f89c] bad_kernel_fault+0x17c/0x310
[   96.200535] Call Trace:
[   96.200539] [c00000000d3ab270] [c00000000008f89c] bad_kernel_fault+0x17c/0x310 (unreliable)
[   96.200551] [c00000000d3ab2f0] [c000000000090494] __do_page_fault+0x5f4/0x900
[   96.200561] [c00000000d3ab3b0] [c0000000000907dc] do_page_fault+0x3c/0x120
[   96.200570] [c00000000d3ab400] [c00000000000c748] handle_page_fault+0x10/0x2c
[   96.200579] --- interrupt: 300 at fault_in_pages_readable+0x104/0x350
[   96.200586] NIP:  c000000000849424 LR: c00000000084952c CTR: c0000000006984a0
[   96.200592] REGS: c00000000d3ab470 TRAP: 0300   Tainted: G        W          (5.11.0-rc5+)
[   96.200598] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44082804  XER: 00000001
[   96.200628] CFAR: c000000000010330 DAR: 0000010007ffce60 DSISR: 00200000 IRQMASK: 0 
               GPR00: c00000000084952c c00000000d3ab710 c000000002116900 0000000000000000 
               GPR04: c000000010c27ce0 0000000000000001 00000001fbb80000 0000000000010000 
               GPR08: 00000000271cd0a4 0000000000000200 0000000000000200 0000000000000000 
               GPR12: 0000000000002000 c00000001ecaae00 c00000000019dae8 c000000008d48040 
               GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
               GPR20: c0000000012d9650 fcffffffffffffff c000000002164018 c000000001262da0 
               GPR24: c000000020fbba00 bfffffffffffffff 0000000000000000 a8aaaaaaaaaaaaaa 
               GPR28: 0000010008005d71 c00000000bec3e00 0000000000000000 0000010007ffce60 
[   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
[   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
[   96.200747] --- interrupt: 300
[   96.200752] [c00000000d3ab7a0] [c0000000008496d0] iov_iter_fault_in_readable+0x60/0x120
[   96.200761] [c00000000d3ab7e0] [c000000000698558] iomap_write_actor+0xb8/0x270
[   96.200771] [c00000000d3ab890] [c000000000693554] iomap_apply+0x2b4/0x740
[   96.200780] [c00000000d3ab9a0] [c000000000693dc0] iomap_file_buffered_write+0xa0/0x120
[   96.200790] [c00000000d3ab9f0] [c008000001d3efec] xfs_file_buffered_aio_write+0x354/0x590 [xfs]
[   96.200870] [c00000000d3aba90] [c0000000006691e4] io_write+0x104/0x4b0
[   96.200884] [c00000000d3abbb0] [c00000000066be54] io_issue_sqe+0x3d4/0xf50
[   96.200897] [c00000000d3abc60] [c00000000066f250] io_wq_submit_work+0xb0/0x2f0
[   96.200911] [c00000000d3abcb0] [c0000000006738a8] io_worker_handle_work+0x248/0x4a0
[   96.200925] [c00000000d3abd30] [c000000000673d28] io_wqe_worker+0x228/0x2a0
[   96.200939] [c00000000d3abda0] [c00000000019dc94] kthread+0x1b4/0x1c0
[   96.200950] [c00000000d3abe10] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[   96.200959] Instruction dump:
[   96.200965] e87f0100 4810b155 60000000 2c230000 4182ffa8 409200ac 3c82ff15 38847e38 
[   96.200987] 3c62ff15 38637ed0 480ce4ad 60000000 <0fe00000> e8010090 38210080 38600001 
[   96.201008] irq event stamp: 46
[   96.201013] hardirqs last  enabled at (45): [<c0000000005428c4>] __slab_free+0x414/0x610
[   96.201021] hardirqs last disabled at (46): [<c000000000008a04>] data_access_common_virt+0x1a4/0x1c0
[   96.201030] softirqs last  enabled at (0): [<c00000000015ae68>] copy_process+0x688/0x1600
[   96.201038] softirqs last disabled at (0): [<0000000000000000>] 0x0
[   96.201045] ---[ end trace c2373fad985a304b ]---
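
For anyone reading the register dumps above: both MSR values can be checked
for the MSR_PR bit, which is what is_user is derived from. The following is
only a small user-space sketch, assuming MSR_PR is bit 14 (0x4000) as on
64-bit powerpc; msr_is_user() and check_trace_msrs() are illustrative names,
not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: MSR_PR (problem state, i.e. user mode) is bit 14 of the MSR. */
#define MSR_PR (1ULL << 14)

/* Returns nonzero when the given MSR value describes a user-mode context. */
static int msr_is_user(uint64_t msr)
{
    return (msr & MSR_PR) != 0;
}

/* Self-check against the two MSR values printed in the trace above. */
static void check_trace_msrs(void)
{
    /* MSR at the WARN (TRAP 0700) inside bad_kernel_fault() */
    assert(!msr_is_user(0x8000000000021033ULL));
    /* MSR of the interrupted context (TRAP 0300) in fault_in_pages_readable() */
    assert(!msr_is_user(0x800000000280b033ULL));
}
```

Both values have MSR_PR clear, which matches the <...> flag lists in the
dump (no PR) and means the fault was taken in kernel mode, so the
!is_user branch in __do_page_fault is the one that ran.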

# ./scripts/faddr2line vmlinux bad_kernel_fault+0x180/0x310
bad_kernel_fault+0x180/0x310:
bad_kernel_fault at arch/powerpc/mm/fault.c:229 (discriminator 6)

    217         // Read/write fault blocked by KUAP is bad, it can never succeed.
    218         if (bad_kuap_fault(regs, address, is_write)) {
    219                 pr_crit_ratelimited("Kernel attempted to %s user page (%lx) - exploit attempt? (uid: %d)\n",
    220                                     is_write ? "write" : "read", address,
    221                                     from_kuid(&init_user_ns, current_uid()));
    222 
    223                 // Fault on user outside of certain regions (eg. copy_tofrom_user()) is bad
    224                 if (!search_exception_tables(regs->nip))
    225                         return true;
    226 
    227                 // Read/write fault in a valid region (the exception table search passed
    228                 // above), but blocked by KUAP is bad, it can never succeed.
    229                 return WARN(true, "Bug: %s fault blocked by KUAP!", is_write ? "Write" : "Read");
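
To make the path through the excerpt above easier to follow, here is a rough
user-space model of just that branch. It is a sketch under assumptions, not
the kernel code: the enum, classify_fault() and check_reported_case() are
hypothetical names, and the three booleans stand in for !user_mode(regs),
bad_kuap_fault() and search_exception_tables(regs->nip).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome names for this sketch; the kernel function returns bool. */
enum fault_verdict {
    NOT_CHECKED,   /* user-mode fault: bad_kernel_fault() is not consulted */
    NOT_KUAP,      /* kernel fault not blocked by KUAP (other checks apply) */
    BAD_NO_FIXUP,  /* line 225: no exception table entry for the faulting NIP */
    BAD_KUAP_WARN  /* line 229: fixup exists, but KUAP still blocked the access */
};

/* Models only the branch shown in the fault.c excerpt (lines 217-229). */
static enum fault_verdict classify_fault(bool is_user, bool kuap_blocked,
                                         bool has_fixup)
{
    if (is_user)
        return NOT_CHECKED;
    if (!kuap_blocked)
        return NOT_KUAP;
    if (!has_fixup)
        return BAD_NO_FIXUP;
    return BAD_KUAP_WARN;
}

/* The reported case: kernel mode, blocked by KUAP, fault inside __get_user(). */
static void check_reported_case(void)
{
    assert(classify_fault(false, true, true) == BAD_KUAP_WARN);
}
```

In this model the reported oops corresponds to classify_fault(false, true,
true): the access has an exception table entry because it goes through
__get_user(), yet KUAP was not open at the time, which is why the WARN at
line 229 fires instead of the plain "return true" at line 225.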

Thanks,
Zorro

> 
> Christophe
> 


