remove the last set_fs() in common code, and remove it for x86 and powerpc v2

Christophe Leroy christophe.leroy at csgroup.eu
Wed Sep 2 03:13:00 AEST 2020


Hi Christoph,

On 27/08/2020 at 17:00, Christoph Hellwig wrote:
> Hi all,
> 
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.
> 
> The file system part has been posted a few times, and the read/write side
> has been pretty much unchanged.  For splice this series drops the
> conversion of the seq_file and sysctl code to the iter ops, and thus loses
> the splice support for them.  The reason for that is that it caused a lot
> of churn for not much use - splice for these small files really isn't much
> of a win, even if existing userspace uses it.  All callers I found do the
> proper fallback, but if this turns out to be an issue the conversion can
> be resurrected.
> 
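Here is one possible shape for a "proper fallback" in a caller. This is a minimal userspace sketch, assuming the caller treats EINVAL from splice(2) as "this file cannot be spliced" and then falls back to plain read(2)/write(2). The function name copy_with_fallback() and the 4096-byte buffer are hypothetical choices made for this example. They do not come from the series or from any existing program.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move up to len bytes from in_fd to out_fd.
 * Try splice() first; on EINVAL (e.g. neither fd is a pipe, or the
 * file has no splice support) fall back to one read() plus write(). */
ssize_t copy_with_fallback(int in_fd, int out_fd, size_t len)
{
    ssize_t n = splice(in_fd, NULL, out_fd, NULL, len, 0);
    if (n >= 0 || errno != EINVAL)
        return n;                      /* spliced, EOF, or a real error */

    char buf[4096];
    ssize_t r;
    do {
        r = read(in_fd, buf, len < sizeof(buf) ? len : sizeof(buf));
    } while (r < 0 && errno == EINTR);
    if (r <= 0)
        return r;                      /* 0 at end of input, -1 on error */

    /* write() may be short, so push out everything that was read */
    for (ssize_t off = 0; off < r; ) {
        ssize_t w = write(out_fd, buf + off, (size_t)(r - off));
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += w;
    }
    return r;
}
```

Each call moves at most one chunk, just as splice() does. It returns 0 at end of input and -1 on error, so the caller keeps looping until it has copied what it needs.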
> Besides x86 and powerpc I plan to eventually convert all other
> architectures, although this will be a slow process, starting with the
> easier ones once the infrastructure is merged.  The process to convert
> architectures is roughly:
> 
>   (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
>   (2) implement __get_kernel_nofault and __put_kernel_nofault
>   (3) remove the arch specific address limitation functionality
> 
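Steps (1) to (3) can be illustrated with a toy userspace model that contrasts the two schemes. It is only a sketch, and it rests on these assumptions: memory is one flat array, the low part stands in for user space, and the high part stands in for kernel space. Every toy_* name and constant is hypothetical.

In the old scheme, the user-access check compares against a per-thread limit that set_fs(KERNEL_DS) could raise, which let kernel pointers through the uaccess routines. In the new scheme, that check uses a fixed bound, and kernel memory is reached only through a separate accessor, loosely in the spirit of __get_kernel_nofault() and copy_from_kernel_nofault().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: [0, TOY_USER_END) is "user",
 * [TOY_USER_END, TOY_MEM_SIZE) is "kernel". Not real kernel values. */
#define TOY_USER_END 0x100u
#define TOY_MEM_SIZE 0x200u

static unsigned char toy_mem[TOY_MEM_SIZE];

/* Old scheme: a per-thread limit that set_fs(KERNEL_DS) could widen. */
static size_t toy_addr_limit = TOY_USER_END;

/* True if [addr, addr + size) lies below limit, without overflowing. */
bool range_below(size_t addr, size_t size, size_t limit)
{
    return size <= limit && addr <= limit - size;
}

bool toy_access_ok_old(size_t addr, size_t size)
{
    return range_below(addr, size, toy_addr_limit);
}

/* New scheme: the user check uses a fixed bound nothing can widen. */
bool toy_access_ok_new(size_t addr, size_t size)
{
    return range_below(addr, size, TOY_USER_END);
}

/* Kernel reads get their own accessor instead of the user path. */
int toy_copy_from_kernel(void *dst, size_t src, size_t size)
{
    if (src < TOY_USER_END || !range_below(src, size, TOY_MEM_SIZE))
        return -EFAULT;
    memcpy(dst, &toy_mem[src], size);
    return 0;
}
```

In the old model, raising toy_addr_limit to TOY_MEM_SIZE makes toy_access_ok_old() accept a "kernel" address. toy_access_ok_new() never does. That is why step (3) can only remove the limit once no caller depends on widening it.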
> Changes since v1:
>   - drop the patch to remove the non-iter ops for /dev/zero and
>     /dev/null as they caused a performance regression
>   - don't enable user access in __get_kernel on powerpc
>   - xfail the set_fs() based lkdtm tests
> 
> Diffstat:
> 


I'm still sceptical about the results I get: with your series, the dd run below takes about 39% longer (7.79s vs 5.59s real time).

With 5.9-rc2:

root@vgoippro:~# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 5.585880 seconds, 91.7MB/s
real    0m 5.59s
user    0m 1.40s
sys     0m 4.19s


With your series:

root@vgoippro:/tmp# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 7.780540 seconds, 65.8MB/s
real    0m 7.79s
user    0m 2.12s
sys     0m 5.66s




Top of the perf report from a standard perf record run:

With 5.9-rc2:

     20.31%  dd       [kernel.kallsyms]  [k] __arch_clear_user
      8.37%  dd       [kernel.kallsyms]  [k] transfer_to_syscall
      7.37%  dd       [kernel.kallsyms]  [k] __fsnotify_parent
      6.95%  dd       [kernel.kallsyms]  [k] iov_iter_zero
      5.72%  dd       [kernel.kallsyms]  [k] new_sync_read
      4.87%  dd       [kernel.kallsyms]  [k] vfs_write
      4.47%  dd       [kernel.kallsyms]  [k] vfs_read
      3.07%  dd       [kernel.kallsyms]  [k] ksys_write
      2.77%  dd       [kernel.kallsyms]  [k] ksys_read
      2.65%  dd       [kernel.kallsyms]  [k] __fget_light
      2.37%  dd       [kernel.kallsyms]  [k] __fdget_pos
      2.35%  dd       [kernel.kallsyms]  [k] memset
      1.53%  dd       [kernel.kallsyms]  [k] rw_verify_area
      1.52%  dd       [kernel.kallsyms]  [k] read_iter_zero

With your series:
     19.60%  dd       [kernel.kallsyms]  [k] __arch_clear_user
     10.92%  dd       [kernel.kallsyms]  [k] iov_iter_zero
      9.50%  dd       [kernel.kallsyms]  [k] vfs_write
      8.97%  dd       [kernel.kallsyms]  [k] __fsnotify_parent
      5.46%  dd       [kernel.kallsyms]  [k] transfer_to_syscall
      5.42%  dd       [kernel.kallsyms]  [k] vfs_read
      3.58%  dd       [kernel.kallsyms]  [k] ksys_read
      2.84%  dd       [kernel.kallsyms]  [k] read_iter_zero
      2.24%  dd       [kernel.kallsyms]  [k] ksys_write
      1.80%  dd       [kernel.kallsyms]  [k] __fget_light
      1.34%  dd       [kernel.kallsyms]  [k] __fdget_pos
      0.91%  dd       [kernel.kallsyms]  [k] memset
      0.91%  dd       [kernel.kallsyms]  [k] rw_verify_area
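perf report percentages are shares of all samples, and the run with the series takes longer overall. A symbol whose share barely moves can therefore still cost more time in absolute terms. The sketch below compares the two lists above. new_sync_read is left out because it only appears in the first list.

The sketch makes a rough assumption: samples are spread evenly over the process's CPU time (user + sys from the time output), so share × CPU time approximates each symbol's cost. The struct and function names are hypothetical. perf diff can compare two perf.data files directly.

```c
#include <stddef.h>
#include <string.h>

/* One symbol's share of samples (%), before and after the series,
 * copied from the two perf report excerpts above. */
struct sym_share {
    const char *sym;
    double before;
    double after;
};

static const struct sym_share shares[] = {
    { "__arch_clear_user",   20.31, 19.60 },
    { "transfer_to_syscall",  8.37,  5.46 },
    { "__fsnotify_parent",    7.37,  8.97 },
    { "iov_iter_zero",        6.95, 10.92 },
    { "vfs_write",            4.87,  9.50 },
    { "vfs_read",             4.47,  5.42 },
    { "ksys_write",           3.07,  2.24 },
    { "ksys_read",            2.77,  3.58 },
    { "__fget_light",         2.65,  1.80 },
    { "__fdget_pos",          2.37,  1.34 },
    { "memset",               2.35,  0.91 },
    { "rw_verify_area",       1.53,  0.91 },
    { "read_iter_zero",       1.52,  2.84 },
};

/* CPU time (user + sys) of each run, from the time output above. */
#define CPU_BEFORE (1.40 + 4.19)
#define CPU_AFTER  (2.12 + 5.66)

const struct sym_share *find_share(const char *sym)
{
    for (size_t i = 0; i < sizeof(shares) / sizeof(shares[0]); i++)
        if (strcmp(shares[i].sym, sym) == 0)
            return &shares[i];
    return NULL;
}

/* Change in share, in percentage points. */
double delta_points(const struct sym_share *s)
{
    return s->after - s->before;
}

/* Rough estimate of seconds spent in a symbol: share times CPU time. */
double approx_seconds(double share_pct, double cpu_seconds)
{
    return share_pct / 100.0 * cpu_seconds;
}
```

Take vfs_write as an example. Its share goes from 4.87% to 9.50% (+4.63 points). Under this approximation that is about 0.27s versus 0.74s, roughly 2.7 times the cost. __arch_clear_user shows the opposite pattern: its share drops slightly, yet its time still rises from about 1.14s to about 1.52s.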

Christophe


More information about the Linuxppc-dev mailing list