[PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

Chen Rong rong.a.chen at intel.com
Wed Feb 13 20:19:36 AEDT 2019


Hi all,

Kernel test robot reported a will-it-scale.per_thread_ops -64.1% regression on IVB-desktop for v4.20-rc1.
The first bad commit is: 9bc8039e715da3b53dbac89525323a9f2f69b7b5, Yang Shi <yang.shi at linux.alibaba.com>: mm: brk: downgrade mmap_sem to read when shrinking
(https://lists.01.org/pipermail/lkp/2018-November/009335.html).

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-ivb-d01/brk1/will-it-scale/0x20

commit: 
  85a06835f6 ("mm: mremap: downgrade mmap_sem to read when shrinking")
  9bc8039e71 ("mm: brk: downgrade mmap_sem to read when shrinking")

85a06835f6f1ba79 9bc8039e715da3b53dbac89525 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    196250 ±  8%     -64.1%      70494        will-it-scale.per_thread_ops
    127330 ± 19%     -98.0%       2525 ± 24%  will-it-scale.time.involuntary_context_switches
    727.50 ±  2%     -77.0%     167.25        will-it-scale.time.percent_of_cpu_this_job_got
      2141 ±  2%     -77.6%     479.12        will-it-scale.time.system_time
     50.48 ±  7%     -48.5%      25.98        will-it-scale.time.user_time
  34925294 ± 18%    +270.3%  1.293e+08 ±  4%  will-it-scale.time.voluntary_context_switches
   1570007 ±  8%     -64.1%     563958        will-it-scale.workload
      6435 ±  2%      -6.4%       6024        proc-vmstat.nr_shmem
      1298 ± 16%     -44.5%     721.00 ± 18%  proc-vmstat.pgactivate
      2341           +16.4%       2724        slabinfo.kmalloc-96.active_objs
      2341           +16.4%       2724        slabinfo.kmalloc-96.num_objs
      6346 ±150%     -87.8%     776.25 ±  9%  softirqs.NET_RX
    160107 ±  8%    +151.9%     403273        softirqs.SCHED
   1097999           -13.0%     955526        softirqs.TIMER
      5.50 ±  9%     -81.8%       1.00        vmstat.procs.r
    230700 ± 19%    +269.9%     853292 ±  4%  vmstat.system.cs
     26706 ±  3%     +15.7%      30910 ±  5%  vmstat.system.in
     11.24 ± 23%     +72.2       83.39        mpstat.cpu.idle%
      0.00 ±131%      +0.0        0.04 ± 99%  mpstat.cpu.iowait%
     86.32 ±  2%     -70.8       15.54        mpstat.cpu.sys%
      2.44 ±  7%      -1.4        1.04 ±  8%  mpstat.cpu.usr%
  20610709 ± 15%   +2376.0%  5.103e+08 ± 34%  cpuidle.C1.time
   3233399 ±  8%    +241.5%   11042785 ± 25%  cpuidle.C1.usage
  36172040 ±  6%    +931.3%   3.73e+08 ± 15%  cpuidle.C1E.time
    783605 ±  4%    +548.7%    5083041 ± 18%  cpuidle.C1E.usage
  28753819 ± 39%   +1054.5%  3.319e+08 ± 49%  cpuidle.C3.time
    283912 ± 25%    +688.4%    2238225 ± 34%  cpuidle.C3.usage
 1.507e+08 ± 47%    +292.3%  5.913e+08 ± 28%  cpuidle.C6.time
    339861 ± 37%    +549.7%    2208222 ± 24%  cpuidle.C6.usage
   2709719 ±  5%    +824.2%   25043444        cpuidle.POLL.time
  28602864 ± 18%    +173.7%   78276116 ± 10%  cpuidle.POLL.usage


We found that the patchset could fix the regression.

tests: 1
testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-brk1-ucode=0x20/lkp-ivb-d01

commit: 
  85a06835f6 ("mm: mremap: downgrade mmap_sem to read when shrinking")
  fb835fe7f0 ("locking/rwsem: Ensure an RT task will not spin on reader")

85a06835f6f1ba79  fb835fe7f0adbd7c2c074b98ec  
----------------  --------------------------  
         %stddev      change         %stddev
             \          |                \  
    120736 ± 22%        56%     188019 ±  6%  will-it-scale.time.involuntary_context_switches
      2126 ±  3%         4%       2215        will-it-scale.time.system_time
       722 ±  3%         4%        752        will-it-scale.time.percent_of_cpu_this_job_got
  36256485 ± 27%       -35%   23682989 ±  3%  will-it-scale.time.voluntary_context_switches
      3151 ±  9%        11%       3504        turbostat.Avg_MHz
    229285 ± 32%       -30%     160660 ±  3%  vmstat.system.cs
    120736 ± 22%        56%     188019 ±  6%  time.involuntary_context_switches
      2126 ±  3%         4%       2215        time.system_time
       722 ±  3%         4%        752        time.percent_of_cpu_this_job_got
  36256485 ± 27%       -35%   23682989 ±  3%  time.voluntary_context_switches
        23             643%        171 ±  3%  proc-vmstat.nr_zone_inactive_file
        23             643%        171 ±  3%  proc-vmstat.nr_inactive_file
      3664              12%       4121        proc-vmstat.nr_kernel_stack
      6392               6%       6785        proc-vmstat.nr_slab_unreclaimable
      9991                       10176        proc-vmstat.nr_slab_reclaimable
     63938                       62394        proc-vmstat.nr_zone_active_anon
     63938                       62394        proc-vmstat.nr_active_anon
    386388 ±  9%        -6%     362272        proc-vmstat.pgfree
    368296 ±  9%       -10%     333074        proc-vmstat.numa_hit
    368296 ±  9%       -10%     333074        proc-vmstat.numa_local
      5169 ± 13%       -28%       3745        proc-vmstat.nr_shmem
      1801 ± 21%       -83%        309        proc-vmstat.pgactivate
         0            1e+04      11441        latency_stats.avg.msleep.cpuinfo_open.proc_reg_open.do_dentry_open.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
     13165 ±222%     -1e+04          0        latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup_revalidate.lookup_fast.walk_component.link_path_walk.path_lookupat.filename_lookup
     22499 ±151%     -2e+04        657 ±  7%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.__lookup_slow.lookup_slow.walk_component.path_lookupat.filename_lookup
    117414 ±181%     -9e+04      24418 ± 44%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
    666005 ±218%     -7e+05        198 ±141%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_access.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat.filename_lookup
   2600097 ±132%     -3e+06        572        latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_getattr.__nfs_revalidate_inode.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat
  34391390 ±150%     -3e+07      21807 ±141%  latency_stats.avg.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
  34624774 ±149%     -3e+07      37668 ± 58%  latency_stats.avg.max
         0            1e+04      11441        latency_stats.max.msleep.cpuinfo_open.proc_reg_open.do_dentry_open.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22499 ±151%     -2e+04        657 ±  7%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.__lookup_slow.lookup_slow.walk_component.path_lookupat.filename_lookup
     37845 ±222%     -4e+04          0        latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup_revalidate.lookup_fast.walk_component.link_path_walk.path_lookupat.filename_lookup
     80096 ± 59%     -8e+04          0        latency_stats.max.call_rwsem_down_write_failed_killable.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
    177149 ±195%     -2e+05      24418 ± 44%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
    689417 ±209%     -7e+05        200 ±141%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_access.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat.filename_lookup
  18679699 ±129%     -2e+07        656        latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_getattr.__nfs_revalidate_inode.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat
  83587334 ±129%     -8e+07      43457 ±141%  latency_stats.max.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
  84867236 ±126%     -8e+07      59318 ± 86%  latency_stats.max.max
         0            1e+04      11441        latency_stats.sum.msleep.cpuinfo_open.proc_reg_open.do_dentry_open.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22499 ±151%     -2e+04        657 ±  7%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.__lookup_slow.lookup_slow.walk_component.path_lookupat.filename_lookup
     39431 ±222%     -4e+04          0        latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup_revalidate.lookup_fast.walk_component.link_path_walk.path_lookupat.filename_lookup
    216448 ±200%     -2e+05      24418 ± 44%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
    691960 ±208%     -7e+05        397 ±141%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_access.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat.filename_lookup
  24239011 ±140%     -2e+07       4768 ± 10%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_getattr.__nfs_revalidate_inode.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat
 1.771e+08 ±122%     -2e+08      43614 ±141%  latency_stats.sum.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
 1.939e+08 ± 36%     -2e+08          0        latency_stats.sum.call_rwsem_down_write_failed_killable.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
 2.943e+08 ± 51%     -2e+08   51929782        latency_stats.sum.max
    407463 ± 10%      -100%          0        perf-stat.total.page-faults
  74225651 ± 26%      -100%          0        perf-stat.total.context-switches
     55293 ± 25%      -100%          0        perf-stat.total.cpu-migrations
    407463 ± 10%      -100%          0        perf-stat.total.minor-faults


tests: 1
testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-brk1-ucode=0x20/lkp-ivb-d01

commit: 
  9bc8039e71 ("mm: brk: downgrade mmap_sem to read when shrinking")
  fb835fe7f0 ("locking/rwsem: Ensure an RT task will not spin on reader")

9bc8039e715da3b5  fb835fe7f0adbd7c2c074b98ec  
----------------  --------------------------  
         %stddev      change         %stddev
             \          |                \  
      3500 ± 36%      5272%     188019 ±  6%  will-it-scale.time.involuntary_context_switches
       483             358%       2215        will-it-scale.time.system_time
       168             346%        752        will-it-scale.time.percent_of_cpu_this_job_got
     71190             180%     199232 ±  4%  will-it-scale.per_thread_ops
    569524             180%    1593862 ±  4%  will-it-scale.workload
     25.85              93%      49.95 ±  3%  will-it-scale.time.user_time
 1.314e+08 ±  3%       -82%   23682989 ±  3%  will-it-scale.time.voluntary_context_switches
     30501 ±  9%       -15%      25813 ±  4%  vmstat.system.in
    799593 ± 10%       -80%     160660 ±  3%  vmstat.system.cs
       887 ± 11%       295%       3504        turbostat.Avg_MHz
     23.60 ± 10%        68%      39.54        turbostat.CorWatt
     28.38 ±  8%        57%      44.43        turbostat.PkgWatt
      3500 ± 36%      5272%     188019 ±  6%  time.involuntary_context_switches
       483             358%       2215        time.system_time
       168             346%        752        time.percent_of_cpu_this_job_got
     25.85              93%      49.95 ±  3%  time.user_time
 1.314e+08 ±  3%       -82%   23682989 ±  3%  time.voluntary_context_switches
         0 ± 44%     46220%        386        proc-vmstat.nr_zone_active_file
         0 ± 44%     46220%        386        proc-vmstat.nr_active_file
        23             643%        171 ±  3%  proc-vmstat.nr_zone_inactive_file
        23             643%        171 ±  3%  proc-vmstat.nr_inactive_file
      3690              12%       4121        proc-vmstat.nr_kernel_stack
      6419               6%       6785        proc-vmstat.nr_slab_unreclaimable
      9961                       10176        proc-vmstat.nr_slab_reclaimable
    229251                      231278        proc-vmstat.nr_zone_unevictable
    229251                      231278        proc-vmstat.nr_unevictable
      1008                        1005        proc-vmstat.nr_page_table_pages
     63178                       62394        proc-vmstat.nr_zone_active_anon
     63178                       62394        proc-vmstat.nr_active_anon
    432061 ± 12%       -11%     385372        proc-vmstat.pgfault
    408099 ± 10%       -11%     362272        proc-vmstat.pgfree
    422206 ±  9%       -11%     373690        proc-vmstat.pgalloc_normal
    382357 ± 11%       -13%     333074        proc-vmstat.numa_hit
    382357 ± 11%       -13%     333074        proc-vmstat.numa_local
      4428 ± 17%       -15%       3745        proc-vmstat.nr_shmem
         0            1e+04      11441        latency_stats.avg.msleep.cpuinfo_open.proc_reg_open.do_dentry_open.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11180 ±168%     -1e+04        657 ±  7%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.__lookup_slow.lookup_slow.walk_component.path_lookupat.filename_lookup
     19239 ±223%     -2e+04          0        latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_get_acl.get_acl.posix_acl_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open
     63702 ±169%     -4e+04      24418 ± 44%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
     77617 ±205%     -8e+04        510 ± 11%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
   3043762 ±124%     -3e+06        572        latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_getattr.__nfs_revalidate_inode.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat
  11630441 ±139%     -1e+07      21807 ±141%  latency_stats.avg.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
  12242832 ±129%     -1e+07      37668 ± 58%  latency_stats.avg.max
         0            1e+04      11441        latency_stats.max.msleep.cpuinfo_open.proc_reg_open.do_dentry_open.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11180 ±168%     -1e+04        657 ±  7%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.__lookup_slow.lookup_slow.walk_component.path_lookupat.filename_lookup
     19239 ±223%     -2e+04          0        latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_get_acl.get_acl.posix_acl_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open
     29152 ± 11%     -3e+04          0        latency_stats.max.call_rwsem_down_write_failed_killable.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     65909 ±164%     -4e+04      24418 ± 44%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
     77617 ±205%     -8e+04        510 ± 11%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
  17301268 ±125%     -2e+07        656        latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_getattr.__nfs_revalidate_inode.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat
  44248611 ±140%     -4e+07      43457 ±141%  latency_stats.max.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
  46380610 ±130%     -5e+07      59318 ± 86%  latency_stats.max.max
         0            1e+04      11441        latency_stats.sum.msleep.cpuinfo_open.proc_reg_open.do_dentry_open.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11180 ±168%     -1e+04        657 ±  7%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.__lookup_slow.lookup_slow.walk_component.path_lookupat.filename_lookup
     19239 ±223%     -2e+04          0        latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_get_acl.get_acl.posix_acl_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open
     74047 ±148%     -5e+04      24418 ± 44%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_do_create.nfs3_proc_create.nfs_create.path_openat.do_filp_open.do_sys_open.do_syscall_64
     77617 ±205%     -8e+04        510 ± 11%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_lookup.nfs_lookup.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
  26043088 ±130%     -3e+07       4768 ± 10%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_run_task.rpc_call_sync.nfs3_rpc_wrapper.nfs3_proc_getattr.__nfs_revalidate_inode.nfs_do_access.nfs_permission.inode_permission.link_path_walk.path_lookupat
  82480038 ±152%     -8e+07      43614 ±141%  latency_stats.sum.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
 1.771e+09           -2e+09   51929782        latency_stats.sum.max
 1.771e+09           -2e+09          0        latency_stats.sum.call_rwsem_down_write_failed_killable.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
    420016 ± 12%      -100%          0        perf-stat.total.page-faults
 2.648e+08 ±  3%      -100%          0        perf-stat.total.context-switches
     52212 ± 18%      -100%          0        perf-stat.total.cpu-migrations
    420016 ± 12%      -100%          0        perf-stat.total.minor-faults

Best Regards,
Rong Chen


More information about the Linuxppc-dev mailing list