[linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression

Mon Nov 20 18:11:31 AEDT 2023

Hello,

kernel test robot noticed a -2.9% regression of will-it-scale.per_thread_ops on:

commit: 0ede61d8589cc2d93aa78230d74ac58b5b8d0244 ("file: convert to SLAB_TYPESAFE_BY_RCU")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: will-it-scale
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

	nr_task: 16
	mode: thread
	test: poll2
	cpufreq_governor: performance

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.sang@intel.com

Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231120/202311201406.2022ca3f-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale

commit: 
  93faf426e3 ("vfs: shave work on failed file open")
  0ede61d858 ("file: convert to SLAB_TYPESAFE_BY_RCU")

93faf426e3cc000c 0ede61d8589cc2d93aa78230d74 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.01 ±  9%  +58125.6%       4.17 ±175%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     89056            -2.0%      87309        proc-vmstat.nr_slab_unreclaimable
     97958 ±  7%      -9.7%      88449 ±  4%  sched_debug.cpu.avg_idle.stddev
      0.00 ± 12%     +24.2%       0.00 ± 17%  sched_debug.cpu.next_balance.stddev
   6391048            -2.9%    6208584        will-it-scale.16.threads
    399440            -2.9%     388036        will-it-scale.per_thread_ops
   6391048            -2.9%    6208584        will-it-scale.workload
     19.99 ±  4%      -2.2       17.74        perf-profile.calltrace.cycles-pp.fput.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.27 ±  5%      +0.8        2.11 ±  3%  perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     32.69 ±  4%      +5.0       37.70        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.00           +27.9       27.85        perf-profile.calltrace.cycles-pp.__get_file_rcu.__fget_light.do_poll.do_sys_poll.__x64_sys_poll
     20.00 ±  4%      -2.3       17.75        perf-profile.children.cycles-pp.fput
      0.24 ± 10%      -0.1        0.18 ±  2%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.48 ±  5%      +0.5        1.98 ±  3%  perf-profile.children.cycles-pp.__fdget
     31.85 ±  4%      +6.0       37.86        perf-profile.children.cycles-pp.__fget_light
      0.00           +27.7       27.67        perf-profile.children.cycles-pp.__get_file_rcu
     30.90 ±  4%     -20.6       10.35 ±  2%  perf-profile.self.cycles-pp.__fget_light
     19.94 ±  4%      -2.4       17.53        perf-profile.self.cycles-pp.fput
      9.81 ±  4%      -2.4        7.42 ±  2%  perf-profile.self.cycles-pp.do_poll
      0.23 ± 11%      -0.1        0.17 ±  4%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00           +26.5       26.48        perf-profile.self.cycles-pp.__get_file_rcu
 2.146e+10 ±  2%      +8.5%  2.329e+10 ±  2%  perf-stat.i.branch-instructions
      0.22 ± 14%      -0.0        0.19 ± 14%  perf-stat.i.branch-miss-rate%
 1.404e+10 ±  2%      +8.7%  1.526e+10 ±  2%  perf-stat.i.dTLB-stores
     70.87            -2.3       68.59        perf-stat.i.iTLB-load-miss-rate%
   5267608            -5.5%    4979133 ±  2%  perf-stat.i.iTLB-load-misses
   2102507            +5.4%    2215725        perf-stat.i.iTLB-loads
     18791 ±  3%     +10.5%      20757 ±  2%  perf-stat.i.instructions-per-iTLB-miss
    266.67 ±  2%      +6.8%     284.75 ±  2%  perf-stat.i.metric.M/sec
      0.01 ± 10%     -10.5%       0.01 ±  5%  perf-stat.overall.MPKI
      0.19            -0.0        0.17        perf-stat.overall.branch-miss-rate%
      0.65            -3.1%       0.63        perf-stat.overall.cpi
      0.00 ±  4%      -0.0        0.00 ±  4%  perf-stat.overall.dTLB-store-miss-rate%
     71.48            -2.3       69.21        perf-stat.overall.iTLB-load-miss-rate%
     18757           +10.0%      20629        perf-stat.overall.instructions-per-iTLB-miss
      1.54            +3.2%       1.59        perf-stat.overall.ipc
   4795147            +6.4%    5100406        perf-stat.overall.path-length
  2.14e+10 ±  2%      +8.5%  2.322e+10 ±  2%  perf-stat.ps.branch-instructions
   1.4e+10 ±  2%      +8.7%  1.522e+10 ±  2%  perf-stat.ps.dTLB-stores
   5253923            -5.5%    4966218 ±  2%  perf-stat.ps.iTLB-load-misses
   2095770            +5.4%    2208605        perf-stat.ps.iTLB-loads
 3.065e+13            +3.3%  3.167e+13        perf-stat.total.instructions
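
The %change column mixes two kinds of delta. For plain counters such as
will-it-scale.per_thread_ops it is a relative change. For rows that are
already percentages (perf-profile.* cycle shares, *-rate% metrics) it appears
to be an absolute difference in percentage points: fput drops from 19.99 to
17.74 and is reported as -2.2, not as roughly -11%. The Python sketch below is
a hypothetical cross-check, not lkp-tests code, and the helper names are made
up for illustration. It recomputes a few rows from the rounded table values,
so small rounding differences are expected. It assumes that path-length means
instructions per operation and that the total cycle count is the same in both
runs, in which case throughput scales as IPC / path-length.

```python
# Hypothetical helpers (not part of lkp-tests) that recompute a few rows
# of the comparison table from its rounded values.


def relative_change_pct(base: float, new: float) -> float:
    """Relative change in percent, as used for plain counters."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute delta, assumed for rows already expressed in percent."""
    return new - base


# will-it-scale rows: per_thread_ops appears to be workload // nr_task.
nr_task = 16
workload = (6391048, 6208584)
per_thread_ops = (399440, 388036)
derived_ops = tuple(w // nr_task for w in workload)

# perf-profile rows are shares of cycles, so %change is in points.
fput_self_delta = point_change(19.94, 17.53)

# Assumption: path-length is total instructions per operation.
path_length_base = 3.065e13 / workload[0]  # close to the reported 4795147

# Rough decomposition, assuming total cycles are unchanged between runs:
# ops = instructions / path_length = IPC * cycles / path_length.
ipc = (1.54, 1.59)
path_length = (4795147, 5100406)
estimated_change = relative_change_pct(
    ipc[0] / path_length[0], ipc[1] / path_length[1]
)

if __name__ == "__main__":
    print(f"per_thread_ops: {relative_change_pct(*per_thread_ops):+.1f}%")
    print(f"per_thread_ops derived from workload: {derived_ops}")
    print(f"fput self cycles: {fput_self_delta:+.1f} points")
    print(f"path-length from total instructions: {path_length_base:.0f}")
    print(f"IPC / path-length estimate: {estimated_change:+.1f}%")
```

Under those assumptions, the +3.2% IPC gain does not offset the +6.4% longer
path length (presumably the extra __get_file_rcu work under __fget_light), and
the estimate lands close to the measured -2.9%.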

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki