[PATCH 0/5] Correct perf sampling with guest VMs
Colton Lewis
coltonlewis at google.com
Thu Sep 5 06:41:28 AEST 2024
This series cleans up perf recording around guest events and improves
the accuracy of the resulting perf reports.
Perf was incorrectly counting any PMU overflow interrupt that occured
while a VCPU was loaded as a guest event even when the events were not
truely guest events. This lead to much less accurate and useful perf
recordings.
See as an example the below reports of `perf record
dirty_log_perf_test -m 2 -v 4` before and after the series on ARM64.
Without series:
Samples: 15K of event 'instructions', Event count (approx.): 31830580924
Overhead Command Shared Object Symbol
54.54% dirty_log_perf_ dirty_log_perf_test [.] run_test
5.39% dirty_log_perf_ dirty_log_perf_test [.] vcpu_worker
0.89% dirty_log_perf_ [kernel.vmlinux] [k] release_pages
0.70% dirty_log_perf_ [kernel.vmlinux] [k] free_pcppages_bulk
0.62% dirty_log_perf_ dirty_log_perf_test [.] userspace_mem_region_find
0.49% dirty_log_perf_ dirty_log_perf_test [.] sparsebit_is_set
0.46% dirty_log_perf_ dirty_log_perf_test [.] _virt_pg_map
0.46% dirty_log_perf_ dirty_log_perf_test [.] node_add
0.37% dirty_log_perf_ dirty_log_perf_test [.] node_reduce
0.35% dirty_log_perf_ [kernel.vmlinux] [k] free_unref_page_commit
0.33% dirty_log_perf_ [kernel.vmlinux] [k] __kvm_pgtable_walk
0.31% dirty_log_perf_ [kernel.vmlinux] [k] stage2_attr_walker
0.29% dirty_log_perf_ [kernel.vmlinux] [k] unmap_page_range
0.29% dirty_log_perf_ dirty_log_perf_test [.] test_assert
0.26% dirty_log_perf_ [kernel.vmlinux] [k] __mod_memcg_lruvec_state
0.24% dirty_log_perf_ [kernel.vmlinux] [k] kvm_s2_put_page
With series:
Samples: 15K of event 'instructions', Event count (approx.): 31830580924
Samples: 15K of event 'instructions', Event count (approx.): 30898031385
Overhead Command Shared Object Symbol
54.05% dirty_log_perf_ dirty_log_perf_test [.] run_test
5.48% dirty_log_perf_ [kernel.kallsyms] [k] kvm_arch_vcpu_ioctl_run
4.70% dirty_log_perf_ dirty_log_perf_test [.] vcpu_worker
3.11% dirty_log_perf_ [kernel.kallsyms] [k] kvm_handle_guest_abort
2.24% dirty_log_perf_ [kernel.kallsyms] [k] up_read
1.98% dirty_log_perf_ [kernel.kallsyms] [k] __kvm_tlb_flush_vmid_ipa_nsh
1.97% dirty_log_perf_ [kernel.kallsyms] [k] __pi_clear_page
1.30% dirty_log_perf_ [kernel.kallsyms] [k] down_read
1.13% dirty_log_perf_ [kernel.kallsyms] [k] release_pages
1.12% dirty_log_perf_ [kernel.kallsyms] [k] __kvm_pgtable_walk
1.08% dirty_log_perf_ [kernel.kallsyms] [k] folio_batch_move_lru
1.06% dirty_log_perf_ [kernel.kallsyms] [k] __srcu_read_lock
1.03% dirty_log_perf_ [kernel.kallsyms] [k] get_page_from_freelist
1.01% dirty_log_perf_ [kernel.kallsyms] [k] __pte_offset_map_lock
0.82% dirty_log_perf_ [kernel.kallsyms] [k] handle_mm_fault
0.74% dirty_log_perf_ [kernel.kallsyms] [k] mas_state_walk
Colton Lewis (5):
arm: perf: Drop unused functions
perf: Hoist perf_instruction_pointer() and perf_misc_flags()
powerpc: perf: Use perf_arch_instruction_pointer()
x86: perf: Refactor misc flag assignments
perf: Correct perf sampling with guest VMs
arch/arm/include/asm/perf_event.h | 7 ---
arch/arm/kernel/perf_callchain.c | 17 -------
arch/arm64/include/asm/perf_event.h | 4 --
arch/arm64/kernel/perf_callchain.c | 28 ------------
arch/powerpc/include/asm/perf_event_server.h | 6 +--
arch/powerpc/perf/callchain.c | 2 +-
arch/powerpc/perf/callchain_32.c | 2 +-
arch/powerpc/perf/callchain_64.c | 2 +-
arch/powerpc/perf/core-book3s.c | 4 +-
arch/s390/include/asm/perf_event.h | 6 +--
arch/s390/kernel/perf_event.c | 4 +-
arch/x86/events/core.c | 47 +++++++++++---------
arch/x86/include/asm/perf_event.h | 12 ++---
include/linux/perf_event.h | 26 +++++++++--
kernel/events/core.c | 27 ++++++++++-
15 files changed, 95 insertions(+), 99 deletions(-)
--
2.46.0.469.g59c65b2a67-goog
More information about the Linuxppc-dev
mailing list