[PATCH] tools/perf/test: Check for perf stat return code in perf all PMU test
Ian Rogers
irogers at google.com
Sat Apr 4 02:39:10 AEDT 2026
On Fri, Apr 3, 2026 at 12:36 AM Mi, Dapeng <dapeng1.mi at linux.intel.com> wrote:
>
>
> On 4/3/2026 1:32 AM, Falcon, Thomas wrote:
> > On Wed, 2026-04-01 at 13:40 -0700, Ian Rogers wrote:
> >> On Mon, Mar 23, 2026 at 3:40 AM Venkat <venkat88 at linux.ibm.com>
> >> wrote:
> >>>
> >>>
> >>>> On 15 Mar 2026, at 4:27 PM, Athira Rajeev
> >>>> <atrajeev at linux.ibm.com> wrote:
> >>>>
> >>>> Currently in "perf all PMU test", for "perf stat -e <event>
> >>>> true",
> >>>> below checks are done:
> >>>> - if return code is zero, look for "not supported" to decide pass
> >>>> scenario
> >>>> - check for "not supported" to ignore the event
> >>>> - looks for "No permission to enable" to skip the event.
> >>>> - If output has "Bad event name", fail the test.
> >>>> - Use "Access to performance monitoring and observability
> >>>> operations is
> >>>> limited." to ignore fail due to access limitations
> >>>>
> >>>> If we failed to see event and it is supported, retries with
> >>>> longer
> >>>> workload "perf bench internals synthesize".
> >>>> - Here if output has <event>, the test is a pass.
> >>>>
> >>>> Snippet of code check:
> >>>> ```
> >>>> output=$(perf stat -e "$p" perf bench internals synthesize 2>&1)
> >>>> if echo "$output" | grep -q "$p"
> >>>> ```
> >>>> - if output doesn't have event printed in logs, considers it
> >>>> fail.
> >>>>
> >>>> But this results in false pass for events in some cases.
> >>>> Example, if perf stat fails as below:
> >>>>
> >>>> # ./perf stat -e pmu/event/ true
> >>>> event syntax error: 'pmu/event/'
> >>>> \___ Bad event or PMU
> >>>>
> >>>> Unable to find PMU or event on a PMU of 'pmu'
> >>>> Run 'perf list' for a list of valid events
> >>>>
> >>>> Usage: perf stat [<options>] [<command>]
> >>>>
> >>>> -e, --event <event> event selector. use 'perf list' to list
> >>>> available events
> >>>> # echo $?
> >>>> 129
> >>>>
> >>>> Since this has a non-zero return code and doesn't contain any of the
> >>>> fail strings checked in the test, it enters the check using the
> >>>> longer workload. Since the failure log contains the event name, the
> >>>> test declares the event "supported".
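
To illustrate the false pass described above, here is a minimal sketch. It
assumes the captured output is the first line of the failure log quoted
above (it does not run perf), and `classify` is a hypothetical helper, not
a function in stat_all_pmu.sh. It only shows why the exit status has to be
checked before grepping the output for the event name.

```sh
# Hypothetical helper: classify an event from the captured output and
# the exit status of "perf stat". Not part of stat_all_pmu.sh.
classify() {
  event=$1
  output=$2
  status=$3
  if [ "$status" -ne 0 ]; then
    # A non-zero exit status wins, whatever the output contains.
    echo "failed"
  elif printf '%s\n' "$output" | grep -qF -- "$event"; then
    echo "supported"
  else
    echo "missing"
  fi
}

# Error text copied from the failure log above; it contains the event name.
output="event syntax error: 'pmu/event/'"

# Output-only check: grep matches the event name inside the error message.
if printf '%s\n' "$output" | grep -qF -- "pmu/event/"; then
  echo "grep alone: match"
fi

# With the exit status taken into account, the same output is a failure.
classify "pmu/event/" "$output" 129
```

With the status ignored, the same error text would be reported as
"supported", which is the false pass the patch addresses.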
> >>>>
> >>>> Since all the fail strings can't be added in the check, update
> >>>> the testcase to check return code before proceeding to longer
> >>>> workload run.
> >>>>
> >>>> Another missing scenario is when only system wide monitoring is
> >>>> supported.
> >>>> Example:
> >>>> # ./perf stat -e pmu/event/ true
> >>>> Error:
> >>>> No supported events found.
> >>>> Unsupported event (pmu/event/H) in per-thread mode, enable
> >>>> system wide with '-a'.
> >>>>
> >>>> Update testcase to check with "perf stat -a -e $p" as well
> >>>>
> >>>> Signed-off-by: Athira Rajeev <atrajeev at linux.ibm.com>
> >>>> ---
> >>> Tested this patch.
> >>>
> >>>
> >>> With this patch:
> >>>
> >>> Testing hv_24x7/CPM_ADJUNCT_INST/ -- perf stat failed with non-zero
> >>> return code
> >>> Testing hv_24x7/CPM_ADJUNCT_PCYC/ -- perf stat failed with non-zero
> >>> return code
> >>>
> >>>
> >>>
> >>> Tested-by: Venkat Rao Bagalkote <venkat88 at linux.ibm.com>
> >> Testing on an Intel Alderlake the test is now failing:
> >> ```
> >> ...
> >> Testing offcore_requests_outstanding.l3_miss_demand_data_rd --
> >> supported
> >> Testing ocr.full_streaming_wr.any_response -- perf stat failed with
> >> non-zero return code
> >> Testing ocr.partial_streaming_wr.any_response -- perf stat failed
> >> with
> >> non-zero return code
> >> Testing ocr.streaming_wr.any_response -- supported
> >> ...
> >> ```
> >>
> >> Running `perf stat` manually reveals an issue with the event:
> >> ```
> >> $ sudo perf stat -vv -e ocr.full_streaming_wr.any_response -a sleep
> >> 1
> >> Using CPUID GenuineIntel-6-B7-1
> >> Attempt to add: cpu_atom/ocr.full_streaming_wr.any_response/
> >> ..after resolving event:
> >> cpu_atom/event=0xb7,period=0x186a3,umask=0x1,offcore_rsp=0x8000000100
> >> 00/
> >> ocr.full_streaming_wr.any_response ->
> >> cpu_atom/ocr.full_streaming_wr.any_response/
> >> Control descriptor is not initialized
> >> ------------------------------------------------------------
> >> perf_event_attr:
> >> type 10 (cpu_atom)
> >> size 144
> >> ------------------------------------------------------------
> >> perf_event_attr:
> >> type 0 (PERF_TYPE_HARDWARE)
> >> config 0xa00000000
> >> (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
> >> disabled 1
> >> ------------------------------------------------------------
> >> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
> >> ------------------------------------------------------------
> >> perf_event_attr:
> >> type 0 (PERF_TYPE_HARDWARE)
> >> config 0x400000000
> >> (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
> >> disabled 1
> >> ------------------------------------------------------------
> >> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
> >> config 0x1b7
> >> (ocr.demand_data_rd.l3_hit.snoop_hit_no_fwd)
> >> sample_type IDENTIFIER
> >> read_format
> >> TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >> disabled 1
> >> inherit 1
> >> { bp_addr, config1 } 0x800000010000
> >> ------------------------------------------------------------
> >> sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8
> >> sys_perf_event_open failed, error -22
> >> switching off deferred callchain support
> >> Warning:
> >> ocr.full_streaming_wr.any_response event is not supported by the
> >> kernel.
> >> The sys_perf_event_open() syscall failed for event
> >> (ocr.full_streaming_wr.any_response): Invalid argument
> >> "dmesg | grep -i perf" may provide additional information.
> >>
> >> Error:
> >> No supported events found.
> >> The sys_perf_event_open() syscall failed for event
> >> (ocr.full_streaming_wr.any_response): Invalid argument
> >> "dmesg | grep -i perf" may provide additional information.
> >> ```
> >>
> >> This looks like a latent Intel cpu_atom PMU bug. Thomas, wdyt?
>
> Hmm, it looks like the error is caused by an invalid bitmask for the
> OFFCORE_RSP_x MSRs. Currently the valid bitmask of the OFFCORE_RSP_x MSR is
> set to 0x3fffffffff in intel_grt_extra_regs[], while the MSR value is set to
> 0x800000010000 for the ocr.full_streaming_wr.any_response event. Bit 47
> is treated as an invalid bit, which aborts the event creation.
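
The mask arithmetic described above can be checked with a short sketch.
The two constants are the ones quoted in this thread; `invalid_bits` is a
hypothetical helper name used only for illustration, and the check is a
simplification of whatever validation the kernel driver actually performs.

```sh
# Hypothetical helper: print the bits of a value that fall outside a
# valid mask. Arguments: <value> <valid_mask>, both hex strings.
invalid_bits() {
  printf '0x%x\n' "$(( $1 & ~$2 ))"
}

# config1 value from the perf stat -vv output, and the valid mask
# reported for intel_grt_extra_regs[] in this thread.
invalid_bits 0x800000010000 0x3fffffffff

# Bit 47 on its own, for comparison with the result above.
printf '0x%x\n' "$(( 1 << 47 ))"
```

Both commands print 0x800000000000: bit 16 of the value lies inside the
38-bit mask, so bit 47 is the only bit rejected.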
>
> Based on the description in "Table 21-56. MSR_OFFCORE_RSPx Request Type
> Definition" in the SDM, bit 47 should be a valid bit now. Presumably bit 47
> was not a valid bit when the ADL PMU support was added, but it was updated
> and became valid later.
>
> Along with the constant updates of the perf event lists
> (https://github.com/intel/perfmon), we have noticed some mismatches
> between the driver's hardcoded events and the perfmon event list.
> We are currently summarizing the mismatches. Once the list is
> finalized, we will submit a patchset to fix them.
That's great. If it takes too long, perhaps we could just remove the
events for now.
Thanks,
Ian
> Thanks.
>
> > +Dapeng, Zide, Andi
> >
> > Thanks,
> > Tom
> >
> >> Thanks,
> >> Ian
> >>
> >>> Regards,
> >>> Venkat.
> >>>
> >>>
> >>>
> >>>> tools/perf/tests/shell/stat_all_pmu.sh | 20 ++++++++++++++++++++
> >>>> 1 file changed, 20 insertions(+)
> >>>>
> >>>> diff --git a/tools/perf/tests/shell/stat_all_pmu.sh
> >>>> b/tools/perf/tests/shell/stat_all_pmu.sh
> >>>> index 9c466c0efa85..6c4d59cbfa5f 100755
> >>>> --- a/tools/perf/tests/shell/stat_all_pmu.sh
> >>>> +++ b/tools/perf/tests/shell/stat_all_pmu.sh
> >>>> @@ -53,6 +53,26 @@ do
> >>>> continue
> >>>> fi
> >>>>
> >>>> + # check with system wide if it is supported.
> >>>> + output=$(perf stat -a -e "$p" true 2>&1)
> >>>> + stat_result=$?
> >>>> + if echo "$output" | grep -q "not supported"
> >>>> + then
> >>>> + # Event not supported, so ignore.
> >>>> + echo "not supported"
> >>>> + continue
> >>>> + fi
> >>>> +
> >>>> + # checked through possible access limitations and permissions.
> >>>> + # At this step, non-zero return code from "perf stat" needs to
> >>>> + # reported as fail for the user to investigate
> >>>> + if [ $stat_result -ne 0 ]
> >>>> + then
> >>>> + echo "perf stat failed with non-zero return code"
> >>>> + err=1
> >>>> + continue
> >>>> + fi
> >>>> +
> >>>> # We failed to see the event and it is supported. Possibly the
> >>>> workload was
> >>>> # too small so retry with something longer.
> >>>> output=$(perf stat -e "$p" perf bench internals synthesize
> >>>> 2>&1)
> >>>> --
> >>>> 2.47.3
> >>>>