[PATCH] tools/perf/test: Check for perf stat return code in perf all PMU test
Mi, Dapeng
dapeng1.mi at linux.intel.com
Fri Apr 3 18:36:20 AEDT 2026
On 4/3/2026 1:32 AM, Falcon, Thomas wrote:
> On Wed, 2026-04-01 at 13:40 -0700, Ian Rogers wrote:
>> On Mon, Mar 23, 2026 at 3:40 AM Venkat <venkat88 at linux.ibm.com>
>> wrote:
>>>
>>>
>>>> On 15 Mar 2026, at 4:27 PM, Athira Rajeev
>>>> <atrajeev at linux.ibm.com> wrote:
>>>>
>>>> Currently in "perf all PMU test", for "perf stat -e <event>
>>>> true",
>>>> below checks are done:
>>>> - if return code is zero, look for "not supported" to decide pass
>>>> scenario
>>>> - check for "not supported" to ignore the event
>>>> - looks for "No permission to enable" to skip the event.
>>>> - If output has "Bad event name", fail the test.
>>>> - Use "Access to performance monitoring and observability
>>>> operations is
>>>> limited." to ignore fail due to access limitations
>>>>
>>>> If we failed to see event and it is supported, retries with
>>>> longer
>>>> workload "perf bench internals synthesize".
>>>> - Here if output has <event>, the test is a pass.
>>>>
>>>> Snippet of code check:
>>>> ```
>>>> output=$(perf stat -e "$p" perf bench internals synthesize 2>&1)
>>>> if echo "$output" | grep -q "$p"
>>>> ```
>>>> - if output doesn't have event printed in logs, considers it
>>>> fail.
>>>>
>>>> But this results in false pass for events in some cases.
>>>> Example, if perf stat fails as below:
>>>>
>>>> # ./perf stat -e pmu/event/ true
>>>> event syntax error: 'pmu/event/'
>>>> \___ Bad event or PMU
>>>>
>>>> Unable to find PMU or event on a PMU of 'pmu'
>>>> Run 'perf list' for a list of valid events
>>>>
>>>> Usage: perf stat [<options>] [<command>]
>>>>
>>>> -e, --event <event> event selector. use 'perf list' to list
>>>> available events
>>>> # echo $?
>>>> 129
>>>>
>>>> Since this has non-zero return code and doesn't have the
>>>> fail strings being checked in the test, it will enter check using
>>>> longer workload. and since the output fail log has event, it
>>>> declares test as "supported".
>>>>
>>>> Since all the fail strings can't be added in the check, update
>>>> the testcase to check return code before proceeding to longer
>>>> workload run.
>>>>
>>>> Another missing scenario is when system wide monitoring is
>>>> supported
>>>> example:
>>>> # ./perf stat -e pmu/event/ true
>>>> Error:
>>>> No supported events found.
>>>> Unsupported event (pmu/event/H) in per-thread mode, enable
>>>> system wide with '-a'.
>>>>
>>>> Update testcase to check with "perf stat -a -e $p" as well
>>>>
>>>> Signed-off-by: Athira Rajeev <atrajeev at linux.ibm.com>
>>>> ---
>>> Tested this patch.
>>>
>>>
>>> With this patch:
>>>
>>> Testing hv_24x7/CPM_ADJUNCT_INST/ -- perf stat failed with non-zero
>>> return code
>>> Testing hv_24x7/CPM_ADJUNCT_PCYC/ -- perf stat failed with non-zero
>>> return code
>>>
>>>
>>>
>>> Tested-by: Venkat Rao Bagalkote <venkat88 at linux.ibm.com>
>> Testing on an Intel Alderlake the test is now failing:
>> ```
>> ...
>> Testing offcore_requests_outstanding.l3_miss_demand_data_rd --
>> supported
>> Testing ocr.full_streaming_wr.any_response -- perf stat failed with
>> non-zero return code
>> Testing ocr.partial_streaming_wr.any_response -- perf stat failed
>> with
>> non-zero return code
>> Testing ocr.streaming_wr.any_response -- supported
>> ...
>> ```
>>
>> Running `perf stat` manually reveals an issue with the event:
>> ```
>> $ sudo perf stat -vv -e ocr.full_streaming_wr.any_response -a sleep
>> 1
>> Using CPUID GenuineIntel-6-B7-1
>> Attempt to add: cpu_atom/ocr.full_streaming_wr.any_response/
>> ..after resolving event:
>> cpu_atom/event=0xb7,period=0x186a3,umask=0x1,offcore_rsp=0x8000000100
>> 00/
>> ocr.full_streaming_wr.any_response ->
>> cpu_atom/ocr.full_streaming_wr.any_response/
>> Control descriptor is not initialized
>> ------------------------------------------------------------
>> perf_event_attr:
>> type 10 (cpu_atom)
>> size 144
>> ------------------------------------------------------------
>> perf_event_attr:
>> type 0 (PERF_TYPE_HARDWARE)
>> config 0xa00000000
>> (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
>> disabled 1
>> ------------------------------------------------------------
>> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
>> ------------------------------------------------------------
>> perf_event_attr:
>> type 0 (PERF_TYPE_HARDWARE)
>> config 0x400000000
>> (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
>> disabled 1
>> ------------------------------------------------------------
>> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
>> config 0x1b7
>> (ocr.demand_data_rd.l3_hit.snoop_hit_no_fwd)
>> sample_type IDENTIFIER
>> read_format
>> TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>> disabled 1
>> inherit 1
>> { bp_addr, config1 } 0x800000010000
>> ------------------------------------------------------------
>> sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8
>> sys_perf_event_open failed, error -22
>> switching off deferred callchain support
>> Warning:
>> ocr.full_streaming_wr.any_response event is not supported by the
>> kernel.
>> The sys_perf_event_open() syscall failed for event
>> (ocr.full_streaming_wr.any_response): Invalid argument
>> "dmesg | grep -i perf" may provide additional information.
>>
>> Error:
>> No supported events found.
>> The sys_perf_event_open() syscall failed for event
>> (ocr.full_streaming_wr.any_response): Invalid argument
>> "dmesg | grep -i perf" may provide additional information.
>> ```
>>
>> This looks like a latent Intel cpu_atom PMU bug. Thomas, wdyt?
Hmm, it looks like the error is caused by an invalid bitmask for the
OFFCORE_RSP_x MSRs. Currently the valid bitmask for the OFFCORE_RSP_x MSRs is
set to 0x3fffffffff in intel_grt_extra_regs[], while the MSR value is set to
0x800000010000 for the ocr.full_streaming_wr.any_response event. Bit 47 is
treated as an invalid bit, so the event creation is aborted.
Based on "Table 21-56. MSR_OFFCORE_RSPx Request Type Definition" in the SDM,
bit 47 should be a valid bit now. I suppose bit 47 was not a valid bit when
the ADL PMU support was added, but it was updated and became valid later.
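
To make the arithmetic concrete, here is a minimal shell sketch of the check
described above. It only illustrates the masking logic and is not the kernel
code; the function name check_offcore_rsp is hypothetical, and the two
constants are the ones quoted in this thread.

```shell
#!/bin/bash
# Hypothetical helper (not part of perf or the kernel): report which bits
# of an OFFCORE_RSP value fall outside a given valid mask.
check_offcore_rsp() {
    local value=$(( $1 ))
    local valid_mask=$(( $2 ))
    # Bits set in the value but cleared in the mask are the rejected ones.
    local invalid=$(( value & ~valid_mask ))

    if [ "$invalid" -ne 0 ]; then
        printf 'invalid bits: 0x%x\n' "$invalid"
        return 1
    fi
    echo "ok"
}

# Example with the values from this thread:
#   check_offcore_rsp 0x800000010000 0x3fffffffff
# prints "invalid bits: 0x800000000000", i.e. bit 47.
```

Bit 16 of the value is inside the mask, so only bit 47 is reported, which
matches the -22 (EINVAL) seen from sys_perf_event_open above.
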
With the constant updates to the perf event lists
(https://github.com/intel/perfmon), we have noticed a number of mismatches
between the events hardcoded in the driver and the perfmon event list.
We are currently summarizing these mismatches. Once the list is finalized,
we will submit a patchset to fix them.
Thanks.
> +Dapeng, Zide, Andi
>
> Thanks,
> Tom
>
>> Thanks,
>> Ian
>>
>>> Regards,
>>> Venkat.
>>>
>>>
>>>
>>>> tools/perf/tests/shell/stat_all_pmu.sh | 20 ++++++++++++++++++++
>>>> 1 file changed, 20 insertions(+)
>>>>
>>>> diff --git a/tools/perf/tests/shell/stat_all_pmu.sh
>>>> b/tools/perf/tests/shell/stat_all_pmu.sh
>>>> index 9c466c0efa85..6c4d59cbfa5f 100755
>>>> --- a/tools/perf/tests/shell/stat_all_pmu.sh
>>>> +++ b/tools/perf/tests/shell/stat_all_pmu.sh
>>>> @@ -53,6 +53,26 @@ do
>>>> continue
>>>> fi
>>>>
>>>> + # check with system wide if it is supported.
>>>> + output=$(perf stat -a -e "$p" true 2>&1)
>>>> + stat_result=$?
>>>> + if echo "$output" | grep -q "not supported"
>>>> + then
>>>> + # Event not supported, so ignore.
>>>> + echo "not supported"
>>>> + continue
>>>> + fi
>>>> +
>>>> + # checked through possible access limitations and permissions.
>>>> + # At this step, non-zero return code from "perf stat" needs to
>>>> + # reported as fail for the user to investigate
>>>> + if [ $stat_result -ne 0 ]
>>>> + then
>>>> + echo "perf stat failed with non-zero return code"
>>>> + err=1
>>>> + continue
>>>> + fi
>>>> +
>>>> # We failed to see the event and it is supported. Possibly the
>>>> workload was
>>>> # too small so retry with something longer.
>>>> output=$(perf stat -e "$p" perf bench internals synthesize
>>>> 2>&1)
>>>> --
>>>> 2.47.3
>>>>