[PATCH] tools/perf/test: Check for perf stat return code in perf all PMU test
Mi, Dapeng
dapeng1.mi at linux.intel.com
Tue Apr 7 10:48:41 AEST 2026
On 4/3/2026 11:39 PM, Ian Rogers wrote:
> On Fri, Apr 3, 2026 at 12:36 AM Mi, Dapeng <dapeng1.mi at linux.intel.com> wrote:
>>
>> On 4/3/2026 1:32 AM, Falcon, Thomas wrote:
>>> On Wed, 2026-04-01 at 13:40 -0700, Ian Rogers wrote:
>>>> On Mon, Mar 23, 2026 at 3:40 AM Venkat <venkat88 at linux.ibm.com>
>>>> wrote:
>>>>>
>>>>>> On 15 Mar 2026, at 4:27 PM, Athira Rajeev
>>>>>> <atrajeev at linux.ibm.com> wrote:
>>>>>>
>>>>>> Currently in "perf all PMU test", for "perf stat -e <event>
>>>>>> true",
>>>>>> below checks are done:
>>>>>> - if return code is zero, look for "not supported" to decide pass
>>>>>> scenario
>>>>>> - check for "not supported" to ignore the event
>>>>>> - looks for "No permission to enable" to skip the event.
>>>>>> - If output has "Bad event name", fail the test.
>>>>>> - Use "Access to performance monitoring and observability
>>>>>> operations is
>>>>>> limited." to ignore fail due to access limitations
>>>>>>
>>>>>> If we failed to see event and it is supported, retries with
>>>>>> longer
>>>>>> workload "perf bench internals synthesize".
>>>>>> - Here if output has <event>, the test is a pass.
>>>>>>
>>>>>> Snippet of code check:
>>>>>> ```
>>>>>> output=$(perf stat -e "$p" perf bench internals synthesize 2>&1)
>>>>>> if echo "$output" | grep -q "$p"
>>>>>> ```
>>>>>> - if output doesn't have event printed in logs, considers it
>>>>>> fail.
>>>>>>
>>>>>> But this results in false pass for events in some cases.
>>>>>> Example, if perf stat fails as below:
>>>>>>
>>>>>> # ./perf stat -e pmu/event/ true
>>>>>> event syntax error: 'pmu/event/'
>>>>>> \___ Bad event or PMU
>>>>>>
>>>>>> Unable to find PMU or event on a PMU of 'pmu'
>>>>>> Run 'perf list' for a list of valid events
>>>>>>
>>>>>> Usage: perf stat [<options>] [<command>]
>>>>>>
>>>>>> -e, --event <event> event selector. use 'perf list' to list
>>>>>> available events
>>>>>> # echo $?
>>>>>> 129
>>>>>>
>>>>>> Since this has a non-zero return code and doesn't contain any of
>>>>>> the fail strings checked by the test, it proceeds to the check
>>>>>> using the longer workload. Since the failure output contains the
>>>>>> event name, the test declares the event as "supported".
>>>>>>
>>>>>> Since all the fail strings can't be added in the check, update
>>>>>> the testcase to check return code before proceeding to longer
>>>>>> workload run.
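
The false pass described above can be reproduced without running perf at all. The following is a minimal sketch, assuming the error text and exit code from the example in the commit message; the variable names old_verdict and new_verdict are hypothetical and do not appear in stat_all_pmu.sh.

```shell
#!/bin/sh
# Event name taken from the example error in the commit message.
p="pmu/event/"

# Simulated output and exit code of a failed "perf stat -e pmu/event/ true".
output="event syntax error: 'pmu/event/'
Unable to find PMU or event on a PMU of 'pmu'"
stat_result=129

# Old behaviour: only grep for the event name. The error text itself
# contains the name, so the grep matches and the event looks supported.
if echo "$output" | grep -q "$p"; then
    old_verdict="supported"
else
    old_verdict="failed"
fi

# Patched behaviour: check the return code before trusting the output.
if [ "$stat_result" -ne 0 ]; then
    new_verdict="failed"
else
    new_verdict="supported"
fi

echo "old: $old_verdict, new: $new_verdict"
```

Checking the return code first avoids having to enumerate every possible failure string in the test.
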
>>>>>>
>>>>>> Another missing scenario is when the event is supported only in
>>>>>> system wide mode.
>>>>>> Example:
>>>>>> # ./perf stat -e pmu/event/ true
>>>>>> Error:
>>>>>> No supported events found.
>>>>>> Unsupported event (pmu/event/H) in per-thread mode, enable
>>>>>> system wide with '-a'.
>>>>>>
>>>>>> Update testcase to check with "perf stat -a -e $p" as well
>>>>>>
>>>>>> Signed-off-by: Athira Rajeev <atrajeev at linux.ibm.com>
>>>>>> ---
>>>>> Tested this patch.
>>>>>
>>>>>
>>>>> With this patch:
>>>>>
>>>>> Testing hv_24x7/CPM_ADJUNCT_INST/ -- perf stat failed with non-zero
>>>>> return code
>>>>> Testing hv_24x7/CPM_ADJUNCT_PCYC/ -- perf stat failed with non-zero
>>>>> return code
>>>>>
>>>>>
>>>>>
>>>>> Tested-by: Venkat Rao Bagalkote <venkat88 at linux.ibm.com>
>>>> Testing on an Intel Alderlake the test is now failing:
>>>> ```
>>>> ...
>>>> Testing offcore_requests_outstanding.l3_miss_demand_data_rd --
>>>> supported
>>>> Testing ocr.full_streaming_wr.any_response -- perf stat failed with
>>>> non-zero return code
>>>> Testing ocr.partial_streaming_wr.any_response -- perf stat failed
>>>> with
>>>> non-zero return code
>>>> Testing ocr.streaming_wr.any_response -- supported
>>>> ...
>>>> ```
>>>>
>>>> Running `perf stat` manually reveals an issue with the event:
>>>> ```
>>>> $ sudo perf stat -vv -e ocr.full_streaming_wr.any_response -a sleep
>>>> 1
>>>> Using CPUID GenuineIntel-6-B7-1
>>>> Attempt to add: cpu_atom/ocr.full_streaming_wr.any_response/
>>>> ..after resolving event:
>>>> cpu_atom/event=0xb7,period=0x186a3,umask=0x1,offcore_rsp=0x8000000100
>>>> 00/
>>>> ocr.full_streaming_wr.any_response ->
>>>> cpu_atom/ocr.full_streaming_wr.any_response/
>>>> Control descriptor is not initialized
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>> type 10 (cpu_atom)
>>>> size 144
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>> type 0 (PERF_TYPE_HARDWARE)
>>>> config 0xa00000000
>>>> (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
>>>> disabled 1
>>>> ------------------------------------------------------------
>>>> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
>>>> ------------------------------------------------------------
>>>> perf_event_attr:
>>>> type 0 (PERF_TYPE_HARDWARE)
>>>> config 0x400000000
>>>> (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
>>>> disabled 1
>>>> ------------------------------------------------------------
>>>> sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
>>>> config 0x1b7
>>>> (ocr.demand_data_rd.l3_hit.snoop_hit_no_fwd)
>>>> sample_type IDENTIFIER
>>>> read_format
>>>> TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>>>> disabled 1
>>>> inherit 1
>>>> { bp_addr, config1 } 0x800000010000
>>>> ------------------------------------------------------------
>>>> sys_perf_event_open: pid -1 cpu 16 group_fd -1 flags 0x8
>>>> sys_perf_event_open failed, error -22
>>>> switching off deferred callchain support
>>>> Warning:
>>>> ocr.full_streaming_wr.any_response event is not supported by the
>>>> kernel.
>>>> The sys_perf_event_open() syscall failed for event
>>>> (ocr.full_streaming_wr.any_response): Invalid argument
>>>> "dmesg | grep -i perf" may provide additional information.
>>>>
>>>> Error:
>>>> No supported events found.
>>>> The sys_perf_event_open() syscall failed for event
>>>> (ocr.full_streaming_wr.any_response): Invalid argument
>>>> "dmesg | grep -i perf" may provide additional information.
>>>> ```
>>>>
>>>> This looks like a latent Intel cpu_atom PMU bug. Thomas, wdyt?
>> Hmm, it looks like the error is caused by an incorrect valid bitmask for
>> the OFFCORE_RSP_x MSRs. Currently the valid bitmask of the OFFCORE_RSP_x
>> MSRs is set to 0x3fffffffff in intel_grt_extra_regs[], while the MSR value
>> is set to 0x800000010000 for the ocr.full_streaming_wr.any_response event.
>> Bit 47 is treated as an invalid bit, so the event creation is aborted.
>>
>> Based on the description in "Table 21-56. MSR_OFFCORE_RSPx Request Type
>> Definition" in the SDM, bit 47 should be a valid bit now. Presumably bit 47
>> was not a valid bit when the ADL PMU support was added, but it was updated
>> and became valid later.
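
The mismatch can be checked with plain shell arithmetic. This is only a sketch of the arithmetic using the two values quoted above, not the kernel's actual validation code; the variable names are made up for illustration.

```shell
#!/bin/sh
# Valid bits as currently set in intel_grt_extra_regs[] (bits 0-37).
valid_mask=$((0x3fffffffff))
# OFFCORE_RSP value requested by ocr.full_streaming_wr.any_response.
config1=$((0x800000010000))

# Bits requested by the event that fall outside the valid mask.
invalid=$((config1 & ~valid_mask))
printf 'invalid bits: 0x%x\n' "$invalid"

# Check bit 47 specifically.
bit47=$(( (config1 >> 47) & 1 ))
echo "bit 47 set: $bit47"
```

Bit 16 falls inside the mask, so bit 47 is the only bit that gets rejected.
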
>>
>> As the perf event lists (https://github.com/intel/perfmon) are constantly
>> updated, we have noticed some mismatches between the events hardcoded in
>> the driver and the perfmon event list. We are currently summarizing these
>> mismatches. Once the summary is finalized, we will submit a patchset to
>> fix them.
> That's great, if it takes too long perhaps we could just remove the
> events for now.
I expect it won't take too long. I plan to post the patchset in the next
release cycle. The code changes are simple, but they need a lot of time to
verify on all kinds of platforms. Thanks.
>
> Thanks,
> Ian
>
>> Thanks.
>>
>>> +Dapeng, Zide, Andi
>>>
>>> Thanks,
>>> Tom
>>>
>>>> Thanks,
>>>> Ian
>>>>
>>>>> Regards,
>>>>> Venkat.
>>>>>
>>>>>
>>>>>
>>>>>> tools/perf/tests/shell/stat_all_pmu.sh | 20 ++++++++++++++++++++
>>>>>> 1 file changed, 20 insertions(+)
>>>>>>
>>>>>> diff --git a/tools/perf/tests/shell/stat_all_pmu.sh
>>>>>> b/tools/perf/tests/shell/stat_all_pmu.sh
>>>>>> index 9c466c0efa85..6c4d59cbfa5f 100755
>>>>>> --- a/tools/perf/tests/shell/stat_all_pmu.sh
>>>>>> +++ b/tools/perf/tests/shell/stat_all_pmu.sh
>>>>>> @@ -53,6 +53,26 @@ do
>>>>>> continue
>>>>>> fi
>>>>>>
>>>>>> + # check with system wide if it is supported.
>>>>>> + output=$(perf stat -a -e "$p" true 2>&1)
>>>>>> + stat_result=$?
>>>>>> + if echo "$output" | grep -q "not supported"
>>>>>> + then
>>>>>> + # Event not supported, so ignore.
>>>>>> + echo "not supported"
>>>>>> + continue
>>>>>> + fi
>>>>>> +
>>>>>> + # checked through possible access limitations and permissions.
>>>>>> + # At this step, non-zero return code from "perf stat" needs to
>>>>>> + # reported as fail for the user to investigate
>>>>>> + if [ $stat_result -ne 0 ]
>>>>>> + then
>>>>>> + echo "perf stat failed with non-zero return code"
>>>>>> + err=1
>>>>>> + continue
>>>>>> + fi
>>>>>> +
>>>>>> # We failed to see the event and it is supported. Possibly the
>>>>>> workload was
>>>>>> # too small so retry with something longer.
>>>>>> output=$(perf stat -e "$p" perf bench internals synthesize
>>>>>> 2>&1)
>>>>>> --
>>>>>> 2.47.3
>>>>>>