[PATCH 1/3] perf/core: Flush PMU internal buffers for per-CPU events

Liang, Kan kan.liang at linux.intel.com
Wed Nov 25 03:04:31 AEDT 2020



On 11/24/2020 12:42 AM, Madhavan Srinivasan wrote:
> 
> On 11/24/20 10:21 AM, Namhyung Kim wrote:
>> Hello,
>>
>> On Mon, Nov 23, 2020 at 8:00 PM Michael Ellerman <mpe at ellerman.id.au> 
>> wrote:
>>> Namhyung Kim <namhyung at kernel.org> writes:
>>>> Hi Peter and Kan,
>>>>
>>>> (Adding PPC folks)
>>>>
>>>> On Tue, Nov 17, 2020 at 2:01 PM Namhyung Kim <namhyung at kernel.org> 
>>>> wrote:
>>>>> Hello,
>>>>>
>>>>> On Thu, Nov 12, 2020 at 4:54 AM Liang, Kan 
>>>>> <kan.liang at linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 11/11/2020 11:25 AM, Peter Zijlstra wrote:
>>>>>>> On Mon, Nov 09, 2020 at 09:49:31AM -0500, Liang, Kan wrote:
>>>>>>>
>>>>>>>> - When large PEBS was introduced (9c964efa4330), sched_task()
>>>>>>>> should be invoked to flush the PEBS buffer on each context
>>>>>>>> switch. However, the perf_sched_events in account_event() was not
>>>>>>>> updated accordingly. The perf_event_task_sched_* hooks are never
>>>>>>>> invoked for a pure per-CPU context; only per-task events work.
>>>>>>>>      At that time, the perf_pmu_sched_task() was outside of
>>>>>>>> perf_event_context_sched_in/out. It means that perf had to call
>>>>>>>> perf_pmu_disable() twice for per-task events.
>>>>>>>> - Patch 1 tries to fix the broken per-CPU events. The CPU context
>>>>>>>> cannot be retrieved from task->perf_event_ctxp, so it has to be
>>>>>>>> tracked in the sched_cb_list. Yes, the code is very similar to the
>>>>>>>> original code, but it is actually new code for per-CPU events. The
>>>>>>>> optimization for per-task events is still kept.
>>>>>>>>     For the case which has both a CPU context and a task context,
>>>>>>>> yes, the __perf_pmu_sched_task() in this patch is not invoked,
>>>>>>>> because sched_task() only needs to be invoked once per context
>>>>>>>> switch; it will eventually be invoked in the task context.
>>>>>>> The thing is; your first two patches rely on PERF_ATTACH_SCHED_CB 
>>>>>>> and
>>>>>>> only set that for large pebs. Are you sure the other users (Intel 
>>>>>>> LBR
>>>>>>> and PowerPC BHRB) don't need it?
>>>>>> I didn't set it for LBR, because perf_sched_events is always
>>>>>> enabled for LBR. But, yes, we should explicitly set
>>>>>> PERF_ATTACH_SCHED_CB for LBR.
>>>>>>
>>>>>>          if (has_branch_stack(event))
>>>>>>                  inc = true;
>>>>>>
>>>>>>> If they indeed do not require the pmu::sched_task() callback for CPU
>>>>>>> events, then I still think the whole perf_sched_cb_{inc,dec}() 
>>>>>>> interface
>>>>>> No, LBR requires the pmu::sched_task() callback for CPU events.
>>>>>>
>>>>>> Now, the LBR registers have to be reset on sched-in even for CPU
>>>>>> events.
>>>>>>
>>>>>> To fix the shorter LBR callstack issue for CPU events, we also 
>>>>>> need to
>>>>>> save/restore LBRs in pmu::sched_task().
>>>>>> https://lore.kernel.org/lkml/1578495789-95006-4-git-send-email-kan.liang@linux.intel.com/ 
>>>>>>
>>>>>>
>>>>>>> is confusing at best.
>>>>>>>
>>>>>>> Can't we do something like this instead?
>>>>>>>
>>>>>> I think the below patch may have two issues.
>>>>>> - PERF_ATTACH_SCHED_CB is now required for LBR (maybe PowerPC BHRB
>>>>>> as well).
>>>>>> - We may disable large PEBS later if not all PEBS events support
>>>>>> large PEBS. The PMU needs a way to notify the generic code to
>>>>>> decrease nr_sched_task.
>>>>> Any updates on this?  I've reviewed and tested Kan's patches
>>>>> and they all look good.
>>>>>
>>>>> Maybe we can talk to PPC folks to confirm the BHRB case?
>>>> Can we move this forward?  I saw patch 3/3 adds PERF_ATTACH_SCHED_CB
>>>> for PowerPC too.  But it'd be nice if the PPC folks could confirm the
>>>> change.
>>> Sorry I've read the whole thread, but I'm still not entirely sure I
>>> understand the question.
>> Thanks for your time and sorry about not being clear enough.
>>
>> We found that per-CPU events do not get pmu::sched_task()
>> called on context switches.  So PERF_ATTACH_SCHED_CB was
>> added to tell the core logic that it needs to invoke the
>> callback.
>>
>> Patch 3/3 added the flag to PPC (for BHRB) along with other
>> changes (I think it should be split as in patch 2/3), and we
>> want to get ACKs from the PPC folks.
> 
> Sorry for the delay.
> 
> I guess first it will be better to split the ppc change into a separate
> patch,

Both PPC and X86 invoke perf_sched_cb_inc() directly, and the patch
changes the parameters of perf_sched_cb_inc(). I think we have to
update the PPC and X86 code together. Otherwise there would be a
compile error if someone applied only the change to perf_sched_cb_inc()
but forgot to apply the corresponding PPC or X86 changes.
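
For reference, the shape of the interface change and the way a caller
builds the state argument look roughly like the sketch below. This is
illustrative only, pieced together from the hunk further down; the
exact prototypes and PERF_SCHED_CB_* bits are defined in patch 1/3.

/*
 * Illustrative only: prototypes and flag names as used by this series
 * (patch 1/3 defines them); the BHRB hunk below follows this pattern.
 */
void perf_sched_cb_inc(struct pmu *pmu, int state);
void perf_sched_cb_dec(struct pmu *pmu, int state);

/*
 * In an arch driver (x86 PEBS/LBR or PPC BHRB), build the state from
 * what the PMU needs, and mark per-CPU (non-task) events so the core
 * invokes pmu::sched_task() for them too:
 */
	int state = PERF_SCHED_CB_SW_IN;

	if (!(event->attach_state & PERF_ATTACH_TASK))
		state |= PERF_SCHED_CB_CPU;

	perf_sched_cb_inc(event->ctx->pmu, state);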

> 
> secondly, we are missing the changes needed in the power_pmu_bhrb_disable()
> 
> where perf_sched_cb_dec() needs the "state" to be included.
> 

Ah, right. The below patch should fix the issue.

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index bced502f64a1..6756d1602a67 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -391,13 +391,18 @@ static void power_pmu_bhrb_enable(struct perf_event *event)
  static void power_pmu_bhrb_disable(struct perf_event *event)
  {
  	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+	int state = PERF_SCHED_CB_SW_IN;

  	if (!ppmu->bhrb_nr)
  		return;

  	WARN_ON_ONCE(!cpuhw->bhrb_users);
  	cpuhw->bhrb_users--;
-	perf_sched_cb_dec(event->ctx->pmu);
+
+	if (!(event->attach_state & PERF_ATTACH_TASK))
+		state |= PERF_SCHED_CB_CPU;
+
+	perf_sched_cb_dec(event->ctx->pmu, state);

  	if (!cpuhw->disabled && !cpuhw->bhrb_users) {
  		/* BHRB cannot be turned off when other

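For completeness, the core-side dispatch that patch 1/3 adds for
per-CPU events, which is what makes the PERF_ATTACH_SCHED_CB and state
bookkeeping above matter, looks roughly like the sketch below. This is
only an illustration based on the description earlier in the thread;
the authoritative code is in patch 1/3.

/*
 * Rough sketch of the per-CPU dispatch described earlier in the
 * thread (the real implementation is in patch 1/3): walk the CPU
 * contexts that registered a sched_task() callback and invoke it,
 * except where a task context will do it anyway.
 */
static void perf_pmu_sched_task(struct task_struct *prev,
				struct task_struct *next,
				bool sched_in)
{
	struct perf_cpu_context *cpuctx;

	if (prev == next)
		return;

	list_for_each_entry(cpuctx, this_cpu_ptr(&sched_cb_list),
			    sched_cb_entry) {
		/* Handled once from the task context path instead */
		if (cpuctx->task_ctx)
			continue;

		__perf_pmu_sched_task(cpuctx, sched_in);
	}
}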


Thanks,
Kan

