<html><head></head><body dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="ApplePlainTextBody"><div class="ApplePlainTextBody"><br><br><blockquote type="cite">On 20-Jan-2021, at 8:39 AM, Nicholas Piggin <npiggin@gmail.com> wrote:<br><br>Excerpts from Athira Rajeev's message of January 19, 2021 8:24 pm:<br><blockquote type="cite"><br><br><blockquote type="cite">On 15-Jan-2021, at 10:19 PM, Nicholas Piggin <npiggin@gmail.com> wrote:<br><br>This is required in order to allow more significant differences between<br>NMI type interrupt handlers and regular asynchronous handlers.<br><br>Signed-off-by: Nicholas Piggin <npiggin@gmail.com><br>---<br>arch/powerpc/kernel/traps.c | 31 +++++++++++++++++++++++++++-<br>arch/powerpc/perf/core-book3s.c | 35 ++------------------------------<br>arch/powerpc/perf/core-fsl-emb.c | 25 -----------------------<br>3 files changed, 32 insertions(+), 59 deletions(-)<br></blockquote><br>Hi Nick,<br><br>Reviewed this perf patch which moves the nmi_enter/irq_enter to traps.c and code-wise changes<br>for perf looks fine to me. Further, I was trying to test this by picking the whole Patch series on top<br>of 5.11.0-rc3 kernel and using below test scenario:<br><br>Intention of testcase is to check whether the perf nmi and asynchronous interrupts are getting<br>captured as expected. My test kernel module below tries to create one of performance monitor<br>counter ( PMC6 ) overflow between local_irq_save/local_irq_restore.<br>[ Here interrupts are disabled and has IRQS_DISABLED as regs->softe ].<br>I am expecting the PMI to come as an NMI in this case. I am also configuring ftrace so that I<br>can confirm whether it comes as an NMI or a replayed interrupt from the trace.<br><br>Environment :One CPU online<br>prerequisite for ftrace:<br># cd /sys/kernel/debug/tracing<br># echo 100 > buffer_percent<br># echo 200000 > buffer_size_kb <br># echo ppc-tb > trace_clock<br># echo function > current_tracer<br><br>Part of sample kernel test module to trigger a PMI between <br>local_irq_save and local_irq_restore:<br><br><<>><br>static ulong delay = 1;<br>static void busy_wait(ulong time)<br>{<br> udelay(delay);<br>}<br>static __always_inline void irq_test(void)<br>{<br> unsigned long flags;<br> local_irq_save(flags);<br> trace_printk("IN IRQ TEST\n");<br> mtspr(SPRN_MMCR0, 0x80000000);<br> mtspr(SPRN_PMC6, 0x80000000 - 100);<br> mtspr(SPRN_MMCR0, 0x6004000);<br> busy_wait(delay);<br> trace_printk("IN IRQ TEST DONE\n");<br> local_irq_restore(flags);<br> mtspr(SPRN_MMCR0, 0x80000000);<br> mtspr(SPRN_PMC6, 0);<br>}<br><<>><br><br>But this resulted in soft lockup, Adding a snippet of call-trace below:<br></blockquote><br>I'm not getting problems with your test case, but I am testing in a VM <br>so may not be getting device interrupts so much (your 0xea0 interrupt).<br>I'll try test on bare metal next. Does it reproduce easily, and <br>unpatched kernel definitely does not have the problem?<br><br>A different issue, after my series, I don't see the perf "NMI" interrupt <br>in any traces under local_irq_disable, because it's disabling ftrace the<br>same as the other NMI interrupts, so your test wouldn't see them.<br><br>I don't know if this is exactly right. Can tracing cope with such NMIs<br>okay even if it's interrupted in the middle of the tracing code? Machine<br>check at least has to disable tracing because it's in real-mode, machine<br>check and sreset also want to disable tracing because something is going<br>wrong and we don't want to make it worse (e.g., to get a cleaner crash).<br>Should we still permit tracing of perf NMIs?<br></blockquote><br><br>Hi Nick,<br><br>Having tracing of perf NMI's enabled is actually helpful for debugging PMU issues. <br>Especially since for perf, we decide at runtime whether PMI is going to be delivered<br>as an NMI or an asynchronous interrupt. So having the PMI captured in trace will be good.<br><br>Also while debugging interrupt/overflow issues captured with testsuites like perf fuzzer, <br>where lot of test combinations are run, having the PMI's ( nmi and async ) in traces will<br>help in debug which otherwise will need to be analysed by adding printk's etc.<br><br>Thanks<br>Athira<br><br><blockquote type="cite"><br><blockquote type="cite"><br>[ 883.900762] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]<br>[ 883.901381] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE 5.11.0-rc3+ #34<br>--<br>[ 883.901999] NIP [c0000000000168d0] replay_soft_interrupts+0x70/0x2f0<br>[ 883.902032] LR [c00000000003b2b8] interrupt_exit_kernel_prepare+0x1e8/0x240<br>[ 883.902063] Call Trace:<br>[ 883.902085] [c000000001c96f50] [c00000000003b2b8] interrupt_exit_kernel_prepare+0x1e8/0x240 (unreliable)<br>[ 883.902139] [c000000001c96fb0] [c00000000000fd88] interrupt_return+0x158/0x200<br>[ 883.902185] --- interrupt: ea0 at __rb_reserve_next+0xc0/0x5b0<br>[ 883.902224] NIP: c0000000002d8980 LR: c0000000002d897c CTR: c0000000001aad90<br>[ 883.902262] REGS: c000000001c97020 TRAP: 0ea0 Tainted: G OE (5.11.0-rc3+)<br>[ 883.902301] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28000484 XER: 20040000<br>[ 883.902387] CFAR: c00000000000fe00 IRQMASK: 0 <br>--<br>[ 883.902757] NIP [c0000000002d8980] __rb_reserve_next+0xc0/0x5b0<br>[ 883.902786] LR [c0000000002d897c] __rb_reserve_next+0xbc/0x5b0<br>[ 883.902824] --- interrupt: ea0<br>[ 883.902848] [c000000001c97360] [c0000000002d8fcc] ring_buffer_lock_reserve+0x15c/0x580<br>[ 883.902894] [c000000001c973f0] [c0000000002e82fc] trace_function+0x4c/0x1c0<br>[ 883.902930] [c000000001c97440] [c0000000002f6f50] function_trace_call+0x140/0x190<br>[ 883.902976] [c000000001c97470] [c00000000007d6f8] ftrace_call+0x4/0x44<br>[ 883.903021] [c000000001c97660] [c000000000dcf70c] __do_softirq+0x15c/0x3d4<br>[ 883.903066] [c000000001c97750] [c00000000015fc68] irq_exit+0x198/0x1b0<br>[ 883.903102] [c000000001c97780] [c000000000dc1790] timer_interrupt+0x170/0x3b0<br>[ 883.903148] [c000000001c977e0] [c000000000016994] replay_soft_interrupts+0x134/0x2f0<br>[ 883.903193] [c000000001c979d0] [c00000000003b2b8] interrupt_exit_kernel_prepare+0x1e8/0x240<br>[ 883.903240] [c000000001c97a30] [c00000000000fd88] interrupt_return+0x158/0x200<br>[ 883.903276] --- interrupt: ea0 at arch_local_irq_restore+0x70/0xc0<br></blockquote><br>You got a 0xea0 interrupt in the ftrace code. I wonder where it is <br>looping. Do you see more soft lockup messages?<br><br>Thanks,<br>Nick<br><br><br><blockquote type="cite"><br>Thanks<br>Athira<br><blockquote type="cite"><br>diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c<br>index 738370519937..bd55f201115b 100644<br>--- a/arch/powerpc/kernel/traps.c<br>+++ b/arch/powerpc/kernel/traps.c<br>@@ -1892,11 +1892,40 @@ void vsx_unavailable_tm(struct pt_regs *regs)<br>}<br>#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */<br><br>-void performance_monitor_exception(struct pt_regs *regs)<br>+static void performance_monitor_exception_nmi(struct pt_regs *regs)<br>+{<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>nmi_enter();<br>+<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>__this_cpu_inc(irq_stat.pmu_irqs);<br>+<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>perf_irq(regs);<br>+<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>nmi_exit();<br>+}<br>+<br>+static void performance_monitor_exception_async(struct pt_regs *regs)<br>{<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>irq_enter();<br>+<br><span class="Apple-tab-span" style="white-space:pre"> </span>__this_cpu_inc(irq_stat.pmu_irqs);<br><br><span class="Apple-tab-span" style="white-space:pre"> </span>perf_irq(regs);<br>+<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>irq_exit();<br>+}<br>+<br>+void performance_monitor_exception(struct pt_regs *regs)<br>+{<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>/*<br>+<span class="Apple-tab-span" style="white-space:pre"> </span> * On 64-bit, if perf interrupts hit in a local_irq_disable<br>+<span class="Apple-tab-span" style="white-space:pre"> </span> * (soft-masked) region, we consider them as NMIs. This is required to<br>+<span class="Apple-tab-span" style="white-space:pre"> </span> * prevent hash faults on user addresses when reading callchains (and<br>+<span class="Apple-tab-span" style="white-space:pre"> </span> * looks better from an irq tracing perspective).<br>+<span class="Apple-tab-span" style="white-space:pre"> </span> */<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>if (IS_ENABLED(CONFIG_PPC64) && unlikely(arch_irq_disabled_regs(regs)))<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>performance_monitor_exception_nmi(regs);<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>else<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>performance_monitor_exception_async(regs);<br>}<br><br>#ifdef CONFIG_PPC_ADV_DEBUG_REGS<br>diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c<br>index 28206b1fe172..9fd06010e8b6 100644<br>--- a/arch/powerpc/perf/core-book3s.c<br>+++ b/arch/powerpc/perf/core-book3s.c<br>@@ -110,10 +110,6 @@ static inline void perf_read_regs(struct pt_regs *regs)<br>{<br><span class="Apple-tab-span" style="white-space:pre"> </span>regs->result = 0;<br>}<br>-static inline int perf_intr_is_nmi(struct pt_regs *regs)<br>-{<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>return 0;<br>-}<br><br>static inline int siar_valid(struct pt_regs *regs)<br>{<br>@@ -353,15 +349,6 @@ static inline void perf_read_regs(struct pt_regs *regs)<br><span class="Apple-tab-span" style="white-space:pre"> </span>regs->result = use_siar;<br>}<br><br>-/*<br>- * If interrupts were soft-disabled when a PMU interrupt occurs, treat<br>- * it as an NMI.<br>- */<br>-static inline int perf_intr_is_nmi(struct pt_regs *regs)<br>-{<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>return (regs->softe & IRQS_DISABLED);<br>-}<br>-<br>/*<br>* On processors like P7+ that have the SIAR-Valid bit, marked instructions<br>* must be sampled only if the SIAR-valid bit is set.<br>@@ -2279,7 +2266,6 @@ static void __perf_event_interrupt(struct pt_regs *regs)<br><span class="Apple-tab-span" style="white-space:pre"> </span>struct perf_event *event;<br><span class="Apple-tab-span" style="white-space:pre"> </span>unsigned long val[8];<br><span class="Apple-tab-span" style="white-space:pre"> </span>int found, active;<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>int nmi;<br><br><span class="Apple-tab-span" style="white-space:pre"> </span>if (cpuhw->n_limited)<br><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>freeze_limited_counters(cpuhw, mfspr(SPRN_PMC5),<br>@@ -2287,18 +2273,6 @@ static void __perf_event_interrupt(struct pt_regs *regs)<br><br><span class="Apple-tab-span" style="white-space:pre"> </span>perf_read_regs(regs);<br><br>-<span class="Apple-tab-span" style="white-space:pre"> </span>/*<br>-<span class="Apple-tab-span" style="white-space:pre"> </span> * If perf interrupts hit in a local_irq_disable (soft-masked) region,<br>-<span class="Apple-tab-span" style="white-space:pre"> </span> * we consider them as NMIs. This is required to prevent hash faults on<br>-<span class="Apple-tab-span" style="white-space:pre"> </span> * user addresses when reading callchains. See the NMI test in<br>-<span class="Apple-tab-span" style="white-space:pre"> </span> * do_hash_page.<br>-<span class="Apple-tab-span" style="white-space:pre"> </span> */<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>nmi = perf_intr_is_nmi(regs);<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>if (nmi)<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>nmi_enter();<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>else<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>irq_enter();<br>-<br><span class="Apple-tab-span" style="white-space:pre"> </span>/* Read all the PMCs since we'll need them a bunch of times */<br><span class="Apple-tab-span" style="white-space:pre"> </span>for (i = 0; i < ppmu->n_counter; ++i)<br><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>val[i] = read_pmc(i + 1);<br>@@ -2344,8 +2318,8 @@ static void __perf_event_interrupt(struct pt_regs *regs)<br><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>}<br><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>}<br><span class="Apple-tab-span" style="white-space:pre"> </span>}<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>if (!found && !nmi && printk_ratelimit())<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>printk(KERN_WARNING "Can't find PMC that caused IRQ\n");<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>if (unlikely(!found) && !arch_irq_disabled_regs(regs))<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>printk_ratelimited(KERN_WARNING "Can't find PMC that caused IRQ\n");<br><br><span class="Apple-tab-span" style="white-space:pre"> </span>/*<br><span class="Apple-tab-span" style="white-space:pre"> </span> * Reset MMCR0 to its normal value. This will set PMXE and<br>@@ -2355,11 +2329,6 @@ static void __perf_event_interrupt(struct pt_regs *regs)<br><span class="Apple-tab-span" style="white-space:pre"> </span> * we get back out of this interrupt.<br><span class="Apple-tab-span" style="white-space:pre"> </span> */<br><span class="Apple-tab-span" style="white-space:pre"> </span>write_mmcr0(cpuhw, cpuhw->mmcr.mmcr0);<br>-<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>if (nmi)<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>nmi_exit();<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>else<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>irq_exit();<br>}<br><br>static void perf_event_interrupt(struct pt_regs *regs)<br>diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c<br>index e0e7e276bfd2..ee721f420a7b 100644<br>--- a/arch/powerpc/perf/core-fsl-emb.c<br>+++ b/arch/powerpc/perf/core-fsl-emb.c<br>@@ -31,19 +31,6 @@ static atomic_t num_events;<br>/* Used to avoid races in calling reserve/release_pmc_hardware */<br>static DEFINE_MUTEX(pmc_reserve_mutex);<br><br>-/*<br>- * If interrupts were soft-disabled when a PMU interrupt occurs, treat<br>- * it as an NMI.<br>- */<br>-static inline int perf_intr_is_nmi(struct pt_regs *regs)<br>-{<br>-#ifdef __powerpc64__<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>return (regs->softe & IRQS_DISABLED);<br>-#else<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>return 0;<br>-#endif<br>-}<br>-<br>static void perf_event_interrupt(struct pt_regs *regs);<br><br>/*<br>@@ -659,13 +646,6 @@ static void perf_event_interrupt(struct pt_regs *regs)<br><span class="Apple-tab-span" style="white-space:pre"> </span>struct perf_event *event;<br><span class="Apple-tab-span" style="white-space:pre"> </span>unsigned long val;<br><span class="Apple-tab-span" style="white-space:pre"> </span>int found = 0;<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>int nmi;<br>-<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>nmi = perf_intr_is_nmi(regs);<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>if (nmi)<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>nmi_enter();<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>else<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>irq_enter();<br><br><span class="Apple-tab-span" style="white-space:pre"> </span>for (i = 0; i < ppmu->n_counter; ++i) {<br><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>event = cpuhw->event[i];<br>@@ -690,11 +670,6 @@ static void perf_event_interrupt(struct pt_regs *regs)<br><span class="Apple-tab-span" style="white-space:pre"> </span>mtmsr(mfmsr() | MSR_PMM);<br><span class="Apple-tab-span" style="white-space:pre"> </span>mtpmr(PMRN_PMGC0, PMGC0_PMIE | PMGC0_FCECE);<br><span class="Apple-tab-span" style="white-space:pre"> </span>isync();<br>-<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>if (nmi)<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>nmi_exit();<br>-<span class="Apple-tab-span" style="white-space:pre"> </span>else<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>irq_exit();<br>}<br><br>void hw_perf_event_setup(int cpu)<br>-- <br>2.23.0<br></blockquote></blockquote></blockquote><br></div></body></html>