[PATCH] powerpc/imc: Add documentation for IMC and trace-mode
Anju T Sudhakar
anju at linux.vnet.ibm.com
Sat May 11 00:17:52 AEST 2019
Documentation for the IMC (In-Memory Collection Counters) infrastructure
and the trace mode of IMC.
Signed-off-by: Anju T Sudhakar <anju at linux.vnet.ibm.com>
---
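The LDBAR and TRACE_IMC_SCOM layouts documented in imc.txt below can be
read as plain bit-field tables. The following is a minimal Python sketch of
a decoder for both, not kernel code. It assumes IBM bit numbering (bit 0 is
the most significant bit of the 64-bit value), and the helper names
ibm_field, decode_ldbar and decode_trace_imc_scom are hypothetical.

```python
def ibm_field(value, first, last, width=64):
    """Extract bits first..last (inclusive), assuming bit 0 is the MSB."""
    nbits = last - first + 1
    shift = width - 1 - last
    return (value >> shift) & ((1 << nbits) - 1)


def decode_ldbar(value):
    """Split an LDBAR value into the fields listed in the layout table."""
    return {
        "enable": ibm_field(value, 0, 0),
        "trace_mode": ibm_field(value, 1, 1),  # 0: accumulation, 1: trace
        "pb_scope": ibm_field(value, 4, 6),
        "counter_address": ibm_field(value, 8, 50),  # raw field, not shifted
    }


def decode_trace_imc_scom(value):
    """Split a TRACE_IMC_SCOM value into the fields listed in the table."""
    return {
        "sampsel": ibm_field(value, 0, 1),
        "cpmc_load": ibm_field(value, 2, 33),
        "cpmc1sel": ibm_field(value, 34, 40),
        "cpmc2sel": ibm_field(value, 41, 47),
        "buffersize": ibm_field(value, 48, 50),
    }
```

With these assumptions, decode_ldbar(0xC000000000000000) reports both
enable and trace_mode as 1 and every other field as 0.
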
Documentation/powerpc/imc.txt | 195 ++++++++++++++++++++++++++++++++++
1 file changed, 195 insertions(+)
create mode 100644 Documentation/powerpc/imc.txt
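
The scale and unit inheritance described in the "Basic overview" section of
imc.txt below can be sketched as a simple fallback lookup. This is a minimal
Python sketch under stated assumptions: PMUs and events are modelled as
plain dictionaries, the key names are hypothetical rather than actual device
tree property names, and letting an event-level value take precedence over
the PMU-level one is an assumption.

```python
def effective_scale_unit(pmu, event):
    """Return (scale, unit) for an event, falling back to the PMU's values.

    Assumption: a value set on the event itself wins over the PMU-wide one.
    """
    scale = event.get("scale", pmu.get("scale"))
    unit = event.get("unit", pmu.get("unit"))
    return scale, unit


# Hypothetical example data, for illustration only.
example_pmu = {"name": "example_pmu", "scale": "1.0", "unit": "MiB"}
example_event = {"name": "EXAMPLE_EVENT", "offset": 0x10}
```

Here effective_scale_unit(example_pmu, example_event) returns the PMU's
scale and unit, because the event does not define its own.
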
diff --git a/Documentation/powerpc/imc.txt b/Documentation/powerpc/imc.txt
new file mode 100644
index 000000000000..9c32e059f3be
--- /dev/null
+++ b/Documentation/powerpc/imc.txt
@@ -0,0 +1,195 @@
+ ===================================
+ IMC (In-Memory Collection Counters)
+ ===================================
+ Date created: 10 May 2019
+
+Table of Contents:
+------------------
+ - Basic overview
+ - IMC example usage
+ - IMC Trace-mode
+ - LDBAR Register Layout
+ - TRACE_IMC_SCOM bit representation
+ - Trace IMC example usage
+ - Benefits of using IMC trace-mode
+
+
+Basic overview
+==============
+
+IMC (In-Memory Collection Counters) is a hardware monitoring facility
+that collects a large number of hardware performance events at Nest level
+(these are on-chip but off-core), Core level and Thread level.
+
+The Nest PMU counters are handled by the Nest IMC microcode, which runs
+in the OCC (On-Chip Controller) complex. The microcode collects the
+counter data and moves the nest IMC counter data to memory.
+
+The Core and Thread IMC PMU counters are handled in the core. Core-level
+PMU counters give the IMC counter data per core, and thread-level
+PMU counters give the IMC counter data per CPU thread.
+
+OPAL obtains the IMC PMU and supported events information from the
+IMC Catalog and passes it on to the kernel via the device tree. The
+event information contains :
+ - Event name
+ - Event Offset
+ - Event description
+and, optionally :
+ - Event scale
+ - Event unit
+
+Some PMUs may have common scale and unit values for all their
+supported events. For those cases, the scale and unit properties for
+those events must be inherited from the PMU.
+
+The event offset in the memory is where the counter data gets
+accumulated.
+
+The IMC catalog is available at:
+ https://github.com/open-power/ima-catalog
+
+The kernel discovers the IMC counters information in the device tree
+at the "imc-counters" device node, which has a compatible field
+"ibm,opal-in-memory-counters". From the device tree, the kernel parses
+the PMUs and their event information and registers each PMU and its
+attributes in the kernel.
+
+IMC example usage
+=================
+
+# perf list
+
+ [...]
+ nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/ [Kernel PMU event]
+ nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/ [Kernel PMU event]
+
+ [...]
+ core_imc/CPM_0THRD_NON_IDLE_PCYC/ [Kernel PMU event]
+ core_imc/CPM_1THRD_NON_IDLE_INST/ [Kernel PMU event]
+
+ [...]
+ thread_imc/CPM_0THRD_NON_IDLE_PCYC/ [Kernel PMU event]
+ thread_imc/CPM_1THRD_NON_IDLE_INST/ [Kernel PMU event]
+
+To see per-chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/ :
+ # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket
+
+To see non-idle instructions for core 0 :
+ # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000
+
+To see non-idle processor cycles for a "make" :
+ # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make
+
+
+IMC Trace-mode
+==============
+
+POWER9 supports two modes for IMC: Accumulation mode and Trace mode.
+In Accumulation mode, event counts are accumulated in system memory.
+The hypervisor then reads the posted counts periodically or when
+requested. In IMC Trace mode, the 64-bit trace scom value is initialized
+with the event information. The CPMC*SEL and CPMC_LOAD fields in the trace
+scom specify the event to be monitored and the sampling duration. On each
+overflow in the CPMC*SEL, hardware snapshots the program counter along
+with event counts and writes them into the memory pointed to by LDBAR.
+
+LDBAR is a 64-bit special-purpose per-thread register; it has bits to
+indicate whether hardware is configured for accumulation or trace mode.
+
+* LDBAR Register Layout:
+ 0     : Enable/Disable
+ 1     : 0 -> Accumulation Mode
+         1 -> Trace Mode
+ 2:3   : Reserved
+ 4:6   : PB scope
+ 7     : Reserved
+ 8:50  : Counter Address
+ 51:63 : Reserved
+
+* TRACE_IMC_SCOM bit representation:
+
+ 0:1 : SAMPSEL
+ 2:33 : CPMC_LOAD
+ 34:40 : CPMC1SEL
+ 41:47 : CPMC2SEL
+ 48:50 : BUFFERSIZE
+ 51:63 : RESERVED
+
+CPMC_LOAD contains the sampling duration. SAMPSEL and CPMC*SEL determine
+the event to count. BUFFERSIZE indicates the memory range. On each overflow,
+hardware snapshots the program counter along with event counts, updates the
+memory, and reloads the CPMC_LOAD value for the next sampling duration.
+IMC hardware does not support exceptions, so it quietly wraps around when
+the memory buffer reaches the end.
+
+* Currently the event monitored for trace-mode is fixed as cycles.
+
+Trace IMC example usage
+=======================
+
+# perf list
+
+ [...]
+ trace_imc/trace_cycles/ [Kernel PMU event]
+
+To record an application/process with the trace-imc event:
+# perf record -e trace_imc/trace_cycles/ yes > /dev/null
+[ perf record: Woken up 1 times to write data ]
+[ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]
+
+The generated perf.data can be read using perf report.
+
+Benefits of using IMC trace-mode
+================================
+
+PMI (Performance Monitoring Interrupt) handling is avoided, since IMC
+trace mode snapshots the program counter and writes it to memory. This
+also provides a way for the operating system to do instruction sampling
+in real time without PMI processing overhead.
+Example:
+
+Performance data using 'perf top' with and without trace-imc event:
+
+PMI count when the `perf top` command is executed without the trace-imc event:
+
+# cat /proc/interrupts (a snippet from the output)
+9944 1072 804 804 1644 804 1306
+804 804 804 804 804 804 804
+804 804 1961 1602 804 804 1258
+[-----------------------------------------------------------------]
+803 803 803 803 803 803 803
+803 803 803 803 804 804 804
+804 804 804 804 804 804 803
+803 803 803 803 803 1306 803
+803 Performance monitoring interrupts
+
+
+`perf top` with trace-imc (executed right after `perf top` without the trace-imc event):
+
+# perf top -e trace_imc/trace_cycles/
+12.50% [kernel] [k] arch_cpu_idle
+11.81% [kernel] [k] __next_timer_interrupt
+11.22% [kernel] [k] rcu_idle_enter
+10.25% [kernel] [k] find_next_bit
+ 7.91% [kernel] [k] do_idle
+ 7.69% [kernel] [k] rcu_dynticks_eqs_exit
+ 5.20% [kernel] [k] tick_nohz_idle_stop_tick
+ [-----------------------]
+
+# cat /proc/interrupts (a snippet from the output)
+
+9944 1072 804 804 1644 804 1306
+804 804 804 804 804 804 804
+804 804 1961 1602 804 804 1258
+[-----------------------------------------------------------------]
+803 803 803 803 803 803 803
+803 803 803 804 804 804 804
+804 804 804 804 804 804 803
+803 803 803 803 803 1306 803
+803 Performance monitoring interrupts
+
+The PMI count remains the same.
+
+----------------------------------------------------------
+Author(s) : Anju T Sudhakar <anju at linux.vnet.ibm.com>
--
2.17.2