[PATCH] selftests/powerpc: New TM signal self test

Breno Leitao leitao at debian.org
Wed Dec 5 04:51:47 AEDT 2018


Hi Mikey,

On 11/29/18 12:11 AM, Michael Neuling wrote:
> On Wed, 2018-11-28 at 11:23 -0200, Breno Leitao wrote:
>> A new self test that forces MSR[TS] to be set without calling any TM
>> instruction. This test also tries to cause a page fault at a signal
>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
>> when tm_recheckpoint() is called.
>>
>> This test is not deterministic since it is hard to guarantee that the page
>> access will cause a page fault. Tests have shown that the bug could be
>> exposed with few interactions in a buggy kernel. This test is configured to
>> loop 5000x, having a good chance to hit the kernel issue in just one run.
>> This self test takes less than two seconds to run.
> 
> You could try using sigaltstack() to put the ucontext somewhere else. Then you
> could play tricks with that memory to try to force a fault.
> madvise()+MADV_DONTNEED or fadvise()+POSIX_FADV_DONTNEED might do the trick.

Yes, it sounded interesting and I implemented the test using madvice(). Thanks
for the suggestion!

The current approach didn't seem to improve the amount of page faults at it
seems that MADV_DONTNEED makes no difference when using a Lazy page loading.
This is the test I did, where 'original' is my current patch and 'madvice` is
the patch below:

  Performance counter stats for './original':

                 0      major-faults                                                
           125,100      minor-faults                                                

       2.575479619 seconds time elapsed


  Performance counter stats for './madvice':

                 0      major-faults                                                
           125,099      minor-faults         



Other than that, I didn't see any improvements in the reproduction rate also, although
it is a bit challenging to measure, since it crashes the machine and I can't run a
full statistical model.

This is the current patch I compared to the original one

---

commit 082a9fe29412943adfa2d6a363f44bac8e81d0ce
Author: Breno Leitao <leitao at debian.org>
Date:   Tue Nov 13 18:02:57 2018 -0500

    selftests/powerpc: New TM signal self test
    
    A new self test that forces MSR[TS] to be set without calling any TM
    instruction. This test also tries to cause a page fault at a signal
    handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
    thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
    when tm_recheckpoint() is called.
    
    This test is not deterministic, since it is hard to guarantee that the page
    access will cause a page fault. In order to force more page faults at
    signal context, the signal handler and the ucontext are being mapped into a
    MADV_DONTNEED memory chunks.
    
    Tests have shown that the bug could be exposed with few interactions in a
    buggy kernel. This test is configured to loop 5000x, having a good chance
    to hit the kernel issue in just one run.  This self test takes less than
    two seconds to run.
    
    This test uses set/getcontext because the kernel will recheckpoint
    zeroed structures, causing the test to segfault, which is undesired because
    the test needs to rerun, so, there is a signal handler for SIGSEGV which
    will restart the test.
    
    Signed-off-by: Breno Leitao <leitao at debian.org>

diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore
index c3ee8393dae8..89679822ebc9 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
 tm-signal-context-chk-gpr
 tm-signal-context-chk-vmx
 tm-signal-context-chk-vsx
+tm-signal-force-msr
 tm-vmx-unavail
 tm-unavailable
 tm-trap
diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 9fc2cf6fbc92..58a2ebd13958 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu
 
 TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
 	tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
-	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
+	$(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr
 
 top_srcdir = ../../../../..
 include ../../lib.mk
@@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
 $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
 $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx
 $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
+$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread
 
 SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
 $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
new file mode 100644
index 000000000000..496596f3c1bf
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
+ *
+ * This test raises a SIGUSR1 signal, and toggle the MSR[TS]
+ * fields at the signal handler. With MSR[TS] being set, the kernel will
+ * force a recheckpoint, which may cause a segfault when returning to
+ * user space. Since the kernel needs to re-run, the segfault needs to be
+ * caught and handled.
+ *
+ * In order to continue the test even after a segfault, the context is
+ * saved prior to the signal being raised, and it is restored when there is
+ * a segmentation fault. This happens for COUNT_MAX times.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <string.h>
+#include <ucontext.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include "tm.h"
+#include "utils.h"
+#include "reg.h"
+
+#define COUNT_MAX       5000		/* Number of interactions */
+
+/* Setting contexts because the test will crash and we want to recover */
+ucontext_t init_context, main_context;
+
+static int count, first_time;
+
+void usr_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	ucontext_t *ucp = uc;
+	int ret;
+
+	/*
+	 * Allocating memory in a signal handler, and never freeing it on
+	 * purpose, forcing the heap increase, so, the memory leak is what
+	 * we want here.
+	 */
+	ucp->uc_link = mmap(NULL, sizeof(ucontext_t),
+			    PROT_READ | PROT_WRITE,
+			    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	if (ucp->uc_link == (void *)-1) {
+		perror("Mmap failed");
+		exit(-1);
+	}
+
+	/* Forcing the page to be allocated in a page fault */
+	ret = madvise(ucp->uc_link, sizeof(ucontext_t), MADV_DONTNEED);
+	if (ret) {
+		perror("madvise failed");
+		exit(-1);
+	}
+
+	memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
+
+	/* Forcing to enable MSR[TM] */
+	ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
+
+	/*
+	 * A fork inside a signal handler seems to be more efficient than a
+	 * fork() prior to the signal being raised.
+	 */
+	if (fork() == 0) {
+		/*
+		 * Both child and parent will return, but, child returns
+		 * with count set so it will exit in the next segfault.
+		 * Parent will continue to loop.
+		 */
+		count = COUNT_MAX;
+	}
+
+	/*
+	 * If the change above does not hit the bug, it will cause a
+	 * segmentation fault, since the ck structures are NULL.
+	 */
+}
+
+void seg_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+	if (count == COUNT_MAX) {
+		/* Return to tm_signal_force_msr() and exit */
+		setcontext(&main_context);
+	}
+
+	count++;
+
+	/* Reexecute the test */
+	setcontext(&init_context);
+}
+
+void tm_trap_test(void)
+{
+	struct sigaction usr_sa, seg_sa;
+	stack_t ss;
+
+	usr_sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
+	usr_sa.sa_sigaction = usr_signal_handler;
+
+	seg_sa.sa_flags = SA_SIGINFO;
+	seg_sa.sa_sigaction = seg_signal_handler;
+
+	/*
+	 * Set initial context. Will get back here from
+	 * seg_signal_handler()
+	 */
+	getcontext(&init_context);
+
+	/* Allocated am alternative signal stack area */
+	ss.ss_sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	ss.ss_size = SIGSTKSZ;
+	ss.ss_flags = 0;
+
+	if (ss.ss_sp == (void *)-1) {
+		perror("mmap error\n");
+		exit(-1);
+	}
+
+	/* Force the allocation through a page fault */
+	if (madvise(ss.ss_sp, SIGSTKSZ, MADV_DONTNEED)) {
+		perror("madvise\n");
+		exit(-1);
+	}
+
+	/* Setting a alternative stack to generate a page fault when
+	 * the signal is raised.
+	 */
+	if (sigaltstack(&ss, NULL)) {
+		perror("sigaltstack\n");
+		exit(-1);
+	}
+
+	/* The signal handler will enable MSR_TS */
+	sigaction(SIGUSR1, &usr_sa, NULL);
+	/* If it does not crash, it will segfault, avoid it to retest */
+	sigaction(SIGSEGV, &seg_sa, NULL);
+
+	raise(SIGUSR1);
+}
+
+int tm_signal_force_msr(void)
+{
+	SKIP_IF(!have_htm());
+
+	/* Will get back here after COUNT_MAX interactions */
+	getcontext(&main_context);
+
+	if (!first_time++)
+		tm_trap_test();
+
+	return EXIT_SUCCESS;
+}
+
+int main(int argc, char **argv)
+{
+	test_harness(tm_signal_force_msr, "tm_signal_force_msr");
+}


More information about the Linuxppc-dev mailing list