[PATCH] selftests/powerpc: New TM signal self test
Breno Leitao
leitao at debian.org
Wed Dec 5 04:51:47 AEDT 2018
Hi Mikey,
On 11/29/18 12:11 AM, Michael Neuling wrote:
> On Wed, 2018-11-28 at 11:23 -0200, Breno Leitao wrote:
>> A new self test that forces MSR[TS] to be set without calling any TM
>> instruction. This test also tries to cause a page fault at a signal
>> handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
>> thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
>> when tm_recheckpoint() is called.
>>
>> This test is not deterministic since it is hard to guarantee that the page
>> access will cause a page fault. Tests have shown that the bug could be
>> exposed with few interactions in a buggy kernel. This test is configured to
>> loop 5000x, having a good chance to hit the kernel issue in just one run.
>> This self test takes less than two seconds to run.
>
> You could try using sigaltstack() to put the ucontext somewhere else. Then you
> could play tricks with that memory to try to force a fault.
> madvise()+MADV_DONTNEED or fadvise()+POSIX_FADV_DONTNEED might do the trick.
Yes, it sounded interesting and I implemented the test using madvice(). Thanks
for the suggestion!
The current approach didn't seem to improve the amount of page faults at it
seems that MADV_DONTNEED makes no difference when using a Lazy page loading.
This is the test I did, where 'original' is my current patch and 'madvice` is
the patch below:
Performance counter stats for './original':
0 major-faults
125,100 minor-faults
2.575479619 seconds time elapsed
Performance counter stats for './madvice':
0 major-faults
125,099 minor-faults
Other than that, I didn't see any improvements in the reproduction rate also, although
it is a bit challenging to measure, since it crashes the machine and I can't run a
full statistical model.
This is the current patch I compared to the original one
---
commit 082a9fe29412943adfa2d6a363f44bac8e81d0ce
Author: Breno Leitao <leitao at debian.org>
Date: Tue Nov 13 18:02:57 2018 -0500
selftests/powerpc: New TM signal self test
A new self test that forces MSR[TS] to be set without calling any TM
instruction. This test also tries to cause a page fault at a signal
handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing
thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG
when tm_recheckpoint() is called.
This test is not deterministic, since it is hard to guarantee that the page
access will cause a page fault. In order to force more page faults at
signal context, the signal handler and the ucontext are being mapped into a
MADV_DONTNEED memory chunks.
Tests have shown that the bug could be exposed with few interactions in a
buggy kernel. This test is configured to loop 5000x, having a good chance
to hit the kernel issue in just one run. This self test takes less than
two seconds to run.
This test uses set/getcontext because the kernel will recheckpoint
zeroed structures, causing the test to segfault, which is undesired because
the test needs to rerun, so, there is a signal handler for SIGSEGV which
will restart the test.
Signed-off-by: Breno Leitao <leitao at debian.org>
diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore
index c3ee8393dae8..89679822ebc9 100644
--- a/tools/testing/selftests/powerpc/tm/.gitignore
+++ b/tools/testing/selftests/powerpc/tm/.gitignore
@@ -11,6 +11,7 @@ tm-signal-context-chk-fpu
tm-signal-context-chk-gpr
tm-signal-context-chk-vmx
tm-signal-context-chk-vsx
+tm-signal-force-msr
tm-vmx-unavail
tm-unavailable
tm-trap
diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index 9fc2cf6fbc92..58a2ebd13958 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu
TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \
- $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn
+ $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr
top_srcdir = ../../../../..
include ../../lib.mk
@@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64
$(OUTPUT)/tm-resched-dscr: ../pmu/lib.c
$(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno-error=uninitialized -mvsx
$(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64
+$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread
SIGNAL_CONTEXT_CHK_TESTS := $(patsubst %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS))
$(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
new file mode 100644
index 000000000000..496596f3c1bf
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp.
+ *
+ * This test raises a SIGUSR1 signal, and toggle the MSR[TS]
+ * fields at the signal handler. With MSR[TS] being set, the kernel will
+ * force a recheckpoint, which may cause a segfault when returning to
+ * user space. Since the kernel needs to re-run, the segfault needs to be
+ * caught and handled.
+ *
+ * In order to continue the test even after a segfault, the context is
+ * saved prior to the signal being raised, and it is restored when there is
+ * a segmentation fault. This happens for COUNT_MAX times.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <string.h>
+#include <ucontext.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include "tm.h"
+#include "utils.h"
+#include "reg.h"
+
+#define COUNT_MAX 5000 /* Number of interactions */
+
+/* Setting contexts because the test will crash and we want to recover */
+ucontext_t init_context, main_context;
+
+static int count, first_time;
+
+void usr_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+ ucontext_t *ucp = uc;
+ int ret;
+
+ /*
+ * Allocating memory in a signal handler, and never freeing it on
+ * purpose, forcing the heap increase, so, the memory leak is what
+ * we want here.
+ */
+ ucp->uc_link = mmap(NULL, sizeof(ucontext_t),
+ PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+ if (ucp->uc_link == (void *)-1) {
+ perror("Mmap failed");
+ exit(-1);
+ }
+
+ /* Forcing the page to be allocated in a page fault */
+ ret = madvise(ucp->uc_link, sizeof(ucontext_t), MADV_DONTNEED);
+ if (ret) {
+ perror("madvise failed");
+ exit(-1);
+ }
+
+ memcpy(&ucp->uc_link, &ucp->uc_mcontext, sizeof(ucp->uc_mcontext));
+
+ /* Forcing to enable MSR[TM] */
+ ucp->uc_mcontext.gp_regs[PT_MSR] |= MSR_TS_S;
+
+ /*
+ * A fork inside a signal handler seems to be more efficient than a
+ * fork() prior to the signal being raised.
+ */
+ if (fork() == 0) {
+ /*
+ * Both child and parent will return, but, child returns
+ * with count set so it will exit in the next segfault.
+ * Parent will continue to loop.
+ */
+ count = COUNT_MAX;
+ }
+
+ /*
+ * If the change above does not hit the bug, it will cause a
+ * segmentation fault, since the ck structures are NULL.
+ */
+}
+
+void seg_signal_handler(int signo, siginfo_t *si, void *uc)
+{
+ if (count == COUNT_MAX) {
+ /* Return to tm_signal_force_msr() and exit */
+ setcontext(&main_context);
+ }
+
+ count++;
+
+ /* Reexecute the test */
+ setcontext(&init_context);
+}
+
+void tm_trap_test(void)
+{
+ struct sigaction usr_sa, seg_sa;
+ stack_t ss;
+
+ usr_sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
+ usr_sa.sa_sigaction = usr_signal_handler;
+
+ seg_sa.sa_flags = SA_SIGINFO;
+ seg_sa.sa_sigaction = seg_signal_handler;
+
+ /*
+ * Set initial context. Will get back here from
+ * seg_signal_handler()
+ */
+ getcontext(&init_context);
+
+ /* Allocated am alternative signal stack area */
+ ss.ss_sp = mmap(NULL, SIGSTKSZ, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+ ss.ss_size = SIGSTKSZ;
+ ss.ss_flags = 0;
+
+ if (ss.ss_sp == (void *)-1) {
+ perror("mmap error\n");
+ exit(-1);
+ }
+
+ /* Force the allocation through a page fault */
+ if (madvise(ss.ss_sp, SIGSTKSZ, MADV_DONTNEED)) {
+ perror("madvise\n");
+ exit(-1);
+ }
+
+ /* Setting a alternative stack to generate a page fault when
+ * the signal is raised.
+ */
+ if (sigaltstack(&ss, NULL)) {
+ perror("sigaltstack\n");
+ exit(-1);
+ }
+
+ /* The signal handler will enable MSR_TS */
+ sigaction(SIGUSR1, &usr_sa, NULL);
+ /* If it does not crash, it will segfault, avoid it to retest */
+ sigaction(SIGSEGV, &seg_sa, NULL);
+
+ raise(SIGUSR1);
+}
+
+int tm_signal_force_msr(void)
+{
+ SKIP_IF(!have_htm());
+
+ /* Will get back here after COUNT_MAX interactions */
+ getcontext(&main_context);
+
+ if (!first_time++)
+ tm_trap_test();
+
+ return EXIT_SUCCESS;
+}
+
+int main(int argc, char **argv)
+{
+ test_harness(tm_signal_force_msr, "tm_signal_force_msr");
+}
More information about the Linuxppc-dev
mailing list