[Skiboot] [PATCH] core: POWER9 implement OPAL_SIGNAL_SYSTEM_RESET

Nicholas Piggin npiggin at gmail.com
Wed Sep 13 18:50:03 AEST 2017


This implements OPAL_SIGNAL_SYSTEM_RESET, using scom registers to
quiesce the target thread and raise a system reset exception on it.

This has been tested on DD1 and DD2 including ESL=1 power saving modes.
It has not yet been tested with deep idle states, because those have
not yet been enabled. If those cannot be supported, it should be
possible to query PSSCR[PLS] from scoms and fail in that case (Linux
could fall back to a doorbell).

Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
---
This is now tested and seems to be working fine on a DD2. Only
changes since last post are to add documentation and tidy up the
constants and things a bit.
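
For anyone skimming, here is a rough sketch of the order of operations
the patch follows: hold special wakeup, stop the thread and poll for
quiesce, raise the system reset, then drop special wakeup. It is a mock
only. struct mock_thread and every helper name below are hypothetical
stand-ins, nothing touches a scom, and the return code values are
assumed to match opal-api.h. The sketch passes the result of the final
step back to the caller, which is a choice made for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return codes, assumed to match skiboot's opal-api.h. */
#define OPAL_SUCCESS	0
#define OPAL_HARDWARE	-6

#define QUIESCE_RETRIES	100

/* Hypothetical stand-in for a target thread; replaces real scom access. */
struct mock_thread {
	bool wakeup_works;	/* does special wakeup complete? */
	int polls_to_quiesce;	/* polls before quiesce is seen, < 0 = never */
	bool wakeup_held;
	bool stopped;
	bool sreset_raised;
};

static int set_special_wakeup(struct mock_thread *t)
{
	if (!t->wakeup_works)
		return OPAL_HARDWARE;
	t->wakeup_held = true;
	return OPAL_SUCCESS;
}

static void clear_special_wakeup(struct mock_thread *t)
{
	t->wakeup_held = false;
}

static int stop_thread(struct mock_thread *t)
{
	int i;

	t->stopped = true;	/* request the stop */
	for (i = 0; i < QUIESCE_RETRIES; i++) {
		if (t->polls_to_quiesce >= 0 && i >= t->polls_to_quiesce)
			return OPAL_SUCCESS;
	}
	t->stopped = false;	/* give up and let the thread continue */
	return OPAL_HARDWARE;
}

static int raise_sreset(struct mock_thread *t)
{
	t->sreset_raised = true;
	return OPAL_SUCCESS;
}

/* Wakeup -> stop -> sreset -> release wakeup, with cleanup on every path. */
int mock_sreset_cpu(struct mock_thread *t)
{
	int rc;

	rc = set_special_wakeup(t);
	if (rc)
		return rc;

	rc = stop_thread(t);
	if (!rc)
		rc = raise_sreset(t);

	clear_special_wakeup(t);
	return rc;
}
```

The point of the shape is that special wakeup is released on both the
success path and the failed-quiesce path, and that a thread which never
quiesces is told to continue rather than being left stopped.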

With this patch and some enablement, Linux is able to detect hard
locked (MSR[EE]=0) threads and get stack traces from them, bring
them into xmon or panic and restart the box, etc. Here is a test case
where CPU60 disables interrupts then spins, with hardlockup_panic=1:

Watchdog CPU:64 detected Hard LOCKUP other CPUS:60
Watchdog CPU:60 Hard LOCKUP
Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm_hv kvm iptable_filter ib_ipoib ib_cm ib_core vmx_crypto binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_alua ip_tables x_tables autofs4 crc32c_vpmsum
CPU: 60 PID: 4918 Comm: bash Not tainted 4.13.0-11559-g6d8ef53e8b2f-dirty #10
task: c000000f15a61600 task.stack: c000000f144fc000
NIP:  c0000000000aa81c LR: c0000000000accc8 CTR: c0000000000aa800
REGS: c00000003fd2bd80 TRAP: 0100   Not tainted  (4.13.0-11559-g6d8ef53e8b2f-dirty)
MSR:  9000000000081033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28422228  XER: 20040000
CFAR: c0000000000aa81c SOFTE: 0 
GPR00: c000000000638bf8 c000000f144ffba0 c00000000106c000 0000000000000000 
GPR04: c00000000109bac8 c0000000000b1710 c00000000109bae0 c0000000000b1690 
GPR08: 0000000000000000 c000000000f0fcf0 0000000000000001 c00000000109bad0 
GPR12: c0000000000aa800 c00000000fd54a00 0000000010180df8 0000000010189e60 
GPR16: 0000000010189ed8 0000000010151270 000000001018bd88 000000001018de78 
GPR20: 0000000038147048 0000000000000001 00000000101645e0 0000000010163c10 
GPR24: 00007ffff6f9efb4 00007ffff6f9efb0 c000000000fab868 0000000000000004 
GPR28: c000000000f254b8 0000000000000078 c000000000f30dbc c00000000109bac0 
NIP [c0000000000aa81c] xmon+0x1c/0x20
LR [c0000000000accc8] sysrq_handle_xmon+0xc8/0xd0
Call Trace:
[c000000f144ffba0] [c00000000014c584] printk+0x48/0x5c (unreliable)
[c000000f144ffbd0] [c000000000638bf8] __handle_sysrq+0xe8/0x280
[c000000f144ffc70] [c0000000006393a8] write_sysrq_trigger+0x78/0xa0
[c000000f144ffca0] [c0000000003c93d0] proc_reg_write+0xb0/0x110
[c000000f144ffcf0] [c00000000033555c] __vfs_write+0x6c/0x1d0
[c000000f144ffd90] [c000000000337434] vfs_write+0xd4/0x240
[c000000f144ffde0] [c00000000033932c] SyS_write+0x6c/0x110
[c000000f144ffe30] [c00000000000b220] system_call+0x58/0x6c
Instruction dump:
4e800020 00000000 00000000 00000000 00000000 e94d0020 7d410164 894d027b 
39000000 990d027a 614a0001 994d027b <48000000> 3c4c00fc 384217e0 2ba30980 
Kernel panic - not syncing: Hard LOCKUP
CPU: 64 PID: 0 Comm: swapper/64 Not tainted 4.13.0-11559-g6d8ef53e8b2f-dirty #10
Call Trace:
[c000000f229ab560] [c000000000ae31d0] dump_stack+0xb0/0xf0 (unreliable)
[c000000f229ab5a0] [c0000000000d3d3c] panic+0x164/0x408
[c000000f229ab640] [c0000000000d3764] nmi_panic+0xa4/0xb0
[c000000f229ab6b0] [c00000000002f700] watchdog_timer_interrupt+0x380/0x390
[c000000f229ab760] [c00000000002f7e0] wd_timer_fn+0x40/0x60
[c000000f229ab790] [c000000000172574] call_timer_fn+0x64/0x1d0
[c000000f229ab820] [c000000000172860] expire_timers+0x140/0x1e0
[c000000f229ab890] [c0000000001729d8] run_timer_softirq+0xd8/0x240
[c000000f229ab920] [c000000000b04410] __do_softirq+0x180/0x3f8
[c000000f229aba20] [c0000000000dbec8] irq_exit+0xf8/0x130
[c000000f229aba40] [c0000000000250c4] timer_interrupt+0xa4/0x110
[c000000f229aba80] [c000000000009018] decrementer_common+0x128/0x130
--- interrupt: 901 at snooze_loop+0xac/0x190
    LR = snooze_loop+0x170/0x190
[c000000f229abd70] [c000000f229abdb0] 0xc000000f229abdb0 (unreliable)
[c000000f229abdb0] [c00000000094270c] cpuidle_enter_state+0x16c/0x450
[c000000f229abe10] [c000000000135b40] call_cpuidle+0x70/0xd0
[c000000f229abe50] [c000000000135f88] do_idle+0x1f8/0x2c0
[c000000f229abec0] [c000000000136278] cpu_startup_entry+0x38/0x40
[c000000f229abef0] [c000000000040a10] start_secondary+0x4c0/0x4f0
[c000000f229abf90] [c00000000000ab6c] start_secondary_prolog+0x10/0x14
Rebooting in 10 seconds..

*** snip pages of "Trying to free IRQ blah from IRQ context!" ***

--== Welcome to Hostboot hostboot-c68be97/hbicore.bin ==--
...
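
On the caller side, the return codes documented in this patch suggest
roughly the following triage. This is only a sketch of one possible
policy, not what Linux does: enum sreset_outcome and
classify_sreset_result() are hypothetical names, and the numeric values
are assumed to match opal-api.h.

```c
#include <stdint.h>

/* Return codes, assumed to match skiboot's opal-api.h. */
#define OPAL_SUCCESS		0
#define OPAL_PARAMETER		-1
#define OPAL_PARTIAL		-3
#define OPAL_HARDWARE		-6
#define OPAL_UNSUPPORTED	-7

/* Hypothetical caller-side view of what the OPAL call achieved. */
enum sreset_outcome {
	SRESET_REQUESTED,	/* request issued, handling not acknowledged */
	SRESET_NOT_DELIVERED,	/* nothing was reset, another method may be tried */
	SRESET_INDETERMINATE,	/* some or all targets may have been reset */
	SRESET_ERROR,		/* bad argument or unexpected state */
};

enum sreset_outcome classify_sreset_result(int64_t rc)
{
	switch (rc) {
	case OPAL_SUCCESS:
		/* Asynchronous: the target may not have started the handler. */
		return SRESET_REQUESTED;
	case OPAL_PARTIAL:
	case OPAL_UNSUPPORTED:
		/* Documented as delivering no resets at all. */
		return SRESET_NOT_DELIVERED;
	case OPAL_HARDWARE:
		return SRESET_INDETERMINATE;
	default:
		return SRESET_ERROR;
	}
}
```

A caller would still need its own timeout after SRESET_REQUESTED, since
success only means the request was made.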


 core/Makefile.inc                             |   1 +
 core/sreset.c                                 | 261 ++++++++++++++++++++++++++
 doc/opal-api/opal-signal-system-reset-145.rst |  23 ++-
 hw/xscom.c                                    |   4 +
 include/skiboot.h                             |   3 +
 5 files changed, 282 insertions(+), 10 deletions(-)
 create mode 100644 core/sreset.c
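
A note on the per-thread constants in core/sreset.c, since IBM bit
numbering (bit 0 is the most significant bit) is easy to misread. The
sketch below carries local copies of PPC_BIT and PPC_BITMASK that
mirror, as far as I understand them, the skiboot macros, so that it
stands alone. thread_is_quiesced() is a hypothetical helper operating
on a plain value rather than a scom read.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit value. Local copies. */
#define PPC_BIT(bit)		(0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be)	((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* Same layout as the patch: each thread owns one byte of the register. */
#define P9_RSTAT_QUIESCED(t)	PPC_BITMASK(0 + 8*(t), 3 + 8*(t))
#define P9_DCTL_STOP(t)		PPC_BIT(7 + 8*(t))
#define P9_DCTL_SRESET(t)	PPC_BIT(4 + 8*(t))

/* A thread counts as quiesced only when all four of its status bits are set. */
int thread_is_quiesced(uint64_t ras_status, unsigned int thread)
{
	uint64_t mask = P9_RSTAT_QUIESCED(thread);

	return (ras_status & mask) == mask;
}
```

For thread 1 the quiesce mask works out to 0x00F0000000000000 and the
stop bit to 0x0001000000000000, so a status value with only three of
the four bits set does not count as quiesced.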

diff --git a/core/Makefile.inc b/core/Makefile.inc
index f2de2f64..16204978 100644
--- a/core/Makefile.inc
+++ b/core/Makefile.inc
@@ -9,6 +9,7 @@ CORE_OBJS += vpd.o hostservices.o platform.o nvram.o nvram-format.o hmi.o
 CORE_OBJS += console-log.o ipmi.o time-utils.o pel.o pool.o errorlog.o
 CORE_OBJS += timer.o i2c.o rtc.o flash.o sensor.o ipmi-opal.o
 CORE_OBJS += flash-subpartition.o bitmap.o buddy.o pci-quirk.o powercap.o psr.o
+CORE_OBJS += sreset.o
 
 ifeq ($(SKIBOOT_GCOV),1)
 CORE_OBJS += gcov-profiling.o
diff --git a/core/sreset.c b/core/sreset.c
new file mode 100644
index 00000000..5081e1c2
--- /dev/null
+++ b/core/sreset.c
@@ -0,0 +1,261 @@
+/* Copyright 2017 IBM Corp.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * 	http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ * implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <skiboot.h>
+#include <cpu.h>
+#include <fsp.h>
+#include <psi.h>
+#include <opal.h>
+#include <xscom.h>
+#include <interrupts.h>
+#include <cec.h>
+#include <timebase.h>
+#include <pci.h>
+#include <chip.h>
+#include <chiptod.h>
+#include <ipmi.h>
+
+#define P9_RAS_STATUS			0x10a02
+#define P9_RSTAT_QUIESCED(t)		PPC_BITMASK(0 + 8*(t), 3 + 8*(t))
+#define P9_RSTAT_RETRIES		100
+
+#define P9_RAS_MODEREG			0x10a9d
+#define P9_DIRECT_CONTROLS		0x10a9c
+#define P9_DCTL_STOP(t)			PPC_BIT(7 + 8*(t))
+#define P9_DCTL_CONT(t)			PPC_BIT(6 + 8*(t))
+#define P9_DCTL_SRESET(t)		PPC_BIT(4 + 8*(t))
+#define P9_DCTL_PWR(t)			PPC_BIT(32 + 8*(t))
+
+#define P9_CORE_THREAD_STATE		0x10ab3
+#define P9_CTS_STOP(t)			PPC_BIT(56 + (t))
+
+#define P9_PPM_GPMMR			0xf0100
+#define P9_GPMMR_SPWKUP_DONE		PPC_BIT(1)
+#define P9_GPMMR_SPWKUP_TIMEOUT		10
+
+#define P9_PPM_SPWKUP_OTR		0xf010a
+#define P9_SPWKUP_SET			PPC_BIT(0)
+
+
+static int core_set_special_wakeup(struct cpu_thread *cpu)
+{
+	uint32_t chip_id = pir_to_chip_id(cpu->pir);
+	uint32_t core_id = pir_to_core_id(cpu->pir);
+	uint32_t swake_addr;
+	uint32_t gpmmr_addr;
+	uint64_t val;
+	int i;
+
+	swake_addr = XSCOM_ADDR_P9_EC(core_id, P9_PPM_SPWKUP_OTR);
+	gpmmr_addr = XSCOM_ADDR_P9_EC(core_id, P9_PPM_GPMMR);
+
+	/*
+	 * The read-write-read pattern with read errors ignored comes from
+	 * P8 code. This should be revisited, but it does not appear to have
+	 * any ill effects.
+	 */
+	xscom_read(chip_id, swake_addr, &val);
+	if (xscom_write(chip_id, swake_addr, P9_SPWKUP_SET)) {
+		prlog(PR_WARNING, "SRESET: Unable to write SPWKUP_OTR register\n");
+		return OPAL_HARDWARE;
+	}
+	xscom_read(chip_id, swake_addr, &val);
+
+	for (i = 0; i < P9_GPMMR_SPWKUP_TIMEOUT; i++) {
+		if (xscom_read(chip_id, gpmmr_addr, &val)) {
+			prlog(PR_WARNING, "SRESET: Unable to read GPMMR register\n");
+			return OPAL_HARDWARE;
+		}
+		if (val & P9_GPMMR_SPWKUP_DONE)
+			return 0;
+
+		time_wait_us(1);
+	}
+
+	/* De-assert special wakeup bit */
+	xscom_read(chip_id, swake_addr, &val);
+	xscom_write(chip_id, swake_addr, 0);
+	xscom_read(chip_id, swake_addr, &val);
+
+	prlog(PR_WARNING, "SRESET: Special wakeup mode could not be set.\n");
+	return OPAL_HARDWARE;
+}
+
+static void core_clear_special_wakeup(struct cpu_thread *cpu)
+{
+	uint32_t chip_id = pir_to_chip_id(cpu->pir);
+	uint32_t core_id = pir_to_core_id(cpu->pir);
+	uint32_t swake_addr;
+	uint64_t val;
+
+	swake_addr = XSCOM_ADDR_P9_EC(core_id, P9_PPM_SPWKUP_OTR);
+
+	/* De-assert special wakeup bit */
+	xscom_read(chip_id, swake_addr, &val);
+	xscom_write(chip_id, swake_addr, 0);
+	xscom_read(chip_id, swake_addr, &val);
+}
+
+static int thread_quiesced(struct cpu_thread *cpu)
+{
+	uint32_t chip_id = pir_to_chip_id(cpu->pir);
+	uint32_t core_id = pir_to_core_id(cpu->pir);
+	uint32_t thread_id = pir_to_thread_id(cpu->pir);
+	uint32_t ras_addr;
+	uint64_t ras_status;
+
+	ras_addr = XSCOM_ADDR_P9_EC(core_id, P9_RAS_STATUS);
+	if (xscom_read(chip_id, ras_addr, &ras_status)) {
+		prlog(PR_WARNING, "SRESET: Unable to read status register\n");
+		return OPAL_HARDWARE;
+	}
+
+	if ((ras_status & P9_RSTAT_QUIESCED(thread_id))
+			== P9_RSTAT_QUIESCED(thread_id))
+		return 1;
+
+	return 0;
+}
+
+static int stop_thread(struct cpu_thread *cpu)
+{
+	uint32_t chip_id = pir_to_chip_id(cpu->pir);
+	uint32_t core_id = pir_to_core_id(cpu->pir);
+	uint32_t thread_id = pir_to_thread_id(cpu->pir);
+	uint32_t dctl_addr;
+	int i;
+
+	dctl_addr = XSCOM_ADDR_P9_EC(core_id, P9_DIRECT_CONTROLS);
+
+	xscom_write(chip_id, dctl_addr, P9_DCTL_STOP(thread_id));
+
+	for (i = 0; i < P9_RSTAT_RETRIES; i++) {
+		int rc = thread_quiesced(cpu);
+		if (rc < 0)
+			break;
+		if (rc)
+			return 0;
+	}
+
+	xscom_write(chip_id, dctl_addr, P9_DCTL_CONT(thread_id));
+	prlog(PR_WARNING, "SRESET: Could not quiesce thread\n");
+	return OPAL_HARDWARE;
+}
+
+static int sreset_thread(struct cpu_thread *cpu)
+{
+	uint32_t chip_id = pir_to_chip_id(cpu->pir);
+	uint32_t core_id = pir_to_core_id(cpu->pir);
+	uint32_t thread_id = pir_to_thread_id(cpu->pir);
+	uint32_t dctl_addr;
+	uint32_t cts_addr;
+	uint64_t cts_val;
+
+	dctl_addr = XSCOM_ADDR_P9_EC(core_id, P9_DIRECT_CONTROLS);
+	cts_addr = XSCOM_ADDR_P9_EC(core_id, P9_CORE_THREAD_STATE);
+
+	if (xscom_read(chip_id, cts_addr, &cts_val)) {
+		prlog(PR_WARNING, "SRESET: Unable to read CORE_THREAD_STATE register\n");
+		return OPAL_HARDWARE;
+	}
+	if (!(cts_val & P9_CTS_STOP(thread_id))) {
+		/*
+		 * Quiescing a thread causes SRR1[46:47] to be set by the
+		 * system reset interrupt as though it was in a power saving
+		 * mode even if it was not.
+		 *
+		 * Setting the DCTL_PWR bit causes SRR1[46:47] to be clear,
+		 * so poke that if thread state says we were not in stop.
+		 */
+		if (xscom_write(chip_id, dctl_addr, P9_DCTL_PWR(thread_id))) {
+			prlog(PR_WARNING, "SRESET: Unable to set power saving mode\n");
+			return OPAL_HARDWARE;
+		}
+	}
+
+	if (xscom_write(chip_id, dctl_addr, P9_DCTL_SRESET(thread_id))) {
+		prlog(PR_WARNING, "SRESET: Unable to write DIRECT_CONTROLS register\n");
+		return OPAL_HARDWARE;
+	}
+
+	return 0;
+}
+
+static int64_t sreset_cpu(struct cpu_thread *cpu)
+{
+	int rc;
+
+	if (this_cpu() == cpu) {
+		prlog(PR_WARNING, "SRESET: Unable to reset self\n");
+		return OPAL_UNSUPPORTED;
+	}
+	if (this_cpu()->primary == cpu->primary) {
+		prlog(PR_WARNING, "SRESET: Unable to reset threads on same core\n");
+		return OPAL_PARTIAL;
+	}
+
+	rc = thread_quiesced(cpu);
+	if (rc < 0)
+		return rc;
+	if (rc) {
+		prlog(PR_WARNING, "SRESET: Thread is quiesced already\n");
+		return OPAL_WRONG_STATE;
+	}
+
+	rc = core_set_special_wakeup(cpu);
+	if (rc)
+		return rc;
+
+	rc = stop_thread(cpu);
+	if (rc) {
+		core_clear_special_wakeup(cpu);
+		return rc;
+	}
+
+	rc = sreset_thread(cpu);
+
+	core_clear_special_wakeup(cpu);
+
+	return rc;
+}
+
+static struct lock sreset_lock = LOCK_UNLOCKED;
+
+int64_t signal_system_reset(int cpu_nr)
+{
+	struct cpu_thread *cpu;
+	int64_t ret;
+
+	if (proc_gen != proc_gen_p9)
+		return OPAL_UNSUPPORTED;
+
+	/* Broadcasts unsupported because we can't signal siblings */
+	if (cpu_nr < 0)
+		return OPAL_PARTIAL;
+
+	/* Reset a single CPU */
+	cpu = find_cpu_by_server(cpu_nr);
+	if (!cpu) {
+		prlog(PR_WARNING, "SRESET: could not find cpu by server %d\n", cpu_nr);
+		return OPAL_PARAMETER;
+	}
+
+	lock(&sreset_lock);
+	ret = sreset_cpu(cpu);
+	unlock(&sreset_lock);
+
+	return ret;
+}
diff --git a/doc/opal-api/opal-signal-system-reset-145.rst b/doc/opal-api/opal-signal-system-reset-145.rst
index 3ddb6845..6fc7a20b 100644
--- a/doc/opal-api/opal-signal-system-reset-145.rst
+++ b/doc/opal-api/opal-signal-system-reset-145.rst
@@ -9,12 +9,13 @@ OPAL_SIGNAL_SYSTEM_RESET
 This OPAL call causes the specified cpu(s) to be reset to the system
 reset exception handler (0x100).
 
-The exact contents of system registers (e.g., SRR1 wakeup causes) may
-vary depending on implementation and should not be relied upon.
+The SRR1 register will indicate a power-saving wakeup when appropriate,
+and the wake reason will be System Reset (see Power ISA).
 
-Resetting active threads on the same core as this call is run may
-not be supported by some platforms. In that case, OPAL_PARTIAL will be
-returned and NONE of the interrupts will be delivered.
+This interrupt may not be recoverable in some cases (e.g., if it is
+raised when the target has MSR[RI]=0), so it should not be used in
+normal operation, but only for crashing, debugging, and similar
+exceptional cases.
 
 Arguments
 ---------
@@ -28,18 +29,20 @@ Arguments
 Returns
 -------
 OPAL_SUCCESS
-  The power down was updated successful.
+  The system reset request to the target CPU(s) was successful. The call
+  returns asynchronously, without acknowledgement that system reset
+  interrupt processing has completed or even started.
 
 OPAL_PARAMETER
   A parameter was incorrect.
 
 OPAL_HARDWARE
-  Hardware indicated failure during reset.
+  Hardware indicated failure during reset. Some or all of the target CPUs
+  may have had the system reset delivered.
 
 OPAL_PARTIAL
-  Platform can not reset all requested CPUs at this time. This requires
-  platform-specific code to work around, otherwise to be treated as
-  failure. No CPUs are reset.
+  Platform cannot reset sibling threads on the same core as requested.
+  None of the specified CPUs are reset in this case.
 
 OPAL_UNSUPPORTED
   This processor/platform is not supported.
diff --git a/hw/xscom.c b/hw/xscom.c
index 7bd78bf9..4a6d91f4 100644
--- a/hw/xscom.c
+++ b/hw/xscom.c
@@ -705,6 +705,10 @@ static void xscom_init_chip_info(struct proc_chip *chip)
 		printf("P9 DD%i.%i%d detected\n", 0xf & (chip->ec_level >> 4),
 		       chip->ec_level & 0xf, rev);
 		chip->ec_rev = rev;
+
+		if (!chip_quirk(QUIRK_MAMBO_CALLOUTS))
+			opal_register(OPAL_SIGNAL_SYSTEM_RESET,
+					signal_system_reset, 1);
 	}
 }
 
diff --git a/include/skiboot.h b/include/skiboot.h
index 0ab9f388..55aa9b8e 100644
--- a/include/skiboot.h
+++ b/include/skiboot.h
@@ -198,6 +198,9 @@ extern char __sym_map_end[];
 extern unsigned long get_symbol(unsigned long addr,
 				char **sym, char **sym_end);
 
+/* System reset */
+extern int64_t signal_system_reset(int cpu_nr);
+
 /* Fast reboot support */
 extern void disable_fast_reboot(const char *reason);
 extern void fast_reboot(void);
-- 
2.13.3


