[Skiboot] [PATCH 2/2] core/direct-controls: improve p9_stop_thread error handling

Nicholas Piggin npiggin at gmail.com
Thu May 3 18:38:09 AEST 2018


p9_stop_thread should fail the operation (returning OPAL_BUSY) if it
finds the thread was already quiesced. This implies that something
else is driving direct controls on the thread (e.g., pdbg), or that
there is some exceptional condition we don't know how to deal with.
Proceeding here would let the two users trample on each other; for
example, the hard lockup watchdog trying to send an sreset to a core
while it is stopped for debugging with pdbg will end in tears.
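A minimal sketch of the new precondition check, in plain C so it can run outside skiboot. It assumes a fake thread model: `struct fake_thread`, `fake_thread_quiesced()` and `stop_precheck()` are hypothetical stand-ins for the hardware and for `p9_thread_quiesced()`. The OPAL_* values are assumed to mirror skiboot's opal-api.h. It shows the three outcomes: a read error is propagated, an already-quiesced thread is refused with OPAL_BUSY, and only a running thread is cleared for the stop command.

```c
#include <stdbool.h>
#include <stdio.h>

/* Return codes; values assumed to match skiboot's opal-api.h. */
#define OPAL_SUCCESS   0
#define OPAL_BUSY     -2
#define OPAL_HARDWARE -6

/* Hypothetical stand-in for the hardware state of one thread. */
struct fake_thread {
	bool quiesced;   /* already stopped by another user (e.g. pdbg)? */
	bool read_error; /* simulate a failed status read */
};

/*
 * Hypothetical stand-in for p9_thread_quiesced():
 * < 0 on error, 1 if the thread is quiesced, 0 if it is running.
 */
static int fake_thread_quiesced(const struct fake_thread *t)
{
	if (t->read_error)
		return OPAL_HARDWARE;
	return t->quiesced ? 1 : 0;
}

/* Check performed before issuing the stop direct control. */
static int stop_precheck(const struct fake_thread *t)
{
	int rc = fake_thread_quiesced(t);

	if (rc < 0)
		return rc; /* propagate the read error unchanged */
	if (rc) {
		/* Someone else owns the thread: refuse instead of proceeding. */
		fprintf(stderr, "Could not stop thread: Thread is quiesced already.\n");
		return OPAL_BUSY;
	}
	return OPAL_SUCCESS; /* safe to write the stop command */
}
```

Returning a distinct OPAL_BUSY (rather than OPAL_HARDWARE) lets a caller such as the watchdog tell "someone else is debugging this thread" apart from a genuine hardware failure.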

If p9_stop_thread times out waiting for the thread to quiesce, do
not hit it with a core_start direct control, because we don't know
what state things are in, and issuing more direct controls at this
point is worse than doing nothing. The workbook describes no good
recipe for de-asserting the core_stop control if it fails to quiesce
the thread. After timing out here, the thread may eventually quiesce
and stay stuck, but that is simpler to debug than undefined behaviour.
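A rough sketch of the timeout path after this change, again against a hypothetical fake core rather than real xscom accesses. The wait loop between the two hunks is not shown in the diff, so the sketch assumes a simple poll-count budget (`max_polls`). `struct fake_core`, `fake_write()`, `fake_poll_quiesced()` and `stop_and_wait()` are all illustrative names. The point it demonstrates is that on timeout exactly one command (the stop) has been written and no continue follows it.

```c
#include <stdbool.h>

/* Return codes; values assumed to match skiboot's opal-api.h. */
#define OPAL_SUCCESS   0
#define OPAL_HARDWARE -6

/* Hypothetical direct-control commands recorded by the fake core. */
enum fake_cmd { CMD_NONE, CMD_STOP, CMD_CONT };

struct fake_core {
	int polls_until_quiesced; /* polls before quiesce is seen; < 0 = never */
	int polls;                /* polls performed so far */
	enum fake_cmd last_cmd;   /* last direct control written */
	int cmds_written;         /* number of direct controls written */
};

static void fake_write(struct fake_core *c, enum fake_cmd cmd)
{
	c->last_cmd = cmd;
	c->cmds_written++;
}

static bool fake_poll_quiesced(struct fake_core *c)
{
	c->polls++;
	return c->polls_until_quiesced >= 0 &&
	       c->polls >= c->polls_until_quiesced;
}

/*
 * Write the stop command, then poll up to max_polls times. On timeout,
 * return an error WITHOUT writing CMD_CONT: the core's state is unknown,
 * so the sketch leaves it alone rather than poking it further.
 */
static int stop_and_wait(struct fake_core *c, int max_polls)
{
	fake_write(c, CMD_STOP);
	for (int i = 0; i < max_polls; i++) {
		if (fake_poll_quiesced(c))
			return OPAL_SUCCESS;
	}
	return OPAL_HARDWARE; /* deliberately no CMD_CONT here */
}
```

A thread that never quiesces ends with `cmds_written == 1` and `last_cmd == CMD_STOP`, which is the "stuck but easy to diagnose" state the patch prefers over an unsanctioned continue.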

Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
---
 core/direct-controls.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/core/direct-controls.c b/core/direct-controls.c
index 4511a113..04b93a16 100644
--- a/core/direct-controls.c
+++ b/core/direct-controls.c
@@ -496,10 +496,12 @@ static int p9_stop_thread(struct cpu_thread *cpu)
 	rc = p9_thread_quiesced(cpu);
 	if (rc < 0)
 		return rc;
-	if (rc)
-		prlog(PR_WARNING, "Stopping thread %u:%u:%u warning:"
-				" thread is quiesced already.\n",
+	if (rc) {
+		prlog(PR_ERR, "Could not stop thread %u:%u:%u:"
+				" Thread is quiesced already.\n",
 				chip_id, core_id, thread_id);
+		return OPAL_BUSY;
+	}
 
 	if (xscom_write(chip_id, dctl_addr, P9_THREAD_STOP(thread_id))) {
 		prlog(PR_ERR, "Could not stop thread %u:%u:%u:"
@@ -522,12 +524,6 @@ static int p9_stop_thread(struct cpu_thread *cpu)
 			" Unable to quiesce thread.\n",
 			chip_id, core_id, thread_id);
 
-	if (xscom_write(chip_id, dctl_addr, P9_THREAD_CONT(thread_id))) {
-		prlog(PR_ERR, "Could not resume thread %u:%u:%u:"
-				" Unable to write EC_DIRECT_CONTROLS.\n",
-				chip_id, core_id, thread_id);
-	}
-
 	return OPAL_HARDWARE;
 }
 
-- 
2.17.0