[Skiboot] [PATCH] fix lock error when BT IRQ preempts BT timer

lixg lixgemail at gmail.com
Wed Jan 6 19:33:18 AEDT 2021


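The BT IRQ may preempt the BT timer if the BMC responds to the host just as a BT message times out.
When that happens, inflight_bt_msg is not adequately protected by bt.lock: bt_msg_del() drops bt.lock before bt_expire_old_msg() clears inflight_bt_msg, so the IRQ path can still see the expired message as in flight.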

The following log is then seen:
[29006114.163785853,3] BT: seq 0x81 netfn 0x0a cmd 0x23: Timeout sending message
[29006114.288029290,3] BT: seq 0x81 netfn 0x0b cmd 0x23: Timeout sending message
[29006114.288917798,3] IPMI: Incorrect netfn 0x0b in response

This can cause a CPU hard lockup, a double free, a kernel crash, or other failures.

Fix it by clearing inflight_bt_msg in bt_msg_del() while bt.lock is still held.

Signed-off-by: lixg <867314078 at qq.com>
---
 hw/bt.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
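
Note for reviewers: the ordering problem can be illustrated with a small
single-threaded model. This is only a sketch, not skiboot code. All names
(struct msg, lock_held, msg_del_old, msg_del_new, expire) are hypothetical
stand-ins, and it assumes the lock is re-taken after the completion
callback, which the bt_msg_del() hunk does not show. The callback plays the
role of a path that runs while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for skiboot's struct bt_msg. */
struct msg { int seq; };

static bool lock_held;             /* models bt.lock */
static struct msg *inflight;       /* models inflight_bt_msg */
static struct msg *seen_in_window; /* pointer observed while unlocked */

/* Models code that runs while the lock is dropped (e.g. a preempting path). */
static void completion_cb(struct msg *m)
{
	(void)m;
	assert(!lock_held);
	seen_in_window = inflight;
}

/* Old ordering: the caller clears `inflight` only after this returns. */
static void msg_del_old(struct msg *m)
{
	assert(lock_held);
	lock_held = false;  /* unlock */
	completion_cb(m);   /* window: `inflight` still points at m */
	lock_held = true;   /* re-lock (assumed) */
}

/* Patched ordering: clear `inflight` before the lock is dropped. */
static void msg_del_new(struct msg *m)
{
	assert(lock_held);
	if (m == inflight)
		inflight = NULL;
	lock_held = false;
	completion_cb(m);   /* window: `inflight` is already NULL */
	lock_held = true;
}

/* Expire one message; returns true if a stale pointer was visible. */
static bool expire(bool patched)
{
	static struct msg m = { 0x81 };

	lock_held = true;
	inflight = &m;
	seen_in_window = NULL;

	if (patched) {
		msg_del_new(&m);
	} else {
		msg_del_old(&m);
		inflight = NULL; /* too late: the window has already passed */
	}

	lock_held = false;
	return seen_in_window != NULL;
}
```

In this model, expire(false) reports that the stale pointer was visible in
the unlocked window, while expire(true) does not.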

diff --git a/hw/bt.c b/hw/bt.c
index cf967f89..24e6ef7f 100644
--- a/hw/bt.c
+++ b/hw/bt.c
@@ -111,7 +111,7 @@ struct bt {
 };
 
 static struct bt bt;
-static struct bt_msg *inflight_bt_msg; /* Holds in flight message */
+static struct bt_msg * volatile inflight_bt_msg; /* Holds in flight message */
 
 static int ipmi_seq;
 
@@ -211,6 +211,11 @@ static void bt_msg_del(struct bt_msg *bt_msg)
 {
 	list_del(&bt_msg->link);
 	bt.queue_len--;
+
+	/* Once the in-flight message leaves the list, clear the pointer */
+	if (bt_msg == inflight_bt_msg)
+		inflight_bt_msg = NULL;
+
 	unlock(&bt.lock);
 	ipmi_cmd_done(bt_msg->ipmi_msg.cmd,
 		      IPMI_NETFN_RETURN_CODE(bt_msg->ipmi_msg.netfn),
@@ -394,7 +399,7 @@ static void bt_expire_old_msg(uint64_t tb)
 			bt_msg_del(bt_msg);
 
 			/* Ready to send next message */
-			inflight_bt_msg = NULL;
+			//inflight_bt_msg = NULL;
 
 			/*
 			 * Timing out a message is inherently racy as the BMC
-- 
2.17.1


