[Skiboot] [PATCH] ipmi: call check_timers() while waiting for synchronous messages to complete

Stewart Smith stewart at linux.vnet.ibm.com
Wed May 1 16:59:35 AEST 2019


Cédric Le Goater <clg at kaod.org> writes:
> BT responses are handled using a timer doing the polling. To hope to
> get an answer to an IPMI synchronous message, the timer needs to run.
>
> This issue shows up very quickly under QEMU when loading the first
> flash resource with the IPMI HIOMAP backend.
>
> Adding a timeout would also help in reporting errors instead of
> looping indefinitely waiting for a response.
>
> Signed-off-by: Cédric Le Goater <clg at kaod.org>
> ---
>  core/ipmi.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/core/ipmi.c b/core/ipmi.c
> index 2bf3f4dabe19..78b410fd1aea 100644
> --- a/core/ipmi.c
> +++ b/core/ipmi.c
> @@ -182,8 +182,16 @@ void ipmi_queue_msg_sync(struct ipmi_msg *msg)
>  	ipmi_queue_msg_head(msg);
>  	unlock(&sync_lock);
>
> -	while (sync_msg == msg)
> +	/*
> +	 * BT response handling relies on a timer. Run timers once in
> +	 * a while.
> +	 *
> +	 * TODO (clg): implement a timeout for IPMI synchronous messages
> +	 */
> +	while (sync_msg == msg) {
> +		check_timers(0);
>  		time_wait_ms(10);
> +	}
>  }

So, I guess the concern here is if we can ever get into a locking
problem where we have ipmi_queue_msg_sync() called with a lock held that
a timer may need to take. Considering that pretty much all the time that
ipmi_queue_msg_sync() is called it's because Something Is Terrible(TM),
I'd prefer going down the route of just polling what we need to for
advancing the IPMI message.

What about this? (somewhat untested)

commit d1ee4a5f23a622f2d609be48534131b3249879c2
Author: Stewart Smith <stewart at linux.ibm.com>
Date:   Thu Apr 4 15:47:29 2019 +0200

    ipmi: ensure forward progress on ipmi_queue_msg_sync()
    
    BT responses are handled using a timer doing the polling. To hope to
    get an answer to an IPMI synchronous message, the timer needs to run.
    
    We can't just check all timers though as there may be a timer that
    wants a lock that's held by a code path calling ipmi_queue_msg_sync(),
    and if we did enforce that as a requirement, it's a pretty subtle
    API that is asking to be broken.
    
    So, if we just run a poll function to crank anything that the IPMI
    backend needs, then we should be fine.
    
    This issue shows up very quickly under QEMU when loading the first
    flash resource with the IPMI HIOMAP backend.
    
    Reported-by: Cédric Le Goater <clg at kaod.org>
    Signed-off-by: Stewart Smith <stewart at linux.ibm.com>

diff --git a/core/ipmi.c b/core/ipmi.c
index 2bf3f4dabe19..9cf5aa626b02 100644
--- a/core/ipmi.c
+++ b/core/ipmi.c
@@ -182,8 +182,18 @@ void ipmi_queue_msg_sync(struct ipmi_msg *msg)
 	ipmi_queue_msg_head(msg);
 	unlock(&sync_lock);
 
-	while (sync_msg == msg)
+	/*
+	 * BT response handling relies on a timer. We can't just run all
+	 * timers because we may have been called with a lock that a timer
+	 * wants, and they're generally not written to cope with that.
+	 * So, just run whatever the IPMI backend needs to make forward
+	 * progress.
+	 */
+	while (sync_msg == msg) {
+		if (msg->backend->poll)
+			msg->backend->poll();
 		time_wait_ms(10);
+	}
 }
 
 static void ipmi_read_event_complete(struct ipmi_msg *msg)
diff --git a/hw/bt.c b/hw/bt.c
index 9febe8e5ce08..a0ff0db4c479 100644
--- a/hw/bt.c
+++ b/hw/bt.c
@@ -526,6 +526,11 @@ static void bt_poll(struct timer *t __unused, void *data __unused,
 		       bt.irq_ok ? TIMER_POLL : msecs_to_tb(BT_DEFAULT_POLL_MS));
 }
 
+static void bt_ipmi_poll(void)
+{
+	bt_poll(NULL, NULL, mftb());
+}
+
 static void bt_add_msg(struct bt_msg *bt_msg)
 {
 	bt_msg->tb = 0;
@@ -647,6 +652,7 @@ static struct ipmi_backend bt_backend = {
 	.queue_msg_head = bt_add_ipmi_msg_head,
 	.dequeue_msg = bt_del_ipmi_msg,
 	.disable_retry = bt_disable_ipmi_msg_retry,
+	.poll = bt_ipmi_poll,
 };
 
 static struct lpc_client bt_lpc_client = {
diff --git a/hw/fsp/fsp-ipmi.c b/hw/fsp/fsp-ipmi.c
index d262cee6591d..8c65e6c77f88 100644
--- a/hw/fsp/fsp-ipmi.c
+++ b/hw/fsp/fsp-ipmi.c
@@ -254,6 +254,8 @@ static struct ipmi_backend fsp_ipmi_backend = {
 	.queue_msg	= fsp_ipmi_queue_msg,
 	.queue_msg_head	= fsp_ipmi_queue_msg_head,
 	.dequeue_msg	= fsp_ipmi_dequeue_msg,
+	/* FIXME if ever use ipmi_queue_msg_sync on FSP */
+	.poll           = NULL,
 };
 
 static bool fsp_ipmi_send_response(uint32_t cmd)
diff --git a/include/ipmi.h b/include/ipmi.h
index 4999bb5a3f4c..ea5a0a971daf 100644
--- a/include/ipmi.h
+++ b/include/ipmi.h
@@ -182,6 +182,15 @@ struct ipmi_backend {
 	int (*queue_msg_head)(struct ipmi_msg *);
 	int (*dequeue_msg)(struct ipmi_msg *);
 	void (*disable_retry)(struct ipmi_msg *);
+	/*
+	 * When processing a synchronous IPMI message, pollers may not run, and
+	 * neither may timers (as the synchronous IPMI message may be being
+	 * done with locks held, which a timer may then try to also take).
+	 *
+	 * So, ensure we have a way to drive any state machines that an IPMI
+	 * backend may neeed to crank to ensure forward progress.
+	 */
+	void (*poll)(void);
 };
 
 extern struct ipmi_backend *ipmi_backend;



-- 
Stewart Smith
OPAL Architect, IBM.



More information about the Skiboot mailing list