[Skiboot] [PATCH] ipmi: call check_timers() while waiting for synchronous messages to complete

Andrew Jeffery andrew at aj.id.au
Thu May 2 13:30:30 AEST 2019



On Wed, 1 May 2019, at 16:29, Stewart Smith wrote:
> Cédric Le Goater <clg at kaod.org> writes:
> > BT responses are handled using a timer doing the polling. To hope to
> > get an answer to an IPMI synchronous message, the timer needs to run.
> >
> > This issue shows up very quickly under QEMU when loading the first
> > flash resource with the IPMI HIOMAP backend.
> >
> > Adding a timeout would also help in reporting errors instead of
> > looping indefinitely waiting for a response.
> >
> > Signed-off-by: Cédric Le Goater <clg at kaod.org>
> > ---
> >  core/ipmi.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/core/ipmi.c b/core/ipmi.c
> > index 2bf3f4dabe19..78b410fd1aea 100644
> > --- a/core/ipmi.c
> > +++ b/core/ipmi.c
> > @@ -182,8 +182,16 @@ void ipmi_queue_msg_sync(struct ipmi_msg *msg)
> >  	ipmi_queue_msg_head(msg);
> >  	unlock(&sync_lock);
> >
> > -	while (sync_msg == msg)
> > +	/*
> > +	 * BT response handling relies on a timer. Run timers once in
> > +	 * a while.
> > +	 *
> > +	 * TODO (clg): implement a timeout for IPMI synchronous messages
> > +	 */
> > +	while (sync_msg == msg) {
> > +		check_timers(0);
> >  		time_wait_ms(10);
> > +	}
> >  }
> 
> So, I guess the concern here is if we can ever get into a locking
> problem where we have ipmi_queue_msg_sync() called with a lock held that
> a timer may need to take. Considering that pretty much all the time that
> ipmi_queue_msg_sync() is called it's because Something Is Terrible(TM),
> I'd prefer going down the route of just polling what we need to for
> advancing the IPMI message.
> 
> What about this? (somewhat untested)
> 
> commit d1ee4a5f23a622f2d609be48534131b3249879c2
> Author: Stewart Smith <stewart at linux.ibm.com>
> Date:   Thu Apr 4 15:47:29 2019 +0200
> 
>     ipmi: ensure forward progress on ipmi_queue_msg_sync()
>     
>     BT responses are handled using a timer doing the polling. To hope to
>     get an answer to an IPMI synchronous message, the timer needs to run.
>     
>     We can't just check all timers though as there may be a timer that
>     wants a lock that's held by a code path calling ipmi_queue_msg_sync(),
>     and if we did enforce that as a requirement, it's a pretty subtle
>     API that is asking to be broken.
>     
>     So, if we just run a poll function to crank anything that the IPMI
>     backend needs, then we should be fine.
>     
>     This issue shows up very quickly under QEMU when loading the first
>     flash resource with the IPMI HIOMAP backend.
>     
>     Reported-by: Cédric Le Goater <clg at kaod.org>
>     Signed-off-by: Stewart Smith <stewart at linux.ibm.com>
> 
> diff --git a/core/ipmi.c b/core/ipmi.c
> index 2bf3f4dabe19..9cf5aa626b02 100644
> --- a/core/ipmi.c
> +++ b/core/ipmi.c
> @@ -182,8 +182,18 @@ void ipmi_queue_msg_sync(struct ipmi_msg *msg)
>  	ipmi_queue_msg_head(msg);
>  	unlock(&sync_lock);
>  
> -	while (sync_msg == msg)
> +	/*
> +	 * BT response handling relies on a timer. We can't just run all
> +	 * timers because we may have been called with a lock that a timer
> +	 * wants, and they're generally not written to cope with that.
> +	 * So, just run whatever the IPMI backend needs to make forward
> +	 * progress.
> +	 */
> +	while (sync_msg == msg) {
> +		if (msg->backend->poll)
> +			msg->backend->poll();
>  		time_wait_ms(10);
> +	}
>  }
>  
>  static void ipmi_read_event_complete(struct ipmi_msg *msg)
> diff --git a/hw/bt.c b/hw/bt.c
> index 9febe8e5ce08..a0ff0db4c479 100644
> --- a/hw/bt.c
> +++ b/hw/bt.c
> @@ -526,6 +526,11 @@ static void bt_poll(struct timer *t __unused, void 
> *data __unused,
>  		       bt.irq_ok ? TIMER_POLL : msecs_to_tb(BT_DEFAULT_POLL_MS));
>  }
>  
> +static void bt_ipmi_poll(void)
> +{
> +	bt_poll(NULL, NULL, mftb());
> +}
> +
>  static void bt_add_msg(struct bt_msg *bt_msg)
>  {
>  	bt_msg->tb = 0;
> @@ -647,6 +652,7 @@ static struct ipmi_backend bt_backend = {
>  	.queue_msg_head = bt_add_ipmi_msg_head,
>  	.dequeue_msg = bt_del_ipmi_msg,
>  	.disable_retry = bt_disable_ipmi_msg_retry,
> +	.poll = bt_ipmi_poll,
>  };
>  
>  static struct lpc_client bt_lpc_client = {
> diff --git a/hw/fsp/fsp-ipmi.c b/hw/fsp/fsp-ipmi.c
> index d262cee6591d..8c65e6c77f88 100644
> --- a/hw/fsp/fsp-ipmi.c
> +++ b/hw/fsp/fsp-ipmi.c
> @@ -254,6 +254,8 @@ static struct ipmi_backend fsp_ipmi_backend = {
>  	.queue_msg	= fsp_ipmi_queue_msg,
>  	.queue_msg_head	= fsp_ipmi_queue_msg_head,
>  	.dequeue_msg	= fsp_ipmi_dequeue_msg,
> +	/* FIXME if ever use ipmi_queue_msg_sync on FSP */
> +	.poll           = NULL,
>  };
>  
>  static bool fsp_ipmi_send_response(uint32_t cmd)
> diff --git a/include/ipmi.h b/include/ipmi.h
> index 4999bb5a3f4c..ea5a0a971daf 100644
> --- a/include/ipmi.h
> +++ b/include/ipmi.h
> @@ -182,6 +182,15 @@ struct ipmi_backend {
>  	int (*queue_msg_head)(struct ipmi_msg *);
>  	int (*dequeue_msg)(struct ipmi_msg *);
>  	void (*disable_retry)(struct ipmi_msg *);
> +	/*
> +	 * When processing a synchronous IPMI message, pollers may not run, and
> +	 * neither may timers (as the synchronous IPMI message may be being
> +	 * done with locks held, which a timer may then try to also take).
> +	 *
> +	 * So, ensure we have a way to drive any state machines that an IPMI
> +	 * backend may neeed to crank to ensure forward progress.
> +	 */
> +	void (*poll)(void);
>  };
>  
>  extern struct ipmi_backend *ipmi_backend;
> 

+1 for the explicit nature of this approach.

> 
> 
> -- 
> Stewart Smith
> OPAL Architect, IBM.
> 
>


More information about the Skiboot mailing list