[Skiboot] [PATCH] fix lock error when BT IRQ preempt BT timer

Vasant Hegde hegdevasant at linux.vnet.ibm.com
Thu Jan 7 22:36:04 AEDT 2021


On 1/6/21 2:03 PM, lixg wrote:
> BT IRQ may preempt BT timer if BMC response host when bt msg timeout.
> When BT IRQ preempt BT timer, the infight_bt_msg did not protected by bt.lock very well.
> 
> And we will see the following log:
> [29006114.163785853,3] BT: seq 0x81 netfn 0x0a cmd 0x23: Timeout sending message
> [29006114.288029290,3] BT: seq 0x81 netfn 0x0b cmd 0x23: Timeout sending message
> [29006114.288917798,3] IPMI: Incorrect netfn 0x0b in response
> 
> It may cause 'CPU Hardlock UP', 'memory refree', 'kernel crash' or something else...
> 
> Signed-off-by: lixg <867314078 at qq.com>
> ---
>   hw/bt.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/bt.c b/hw/bt.c
> index cf967f89..24e6ef7f 100644
> --- a/hw/bt.c
> +++ b/hw/bt.c
> @@ -111,7 +111,7 @@ struct bt {
>   };
> 
>   static struct bt bt;
> -static struct bt_msg *inflight_bt_msg; /* Holds in flight message */
> +static struct bt_msg * volatile inflight_bt_msg; /* Holds in flight message */

Why do we need volatile here?

> 
>   static int ipmi_seq;
> 
> @@ -211,6 +211,11 @@ static void bt_msg_del(struct bt_msg *bt_msg)
>   {
>   	list_del(&bt_msg->link);
>   	bt.queue_len--;
> +
> +	/* once inflight_bt_msg out of list, it should be emptyed */
> +	if (bt_msg == inflight_bt_msg)
> +		inflight_bt_msg = NULL;
> +
>   	unlock(&bt.lock);
>   	ipmi_cmd_done(bt_msg->ipmi_msg.cmd,
>   		      IPMI_NETFN_RETURN_CODE(bt_msg->ipmi_msg.netfn),
> @@ -394,7 +399,7 @@ static void bt_expire_old_msg(uint64_t tb)
>   			bt_msg_del(bt_msg);
> 
>   			/* Ready to send next message */
> -			inflight_bt_msg = NULL;
> +			//inflight_bt_msg = NULL;

We don't want to keep commented code. Please remove this.


-Vasant


More information about the Skiboot mailing list