[Skiboot] [PATCH v7 2/2] Add SBE driver support

Benjamin Herrenschmidt benh at kernel.crashing.org
Tue Apr 17 16:40:55 AEST 2018


On Tue, 2018-04-17 at 11:34 +0530, Vasant Hegde wrote:
> On 04/17/2018 09:38 AM, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-04-16 at 17:17 +0530, Vasant Hegde wrote:
> > > 
> > > +static inline bool p9_sbe_mbox_busy(struct p9_sbe *sbe)
> > > +{
> > > +	return sbe->state != sbe_mbox_idle ? true : false;
> > > +}
> > 
> > What's wrong with just return sbe->state != sbe_mbox_idle ?
> > 
> 
> That should work fine. Will fix.
> 
> > > +static inline void p9_sbe_mbox_update_state(struct p9_sbe *sbe,
> > > +					    enum p9_sbe_mbox_state state)
> > > +{
> > > +	sbe->state = state;
> > > +}
> > 
> > What's the point of the above wrapper ? I don't find it makes
> > anything clearer...
> 
> Ok. Will remove.
> 
> 
> .../...
> 
> > > +}
> > > +
> > > +/* WARNING: This will drop sbe->lock */
> > > +static void p9_sbe_send_complete(struct p9_sbe *sbe)
> > > +{
> > > +	struct p9_sbe_msg *msg;
> > > +
> > > +	if (list_empty(&sbe->msg_list))
> > >   		return;
> > >   
> > > +	msg = list_top(&sbe->msg_list, struct p9_sbe_msg, link);
> > > +	/* Need response */
> > > +	if (msg->response) {
> > > +		msg->state = sbe_msg_wresp;
> > > +	} else {
> > > +		msg->state = sbe_msg_done;
> > > +		p9_sbe_mbox_update_state(sbe, sbe_mbox_idle);
> > > +		p9_sbe_msg_complete(sbe, msg);
> > > +	}
> > > +}
> > 
> > Subtle race: if somebody is polling on msg->state waiting for it to be
> > done without holding the SBE lock (which could happen), then there's a
> > risk that the message gets freed right after msg->state is updated.
> > 
> 
> Oh yes. If someone is polling without lock yes we have trouble.
> Will fix.
> 
> > msg->state should be set to "done" *inside* p9_sbe_msg_complete, and
> > the latter should make sure it snapshots msg->complete before it
> > changes msg->state (with a nice fat sync in between). It also needs to
> > take it out of the list before updating the state as well.
> 
> Yeah. I can do something like bleow in p9_sbe_msg_complete()
> 
> static void p9_sbe_msg_complete(struct p9_sbe *sbe, struct p9_sbe_msg *msg)
> {
> 	void (*comp)(struct p9_sbe_msg *msg);
> 
> 	comp = msg->complete;
> 	list_del(&msg);

	sync(); <---- important

> 	msg->state = sbe_msg_done;
> 	if (comp) {
> 		unlock(&sbe->lock);
> 		comp(msg);
> 		lock(&sbe->lock);
> 	}
> }
> 
> But now we always endup setting msg->state = done. How do we pass error state to 
> caller?

Pass it as an argument to msg_complete ?

> May be allocate resp structure for all messages?
> 
> > 
> > > +/* WARNING: This will drop sbe->lock */
> > > +static void p9_sbe_process_queue(struct p9_sbe *sbe)
> > > +{
> > > +	int rc, retry_cnt = 0;
> > > +	struct p9_sbe_msg *msg = NULL;
> > > +
> > > +	if (p9_sbe_mbox_busy(sbe))
> > > +		return;
> > > +
> > > +	while (!list_empty(&sbe->msg_list)) {
> > > +		msg = list_top(&sbe->msg_list, struct p9_sbe_msg, link);
> > > +		/* Send message */
> > > +		rc = p9_sbe_msg_send(sbe->chip_id, msg);
> > > +		if (rc == OPAL_SUCCESS)
> > > +			break;
> > > +
> > > +		prlog(PR_ERR, "Failed to send message to SBE [chip id = %x]\n",
> > > +		      sbe->chip_id);
> > > +		if (msg->resp) {
> > > +			p9_sbe_set_primary_rc(msg->resp,
> > > +					      SBE_STATUS_PRI_GENERIC_ERR);
> > > +		}
> > > +		msg->state = sbe_msg_error;
> > > +		p9_sbe_msg_complete(sbe, msg);
> > > +
> > > +		/*
> > > +		 * Repeatedly failed to send message to SBE. Lets stop
> > > +		 * sending message.
> > > +		 */
> > > +		if (retry_cnt++ >= 3) {
> > > +			prlog(PR_ERR, "Temporarily stopped sending "
> > > +			      "message to SBE\n");
> > > +			return;
> > > +		}
> > > +	}
> > > +
> > > +	if (list_empty(&sbe->msg_list))
> > > +		return;
> > > +
> > > +	prlog(PR_TRACE, "Message queued [chip id = 0x%x]:\n\t Reg0 : %016llx"
> > > +	      "\n\t Reg1 : %016llx\n\t Reg2 : %016llx\n\t Reg3 : %016llx\n",
> > > +	      sbe->chip_id, msg->reg[0], msg->reg[1], msg->reg[2], msg->reg[3]);
> > > +
> > > +	msg->timeout = mftb() + msecs_to_tb(SBE_CMD_TIMEOUT_MAX);
> > > +	p9_sbe_mbox_update_state(sbe, sbe_mbox_send);
> > > +	msg->state = sbe_msg_sent;
> > 
> > I would personally have the 2 above inside p9_sbe_msg_send(), any
> > reason why not ?
> 
> I don't have strong reason for that. I just wanted sbe_msg_send to just send 
> message without dealing with message state etc (like I do in sbe_msg_recv). I 
> can move above two statement inside msg_send().

it's not that much better to have that stuff opencoded in the general
queue processing either ... not biggie tho.
> 
> 
> .../...
> 
> > > +
> > > +static void __p9_sbe_interrupt(struct p9_sbe *sbe)
> > > +{
> > > +	bool has_response = false;
> > > +	int rc;
> > > +	u64 data = 0, val;
> > > +	struct p9_sbe_msg *msg = NULL;
> > > +
> > >   	/* Read doorbell register */
> > > -	rc = xscom_read(chip_id, PSU_HOST_DOORBELL_REG_RW, &data);
> > > +	rc = xscom_read(sbe->chip_id, PSU_HOST_DOORBELL_REG_RW, &data);
> > >   	if (rc) {
> > >   		prlog(PR_ERR, "Failed to read SBE to Host doorbell register "
> > > -		      "[chip id = %x]\n", chip_id);
> > > -		goto clr_interrupt;
> > > +		      "[chip id = %x]\n", sbe->chip_id);
> > > +		p9_sbe_reg_dump(sbe->chip_id);
> > > +		return;
> > >   	}
> > >   
> > > -	/* SBE passtrhough command, call prd handler */
> > > -	if (data & SBE_HOST_PASSTHROUGH) {
> > > -		prd_sbe_passthrough(chip_id);
> > > +	/* Called from poller path, nothing to process */
> > > +	if (!data)
> > > +		return;
> > > +
> > > +	/* Read SBE response before clearing doorbell register */
> > > +	if (data & SBE_HOST_RESPONSE_WAITING) {
> > > +		if (!list_empty(&sbe->msg_list)) {
> > > +			msg = list_top(&sbe->msg_list, struct p9_sbe_msg, link);
> > > +			p9_sbe_handle_response(sbe->chip_id, msg);
> > > +			msg->state = sbe_msg_done;
> > > +			has_response = true;
> > > +		}
> > > +
> > > +		/* Reset SBE MBOX state */
> > > +		p9_sbe_mbox_update_state(sbe, sbe_mbox_idle);
> > >   	}
> > 
> > Why do you do that one before the doorbell bit clearing ?
> > 
> > Also you can hit other cases further down such as the "return"
> > in the xscom_write error case or the SBE_HOST_RESET which will
> > make you "lose" the content of "msg" and overwrite its state.
> 
> Yeah. May be I will have separate local "resp" message, copy response to that 
> and then if everything is fine (like writing xscom , handling reset), I will 
> call msg_complete() with this new resp (as in I will copy "resp" to msg->resp).

Why copy ? You don't need to copy anything ...

> > 
> > You may want to use a separate variable:
> > 
> > struct p9_sbe_msg *resp.
> > 
> 
> Yeah. Separate message is cleaner.

Separate variable, same message... or not. Up to you.
> 
> > Also you have another possible issue. If you send a message that takes
> > a response, and you get both the ack and the resp bits, you'll end
> > up doing the above, and *then* hit send_complete, is that what you
> > really want ?
> 
> Actually this is fine. I'm processing like below
> 	Holding lock
> 	process ACK -> if message expects response then send_complete just sets message 
> state
> 					and returns
> 	process response -> At this stage if we have response then it calls msg->complete()
> 
> 
> > 
> > You should also double check when you get the response bit that
> > the message at the head of the list is indeed expecting a response.
> 
> SBE message is serialized. Until we get response to first message we are not 
> sending second one. So top of the message is the one which is waiting for 
> response. Worst case we may
> get both ACK and response bit .. which we are handling.

You should still look at ACK first, before you look at response.
> > 
> > If you don't want to worry too much about the lock dropping, what
> > you can do is a loop, clearing only the bit you processed and reading
> > the doorbell again, though that's a bit slower.
> 
> I think current code is fine with above modification. We should be able to 
> handle with single xscom_read.

I'll review again when you do the changes and see if I can convince
myself that it is.

If you look at the FSP code, when I complete a message via
fsp_complete_send() in __fsp_poll() I do re-fetch the interrupt causes
because the lock drop means somebody else could have called poll in the
meantime.

That's probably the best way for you too.

Rather than clear all the bits, do something like

again:
	bits = read_bits();

	if (bit & bit_one) {
		clear_bit(one);
		do_actions_for_bit_1();
		if (completed_message)
			goto again;
	}

	if (bit & bit_two)
	etc...

> Thanks!
> -Vasant


More information about the Skiboot mailing list