[Skiboot] [PATCH RESEND v6 2/2] SBE: Add timer support

Mon Apr 16 19:56:58 AEST 2018

On 04/13/2018 03:05 PM, Benjamin Herrenschmidt wrote:
> On Tue, 2018-04-03 at 17:27 +0530, Vasant Hegde wrote:
>> SBE on P9 provides one shot programmable timer facility. We can use this
>> to implement OPAL timers and hence limit the reliance on the Linux
>> heartbeat (similar to HW timer facility provided by SLW on P8).

.../...

>>   
>> +	/* Timer expired */
>> +	if (data & SBE_HOST_TIMER_EXPIRY) {
>> +		if (!p9_sbe_timer_got_ack()) {
>> +			data &= ~(SBE_HOST_TIMER_EXPIRY);
>> +		} else {
>> +			sbe_timer_in_progress = false;
>> +			timer_interrupt = true;
>> +		}
>> +	}
> 
> So not sure I understand the above. It's the same bit in the doorbell
> that is the "ack" and the timer expiry ? That means you basically need
> "2" interrupts, one for the ack, one for the expiry, when you set the
> timer ?

Its same doorbell register, but different bits. In some cases there will be 
delay in processing interrupt and we endup getting both bits set simultaneously. 
  So I added get_ack() to make sure we process ack before processing timer 
interrupt. In such cases we endup having two interrupt to process timer.

I'm changing from sync_msg() to async_msg(). With that we don't need about stuff 
anyway.

> 
> That's racy no ? IE what if you poll a bit slowly bcs you're busy
> elsewhere and get both ack and timer interrupt ? They coalesce into a
> single event and your timer is gone, no ?
> 
>>   	/* SBE came back from reset */
>>   	if (data & SBE_HOST_RESET) {
>>   		prlog(PR_NOTICE, "Back from reset [chip id = %x]\n", chip_id);
>> @@ -547,6 +586,9 @@ clr_interrupt:
>>   
>>   	p9_sbe_process_queue(sbe);
>>   	unlock(&sbe->lock);
>> +	/* Drop lock and call timers */
>> +	if (timer_interrupt)
>> +		check_timers(true);
> 
> If for some reason the timer fired early, are you confident it will be
> re-armed ?

Our understanding is SBE will never get back to us early. They will always come 
after timeout.
So we are good here.

> 
>>   }
>>   
>>   static void p9_sbe_timeout_poll(void *user_data __unused)
>> @@ -598,6 +640,120 @@ static void p9_sbe_timeout_poll(void *user_data __unused)
>>   		p9_sbe_process_queue(sbe);
>>   		unlock(&sbe->lock);
>>   	}
>> +
>> +	/*
>> +	 * Check if the timer is working. If at least 1ms elapsed since
>> +	 * last scheduled timer expiry.
>> +	 */
>> +	if (sbe_has_timer && sbe_timer_in_progress) {
>> +		chip = get_chip(sbe_default_chip_id);
>> +		sbe = chip->sbe;
>> +		if (tb_compare(mftb(), sbe_last_gen_stamp + msecs_to_tb(1))
>> +		    != TB_AAFTERB)
>> +			return;
> 
> So I would have put that inside the above lock section, basically, if
> the chip is the default chip, add the timeout logic (and move it to a
> separate function).

Yeah. I will have separate p9_sbe_timer_poll() to timer timeout checking. Also I 
will move above block  inside lock.

> 
>> +		/*
>> +		 * In some cases there will be a delay in calling OPAL interrupt
>> +		 * handler routine (opal_handle_interrupt). In such cases its
>> +		 * possible that SBE has responded, but OPAL didn't act on that.
>> +		 * Hence check for SBE response before disabling timer.
>> +		 */
>> +		p9_sbe_interrupt(sbe_default_chip_id);
> 
> 
> Otherwise you call the above twice in the same function and the
> function becomes a mess.
> 
> Also you keep taking & releasing the lock, it's inefficient. Follow my
> advice in the other patch, have the lock taken once in a per-chip thing
> and everything else underneath is done with that one lock instance.

Agree. Moved to separate function.

> 
>> +
>> +		lock(&sbe->lock);
>> +		if (!sbe_timer_in_progress) {
>> +			unlock(&sbe->lock);
>> +			return;
>> +		}
>> +
>> +		if (tb_compare(mftb(), sbe_last_gen_stamp + msecs_to_tb(1))
>> +		    != TB_AAFTERB) {
>> +			unlock(&sbe->lock);
>> +			return;
>> +		}
>> +
>> +		prlog(PR_ERR, "Timer stuck, falling back to OPAL pollers.\n");
>> +		prlog(PR_ERR, "You will likely have slower I2C and may have "
>> +		      "experienced increased jitter.\n");
>> +		p9_sbe_reg_dump(sbe_default_chip_id);
>> +		sbe_has_timer = false;
>> +		sbe_timer_in_progress = false;
>> +		unlock(&sbe->lock);
>> +	}
>> +}
>> +
>> +/*
>> + * This is called with the timer lock held, so there is no
>> + * issue with re-entrancy or concurrence
>> + */
>> +void p9_sbe_update_timer_expiry(uint64_t new_target)
>> +{
>> +	int rc;
>> +	u32 tick_us = SBE_TIMER_DEFAULT_US;
>> +	u64 tb_cnt, now = mftb();
>> +
>> +	if (!sbe_has_timer || new_target == sbe_timer_target)
>> +		return;
>> +
>> +	sbe_timer_target = new_target;
>> +
>> +	/* Modify timer */
>> +	if (sbe_timer_in_progress) {
>> +		if (sbe_timer_target < now)
>> +			return;
>> +
>> +		/* remaining time <= sbe_timer_def_tb */
>> +		if ((sbe_last_gen_stamp - now) <= sbe_timer_def_tb)
>> +			return;
>> +
>> +		if (sbe_timer_target > sbe_last_gen_stamp)
>> +			return;
>> +
>> +		sbe_timer_in_progress = false;
> 
> So aren't you potentially re-using or modifying a message that is in
> the middle of being sent ?
> 
> IE a previous call to p9_sbe_update_timer_expiry has queued the
> message, it's now in the process of being sent, while this gets called,
> cleares sbe_timer_in_progress, ad move on with hacking the "active"
> message ... that isn't right.
> 
> You need to flag that a new target is needed and prep a new message
> from the completion of the previous one.

Done.

I'm moving from sync msg to async message. Also I've "has_new_target" note down 
new timer value while we are processing current timer.

Thanks
-Vasant