[PATCH V4 7/9] cpuidle/powernv: Add "Fast-Sleep" CPU idle state

Thomas Gleixner tglx at linutronix.de
Sat Nov 30 01:39:37 EST 2013


On Fri, 29 Nov 2013, Preeti U Murthy wrote:
> +static enum hrtimer_restart handle_broadcast(struct hrtimer *hrtimer)
> +{
> +	struct clock_event_device *bc_evt = &bc_timer;
> +	ktime_t interval, next_bc_tick, now;
> +
> +	now = ktime_get();
> +
> +	if (!restart_broadcast(bc_evt))
> +		return HRTIMER_NORESTART;
> +
> +	interval = ktime_sub(bc_evt->next_event, now);
> +	next_bc_tick = get_next_bc_tick();

So you're seriously using a hrtimer to poll at HZ frequency for
updates of bc->next_event?

To be honest, this design sucks.

First of all, why is this a PPC specific feature? There are probably
other architectures which could make use of this. So this should be
implemented in the core code to begin with.

And a lot of the things you need for this are already available in the
core in one form or the other.

For a start you can stick the broadcast hrtimer to the CPU which does
the timekeeping. The handover in the hotplug case is already handled
there, as is the handover for the NOHZ case.

This needs to be extended for this hrtimer broadcast thingy to work,
but it shouldn't be that hard to do so.

Now for the polling. That's a complete trainwreck.

This can be solved via the broadcast IPI as well. When a CPU going
into deep idle needs the broadcast to expire earlier than the currently
programmed value, it can note that and send the timer broadcast IPI
over to the CPU which has the honour of dealing with this.

This supports HIGHRES and NO_HZ if done right, without polling at
all. So you can even let the last CPU which handles the broadcast
hrtimer go for a long sleep, just not in the deepest idle state.
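
As a rough illustration of the IPI-driven scheme described above, here
is a minimal userspace model in C. It is only a sketch under stated
assumptions: struct bc_model, bc_init(), bc_enter_deep_idle() and
bc_expire() are hypothetical names invented for this example, not
kernel APIs, and the "IPI" is modelled as a counter plus a direct
update of the armed expiry. It shows that the broadcast CPU is only
disturbed when a CPU entering deep idle needs an earlier expiry than
the one already armed, so no periodic polling is required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS   4
#define NO_EVENT  UINT64_MAX	/* no wakeup pending */

/* Hypothetical model state; none of these names are kernel APIs. */
struct bc_model {
	int      bc_cpu;		/* CPU owning the broadcast hrtimer */
	uint64_t programmed;		/* expiry currently armed on bc_cpu */
	uint64_t wakeup[NR_CPUS];	/* per-CPU wakeup, NO_EVENT if none */
	unsigned ipis_sent;		/* reprogram "IPIs" sent to bc_cpu */
};

static void bc_init(struct bc_model *m, int bc_cpu)
{
	assert(bc_cpu >= 0 && bc_cpu < NR_CPUS);
	m->bc_cpu = bc_cpu;
	m->programmed = NO_EVENT;
	m->ipis_sent = 0;
	for (int i = 0; i < NR_CPUS; i++)
		m->wakeup[i] = NO_EVENT;
}

/*
 * Called by @cpu right before it enters deep idle. Returns true when
 * the broadcast CPU had to be notified (stand-in for the broadcast IPI).
 */
static bool bc_enter_deep_idle(struct bc_model *m, int cpu, uint64_t next_event)
{
	/* The owner of the broadcast hrtimer stays out of the deepest state. */
	assert(cpu >= 0 && cpu < NR_CPUS && cpu != m->bc_cpu);

	m->wakeup[cpu] = next_event;
	if (next_event >= m->programmed)
		return false;		/* armed expiry already covers this CPU */

	/* Assumption: on a real system bc_cpu would re-arm on IPI receipt. */
	m->programmed = next_event;
	m->ipis_sent++;
	return true;
}

/*
 * Broadcast hrtimer expiry on bc_cpu at time @now: wake every CPU whose
 * event is due and re-arm for the earliest remaining one. Returns a
 * bitmask of the CPUs that were woken.
 */
static unsigned bc_expire(struct bc_model *m, uint64_t now)
{
	unsigned woken = 0;
	uint64_t next = NO_EVENT;

	for (int i = 0; i < NR_CPUS; i++) {
		if (m->wakeup[i] == NO_EVENT)
			continue;
		if (m->wakeup[i] <= now) {
			woken |= 1u << i;
			m->wakeup[i] = NO_EVENT;
		} else if (m->wakeup[i] < next) {
			next = m->wakeup[i];
		}
	}
	m->programmed = next;
	return woken;
}
```

In this model, a CPU going idle with a wakeup later than the armed
expiry costs nothing on the broadcast CPU; only an earlier wakeup
triggers a notification. With no pending wakeups the armed expiry is
NO_EVENT, which corresponds to the broadcast CPU being free to sleep
for a long time in a shallower idle state.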

Thanks,

	tglx

