[RESEND PATCH V5 0/8] cpuidle/ppc: Enable deep idle states on PowerNV

Preeti U Murthy preeti at linux.vnet.ibm.com
Wed Jan 22 18:07:51 EST 2014

On PowerPC, when CPUs enter certain deep idle states, the local timers stop
and the time base could go out of sync with the rest of the cores in the system.

This patchset adds support to wake up CPUs in such idle states by
broadcasting IPIs to them at their next timer events using the tick broadcast
framework in the Linux kernel. We refer to these IPIs as the tick
broadcast IPIs in this patchset.

However the tick broadcast framework as it exists today makes use of an external
clock device to wakeup CPUs in such idle states. But not all implementations of
PowerPC provides such an external clock device.

Hence Patch[6/8]:
[time/cpuidle: Support in tick broadcast framework for archs without external
clock device] adds support in the tick broadcast framework for such
use cases by queuing a hrtimer on one of the CPUs which is meant to handle the wakeup
of CPUs in deep idle states.
This patch was posted separately at: https://lkml.org/lkml/2013/12/12/687.

Patches 1-3 adds support in powerpc to hook onto the tick broadcast framework.

The patchset also includes support for resyncing of time base with the rest of the
cores in the system and context management for fast sleep. PATCH[4/8] and
PATCH[5/8] address these issues.

With the required support for deep idle states thus in place, the
patchset adds "Fast-Sleep" idle state into cpuidle (Patches 7 and 8). "Fast-Sleep"
is a deep idle state on Power8 in which the above mentioned challenges
exist. Fast-Sleep can yield us significantly more power
savings than the idle states that we have in cpuidle so far.

This patchset is based on Ben's ppc next branch at commit fac515db45207718
[Merge remote-tracking branch 'scott/next' into next],  and the
cpuidle driver for powernv posted by Deepthi Dharwar:
https://lkml.org/lkml/2014/1/14/172. The same patchset minus the resolving of
merge conflicts with Ben's ppc next branch had been posted earlier
at http://lkml.org/lkml/2014/1/15/70. This Repost resolves these merge
conflicts with Ben's ppc next branch. Hence the Repost. Besides the earlier
post was based and tested on the mainline commit that was quite old.

However the patchset posted earlier at http://lkml.org/lkml/2014/1/15/70
along wiith Deepthi's patches on cpuidle driver for
powernv applies cleanly on the mainline kernel at commit: 85ce70fdf48aa290b484531
dated Jan 16 2014 and has been tested on the same at the time of this Repost.

Changes in V5: The primary change in this version is in Patch[6/8].
As per the discussions in V4 posting of this patchset, it was decided to
refine handling the wakeup of CPUs in fast-sleep by doing the following:

1. In V4, a polling mechanism was used by the CPU handling broadcast to
find out the time of next wakeup of the CPUs in deep idle states. V5 avoids
polling by a way described under PATCH[6/8] in this patchset.

2. The mechanism of broadcast handling of CPUs in deep idle in the absence of an
external wakeup device should be generic and not arch specific code. Hence in this
version this functionality has been integrated into the tick broadcast framework in
the kernel unlike before where it was handled in powerpc specific code.

3. It was suggested that the "broadcast cpu" can be the time keeping cpu
itself. However this has challenges of its own:

 a. The time keeping cpu need not exist when all cpus are idle. Hence there
are phases in time when time keeping cpu is absent. But for the use case that
this patchset is trying to address we rely on the presence of a broadcast cpu
all the time.

 b. The nomination and un-assignment of the time keeping cpu is not protected
by a lock today and need not be as well since such is its use case in the
kernel. However we would need locks if we double up the time keeping cpu as the
broadcast cpu.

Hence the broadcast cpu is independent of the time-keeping cpu. However PATCH[6/8]
proposes a simpler solution to pick a broadcast cpu in this version.

Changes in V4: https://lkml.org/lkml/2013/11/29/97

1. Add Fast Sleep CPU idle state on PowerNV.

2. Add the required context management for Fast Sleep and the call to OPAL
to synchronize time base after wakeup from fast sleep.

4. Add parsing of CPU idle states from the device tree to populate the
state table.

5. Rename ambiguous functions in the code around waking up of CPUs from fast

6. Fixed a bug in re-programming of the hrtimer that is queued to wakeup the
CPUs in fast sleep and modified Changelogs.

7. Added the ARCH_HAS_TICK_BROADCAST option. This signifies that we have a
arch specific function to perform broadcast.

Changes in V3:

1. Fix the way in which a broadcast ipi is handled on the idling cpus. Timer
handling on a broadcast ipi is being done now without missing out any timer
stats generation.

2. Fix a bug in the programming of the hrtimer meant to do broadcast. Program
it to trigger at the earlier of a "broadcast period", and the next wakeup
event. By introducing the "broadcast period" as the maximum period after
which the broadcast hrtimer can fire, we ensure that we do not miss
wakeups in corner cases.

3. On hotplug of a broadcast cpu, trigger the hrtimer meant to do broadcast
to fire immediately on the new broadcast cpu. This will ensure we do not miss
doing a broadcast pending in the nearest future.

4. Change the type of allocation from GFP_KERNEL to GFP_NOWAIT while
initializing bc_hrtimer since we are in an atomic context and cannot sleep.

5. Use the broadcast ipi to wakeup the newly nominated broadcast cpu on
hotplug of the old instead of smp_call_function_single(). This is because we
are interrupt disabled at this point and should not be using
smp_call_function_single or its children in this context to send an ipi.

6. Move GENERIC_CLOCKEVENTS_BROADCAST to arch/powerpc/Kconfig.

7. Fix coding style issues.

Changes in V2: https://lkml.org/lkml/2013/8/14/239

1. Dynamically pick a broadcast CPU, instead of having a dedicated one.
2. Remove the constraint of having to disable tickless idle on the broadcast
CPU by queueing a hrtimer dedicated to do broadcast.

V1 posting: https://lkml.org/lkml/2013/7/25/740.

1. Added the infrastructure to wakeup CPUs in deep idle states in which the
local timers stop.


Preeti U Murthy (5):
      cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines
      powermgt: Add OPAL call to resync timebase on wakeup
      time/cpuidle: Support in tick broadcast framework in the absence of external clock device
      cpuidle/powernv: Add "Fast-Sleep" CPU idle state
      cpuidle/powernv: Parse device tree to setup idle states

Srivatsa S. Bhat (2):
      powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
      powerpc: Implement tick broadcast IPI as a fixed IPI message

Vaidyanathan Srinivasan (1):
      powernv/cpuidle: Add context management for Fast Sleep

 arch/powerpc/Kconfig                           |    2 
 arch/powerpc/include/asm/opal.h                |    2 
 arch/powerpc/include/asm/processor.h           |    1 
 arch/powerpc/include/asm/smp.h                 |    2 
 arch/powerpc/include/asm/time.h                |    1 
 arch/powerpc/kernel/exceptions-64s.S           |   10 +
 arch/powerpc/kernel/idle_power7.S              |   90 +++++++++--
 arch/powerpc/kernel/smp.c                      |   23 ++-
 arch/powerpc/kernel/time.c                     |   88 +++++++----
 arch/powerpc/platforms/cell/interrupt.c        |    2 
 arch/powerpc/platforms/powernv/opal-wrappers.S |    1 
 arch/powerpc/platforms/ps3/smp.c               |    2 
 drivers/cpuidle/cpuidle-powernv.c              |  109 ++++++++++++--
 include/linux/clockchips.h                     |    4 -
 kernel/time/clockevents.c                      |    9 +
 kernel/time/tick-broadcast.c                   |  192 ++++++++++++++++++++++--
 kernel/time/tick-internal.h                    |    8 +
 17 files changed, 442 insertions(+), 104 deletions(-)

More information about the Linuxppc-dev mailing list