[Skiboot] [RFC 1/1] SLW : Add new idle device-tree format

Oliver O'Halloran oohall at gmail.com
Mon Nov 18 12:05:26 AEDT 2019


On Fri, Aug 9, 2019 at 8:55 PM Abhishek Goel <huntbag at linux.vnet.ibm.com> wrote:
>
> Add stop states under ibm,idle-states in addition to the current array
> based device tree properties.
Hi Ahbishek,

Sorry, about the late response. I got bogged down on other things for
a whle, but I'm trying to clear out the patchwork backlog now.

> New device tree format adds a compatible flag which has version
> corresponding to every state, so that only kernel which has the capability
> to handle the version of stop state will enable it. Drawback of the array
> based dt node is that versioning of idle states is not possible.
>
> Older kernel will still see stop0 and stop0_lite in older format and we
> will deprecate it after some time.
>
> Consider a case that stop4 has a bug. We take the following steps to
> mitigate the problem.
>
> 1) Change compatible string for stop4 in OPAL to "stop4,v2" from
> "stop4,v1", i.e. basicallly bump up the previous version and ship the
> new firmware.
> 2) The kernel will ignore stop4 as it won't be able to recognize this
> new version. Kernel will also ignore all the deeper states because its
> possible that a cpu have requested for a deeper state but was never able
> to enter into it. But we will still have shallower states that are there
> before stop 4. This, thus prevents from completely disabling stop states.
>
> Linux kernel can now look at the version string and decide if it has the
> ability to handle that idle state. Henceforth, if kernel does not know
> about a version, it will skip that state and all the deeper state.

How does the kernel know if one state node deeper than another? What
I'm really wondering is whether we need a reg for the individual
states e.g.:

stop at 11 {
 reg= <11>
 compatible="stop11-v1"
 ...
}
stop at 4 {
 reg = <4>;
 compatible="stop4-v1";
 ...
}

> Once when the workaround are implemented into the kernel, we can bump up
> the known version in kernel for that state, so that support can be
> enabled once again in kernel.
>
> New Device-tree :
>
> Final output
>        power-mgt {
>             ...
>          ibm,enabled-stop-levels = <0xec000000>;
>          ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>;
>          ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>;
>          ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>;
>          ibm,cpu-idle-state-flags = <0x100000 0x101000>;
>          ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>;
>          ibm,idle-states {
>                      stop4 {
>                          flags = <0x207000>;
>                          compatible = "stop4,v1",
>                          psscr-mask = <0x0 0x3003ff>;
>                          latency-ns = <0x186a0>;
>                          residency-ns = <0x989680>;
>                          psscr = <0x0 0x300374>;
>                   };
>                     ...
>                     stop11 {
>                          ...
>                          compatible = "stop11,v1",
>                          ...
>                   };
>              };
>
> Signed-off-by: Abhishek Goel <huntbag at linux.vnet.ibm.com>
> ---
>  hw/slw.c           | 146 +++++++++++++++++++++++++++++++++++----------
>  include/opal-api.h |   7 +++
>  2 files changed, 123 insertions(+), 30 deletions(-)
>
> diff --git a/hw/slw.c b/hw/slw.c
> index c872b630..1c857d5a 100644
> --- a/hw/slw.c
> +++ b/hw/slw.c
> @@ -370,9 +370,15 @@ static bool idle_prepare_core(struct proc_chip *chip, struct cpu_thread *c)
>  }
>
>  /* Define device-tree fields */
> -#define MAX_NAME_LEN   16
> +#define MAX_NAME_LEN           16
> +#define MAX_VERSION_LEN                25
> +#define MAX_DESCRIPTION_LEN    60
>  struct cpu_idle_states {
>         char name[MAX_NAME_LEN];
> +       /* Adding compatibility version for evey stop state */
Pointless comment.

> +       char compat_version[MAX_VERSION_LEN];
> +       /* Adding description for every stop state */
Same here.

> +       char desc[MAX_DESCRIPTION_LEN];
>         u32 latency_ns;
>         u32 residency_ns;
>         /*
> @@ -381,7 +387,9 @@ struct cpu_idle_states {
>          */
>         u64 pm_ctrl_reg_val;
>         u64 pm_ctrl_reg_mask;
> -       u32 flags;
> +       u64 flags; /* changing flags from u32 to u64 */
and here

> +       /* older kernel with newer firmware will use these states */
> +       u32 legacy_state;
That comment should probably be saying something like "legacy states
go into the legacy STOP state arrays" or something. We try not to
assume too much about the host kernel.

>  };
>
>  static struct cpu_idle_states nap_only_cpu_idle_states[] = {
> @@ -461,13 +469,20 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>         {
>                 .name = "stop0_lite", /* Enter stop0 with no state loss */
>                 .latency_ns = 1000,
> +               .compat_version = "stop0_lite,v1",
> +               .desc = "thread level stop instruction dispatch",
> +               .legacy_state = 1,
>                 .residency_ns = 10000,
> +               /* flags will now also have bits to identify if a state is
> +                * of cpuidle or hotplug type
> +                */
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 0*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_FAST,
> +                      | 1*OPAL_PM_STOP_INST_FAST \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
>                                  | OPAL_PM_PSSCR_MTL(3) \
>                                  | OPAL_PM_PSSCR_TR(3),
> @@ -475,13 +490,17 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>         {
>                 .name = "stop0",
>                 .latency_ns = 2000,
> +               .compat_version = "stop0,v1",
> +               .desc = "low latency SMT switching",
>                 .residency_ns = 20000,
> +               .legacy_state = 1,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_FAST,
> +                      | 1*OPAL_PM_STOP_INST_FAST \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(0) \
>                                  | OPAL_PM_PSSCR_MTL(3) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -493,14 +512,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>
>         {
>                 .name = "stop1",
> +               .compat_version = "stop1,v1",
> +               .desc = "SMT switching with basic clock gating",
>                 .latency_ns = 5000,
>                 .residency_ns = 50000,
> +               .legacy_state = 0,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_FAST,
> +                      | 1*OPAL_PM_STOP_INST_FAST \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(1) \
>                                  | OPAL_PM_PSSCR_MTL(3) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -514,14 +537,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>
>         {
>                 .name = "stop2",
> +               .compat_version = "stop2,v1",
> +               .desc = "core level clock gating",
>                 .latency_ns = 10000,
>                 .residency_ns = 100000,
> +               .legacy_state = 0,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 0*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 0*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_FAST,
> +                      | 1*OPAL_PM_STOP_INST_FAST \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(2) \
>                                  | OPAL_PM_PSSCR_MTL(3) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -530,14 +557,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>                 .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
>         {
>                 .name = "stop4",
> +               .compat_version = "stop4,v1",
> +               .desc = "core level power gating",
>                 .latency_ns = 100000,
>                 .residency_ns = 10000000,
> +               .legacy_state = 0,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 1*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_DEEP,
> +                      | 1*OPAL_PM_STOP_INST_DEEP \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(4) \
>                                  | OPAL_PM_PSSCR_MTL(7) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -546,14 +577,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>                 .pm_ctrl_reg_mask = OPAL_PM_PSSCR_MASK },
>         {
>                 .name = "stop5",
> +               .compat_version = "stop5,v1",
> +               .desc = "core level power gating and turbo boost benefit",
>                 .latency_ns = 200000,
>                 .residency_ns = 20000000,
> +               .legacy_state = 0,
>                 .flags = 0*OPAL_PM_DEC_STOP \
>                        | 0*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 1*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_DEEP,
> +                      | 1*OPAL_PM_STOP_INST_DEEP \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(5) \
>                                  | OPAL_PM_PSSCR_MTL(7) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -563,14 +598,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>
>         {
>                 .name = "stop8",
> +               .compat_version = "stop8,v1",
> +               .desc = "core power gating and L2 clock gating",
>                 .latency_ns = 2000000,
>                 .residency_ns = 20000000,
> +               .legacy_state = 0,
>                 .flags = 1*OPAL_PM_DEC_STOP \
>                        | 1*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 1*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_DEEP,
> +                      | 1*OPAL_PM_STOP_INST_DEEP \
> +                      | 1*OPAL_PM_STOP_CPUIDLE,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(8) \
>                                  | OPAL_PM_PSSCR_MTL(11) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -580,14 +619,18 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>
>         {
>                 .name = "stop11",
> +               .compat_version = "stop11,v1",
> +               .desc = "quad level core and cache power gating",
>                 .latency_ns = 10000000,
>                 .residency_ns = 100000000,
> +               .legacy_state = 0,
>                 .flags = 1*OPAL_PM_DEC_STOP \
>                        | 1*OPAL_PM_TIMEBASE_STOP  \
>                        | 1*OPAL_PM_LOSE_USER_CONTEXT \
>                        | 1*OPAL_PM_LOSE_HYP_CONTEXT \
>                        | 1*OPAL_PM_LOSE_FULL_CONTEXT \
> -                      | 1*OPAL_PM_STOP_INST_DEEP,
> +                      | 1*OPAL_PM_STOP_INST_DEEP \
> +                      | 1*OPAL_PM_STOP_CPUHOTPLUG,
>                 .pm_ctrl_reg_val = OPAL_PM_PSSCR_RL(11) \
>                                  | OPAL_PM_PSSCR_MTL(11) \
>                                  | OPAL_PM_PSSCR_TR(3) \
> @@ -917,34 +960,77 @@ void add_cpu_idle_state_properties(void)
>                                 }
>                         }
>                 }

> +               if (states[i].legacy_state) {
> +                       prlog(PR_NOTICE, "SLW: Enabling: %s\n", states[i].name);
>
> -               prlog(PR_INFO, "SLW: Enabling: %s\n", states[i].name);
> +                       /*
> +                        * If a state is supported add each of its property
> +                        * to its corresponding property buffer.
> +                        */
> +                       strncpy(name_buf, states[i].name, MAX_NAME_LEN);
> +                       name_buf = name_buf + strlen(states[i].name) + 1;
>
> -               /*
> -                * If a state is supported add each of its property
> -                * to its corresponding property buffer.
> -                */
> -               strncpy(name_buf, states[i].name, MAX_NAME_LEN);
> -               name_buf = name_buf + strlen(states[i].name) + 1;
> +                       *latency_ns_buf = cpu_to_fdt32(states[i].latency_ns);
> +                       latency_ns_buf++;
>
> -               *latency_ns_buf = cpu_to_fdt32(states[i].latency_ns);
> -               latency_ns_buf++;
> +                       *residency_ns_buf = cpu_to_fdt32(states[i].residency_ns);
> +                       residency_ns_buf++;
>
> -               *residency_ns_buf = cpu_to_fdt32(states[i].residency_ns);
> -               residency_ns_buf++;
> +                       *flags_buf = cpu_to_fdt32(states[i].flags);
> +                       flags_buf++;
>
> -               *flags_buf = cpu_to_fdt32(states[i].flags);
> -               flags_buf++;
> +                       *pm_ctrl_reg_val_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
> +                       pm_ctrl_reg_val_buf++;
>
> -               *pm_ctrl_reg_val_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
> -               pm_ctrl_reg_val_buf++;
> +                       *pm_ctrl_reg_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
> +                       pm_ctrl_reg_mask_buf++;
>
> -               *pm_ctrl_reg_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
> -               pm_ctrl_reg_mask_buf++;
> +                       /* Increment buffer length trackers */
> +                       name_buf_len += strlen(states[i].name) + 1;
> +                       num_supported_idle_states++;
>
> -               /* Increment buffer length trackers */
> -               name_buf_len += strlen(states[i].name) + 1;
> -               num_supported_idle_states++;

> +               } else {
> +                       struct dt_node *parent_node, *dt_state_node;
> +                       char version_buf[MAX_VERSION_LEN];
> +                       u32 *lat_buf, *res_buf;
> +                       u64 *flg_buf, *psscr_buf, *psscr_mask_buf;
> +
> +                       lat_buf = malloc(sizeof(u32));
> +                       res_buf = malloc(sizeof(u32));
> +                       flg_buf = malloc(sizeof(u64));
> +                       psscr_buf = malloc(sizeof(u64));
> +                       psscr_mask_buf = malloc(sizeof(u64));
> +                       prlog(PR_NOTICE, "SLW: Enabling: %s in new format\n", states[i].name);
> +
> +                       parent_node = dt_new_check(power_mgt,  "ibm,idle-states");
> +                       if (!parent_node)
> +                               prlog(PR_ERR, "Error creating ibm,idle-states\n");
> +
> +                       dt_state_node = dt_new_check(parent_node, states[i].name);
> +                       if (!dt_state_node)
> +                               prlog(PR_ERR, "Error creating %s\n", states[i].name);
> +
> +                       strncpy(version_buf, states[i].compat_version,
> +                                       strlen(states[i].compat_version)+1);
> +                       dt_add_property_strings(dt_state_node, "compatible", version_buf);
> +
> +                       *lat_buf = cpu_to_fdt32(states[i].latency_ns);
> +                       dt_add_property(dt_state_node, "latency-ns", lat_buf, sizeof(u32));
> +                       *res_buf = cpu_to_fdt32(states[i].residency_ns);
> +                       dt_add_property(dt_state_node, "residency-ns", res_buf, sizeof(u32));
> +                       *flg_buf = cpu_to_fdt64(states[i].flags);
> +                       dt_add_property(dt_state_node, "flags", flg_buf, sizeof(u64));
> +                       *psscr_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_val);
> +                       dt_add_property(dt_state_node, "psscr", psscr_buf, sizeof(u64));
> +                       *psscr_mask_buf = cpu_to_fdt64(states[i].pm_ctrl_reg_mask);
> +                       dt_add_property(dt_state_node, "psscr-mask", psscr_mask_buf, sizeof(u64));
> +                       free(res_buf);
> +                       free(lat_buf);
> +                       free(flg_buf);
> +                       free(psscr_buf);
> +                       free(psscr_mask_buf);

All the malloc(), free() and manual endian swapping goes away if you
use dt_add_property_u64() and friends. The existing code uses
dt_add_property() directly because there isn't a better way to handle
dynamicly sized arrays, but you need to. Anyway, I doing this:

for_each_state() {
  if (states[i].legacy_state) {
     /* old blah */
  } else {
    /* new blah */
  }
}

Seems bad to me. We only want the legacy arrays to be used if the
kernel doesn't understand the new format, so the new format should
contain all the same information as the old one i.e.

for_each_state() {
  if (states[i].legacy_state) {
     /* old blah */
  }
}

for_each_state() {
   /* new blah*/
}

So if you understand the new format then you only need to parse the
new format and the old one can be ignored.

> +
> +               }
>
>         }
>
> diff --git a/include/opal-api.h b/include/opal-api.h
> index 0b0ae196..f54f3862 100644
> --- a/include/opal-api.h
> +++ b/include/opal-api.h
> @@ -292,6 +292,13 @@
>  #define OPAL_PM_STOP_INST_FAST         0x00100000
>  #define OPAL_PM_STOP_INST_DEEP         0x00200000
>
> +/*
> + * Flags for stop states to distinguish between cpuidle and
> + * cpuoffline type of states.
> + */
> +#define OPAL_PM_STOP_CPUIDLE           0x01000000
> +#define OPAL_PM_STOP_CPUHOTPLUG                0x02000000

Seems a bit arbitrary and I don't think it's firmware's job to decide
if a state is "cpu hotplug" or not. IIRC Raptor are interested in
using STOP11 to support a hibernate mode so there are uses for it
besides removing a CPU.

> +
>  #ifndef __ASSEMBLY__
>
>  #include <stdbool.h>
> --
> 2.17.1
>
> _______________________________________________
> Skiboot mailing list
> Skiboot at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot


More information about the Skiboot mailing list