[OpenPower-Firmware] SRESET and idle CPU

Artem Senichev a.senichev at yadro.com
Wed Jul 29 00:29:38 AEST 2020


As it turns out, the pdbg solution works, but its reliability depends on the interval between 'stop' and 'sreset'.
When this pause is longer than 1 second, the sreset is significantly less likely to be processed.

I have one last question:
Do I need to stop all threads and send an sreset to each of them?
Or is it enough to stop/sreset just one thread?

I am asking because the openpower-proc-control implementation tries to stop them all and this leads to errors:
https://github.com/openbmc/openpower-proc-control/blob/master/nmi_interface.cpp#L43
When I send sreset to more than 1 thread, I get the "Unrecoverable nested System Reset" error and the Linux kernel hangs.
When I stop all threads and send sreset to any one of them, I get the "CPU Hard LOCKUP" error.
So it works only when a single thread is stopped and sent an sreset.
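
For illustration, the sequence that worked for me (one thread only, sreset sent right after stop) can be sketched in C as follows. This is only a sketch: sreset_single_thread, thread_op_fn, fake_stop and fake_sreset are hypothetical names, and the callbacks stand in for whatever actually performs 'pdbg stop' and 'pdbg sreset'. The gap is measured with the C11 timespec_get wall clock, which is an assumption made for portability.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: performs one operation (stop or sreset)
 * on a single hardware thread and returns 0 on success. */
typedef int (*thread_op_fn)(int proc, int core, int thread);

/* Milliseconds between two timestamps, b taken after a. */
static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (long)(b->tv_sec - a->tv_sec) * 1000L +
           (b->tv_nsec - a->tv_nsec) / 1000000L;
}

/*
 * Stop exactly one thread and send sreset to that same thread right away.
 * If gap_ms is not NULL it receives the measured stop-to-sreset gap.
 * Returns 0 on success, -1 if stop failed, -2 if sreset failed.
 */
int sreset_single_thread(int proc, int core, int thread,
                         thread_op_fn stop_op, thread_op_fn sreset_op,
                         long *gap_ms)
{
    struct timespec before, after;

    if (stop_op(proc, core, thread) != 0)
        return -1;
    timespec_get(&before, TIME_UTC);

    /* Nothing slow may happen here: the gap should stay well under 1 s. */

    timespec_get(&after, TIME_UTC);
    if (gap_ms != NULL)
        *gap_ms = elapsed_ms(&before, &after);
    if (sreset_op(proc, core, thread) != 0)
        return -2;
    return 0;
}

/* Stand-ins for the real operations; they only record the call order. */
char op_log[8];
size_t op_count;

int fake_stop(int proc, int core, int thread)
{
    (void)proc; (void)core; (void)thread;
    if (op_count < sizeof(op_log))
        op_log[op_count++] = 'S';
    return 0;
}

int fake_sreset(int proc, int core, int thread)
{
    (void)proc; (void)core; (void)thread;
    if (op_count < sizeof(op_log))
        op_log[op_count++] = 'R';
    return 0;
}
```

Calling sreset_single_thread(0, 0, 0, fake_stop, fake_sreset, &gap) exercises the order of operations without touching hardware; the caller can log the gap to see whether it stays under the 1 second mentioned above.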

--
Regards,
Artem Senichev
Software Engineer, YADRO.


________________________________________
From: Artem Senichev
Sent: 27 July 2020 16:56
To: Daniel M Crowell
Cc: Alexander Amelkin; openpower-firmware at lists.ozlabs.org
Subject: Re: [OpenPower-Firmware] SRESET and idle CPU

On Fri, Jul 24, 2020 at 04:39:32PM -0500, Daniel M Crowell wrote:
>
> What you have tells me that pdbg stop is already doing what I suggested you
> do I think since it shows the wakeup bit getting set.  Note though that it
> is a core reg, so you need to read 200F010B on every core.  Does pdbg have
> a "-call" option?

Yes, we can use ranges (-c0-23), but it does not change anything in my case.

> I also don't know what pdbg threadstatus is actually looking at.  There are
> requested states and actual states.  Wakeup would only change the actual
> state.  Also, it might not block STOP1 since that doesn't affect any
> pervasive accesses, but that is just speculation.

pdbg reads the thread state from the following registers:

#define P9_RAS_STATUS        0x10a02
#define P9_CORE_THREAD_STATE 0x10ab3
#define P9_THREAD_INFO       0x10a9b

thread_read(thread, P9_RAS_STATUS, &value);
thread_state.quiesced = (GETFIELD(PPC_BITMASK(8*thread->id, 3 + 8*thread->id), value) == 0xf);

thread_read(thread, P9_THREAD_INFO, &value);
thread_state.active = !!(value & PPC_BIT(thread->id));

thread_read(thread, P9_CORE_THREAD_STATE, &value);
if (value & PPC_BIT(56 + thread->id))
    thread_state.sleep_state = PDBG_THREAD_STATE_STOP;
else
    thread_state.sleep_state = PDBG_THREAD_STATE_RUN;
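
To make the bit positions above easier to follow, here is a small standalone sketch of the same decoding. It assumes IBM bit numbering (bit 0 is the most significant bit of the 64-bit value); ppc_bit, ppc_bitmask, get_field, struct thread_status and decode_thread_status are my own illustrative helpers, not the pdbg macros or API, and the register values are passed in instead of being read from hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word. */
static inline uint64_t ppc_bit(unsigned bit)
{
    return 0x8000000000000000ULL >> bit;
}

/* Mask covering bits first..last inclusive (first <= last). */
static inline uint64_t ppc_bitmask(unsigned first, unsigned last)
{
    return (ppc_bit(first) - ppc_bit(last)) | ppc_bit(first);
}

/* Extract bits first..last of value, right-justified. */
static inline uint64_t get_field(uint64_t value, unsigned first, unsigned last)
{
    return (value & ppc_bitmask(first, last)) >> (63 - last);
}

/* Illustrative result type, not the pdbg structure. */
struct thread_status {
    bool quiesced; /* all four RAS status bits of the thread are set */
    bool active;   /* thread bit set in the thread info register */
    bool stopped;  /* thread bit set in the core thread state register */
};

/* Decode the three register values for hardware thread tid (0..3). */
struct thread_status decode_thread_status(unsigned tid,
                                          uint64_t ras_status,
                                          uint64_t thread_info,
                                          uint64_t core_thread_state)
{
    struct thread_status s;

    s.quiesced = get_field(ras_status, 8 * tid, 3 + 8 * tid) == 0xf;
    s.active = (thread_info & ppc_bit(tid)) != 0;
    s.stopped = (core_thread_state & ppc_bit(56 + tid)) != 0;
    return s;
}
```

For thread 0, a RAS status of 0xF000000000000000 together with bit 56 set in the core thread state decodes as quiesced and stopped but not active, which is how I read the ".SQ" shown by threadstatus.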

Thread stop is implemented as:
#define P9_DIRECT_CONTROL 0x10a9c
thread_write(thread, P9_DIRECT_CONTROL, PPC_BIT(7 + 8*thread->id));

SRESET:
thread_write(thread, P9_DIRECT_CONTROL, PPC_BIT(4 + 8*thread->id));

The official POWER9 specification has minimal information about these registers
(all fields are "reserved"), so I can't say anything more.
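
As a worked example of the two writes above, the sketch below computes the 64-bit values that end up in P9_DIRECT_CONTROL for a given thread. It assumes IBM bit numbering (bit 0 is the most significant bit); ppc_bit, direct_control_stop_mask and direct_control_sreset_mask are hypothetical helper names used only for illustration.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word. */
static inline uint64_t ppc_bit(unsigned bit)
{
    return 0x8000000000000000ULL >> bit;
}

/* Value written to the direct control register to stop thread tid (0..3). */
uint64_t direct_control_stop_mask(unsigned tid)
{
    return ppc_bit(7 + 8 * tid);
}

/* Value written to the direct control register to sreset thread tid (0..3). */
uint64_t direct_control_sreset_mask(unsigned tid)
{
    return ppc_bit(4 + 8 * tid);
}
```

Each thread owns one byte of the register: for thread 0 the stop value is 0x0100000000000000 and the sreset value is 0x0800000000000000, and every following thread shifts both by 8 bits to the right.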

> > It looks like bit 0 in SPECIAL_WKUP_FSP_REG takes the thread out of the
> > "quiesced" state, but does not set it as active.
> Correct.  Wakeup only prevents the core from going through the stop state
> transitions.  The thread still thinks it is in those states.  It should
> look similar to what you'd see if a single thread were to execute STOP5 but
> the other threads are still active.  In that case, the core can't do
> anything because there are some active threads.

Excuse me, I am not an expert in POWER CPU architecture. What is STOP5?
I found some mentions of this state in skiboot, and it is somehow related to
CPU idle, but I still don't see the whole picture.

--
Regards,
Artem Senichev
Software Engineer, YADRO.

>
> From: Artem Senichev <a.senichev at yadro.com>
> To:   Daniel M Crowell <dcrowell at us.ibm.com>
> Cc:   "openpower-firmware at lists.ozlabs.org"
>             <openpower-firmware at lists.ozlabs.org>, Alexander Amelkin
>             <a.amelkin at yadro.com>
> Date: 07/24/2020 12:31 PM
> Subject:      [EXTERNAL] Re:  [OpenPower-Firmware] SRESET and idle CPU
>
> Hi Dan,
>
> Thank you for the quick reply!
>
> I tried to use SPECIAL_WKUP_FSP_REG register, but it does not work as
> expected for me.
> Maybe I am using this incorrectly?
>
> # current state
> bmc:~ # pdbg -p0 -c0 -t0-3 threadstatus
> p0t:   0   1   2   3
> c00:  .S. .S. .S. .S.
>
> # read SPECIAL_WKUP_FSP_REG
> bmc:~ # pdbg -p0 getscom 0x200F010B
> p0: 0x00000000200f010b = 0x0000000000000000 (/proc0/pib)
>
> # stop the first thread
> bmc:~ # pdbg -p0 -c0 -t0 stop
>
> # SPECIAL_WKUP_FSP_REG has changed
> bmc:~ # pdbg -p0 getscom 0x200F010B
> p0: 0x00000000200f010b = 0x8000000000000000 (/proc0/pib)
>
> # thread now in quiesced state
> bmc:~ # pdbg -p0 -c0 -t0-3 threadstatus
> p0t:   0   1   2   3
> c00:  .SQ .S. .S. .S.
>
> # force set 0 bit (addr value mask)
> bmc:~ # pdbg -p0 putscom 0x200F010B 0x8000000000000000 0x8000000000000000
>
> # state still inactive
> bmc:~ # pdbg -p0 -c0 -t0-3 threadstatus
> p0t:   0   1   2   3
> c00:  .SQ .S. .S. .S.
>
> # read SPECIAL_WKUP_FSP_REG
> bmc:~ # pdbg -p0 getscom 0x200F010B
> p0: 0x00000000200f010b = 0x8000000000000000 (/proc0/pib)
>
> # reset 0 bit
> bmc:~ # pdbg -p0 putscom 0x200F010B 0 0x8000000000000000
>
> # no quiesced state for now
> bmc:~ # pdbg -p0 -c0 -t0-3 threadstatus
> p0t:   0   1   2   3
> c00:  .S. .S. .S. .S.
>
> # read SPECIAL_WKUP_FSP_REG
> bmc:~ # pdbg -p0 getscom 0x200F010B
> p0: 0x00000000200f010b = 0x0000000000000000 (/proc0/pib)
>
> It looks like bit 0 in SPECIAL_WKUP_FSP_REG takes the thread out of the
> "quiesced" state, but does not set it as active.
>
> If I disable cpuidle on the host, I see something like this (which is what
> I expect):
> bmc:~ # pdbg -p0 -c0 -t0-3 threadstatus
> p0t:   0   1   2   3
> c00:  A.. .S. .S. .S.
>
> --
> Regards,
> Artem Senichev
> Software Engineer, YADRO.
>
>
> ________________________________________
> From: Daniel M Crowell <dcrowell at us.ibm.com>
> Sent: 24 July 2020 18:04
> To: Artem Senichev
> Cc: openpower-firmware at lists.ozlabs.org
> Subject: Re:  [OpenPower-Firmware] SRESET and idle CPU
>
> I think that you might be able to initiate a 'special wakeup' via scom from
> the BMC. I'm a little surprised that pdbg doesn't have that built in to the
> stop function already like we do in some similar tooling (but I have no
> visibility to pdbg's goals). Enabling special wakeup forces a core to exit
> the idle state and prevents it from going idle. Instructions aren't
> executed, but pervasively the core is alive. That should allow the scoms to
> trigger stop/sreset/etc to work.
>
> For P9 there are 4 sets of wakeup registers, each with a different owner:
> - 200F010A = SPECIAL_WKUP_OTR_REG - Used by the PM Complex itself
> internally
> - 200F010B = SPECIAL_WKUP_FSP_REG - Used by FSP when we have one, or by
> HBRT/opal-prd on these boxes
> - 200F010C = SPECIAL_WKUP_OCC_REG - Used by OCC
> - 200F010D = SPECIAL_WKUP_HYP_REG - Used by OPAL/PHYP
>
> I would recommend that you use 200F010B for this purpose. You just need to
> set bit 0 to trigger it, though there is a non-zero time for it to take
> effect. See
> https://github.com/open-power/hostboot/blob/master/src/import/chips/p9/procedures/hwp/pm/p9_cpu_special_wakeup.C
> for the details.
>
> You'll need to set it on every functional core. You could probably get
> fancy and use multicasts to do it with a single scom using group 1. And
> don't forget to clear it later so that you can use idle states again later.
>
> --
> Dan Crowell
> Senior Software Engineer - Power Systems Enablement Firmware
> IBM Rochester: t/l 553-2987
> dcrowell at us.ibm.com
>
> From: Artem Senichev <a.senichev at yadro.com>
> To: "openpower-firmware at lists.ozlabs.org"
> <openpower-firmware at lists.ozlabs.org>
> Date: 07/24/2020 07:03 AM
> Subject: [EXTERNAL] [OpenPower-Firmware] SRESET and idle CPU
> Sent by: "OpenPower-Firmware" <openpower-firmware-bounces
> +dcrowell=us.ibm.com at lists.ozlabs.org>
>
> ________________________________
>
> Hi all,
>
> Our customers want to be able to initiate kdump on a POWER9 host system
> from BMC console.
> I tried to implement this functionality with an SRESET signal sent through
> the pdbg utility, but it turned out that when the CPU is in an idle state
> (sleep), the signal could not be delivered.
>
> I can disable the idle state on a host:
>
> for i in /sys/devices/system/cpu/cpu0/cpuidle/state*/disable; do
>  echo 1 > "$i"
> done
>
> and then send SRESET from BMC:
>
> pdbg -p0 -c0 -t0 stop
> pdbg -p0 -c0 -t0 sreset
>
> This solution works fine, but I need to do it without interfering with the
> host system.
> Is it possible?
>
> --
> Regards,
> Artem Senichev
> Software Engineer, YADRO.

