[Skiboot] [PATCH] core/cpu: Initialize all cpu thread areas to avoid invalid memory access.

Mon Sep 3 16:55:24 AEST 2018

On Mon, Sep 3, 2018 at 4:07 PM, Mahesh Jagannath Salgaonkar
<mahesh at linux.vnet.ibm.com> wrote:
> On 09/03/2018 11:20 AM, Oliver wrote:
>> On Sun, Sep 2, 2018 at 3:40 AM, Mahesh J Salgaonkar
>> <mahesh at linux.vnet.ibm.com> wrote:
>>> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>>>
>>> Currently we initialize cpu stack memory only for cpu pir that is found on
>>> the device-tree. For the rest, the cpu thread contents are uninitialized.
>>> This sometime causes for_each_cpu* macros to return cpu thread for pir/cpu
>>> which isn't present on the system. The for_each_cpu* macros iterate over
>>> cpu stacks using pir as index and returns cpu thread pointer if
>>> state != cpu_state_no_cpu. For cpus that are not found on device-tree the
>>> state may hold junk value leading opal to access invalid cpu thread area.
>>> This further leads to accessing pointers with junk values causing machine
>>> check (MCE) during OPAL init code. Fix this by Initializing all the cpu
>>> thread areas upto cpu_max_pir.
>>
>> Is this from a cold IPL or MPIPL? Memory should be zeroed before we
>> enter OPAL so if there is junk in the stack area we might have a data
>> corruption problem,
>
> This is happening during re-IPL (OS reboot). On FSP, per Dean Sanner, OS
> reboot requests get transformed into an MPIPL. On cold IPL the memory is
> all zeroed and we don't see this issue during first IPL.

Right, you should probably mention this is MPIPL specific in the commit message.

I'm still a little concerned that skiboot is putting junk data in the
unused cpu_thread structures at all. If it is then I'd expect
cpu_for_each*() to be broken at broken at runtime even if it boots
successfully from a cold IPL. I'll take a look at it sometime.