[Skiboot] [RFC PATCH] flash: Handle nullptr dereference of system_flash

Aditya Gupta adityag at linux.ibm.com
Fri Mar 7 16:24:29 AEDT 2025


Hello Mahesh,

Thanks for reviewing


On 07/03/25 10:42, Mahesh J Salgaonkar wrote:
> On 2025-03-05 22:21:34 Wed, Aditya Gupta wrote:
>> With QEMU with NO support for MPIPL, 'p9_sbe_terminate' returns early
>> at:
>>
>>      /* Return if MPIPL is not supported */
>>      if (!is_mpipl_enabled())
>>      	return;
>>
>> But with MPIPL supported in QEMU, 'p9_sbe_terminate' continues further and
>> calls 'flash_unregister' which causes a Machine Check due to nullptr
>> dereference of 'system_flash':
>>
>>      [   13.240783728,5] Reboot: OS reported error. Performing MPIPL
>>      [   13.241662601,5] DUMP: Crashing PIR = 0x0
>>      [   13.244049276,5] RESET: Fast reboot disabled: Kernel re-entered OPAL
>>      [    1.815018] Disabling lock debugging due to kernel taint
>>      [    1.815518] MCE: CPU0: machine check (Severe)  Real address Load (bad) DAR: 0000006000000098 [Not recovered]
>>      [    1.815544] MCE: CPU0: NIP: [0000000030040f54] 0x30040f54
>>      [    1.815911] MCE: CPU0: Initiator CPU
>>      [    1.815930] MCE: CPU0: Hardware error
>>      [    1.816110] opal: Hardware platform error: Unrecoverable Machine Check exception
>>      [    1.816338] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G   M               6.12.0-rc4+ #1
>>      [    1.816531] Tainted: [M]=MACHINE_CHECK
>>      [    1.816546] Hardware name: IBM PowerNV (emulated by qemu) POWER10 0x801200 opal:v7.1 PowerNV
>>      [    1.816629] NIP:  0000000030040f54 LR: 000000003007e528 CTR: 000000003004d75c
>>      [    1.816646] REGS: c0000004d5e47d60 TRAP: 0200   Tainted: G   M                (6.12.0-rc4+)
>>      [    1.816684] MSR:  9000000002a03002 <SF,HV,VEC,VSX,FP,ME,RI>  CR: 28002284  XER: 00000000
>>      [    1.816863] CFAR: 000000003007e524 DAR: 0000006000000098 DSISR: 00000040 IRQMASK: 3
>>      [    1.816863] GPR00: 000000003007e528 0000000031c13ac0 0000000030192900 0000006000000060
>>      [    1.816863] GPR04: 0000000030500028 000000000000000a 0000000031c10068 0000000031c10068
>>      [    1.816863] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>      [    1.816863] GPR12: 0000000028002284 c000000002e80000 c00000000001192c 0000000000000000
>>      [    1.816863] GPR16: 0000000031c10000 0000000000000000 0000000000000000 0000000000000000
>>      [    1.816863] GPR20: 0000000000000003 0000000000000074 0000000000000000 0000000000000000
>>      [    1.816863] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>      [    1.816863] GPR28: c000000002d0e8c8 00000000301257de c000000002d0e8c8 000000000000000c
>>      [    1.817061] NIP [0000000030040f54] 0x30040f54
>>      [    1.817074] LR [000000003007e528] 0x3007e528
>>      [    1.817165] Call Trace:
>>      [    1.817337] Code: 00000060 80002138 e01d0d48 00000000 01000000 00000180 a602087c 3700223d 602e29e9 100001f8 91ff21f8 180069e8 <380023e9> 0000292c 34008241 280041f8
>>      [   13.247702490,0] OPAL: Reboot requested due to Platform error.
>>      [   13.247857686,3] OPAL: failed to log an error
>>      [   13.248012502,2] NVRAM: Failed to load
>>
>> Previously above machine check was never hit as QEMU platform didn't
>> had MPIPL, and hence the caller 'p9_sbe_terminate' used to return early.
>>
>> Add null check to ignore the unregister request if system_flash is not set.
>>
>> Signed-off-by: Aditya Gupta <adityag at linux.ibm.com>
>>
>> ---
>> Initial QEMU MPIPL support was posted to [1]. It has not been merged
>> yet.
>>
>> Question: Should this be done in a way that the unregister gets skipped
>> only in case of QEMU platform ?
> As I see flash_unregister is nothing but calling exit on blocklevel device.
> system_flash is set only if flash_register() is called. pnor_init calls
> flash_register only if flash_init is successful. If iflash init fails it
> anyway calls exit on blocklevel device. So we are good to return from
> flash_unregister if system_flash is not set.
Okay, thanks for explaining.
>
>> Also in this patch I am returning true even on an error, since we need
>> skiboot to continue and send S0 interrupts in 'p9_sbe_terminate'. Is it
>> okay ?
> I think we are good since blocklevel device is anyway exited during
> flash_init failure.
>
>> [1]: https://lore.kernel.org/qemu-devel/20250217071934.86131-1-adityag@linux.ibm.com/
>> ---
>> ---
>>   core/flash.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/core/flash.c b/core/flash.c
>> index a14bfc68fd1a..02e017041a18 100644
>> --- a/core/flash.c
>> +++ b/core/flash.c
>> @@ -88,7 +88,16 @@ void flash_release(void)
>>   
>>   bool flash_unregister(void)
>>   {
>> -	struct blocklevel_device *bl = system_flash->bl;
>> +	struct blocklevel_device *bl;
>> +
>> +	if (!system_flash) {
>> +		prlog(PR_WARNING, "System Flash is NULL, ignoring unregister request\n");
> Can we say flash wasn't registered instead of system flash is NULL ?

Sure, I will reword it to say that.


Thanks,

- Aditya Gupta

>> +
>> +		/* returning true to preserve previous behaviour */
>> +		return true;
>> +	}
>> +
>> +	bl = system_flash->bl;
>>   
>>   	if (bl->exit)
>>   		return bl->exit(bl);
> Thanks,
> -Mahesh.
>


More information about the Skiboot mailing list