[Skiboot] [RFC PATCH] flash: Handle nullptr dereference of system_flash

Mahesh J Salgaonkar mahesh at linux.ibm.com
Fri Mar 7 16:12:30 AEDT 2025


On 2025-03-05 22:21:34 Wed, Aditya Gupta wrote:
> With QEMU with NO support for MPIPL, 'p9_sbe_terminate' returns early
> at:
> 
>     /* Return if MPIPL is not supported */
>     if (!is_mpipl_enabled())
>     	return;
> 
> But with MPIPL supported in QEMU, 'p9_sbe_terminate' continues further and
> calls 'flash_unregister' which causes a Machine Check due to nullptr
> dereference of 'system_flash':
> 
>     [   13.240783728,5] Reboot: OS reported error. Performing MPIPL
>     [   13.241662601,5] DUMP: Crashing PIR = 0x0
>     [   13.244049276,5] RESET: Fast reboot disabled: Kernel re-entered OPAL
>     [    1.815018] Disabling lock debugging due to kernel taint
>     [    1.815518] MCE: CPU0: machine check (Severe)  Real address Load (bad) DAR: 0000006000000098 [Not recovered]
>     [    1.815544] MCE: CPU0: NIP: [0000000030040f54] 0x30040f54
>     [    1.815911] MCE: CPU0: Initiator CPU
>     [    1.815930] MCE: CPU0: Hardware error
>     [    1.816110] opal: Hardware platform error: Unrecoverable Machine Check exception
>     [    1.816338] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G   M               6.12.0-rc4+ #1
>     [    1.816531] Tainted: [M]=MACHINE_CHECK
>     [    1.816546] Hardware name: IBM PowerNV (emulated by qemu) POWER10 0x801200 opal:v7.1 PowerNV
>     [    1.816629] NIP:  0000000030040f54 LR: 000000003007e528 CTR: 000000003004d75c
>     [    1.816646] REGS: c0000004d5e47d60 TRAP: 0200   Tainted: G   M                (6.12.0-rc4+)
>     [    1.816684] MSR:  9000000002a03002 <SF,HV,VEC,VSX,FP,ME,RI>  CR: 28002284  XER: 00000000
>     [    1.816863] CFAR: 000000003007e524 DAR: 0000006000000098 DSISR: 00000040 IRQMASK: 3
>     [    1.816863] GPR00: 000000003007e528 0000000031c13ac0 0000000030192900 0000006000000060
>     [    1.816863] GPR04: 0000000030500028 000000000000000a 0000000031c10068 0000000031c10068
>     [    1.816863] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>     [    1.816863] GPR12: 0000000028002284 c000000002e80000 c00000000001192c 0000000000000000
>     [    1.816863] GPR16: 0000000031c10000 0000000000000000 0000000000000000 0000000000000000
>     [    1.816863] GPR20: 0000000000000003 0000000000000074 0000000000000000 0000000000000000
>     [    1.816863] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>     [    1.816863] GPR28: c000000002d0e8c8 00000000301257de c000000002d0e8c8 000000000000000c
>     [    1.817061] NIP [0000000030040f54] 0x30040f54
>     [    1.817074] LR [000000003007e528] 0x3007e528
>     [    1.817165] Call Trace:
>     [    1.817337] Code: 00000060 80002138 e01d0d48 00000000 01000000 00000180 a602087c 3700223d 602e29e9 100001f8 91ff21f8 180069e8 <380023e9> 0000292c 34008241 280041f8
>     [   13.247702490,0] OPAL: Reboot requested due to Platform error.
>     [   13.247857686,3] OPAL: failed to log an error
>     [   13.248012502,2] NVRAM: Failed to load
> 
> Previously the above machine check was never hit, as the QEMU platform
> didn't have MPIPL and hence the caller 'p9_sbe_terminate' returned early.
> 
> Add null check to ignore the unregister request if system_flash is not set.
> 
> Signed-off-by: Aditya Gupta <adityag at linux.ibm.com>
> 
> ---
> Initial QEMU MPIPL support was posted to [1]. It has not been merged
> yet.
> 
> Question: Should this be done in a way that the unregister gets skipped
> only in case of QEMU platform ?

As I see it, flash_unregister() does nothing more than call exit on the
blocklevel device. system_flash is set only when flash_register() is called,
and pnor_init calls flash_register() only if flash_init is successful. If
flash_init fails, it calls exit on the blocklevel device anyway. So we are
good to return from flash_unregister() if system_flash is not set.

> 
> Also in this patch I am returning true even on an error, since we need
> skiboot to continue and send S0 interrupts in 'p9_sbe_terminate'. Is it
> okay ?

I think we are good, since the blocklevel device is exited anyway when
flash_init fails.
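
To make the reasoning in the two replies above concrete, here is a minimal
standalone sketch, not the skiboot code itself. The struct layouts are
simplified stand-ins, and flash_unregister_sketch(), terminate_sketch(),
count_exit(), exit_calls and s0_sent are hypothetical names used only for
illustration. It also assumes the caller gives up when unregister returns
false, which is why the early return reports true.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for skiboot's types (assumed layout, not the real headers). */
struct blocklevel_device {
	bool (*exit)(struct blocklevel_device *bl);
};

struct flash {
	struct blocklevel_device *bl;
};

/* Stays NULL unless registration succeeded, e.g. on a platform without flash. */
static struct flash *system_flash;

static int exit_calls;	/* hypothetical: counts calls to the exit hook */
static bool s0_sent;	/* hypothetical: stands in for "S0 interrupt sent" */

/* Hypothetical exit hook that only records that it was called. */
static bool count_exit(struct blocklevel_device *bl)
{
	(void)bl;
	exit_calls++;
	return true;
}

/* Guarded unregister: nothing to tear down if flash was never registered. */
static bool flash_unregister_sketch(void)
{
	struct blocklevel_device *bl;

	if (!system_flash) {
		fprintf(stderr, "FLASH: flash was not registered, ignoring unregister request\n");
		/* Report success so the caller keeps going. */
		return true;
	}

	bl = system_flash->bl;
	if (bl->exit)
		return bl->exit(bl);

	return true;
}

/* Hypothetical caller: assumed to stop early when unregister fails. */
static void terminate_sketch(void)
{
	s0_sent = false;
	if (!flash_unregister_sketch())
		return;
	s0_sent = true;	/* placeholder for sending the S0 interrupt */
}
```

With system_flash left NULL, terminate_sketch() still reaches the S0 step and
the exit hook is never called. Once a device with count_exit() as its exit
hook is registered, the hook runs exactly once.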

> 
> [1]: https://lore.kernel.org/qemu-devel/20250217071934.86131-1-adityag@linux.ibm.com/
> ---
> ---
>  core/flash.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/core/flash.c b/core/flash.c
> index a14bfc68fd1a..02e017041a18 100644
> --- a/core/flash.c
> +++ b/core/flash.c
> @@ -88,7 +88,16 @@ void flash_release(void)
>  
>  bool flash_unregister(void)
>  {
> -	struct blocklevel_device *bl = system_flash->bl;
> +	struct blocklevel_device *bl;
> +
> +	if (!system_flash) {
> +		prlog(PR_WARNING, "System Flash is NULL, ignoring unregister request\n");

Can we say that flash wasn't registered, instead of saying system flash is NULL?

> +
> +		/* returning true to preserve previous behaviour */
> +		return true;
> +	}
> +
> +	bl = system_flash->bl;
>  
>  	if (bl->exit)
>  		return bl->exit(bl);

Thanks,
-Mahesh.


