[Skiboot] [RFC PATCH] flash: Handle nullptr dereference of system_flash
Mahesh J Salgaonkar
mahesh at linux.ibm.com
Fri Mar 7 16:12:30 AEDT 2025
On 2025-03-05 22:21:34 Wed, Aditya Gupta wrote:
> With QEMU with NO support for MPIPL, 'p9_sbe_terminate' returns early
> at:
>
> /* Return if MPIPL is not supported */
> if (!is_mpipl_enabled())
> return;
>
> But with MPIPL supported in QEMU, 'p9_sbe_terminate' continues further and
> calls 'flash_unregister' which causes a Machine Check due to nullptr
> dereference of 'system_flash':
>
> [ 13.240783728,5] Reboot: OS reported error. Performing MPIPL
> [ 13.241662601,5] DUMP: Crashing PIR = 0x0
> [ 13.244049276,5] RESET: Fast reboot disabled: Kernel re-entered OPAL
> [ 1.815018] Disabling lock debugging due to kernel taint
> [ 1.815518] MCE: CPU0: machine check (Severe) Real address Load (bad) DAR: 0000006000000098 [Not recovered]
> [ 1.815544] MCE: CPU0: NIP: [0000000030040f54] 0x30040f54
> [ 1.815911] MCE: CPU0: Initiator CPU
> [ 1.815930] MCE: CPU0: Hardware error
> [ 1.816110] opal: Hardware platform error: Unrecoverable Machine Check exception
> [ 1.816338] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G M 6.12.0-rc4+ #1
> [ 1.816531] Tainted: [M]=MACHINE_CHECK
> [ 1.816546] Hardware name: IBM PowerNV (emulated by qemu) POWER10 0x801200 opal:v7.1 PowerNV
> [ 1.816629] NIP: 0000000030040f54 LR: 000000003007e528 CTR: 000000003004d75c
> [ 1.816646] REGS: c0000004d5e47d60 TRAP: 0200 Tainted: G M (6.12.0-rc4+)
> [ 1.816684] MSR: 9000000002a03002 <SF,HV,VEC,VSX,FP,ME,RI> CR: 28002284 XER: 00000000
> [ 1.816863] CFAR: 000000003007e524 DAR: 0000006000000098 DSISR: 00000040 IRQMASK: 3
> [ 1.816863] GPR00: 000000003007e528 0000000031c13ac0 0000000030192900 0000006000000060
> [ 1.816863] GPR04: 0000000030500028 000000000000000a 0000000031c10068 0000000031c10068
> [ 1.816863] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 1.816863] GPR12: 0000000028002284 c000000002e80000 c00000000001192c 0000000000000000
> [ 1.816863] GPR16: 0000000031c10000 0000000000000000 0000000000000000 0000000000000000
> [ 1.816863] GPR20: 0000000000000003 0000000000000074 0000000000000000 0000000000000000
> [ 1.816863] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 1.816863] GPR28: c000000002d0e8c8 00000000301257de c000000002d0e8c8 000000000000000c
> [ 1.817061] NIP [0000000030040f54] 0x30040f54
> [ 1.817074] LR [000000003007e528] 0x3007e528
> [ 1.817165] Call Trace:
> [ 1.817337] Code: 00000060 80002138 e01d0d48 00000000 01000000 00000180 a602087c 3700223d 602e29e9 100001f8 91ff21f8 180069e8 <380023e9> 0000292c 34008241 280041f8
> [ 13.247702490,0] OPAL: Reboot requested due to Platform error.
> [ 13.247857686,3] OPAL: failed to log an error
> [ 13.248012502,2] NVRAM: Failed to load
>
> Previously above machine check was never hit as QEMU platform didn't
> had MPIPL, and hence the caller 'p9_sbe_terminate' used to return early.
>
> Add null check to ignore the unregister request if system_flash is not set.
>
> Signed-off-by: Aditya Gupta <adityag at linux.ibm.com>
>
> ---
> Initial QEMU MPIPL support was posted to [1]. It has not been merged
> yet.
>
> Question: Should this be done in a way that the unregister gets skipped
> only in case of QEMU platform ?
As I see flash_unregister is nothing but calling exit on blocklevel device.
system_flash is set only if flash_register() is called. pnor_init calls
flash_register only if flash_init is successful. If iflash init fails it
anyway calls exit on blocklevel device. So we are good to return from
flash_unregister if system_flash is not set.
>
> Also in this patch I am returning true even on an error, since we need
> skiboot to continue and send S0 interrupts in 'p9_sbe_terminate'. Is it
> okay ?
I think we are good since blocklevel device is anyway exited during
flash_init failure.
>
> [1]: https://lore.kernel.org/qemu-devel/20250217071934.86131-1-adityag@linux.ibm.com/
> ---
> ---
> core/flash.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/core/flash.c b/core/flash.c
> index a14bfc68fd1a..02e017041a18 100644
> --- a/core/flash.c
> +++ b/core/flash.c
> @@ -88,7 +88,16 @@ void flash_release(void)
>
> bool flash_unregister(void)
> {
> - struct blocklevel_device *bl = system_flash->bl;
> + struct blocklevel_device *bl;
> +
> + if (!system_flash) {
> + prlog(PR_WARNING, "System Flash is NULL, ignoring unregister request\n");
Can we say flash wasn't registered instead of system flash is NULL ?
> +
> + /* returning true to preserve previous behaviour */
> + return true;
> + }
> +
> + bl = system_flash->bl;
>
> if (bl->exit)
> return bl->exit(bl);
Thanks,
-Mahesh.
More information about the Skiboot
mailing list