[PATCH] powerpc/64s/radix: Don't warn on copros in radix__tlb_flush()
Sachin Sant
sachinp at linux.ibm.com
Wed Oct 18 03:59:52 AEDT 2023
> On 17-Oct-2023, at 5:45 PM, Michael Ellerman <mpe at ellerman.id.au> wrote:
>
> Sachin reported a warning when running the inject-ra-err selftest:
>
> # selftests: powerpc/mce: inject-ra-err
> Disabling lock debugging due to kernel taint
> MCE: CPU19: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered]
> MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [0000000010000e48]
> MCE: CPU19: Initiator CPU
> MCE: CPU19: Unknown
> ------------[ cut here ]------------
> WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180
> CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G M E 6.6.0-rc3-00055-g9ed22ae6be81 #4
> Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
> ...
> NIP radix__tlb_flush+0x160/0x180
> LR radix__tlb_flush+0x104/0x180
> Call Trace:
> radix__tlb_flush+0xf4/0x180 (unreliable)
> tlb_finish_mmu+0x15c/0x1e0
> exit_mmap+0x1a0/0x510
> __mmput+0x60/0x1e0
> exit_mm+0xdc/0x170
> do_exit+0x2bc/0x5a0
> do_group_exit+0x4c/0xc0
> sys_exit_group+0x28/0x30
> system_call_exception+0x138/0x330
> system_call_vectored_common+0x15c/0x2ec
>
> And bisected it to commit e43c0a0c3c28 ("powerpc/64s/radix: combine
> final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
> in radix__tlb_flush() if mm->context.copros is still elevated.
>
> However it's possible for the copros count to be elevated if a process
> exits without first closing file descriptors that are associated with a
> copro, e.g. VAS.
>
> If the process exits with a VAS file still open, the release callback
> is queued up for exit_task_work() via:
>   exit_files()
>     put_files_struct()
>       close_files()
>         filp_close()
>           fput()
>
> And called via:
>   exit_task_work()
>     ____fput()
>       __fput()
>         file->f_op->release(inode, file)
>           coproc_release()
>             vas_user_win_ops->close_win()
>               vas_deallocate_window()
>                 mm_context_remove_vas_window()
>                   mm_context_remove_copro()
>
> But that is after exit_mm() has been called from do_exit() and triggered
> the warning.
>
> Fix it by dropping the warning, and always calling __flush_all_mm().
>
> In the normal case of no copros, that will result in a call to
> _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
> does.
>
> If the copros count is elevated then it will cause a global flush, which
> should flush translations from any copros. Note that the process table
> entry was cleared in arch_exit_mmap(), so copros should not be able to
> fetch any new translations.
>
> Fixes: e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs")
> Reported-by: Sachin Sant <sachinp at linux.ibm.com>
> Closes: https://lore.kernel.org/all/A8E52547-4BF1-47CE-8AEA-BC5A9D7E3567@linux.ibm.com/
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> ---
Thanks for the fix. This fixes the reported problem.
Tested-by: Sachin Sant <sachinp at linux.ibm.com>
- Sachin
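
For anyone following the thread, the ordering problem described in the
commit message can be modelled in user space. The sketch below is only a
toy: every toy_* name is hypothetical and none of it is kernel code. It
assumes a single counter standing in for mm->context.copros and a flag
standing in for the release callback queued for exit_task_work(). It
shows that the final flush runs while the count is still elevated, and
that choosing a global flush in that case (rather than warning) matches
the behaviour the patch describes.

```c
#include <assert.h>

/* Toy stand-in for the per-mm state; NOT the kernel's mm_struct. */
struct toy_mm {
    int copros;         /* models mm->context.copros */
    int release_queued; /* models a release callback deferred to task work */
};

enum toy_flush { TOY_FLUSH_LOCAL, TOY_FLUSH_GLOBAL };

/*
 * Models the post-patch decision as described in the commit message:
 * no warning; a local flush when no copros are attached, a global
 * flush when the count is still elevated.
 */
static enum toy_flush toy_final_flush(const struct toy_mm *mm)
{
    return mm->copros > 0 ? TOY_FLUSH_GLOBAL : TOY_FLUSH_LOCAL;
}

/* Models exit_files(): closing the file only queues the release. */
static void toy_exit_files(struct toy_mm *mm)
{
    if (mm->copros > 0)
        mm->release_queued = 1;
}

/* Models exit_task_work(): the queued release drops the copro count. */
static void toy_exit_task_work(struct toy_mm *mm)
{
    if (mm->release_queued) {
        assert(mm->copros > 0);
        mm->copros--;
        mm->release_queued = 0;
    }
}

/*
 * Models the ordering in do_exit(): the mm teardown (and its final
 * flush) happens before the deferred release has run.
 */
static enum toy_flush toy_do_exit(struct toy_mm *mm)
{
    enum toy_flush f = toy_final_flush(mm); /* exit_mm() path */
    toy_exit_files(mm);
    toy_exit_task_work(mm);
    return f;
}
```

Calling toy_do_exit() on a toy_mm with copros set to 1 returns
TOY_FLUSH_GLOBAL and leaves copros at 0 afterwards, which mirrors why the
old warning fired on a process that exited with a VAS file still open.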