[PATCH] powerpc/mm: Fix set_memory_*() against concurrent accesses
Christophe Leroy
christophe.leroy at csgroup.eu
Wed Aug 18 00:20:57 AEST 2021
On 17/08/2021 at 15:25, Michael Ellerman wrote:
> Laurent reported that STRICT_MODULE_RWX was causing intermittent crashes
> on one of his systems:
>
> kernel tried to execute exec-protected page (c008000004073278) - exploit attempt? (uid: 0)
> BUG: Unable to handle kernel instruction fetch
> Faulting instruction address: 0xc008000004073278
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in: drm virtio_console fuse drm_panel_orientation_quirks ...
> CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
> Workqueue: events control_work_handler [virtio_console]
> NIP: c008000004073278 LR: c008000004073278 CTR: c0000000001e9de0
> REGS: c00000002e4ef7e0 TRAP: 0400 Not tainted (5.14.0-rc4+)
> MSR: 800000004280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002822 XER: 200400cf
> ...
> NIP fill_queue+0xf0/0x210 [virtio_console]
> LR fill_queue+0xf0/0x210 [virtio_console]
> Call Trace:
> fill_queue+0xb4/0x210 [virtio_console] (unreliable)
> add_port+0x1a8/0x470 [virtio_console]
> control_work_handler+0xbc/0x1e8 [virtio_console]
> process_one_work+0x290/0x590
> worker_thread+0x88/0x620
> kthread+0x194/0x1a0
> ret_from_kernel_thread+0x5c/0x64
>
> Jordan, Fabiano & Murilo were able to reproduce and identify that the
> problem is caused by the call to module_enable_ro() in do_init_module(),
> which happens after the module's init function has already been called.
>
> Our current implementation of change_page_attr() is not safe against
> concurrent accesses, because it invalidates the PTE before flushing the
> TLB and then installing the new PTE. That leaves a window in time where
> there is no valid PTE for the page, if another CPU tries to access the
> page at that time we see something like the fault above.
>
> We can't simply switch to set_pte_at()/flush TLB, because our hash MMU
> code doesn't handle a set_pte_at() of a valid PTE. See [1].
>
> But we do have pte_update(), which replaces the old PTE with the new,
> meaning there's no window where the PTE is invalid. And the hash MMU
> version hash__pte_update() deals with synchronising the hash page table
> correctly.
>
> Because pte_update() takes the set of PTE bits to set and clear we can't
> use our existing helpers, eg. pte_wrprotect() etc. and instead have to
> open code the set of flags. We will clean that up somehow in a future
> commit.
>
> [1]: https://lore.kernel.org/linuxppc-dev/87y318wp9r.fsf@linux.ibm.com/
>
> Fixes: 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
> Reported-by: Laurent Vivier <lvivier at redhat.com>
> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> ---
> arch/powerpc/mm/pageattr.c | 45 +++++++++++++++++++++++---------------
> 1 file changed, 27 insertions(+), 18 deletions(-)
>
> diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
> index 0876216ceee6..72425b61eb7e 100644
> --- a/arch/powerpc/mm/pageattr.c
> +++ b/arch/powerpc/mm/pageattr.c
> @@ -18,52 +18,61 @@
> /*
> * Updates the attributes of a page in three steps:
> *
> - * 1. invalidate the page table entry
> - * 2. flush the TLB
> - * 3. install the new entry with the updated attributes
> - *
> - * Invalidating the pte means there are situations where this will not work
> - * when in theory it should.
> - * For example:
> - * - removing write from page whilst it is being executed
> - * - setting a page read-only whilst it is being read by another CPU
> + * 1. take the page_table_lock
> + * 2. install the new entry with the updated attributes
> + * 3. flush the TLB
> *
> + * This sequence is safe against concurrent updates, and also allows updating the
> + * attributes of a page currently being executed or accessed.
> */
> static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
> {
> long action = (long)data;
> - pte_t pte;
> + unsigned long set, clear;
>
> spin_lock(&init_mm.page_table_lock);
>
> - /* invalidate the PTE so it's safe to modify */
> - pte = ptep_get_and_clear(&init_mm, addr, ptep);
> - flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> + set = clear = 0;
>
> /* modify the PTE bits as desired, then apply */
> switch (action) {
> case SET_MEMORY_RO:
> - pte = pte_wrprotect(pte);
> +#ifdef CONFIG_PPC_BOOK3S_64
> + clear = _PAGE_WRITE;
> +#elif defined(CONFIG_PPC_8xx)
> + set = _PAGE_RO;
> +#else
> + clear = _PAGE_RW;
> +#endif
I think it can be handled as follows (untested):

	new = pte_wrprotect(pte);
	set = pte_val(new) & ~pte_val(pte);
	clear = ~pte_val(new) & pte_val(pte);

So just put those two mask computations before the pte_update() and only change the switch cases to build a 'new' pte instead of open-coding the flags.
Or do it the way we do in ptep_set_wrprotect() in <asm/nohash/32/pgtable.h>.
Or could __ptep_set_access_flags() be used?
> break;
> case SET_MEMORY_RW:
> - pte = pte_mkwrite(pte_mkdirty(pte));
> +#ifdef CONFIG_PPC_8xx
> + clear = _PAGE_RO;
> +#elif defined(CONFIG_PPC_BOOK3S_64)
> + set = _PAGE_RW | _PAGE_DIRTY | _PAGE_SOFT_DIRTY;
> +#else
> + set = _PAGE_RW | _PAGE_DIRTY;
> +#endif
> break;
> case SET_MEMORY_NX:
> - pte = pte_exprotect(pte);
> + clear = _PAGE_EXEC;
> break;
> case SET_MEMORY_X:
> - pte = pte_mkexec(pte);
> + set = _PAGE_EXEC;
> break;
> default:
> WARN_ON_ONCE(1);
> break;
> }
>
> - set_pte_at(&init_mm, addr, ptep, pte);
> + pte_update(&init_mm, addr, ptep, clear, set, 0);
>
> /* See ptesync comment in radix__set_pte_at() */
> if (radix_enabled())
> asm volatile("ptesync": : :"memory");
> +
> + flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
Can we use a per-page flush like flush_tlb_page() in order to avoid a 'tlbia'? (Maybe in another patch, as the range flush was already there.)
> +
> spin_unlock(&init_mm.page_table_lock);
>
> return 0;
>
> base-commit: cbc06f051c524dcfe52ef0d1f30647828e226d30
>