[PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode
Satheesh Rajendran
sathnaga at linux.vnet.ibm.com
Wed Jul 10 17:32:07 AEST 2019
On Wed, Jul 10, 2019 at 03:20:18PM +1000, Suraj Jitindar Singh wrote:
> The virtual real mode addressing (VRMA) mechanism is used when a
> partition is using HPT (Hash Page Table) translation and performs
> real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
> mode effective address bits 0:23 are treated as zero (i.e. the access
> is aliased to 0) and the access is performed using an implicit 1TB SLB
> entry.
>
> The size of the RMA (Real Memory Area) is communicated to the guest as
> the size of the first memory region in the device tree. And because of
> the mechanism described above can be expected to not exceed 1TB. In the
> event that the host erroneously represents the RMA as being larger than
> 1TB, guest accesses in real mode to memory addresses above 1TB will be
> aliased down to below 1TB. This means that a memory access performed in
> real mode may differ to one performed in virtual mode for the same memory
> address, which would likely have unintended consequences.
>
> To avoid this outcome have the guest explicitly limit the size of the
> RMA to the current maximum, which is 1TB. This means that even if the
> first memory block is larger than 1TB, only the first 1TB should be
> accessed in real mode.
>
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh at gmail.com>
> ---
> arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
Hi,
Tested this patch and now Power8 compat guest boots fine with mem >1024G on
Power9 host.
Tested-by: Satheesh Rajendran <sathnaga at linux.vnet.ibm.com>
Host: P9; kernel: 5.2.0-00915-g5ad18b2e60b7
Before this patch:
Guest crashes..
[0.000000] BUG: Kernel NULL pointer dereference at 0x00000028
[0.000000] Faulting instruction address: 0xc00000000102caa0
[0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
[0.000000] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[0.000000] Modules linked in:
[0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-03135-ge9a83bd23220 #24
[0.000000] NIP: c00000000102caa0 LR: c00000000102ca84 CTR: 0000000000000000
[0.000000] REGS: c000000001603ba0 TRAP: 0380 Not tainted (5.2.0-03135-ge9a83bd23220)
[0.000000] MSR: 8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 24000428 XER: 20000000
[0.000000] CFAR: c00000000102c1d8 IRQMASK: 1
[0.000000] GPR00: c00000000102ca84 c000000001603e30 c000000001605300 0000010000000000
[0.000000] GPR04: 0000000000000000 0000000000000000 c00000ffffff8000 c000000001863dc8
[0.000000] GPR08: 0000000000002028 0000000000000000 c00000ffffff8000 0000000000000009
[0.000000] GPR12: 0000000000000000 c0000000018f0000 000000007dc5fef0 00000000012e1220
[0.000000] GPR16: 00000000012e10a0 fffffffffffffffd 000000007dc5fef0 000000000130fcc0
[0.000000] GPR20: 0000000000000014 0000000001a80000 000000002fff0000 fffffffffffffffd
[0.000000] GPR24: 0000000001d0000c c000000000000000 c000000001641ed8 c000000001641b78
[0.000000] GPR28: 0000000000000000 0000000000000000 0000010000000000 0000000000000000
[0.000000] NIP [c00000000102caa0] emergency_stack_init+0xb8/0x118
[0.000000] LR [c00000000102ca84] emergency_stack_init+0x9c/0x118
[0.000000] Call Trace:
[0.000000] [c000000001603e30] [c00000000102ca84] emergency_stack_init+0x9c/0x118 (unreliable)
[0.000000] [c000000001603e80] [c00000000102bd54] setup_arch+0x2fc/0x388
[0.000000] [c000000001603ef0] [c000000001023ccc] start_kernel+0xa4/0x660
[0.000000] [c000000001603f90] [c00000000000b774] start_here_common+0x1c/0x528
[0.000000] Instruction dump:
[0.000000] 7ffc07b4 7fc3f378 7bfd1f24 7f84e378 4bfff6e9 3f620004 3b7bc878 7f84e378
[0.000000] 39434000 7fc3f378 e93b0000 7d29e82a <f9490028> 4bfff6c5 e93b0000 7f84e378
[0.000000] random: get_random_bytes called from print_oops_end_marker+0x6c/0xa0 with crng_init=0
[0.000000] ---[ end trace 0000000000000000 ]---
[0.000000]
[0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
-------------------------
With this patch:
# virsh start --console p8
Domain p8 started
Connected to domain p8
..
..
Fedora 27 (Twenty Seven)
Kernel 5.2.0-03136-gf709b0494ad9 on an ppc64le (hvc0)
atest-guest login:
# free -g
total used free shared buff/cache available
Mem: 1028 0 1027 0 0 1025
Swap: 0 0
Regards,
-Satheesh.
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 28ced26f2a00..4d0e2cce9cd5 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
> *
> * For guests on platforms before POWER9, we clamp the it limit to 1G
> * to avoid some funky things such as RTAS bugs etc...
> + * On POWER9 we limit to 1TB in case the host erroneously told us that
> + * the RMA was >1TB. Effective address bits 0:23 are treated as zero
> + * (meaning the access is aliased to zero i.e. addr = addr % 1TB)
> + * for virtual real mode addressing and so it doesn't make sense to
> + * have an area larger than 1TB as it can't be addressed.
> */
> if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
> ppc64_rma_size = first_memblock_size;
> if (!early_cpu_has_feature(CPU_FTR_ARCH_300))
> ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
> + else
> + ppc64_rma_size = min_t(u64, ppc64_rma_size,
> + 1UL << SID_SHIFT_1T);
>
> /* Finally limit subsequent allocations */
> memblock_set_current_limit(ppc64_rma_size);
> --
> 2.13.6
>
More information about the Linuxppc-dev
mailing list