[RFC PATCH 1/4] libnvdimm/namespace: Make namespace size validation arch dependent
Dan Williams
dan.j.williams at intel.com
Tue Oct 29 16:30:01 AEDT 2019
On Mon, Oct 28, 2019 at 9:35 PM Aneesh Kumar K.V
<aneesh.kumar at linux.ibm.com> wrote:
>
> On 10/29/19 4:38 AM, Dan Williams wrote:
> > On Mon, Oct 28, 2019 at 2:48 AM Aneesh Kumar K.V
> > <aneesh.kumar at linux.ibm.com> wrote:
> >>
> >> The page size used to map the namespace is arch dependent. For example
> >> architectures like ppc64 use 16MB page size for direct-mapping. If the namespace
> >> size is not aligned to the mapping page size, we can observe kernel crash
> >> during namespace init and destroy.
> >>
> >> This is due to kernel doing partial map/unmap of the resource range
> >>
> >> BUG: Unable to handle kernel data access at 0xc001000406000000
> >> Faulting instruction address: 0xc000000000090790
> >> NIP [c000000000090790] arch_add_memory+0xc0/0x130
> >> LR [c000000000090744] arch_add_memory+0x74/0x130
> >> Call Trace:
> >> arch_add_memory+0x74/0x130 (unreliable)
> >> memremap_pages+0x74c/0xa30
> >> devm_memremap_pages+0x3c/0xa0
> >> pmem_attach_disk+0x188/0x770
> >> nvdimm_bus_probe+0xd8/0x470
> >> really_probe+0x148/0x570
> >> driver_probe_device+0x19c/0x1d0
> >> device_driver_attach+0xcc/0x100
> >> bind_store+0x134/0x1c0
> >> drv_attr_store+0x44/0x60
> >> sysfs_kf_write+0x74/0xc0
> >> kernfs_fop_write+0x1b4/0x290
> >> __vfs_write+0x3c/0x70
> >> vfs_write+0xd0/0x260
> >> ksys_write+0xdc/0x130
> >> system_call+0x5c/0x68
> >>
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
> >> ---
> >> arch/arm64/mm/flush.c | 11 +++++++++++
> >> arch/powerpc/lib/pmem.c | 21 +++++++++++++++++++--
> >> arch/x86/mm/pageattr.c | 12 ++++++++++++
> >> include/linux/libnvdimm.h | 1 +
> >> 4 files changed, 43 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> >> index ac485163a4a7..90c54c600023 100644
> >> --- a/arch/arm64/mm/flush.c
> >> +++ b/arch/arm64/mm/flush.c
> >> @@ -91,4 +91,15 @@ void arch_invalidate_pmem(void *addr, size_t size)
> >> __inval_dcache_area(addr, size);
> >> }
> >> EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
> >> +
> >> +unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, unsigned long size)
> >> +{
> >> + u32 remainder;
> >> +
> >> + div_u64_rem(size, PAGE_SIZE * ndr_mappings, &remainder);
> >> + if (remainder)
> >> + return PAGE_SIZE * ndr_mappings;
> >> + return 0;
> >> +}
> >> +EXPORT_SYMBOL_GPL(arch_validate_namespace_size);
> >> #endif
> >> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
> >> index 377712e85605..2e661a08dae5 100644
> >> --- a/arch/powerpc/lib/pmem.c
> >> +++ b/arch/powerpc/lib/pmem.c
> >> @@ -17,14 +17,31 @@ void arch_wb_cache_pmem(void *addr, size_t size)
> >> unsigned long start = (unsigned long) addr;
> >> flush_dcache_range(start, start + size);
> >> }
> >> -EXPORT_SYMBOL(arch_wb_cache_pmem);
> >> +EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
> >>
> >> void arch_invalidate_pmem(void *addr, size_t size)
> >> {
> >> unsigned long start = (unsigned long) addr;
> >> flush_dcache_range(start, start + size);
> >> }
> >> -EXPORT_SYMBOL(arch_invalidate_pmem);
> >> +EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
> >> +
> >> +unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, unsigned long size)
> >> +{
> >> + u32 remainder;
> >> + unsigned long linear_map_size;
> >> +
> >> + if (radix_enabled())
> >> + linear_map_size = PAGE_SIZE;
> >> + else
> >> + linear_map_size = (1UL << mmu_psize_defs[mmu_linear_psize].shift);
> >
> > This seems more a "supported_alignments" problem, and less a namespace
> > size or PAGE_SIZE problem, because if the starting address is
> > misaligned this size validation can still succeed when it shouldn't.
> >
>
>
> Isn't supported_alignments an indication of how user want the namespace
> to be mapped to applications? Ie, with the above restrictions we can
> still do both 64K and 16M mapping of the namespace to userspace.
True, for the pfn device and the device-dax mapping size, but I'm
suggesting adding another instance of alignment control at the raw
namespace level. That would need to be disconnected from the
device-dax page mapping granularity.
>
> Also for supported alignment the huge page mapping is further dependent
> on the THP feature.
>
> The restrictions here are mostly w.r.t the direct-mapping page size used
> by some architecture.
Right, that's a base requirement for the namespace that can be
independent of the user page mapping size for device-dax.
More information about the Linuxppc-dev
mailing list