[RFC PATCH 1/4] libnvdimm/namespace: Make namespace size validation arch dependent

Dan Williams dan.j.williams at intel.com
Tue Oct 29 16:30:01 AEDT 2019


On Mon, Oct 28, 2019 at 9:35 PM Aneesh Kumar K.V
<aneesh.kumar at linux.ibm.com> wrote:
>
> On 10/29/19 4:38 AM, Dan Williams wrote:
> > On Mon, Oct 28, 2019 at 2:48 AM Aneesh Kumar K.V
> > <aneesh.kumar at linux.ibm.com> wrote:
> >>
> >> The page size used to map the namespace is arch dependent. For example,
> >> architectures like ppc64 use a 16MB page size for direct-mapping. If the
> >> namespace size is not aligned to the mapping page size, we can observe a
> >> kernel crash during namespace init and destroy.
> >>
> >> This is due to the kernel doing a partial map/unmap of the resource range:
> >>
> >> BUG: Unable to handle kernel data access at 0xc001000406000000
> >> Faulting instruction address: 0xc000000000090790
> >> NIP [c000000000090790] arch_add_memory+0xc0/0x130
> >> LR [c000000000090744] arch_add_memory+0x74/0x130
> >> Call Trace:
> >>   arch_add_memory+0x74/0x130 (unreliable)
> >>   memremap_pages+0x74c/0xa30
> >>   devm_memremap_pages+0x3c/0xa0
> >>   pmem_attach_disk+0x188/0x770
> >>   nvdimm_bus_probe+0xd8/0x470
> >>   really_probe+0x148/0x570
> >>   driver_probe_device+0x19c/0x1d0
> >>   device_driver_attach+0xcc/0x100
> >>   bind_store+0x134/0x1c0
> >>   drv_attr_store+0x44/0x60
> >>   sysfs_kf_write+0x74/0xc0
> >>   kernfs_fop_write+0x1b4/0x290
> >>   __vfs_write+0x3c/0x70
> >>   vfs_write+0xd0/0x260
> >>   ksys_write+0xdc/0x130
> >>   system_call+0x5c/0x68
> >>
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
> >> ---
> >>   arch/arm64/mm/flush.c     | 11 +++++++++++
> >>   arch/powerpc/lib/pmem.c   | 21 +++++++++++++++++++--
> >>   arch/x86/mm/pageattr.c    | 12 ++++++++++++
> >>   include/linux/libnvdimm.h |  1 +
> >>   4 files changed, 43 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> >> index ac485163a4a7..90c54c600023 100644
> >> --- a/arch/arm64/mm/flush.c
> >> +++ b/arch/arm64/mm/flush.c
> >> @@ -91,4 +91,15 @@ void arch_invalidate_pmem(void *addr, size_t size)
> >>          __inval_dcache_area(addr, size);
> >>   }
> >>   EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
> >> +
> >> +unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, unsigned long size)
> >> +{
> >> +       u32 remainder;
> >> +
> >> +       div_u64_rem(size, PAGE_SIZE * ndr_mappings, &remainder);
> >> +       if (remainder)
> >> +               return PAGE_SIZE * ndr_mappings;
> >> +       return 0;
> >> +}
> >> +EXPORT_SYMBOL_GPL(arch_validate_namespace_size);
> >>   #endif
> >> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
> >> index 377712e85605..2e661a08dae5 100644
> >> --- a/arch/powerpc/lib/pmem.c
> >> +++ b/arch/powerpc/lib/pmem.c
> >> @@ -17,14 +17,31 @@ void arch_wb_cache_pmem(void *addr, size_t size)
> >>          unsigned long start = (unsigned long) addr;
> >>          flush_dcache_range(start, start + size);
> >>   }
> >> -EXPORT_SYMBOL(arch_wb_cache_pmem);
> >> +EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);
> >>
> >>   void arch_invalidate_pmem(void *addr, size_t size)
> >>   {
> >>          unsigned long start = (unsigned long) addr;
> >>          flush_dcache_range(start, start + size);
> >>   }
> >> -EXPORT_SYMBOL(arch_invalidate_pmem);
> >> +EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
> >> +
> >> +unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, unsigned long size)
> >> +{
> >> +       u32 remainder;
> >> +       unsigned long linear_map_size;
> >> +
> >> +       if (radix_enabled())
> >> +               linear_map_size = PAGE_SIZE;
> >> +       else
> >> +               linear_map_size = (1UL << mmu_psize_defs[mmu_linear_psize].shift);
> >
> > This seems more a "supported_alignments" problem, and less a namespace
> > size or PAGE_SIZE problem, because if the starting address is
> > misaligned this size validation can still succeed when it shouldn't.
> >
>
>
> Isn't supported_alignments an indication of how the user wants the
> namespace to be mapped to applications? I.e., with the above restrictions
> we can still do both 64K and 16M mapping of the namespace to userspace.

True, for the pfn device and the device-dax mapping size, but I'm
suggesting adding another instance of alignment control at the raw
namespace level. That would need to be disconnected from the
device-dax page mapping granularity.

>
> Also, for supported alignment the huge page mapping is further dependent
> on the THP feature.
>
> The restrictions here are mostly w.r.t. the direct-mapping page size used
> by some architectures.

Right, that's a base requirement for the namespace that can be
independent of the user page mapping size for device-dax.
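
To make the start-address concern concrete, here is a minimal user-space
sketch, not kernel code. Both helper names are hypothetical and are not
existing libnvdimm interfaces; the direct-map page size is passed in as a
plain number and is assumed to be a power of two (e.g. 16MB for the hash
MMU case mentioned above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical size-only check, in the spirit of the proposed
 * arch_validate_namespace_size(): it looks at the size and nothing else.
 */
static bool size_only_check(uint64_t size, uint64_t map_size)
{
	return (size % map_size) == 0;
}

/*
 * Hypothetical range check: both the start address and the size must be
 * multiples of the direct-map page size, otherwise the first or last
 * mapping page would only be partially covered by the namespace.
 */
static bool range_is_map_aligned(uint64_t start, uint64_t size,
				 uint64_t map_size)
{
	/* Assumption: map_size is a non-zero power of two. */
	assert(map_size && (map_size & (map_size - 1)) == 0);

	return ((start | size) & (map_size - 1)) == 0;
}
```

With a 16MB map size, a 32MB range that starts 64K past a 16MB boundary
passes size_only_check() but fails range_is_map_aligned(), which is the
case where size validation succeeds when it shouldn't.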

