[RFC PATCH 1/2] libnvdimm: Add prctl control for disabling synchronous fault support.

Aneesh Kumar K.V aneesh.kumar at linux.ibm.com
Mon Jun 1 22:20:40 AEST 2020


On 6/1/20 5:37 PM, Michal Suchánek wrote:
> On Mon, Jun 01, 2020 at 05:31:50PM +0530, Aneesh Kumar K.V wrote:
>> On 6/1/20 3:39 PM, Jan Kara wrote:
>>> On Fri 29-05-20 16:25:35, Aneesh Kumar K.V wrote:
>>>> On 5/29/20 3:22 PM, Jan Kara wrote:
>>>>> On Fri 29-05-20 15:07:31, Aneesh Kumar K.V wrote:
>>>>>> Thanks Michal. I also missed Jeff in this email thread.
>>>>>
>>>>> And I think you'll also need some of the sched maintainers for the prctl
>>>>> bits...
>>>>>
>>>>>> On 5/29/20 3:03 PM, Michal Suchánek wrote:
>>>>>>> Adding Jan
>>>>>>>
>>>>>>> On Fri, May 29, 2020 at 11:11:39AM +0530, Aneesh Kumar K.V wrote:
>>>>>>>> With POWER10, architecture is adding new pmem flush and sync instructions.
>>>>>>>> The kernel should prevent the usage of MAP_SYNC if applications are not using
>>>>>>>> the new instructions on newer hardware.
>>>>>>>>
>>>>>>>> This patch adds a prctl option MAP_SYNC_ENABLE that can be used to enable
>>>>>>>> the usage of MAP_SYNC. The kernel config option is added to allow the user
>>>>>>>> to control whether MAP_SYNC should be enabled by default or not.
>>>>>>>>
>>>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
>>>>> ...
>>>>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>>>>> index 8c700f881d92..d5a9a363e81e 100644
>>>>>>>> --- a/kernel/fork.c
>>>>>>>> +++ b/kernel/fork.c
>>>>>>>> @@ -963,6 +963,12 @@ __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);
>>>>>>>>      static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
>>>>>>>> +#ifdef CONFIG_ARCH_MAP_SYNC_DISABLE
>>>>>>>> +unsigned long default_map_sync_mask = MMF_DISABLE_MAP_SYNC_MASK;
>>>>>>>> +#else
>>>>>>>> +unsigned long default_map_sync_mask = 0;
>>>>>>>> +#endif
>>>>>>>> +
>>>>>
>>>>> I'm not sure CONFIG is really the right approach here. For a distro that would
>>>>> basically mean to disable MAP_SYNC for all PPC kernels unless application
>>>>> explicitly uses the right prctl. Shouldn't we rather initialize
>>>>> default_map_sync_mask on boot based on whether the CPU we run on requires
>>>>> new flush instructions or not? Otherwise the patch looks sensible.
>>>>>
>>>>
>>>> yes that is correct. We ideally want to deny MAP_SYNC only w.r.t POWER10.
>>>> But on a virtualized platform there is no easy way to detect that. We could
>>>> ideally hook this into the nvdimm driver where we look at the new compat
>>>> string ibm,persistent-memory-v2 and then disable MAP_SYNC
>>>> if we find a device with the specific value.
>>>
>>> Hum, couldn't we set some flag for nvdimm devices with
>>> "ibm,persistent-memory-v2" property and then check it during mmap(2) time
>>> and when the device has this propery and the mmap(2) caller doesn't have
>>> the prctl set, we'd disallow MAP_SYNC? That should make things mostly
>>> seamless, shouldn't it? Only apps that want to use MAP_SYNC on these
>>> devices would need to use prctl(MMF_DISABLE_MAP_SYNC, 0) but then these
>>> applications need to be aware of new instructions so this isn't that much
>>> additional burden...
>>
>> I am not sure application would want to add that much details/knowledge
>> about a platform in their code. I was expecting application to do
>>
>> #ifdef __ppc64__
>>          prctl(MAP_SYNC_ENABLE, 1, 0, 0, 0));
>> #endif
>>          a = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
>>                          MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
>>
>>
>> For that code all the complexity that we add w.r.t ibm,persistent-memory-v2
>> is not useful. Do you see a value in making all these device specific rather
>> than a conditional on  __ppc64__?

> If the vpmem devices continue to work with the old instruction on
> POWER10 then it makes sense to make this per-device.

vPMEM doesn't have write_cache and hence it is synchronous even without 
using any specific flush instruction. The question is do we want to have
different programming steps when running on vPMEM vs a persistent PMEM 
device on ppc64.

I will work on the device specific ENABLE flag and then we can compare 
the kernel complexity against the added benefit.


-aneesh


More information about the Linuxppc-dev mailing list