[RESEND PATCH V2 1/3] Add mmap flag to request pages are locked after page fault

Tue Jun 23 00:18:06 AEST 2015

On Mon, 22 Jun 2015, Michal Hocko wrote:

> On Fri 19-06-15 12:43:33, Eric B Munson wrote:
> > On Fri, 19 Jun 2015, Michal Hocko wrote:
> > 
> > > On Thu 18-06-15 16:30:48, Eric B Munson wrote:
> > > > On Thu, 18 Jun 2015, Michal Hocko wrote:
> > > [...]
> > > > > Wouldn't it be much more reasonable and straightforward to have
> > > > > MAP_FAULTPOPULATE as a counterpart for MAP_POPULATE which would
> > > > > explicitly disallow any form of pre-faulting? It would be usable for
> > > > > other usecases than with MAP_LOCKED combination.
> > > > 
> > > > I don't see a clear case for it being more reasonable, it is one
> > > > possible way to solve the problem.
> > > 
> > > MAP_FAULTPOPULATE would be usable for other cases as well. E.g. fault
> > > around is an all or nothing feature. Either all mappings (which support
> > > this) fault around or none. There is no way to tell the kernel that
> > > this particular mapping shouldn't fault around. I haven't seen such a
> > > request yet but we have seen requests to have a way to opt out from
> > > a global policy in the past (e.g. per-process opt out from THP). So
> > > I can imagine somebody will come with a request to opt out from any
> > > speculative operations on the mapped area in the future.
> > > 
> > > > But I think it leaves us in an even
> > > > more awkward state WRT VMA flags.  As you noted in your fix for the
> > > > mmap() man page, one can get into a state where a VMA is VM_LOCKED, but
> > > > not present.  Having VM_LOCKONFAULT states that this was intentional, if
> > > > we go to using MAP_FAULTPOPULATE instead of MAP_LOCKONFAULT, we no
> > > > longer set VM_LOCKONFAULT (unless we want to start mapping it to the
> > > > presence of two MAP_ flags).  This can make detecting the MAP_LOCKED +
> > > > populate failure state harder.
> > > 
> > > I am not sure I understand your point here. Could you be more specific
> > > how would you check for that and what for?
> > 
> > My thought on detecting was that someone might want to know if they had
> > a VMA that was VM_LOCKED but had not been made present because of a
> > failure in mmap.  We don't have a way today, but adding VM_LOCKONFAULT
> > is at least explicit about what is happening which would make detecting
> > the VM_LOCKED but not present state easier. 
> 
> One could use /proc/<pid>/pagemap to query the residency.
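
For reference, a minimal sketch of that residency check.  It assumes the
pagemap layout from the kernel's pagemap documentation: one 64-bit entry
per virtual page, with bit 63 meaning "page present".  The helper name
page_is_present() is made up for illustration, and this only looks at
the calling process through /proc/self/pagemap.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 1 if the page containing addr is present, 0 if it is not,
 * and -1 if /proc/self/pagemap cannot be read.
 * Assumption: one 64-bit entry per page, bit 63 = "page present".
 */
static int page_is_present(const void *addr)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uint64_t entry;
	off_t offset;
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	/* Index of the virtual page, times the size of one entry. */
	offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size);
	offset *= (off_t)sizeof(entry);

	n = pread(fd, &entry, sizeof(entry), offset);
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;

	return (int)((entry >> 63) & 1);
}
```

Walking a VM_LOCKED range page by page with a helper like this would
show whether any part of it was left unpopulated.  It does not say why
a page is missing, which is the part an explicit flag would make
clearer.
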
> 
> > This assumes that
> > MAP_FAULTPOPULATE does not translate to a VMA flag, but it sounds like
> > it would have to.
> 
> Yes, it would have to have a VM flag for the vma.
> 
> > > From my understanding MAP_LOCKONFAULT is essentially
> > > MAP_FAULTPOPULATE|MAP_LOCKED with a quite obvious semantic (unlike
> > > single MAP_LOCKED unfortunately). I would love to also have
> > > MAP_LOCKED|MAP_POPULATE (aka full mlock semantic) but I am really
> > > skeptical considering how my previous attempt to make MAP_POPULATE
> > > reasonable went.
> > 
> > Are you objecting to the addition of the VMA flag VM_LOCKONFAULT, or the
> > new MAP_LOCKONFAULT flag (or both)? 
> 
> I thought the MAP_FAULTPOPULATE (or any other better name) would
> directly translate into VM_FAULTPOPULATE and wouldn't be tied to the
> locked semantic. We already have VM_LOCKED for that. The direct effect
> of the flag would be to prevent any population other than the direct
> page fault - including any speculative actions like fault around or
> read-ahead.

I like the ability to control other speculative population, but I am not
sure about overloading it with the VM_LOCKONFAULT case.  Here is my
concern.  If we use VM_FAULTPOPULATE | VM_LOCKED to denote LOCKONFAULT,
how can we tell lock on fault apart from someone who wants to avoid
read-ahead and also wants to use mlock()?  This might lead to some
interesting states with mlock() and munlock() that take flags.  For
instance, with VM_LOCKONFAULT, mlock(MLOCK_ONFAULT) followed by
munlock(MLOCK_LOCKED) leaves the VMAs in the same state, with
VM_LOCKONFAULT set.  If we use VM_FAULTPOPULATE, the same pair of calls
would clear VM_LOCKED but leave VM_FAULTPOPULATE.  It may not matter in
the end, but I am concerned about the subtleties here.
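
To make the two outcomes concrete, here is a toy userspace model of the
flag transitions.  It is only a sketch: the bit values are made up, the
a_*/b_* helpers are hypothetical, and the way munlock(MLOCK_ONFAULT)
behaves in the second scheme is my assumption, not something either
proposal specifies.

```c
/* Hypothetical bit values, local to this sketch; not the kernel's. */
#define VM_LOCKED        0x1u
#define VM_LOCKONFAULT   0x2u
#define VM_FAULTPOPULATE 0x4u

/* Flag arguments for mlock()/munlock() with flags (values assumed). */
#define MLOCK_LOCKED     0x1
#define MLOCK_ONFAULT    0x2

/* Scheme A: lock on fault has its own VMA bit, VM_LOCKONFAULT. */
static unsigned int a_mlock(unsigned int vm, int flags)
{
	if (flags & MLOCK_LOCKED)
		vm |= VM_LOCKED;
	if (flags & MLOCK_ONFAULT)
		vm |= VM_LOCKONFAULT;
	return vm;
}

static unsigned int a_munlock(unsigned int vm, int flags)
{
	if (flags & MLOCK_LOCKED)
		vm &= ~VM_LOCKED;
	if (flags & MLOCK_ONFAULT)
		vm &= ~VM_LOCKONFAULT;
	return vm;
}

/* Scheme B: lock on fault is encoded as VM_FAULTPOPULATE | VM_LOCKED. */
static unsigned int b_mlock(unsigned int vm, int flags)
{
	if (flags & MLOCK_LOCKED)
		vm |= VM_LOCKED;
	if (flags & MLOCK_ONFAULT)
		vm |= VM_FAULTPOPULATE | VM_LOCKED;
	return vm;
}

static unsigned int b_munlock(unsigned int vm, int flags)
{
	if (flags & MLOCK_LOCKED)
		vm &= ~VM_LOCKED;
	if (flags & MLOCK_ONFAULT)	/* assumed: undo both bits */
		vm &= ~(VM_FAULTPOPULATE | VM_LOCKED);
	return vm;
}
```

In this model, a_munlock(a_mlock(0, MLOCK_ONFAULT), MLOCK_LOCKED) ends
with only VM_LOCKONFAULT set, so the VMA is still lock on fault.  The
same pair through b_mlock()/b_munlock() ends with only VM_FAULTPOPULATE
set: the lock is gone, but the no-speculation behavior stays behind.
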

> 
> > If you prefer that MAP_LOCKED |
> > MAP_FAULTPOPULATE means that VM_LOCKONFAULT is set, I am fine with that
> > instead of introducing MAP_LOCKONFAULT.  I went with the new flag
> > because to date, we have a one to one mapping of MAP_* to VM_* flags.
> > 
> > > 
> > > > If this is the preferred path for mmap(), I am fine with that. 
> > > 
> > > > However,
> > > > I would like to see the new system calls that Andrew mentioned (and that
> > > > I am testing patches for) go in as well. 
> > > 
> > > mlock with flags sounds like a good step but I am not sure it will make
> > > sense in the future. POSIX has screwed that and I am not sure how many
> > > applications would use it. This ship sailed a long time ago.
> > 
> > I don't know either, but the code is the question, right?  I know that
> > we have at least one team that wants it here.
> > 
> > > 
> > > > That way we give users the
> > > > ability to request VM_LOCKONFAULT for memory allocated using something
> > > > other than mmap.
> > > 
> > > mmap(MAP_FAULTPOPULATE); mlock() would have the same semantic even
> > > without changing mlock syscall.
> > 
> > That is true as long as MAP_FAULTPOPULATE set a flag in the VMA(s).  It
> > doesn't cover the actual case I was asking about, which is how do I get
> > lock on fault on malloc'd memory?
> 
> OK I see your point now. We would indeed need a flag argument for mlock.
> -- 
> Michal Hocko
> SUSE Labs