[RFC PATCH] seccomp: Add protection keys into seccomp_data

Tue Oct 30 09:33:54 AEDT 2018

On 10/29/18 2:55 PM, Michael Sammler wrote:
>> PKRU getting reset on signals, and the requirement now that it *can't*
>> be changed if you make syscalls probably needs to get thought about very
>> carefully before we do this, though.
> I am not sure, whether I follow you. Are you saying, that PKRU is
> currently not guaranteed to be preserved across system calls?
> This would make it very hard to use protection keys if libc does not
> save and restore the PKRU before/after systemcalls (and I am not aware
> of this).

It's preserved *across* system calls, but you have to be a bit careful
using it _inside_ the kernel.  We could context switch off to something
else, and not think that we need to restore PKRU until _just_ before we
return to userspace.

> Or do you mean, that the kernel might want to use the PKRU register for
> its own purposes while it is executing?

That, or we might keep another process's PKRU state in the register if
we don't think anyone is using it.  Now that I think about it, I think
Rik (cc'd, who was working on those patches) *had* to explicitly restore
PKRU because it's hard to tell where we might do a copy_to/from_user()
and need it.

> Then the solution you proposed in another email in this thread would
> work: instead of providing the seccomp filter with the current value of
> the PKRU (which might be different from what the user space expects) use
> the user space value which must have been saved somewhere (otherwise it
> would not be possible to restore it).

Yep, that's the worst-case scenario: either fetch PKRU out of the XSAVE
buffer (current->fpu->something), or just restore them using an existing
API before doing RDPKRU.

But, that's really an implementation detail.  The effect on the ABI and
how this might constrain future pkeys use is my bigger worry.

I'd also want to make sure that your specific use-case is compatible
with all the oddities of pkeys, like the 'clone' and signal behavior.
Some of that is spelled out here:

	http://man7.org/linux/man-pages/man7/pkeys.7.html

One thing that's a worry is that we have never said that you *can't*
write to arbitrary permissions in PKRU.  I can totally see some really
paranoid code saying, "I'm about to do something risky, so I'll turn off
access to *all* pkeys", or " turn off all access except my current
stack".  If they did that, they might also inadvertently disable access
to certain seccomp-restricted syscalls.

We can fix that up by documenting restrictions like "code should never
change the access rights of any pkey other than those that it
allocated", but that doesn't help any old code (of which I hope there is
relatively little).