[PATCH v3 1/2] pid: add pidfd_open()
Konstantin Khlebnikov
khlebnikov at yandex-team.ru
Thu Jun 27 23:57:18 AEST 2019
On 20.05.2019 18:56, Christian Brauner wrote:
> This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
> pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
> process that is created via traditional fork()/clone() calls that is only
> referenced by a PID:
>
> int pidfd = pidfd_open(1234, 0);
> ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
>
> With the introduction of pidfds through CLONE_PIDFD it is possible to
> created pidfds at process creation time.
> However, a lot of processes get created with traditional PID-based calls
> such as fork() or clone() (without CLONE_PIDFD). For these processes a
> caller can currently not create a pollable pidfd. This is a problem for
> Android's low memory killer (LMK) and service managers such as systemd.
> Both are examples of tools that want to make use of pidfds to get reliable
> notification of process exit for non-parents (pidfd polling) and race-free
> signal sending (pidfd_send_signal()). They intend to switch to this API for
> process supervision/management as soon as possible. Having no way to get
> pollable pidfds from PID-only processes is one of the biggest blockers for
> them in adopting this api. With pidfd_open() making it possible to retrieve
> pidfds for PID-based processes we enable them to adopt this api.
>
> In line with Arnd's recent changes to consolidate syscall numbers across
> architectures, I have added the pidfd_open() syscall to all architectures
> at the same time.
As I see pidfd_open() works only within current pid-namespace.
Have you considered separate argument for pidns-fd or flag for opening pid in
pid-ns referred by nsproxy->pid_ns_for_children set by setns.
This could be used for use cases I've tried to cover by syscall "translate_pid"
https://lkml.org/lkml/2018/6/1/788
>
> Signed-off-by: Christian Brauner <christian at brauner.io>
> Reviewed-by: Oleg Nesterov <oleg at redhat.com>
> Acked-by: Arnd Bergmann <arnd at arndb.de>
> Cc: "Eric W. Biederman" <ebiederm at xmission.com>
> Cc: Kees Cook <keescook at chromium.org>
> Cc: Joel Fernandes (Google) <joel at joelfernandes.org>
> Cc: Thomas Gleixner <tglx at linutronix.de>
> Cc: Jann Horn <jannh at google.com>
> Cc: David Howells <dhowells at redhat.com>
> Cc: Andy Lutomirsky <luto at kernel.org>
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Aleksa Sarai <cyphar at cyphar.com>
> Cc: Linus Torvalds <torvalds at linux-foundation.org>
> Cc: Al Viro <viro at zeniv.linux.org.uk>
> Cc: linux-api at vger.kernel.org
> ---
> v1:
> - kbuild test robot <lkp at intel.com>:
> - add missing entry for pidfd_open to arch/arm/tools/syscall.tbl
> - Oleg Nesterov <oleg at redhat.com>:
> - use simpler thread-group leader check
> v2:
> - Oleg Nesterov <oleg at redhat.com>:
> - avoid using additional variable
> - remove unneeded comment
> - Arnd Bergmann <arnd at arndb.de>:
> - switch from 428 to 434 since the new mount api has taken it
> - bump syscall numbers in arch/arm64/include/asm/unistd.h
> - Joel Fernandes (Google) <joel at joelfernandes.org>:
> - switch from ESRCH to EINVAL when the passed-in pid does not refer to a
> thread-group leader
> - Christian Brauner <christian at brauner.io>:
> - rebase on v5.2-rc1
> - adapt syscall number to account for new mount api syscalls
> v3:
> - Arnd Bergmann <arnd at arndb.de>:
> - add missing syscall entries for mips-o32 and mips-n64
> ---
> arch/alpha/kernel/syscalls/syscall.tbl | 1 +
> arch/arm/tools/syscall.tbl | 1 +
> arch/arm64/include/asm/unistd.h | 2 +-
> arch/arm64/include/asm/unistd32.h | 2 +
> arch/ia64/kernel/syscalls/syscall.tbl | 1 +
> arch/m68k/kernel/syscalls/syscall.tbl | 1 +
> arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
> arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
> arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
> arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
> arch/parisc/kernel/syscalls/syscall.tbl | 1 +
> arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
> arch/s390/kernel/syscalls/syscall.tbl | 1 +
> arch/sh/kernel/syscalls/syscall.tbl | 1 +
> arch/sparc/kernel/syscalls/syscall.tbl | 1 +
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
> include/linux/pid.h | 1 +
> include/linux/syscalls.h | 1 +
> include/uapi/asm-generic/unistd.h | 4 +-
> kernel/fork.c | 2 +-
> kernel/pid.c | 43 +++++++++++++++++++++
> 23 files changed, 68 insertions(+), 3 deletions(-)
>
> diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
> index 9e7704e44f6d..1db9bbcfb84e 100644
> --- a/arch/alpha/kernel/syscalls/syscall.tbl
> +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> @@ -473,3 +473,4 @@
> 541 common fsconfig sys_fsconfig
> 542 common fsmount sys_fsmount
> 543 common fspick sys_fspick
> +544 common pidfd_open sys_pidfd_open
> diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> index aaf479a9e92d..81e6e1817c45 100644
> --- a/arch/arm/tools/syscall.tbl
> +++ b/arch/arm/tools/syscall.tbl
> @@ -447,3 +447,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open
> diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
> index 70e6882853c0..e8f7d95a1481 100644
> --- a/arch/arm64/include/asm/unistd.h
> +++ b/arch/arm64/include/asm/unistd.h
> @@ -44,7 +44,7 @@
> #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
> #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
>
> -#define __NR_compat_syscalls 434
> +#define __NR_compat_syscalls 435
> #endif
>
> #define __ARCH_WANT_SYS_CLONE
> diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
> index c39e90600bb3..7a3158ccd68e 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
> __SYSCALL(__NR_fsmount, sys_fsmount)
> #define __NR_fspick 433
> __SYSCALL(__NR_fspick, sys_fspick)
> +#define __NR_pidfd_open 434
> +__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
>
> /*
> * Please add new compat syscalls above this comment and update
> diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
> index e01df3f2f80d..ecc44926737b 100644
> --- a/arch/ia64/kernel/syscalls/syscall.tbl
> +++ b/arch/ia64/kernel/syscalls/syscall.tbl
> @@ -354,3 +354,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open
> diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
> index 7e3d0734b2f3..9a3eb2558568 100644
> --- a/arch/m68k/kernel/syscalls/syscall.tbl
> +++ b/arch/m68k/kernel/syscalls/syscall.tbl
> @@ -433,3 +433,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open
> diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
> index 26339e417695..ad706f83c755 100644
> --- a/arch/microblaze/kernel/syscalls/syscall.tbl
> +++ b/arch/microblaze/kernel/syscalls/syscall.tbl
> @@ -439,3 +439,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open
> diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
> index 0e2dd68ade57..97035e19ad03 100644
> --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> @@ -372,3 +372,4 @@
> 431 n32 fsconfig sys_fsconfig
> 432 n32 fsmount sys_fsmount
> 433 n32 fspick sys_fspick
> +434 n32 pidfd_open sys_pidfd_open
> diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
> index 5eebfa0d155c..d7292722d3b0 100644
> --- a/arch/mips/kernel/syscalls/syscall_n64.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
> @@ -348,3 +348,4 @@
> 431 n64 fsconfig sys_fsconfig
> 432 n64 fsmount sys_fsmount
> 433 n64 fspick sys_fspick
> +434 n64 pidfd_open sys_pidfd_open
> diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
> index 3cc1374e02d0..dba084c92f14 100644
> --- a/arch/mips/kernel/syscalls/syscall_o32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
> @@ -421,3 +421,4 @@
> 431 o32 fsconfig sys_fsconfig
> 432 o32 fsmount sys_fsmount
> 433 o32 fspick sys_fspick
> +434 o32 pidfd_open sys_pidfd_open
> diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
> index c9e377d59232..5022b9e179c2 100644
> --- a/arch/parisc/kernel/syscalls/syscall.tbl
> +++ b/arch/parisc/kernel/syscalls/syscall.tbl
> @@ -430,3 +430,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> index 103655d84b4b..f2c3bda2d39f 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -515,3 +515,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open
> diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
> index e822b2964a83..6ebacfeaf853 100644
> --- a/arch/s390/kernel/syscalls/syscall.tbl
> +++ b/arch/s390/kernel/syscalls/syscall.tbl
> @@ -436,3 +436,4 @@
> 431 common fsconfig sys_fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount sys_fsmount
> 433 common fspick sys_fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open sys_pidfd_open
> diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
> index 016a727d4357..834c9c7d79fa 100644
> --- a/arch/sh/kernel/syscalls/syscall.tbl
> +++ b/arch/sh/kernel/syscalls/syscall.tbl
> @@ -436,3 +436,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open
> diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
> index e047480b1605..c58e71f21129 100644
> --- a/arch/sparc/kernel/syscalls/syscall.tbl
> +++ b/arch/sparc/kernel/syscalls/syscall.tbl
> @@ -479,3 +479,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index ad968b7bac72..43e4429a5272 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -438,3 +438,4 @@
> 431 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig
> 432 i386 fsmount sys_fsmount __ia32_sys_fsmount
> 433 i386 fspick sys_fspick __ia32_sys_fspick
> +434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index b4e6f9e6204a..1bee0a77fdd3 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -355,6 +355,7 @@
> 431 common fsconfig __x64_sys_fsconfig
> 432 common fsmount __x64_sys_fsmount
> 433 common fspick __x64_sys_fspick
> +434 common pidfd_open __x64_sys_pidfd_open
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
> index 5fa0ee1c8e00..782b81945ccc 100644
> --- a/arch/xtensa/kernel/syscalls/syscall.tbl
> +++ b/arch/xtensa/kernel/syscalls/syscall.tbl
> @@ -404,3 +404,4 @@
> 431 common fsconfig sys_fsconfig
> 432 common fsmount sys_fsmount
> 433 common fspick sys_fspick
> +434 common pidfd_open sys_pidfd_open
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index 3c8ef5a199ca..c938a92eab99 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -67,6 +67,7 @@ struct pid
> extern struct pid init_struct_pid;
>
> extern const struct file_operations pidfd_fops;
> +extern int pidfd_create(struct pid *pid);
>
> static inline struct pid *get_pid(struct pid *pid)
> {
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index e2870fe1be5b..989055e0b501 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -929,6 +929,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t which_clock,
> struct old_timex32 __user *tx);
> asmlinkage long sys_syncfs(int fd);
> asmlinkage long sys_setns(int fd, int nstype);
> +asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags);
> asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg,
> unsigned int vlen, unsigned flags);
> asmlinkage long sys_process_vm_readv(pid_t pid,
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index a87904daf103..e5684a4512c0 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
> __SYSCALL(__NR_fsmount, sys_fsmount)
> #define __NR_fspick 433
> __SYSCALL(__NR_fspick, sys_fspick)
> +#define __NR_pidfd_open 434
> +__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 434
> +#define __NR_syscalls 435
>
> /*
> * 32 bit systems traditionally used different
> diff --git a/kernel/fork.c b/kernel/fork.c
> index b4cba953040a..c3df226f47a1 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1724,7 +1724,7 @@ const struct file_operations pidfd_fops = {
> * Return: On success, a cloexec pidfd is returned.
> * On error, a negative errno number will be returned.
> */
> -static int pidfd_create(struct pid *pid)
> +int pidfd_create(struct pid *pid)
> {
> int fd;
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 89548d35eefb..8fc9d94f6ac1 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -37,6 +37,7 @@
> #include <linux/syscalls.h>
> #include <linux/proc_ns.h>
> #include <linux/proc_fs.h>
> +#include <linux/sched/signal.h>
> #include <linux/sched/task.h>
> #include <linux/idr.h>
>
> @@ -450,6 +451,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
> return idr_get_next(&ns->idr, &nr);
> }
>
> +/**
> + * pidfd_open() - Open new pid file descriptor.
> + *
> + * @pid: pid for which to retrieve a pidfd
> + * @flags: flags to pass
> + *
> + * This creates a new pid file descriptor with the O_CLOEXEC flag set for
> + * the process identified by @pid. Currently, the process identified by
> + * @pid must be a thread-group leader. This restriction currently exists
> + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
> + * be used with CLONE_THREAD) and pidfd polling (only supports thread group
> + * leaders).
> + *
> + * Return: On success, a cloexec pidfd is returned.
> + * On error, a negative errno number will be returned.
> + */
> +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> +{
> + int fd, ret;
> + struct pid *p;
> +
> + if (flags)
> + return -EINVAL;
> +
> + if (pid <= 0)
> + return -EINVAL;
> +
> + p = find_get_pid(pid);
> + if (!p)
> + return -ESRCH;
> +
> + ret = 0;
> + rcu_read_lock();
> + if (!pid_task(p, PIDTYPE_TGID))
> + ret = -EINVAL;
> + rcu_read_unlock();
> +
> + fd = ret ?: pidfd_create(p);
> + put_pid(p);
> + return fd;
> +}
> +
> void __init pid_idr_init(void)
> {
> /* Verify no one has done anything silly: */
>
More information about the Linuxppc-dev
mailing list