[PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

Michael Roth mdroth at linux.vnet.ibm.com
Sat Dec 8 02:36:41 AEDT 2018


Quoting Michael Ellerman (2018-12-07 06:31:13)
> Michael Roth <mdroth at linux.vnet.ibm.com> writes:
> 
> > Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
> > JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
> > and is adjusted again at init time if MODULES_VADDR is defined.
> >
> > For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
> 
> But maybe it should be, I don't know why we don't define it.
> 
> > the compile-time default at boot-time, which is 0x9c400000 when
> > using 64K page size. This overflows the signed 32-bit bpf_jit_limit
> > value:
> >
> >   root at ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
> >   -1673527296
> >
> > and can cause various unexpected failures throughout the network
> > stack. In one case `strace dhclient eth0` reported:
> >
> >   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524)
> >
> > and similar failures can be seen with tools like tcpdump. This doesn't
> > always reproduce however, and I'm not sure why. The more consistent
> > failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
> > would time out on systemd/netplan configuring a virtio-net NIC with no
> > noticeable errors in the logs.
> >
> > Fix this by limiting the compile-time default for bpf_jit_limit to
> > INT_MAX.
> 
> INT_MAX is a lot more than (4k * 40000), so I guess I'm not clear on
> whether we should be using PAGE_SIZE here at all. I guess each BPF
> program uses at least one page is the thinking?

That seems to be the case, at least, the max number of minimum-sized
allocations would be less on ppc64 since the allocations are always at
least PAGE_SIZE in size. The init-time default also limits to INT_MAX,
so it seemed consistent to do that here too.

> 
> Thanks for tracking this down. For some reason none of my ~10 test boxes
> have hit this, perhaps I don't have new enough userspace?

I'm not too sure, I would've thought things like the dhclient case in
the commit log would fail every time, but sometimes I need to reboot the
guest before I start seeing the behavior. Maybe there's something special
about when JIT allocations are actually done that can affect
reproducibility?

In my case at least the virtio-net networking timeout was consistent
enough for a bisect, but maybe it depends on the specific network
configuration (single NIC, basic DHCP through netplan/systemd in my case).

> 
> You don't mention why you needed to add BPF_MIN(), I assume because the
> kernel version of min() has gotten too complicated to work here?

I wasn't sure if it was safe here or not, so I tried looking at other
users and came across:

mm/vmalloc.c:777:#define VMAP_MIN(x, y)		((x) < (y) ? (x) : (y)) /* can't use min() */

I'm not sure what the reasoning was (or whether it still applies), but I
figured it was safer to do the same here. Maybe Nick still recalls?

> 
> Daniel I assume you'll merge this via your tree?
> 
> cheers
> 
> > Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
> > Cc: linuxppc-dev at ozlabs.org
> > Cc: Daniel Borkmann <daniel at iogearbox.net>
> > Cc: Sandipan Das <sandipan at linux.ibm.com>
> > Cc: Alexei Starovoitov <ast at kernel.org>
> > Signed-off-by: Michael Roth <mdroth at linux.vnet.ibm.com>
> > ---
> >  kernel/bpf/core.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index b1a3545d0ec8..55de4746cdfd 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
> >  }
> >  
> >  #ifdef CONFIG_BPF_JIT
> > -# define BPF_JIT_LIMIT_DEFAULT       (PAGE_SIZE * 40000)
> > +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
> > +# define BPF_JIT_LIMIT_DEFAULT       BPF_MIN((PAGE_SIZE * 40000), INT_MAX)
> >  
> >  /* All BPF JIT sysctl knobs here. */
> >  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
> > -- 
> > 2.17.1
> 


More information about the Linuxppc-dev mailing list