BUG: Kernel NULL pointer dereference on write at 0x00000000 (rtmsg_ifinfo_build_skb)

Zhouyi Zhou zhouzhouyi at gmail.com
Wed Feb 9 07:10:26 AEDT 2022


Hi Paul

Below are my preliminary test results, obtained on a PPC VM provided by
the Open Source Lab of Oregon State University. Thank you for your
support!

[Preliminary test results on ppc64le virtual guest]

1. Conclusion
A kernel configuration option other than the RCU ones may lead to the
"BUG: Kernel NULL pointer dereference" at boot: in the tests below, the
BUG appears only when CONFIG_DRM_BOCHS=m is set.
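
For reference, here is a minimal userspace sketch of the size computation
that the quoted thread below suspects, namely the strlen(ops->kind) call in
rtnl_link_get_size(). It is only a model under stated assumptions: the
model_* names, the simplified structs and the bad_kind out-parameter are
hypothetical, and the check for a NULL kind is added purely for
illustration. It is not in the kernel function and is not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Netlink attribute alignment, mirroring the kernel's 4-byte NLA_ALIGNTO. */
#define MODEL_NLA_ALIGNTO 4u
#define MODEL_NLA_ALIGN(len) \
	(((len) + MODEL_NLA_ALIGNTO - 1) & ~(size_t)(MODEL_NLA_ALIGNTO - 1))
/* Aligned size of the 4-byte attribute header (u16 length + u16 type). */
#define MODEL_NLA_HDRLEN MODEL_NLA_ALIGN(4u)

/* Hypothetical, simplified stand-ins for rtnl_link_ops and net_device. */
struct model_link_ops {
	const char *kind;	/* e.g. "sit" */
};

struct model_net_device {
	const struct model_link_ops *rtnl_link_ops;
};

/* Same shape as nla_total_size(): header plus payload, rounded up to 4. */
static size_t model_nla_total_size(size_t payload)
{
	return MODEL_NLA_ALIGN(MODEL_NLA_HDRLEN + payload);
}

/*
 * Models the first two terms of rtnl_link_get_size().
 * Returns 0 when ops is NULL, as the kernel code does. When ops is set
 * but ops->kind is NULL, it sets *bad_kind and returns 0 instead of
 * crashing inside strlen(); that guard exists only in this sketch.
 */
static size_t model_link_get_size(const struct model_net_device *dev,
				  int *bad_kind)
{
	const struct model_link_ops *ops;

	assert(dev != NULL && bad_kind != NULL);
	ops = dev->rtnl_link_ops;
	*bad_kind = 0;

	if (!ops)
		return 0;
	if (!ops->kind) {	/* illustrative guard, not in the kernel */
		*bad_kind = 1;
		return 0;
	}
	return model_nla_total_size(MODEL_NLA_HDRLEN) +		/* IFLA_LINKINFO */
	       model_nla_total_size(strlen(ops->kind) + 1);	/* IFLA_INFO_KIND */
}
```

With kind set to "sit" the model returns 16 bytes: 8 for the nested
IFLA_LINKINFO header and 8 for the 4-byte string payload plus its header.
Without the illustrative guard, a NULL kind would fault inside strlen().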


2. Test Environment
2.1 host hardware
8-core ppc64le virtual guest with 16G RAM and a 160G disk
cpu        : POWER9 (architected), altivec supported
clock        : 2200.000000MHz
revision    : 2.2 (pvr 004e 1202)

2.2 host software
Operating System: Ubuntu 20.04.3 LTS, Compiler: gcc version 9.3.0


3. Test Procedure
3.1 kernel source
next-20220203

3.2 build and boot the kernel with CONFIG_DRM_BOCHS=m and
CONFIG_RCU_TORTURE_TEST=y
test result: "BUG: Kernel NULL pointer dereference" at boot
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.bochs.torture
boot msg: http://154.223.142.244/Feb2022/dmesg.torture.bochs

3.3 build and boot the kernel with CONFIG_DRM_BOCHS=m
test result: "BUG: Kernel NULL pointer dereference" at boot
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.bochs
boot msg: http://154.223.142.244/Feb2022/dmesg.bochs

3.4 build and boot the kernel with CONFIG_RCU_TORTURE_TEST=y (without
CONFIG_DRM_BOCHS)
test result: boot without error
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next.torture
boot msg: http://154.223.142.244/Feb2022/dmesg.torture

3.5 build and boot the kernel with CONFIG_RCU_TORTURE_TEST=m (without
CONFIG_DRM_BOCHS)
test result: boot without error
config file: http://154.223.142.244/Feb2022/config-5.17.0-rc2-next
boot msg: http://154.223.142.244/Feb2022/dmesg

4. Acknowledgement
Thanks to the Open Source Lab of Oregon State University, Paul Menzel, and
all the other community members who have supported my small research effort.

Thanks
Zhouyi

On Wed, Feb 2, 2022 at 10:39 AM Zhouyi Zhou <zhouzhouyi at gmail.com> wrote:
>
> Thank you, Paul, for your encouragement!
>
> On Wed, Feb 2, 2022 at 1:50 AM Paul E. McKenney <paulmck at kernel.org> wrote:
> >
> > On Mon, Jan 31, 2022 at 09:08:40AM +0800, Zhouyi Zhou wrote:
> > > Thank you, Paul, for joining us!
> > >
> > > On Mon, Jan 31, 2022 at 1:44 AM Paul E. McKenney <paulmck at kernel.org> wrote:
> > > >
> > > > On Sun, Jan 30, 2022 at 09:24:44PM +0800, Zhouyi Zhou wrote:
> > > > > Dear Paul
> > > > >
> > > > > On Sun, Jan 30, 2022 at 4:19 PM Paul Menzel <pmenzel at molgen.mpg.de> wrote:
> > > > > >
> > > > > > Dear Zhouyi,
> > > > > >
> > > > > >
> > > > > > Am 30.01.22 um 01:21 schrieb Zhouyi Zhou:
> > > > > >
> > > > > > > Thank you for your instructions, I learned a lot from this process.
> > > > > >
> > > > > > Same on my end.
> > > > > >
> > > > > > > On Sun, Jan 30, 2022 at 12:52 AM Paul Menzel <pmenzel at molgen.mpg.de> wrote:
> > > > > >
> > > > > > >> Am 29.01.22 um 03:23 schrieb Zhouyi Zhou:
> > > > > > >>
> > > > > > > >>> I don't have an IBM machine, so I tried to analyze the problem on
> > > > > > > >>> my x86_64 KVM virtual machine, but I can't reproduce the bug
> > > > > > > >>> there.
> > > > > > >>
> > > > > > >> No idea, if it’s architecture specific.
> > > > > > >>
> > > > > > > >>> I saw that the panic is caused by the registration of the sit device (a sit
> > > > > > > >>> device is a type of virtual network device that takes IPv6 traffic,
> > > > > > > >>> encapsulates/decapsulates it in IPv4 packets, and sends/receives it
> > > > > > > >>> over the IPv4 Internet to another host).
> > > > > > >>>
> > > > > > >>> sit device is registered in function sit_init_net:
> > > > > > >>> 1895    static int __net_init sit_init_net(struct net *net)
> > > > > > >>> 1896    {
> > > > > > >>> 1897        struct sit_net *sitn = net_generic(net, sit_net_id);
> > > > > > >>> 1898        struct ip_tunnel *t;
> > > > > > >>> 1899        int err;
> > > > > > >>> 1900
> > > > > > >>> 1901        sitn->tunnels[0] = sitn->tunnels_wc;
> > > > > > >>> 1902        sitn->tunnels[1] = sitn->tunnels_l;
> > > > > > >>> 1903        sitn->tunnels[2] = sitn->tunnels_r;
> > > > > > >>> 1904        sitn->tunnels[3] = sitn->tunnels_r_l;
> > > > > > >>> 1905
> > > > > > >>> 1906        if (!net_has_fallback_tunnels(net))
> > > > > > >>> 1907            return 0;
> > > > > > >>> 1908
> > > > > > >>> 1909        sitn->fb_tunnel_dev = alloc_netdev(sizeof(struct ip_tunnel), "sit0",
> > > > > > >>> 1910                           NET_NAME_UNKNOWN,
> > > > > > >>> 1911                           ipip6_tunnel_setup);
> > > > > > >>> 1912        if (!sitn->fb_tunnel_dev) {
> > > > > > >>> 1913            err = -ENOMEM;
> > > > > > >>> 1914            goto err_alloc_dev;
> > > > > > >>> 1915        }
> > > > > > >>> 1916        dev_net_set(sitn->fb_tunnel_dev, net);
> > > > > > >>> 1917        sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
> > > > > > >>> 1918        /* FB netdevice is special: we have one, and only one per netns.
> > > > > > >>> 1919         * Allowing to move it to another netns is clearly unsafe.
> > > > > > >>> 1920         */
> > > > > > >>> 1921        sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > > > >>> 1922
> > > > > > >>> 1923        err = register_netdev(sitn->fb_tunnel_dev);
> > > > > > >>> register_netdev on line 1923 will call if_nlmsg_size indirectly.
> > > > > > >>>
> > > > > > > >>> On the other hand, the function that calls the strlen that panicked is if_nlmsg_size:
> > > > > > >>> (gdb) disassemble if_nlmsg_size
> > > > > > >>> Dump of assembler code for function if_nlmsg_size:
> > > > > > >>>      0xffffffff81a0dc20 <+0>:    nopl   0x0(%rax,%rax,1)
> > > > > > >>>      0xffffffff81a0dc25 <+5>:    push   %rbp
> > > > > > >>>      0xffffffff81a0dc26 <+6>:    push   %r15
> > > > > > >>>      0xffffffff81a0dd04 <+228>:    je     0xffffffff81a0de20 <if_nlmsg_size+512>
> > > > > > >>>      0xffffffff81a0dd0a <+234>:    mov    0x10(%rbp),%rdi
> > > > > > >>>      ...
> > > > > > >>>    => 0xffffffff81a0dd0e <+238>:    callq  0xffffffff817532d0 <strlen>
> > > > > > >>>      0xffffffff81a0dd13 <+243>:    add    $0x10,%eax
> > > > > > >>>      0xffffffff81a0dd16 <+246>:    movslq %eax,%r12
> > > > > > >>
> > > > > > >> Excuse my ignorance, would that look the same for ppc64le?
> > > > > > >> Unfortunately, I didn’t save the problematic `vmlinuz` file, but on a
> > > > > > >> current build (without rcutorture) I have the line below, where strlen
> > > > > > >> shows up.
> > > > > > >>
> > > > > > >>       (gdb) disassemble if_nlmsg_size
> > > > > > >>       […]
> > > > > > >>       0xc000000000f7f82c <+332>: bl      0xc000000000a10e30 <strlen>
> > > > > > >>       […]
> > > > > > >>
> > > > > > >>> and the C code for 0xffffffff81a0dd0e is following (line 524):
> > > > > > >>> 515    static size_t rtnl_link_get_size(const struct net_device *dev)
> > > > > > >>> 516    {
> > > > > > >>> 517        const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> > > > > > >>> 518        size_t size;
> > > > > > >>> 519
> > > > > > >>> 520        if (!ops)
> > > > > > >>> 521            return 0;
> > > > > > >>> 522
> > > > > > >>> 523        size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > > > >>> 524               nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > > > > > >>
> > > > > > > >> How do I connect the disassembly output with the corresponding line?
> > > > > > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > > > for powerpc64le in my Ubuntu 20.04 x86_64.
> > > > > > >
> > > > > > > gdb-multiarch ./vmlinux
> > > > > > > (gdb)disassemble if_nlmsg_size
> > > > > > > [...]
> > > > > > > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > > > > > > [...]
> > > > > > > (gdb) break *0xc00000000191bf40
> > > > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > > > >
> > > > > > > But in include/net/netlink.h:1112, I can't find the call to strlen
> > > > > > > > 1110 static inline int nla_total_size(int payload)
> > > > > > > > 1111 {
> > > > > > > > 1112         return NLA_ALIGN(nla_attr_size(payload));
> > > > > > > > 1113 }
> > > > > > > > I guess this may be because the compiler encoded the debug information incorrectly.
> > > > > >
> > > > > > `rtnl_link_get_size()` contains:
> > > > > >
> > > > > >              size = nla_total_size(sizeof(struct nlattr)) + /*
> > > > > > IFLA_LINKINFO */
> > > > > >                     nla_total_size(strlen(ops->kind) + 1);  /*
> > > > > > IFLA_INFO_KIND */
> > > > > >
> > > > > > Is that inlined(?) and the code at fault?
> > > > > Yes, that is inlined! because
> > > > > (gdb) disassemble if_nlmsg_size
> > > > > Dump of assembler code for function if_nlmsg_size:
> > > > > [...]
> > > > > 0xc00000000191bf38 <+104>:    beq     0xc00000000191c1f0 <if_nlmsg_size+800>
> > > > > 0xc00000000191bf3c <+108>:    ld      r3,16(r31)
> > > > > 0xc00000000191bf40 <+112>:    bl      0xc000000001c28ad0 <strlen>
> > > > > [...]
> > > > > (gdb)
> > > > > (gdb) break *0xc00000000191bf40
> > > > > Breakpoint 1 at 0xc00000000191bf40: file ./include/net/netlink.h, line 1112.
> > > > > (gdb) break *0xc00000000191bf38
> > > > > Breakpoint 2 at 0xc00000000191bf38: file net/core/rtnetlink.c, line 520.
> > > >
> > > > I suggest building your kernel with CONFIG_DEBUG_INFO=y if you are not
> > > > already doing so.  That gives gdb a lot more information about things
> > > > like inlining.
> > > I checked my .config file, and CONFIG_DEBUG_INFO=y is already set:
> > > linux-next$ grep CONFIG_DEBUG_INFO .config
> > > CONFIG_DEBUG_INFO=y
> > > I then ran "make clean" and rebuilt the kernel, but the behavior of gdb
> > > and vmlinux remains unchanged. Sorry about that.
> >
> > Glad you were already on top of this one!
> I am very pleased to contribute my tiny effort to the process of
> making Linux better ;-)
> >
> > > I am trying to reproduce the bug on my bare metal x86_64 machines in
> > > the coming days, and am also trying to work with Mr Menzel after he
> > > comes back to the office.
> >
> > This URL used to allow community members such as yourself to request
> > access to Power systems: https://osuosl.org/services/powerdev/
> I have filled in the request form at
> https://osuosl.org/services/powerdev/ and am now waiting for them to deploy
> the environment for me.
>
> Thanks again
> Zhouyi
> >
> > In case that helps.
> >
> >                                                         Thanx, Paul
> >
> > > Thanks
> > > Zhouyi
> > > >
> > > >                                                         Thanx, Paul
> > > >
> > > > > > > >>> But ops is assigned the address of sit_link_ops in function sit_init_net
> > > > > > > >>> at line 1917, so I guess something must have happened between the calls.
> > > > > > >>>
> > > > > > > >>> Do we have KASAN on IBM machines? Would KASAN help us find out what
> > > > > > > >>> happened in between?
> > > > > > >>
> > > > > > > >> Unfortunately, KASAN is not supported on the Power system I have, as far
> > > > > > > >> as I can see. From `arch/powerpc/Kconfig`:
> > > > > > >>
> > > > > > >>           select HAVE_ARCH_KASAN                  if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > > >>           select HAVE_ARCH_KASAN_VMALLOC          if PPC32 && PPC_PAGE_SHIFT <= 14
> > > > > > >>
> > > > > > > > Yes, agreed. When I invoke "make menuconfig ARCH=powerpc
> > > > > > > > CC=powerpc64le-linux-gnu-gcc-9 CROSS_COMPILE=powerpc64le-linux-gnu- -j
> > > > > > > > 16", I can't find KASAN under Memory Debugging, so I guess we should
> > > > > > > > find the bug by bisecting instead.
> > > > > >
> > > > > > I do not know, if it is a regression, as it was the first time I tried
> > > > > > to run a Linux kernel built with rcutorture on real hardware.
> > > > > > I tried adding some debug statements to the kernel to locate the bug
> > > > > > more accurately. You can try the patch when you're not busy,
> > > > > > or just ignore it if it does not look very effective ;-)
> > > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > > index 1baab07820f6..969ac7c540cc 100644
> > > > > --- a/net/core/dev.c
> > > > > +++ b/net/core/dev.c
> > > > > @@ -9707,6 +9707,9 @@ int register_netdevice(struct net_device *dev)
> > > > >       *    Prevent userspace races by waiting until the network
> > > > >       *    device is fully setup before sending notifications.
> > > > >       */
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      if (!dev->rtnl_link_ops ||
> > > > >          dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
> > > > >          rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL);
> > > > > @@ -9788,6 +9791,9 @@ int register_netdev(struct net_device *dev)
> > > > >
> > > > >      if (rtnl_lock_killable())
> > > > >          return -EINTR;
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      err = register_netdevice(dev);
> > > > >      rtnl_unlock();
> > > > >      return err;
> > > > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > > > > index e476403231f0..e08986ae6238 100644
> > > > > --- a/net/core/rtnetlink.c
> > > > > +++ b/net/core/rtnetlink.c
> > > > > @@ -520,6 +520,8 @@ static size_t rtnl_link_get_size(const struct
> > > > > net_device *dev)
> > > > >      if (!ops)
> > > > >          return 0;
> > > > >
> > > > > +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", ops,
> > > > > +           ops->kind, __FUNCTION__);
> > > > >      size = nla_total_size(sizeof(struct nlattr)) + /* IFLA_LINKINFO */
> > > > >             nla_total_size(strlen(ops->kind) + 1);  /* IFLA_INFO_KIND */
> > > > >
> > > > > @@ -1006,6 +1008,9 @@ static size_t rtnl_proto_down_size(const struct
> > > > > net_device *dev)
> > > > >  static noinline size_t if_nlmsg_size(const struct net_device *dev,
> > > > >                       u32 ext_filter_mask)
> > > > >  {
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      return NLMSG_ALIGN(sizeof(struct ifinfomsg))
> > > > >             + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
> > > > >             + nla_total_size(IFALIASZ) /* IFLA_IFALIAS */
> > > > > @@ -3825,7 +3830,9 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type,
> > > > > struct net_device *dev,
> > > > >      struct net *net = dev_net(dev);
> > > > >      struct sk_buff *skb;
> > > > >      int err = -ENOBUFS;
> > > > > -
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      skb = nlmsg_new(if_nlmsg_size(dev, 0), flags);
> > > > >      if (skb == NULL)
> > > > >          goto errout;
> > > > > @@ -3861,7 +3868,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > > > net_device *dev,
> > > > >
> > > > >      if (dev->reg_state != NETREG_REGISTERED)
> > > > >          return;
> > > > > -
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      skb = rtmsg_ifinfo_build_skb(type, dev, change, event, flags, new_nsid,
> > > > >                       new_ifindex);
> > > > >      if (skb)
> > > > > @@ -3871,6 +3880,9 @@ static void rtmsg_ifinfo_event(int type, struct
> > > > > net_device *dev,
> > > > >  void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
> > > > >            gfp_t flags)
> > > > >  {
> > > > > +    if (dev->rtnl_link_ops)
> > > > > +        printk(KERN_INFO "%lx IFLA_INFO_KIND  %s %s\n", dev->rtnl_link_ops,
> > > > > +               dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      rtmsg_ifinfo_event(type, dev, change, rtnl_get_event(0), flags,
> > > > >                 NULL, 0);
> > > > >  }
> > > > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> > > > > index c0b138c20992..fa5b2725811c 100644
> > > > > --- a/net/ipv6/sit.c
> > > > > +++ b/net/ipv6/sit.c
> > > > > @@ -1919,6 +1919,8 @@ static int __net_init sit_init_net(struct net *net)
> > > > >       * Allowing to move it to another netns is clearly unsafe.
> > > > >       */
> > > > >      sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
> > > > > -
> > > > > +    printk(KERN_INFO "%lx IFLA_INFO_KIND %s %s\n",
> > > > > +           sitn->fb_tunnel_dev->rtnl_link_ops,
> > > > > +           sitn->fb_tunnel_dev->rtnl_link_ops->kind, __FUNCTION__);
> > > > >      err = register_netdev(sitn->fb_tunnel_dev);
> > > > >      if (err)
> > > > >          goto err_reg_dev;
> > > > > >
> > > > > > > >>> Hope I can be of more help.
> > > > > > >>
> > > > > > >> Some distributions support multi-arch, so they easily allow
> > > > > > >> crosscompiling for different architectures.
> > > > > > > I use "make  ARCH=powerpc CC=powerpc64le-linux-gnu-gcc-9
> > > > > > > CROSS_COMPILE=powerpc64le-linux-gnu- -j 16" to cross compile kernel
> > > > > > > for powerpc64le in my Ubuntu 20.04 x86_64. But I can't boot the
> > > > > > > compiled kernel using "qemu-system-ppc64le -M pseries -nographic -smp
> > > > > > > 4 -net none -m 4G -kernel arch/powerpc/boot/zImage". I will continue
> > > > > > > to explore it.
> > > > > >
> > > > > > Oh, that does not sound good. But I have not tried that in a long time
> > > > > > either. It’s a separate issue, but maybe some of the PPC
> > > > > > maintainers/folks could help.
> > > > > I will do further research on this later.
> > > > >
> > > > > Thanks for your time
> > > > > Kind regards
> > > > > Zhouyi
> > > > > >
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Paul

