Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"

Arnd Bergmann arnd at kernel.org
Thu Oct 22 20:16:54 AEDT 2020


On Thu, Oct 22, 2020 at 10:26 AM Greg KH <gregkh at linuxfoundation.org> wrote:
> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
> > On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
> > > On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> > >
> > > I can't really figure out what the regression is, only that this commit
> > > triggers a "large Android system binary" from working properly.  There's
> > > no kernel log messages anywhere, and I don't have any way to strace the
> > > thing in the testing framework, so any hints that people can provide
> > > would be most appreciated.
> >
> > It's a pure move - modulo changed line breaks in the argument lists
> > the functions involved are identical before and after that (just checked
> > that directly, by checking out the trees before and after, extracting two
> > functions in question from fs/read_write.c and lib/iov_iter.c (before and
> > after, resp.) and checking the diff between those.
> >
> > How certain is your bisection?
>
> The bisection is very reproducable.
>
> But, this looks now to be a compiler bug.  I'm using the latest version
> of clang and if I put "noinline" at the front of the function,
> everything works.
>
> Nick, any ideas here as to who I should report this to?
>
> I'll work on a fixup patch for the Android kernel tree to see if I can
> work around it there, but others will hit this in Linus's tree sooner or
> later...

I see that Christoph rewrote the function again in bfdc59701d6d
("iov_iter: refactor rw_copy_check_uvector and import_iovec"),
do you know if the current mainline version is also affected?

Do you know if it happens across multiple architectures or might
be specific to either x86 or arm64?

https://bugs.llvm.org/ is obviously the place for reporting the
issue if it does turn out to be a bug in clang, but that requires
a specific thing going wrong in the output.

One idea I have for debugging it is to sprinkle the inlined
function with lots of barrier()s to prevent a lot of the optimizations.
If that solves the issue, you can bisect through those until you
find one barrier that makes the difference between working and
broken and then look at diff of the assembler output.

        Arnd


More information about the Linuxppc-dev mailing list