Buggy commit tracked to: "Re: [PATCH 2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c"

Thu Oct 22 20:25:25 AEDT 2020

On 22.10.20 11:19, David Hildenbrand wrote:
> On 22.10.20 11:01, Greg KH wrote:
>> On Thu, Oct 22, 2020 at 10:48:59AM +0200, David Hildenbrand wrote:
>>> On 22.10.20 10:40, David Laight wrote:
>>>> From: David Hildenbrand
>>>>> Sent: 22 October 2020 09:35
>>>>>
>>>>> On 22.10.20 10:26, Greg KH wrote:
>>>>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
>>>>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
>>>>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
>>>>>>>>> From: David Laight <David.Laight at ACULAB.COM>
>>>>>>>>>
>>>>>>>>> This lets the compiler inline it into import_iovec() generating
>>>>>>>>> much better code.
>>>>>>>>>
>>>>>>>>> Signed-off-by: David Laight <david.laight at aculab.com>
>>>>>>>>> Signed-off-by: Christoph Hellwig <hch at lst.de>
>>>>>>>>> ---
>>>>>>>>>  fs/read_write.c | 179 ------------------------------------------------
>>>>>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)
>>>>>>>>
>>>>>>>> Strangely, this commit causes a regression in Linus's tree right now.
>>>>>>>>
>>>>>>>> I can't really figure out what the regression is, only that this commit
>>>>>>>> triggers a "large Android system binary" from working properly.  There's
>>>>>>>> no kernel log messages anywhere, and I don't have any way to strace the
>>>>>>>> thing in the testing framework, so any hints that people can provide
>>>>>>>> would be most appreciated.
>>>>>>>
>>>>>>> It's a pure move - modulo changed line breaks in the argument lists
>>>>>>> the functions involved are identical before and after that (just checked
>>>>>>> that directly, by checking out the trees before and after, extracting two
>>>>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and
>>>>>>> after, resp.) and checking the diff between those.
>>>>>>>
>>>>>>> How certain is your bisection?
>>>>>>
>>>>>> The bisection is very reproducable.
>>>>>>
>>>>>> But, this looks now to be a compiler bug.  I'm using the latest version
>>>>>> of clang and if I put "noinline" at the front of the function,
>>>>>> everything works.
>>>>>
>>>>> Well, the compiler can do more invasive optimizations when inlining. If
>>>>> you have buggy code that relies on some unspecified behavior, inlining
>>>>> can change the behavior ... but going over that code, there isn't too
>>>>> much action going on. At least nothing screamed at me.
>>>>
>>>> Apart from all the optimisations that get rid off the 'pass be reference'
>>>> parameters and strange conditional tests.
>>>> Plenty of scope for the compiler getting it wrong.
>>>> But nothing even vaguely illegal.
>>>
>>> Not the first time that people blame the compiler to then figure out
>>> that something else is wrong ... but maybe this time is different :)
>>
>> I agree, I hate to blame the compiler, that's almost never the real
>> problem, but this one sure "feels" like it.
>>
>> I'm running some more tests, trying to narrow things down as just adding
>> a "noinline" to the function that got moved here doesn't work on Linus's
>> tree at the moment because the function was split into multiple
>> functions.
>>
>> Give me a few hours...
> 
> I might be wrong but
> 
> a) import_iovec() uses:
> - unsigned nr_segs -> int
> - unsigned fast_segs -> int
> b) rw_copy_check_uvector() uses:
> - unsigned long nr_segs -> long
> - unsigned long fast_seg -> long
> 
> So when calling rw_copy_check_uvector(), we have to zero-extend the
> registers used for passing the arguments. That's definitely done when
> calling the function explicitly. Maybe when inlining something is messed up?
> 
> Just a thought ...
> 

... especially because I recall that clang and gcc behave slightly
differently:

https://github.com/hjl-tools/x86-psABI/issues/2

"Function args are different: narrow types are sign or zero extended to
32 bits, depending on their type. clang depends on this for incoming
args, but gcc doesn't make that assumption. But both compilers do it
when calling, so gcc code can call clang code.

The upper 32 bits of registers are always undefined garbage for types
smaller than 64 bits."

Again, just a thought.

-- 
Thanks,

David / dhildenb