Odd SIGSEGV issue introduced by commit 6b31d5955cb29 ("mm, oom: fix potential data corruption when oom_reaper races with writer")

Wed Aug 22 18:19:02 AEST 2018

On 21/08/2018 at 19:50, Ram Pai wrote:
> On Tue, Aug 21, 2018 at 04:40:15PM +1000, Michael Ellerman wrote:
>> Christophe LEROY <christophe.leroy at c-s.fr> writes:
>> ...
>>>
>>> And I bisected its disappearance with commit 99cd1302327a2 ("powerpc:
>>> Deliver SEGV signal on pkey violation")
>>
>> Whoa that's weird.
>>
>>> Looking at those two commits, especially the one which makes it
>>> disappear, I'm quite sceptical. Any idea on what could be the cause and/or
>>> how to investigate further?
>>
>> Are you sure it's not some corruption that just happens to be masked by
>> that commit? I can't see anything in that commit that could explain that
>> change in behaviour.
>>
>> The only real change is if you're hitting DSISR_KEYFAULT isn't it?
> 
> Even with commit 99cd1302327a2, a SEGV signal should get generated,
> which should kill the process. Unless the process handles SEGV signals
> with SEGV_PKUERR differently.

No, the SIGSEGVs are not handled differently. And the trace shows that it is
SEGV_MAPERR that is generated.

> 
> The other surprising thing is, why is DSISR_KEYFAULT getting generated
> in the first place?  Are keys somehow getting programmed into the HPTE?

Can't be that, because DSISR_KEYFAULT is filtered out when applying 
DSISR_SRR1_MATCH_32S mask.

> 
> Feels like some random corruption.

In a way yes, except that it is always at the same instruction (in
ld.so) and always because the accessed address is 0x67xxxxxx instead of
0x77xxxxxx.
I also tested with TASK_SIZE set to 0xa0000000 instead of 0x80000000,
and I get the same failure, with the bad address being 0x87xxxxxx instead of
0x97xxxxxx.
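To make the pattern explicit, here is a small sketch comparing the address
prefixes. The low 24 bits are not given above, so zeros are used as
placeholders, and the helper names addr_diff_bits() and addr_delta() are
hypothetical. Assuming the low bits match, both cases differ from the
expected address by the same amount.

```c
#include <stdint.h>

/* Bits that differ between the expected and the faulting address. */
static uint32_t addr_diff_bits(uint32_t expected, uint32_t bad)
{
	return expected ^ bad;
}

/* Arithmetic distance from the faulting address up to the expected one. */
static uint32_t addr_delta(uint32_t expected, uint32_t bad)
{
	return expected - bad;
}
```

With the placeholder values, addr_diff_bits(0x77000000, 0x67000000) and
addr_diff_bits(0x97000000, 0x87000000) both give 0x10000000, so in both
TASK_SIZE configurations the bad address is 0x10000000 below the expected one.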

Christophe

> 
> Is this behavior seen with power8 or power9?
> 
> RP
>