Debian SID kernel doesn't boot on PowerBook 3400c

Christophe Leroy christophe.leroy at csgroup.eu
Sun Aug 8 03:08:07 AEST 2021



On 07/08/2021 at 18:26, Stan Johnson wrote:
> On 8/7/21 8:35 AM, Christophe Leroy wrote:
>>
>>
>> On 07/08/2021 at 15:09, Stan Johnson wrote:
>>> On 8/6/21 10:08 PM, Finn Thain wrote:
>>>>
>>>> On Fri, 6 Aug 2021, Stan Johnson wrote:
>>>>
>>>>> $ egrep '(CONFIG_PPC_KUAP|CONFIG_VMAP_STACK)' .config
>>>>> CONFIG_PPC_KUAP=y
>>>>> CONFIG_PPC_KUAP_DEBUG=y
>>>>> CONFIG_VMAP_STACK=y
>>>>> $ strings vmlinux | fgrep "Linux version"
>>>>> Linux version 5.13.0-pmac-00004-g63e3756d1bd ...
>>>>> $ cp vmlinux ../vmlinux-5.13.0-pmac-00004-g63e3756d1bd-1
>>>>>
>>>>> 1) PB 3400c
>>>>> vmlinux-5.13.0-pmac-00004-g63e3756d1bd-1
>>>>> Boots, no errors logging in at (text) fb console. Logging in via ssh
>>>>> and running "ls -Rail /usr/include" generated errors (and a hung ssh
>>>>> session). Once errors started, they repeated for almost every command.
>>>>> See pb3400c-63e3756d1bdf-1.txt.
>>>>>
>>>>> 2) Wallstreet
>>>>> vmlinux-5.13.0-pmac-00004-g63e3756d1bd-1
>>>>> X login failed, and there were errors ("Oops: Kernel access of bad area",
>>>>> "Oops: Exception in kernel mode"). Logging in via SSH, there were no
>>>>> additional errors after running "ls -Rail /usr/include" -- the errors
>>>>> did not escalate as they did on the PB 3400.
>>>>> See Wallstreet-63e3756d1bdf-1.txt.
>>>>>
>>>> ...
>>>>> $ egrep '(CONFIG_PPC_KUAP|CONFIG_VMAP_STACK)' .config
>>>>> CONFIG_PPC_KUAP=y
>>>>> CONFIG_PPC_KUAP_DEBUG=y
>>>>> # CONFIG_VMAP_STACK is not set
>>>>> $ strings vmlinux | fgrep "Linux version"
>>>>> Linux version 5.13.0-pmac-00004-g63e3756d1bd ...
>>>>> $ cp vmlinux ../vmlinux-5.13.0-pmac-00004-g63e3756d1bd-2
>>>>>
>>>>> 3) PB 3400c
>>>>> vmlinux-5.13.0-pmac-00004-g63e3756d1bd-2
>>>>> Filesystem was corrupt from the previous test (probably from all the
>>>>> errors during shutdown). After fixing the filesystem:
>>>>> Boots, no errors logging in at (text) fb console. Logging in via ssh
>>>>> and running "ls -Rail /usr/include" generated a few errors. There didn't
>>>>> seem to be as many errors as in the previous test; there were a few
>>>>> errors during shutdown, but the shutdown was otherwise normal.
>>>>> See pb3400c-63e3756d1bdf-2.txt.
>>>>>
>>>>> 4) Wallstreet
>>>>> vmlinux-5.13.0-pmac-00004-g63e3756d1bd-2
>>>>> X login worked, and there were no errors. There were no errors during
>>>>> ssh access.
>>>>> See Wallstreet-63e3756d1bdf-2.txt.
>>>>>
>>>>
>>>> Thanks for collecting these results, Stan. Do you think that the
>>>> successful result from test 4) could have been just chance?
>>>
>>> No. I repeated Test 4 above two more times on the Wallstreet. After
>>> stomping on it as hard as I could, I didn't see any errors. I ran the
>>> following tests simultaneously, with no errors:
>>>
>>> a) Ping flood the Wallstreet
>>> 862132 packets transmitted, 862117 packets received, 0.0% packet loss
>>> round-trip min/avg/max/stddev = 0.316/0.418/12.163/0.143 ms
>>>
>>> b) "ls -Rail /usr" in an ssh window.
>>>
>>> c) "find /usr/include -type f -exec sha1sum {} \;" in a second ssh
>>> window.
>>>
>>> d) With a, b and c running, I logged in at the X console (slow but it
>>> worked). Load average was 7.0 as reported by uptime.
>>>
>>> So the success seems to be repeatable (or at least the errors are so
>>> unlikely to happen that I'm not seeing anything).
>>>
>>>>
>>>> It appears that the bug affecting the Powerbook 3400 is unaffected by
>>>> CONFIG_VMAP_STACK.
>>>>
>>>> Whereas the bug affecting the Powerbook G3 disappears when
>>>> CONFIG_VMAP_STACK is disabled (assuming the result from 4 is reliable).
>>>>
>>>> Either way, these results reiterate that "Oops: Kernel access of bad
>>>> area, sig: 11" was not entirely resolved by "powerpc/32s: Fix napping
>>>> restore in data storage interrupt (DSI)".
>>>>
>>>
>>> That sounds right. Thanks for investigating this.
>>>
>>
>>
>> Thanks a lot for your patience and for the tests.
>>
>> I'm still having a hard time understanding what the problem is.
>>
>> Could you try the new change I pushed into the git repo? It shouldn't
>> have any effect, but I prefer to eliminate all possibilities. The
>> documentation says that the upper bits of SRR1 are 0 on DSI, and the
>> code relies on that. But if the doc is wrong, that could explain the
>> problem. So now I'm forcing them to 0 regardless.
>>
>> To get the change, you just have to do 'git pull -r' inside the
>> directory where you checked out the sources and build.
>>
>> Thanks again
>> Christophe
>>
> 
> Thanks, Christophe.
> 
> In the same directory as previous builds:
> 
> $ git checkout chleroy-linux/bugtest
> HEAD is now at 63e3756d1bdf powerpc/interrupts: Also perform KUAP/KUEP
> lock and usertime accounting on NMI
> $ git pull -r
> You are not currently on a branch.
> Please specify which branch you want to rebase against.
> ...
> $ git pull -r chleroy-linux
> remote: Enumerating objects: 6, done.
> remote: Counting objects: 100% (6/6), done.
> remote: Compressing objects: 100% (6/6), done.
> remote: Total 6 (delta 0), reused 6 (delta 0), pack-reused 0
> Unpacking objects: 100% (6/6), done.
>  From https://github.com/chleroy/linux
>     63e3756d1bdf..9023760b1361  bugtest    -> chleroy-linux/bugtest
> Updating 63e3756d1bdf..9023760b1361
> Fast-forward
>   arch/powerpc/kernel/head_book3s_32.S | 1 +
>   1 file changed, 1 insertion(+)
> HEAD is up to date.
> 
> Hopefully I did that right and ended up at the right spot.
> 
> For tests 5 and 6:
> 
> $ cp ../dot-config-powermac-5.13 .config
> $ scripts/config -e CONFIG_PPC_KUAP -e CONFIG_PPC_KUAP_DEBUG \
>     -e CONFIG_VMAP_STACK
> $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean \
>     olddefconfig vmlinux
> $ egrep '(CONFIG_PPC_KUAP|CONFIG_VMAP_STACK)' .config
> CONFIG_PPC_KUAP=y
> CONFIG_PPC_KUAP_DEBUG=y
> CONFIG_VMAP_STACK=y
> $ strings vmlinux | grep "Linux version"
> Linux version 5.13.0-pmac-00005-g9023760b136 (johnson at ThinkPad)
> (powerpc-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0, GNU ld (GNU Binutils for
> Debian) 2.31.1) #3 SMP Sat Aug 7 09:29:11 MDT 2021
> $ cp vmlinux ../vmlinux-5.13.0-pmac-00005-g9023760b136-1
> 
> 
> 5) PB 3400c
> vmlinux-5.13.0-pmac-00005-g9023760b136-1
> Boots, no errors logging in at (text) fb console. Logging in via ssh and
> running "ls -Rail /usr/include" generated errors. As before, once errors
> started, they seemed to escalate, including errors during "shutdown -r now".
> See pb3400c-g9023760b136-1.txt.
> 
> 6) Wallstreet
> vmlinux-5.13.0-pmac-00005-g9023760b136-1
> X login failed, and there were errors. Logging in via SSH, there were no
> additional errors after running "ls -Rail /usr/include" -- as before,
> the errors did not escalate as they did on the PB 3400.
> See Wallstreet-g9023760b136-1.txt.
> 
> For tests 7 and 8:
> 
> $ cp ../dot-config-powermac-5.13 .config
> $ scripts/config -e CONFIG_PPC_KUAP -e CONFIG_PPC_KUAP_DEBUG \
>     -d CONFIG_VMAP_STACK
> $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean \
>     olddefconfig vmlinux
> $ egrep '(CONFIG_PPC_KUAP|CONFIG_VMAP_STACK)' .config
> CONFIG_PPC_KUAP=y
> CONFIG_PPC_KUAP_DEBUG=y
> # CONFIG_VMAP_STACK is not set
> $ strings vmlinux | grep "Linux version"
> Linux version 5.13.0-pmac-00005-g9023760b136 (johnson at ThinkPad)
> (powerpc-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0, GNU ld (GNU Binutils for
> Debian) 2.31.1) #4 SMP Sat Aug 7 09:49:03 MDT 2021
> $ cp vmlinux ../vmlinux-5.13.0-pmac-00005-g9023760b136-2
> 
> 
> 7) PB 3400c
> vmlinux-5.13.0-pmac-00005-g9023760b136-2
> As before, the filesystem was corrupt from the previous test. After
> fixing that, this kernel boots, and there were no errors from logging in
> at the (text) fb console. Logging in via ssh and running "ls -Rail
> /usr/include" generated errors. There were a few errors logging in at
> the serial console and during shutdown, but the shutdown was otherwise
> normal.
> See pb3400c-g9023760b136-2.txt.
> 
> 8) Wallstreet
> vmlinux-5.13.0-pmac-00005-g9023760b136-2
> X login worked, and there were no errors. There were also no errors
> during ssh access.
> Simultaneous stress test, also no errors:
> a) Login at X console.
> b) Ping flood the Wallstreet
> 359695 packets transmitted, 359688 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 0.322/0.428/16.857/0.165 ms
> c) "ls -Rail /usr" in an ssh window.
> d) "find /usr/include -type f -exec sha1sum {} \;" in a second ssh window.
> See Wallstreet-g9023760b136-2.txt.
> 
> As far as I could tell, there were no significant changes from the
> previous four tests.
> 
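The egrep commands used above confirm each build's configuration by eye. As a sketch, assuming a POSIX shell and grep, the same check can be made pass/fail; `check_opt` is a hypothetical helper, not a script shipped in the kernel tree. Because it matches whole lines (`grep -x`), CONFIG_PPC_KUAP cannot accidentally match CONFIG_PPC_KUAP_DEBUG.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel's scripts/): check_opt FILE OPTION STATE
#   STATE "y" expects the exact line "OPTION=y"
#   STATE "n" expects the exact line "# OPTION is not set",
#             which is how .config records a disabled bool option.
# Returns grep's status (0 = match, 1 = no match), or 2 for an unknown STATE.
check_opt() {
    case $3 in
        y) grep -qx "$2=y" "$1" ;;
        n) grep -qx "# $2 is not set" "$1" ;;
        *) echo "check_opt: unknown state '$3'" >&2; return 2 ;;
    esac
}

# Example, for the configuration of tests 7 and 8:
#   check_opt .config CONFIG_PPC_KUAP y &&
#   check_opt .config CONFIG_PPC_KUAP_DEBUG y &&
#   check_opt .config CONFIG_VMAP_STACK n && echo "config as expected"
```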

OK, that was expected, but I wanted to be 100% sure, to avoid looking in the wrong direction.

To be honest, I'm running out of ideas.

As far as I understand, we have two remaining independent problems:

PB3400C (603ev core, no hash table)
- A KUAP fault, regardless of CONFIG_VMAP_STACK, apparently due to r11 being clobbered.

Wallstreet (hash table)
- Random faults, only with CONFIG_VMAP_STACK


One thing I am wondering: could there be a link with SMP?

Would you mind trying a kernel built without CONFIG_SMP?
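For reference, such a build could follow the same recipe as tests 5 to 8 with one extra scripts/config switch. This is a sketch that assumes the same source directory, base config (../dot-config-powermac-5.13) and cross toolchain as above, shown with CONFIG_VMAP_STACK enabled (the variant that failed on both machines; use -d instead of -e for the other one). The output file name is hypothetical. Since olddefconfig can adjust dependent options, the grep confirms the final state.

```shell
# Same base config and toolchain as tests 5-8; only CONFIG_SMP is changed.
cp ../dot-config-powermac-5.13 .config
scripts/config -e CONFIG_PPC_KUAP -e CONFIG_PPC_KUAP_DEBUG \
    -e CONFIG_VMAP_STACK -d CONFIG_SMP
make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean \
    olddefconfig vmlinux
# Show the resulting state of the relevant options (set or "is not set").
grep -E '^(# )?CONFIG_(PPC_KUAP(_DEBUG)?|VMAP_STACK|SMP)[ =]' .config
strings vmlinux | grep "Linux version"
cp vmlinux ../vmlinux-nosmp-1   # hypothetical file name
```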

Thanks
Christophe


More information about the Linuxppc-dev mailing list