[Bug 213837] "Kernel panic - not syncing: corrupted stack end detected inside scheduler" at building via distcc on a G5

bugzilla-daemon at bugzilla.kernel.org bugzilla-daemon at bugzilla.kernel.org
Fri Sep 24 00:05:19 AEST 2021


https://bugzilla.kernel.org/show_bug.cgi?id=213837

--- Comment #8 from mpe at ellerman.id.au ---
bugzilla-daemon at bugzilla.kernel.org writes:
> https://bugzilla.kernel.org/show_bug.cgi?id=213837
>
> --- Comment #7 from Erhard F. (erhard_f at mailbox.org) ---
> Created attachment 298919
>   --> https://bugzilla.kernel.org/attachment.cgi?id=298919&action=edit
> dmesg (5.15-rc2 + patch, PowerMac G5 11,2)
>
> (In reply to mpe from comment #6)
>> Can you try this patch, it might help us work out what is corrupting the
>> stack.
> With your patch applied to recent v5.15-rc2 the output looks like this:
>
> [...]
> stack corrupted? stack end = 0xc000000029fdc000
> stack: c000000029fdbc00: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
> ZZZZZZZZ........
...

> Can't make much sense out of it but hopefully you can. ;)

Thanks. Obvious isn't it? ;)

  stack corrupted? stack end = 0xc000000029fdc000
  stack: c000000029fdbc00: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
ZZZZZZZZ........
  stack: c000000029fdbc10: 00000ddc 7c000010 cccccccc cccccccc 
....|...........
  stack: c000000029fdbc20: 29fc4e41 673d4bb3 5a5a5a5a 5a5a5a5a 
).NAg=K.ZZZZZZZZ
  stack: c000000029fdbc30: cccccccc cccccccc 00000ddc 8e000010 
................
  stack: c000000029fdbc40: cccccccc cccccccc 41fc4e41 673d41a3 
........A.NAg=A.
  stack: c000000029fdbc50: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
ZZZZZZZZ........
  stack: c000000029fdbc60: 00000ddc 8e00000c cccccccc cccccccc 
................
  stack: c000000029fdbc70: 79fc4e41 673d4dab 5a5a5a5a 5a5a5a5a 
y.NAg=M.ZZZZZZZZ
  stack: c000000029fdbc80: cccccccc cccccccc 00000ddc 90000008 
................
  stack: c000000029fdbc90: cccccccc cccccccc 91fc4e41 673d4573 
..........NAg=Es
  stack: c000000029fdbca0: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
ZZZZZZZZ........
  stack: c000000029fdbcb0: 00000dd7 ac000016 cccccccc cccccccc 
................
  stack: c000000029fdbcc0: c9fc4e41 673d4203 5a5a5a5a 5a5a5a5a 
..NAg=B.ZZZZZZZZ
  stack: c000000029fdbcd0: cccccccc cccccccc 00000ddc 6c000004 
............l...
  stack: c000000029fdbce0: cccccccc cccccccc e1fc4e41 673d474b 
..........NAg=GK
  stack: c000000029fdbcf0: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
ZZZZZZZZ........
  stack: c000000029fdbd00: 00000ddc 88000000 cccccccc cccccccc 
................
  stack: c000000029fdbd10: 19fd4e41 673d4143 5a5a5a5a 5a5a5a5a 
..NAg=ACZZZZZZZZ
  stack: c000000029fdbd20: cccccccc cccccccc 00000ddb 6c00000e 
............l...
  stack: c000000029fdbd30: cccccccc cccccccc 31fd4e41 673d4f43 
........1.NAg=OC
  stack: c000000029fdbd40: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
ZZZZZZZZ........
  stack: c000000029fdbd50: 00000ddc 8e000008 cccccccc cccccccc 
................
  stack: c000000029fdbd60: 69fd4e41 673d407b 5a5a5a5a 5a5a5a5a 
i.NAg=@{ZZZZZZZZ
  stack: c000000029fdbd70: cccccccc cccccccc 00000ddc 92000008 
................
  stack: c000000029fdbd80: cccccccc cccccccc 81fd4e41 673d4633 
..........NAg=F3
  stack: c000000029fdbd90: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
ZZZZZZZZ........
  stack: c000000029fdbda0: 00000ddb 42000018 cccccccc cccccccc 
....B...........
  stack: c000000029fdbdb0: b9fd4e41 673d42fb 5a5a5a5a 5a5a5a5a 
..NAg=B.ZZZZZZZZ
  stack: c000000029fdbdc0: cccccccc cccccccc 00000ddc 7e000018 
............~...
  stack: c000000029fdbdd0: cccccccc cccccccc d1fd4e41 673d4a1b 
..........NAg=J.
  stack: c000000029fdbde0: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
ZZZZZZZZ........
  stack: c000000029fdbdf0: 00000ddc 8e000004 cccccccc cccccccc 
................
  stack: c000000029fdbe00: 09fe4e41 673d4ee3 5a5a5a5a 5a5a5a5a 
..NAg=N.ZZZZZZZZ
  stack: c000000029fdbe10: cccccccc cccccccc 00000dd9 7200001c 
............r...
  stack: c000000029fdbe20: cccccccc cccccccc 21fe4e41 673d4fa3 
........!.NAg=O.

That's slab data.

It's not clear what the actual data is, but because you booted with
slub_debug=FZP we can see the red zones and poison.

The cccccccc is SLUB_RED_ACTIVE, and 5a5a5a5a is POISON_INUSE (see
include/linux/poison.h).
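
To make the pattern spotting concrete, here is a minimal Python sketch
that labels the four words of one dump line. The byte values are the ones
defined in include/linux/poison.h; the names classify_word and
classify_line are made up for this illustration and are not kernel code.

```python
# Byte patterns from include/linux/poison.h, keyed by the repeated byte.
PATTERNS = {
    0xCC: "SLUB_RED_ACTIVE",
    0xBB: "SLUB_RED_INACTIVE",
    0x5A: "POISON_INUSE",
    0x6B: "POISON_FREE",
}

def classify_word(word):
    """Label a 32-bit word if all four bytes are the same known debug byte."""
    b = word & 0xFF
    if word == b * 0x01010101 and b in PATTERNS:
        return PATTERNS[b]
    return "data"

def classify_line(line):
    """Classify the four hex words of a 'stack: <addr>: w0 w1 w2 w3' line."""
    words = line.split(":", 2)[2].split()[:4]
    return [classify_word(int(w, 16)) for w in words]

print(classify_line("  stack: c000000029fdbc00: 5a5a5a5a 5a5a5a5a cccccccc cccccccc "))
```

Anything reported as "data" is neither red zone nor poison, so it is
either object contents or something that does not belong to the slab.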


  stack: c000000029fdbe30: c0000000 29fdbeb0 cccccccc cccccccc 
....)...........

But then here we have an obvious pointer (big endian FTW).

And it points nearby, just slightly higher in memory, so that looks
suspiciously like a stack back chain pointer. There are more similar
values if you look further.
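
The "points nearby, slightly higher" test can be written down as a rough
heuristic. This is only a sketch: the 16-byte alignment comes from the
64-bit PowerPC ELF ABI, while max_distance and the function name are
arbitrary choices made for this example.

```python
STACK_ALIGN = 16  # the 64-bit PowerPC ELF ABI keeps stack frames 16-byte aligned

def looks_like_back_chain(slot_addr, value, max_distance=0x1000):
    """Heuristic only: value is aligned and points a short way above the slot.

    max_distance is an arbitrary cut-off chosen for this sketch.
    """
    return value % STACK_ALIGN == 0 and 0 < value - slot_addr <= max_distance

# The doubleword at c000000029fdbe30 from the dump line above.
print(looks_like_back_chain(0xc000000029fdbe30, 0xc000000029fdbeb0))  # True
# A doubleword of red zone bytes at the same slot is rejected.
print(looks_like_back_chain(0xc000000029fdbe30, 0xcccccccccccccccc))  # False
```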

But we shouldn't be seeing the stack yet, it's meant to start (end) at
c000000029fdc000 ...

  stack: c000000029fdbe40: 00000ddc 94000000 cccccccc cccccccc 
................
  stack: c000000029fdbe50: 59fe4e41 673d4933 5a5a5a5a 5a5a5a5a 
Y.NAg=I3ZZZZZZZZ
  stack: c000000029fdbe60: cccccccc cccccccc 00000dd9 60000024 
............`..$
  stack: c000000029fdbe70: cccccccc cccccccc 71fe4e41 673d416b 
........q.NAg=Ak
  stack: c000000029fdbe80: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
ZZZZZZZZ........
  stack: c000000029fdbe90: 00000ddc 6000000c cccccccc cccccccc 
....`...........
  stack: c000000029fdbea0: c0000000 29fdbf20 00000000 00000002 
....).. ........
  stack: c000000029fdbeb0: c0000000 29fdbf30 00000ddc 7e00001c ....)..0....~...
    <---
  stack: c000000029fdbec0: c0000000 29fdbf40 c1fe4e41 673d4723 
....)..@..NAg=G#
  stack: c000000029fdbed0: 5a5a5a5a 5a5a5a5a cccccccc cccccccc 
ZZZZZZZZ........
  stack: c000000029fdbee0: c0000000 29fdbf60 cccccccc cccccccc 
....)..`........
  stack: c000000029fdbef0: c0000000 29fdbf70 5a5a5a5a 5a5a5a5a 
....)..pZZZZZZZZ
  stack: c000000029fdbf00: cccccccc cccccccc 00000ddc 60000010 
............`...
  stack: c000000029fdbf10: c0000000 29fdbf90 00000000 00000002 
....)...........
  stack: c000000029fdbf20: c0000000 29fdbf01 001d3029 96167689 
....).....0)..v.
  stack: c000000029fdbf30: c0000000 29fdbfc0 c0000004 7f6f1800 ....)........o..
    <---
  stack: c000000029fdbf40: c0000000 29fdbfc0 5a5a5a5a 5a5a5a5a 
....)...ZZZZZZZZ
  stack: c000000029fdbf50: c0000000 000ea33c 00000000 00000000 
.......<........
  stack: c000000029fdbf60: c0000000 29fdbfe0 c0000000 05cdb700 
....)...........
  stack: c000000029fdbf70: c0000000 29fdbff0 cccccccc cccccccc 
....)...........
  stack: c000000029fdbf80: c0000000 000ea33c 00000000 00328780 
.......<.....2..
  stack: c000000029fdbf90: c0000000 29fdc010 001d3029 96167689 
....).....0)..v.
  stack: c000000029fdbfa0: c0000000 29fdc020 00000000 000008e4 
....).. ........
  stack: c000000029fdbfb0: 00000000 00000201 001d3029 96167689 
..........0)..v.
  stack: c000000029fdbfc0: c0000000 29fdc040 cccccccc cccccccc ....)..@........
    <---
  stack: c000000029fdbfd0: c0000000 000c2344 001d3029 96167689 
......#D..0)..v.
  stack: c000000029fdbfe0: c0000000 29fdc001 001d3029 96167689 
....).....0)..v.
  stack: c000000029fdbff0: c0000000 29fdc080 00000088 554c539a 
....).......ULS.

... which is here:

  stack: c000000029fdc000: c0000000 000c1d9c 001d3029 96167689 
..........0)..v.
  stack: c000000029fdc010: c0000000 29fdc0d0 c0000004 7f6f1700 
....)........o..
  stack: c000000029fdc020: c0000000 29fdc0a0 c0000000 05cdb580 
....)...........
  stack: c000000029fdc030: c0000000 29fdc0b0 c0000004 7f6f1700 
....)........o..
  stack: c000000029fdc040: c0000000 29fdc0c0 00000000 00000001 
....)...........


So it looks like you have actually overrun your stack, rather than
something else clobbering your stack.
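
The rows marked "<---" can be followed mechanically. The sketch below
copies the first doubleword of those rows (and of the row at
c000000029fdc040 that they lead to) from the dump and lists which links
of the chain sit below the reported stack end. The helper name walk is
made up for this example, and this is not how the kernel itself unwinds.

```python
STACK_END = 0xc000000029fdc000  # "stack end" from the report

# address -> first doubleword at that address, copied from the dump.
MEM = {
    0xc000000029fdbeb0: 0xc000000029fdbf30,
    0xc000000029fdbf30: 0xc000000029fdbfc0,
    0xc000000029fdbfc0: 0xc000000029fdc040,
    0xc000000029fdc040: 0xc000000029fdc0c0,
}

def walk(mem, start):
    """Follow pointers through mem until we leave the copied dump rows."""
    chain = [start]
    while chain[-1] in mem:
        nxt = mem[chain[-1]]
        if nxt <= chain[-1]:  # a back chain must move to a higher address
            break
        chain.append(nxt)
    return chain

chain = walk(MEM, 0xc000000029fdbeb0)
below = [hex(a) for a in chain if a < STACK_END]
print(below)
```

Three links of the chain (beb0, bf30, bfc0) lie below the stack end, and
the fourth (c040) is inside the stack proper, which is what you would
expect from frames that kept growing downward past the end.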

Can you attach your System.map for that exact kernel? We might be able
to work out what functions we were in when we overran.
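
For reference, resolving an address against System.map is just a search
for the last symbol at or below it. A minimal sketch, assuming the usual
"address type name" line format; the two symbols in SAMPLE are invented
placeholders, not the real symbols of this kernel, and the address is one
of the values in the dump that looks like a kernel text address.

```python
import bisect

def parse_system_map(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            syms.append((int(parts[0], 16), parts[2]))
    syms.sort()
    return syms

def resolve(syms, addr):
    """Return 'name+0xoffset' for the last symbol at or below addr, else None."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = syms[i]
    return f"{name}+{addr - base:#x}"

# Hypothetical map contents, for illustration only.
SAMPLE = """\
c0000000000ea000 T example_func_a
c0000000000ea400 T example_func_b
"""

syms = parse_system_map(SAMPLE)
print(resolve(syms, 0xc0000000000ea33c))  # example_func_a+0x33c
```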

You could also try changing CONFIG_THREAD_SHIFT to 15, that might keep
the system running a bit longer and give us some other clues.
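
As a quick check of what that buys, THREAD_SIZE is 1 << THREAD_SHIFT.
The sketch below assumes the current value is 14, which is not stated in
the report.

```python
def thread_size(thread_shift):
    """Kernel stack size in bytes: THREAD_SIZE is 1 << THREAD_SHIFT."""
    return 1 << thread_shift

for shift in (14, 15):  # 14 is an assumed current value, 15 the suggested one
    print(shift, thread_size(shift) // 1024, "KiB")
```

Going from 14 to 15 doubles the stack from 16 KiB to 32 KiB.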

cheers
