oops bringing up secondary cpus

Dave Hansen haveblue at us.ibm.com
Wed Jul 14 11:04:24 EST 2004


First of all, this oops is very likely my fault.  I'm porting
CONFIG_NONLINEAR over to ppc64 for memory hotplug, and I have little
doubt that I screwed up somewhere.  I could really use some more eyes
analyzing this oops, though.  It's a 2-way p650 partition with 12GB of
allocated memory.

I think it occurs the first time that a secondary CPU touches its idle
task's kernel stack.  Here's the dump:

cpu 0x1: Vector: 300 (Data Access) at [c0000002fff3bd10]
    pc: 000000000000beac
    lr: 000000000000be88
    sp: c0000002fff3bf90
   msr: 8000000000001002
   dar: c0000002fff3bf90
 dsisr: a000000
  current = 0xc0000002fff30920
  paca    = 0xc00000000035a000
    pid   = 0, comm = swapper
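
As a quick cross-check on the dump above, the msr and dsisr values can be decoded into named flags. This is only a sketch: the bit masks below are my assumptions about the 64-bit PowerPC MSR and DSISR layouts and should be checked against the kernel headers, and decode() is a hypothetical helper, not anything from the kernel or xmon.

```python
# Assumed bit masks for the 64-bit PowerPC MSR; verify against your headers.
MSR_BITS = {
    "SF": 1 << 63,  # 64-bit mode
    "EE": 1 << 15,  # external interrupts enabled
    "PR": 1 << 14,  # problem (user) state
    "FP": 1 << 13,  # floating point available
    "ME": 1 << 12,  # machine check enabled
    "IR": 1 << 5,   # instruction relocation (translation) on
    "DR": 1 << 4,   # data relocation (translation) on
    "RI": 1 << 1,   # recoverable interrupt
    "LE": 1 << 0,   # little-endian mode
}

# Assumed bit masks for DSISR after a 0x300 data access exception.
DSISR_BITS = {
    "NOHPTE": 0x40000000,     # no matching hash page table entry
    "PROTFAULT": 0x08000000,  # protection fault
    "ISSTORE": 0x02000000,    # access was a store
    "DABRMATCH": 0x00400000,  # data address breakpoint match
    "NOSEGMENT": 0x00200000,  # no segment translation found
}


def decode(value, bits):
    """Return the names of all flags in `bits` that are set in `value`."""
    return [name for name, mask in bits.items() if value & mask]


msr_flags = decode(0x8000000000001002, MSR_BITS)   # msr from the dump
dsisr_flags = decode(0x0a000000, DSISR_BITS)       # dsisr from the dump
print("msr:  ", msr_flags)
print("dsisr:", dsisr_flags)
```

If those masks are right, IR and DR both come out clear for this msr, which would be consistent with pc showing up as 000000000000beac rather than a c000... address, and the dsisr would mark the access as a store.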

The pc appears to be in __secondary_start (address 0xc00000000000beac):

objdump:
c00000000000be38 <.__secondary_start>:
...
c00000000000be94:       64 63 00 50     oris    r3,r3,80
c00000000000be98:       60 63 5f 58     ori     r3,r3,24408
c00000000000be9c:       7b 1c 1f 24     rldicr  r28,r24,3,60
c00000000000bea0:       7c 23 e0 2a     ldx     r1,r3,r28
c00000000000bea4:       38 21 3f 90     addi    r1,r1,16272
c00000000000bea8:       38 00 00 00     li      r0,0
c00000000000beac:       f8 01 00 00     std     r0,0(r1)
c00000000000beb0:       f8 2d 00 20     std     r1,32(r13)
c00000000000beb4:       e8 6d 00 38     ld      r3,56(r13)
c00000000000beb8:       60 64 00 01     ori     r4,r3,1
c00000000000bebc:       38 60 50 00     li      r3,20480
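
The arithmetic in the disassembly above can be replayed by hand to see where the stack pointer comes from. The sketch below does that with the values from the register dump. Two things are assumptions on my part: that the table at r3 holds the per-cpu idle task pointers (modelled here by a hypothetical `table` dict), and that 16272 is THREAD_SIZE minus STACK_FRAME_OVERHEAD (16384 - 112).

```python
MASK64 = (1 << 64) - 1

# oris r3,r3,80 ; ori r3,r3,24408 build the low 32 bits of the table address.
low32 = (80 << 16) | 24408
print(hex(low32))              # low half of R03 in the register dump

r3 = 0xc000000000505f58        # R03 from the register dump
r24 = 1                        # R24 from the register dump (the cpu number)

# rldicr r28,r24,3,60 is a shift left by 3: cpu number times 8 bytes.
r28 = (r24 << 3) & MASK64

# Hypothetical memory contents: slot 1 of the table points at the idle
# task for cpu 1, which the post says is allocated at c0000002fff38000.
table = {r3 + 8: 0xc0000002fff38000}

r1 = table[r3 + r28]           # ldx  r1,r3,r28
r1 = (r1 + 16272) & MASK64     # addi r1,r1,16272
print(hex(r1))                 # should match sp/dar in the oops

# Assumed constants: 16K kernel stacks and a 112-byte minimum stack frame.
assert 16272 == 16384 - 112
```

Under those assumptions r1 lands exactly on the sp and dar values in the oops, so the pointer loaded from the table looks like the expected one.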


The idle task for cpu one is allocated at c0000002fff38000, so the
stack pointer in r1 looks valid.  The instruction that causes the
exception appears to be storing the contents of r0 to the address in
r1, right?  That looks like a quite valid virtual address to me.  Is
this a problem with the SLB?

enter ? for help
1:mon> r
R00 = 0000000000000000   R16 = 0000000000000000
R01 = c0000002fff3bf90   R17 = 0000000000000000
R02 = c000000000500440   R18 = 0000000000000000
R03 = c000000000505f58   R19 = 0000000000000000
R04 = 000008d12e6ab480   R20 = 0000000000000000
R05 = 0000000000000000   R21 = 0000000000c00000
R06 = 0000000000000001   R22 = 0000000000000000
R07 = 0000000000c02000   R23 = 0000000000000001
R08 = c00000000035a000   R24 = 0000000000000001
R09 = d000000008000001   R25 = 0000000000000010
R10 = 0000000000000001   R26 = 0000000000000568
R11 = 0000000000000002   R27 = 000000000000041c
R12 = 2000000000000000   R28 = 0000000000000008
R13 = c00000000035a000   R29 = 00000000000009e8
R14 = 0000000000000000   R30 = 0000000003280000
R15 = 0000000000000000   R31 = 0000000000900000
pc  = 000000000000beac
lr  = 000000000000be88
msr = 8000000000001002   cr  = 42280484
ctr = 0000000000000000   xer = 0000000020000000   trap =      300


SLB contents of cpu 1
00 c000000008000000 00006a99b4b14580
01 d000000008000000 000008d12e6ab480
02 c0000002f8000000 0000171f7cb14580
03 c000000030000000 0000a12fdcb14580
04 0000000010000000 000055a12defac00
05 0000000000000000 00001fe68f09ec00
06 00000000f0000000 000030d55709ec00
07 0000000040000000 000013596f09ec00
08 c0000002f0000000 0000171f7cb14580
09 0000000010000000 0000dcc34709ec00
10 0000000000000000 000098c475efac00
11 00000000f0000000 0000a9b33defac00
12 0000000040000000 00008c3755efac00
13 c000000030000000 0000a12fdcb14580
14 0000000010000000 000055a12defac00
15 c0000002f0000000 0000171f7cb14580
16 0000000000000000 00001fe68f09ec00
17 00000000f0000000 000030d55709ec00
18 0000000040000000 000013596f09ec00
19 0000000010000000 0000dcc34709ec00
20 0000000000000000 000098c475efac00
21 00000000f0000000 0000a9b33defac00
22 0000000040000000 00008c3755efac00
23 c000000030000000 0000a12fdcb14580
24 0000000010000000 000055a12defac00
25 c0000002f0000000 0000171f7cb14580
26 c000000030000000 0000a12fdcb14580
27 0000000000000000 0000e3779b970c00
28 00000000f0000000 0000f46663970c00
29 0000000040000000 0000d6ea7b970c00
30 0000000010000000 0000a05453970c00
31 c0000002f0000000 0000171f7cb14580
32 e000000000000000 0000a708a8242480
33 c0000002f0000000 0000171f7cb14580
34 e000000080000000 00008dee68242480
35 0000000000000000 000098c475efac00
36 00000000f0000000 0000a9b33defac00
37 0000000040000000 00008c3755efac00
38 c000000030000000 0000a12fdcb14580
39 0000000010000000 000055a12defac00
40 0000000000000000 00001fe68f09ec00
41 00000000f0000000 000030d55709ec00
42 0000000040000000 000013596f09ec00
43 c0000002f0000000 0000171f7cb14580
44 0000000010000000 0000dcc34709ec00
45 0000000000000000 000036fbefa91c00
46 00000000f0000000 000047eab7a91c00
47 0000000040000000 00002a6ecfa91c00
48 0000000010000000 0000f3d8a7a91c00
49 0000000000000000 00001fe68f09ec00
50 00000000f0000000 000030d55709ec00
51 0000000040000000 000013596f09ec00
52 c000000030000000 0000a12fdcb14580
53 0000000010000000 0000dcc34709ec00
54 c0000002f0000000 0000171f7cb14580
55 c000000030000000 0000a12fdcb14580
56 c0000002f0000000 0000171f7cb14580
57 c000000030000000 0000a12fdcb14580
58 c0000002f0000000 0000171f7cb14580
59 c000000030000000 0000a12fdcb14580
60 c0000002f0000000 0000171f7cb14580
61 c000000030000000 0000a12fdcb14580
62 c0000002f0000000 0000171f7cb14580
63 0000000000000000 000098c475efac00
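
One way to read the SLB dump above against the faulting address is to compare segment numbers. This is a rough sketch only: it assumes 256MB segments, that the first column of the dump is the ESID word with the valid bit at 0x08000000, and it uses just a hand-picked subset of the kernel-region entries. The matches() helper is hypothetical.

```python
SLB_ESID_V = 0x08000000   # assumed position of the valid bit in the ESID word
ESID_SHIFT = 28           # assumed 256MB segments: ESID = address >> 28

# Subset of the dump above: entry index -> first (ESID) column.
slb_subset = {
    0: 0xc000000008000000,
    1: 0xd000000008000000,
    2: 0xc0000002f8000000,
    3: 0xc000000030000000,
    8: 0xc0000002f0000000,
    32: 0xe000000000000000,
    34: 0xe000000080000000,
}


def matches(addr, entries):
    """Return (index, valid) for every entry covering addr's segment."""
    want = addr >> ESID_SHIFT
    return [
        (idx, bool(word & SLB_ESID_V))
        for idx, word in sorted(entries.items())
        if word >> ESID_SHIFT == want
    ]


dar = 0xc0000002fff3bf90      # dar from the oops
hits = matches(dar, slb_subset)
print(hex(dar >> ESID_SHIFT), hits)
```

Read that way, entry 02 covers the faulting segment with the valid bit set, while entry 08 and its many repeats cover the same segment with the valid bit clear.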

Could stab_initialize() have been screwed up?  Any suggestions on how to
debug this?  I even tried dumping the PACA on both a working and a
non-working kernel, but the only difference I got was at decimal offset 480:

--- paca0-mm2-virgin    2004-07-13 13:33:04.000000000 -0700
+++ paca0-mm2-nononl    2004-07-13 16:17:21.000000000 -0700
@@ -2,7 +2,7 @@
 ][0032]: 0100000000000000 0000000000000000 d397d78102800000 0000000000000000
 ][0128]: d397d9e204000000 0000000000000000 0000000000000000 0000000000000000
 ][0256]: 0000000300000000 0000000000000000 0000000000000000 0000000000000000
-][0480]: 0000003b6fae5692 0000000000000000 c000000000000000 0000000000000000
+][0480]: e000000000000000 0000000000000000 c000000000000000 0000000000000000
 ][1024]: c000000000346180 0000000000000000 0000000000000000 0000000000000000
 ][1056]: 0000000000000000 0000000000000000 d397d78102800000 0000000000000000
 ][1152]: d397d9e204000000 0000000000000000 0000000000000000 0000000000000000

And that's in a reserved area of the xLpPaca, so it's pretty much a
guess what the hypervisor was doing.  (btw, this diff is from a slightly
different kernel than the one in the above oops)

-- Dave


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/
** This list is shutting down 7/24/2004.




