Serial RAPID IO kernel hang on maintenance read transaction

Proicou, Mike mcp at lanl.gov
Sat Jun 2 06:40:49 EST 2012


I've been struggling with a kernel hang during boot-up and enumeration of a RapidIO system.

My current system contains an N.A.T. MCH (using the IDT/Tundra Tsi578 switch) and a Vadatech AMC719 card using the Freescale P4080 processor. Other cards will be added to the system later, but I'm testing with just these two for now.

I'm using Linux kernel version 2.6.34.6. I've set riohdid=0 on the kernel command line, and I'm expecting Linux to fully enumerate and configure the RapidIO fabric. (This may be a bad assumption on my part.)

After lots of tracing, I've determined that the kernel is hanging on the first maintenance transaction to the switch.  The hang will often be followed by a "machine check in kernel mode" exception and panic.

This is very similar to the behavior reported in this mailing list thread from 2010: http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-October/086235.html I've read that thread several times and tried most of the suggestions, but they don't appear to apply to my hardware configuration.

Is it possible that something in the switch isn't completely initialized at the time that Linux tries to do the maintenance transaction?  If so, how do I find it?

Here's the console log for a boot using the supplied kernel:

Freescale XGMAC MDIO Bus: probed
Setting up RapidIO peer-to-peer network /rapidio at ffe0c0000
fsl-of-rio ffe0c0000.rapidio: Of-device full name /rapidio at ffe0c0000
fsl-of-rio ffe0c0000.rapidio: Regs: [mem 0xffe0c0000-0xffe0dffff]
fsl-of-rio ffe0c0000.rapidio: LAW start 0x0000000c20000000, size 0x0000000001000000.
fsl-of-rio ffe0c0000.rapidio: errirq: 16, bellirq: 57, txirq: 60, rxirq 61
fsl-of-rio ffe0c0000.rapidio: RapidIO PHY type: serial
SRIO Port 1 Status: Lane0Sync Lane1Sync Lane2Sync Lane3Sync Aligned
SRIO Port 2 Status: (Note: Freescale driver only supports Port 1)
fsl-of-rio ffe0c0000.rapidio: Hardware port width: 4
fsl-of-rio ffe0c0000.rapidio: Training connection status: Four-lane
fsl-of-rio ffe0c0000.rapidio: RapidIO Common Transport System size: 256
RIO: enumerate master port 0, RIO0 mport
Machine check in kernel mode.
RIO: port1 error
Caused by (from MCSR=a000): Load Error Report
Guarded Load Error Report
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=8 amc718_based
last sysfs file:
Modules linked in:
NIP: c001a460 LR: c01ee41c CTR: c001a420
REGS: effc9f10 TRAP: 0204   Not tainted  (2.6.34.6-vt3-svn36835)
MSR: 00021002 <ME,CE>  CR: 24022024  XER: 00000000
TASK = ebc68000[1] 'swapper' THREAD: ebc62000 CPU: 6
GPR00: f1200000 ebc63d10 ebc68000 00000000 00000000 3fc00000 3fc00000 f1200068
GPR08: 00000004 ebc63d18 f1190c20 eb530000 24022022 d811c00a 00000000 00000000
GPR16: 00000000 7ffe2a00 00000000 00000000 7fff0df0 00000000 00000000 00000000
GPR24: 00000081 000000ff 00000000 ebd89400 00000068 00029002 c05e8914 ebc63d58
NIP [c001a460] fsl_rio_config_read+0x40/0x78
LR [c01ee41c] rio_mport_read_config_32+0x7c/0xac
Call Trace:
[ebc63d50] [c01eed64] rio_get_host_deviceid_lock+0x3c/0x50
[ebc63d70] [c045acd4] rio_enum_peer+0x28/0x3e4
[ebc63dd0] [c045b178] rio_enum_mport+0xe8/0x244
[ebc63e10] [c045a59c] rio_init_mports+0x90/0xe4
[ebc63e30] [c0457a5c] fsl_of_rio_rpn_probe+0x3c/0x50
[ebc63e40] [c034abe4] of_platform_device_probe+0x58/0x98
[ebc63e60] [c02274d8] driver_probe_device+0xa4/0x1b4
[ebc63e80] [c02260cc] bus_for_each_drv+0x6c/0xa8
[ebc63eb0] [c022735c] device_attach+0xa4/0xc8
[ebc63ed0] [c0226afc] bus_probe_device+0x2c/0x44
[ebc63ee0] [c02245f8] device_add+0x460/0x5a8
[ebc63f30] [c034a750] of_device_register+0x34/0x48
[ebc63f40] [c0008d64] of_platform_device_create+0x44/0x74
[ebc63f50] [c0008f90] of_platform_bus_probe+0x130/0x15c
[ebc63f70] [c0565480] declare_of_platform_devices+0x24/0x140
[ebc63f90] [c05651cc] __machine_initcall_amc718_based_declare_of_platform_devices+0x2c/0x3c
[ebc63fa0] [c0001cb8] do_one_initcall+0x3c/0x1d0
[ebc63fd0] [c055e9b0] kernel_init+0x190/0x230
[ebc63ff0] [c000f284] kernel_thread+0x4c/0x68
Instruction dump:
814b000c 54e0ba7e 7cc60378 7c0004ac 90ca0000 2f880001 800b0018 7ce03a14
419e0020 2f880002 419e002c 38600000 <80e70000> 7c2006ac 90e90000 4e800020
---[ end trace 561bb236c800851f ]---
Kernel panic - not syncing: Attempted to kill init!
Call Trace:
Rebooting in 180 seconds..
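
As a cross-check on where the load faults, the bracketed word in the instruction dump above can be decoded by hand. The sketch below is only an illustration (decode_dform and the other names are made up for it). It splits the word using the standard PowerPC D-form field layout and adds the displacement to the GPR07 value copied from the register dump.

```python
# Decode the faulting instruction word from the oops above.
# PowerPC D-form layout (big-endian bit numbering):
#   opcode = bits 0-5, rD = bits 6-10, rA = bits 11-15, d = bits 16-31

LWZ_OPCODE = 32  # primary opcode of lwz (load word and zero)


def decode_dform(word):
    """Split a 32-bit D-form instruction into (opcode, rD, rA, displacement)."""
    opcode = (word >> 26) & 0x3F
    rd = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:  # the displacement is a signed 16-bit value
        disp -= 0x10000
    return opcode, rd, ra, disp


# Values copied from the first oops: the word shown in <...> and GPR07.
faulting_word = 0x80E70000
gprs = {7: 0xF1200068}

opcode, rd, ra, disp = decode_dform(faulting_word)
# rA is r7 here, so the base register value is used as-is.
effective_address = (gprs[ra] + disp) & 0xFFFFFFFF

kind = "lwz" if opcode == LWZ_OPCODE else f"opcode {opcode}"
print(f"{kind} r{rd}, {disp}(r{ra})")
print(f"effective address = {effective_address:#010x}")
```

This decodes as a load through r7 from 0xf1200068. GPR00 holds f1200000 and GPR28 holds 00000068, so the access looks like base plus offset 0x68, which fits the rio_get_host_deviceid_lock frame in the call trace.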

Here's a partial log with some additional output and a dump of the error registers at the time of failure:



fsl-elo-dma ffe101300.dma: request channel 0 IRQ
fsl-elo-dma ffe101300.dma: request channel 1 IRQ
fsl-elo-dma ffe101300.dma: request channel 2 IRQ
fsl-elo-dma ffe101300.dma: request channel 3 IRQ
Freescale PowerQUICC MII Bus: probed
Freescale XGMAC MDIO Bus: probed
fsl-of-rio ffe0c0000.rapidio: Setting up RapidIO peer-to-peer network /rapidio at ffe0c0000
fsl-of-rio ffe0c0000.rapidio: Of-device full name /rapidio at ffe0c0000
fsl-of-rio ffe0c0000.rapidio: Regs: [mem 0xffe0c0000-0xffe0dffff]
fsl-of-rio ffe0c0000.rapidio: LAW start 0x0000000c20000000, size 0x0000000001000000
fsl-of-rio ffe0c0000.rapidio: get_immrbase() ffe000000
fsl-of-rio ffe0c0000.rapidio: IO c20000000 c20ffffff
  alloc irq_desc for 57 on node 0
  alloc kstat_irqs on node 0
irq: irq 57 on host /soc at ffe000000/pic at 40000 mapped to virtual irq 57
  alloc irq_desc for 60 on node 0
  alloc kstat_irqs on node 0
irq: irq 60 on host /soc at ffe000000/pic at 40000 mapped to virtual irq 60
  alloc irq_desc for 61 on node 0
  alloc kstat_irqs on node 0
irq: irq 61 on host /soc at ffe000000/pic at 40000 mapped to virtual irq 61
fsl-of-rio ffe0c0000.rapidio: errirq: 16, bellirq: 57, txirq: 60, rxirq 61
fsl-of-rio ffe0c0000.rapidio: Host deviceid 0
fsl-of-rio ffe0c0000.rapidio: RapidIO PHY type: serial
fsl-of-rio ffe0c0000.rapidio: SRIO Port 1 Status: Lane0Sync Lane1Sync Lane2Sync Lane3Sync Aligned
fsl-of-rio ffe0c0000.rapidio: SRIO Port 2 Status: (Note: Freescale driver only supports Port 1)
fsl-of-rio ffe0c0000.rapidio: Hardware port width: 4
fsl-of-rio ffe0c0000.rapidio: Training connection status: Four-lane
fsl-of-rio ffe0c0000.rapidio: RapidIO Common Transport System size: 256
RIO: enumerate master port 0, RIO0 mport
fsl_local_config_write: index 0 offset 00000068 data 00000000
fsl_local_config_read: index 0 offset 00000068 (ebc63da8) = 00000000
fsl_local_config_write: index 0 offset 00000060 data 00000000
fsl_local_config_read: index 0 offset 0000013c (ebc63da8) = e0000000
RIO0 mport PGCCSR e0000000
fsl_local_config_read: index 0 offset 0000000c (ebc63d58) = 00000100
fsl_local_config_read: index 0 offset 00000100 (ebc63d58) = 06000001
fsl_local_config_read: index 0 offset 00000158 (ebc63d88) = 00020302
fsl_local_config_read: index 0 offset 0000013c (ebc63da8) = e0000000
RIO0 mport is active PGCCSR e0000000
rio_enum_peer 1Machine check in kernel mode.
RIO: port1 error
 P1 error regs EDCSR 00000005 IECSR 00000000 ESCSR 00020302
   LTLEDCSR 00000000
Caused by (from MCSR=a000): Load Error Report
Guarded Load Error Report
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=8 amc718_based
last sysfs file:
Modules linked in:
NIP: c001a838 LR: c01f201c CTR: c001a748
REGS: effc9f10 TRAP: 0204   Not tainted  (2.6.34.6-MCP-svn1717)
MSR: 00021002 <ME,CE>  CR: 24022022  XER: 00000000
TASK = ebc68000[1] 'swapper' THREAD: ebc62000 CPU: 6
GPR00: 00000068 ebc63cf0 ebc68000 ffffffea 00000000 000000ff 00000000 00000068
GPR08: 00000004 ebd80000 3fc00000 f1190c20 24022022 d814c00a 00000000 00000000
GPR16: 00000000 7ffe2a00 00000000 00000000 7fff0df0 00000000 00000000 00000000
GPR24: 00000081 000000ff f1200068 00000000 ebc63d18 00000000 000000ff 00000068
NIP [c001a838] fsl_rio_config_read+0xf0/0x11c
LR [c01f201c] rio_mport_read_config_32+0x7c/0xac
Call Trace:
[ebc63cf0] [7ffe2a00] 0x7ffe2a00 (unreliable)
[ebc63d10] [c01f201c] rio_mport_read_config_32+0x7c/0xac
[ebc63d50] [c01f28d0] rio_get_host_deviceid_lock+0x3c/0x60
[ebc63d70] [c045ec8c] rio_enum_peer+0x34/0x4c0
[ebc63dd0] [c045f228] rio_enum_mport+0x110/0x290
[ebc63e10] [c045e484] rio_init_mports+0x90/0xe4
[ebc63e30] [c045b944] fsl_of_rio_rpn_probe+0x4c/0x60
[ebc63e40] [c034ea48] of_platform_device_probe+0x58/0x98
[ebc63e60] [c022b334] driver_probe_device+0xa4/0x1b4
[ebc63e80] [c0229f28] bus_for_each_drv+0x6c/0xa8
[ebc63eb0] [c022b1b8] device_attach+0xa4/0xc8
[ebc63ed0] [c022a958] bus_probe_device+0x2c/0x44
[ebc63ee0] [c0228454] device_add+0x460/0x5a8
[ebc63f30] [c034e5b4] of_device_register+0x34/0x48
[ebc63f40] [c0008d64] of_platform_device_create+0x44/0x74
[ebc63f50] [c0008f90] of_platform_bus_probe+0x130/0x15c
[ebc63f70] [c056b534] declare_of_platform_devices+0x24/0x140
[ebc63f90] [c056b280] __machine_initcall_amc718_based_declare_of_platform_devices+0x2c/0x3c
[ebc63fa0] [c0001cb8] do_one_initcall+0x3c/0x1d0
[ebc63fd0] [c05649b0] kernel_init+0x190/0x230
[ebc63ff0] [c000f284] kernel_thread+0x4c/0x68
Instruction dump:
7fa6eb78 7fe7fb78 7f49d378 4843d975 2f9b0000 409e0028 935c0000 7f63db78
4bffff58 a35a0000 7c2006ac 4bffffc8 <835a0000> 7c2006ac 4bffffbc 3c60c04e
---[ end trace 561bb236c800851f ]---
Kernel panic - not syncing: Attempted to kill init!
Call Trace:
Rebooting in 180 seconds..
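
For reference, the PGCCSR and ESCSR values in the log above can be broken into individual flags with a short script. This is a sketch under assumptions: the bit names are my reading of the RapidIO LP-Serial register descriptions and of Linux's include/linux/rio_regs.h, not something taken from the P4080 manual, and the table and function names are made up for the example. EDCSR is left undecoded because I'm not sure of its bit layout on this part.

```python
# Assumed bit assignments (check against the RapidIO spec and the P4080 manual).

# Port General Control CSR (read at offset 0x13c in the log)
PORT_GEN_CTL_BITS = {
    0x80000000: "Host",
    0x40000000: "Master Enable",
    0x20000000: "Discovered",
}

# Port n Error and Status CSR (read at offset 0x158 in the log)
PORT_ERR_STS_BITS = {
    0x00020000: "Output Error-encountered",
    0x00010000: "Output Error-stopped",
    0x00000200: "Input Error-encountered",
    0x00000100: "Input Error-stopped",
    0x00000010: "Port-write Pending",
    0x00000004: "Port Error",
    0x00000002: "Port OK",
    0x00000001: "Port Uninitialized",
}


def decode(value, bits):
    """Return (names of the known bits that are set, leftover unnamed bits)."""
    names = [name for mask, name in bits.items() if value & mask]
    known = 0
    for mask in bits:
        known |= mask
    return names, value & ~known


# Values copied from the log.
for label, value, table in (
    ("PGCCSR", 0xE0000000, PORT_GEN_CTL_BITS),
    ("ESCSR", 0x00020302, PORT_ERR_STS_BITS),
):
    names, leftover = decode(value, table)
    print(f"{label} {value:#010x}: {', '.join(names)} (undecoded: {leftover:#x})")
```

If those bit names are right, ESCSR 00020302 means Port OK together with input and output error-encountered and input error-stopped. The same value is returned by the offset 0x158 read before rio_enum_peer is entered, so the port may already have been in that state before the first maintenance read was issued.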

Thanks for any help ...

Mike Proicou
