MPC8315 reboot failure, lockdep splat possibly related?
Bhushan Bharat-R65777
R65777 at freescale.com
Sat Aug 17 14:58:34 EST 2013
> -----Original Message-----
> From: Linuxppc-dev [mailto:linuxppc-dev-
> bounces+bharat.bhushan=freescale.com at lists.ozlabs.org] On Behalf Of Anthony
> Foiani
> Sent: Saturday, August 17, 2013 7:10 AM
> To: linuxppc-dev at lists.ozlabs.org
> Subject: MPC8315 reboot failure, lockdep splat possibly related?
>
>
> Greetings.
>
> I've been experiencing occasional lockups at reboot for a few weeks, but only
> once every 10-20 boots. A good reboot looks like this:
>
> [47529.721640] lm77 0-0048: shutdown
> [47529.725160] rtc-m41t80 0-0068: shutdown
> [47529.729169] i2c i2c-0: shutdown
> [47529.732534] fsl-ehci fsl-ehci.0: shutdown
> [47529.736842] sd 1:0:0:0: shutdown
> [47529.740239] sd 1:0:0:0: [sda] Synchronizing SCSI cache
> [47529.747091] uio_pci_generic 0000:00:0a.0: shutdown
> [47529.752079] pci 0000:00:00.0: shutdown
> [47529.756021] Restarting system.
>
> While a bad one fails after the EHCI shutdown:
>
> [ 747.578001] lm77 0-0048: shutdown
> [ 747.581522] rtc-m41t80 0-0068: shutdown
> [ 747.585538] i2c i2c-0: shutdown
> [ 747.588909] sd 1:0:0:0: shutdown
> [ 747.592304] sd 1:0:0:0: [sda] Synchronizing SCSI cache
> [ 747.597973] fsl-ehci fsl-ehci.0: shutdown
>
> I enabled lockdep, and I get this splat on every boot, regardless of whether it
> locks up at reboot or not. Could it possibly be related?
> Any other ideas on how to avoid the reboot lockup?
>
> [ 9.086051] =================================
> [ 9.090393] [ INFO: inconsistent lock state ]
> [ 9.094744] 3.9.7-ajf-gc39503d #1 Not tainted
> [ 9.099087] ---------------------------------
> [ 9.103432] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 9.109431] scsi_eh_1/39 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 9.114642] (&(&host->lock)->rlock){?.+...}, at: [<c02f4168>]
> sata_fsl_interrupt+0x50/0x250
> [ 9.123137] {HARDIRQ-ON-W} state was registered at:
> [ 9.128004] [<c006cdb8>] lock_acquire+0x90/0xf4
> [ 9.132737] [<c043ef04>] _raw_spin_lock+0x34/0x4c
> [ 9.137645] [<c02f3560>] fsl_sata_set_irq_coalescing+0x68/0x100
> [ 9.143750] [<c02f36a0>] sata_fsl_init_controller+0xa8/0xc0
> [ 9.149505] [<c02f3f10>] sata_fsl_probe+0x17c/0x2e8
> [ 9.154568] [<c02acc90>] driver_probe_device+0x90/0x248
> [ 9.159987] [<c02acf0c>] __driver_attach+0xc4/0xc8
> [ 9.164964] [<c02aae74>] bus_for_each_dev+0x5c/0xa8
> [ 9.170028] [<c02ac218>] bus_add_driver+0x100/0x26c
> [ 9.175091] [<c02ad638>] driver_register+0x88/0x198
> [ 9.180155] [<c0003a24>] do_one_initcall+0x58/0x1b4
> [ 9.185226] [<c05aeeac>] kernel_init_freeable+0x118/0x1c0
> [ 9.190823] [<c0004110>] kernel_init+0x18/0x108
> [ 9.195542] [<c000f6b8>] ret_from_kernel_thread+0x64/0x6c
> [ 9.201142] irq event stamp: 160
> [ 9.204366] hardirqs last enabled at (159): [<c043f778>]
> _raw_spin_unlock_irq+0x30/0x50
> [ 9.212469] hardirqs last disabled at (160): [<c000f414>]
> reenable_mmu+0x30/0x88
> [ 9.219867] softirqs last enabled at (144): [<c002ae5c>]
> __do_softirq+0x168/0x218
> [ 9.227435] softirqs last disabled at (137): [<c002b0d4>]
> irq_exit+0xa8/0xb4
> [ 9.234481]
> [ 9.234481] other info that might help us debug this:
> [ 9.240995] Possible unsafe locking scenario:
> [ 9.240995]
> [ 9.246898] CPU0
> [ 9.249337] ----
> [ 9.251776] lock(&(&host->lock)->rlock);
> [ 9.255878] <Interrupt>
> [ 9.258492] lock(&(&host->lock)->rlock);
> [ 9.262765]
> [ 9.262765] *** DEADLOCK ***
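You should be able to get rid of this splat by changing spin_lock()/spin_unlock() in fsl_sata_set_irq_coalescing() to spin_lock_irqsave()/spin_unlock_irqrestore(). host->lock is also taken in sata_fsl_interrupt(), i.e. in hard-IRQ context, so it must not be held with interrupts enabled.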
-Bharat
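
To illustrate the suggestion above, here is a small userspace model of the problem, not the driver code itself. The names model_lock, model_interrupt(), coalescing_plain() and coalescing_irqsave() are made up for this sketch, and the "IRQ enable" flag is only a stand-in for the real CPU state. The comment at the top shows the kernel pattern the suggestion refers to; the exact body of fsl_sata_set_irq_coalescing() is not reproduced here.

```c
/*
 * Userspace model only (hypothetical names), single CPU.
 *
 * The kernel-side pattern being suggested is:
 *
 *     unsigned long flags;
 *     spin_lock_irqsave(&host->lock, flags);
 *     ... program the coalescing register ...
 *     spin_unlock_irqrestore(&host->lock, flags);
 */
#include <assert.h>
#include <stdbool.h>

struct model_lock { bool held; };

static struct model_lock host_lock;  /* stands in for host->lock */
static bool irqs_enabled = true;     /* stands in for the CPU IRQ-enable bit */
static bool deadlock_seen;

/* Models sata_fsl_interrupt(): takes the host lock in hard-IRQ context. */
static void model_interrupt(void)
{
    if (host_lock.held) {
        /* On real hardware the handler would spin forever here. */
        deadlock_seen = true;
        return;
    }
    host_lock.held = true;
    host_lock.held = false;
}

/* An interrupt is only delivered while IRQs are enabled. */
static void maybe_fire_irq(void)
{
    if (irqs_enabled)
        model_interrupt();
}

/* Models spin_lock()/spin_unlock(): IRQs stay enabled while the lock is held. */
bool coalescing_plain(void)
{
    deadlock_seen = false;
    host_lock.held = true;
    maybe_fire_irq();            /* IRQ lands inside the critical section */
    host_lock.held = false;
    return deadlock_seen;
}

/* Models spin_lock_irqsave()/spin_unlock_irqrestore(). */
bool coalescing_irqsave(void)
{
    bool flags = irqs_enabled;   /* save the previous IRQ state */

    deadlock_seen = false;
    assert(!host_lock.held);
    irqs_enabled = false;        /* IRQs off before taking the lock */
    host_lock.held = true;
    maybe_fire_irq();            /* not delivered: IRQs are disabled */
    host_lock.held = false;
    irqs_enabled = flags;        /* restore the previous IRQ state */
    maybe_fire_irq();            /* the IRQ is handled after the unlock */
    return deadlock_seen;
}
```

In this model coalescing_plain() reports the self-deadlock that lockdep warns about, while coalescing_irqsave() does not. Whether this also explains the occasional hang at reboot is a separate question that the model does not answer.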
> [ 9.262765]
> [ 9.268684] no locks held by scsi_eh_1/39.
> [ 9.272767]
> [ 9.272767] stack backtrace:
> [ 9.277117] Call Trace:
> [ 9.279589] [cfff9da0] [c0008504] show_stack+0x48/0x150 (unreliable)
> [ 9.285972] [cfff9de0] [c0447d5c] print_usage_bug.part.35+0x268/0x27c
> [ 9.292425] [cfff9e10] [c006ace4] mark_lock+0x2ac/0x658
> [ 9.297660] [cfff9e40] [c006b7e4] __lock_acquire+0x754/0x1840
> [ 9.303414] [cfff9ee0] [c006cdb8] lock_acquire+0x90/0xf4
> [ 9.308745] [cfff9f20] [c043ef04] _raw_spin_lock+0x34/0x4c
> [ 9.314250] [cfff9f30] [c02f4168] sata_fsl_interrupt+0x50/0x250
> [ 9.320187] [cfff9f70] [c0079ff0] handle_irq_event_percpu+0x90/0x254
> [ 9.326547] [cfff9fc0] [c007a1fc] handle_irq_event+0x48/0x78
> [ 9.332220] [cfff9fe0] [c007c95c] handle_level_irq+0x9c/0x104
> [ 9.337981] [cfff9ff0] [c000d978] call_handle_irq+0x18/0x28
> [ 9.343568] [cc7139f0] [c000608c] do_IRQ+0xf0/0x1a8
> [ 9.348464] [cc713a20] [c000fc8c] ret_from_except+0x0/0x14
> [ 9.353983] --- Exception: 501 at _raw_spin_unlock_irq+0x40/0x50
> [ 9.353983] LR = _raw_spin_unlock_irq+0x30/0x50
> [ 9.364839] [cc713af0] [c043db10] wait_for_common+0xac/0x188
> [ 9.370513] [cc713b30] [c02ddee4] ata_exec_internal_sg+0x2b0/0x4f0
> [ 9.376699] [cc713be0] [c02de18c] ata_exec_internal+0x68/0xa8
> [ 9.382454] [cc713c20] [c02de4b8] ata_dev_read_id+0x158/0x594
> [ 9.388205] [cc713ca0] [c02ec244] ata_eh_recover+0xd88/0x13d0
> [ 9.393962] [cc713d20] [c02f2520] sata_pmp_error_handler+0xc0/0x8ac
> [ 9.400234] [cc713dd0] [c02ecdc8] ata_scsi_port_error_handler+0x464/0x5e8
> [ 9.407023] [cc713e10] [c02ecfd0] ata_scsi_error+0x84/0xb8
> [ 9.412528] [cc713e40] [c02c4974] scsi_error_handler+0xd8/0x47c
> [ 9.418457] [cc713eb0] [c004737c] kthread+0xa8/0xac
> [ 9.423355] [cc713f40] [c000f6b8] ret_from_kernel_thread+0x64/0x6c
>
> A full set of kernel messages from a hanging (fails to reboot) session is here:
>
> http://scrye.com/~tkil/linux/fsl-sata-lockdep-201308/hang-log.txt
>
> And a full set of messages for a boot that reboots successfully:
>
> http://scrye.com/~tkil/linux/fsl-sata-lockdep-201308/no-hang-log.txt
>
> The associated config file:
>
> http://scrye.com/~tkil/linux/fsl-sata-lockdep-201308/config.txt
>
> The only addition I've made to this section of the kernel is my "SATA speed
> limit patch", discussed a few weeks back:
>
> http://permalink.gmane.org/gmane.linux.ports.ppc.embedded/58969
>
> That patch does touch sata_fsl_probe, but it just sets a value -- it doesn't do
> any locking (for better or for worse), nor does it call any other functions (and
> it seems that it's a function further down the stack that is triggering
> lockdep).
>
> I took a quick look at the diffs between 3.9.7 (both upstream and my
> variant) and the current head of linux-stable, and didn't see anything that
> looked relevant. The two main changes I saw were switching from dev_get_drvdata
> to platform_get_drvdata and adding the rx_watermark.
> I did see that dev_set_drvdata was removed from sata_fsl_probe's exit path; not
> sure if that could cause this sort of error.
>
> If anyone has ideas on how to avoid the reboot lockup, I would greatly
> appreciate it.
>
> Thank you for your time!
>
> Best regards,
> Anthony Foiani
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev