[PATCH] Bugfix: powerpc/eeh: Wrong place to call pci_get_slot()
Mike Qiu
qiudayu at linux.vnet.ibm.com
Mon Jul 14 23:35:18 EST 2014
On 07/14/2014 09:01 PM, Gavin Shan wrote:
> On Mon, Jul 14, 2014 at 04:19:23AM -0400, Mike Qiu wrote:
>> [ 121.133381] WARNING: at drivers/pci/search.c:223
>> [ 121.133422] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
>> [ 121.133424] task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000
>> [ 121.133425] NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114
>> [ 121.133427] REGS: c000000001446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+)
>> [ 121.133428] MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 48002422 XER: 20000000
>> [ 121.133433] CFAR: c00000000003752c SOFTE: 0
>> GPR00: c000000000037530 c000000001447220 c000000001448c30 c0000003bca1dc00
>> GPR04: 0000000000000000 c000000000066064 9000000000009032 0000000000000008
>> GPR08: 0000000000000007 0000000000000001 0000000000000100 000000003003d200
>> GPR12: 0000000044002482 c00000000fee0000 0000000000000000 c0000000015e8830
>> GPR16: c0000000015e8c30 0000000000000000 c0000000015e8430 c0000000015e8030
>> GPR20: c000000001348c30 c000000001482180 0000000000000000 0000000000000000
>> GPR24: 0000000000200200 c0000003bc243500 c0000003feff4070 c0000003bcec3000
>> GPR28: c0000000014cac00 c0000003bca1dc00 0000000000000000 0000000000000000
>> [ 121.133454] NIP [c000000000497b70] .pci_get_slot+0x40/0x110
>> [ 121.133457] LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190
>> [ 121.133458] Call Trace:
>> [ 121.133461] [c000000001447220] [c000000000721730] .of_get_property+0x30/0x60 (unreliable)
>> [ 121.133464] [c0000000014472b0] [c000000000037530] .eeh_pe_loc_get+0x150/0x190
>> [ 121.133466] [c000000001447340] [c000000000034684] .eeh_dev_check_failure+0x1b4/0x550
>> [ 121.133468] [c0000000014473f0] [c000000000034ab0] .eeh_check_failure+0x90/0xf0
>> [ 121.133493] [c000000001447490] [d000000002c03e84] .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
>> [ 121.133501] [c000000001447520] [d000000002c041a4] .lpfc_poll_eratt+0x64/0x100 [lpfc]
>> [ 121.133504] [c0000000014475a0] [c0000000000b45b4] .call_timer_fn+0x64/0x190
>> [ 121.133506] [c000000001447650] [c0000000000b4d1c] .run_timer_softirq+0x2cc/0x3e0
>> [ 121.133508] [c000000001447760] [c0000000000a90c8] .__do_softirq+0x198/0x3c0
>> [ 121.133510] [c000000001447880] [c0000000000a9658] .irq_exit+0xc8/0x110
>> [ 121.133513] [c000000001447900] [c00000000001e010] .timer_interrupt+0xa0/0xe0
>> [ 121.133515] [c000000001447980] [c0000000000026d8] decrementer_common+0x158/0x180
>> [ 121.133518] --- Exception: 901 at .arch_local_irq_restore+0x74/0x90
>>
>> pci_get_slot() should not be used in interrupt. But eeh subsystem do
>> the error checking in interrupt in this situation.
>>
>> This patch is to solve this issue.
>>
> The commit log has been clear enough, but the following message might
> be better. I'm not good at writing good commit log as well:
>
> ---
>
> pci_get_slot() is called with hold of PCI bus semaphore and it's not
> safe to be called in interrupt context. However, we possibly checks
> EEH error and calls the function in interrupt context. To avoid using
> pci_get_slot(), we turn into device tree for fetching location code.
> Otherwise, we might run into WARN_ON() as following messages indicate:
>
> WARNING: at drivers/pci/search.c:223
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
> task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000
> NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114
> REGS: c000000001446fa0 TRAP: 0700 Not tainted (3.16.0-rc3+)
> MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 48002422 XER: 20000000
> CFAR: c00000000003752c SOFTE: 0
> :
> NIP [c000000000497b70] .pci_get_slot+0x40/0x110
> LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190
> Call Trace:
> .of_get_property+0x30/0x60 (unreliable)
> .eeh_pe_loc_get+0x150/0x190
> .eeh_dev_check_failure+0x1b4/0x550
> .eeh_check_failure+0x90/0xf0
> .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
> .lpfc_poll_eratt+0x64/0x100 [lpfc]
> .call_timer_fn+0x64/0x190
> .run_timer_softirq+0x2cc/0x3e0
>
Yes, it's better enough.
>> Signed-off-by: Mike Qiu <qiudayu at linux.vnet.ibm.com>
>> ---
>> arch/powerpc/kernel/eeh_pe.c | 29 ++++++++++++++++++++++++++---
>> 1 file changed, 26 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
>> index fbd01eb..6f4bfee 100644
>> --- a/arch/powerpc/kernel/eeh_pe.c
>> +++ b/arch/powerpc/kernel/eeh_pe.c
>> @@ -792,6 +792,28 @@ void eeh_pe_restore_bars(struct eeh_pe *pe)
>> }
>>
>> /**
>> + * __dn_get_pdev - Retrieve the pci_dev from device_node by bus/devfn
>> + * @dn: device_node of the pci_dev
>> + * @data: the pci device's bus/devfn
>> + *
>> + * Retrieve the pci_dev using the given device_node and bus/devfn.
>> + */
>> +void *__dn_get_pdev(struct device_node *dn, void *data)
>> +{
> The function isn't necessarily public. "static" is enough, I think.
> I don't think we need this actually. Please refer to more comments
> below.
>
>> + struct pci_dn *pdn = PCI_DN(dn);
>> + int busno = *((int *)data) >> 8;
>> + int devfn = *((int *)data) & 0xff;
>> +
>> + if (!pdn)
>> + return NULL;
>> +
>> + if (pdn->busno == busno && pdn->devfn == devfn)
>> + return pdn->pcidev;
>> +
>> + return NULL;
>> +}
>> +
>> +/**
>> * eeh_pe_loc_get - Retrieve location code binding to the given PE
>> * @pe: EEH PE
>> *
>> @@ -807,6 +829,7 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
>> struct pci_dev *pdev;
>> struct device_node *dn;
>> const char *loc;
>> + int bdevfn;
>>
>> if (!bus)
>> return "N/A";
>> @@ -823,7 +846,9 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
>> if (loc)
>> return loc;
>>
>> - pdev = pci_get_slot(bus, 0x0);
>> + /* Get the root port */
>> + bdevfn = (bus->number) << 8 || 0x0;
>> + pdev = traverse_pci_devices(hose->dn, __dn_get_pdev, &bdevfn);
> We needn't search pdev from device-tree and then translate it to
> device-node. Root port is the only child hooked to root bus's
> device node (it's also PHB's device-node). So I guess you can
So here you mean, child hooked device node(root port) can be
hose->dn->child?
> just have something:
>
> /* Check PHB's device node */
> dn = pci_bus_to_OF_node(bus);
> if (unlikely(!dn)) {
> loc = "N/A";
> goto out;
> }
> loc = of_get_property(hose->dn,
> "ibm,loc-code", NULL);
> if (loc)
> return loc;
> loc = of_get_property(hose->dn,
> "ibm,io-base-loc-code", NULL);
> if (loc)
> return loc;
>
> /* Check root port */
> dn = dn->child;
>> } else {
>> pdev = bus->self;
> Here, we needn't grab the bridge as well:
>
> dn = pci_bus_to_OF_node(bus);
>> }
> Then check the device-node of bridge (or root port):
>
> if (unlikely(!dn)) {
> loc = "N/A";
> goto out;
> }
>
> loc = of_get_property(dn, "ibm,loc-code", NULL);
> if (!loc)
> loc = of_get_property(dn, "ibm,slot-location-code", NULL);
> if (!loc)
> loc = "N/A";
>
>> @@ -846,8 +871,6 @@ const char *eeh_pe_loc_get(struct eeh_pe *pe)
>> loc = "N/A";
>>
>> out:
>> - if (pci_is_root_bus(bus) && pdev)
>> - pci_dev_put(pdev);
>> return loc;
>> }
I will make a new patch, after tested, I will resend it out
Thanks
Mike
> Thanks,
> Gavin
>
More information about the Linuxppc-dev
mailing list