eeh bug
Benjamin Herrenschmidt
benh at kernel.crashing.org
Thu May 17 14:59:06 EST 2007
On Thu, 2007-05-17 at 14:46 +1000, Benjamin Herrenschmidt wrote:
> Hi Linas !
>
> While debugging some other issues, I had a couple of oopses caused by
> what looks like a bug in EEH:
>
> When an RTAS PCI config space call returns all f's, we do an eeh error
> check by calling eeh_dn_check_failure(pdn->node, NULL);
>
> The problem is that second argument... NULL for the pci_dev *. It looks
> like the EEH code will try to printk pci_name of that and later on
> dereference it within eehd, thus causing an oops.
Ok, so I just added a

	if (dev == NULL)
		dev = pdn->pcidev;

to eeh_dn_check_failure(), and that fixes one of the NULL dereferences
(the name printing), but I get another one a bit later, in
pci_find_capability called from eeh_slot_error_detail called from
handle_eeh_events. (Probably in gather_pci_data).
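
For illustration, here is a standalone sketch of that fallback, plus a
NULL-safe name lookup for the printk side. The two structs are cut-down
stand-ins for the kernel types, and eeh_resolve_dev() and eeh_safe_name()
are hypothetical helper names, not existing EEH functions:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; only the fields
 * needed for this sketch are present. */
struct pci_dev { const char *name; };
struct pci_dn  { struct pci_dev *pcidev; };

/* Hypothetical helper: if the caller passed no pci_dev, fall back to
 * the one recorded in the pci_dn. The result can still be NULL, for
 * example when the node has no pci_dev attached. */
struct pci_dev *eeh_resolve_dev(struct pci_dn *pdn, struct pci_dev *dev)
{
	if (dev == NULL && pdn != NULL)
		dev = pdn->pcidev;
	return dev;
}

/* Hypothetical helper: a name that is safe to print even when no
 * pci_dev could be found. */
const char *eeh_safe_name(const struct pci_dev *dev)
{
	return dev ? dev->name : "<null>";
}
```

The point of the second helper is that the fallback alone is not enough:
every later user of the pointer still has to cope with it being NULL.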
One thing that looks suspicious is that just before that I see:
EEH: of node=/pci/@8000000200000d3/pci@2,4
Which is not a device but the bridge above it... not sure why, maybe we
have a NULL pdn->pcidev at that level. We should probably not use
pci_find_capability in that code anyway, and instead implement our own
version using RTAS for the case where we don't have a pci_dev around,
don't you think ?
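
Roughly what I have in mind, as a userspace sketch rather than a patch:
the standard capability-list walk, done through a config-read callback
instead of a pci_dev. The callback type, find_capability() and the
in-memory mem_cfg_read() backend are all made up for this example. In
the kernel the callback would presumably wrap the RTAS config read (I
believe that is rtas_read_config(), but check before relying on it). It
only handles type 0/1 headers, where the capability pointer is at 0x34.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_STATUS          0x06  /* PCI status register */
#define CFG_STATUS_CAP_LIST 0x10  /* "capabilities list present" bit */
#define CFG_CAP_PTR         0x34  /* capability pointer, type 0/1 headers */
#define CAP_WALK_TTL        48    /* bound the walk in case of a loop */

/* Hypothetical config accessor: returns 0 on success, non-zero on error. */
typedef int (*cfg_read_fn)(void *ctx, int where, int size, uint32_t *val);

/* Return the config-space offset of capability 'cap', or 0 if absent. */
int find_capability(cfg_read_fn cfg_read, void *ctx, int cap)
{
	uint32_t val;
	int pos, ttl = CAP_WALK_TTL;

	assert(cfg_read != NULL);

	if (cfg_read(ctx, CFG_STATUS, 2, &val) || !(val & CFG_STATUS_CAP_LIST))
		return 0;
	if (cfg_read(ctx, CFG_CAP_PTR, 1, &val))
		return 0;
	pos = val & 0xfc;

	while (pos >= 0x40 && ttl--) {
		/* An ID of 0xff means the read returned all f's, which is
		 * what a frozen slot would give us; stop there. */
		if (cfg_read(ctx, pos, 1, &val) || val == 0xff)
			return 0;
		if ((int)val == cap)
			return pos;
		if (cfg_read(ctx, pos + 1, 1, &val))
			return 0;
		pos = val & 0xfc;
	}
	return 0;
}

/* Example backend: little-endian reads from a 256-byte buffer. */
int mem_cfg_read(void *ctx, int where, int size, uint32_t *val)
{
	const uint8_t *cfg = (const uint8_t *)ctx;
	uint32_t v = 0;
	int i;

	if (where < 0 || size < 1 || size > 4 || where + size > 256)
		return -1;
	for (i = 0; i < size; i++)
		v |= (uint32_t)cfg[where + i] << (8 * i);
	*val = v;
	return 0;
}
```

With a buffer that chains a PM capability (ID 0x01) at 0x40 to a PCIe
capability (ID 0x10) at 0x50, find_capability(mem_cfg_read, buf, 0x10)
finds the second entry without ever touching a pci_dev.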
Cheers,
Ben.