eeh bug

Benjamin Herrenschmidt benh at kernel.crashing.org
Thu May 17 14:59:06 EST 2007


On Thu, 2007-05-17 at 14:46 +1000, Benjamin Herrenschmidt wrote:
> Hi Linas !
> 
> While debugging some other issues, I had a couple of oopses caused by
> what looks like a bug in EEH:
> 
> When an RTAS PCI config space call returns all f's, we do an EEH error
> check by calling eeh_dn_check_failure(pdn->node, NULL);
> 
> The problem is that second argument... NULL for the pci_dev *. It looks
> like the EEH code will try to printk pci_name of that and later on
> dereference it within eehd, thus causing an oops.

Ok, so I just added a

	if (dev == NULL)
		dev = pdn->pcidev;

to eeh_dn_check_failure(), and that fixes one of the NULL dereferences
(the name printing), but I get another one a bit later, in
pci_find_capability called from eeh_slot_error_detail called from
handle_eeh_events (probably in gather_pci_data).
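A minimal userspace sketch of that fallback, assuming simplified stand-in structs rather than the real kernel definitions: `struct pci_dev` with only a `name` field, and `struct pci_dn` with `pcidev` plus a hypothetical `node_name`. The helper `eeh_label()` is likewise hypothetical. It also covers the case raised further down, where `pdn->pcidev` may itself be NULL (e.g. at a bridge node), by falling back to the node name instead of dereferencing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's struct pci_dev / struct pci_dn. */
struct pci_dev { const char *name; };
struct pci_dn  { struct pci_dev *pcidev; const char *node_name; };

/* Pick something printable: the caller's pci_dev if given, else the
 * pci_dev cached in the pci_dn, else the device-node name, so that a
 * NULL pcidev (e.g. on a bridge) never gets dereferenced. */
static const char *eeh_label(struct pci_dev *dev, const struct pci_dn *pdn)
{
	assert(pdn != NULL);
	if (dev == NULL)
		dev = pdn->pcidev;      /* the fallback added above */
	if (dev != NULL)
		return dev->name;
	return pdn->node_name;      /* still no pci_dev: use the node name */
}
```

Any later code that needs a real pci_dev (such as pci_find_capability) would need the same kind of check, or a path that does not depend on a pci_dev at all.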

One thing that looks suspicious is that just before that I see:

EEH: of node=/pci/@8000000200000d3/pci@2,4

That is not a device but the bridge above it. I'm not sure why; maybe we
have a NULL pdn->pcidev at that level. We should probably not use
pci_find_capability in that code anyway, and instead implement our own
version using RTAS in case we don't have a pci_dev around, don't you think?
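A rough sketch of what such a pci_dev-free lookup could look like, assuming a hypothetical byte-read callback (`cfg_read8_fn`) that, in the kernel, would wrap an RTAS config-space read keyed by the device node. All names here are illustrative, not existing kernel API. It walks the standard PCI capability list for type 0/1 headers (status bit 4, list pointer at 0x34, ID/next byte pairs) and bails out on an all-f's read, which is exactly what a frozen slot returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: returns 0 on success, nonzero on failure. */
typedef int (*cfg_read8_fn)(void *ctx, int where, uint8_t *val);

#define CFG_STATUS      0x06   /* low byte of the 16-bit status register */
#define CFG_STATUS_CAP  0x10   /* "capabilities list present" bit */
#define CFG_CAP_PTR     0x34   /* capability pointer (type 0/1 headers) */
#define CAP_TTL         48     /* loop guard against malformed lists */

/* Return the config offset of capability cap_id, or 0 if not found. */
static int cfg_find_capability(cfg_read8_fn read8, void *ctx, uint8_t cap_id)
{
	uint8_t status, pos, id;
	int ttl = CAP_TTL;

	assert(read8 != NULL);
	if (read8(ctx, CFG_STATUS, &status) || !(status & CFG_STATUS_CAP))
		return 0;
	if (read8(ctx, CFG_CAP_PTR, &pos))
		return 0;
	while (ttl-- > 0) {
		pos &= 0xFC;            /* low two bits are reserved */
		if (pos < 0x40)         /* must point past the standard header */
			break;
		if (read8(ctx, pos, &id))
			break;
		if (id == 0xFF)         /* all f's: slot likely frozen by EEH */
			break;
		if (id == cap_id)
			return pos;
		if (read8(ctx, pos + 1, &pos))  /* follow the "next" pointer */
			break;
	}
	return 0;
}

/* Example accessor backed by a 256-byte buffer, handy for testing. */
static int mem_read8(void *ctx, int where, uint8_t *val)
{
	const uint8_t *cfg = ctx;

	if (where < 0 || where > 0xFF)
		return -1;
	*val = cfg[where];
	return 0;
}
```

Passing the accessor as a callback keeps the walk independent of whether a pci_dev exists; the EEH code could then hand it an RTAS-backed reader for the bare device node.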

Cheers,
Ben.





More information about the Linuxppc-dev mailing list