[PATCH] Don't panic when EEH_MAX_FAILS is exceeded
Nathan Lynch
ntl at pobox.com
Mon Jul 21 06:33:33 EST 2008
Mike Mason wrote:
>
> This patch changes the EEH_MAX_FAILS action from panic to printing
> an error message. Panicking under under this condition is too
> harsh. Although performance will be affected and the device may not
> recover, the system is still running, which at the very least,
> should allow for a more graceful shutdown. The panic() is now
> wrapped in a DEBUG statement for development purposes. The patch
> also removes the msleep() within a spinlock, which is not allowed.
> @@ -509,18 +510,24 @@
For ease of review, please try to use diff -p to generate patches.
> rc = 1;
> if (pdn->eeh_mode & EEH_MODE_ISOLATED) {
> pdn->eeh_check_count ++;
> - if (pdn->eeh_check_count >= EEH_MAX_FAILS) {
> - printk (KERN_ERR "EEH: Device driver ignored %d bad reads, panicing\n",
> - pdn->eeh_check_count);
> + if (pdn->eeh_check_count % EEH_MAX_FAILS == 0) {
> + location = (char *) of_get_property(dn, "ibm,loc-code", NULL);
Unneeded cast here, I think.
> + printk (KERN_ERR "EEH: %d reads ignored for recovering device at "
> + "location=%s driver=%s pci addr=%s\n",
> + pdn->eeh_check_count, location,
> + dev->driver->name, pci_name(dev));
> + printk (KERN_ERR "EEH: Might be infinite loop in %s driver\n",
> + dev->driver->name);
> +#ifdef DEBUG
> dump_stack();
> - msleep(5000);
> -
> +
> /* re-read the slot reset state */
> if (read_slot_reset_state(pdn, rets) != 0)
> rets[0] = -1; /* reset state unknown */
>
> /* If we are here, then we hit an infinite loop. Stop. */
> panic("EEH: MMIO halt (%d) on device:%s\n", rets[0], pci_name(dev));
> +#endif
While I tend to agree that panic() is unnecessary, don't we want a
stack dump unconditionally (i.e. not bracketed in #ifdef DEBUG)?
I'd prefer just removing the code instead of adding #ifdef's in the
middle of this function. eeh.c needs less #ifdef DEBUG, not more :)
More information about the Linuxppc-dev
mailing list