[PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)

Fri Feb 25 01:56:56 EST 2005

Linas Vepstas wrote:
>>>+static void ipr_eeh_frozen (struct pci_dev *pdev, void * data)
>>>+{
>>>+	struct ipr_ioa_cfg *ioa_cfg = data;
>>
>>Probably don't need the second arg - void * data. You can get the
>>ioa_cfg pointer with pci_get_drvdata(pdev)
> 
> 
> OK, well, that's worth general discussion.  For the IPR, no,
> it may not be needed.  For general C-based OO style coding,
> this is the standard style for passing "user data" aka 
> "pointer to 'self'" aka "pointer to 'this'".  I notice that 
> more and more OO style is creeping into the kernel, and so 
> I thought I'd add the standard convention for this here.  
> 
> I note that this is *not* the convention currently used in 
> struct pci_driver; it uses the "pci_get_drvdata(pdev)" style,
> but that is frowned upon in standard OO-style circles.

I think we will all agree we are not working in a standard OO-style
circle;)

>>>+static void ipr_eeh_perm_failure (struct pci_dev *pdev, void * data)
>>>+{
>>>+	ipr_cmd->job_step = ipr_reset_shutdown_ioa;
>>
>>This needs to "bringdown" the adapter, but not actually touch it. Basically
>>stuff like unblocking requests so that we fail them instead of hanging, etc.
> 
> 
> Yes.  Actually, right after this, I unconfig the pci slot,
> which calls pci_remove_bus_device() .. pci_destroy_dev() .. 
> and eventually pci_driver->remove() and so IPR finds out about this
> sooner or later anyway.  The goal of this function was to provide
> an alternate and/or earlier warning that the device is going away.
> 
> Maybe its superfluous.  I tend to add callbacks like this because
> I know that sooner or later someone will want one ... 

Ok. I wasn't sure about that. It in that case, I think its fine to leave
the callback in. I still think the perm_failure handler should do
the bringdown in ipr, however.

>>>+#else
>>>+	if (!ipr_reset_allowed(ioa_cfg) && ipr_cmd->u.time_left 
>>>+	    && !eeh_slot_is_isolated (ioa_cfg->pdev)) {
>>>+			  
>>>+		ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT;
>>>+		ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT);
>>>+	} else {
>>>+		if (eeh_slot_is_isolated (ioa_cfg->pdev)) {
>>>+			ipr_cmd->job_step = ipr_reset_poll_eeh_recovery;
>>>+		} else {
>>>+			ipr_cmd->job_step = ipr_reset_start_bist;
>>>+		}
>>>+		rc = IPR_RC_JOB_CONTINUE;
>>>+	}
>>>+#endif
>>
>>Not sure what you are trying to accomplish with this bit of code. Does
>>not seem necessary to me.
> 
> 
> I want to make sure that the IPR driver holds off from starting the BIST
> until after the PCI slot has been reset.  The callback notification
> given above is async to the detection of the EEH error, and so the IPR
> driver may have already decided that something is wrong, and started a
> bist, before the "slot is frozen" callback arrives.  So I wanted to
> prevent this.
> 
> In other words, the IPR "bist" code should make sure that EEH is not to
> blame, and, if it is, wait for the EEH error to clear before continuing
> with the reset.

I still don't think it is needed, given the changes I suggested in my previous
mail. With the changes I proposed, ipr could detect a problem and start an adapter
reset before the freeze callback arrives, but that should do no harm. When
the freeze callback does come, it will start a new reset job and the old
one will get aborted.

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center