[PATCH/RFC] ppc64: EEH + SCSI recovery (IPR only)
Brian King
brking at us.ibm.com
Fri Feb 25 01:56:56 EST 2005
Linas Vepstas wrote:
>>>+static void ipr_eeh_frozen (struct pci_dev *pdev, void * data)
>>>+{
>>>+ struct ipr_ioa_cfg *ioa_cfg = data;
>>
>>Probably don't need the second arg - void * data. You can get the
>>ioa_cfg pointer with pci_get_drvdata(pdev)
>
>
> OK, well, that's worth general discussion. For the IPR, no,
> it may not be needed. For general C-based OO style coding,
> this is the standard style for passing "user data" aka
> "pointer to 'self'" aka "pointer to 'this'". I notice that
> more and more OO style is creeping into the kernel, and so
> I thought I'd add the standard convention for this here.
>
> I note that this is *not* the convention currently used in
> struct pci_driver; it uses the "pci_get_drvdata(pdev)" style,
> but that is frowned upon in standard OO-style circles.
I think we will all agree we are not working in a standard OO-style
circle;)
>>>+static void ipr_eeh_perm_failure (struct pci_dev *pdev, void * data)
>>>+{
>>>+ ipr_cmd->job_step = ipr_reset_shutdown_ioa;
>>
>>This needs to "bringdown" the adapter, but not actually touch it. Basically
>>stuff like unblocking requests so that we fail them instead of hanging, etc.
>
>
> Yes. Actually, right after this, I unconfig the pci slot,
> which calls pci_remove_bus_device() .. pci_destroy_dev() ..
> and eventually pci_driver->remove() and so IPR finds out about this
> sooner or later anyway. The goal of this function was to provide
> an alternate and/or earlier warning that the device is going away.
>
> Maybe its superfluous. I tend to add callbacks like this because
> I know that sooner or later someone will want one ...
Ok. I wasn't sure about that. It in that case, I think its fine to leave
the callback in. I still think the perm_failure handler should do
the bringdown in ipr, however.
>>>+#else
>>>+ if (!ipr_reset_allowed(ioa_cfg) && ipr_cmd->u.time_left
>>>+ && !eeh_slot_is_isolated (ioa_cfg->pdev)) {
>>>+
>>>+ ipr_cmd->u.time_left -= IPR_CHECK_FOR_RESET_TIMEOUT;
>>>+ ipr_reset_start_timer(ipr_cmd, IPR_CHECK_FOR_RESET_TIMEOUT);
>>>+ } else {
>>>+ if (eeh_slot_is_isolated (ioa_cfg->pdev)) {
>>>+ ipr_cmd->job_step = ipr_reset_poll_eeh_recovery;
>>>+ } else {
>>>+ ipr_cmd->job_step = ipr_reset_start_bist;
>>>+ }
>>>+ rc = IPR_RC_JOB_CONTINUE;
>>>+ }
>>>+#endif
>>
>>Not sure what you are trying to accomplish with this bit of code. Does
>>not seem necessary to me.
>
>
> I want to make sure that the IPR driver holds off from starting the BIST
> until after the PCI slot has been reset. The callback notification
> given above is async to the detection of the EEH error, and so the IPR
> driver may have already decided that something is wrong, and started a
> bist, before the "slot is frozen" callback arrives. So I wanted to
> prevent this.
>
> In other words, the IPR "bist" code should make sure that EEH is not to
> blame, and, if it is, wait for the EEH error to clear before continuing
> with the reset.
I still don't think it is needed, given the changes I suggested in my previous
mail. With the changes I proposed, ipr could detect a problem and start an adapter
reset before the freeze callback arrives, but that should do no harm. When
the freeze callback does come, it will start a new reset job and the old
one will get aborted.
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
More information about the Linuxppc64-dev
mailing list