PCI Error Recovery API Proposal (updated)

long tlnguyen at snoqualmie.dp.intel.com
Fri Apr 8 06:00:34 EST 2005


On Wed Apr  6 17:45:42 2005 Benjamin Herrenschmidt wrote:
>> >	3) link_reset()
>> >
>> >	This is called after the link has been reset. This is typically
>> >a PCI Express specific state at this point and is done whenever a non-fatal
>> >error has been detected that can be "solved" by resetting the link. The
>> >driver is informed here of that reset and should check if the device
>> >appears to be in working condition. This function acts a bit like 2)
>> >error_recover(), that is it is not supposed to restart normal driver IO
>> >operations right away, just "probe" the device to check its
>> >recoverability status. If all is right, then the core will call
>> >error_restart() once all drivers have ack'd link_reset().
>> 
>> API 3) is not like error_recover(). It is basically PCI Express
>> specific and applies when a fatal error has been reported to the Root Port.
>> This fatal error can be "solved" by resetting the link at the upstream port
>> associated with the hierarchy in question. The upstream port driver is
>> informed here to reset its link and return it to a reliable state. After
>> the link reset completes, we go to 4) and 5). Please change your
>> description accordingly.
>
>Wait ... Once you have reset the link, you call 3). At this point, the
>card should be operational again, right? That is, the next callback
>should be 5), not 4). Unless the driver here decides it can't recover and
>needs a full hard reset of the slot (which is a different thing), and thus
>you end up power cycling the slot and go to 4).
>
>That is, in this regard, the action of a driver in 3) is similar to the
>action of a driver in "recover", in the sense that the link has been
>reset, the card might not have been (depending on whether the link reset
>triggers a card reset or not; this is device specific, and the driver will
>know what to do), and the driver can recover from it. The next step to
>expect is 5). Did I get something wrong?

Thanks for clarifying the callback after completion of link reset.
Regarding API 3): the AER code makes the API 3) callback to the upstream
Port driver to reset the link. Once the upstream Port driver completes the
link reset and returns PCIERR_RESULT_RECOVERED, meaning the link is fully
operational, the AER code makes the API 2) callback to the downstream device
driver. If the downstream driver returns PCIERR_RESULT_RECOVERED,
the next callback is API 5). This is my understanding of how things
should work; please correct me if I am mistaken.
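
To make the ordering concrete, here is a rough sketch of the sequence as
I understand it, written as plain user-space C rather than kernel code.
Only PCIERR_RESULT_RECOVERED and the API numbers come from this thread;
PCIERR_RESULT_FAILED, the step names, the stub callbacks, and
aer_fatal_sequence() are hypothetical names made up for illustration.
Falling back to 4) when either driver does not report recovery is an
assumption taken from the quoted text above, not something settled.

```c
#include <assert.h>
#include <stddef.h>

/* Only PCIERR_RESULT_RECOVERED appears in the thread; the other value is
 * a hypothetical placeholder for "anything other than recovered". */
enum pcierr_result {
    PCIERR_RESULT_RECOVERED,
    PCIERR_RESULT_FAILED        /* assumed name */
};

/* Which callback the core would issue next (names are illustrative). */
enum next_step {
    STEP_ERROR_RESTART,         /* API 5) */
    STEP_SLOT_RESET             /* API 4), assumed fallback */
};

typedef enum pcierr_result (*pcierr_cb)(void);

/* Stub driver callbacks used only to exercise the sketch. */
enum pcierr_result stub_recovered(void) { return PCIERR_RESULT_RECOVERED; }
enum pcierr_result stub_failed(void)    { return PCIERR_RESULT_FAILED; }

/*
 * Model of the fatal-error path described above:
 *   1. API 3) link_reset() on the upstream Port driver.
 *   2. If the link is operational, API 2) error_recover() on the
 *      downstream device driver.
 *   3. If that also reports recovery, the next callback is API 5).
 */
enum next_step aer_fatal_sequence(pcierr_cb port_link_reset,
                                  pcierr_cb dev_error_recover)
{
    assert(port_link_reset != NULL);
    assert(dev_error_recover != NULL);

    /* Upstream Port driver resets the link (API 3). */
    if (port_link_reset() != PCIERR_RESULT_RECOVERED)
        return STEP_SLOT_RESET;

    /* Downstream driver probes its device (API 2). */
    if (dev_error_recover() != PCIERR_RESULT_RECOVERED)
        return STEP_SLOT_RESET;

    /* Both drivers report recovery, so normal I/O can be restarted. */
    return STEP_ERROR_RESTART;
}
```

For example, aer_fatal_sequence(stub_recovered, stub_recovered) yields
STEP_ERROR_RESTART, while a failure from either stub yields
STEP_SLOT_RESET. Note that the downstream callback is never invoked if
the link reset itself does not report recovery.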

Thanks,
Long



More information about the Linuxppc64-dev mailing list