[RFC/PATCH] Re: How to block pci config-reads during device self-test?
Linas Vepstas
linas at denx.de
Wed Jul 14 09:03:28 EST 2004
Hi,
On Thu, Jul 08, 2004 at 11:36:28AM -0500, Brian King was heard to remark:
> I've been doing some talking with various hardware folks to see if
> there is a way to get this fixed and so far the answer I have been
> getting is no. It is how the hardware works and there isn't much that
> can be done about it ...
Actually, the hardware isn't broken ... its working as designed, more
or less per pci spec. See below for techninical details.
> I don't like the idea of "learning to live with this". People do
> run into this problem and telling them it can't be fixed is not an
> acceptable answer.
>
> linas at austin.ibm.com wrote:
> >
> >Am having trouble with PCI config-space reads ... I have a device
> >(actualy Brian King has it) that can perform a built-in-self test
> >(BIST). However, if anything does a PCI config-read during BIST, then
> >the device does something crazy that makes the PCI controller chip
> >take it offline.
actually, the device didn't do anything 'crazy' actually, it just
sat there silently, ignoring the i/o.
> >I'm not sure what's doing the config-spcae reads ... seems
> >to be some user-space tool or daemon. I'm wondering if
> >there is any practical way to block such reads to a given
> >device until its self-test sequence is completed. I could
> >try to modify the architecture-specific pci files to do this
> >(arch/ppc64/kernel/pSeries_pci.c) but this seems a tad ugly ... is
> >there another way? or do we have to just learn to live with this
> >hardware?
Here's a patch w/ a recap description of the problem.
The following patch addresses a peculiar PCI i/o problem.
This patch is specific to ppc64, although one could argue that
a better patch should probably be arch indep. But there's also
a different way to fix this problem, which is why this is an RFC.
The problem:
Some device drivers use the PCI BIST (built-in self-test) to reset
the PCI card if the card hangs in some error condition. While the
BIST is running, the card, almost by definiition, is unable to respond
to pci i/o. If the pci controller tries to do any i/o, the card simply
won't respond, and the pci controller will do a 'master abort' a few
pci bus cycles later. This is normal and expected behaviour for both
card and the controller.
The problem that I'm having is that I have a pci controller that
thinks master-abort==dead card, and so offlines the card. Which
is actually reasonable too ... no one should be doing i/o during BIST
anyway. Unfortunately various user-space daemons & tools can
do pci config-space i/o at anytime (typically to perform a hardware
inventory) ... and invariably one of these user-land tools runs while
the card is doing a BIST; the pci controller then off-lines this "dead
card", and we are unhappy.
There are two possible solutions. (1) block pci config-space i/o
during a BIST, and (2) let the device driver detect that the card
has been off-lined, and so reset the pci controller chip.
I'm starting to think (2) is better, but this patch implements (1).
Opinions? Comments?
The patch below uses rw semaphores to block i/o access during
BIST. All i/o access is locked with a read semaphore: thus, access
remains fast-n-cheap. If the driver needs to BIST, it will then need
to do a down_write()/up_write() surrounding the BIST.
--linas
-------------- next part --------------
===== arch/ppc64/kernel/pSeries_pci.c 1.39 vs edited =====
--- 1.39/arch/ppc64/kernel/pSeries_pci.c Mon Jul 12 18:29:16 2004
+++ edited/arch/ppc64/kernel/pSeries_pci.c Tue Jul 13 15:46:08 2004
@@ -76,14 +76,22 @@
addr = (dn->busno << 16) | (dn->devfn << 8) | where;
buid = dn->phb->buid;
+
+ if (in_interrupt()) {
+ /* Driver should unplug interrupts during BIST */
+ BUG_ON (down_read_trylock(&dn->bist_lock) != 1);
+ } else {
+ down_read (&dn->bist_lock);
+ }
if (buid) {
ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval,
addr, buid >> 32, buid & 0xffffffff, size);
} else {
ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size);
}
- *val = returnval;
+ up_read (&dn->bist_lock);
+ *val = returnval;
if (ret)
return PCIBIOS_DEVICE_NOT_FOUND;
@@ -129,11 +137,19 @@
addr = (dn->busno << 16) | (dn->devfn << 8) | where;
buid = dn->phb->buid;
+
+ if (in_interrupt()) {
+ /* Driver should unplug interrupts during BIST */
+ BUG_ON (down_read_trylock(&dn->bist_lock) != 1);
+ } else {
+ down_read (&dn->bist_lock);
+ }
if (buid) {
ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr, buid >> 32, buid & 0xffffffff, size, (ulong) val);
} else {
ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val);
}
+ up_read (&dn->bist_lock);
if (ret)
return PCIBIOS_DEVICE_NOT_FOUND;
===== arch/ppc64/kernel/prom.c 1.89 vs edited =====
--- 1.89/arch/ppc64/kernel/prom.c Tue Jul 13 15:26:58 2004
+++ edited/arch/ppc64/kernel/prom.c Tue Jul 13 16:03:01 2004
@@ -31,6 +31,7 @@
#include <linux/stringify.h>
#include <linux/delay.h>
#include <linux/initrd.h>
+#include <linux/rwsem.h>
#include <asm/prom.h>
#include <asm/rtas.h>
#include <asm/lmb.h>
@@ -3021,6 +3022,7 @@
kfree(np);
return err;
}
+ init_rwsem (&np->bist_lock);
write_lock(&devtree_lock);
np->sibling = np->parent->child;
===== include/asm-ppc64/prom.h 1.20 vs edited =====
--- 1.20/include/asm-ppc64/prom.h Thu May 27 20:37:20 2004
+++ edited/include/asm-ppc64/prom.h Tue Jul 13 15:45:28 2004
@@ -16,6 +16,7 @@
*/
#include <linux/proc_fs.h>
#include <asm/atomic.h>
+#include <asm/rwsem.h>
#define PTRRELOC(x) ((typeof(x))((unsigned long)(x) - offset))
#define PTRUNRELOC(x) ((typeof(x))((unsigned long)(x) + offset))
@@ -151,6 +152,12 @@
int busno; /* for pci devices */
int bussubno; /* for pci devices */
int devfn; /* for pci devices */
+
+ /* Lock to prevent 3rd-party PCI config space i/o while device
+ * driver is running BIST on the controller; driver should obtain
+ * down_write()/up_write() in this lock for the duration of BIST. */
+ struct rw_semaphore bist_lock;
+
#define DN_STATUS_BIST_FAILED (1<<0)
int status; /* Current device status (non-zero is bad) */
int eeh_mode; /* See eeh.h for possible EEH_MODEs */
More information about the Linuxppc64-dev
mailing list