Gothic horrors in pci_dn.c

Benjamin Herrenschmidt benh at kernel.crashing.org
Thu Oct 7 18:30:09 EST 2004


Hi !

To all those who had to deal with the guts of the PCI layer on ppc64,
I'd like your comments about these and what do you think I may break.

Currently, the code in pci_dn.c does 2 things articulated around a
single function. That function is traverse_pci_devices() and is supposed
to traverse the PCI tree exposed by Open Firmware and call back a
function passed as an argument for each node in the tree.

The 2 things it does are

 - Setting up the "devfn" and "busno" fields of the device nodes in the
tree in an initial traversal pass at boot
 - "Finding" a device node for a given pci_dev at any time

However, the current code does a number of assumptions and is bogus in
many cases. Among the issues are:

 - The tree traversal goes all the way down the tree only skipping
things that don't have a "class code". That means potentially walking on
subtrees of a PCI device that aren't PCI themselves (USB ? FireWire ?)
and we have no guarantee that those busses have no "class-code"
property, though we are sure to misinterpret anything we find here.

 -  We try to manipulate host bridge nodes as if they were PCI devices,
which leads us to various funny and totally bogus special cases. First,
in update_dn_pci_info(), where we have an "intersting" (at least)
heuristic to find out if a node if a host bridge or not, with an
horrible special case for avoiding setting the devfn 0 on U3 on blades,
and then we "use" those devfn and busno of the host bridge property in
is_devfn_node() later on when trying to match which is why we have to do
the above bogus workaround.

 - Our firmware (and Apple's too in some cases) is broken in the sense
that it doesn't show the host bridge in the tree as a PCI device. Host
bridges that are themselves visible as devices on their own PCI bus
should have an additional node in the PCI domain named "host" that
represent them.

The solution to this however is very simple, but I need to make sure I
won't break anything else by doing so. It's based on a few facts:

 - The "node" of the host bridge is _NOT_ a PCI node, and thus should
not be traversed by traverse_pci_devices(). This is very easy to do
without any assumption due to the way this function works, just remove 2
lines near the beginning before the for loop.

 - The result for the update_dn_pci_info() pass is that we can rip off
the workaround completely. busno and devfn in the host bridge node are
undefined and that how they should be as they won't be traversed. There
is no "driver" for the host bridge that should make use of them.

 - Same thing with is_devfn_node().

 - We initialize "sysdata" of all pci_dev to point the the host bridge
by default. So if the host bridge happens to have an associated pci_dev,
and no "specific" node (as explained above), then we'll point to the
root node of that pci tree which is exactly what we want, cool !

 - Now the only remaining problem is the test 

	if (dn->devfn == dev->devfn && dn->busno == (dev->bus->number&0xff))

Which will result in incorrect result if the host bridge has undefined
(and typically 0) values in devfn and busno fields and the device we are
looking for happens to really be 0:00.0. This is fixed by forcing those
fields on all PHB nodes to -1. (No special U3 case, all of them).

Here's a patch (untested, it's getting late here) implementing those, I
need to know if it will work at all. Comments welcome :)

Note to Anton & Milton: Pretty much nothing relies anymore on the device
nodes for PCI devices to exist. The only mandatory ones are PHBs, but you
can easily statically lay them out in a static device-tree blob for BM.
By default, all pci_dev point to the PHB. I have a couple of fixes coming
in for u3_iommu to properly setup iommu_table for PHB nodes (it forgot to
do it) and I confirm it works with no OF nodes for the devices themselves.
Config space accesses never need the OF node neither except when you have
RTAS, but then you don't care since you have real nodes for everything.
I added a simple helper to my tree (will be pushed after 2.6.9) that gives
you the pci_controller* from the pci_dev* without doing a full device-tree
walk, and I use that for pmac & maple. You should do the same for PM.

Ben.

===== arch/ppc64/kernel/pci_dn.c 1.18 vs edited =====
--- 1.18/arch/ppc64/kernel/pci_dn.c	2004-10-05 17:24:47 +10:00
+++ edited/arch/ppc64/kernel/pci_dn.c	2004-10-07 18:35:41 +10:00
@@ -46,28 +46,13 @@
 {
 	struct pci_controller *phb = data;
 	u32 *regs;
-	char *device_type = get_property(dn, "device_type", NULL);
-	char *model;
 
 	dn->phb = phb;
-	if (device_type && (strcmp(device_type, "pci") == 0) &&
-			(get_property(dn, "class-code", NULL) == 0)) {
-		/* special case for PHB's.  Sigh. */
-		regs = (u32 *)get_property(dn, "bus-range", NULL);
-		dn->busno = regs[0];
-
-		model = (char *)get_property(dn, "model", NULL);
-		if (model && strstr(model, "U3"))
-			dn->devfn = -1;
-		else
-			dn->devfn = 0;	/* assumption */
-	} else {
-		regs = (u32 *)get_property(dn, "reg", NULL);
-		if (regs) {
-			/* First register entry is addr (00BBSS00)  */
-			dn->busno = (regs[0] >> 16) & 0xff;
-			dn->devfn = (regs[0] >> 8) & 0xff;
-		}
+	regs = (u32 *)get_property(dn, "reg", NULL);
+	if (regs) {
+		/* First register entry is addr (00BBSS00)  */
+		dn->busno = (regs[0] >> 16) & 0xff;
+		dn->devfn = (regs[0] >> 8) & 0xff;
 	}
 	return NULL;
 }
@@ -96,20 +81,25 @@
 	struct device_node *dn, *nextdn;
 	void *ret;
 
-	if (pre && ((ret = pre(start, data)) != NULL))
-		return ret;
+	/* We started with a phb, iterate all childs */
 	for (dn = start->child; dn; dn = nextdn) {
+		u32 *classp, class;
+
 		nextdn = NULL;
-		if (get_property(dn, "class-code", NULL)) {
-			if (pre && ((ret = pre(dn, data)) != NULL))
-				return ret;
-			if (dn->child)
-				/* Depth first...do children */
-				nextdn = dn->child;
-			else if (dn->sibling)
-				/* ok, try next sibling instead. */
-				nextdn = dn->sibling;
-		}
+		classp = (u32 *)get_property(dn, "class-code", NULL);
+		class = classp ? *classp : 0;
+
+		if (pre && ((ret = pre(dn, data)) != NULL))
+			return ret;
+
+		/* If we are a PCI bridge, go down */
+		if (dn->child && (class >> 8) == PCI_CLASS_BRIDGE_PCI &&
+		    (class >> 8) == PCI_CLASS_BRIDGE_CARDBUS)
+			/* Depth first...do children */
+			nextdn = dn->child;
+		else if (dn->sibling)
+			/* ok, try next sibling instead. */
+			nextdn = dn->sibling;
 		if (!nextdn) {
 			/* Walk up to next valid sibling. */
 			do {
@@ -123,21 +113,6 @@
 	return NULL;
 }
 
-/*
- * Same as traverse_pci_devices except this does it for all phbs.
- */
-static void *traverse_all_pci_devices(traverse_func pre)
-{
-	struct pci_controller *phb, *tmp;
-	void *ret;
-
-	list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
-		if ((ret = traverse_pci_devices(phb->arch_data, pre, phb))
-				!= NULL)
-			return ret;
-	return NULL;
-}
-
 
 /*
  * Traversal func that looks for a <busno,devfcn> value.
@@ -147,6 +122,7 @@
 {
 	int busno = ((unsigned long)data >> 8) & 0xff;
 	int devfn = ((unsigned long)data) & 0xff;
+
 	return ((devfn == dn->devfn) && (busno == dn->busno)) ? dn : NULL;
 }
 
@@ -173,10 +149,8 @@
 
 	phb_dn = phb->arch_data;
 	dn = traverse_pci_devices(phb_dn, is_devfn_node, (void *)searchval);
-	if (dn) {
+	if (dn)
 		dev->sysdata = dn;
-		/* ToDo: call some device init hook here */
-	}
 	return dn;
 }
 EXPORT_SYMBOL(fetch_dev_dn);
@@ -188,8 +162,16 @@
  */
 void __init pci_devs_phb_init(void)
 {
+	struct pci_controller *phb, *tmp;
+
 	/* This must be done first so the device nodes have valid pci info! */
-	traverse_all_pci_devices(update_dn_pci_info);
+	list_for_each_entry_safe(phb, tmp, &hose_list, list_node) {
+		struct device_node * dn = (struct device_node *) phb->arch_data;
+		/* PHB nodes themselves must not match */
+		dn->devfn = dn->busno = -1;
+		dn->phb = phb;
+		traverse_pci_devices(phb->arch_data, update_dn_pci_info, phb);
+	}
 }
 
 





More information about the Linuxppc64-dev mailing list