[Skiboot] [RFC PATCH] pci: Change bus number assignment policy

Oliver O'Halloran oohall at gmail.com
Tue Mar 12 17:57:24 AEDT 2019


Change the way we assign bus numbers so that rather than assigning
bus numbers densely, we spread the available bus numbers across the
downstream bridges of a given topology.

For PCIe topologies with a wide-fan out this allows hot-plugging
large numbers of devices into a downstream port near the top of the
PCIe topology.

This patch changes the bus number assignment policy to evenly split the
available bus numbers across the bridges on bus. This allows us to
support hot-adding of large numbers of devices near the root of the
topology. Currently we only assign a handful of spare buses to each
hotplug port so if a large number of devices are added at the root
we won't have enough spare bus numbers for them.

e.g. On a zaius system with a two-switch (HBA switch and drawer switch) topology:

Before:

-+-[0030:00]---00.0-[01-40]----00.0-[02-40]--+-04.0-[03-1e]----00.0-[04-1e]--+-08.0-[05]----00.0
 |                                           |                               +-09.0-[06]----00.0
 |                                           |                               +-0a.0-[07-0b]--
 |                                           |                               +-0b.0-[0c-10]--
 |                                           |                               +-10.0-[11]----00.0
 |                                           |                               +-11.0-[12]----00.0
 |                                           |                               +-12.0-[13-17]--
 |                                           |                               +-13.0-[18-1c]--
 |                                           |                               +-15.0-[1d]----00.0
 |                                           |                               \-16.0-[1e]----00.0
 |                                           +-05.0-[1f-36]----00.0-[20-36]--+-04.0-[21]----00.0
 |                                           |                               +-05.0-[22]----00.0
 |                                           |                               +-06.0-[23]----00.0
 |                                           |                               +-07.0-[24]----00.0
 |                                           |                               +-0c.0-[25-29]--
 |                                           |                               +-0d.0-[2a]----00.0
 |                                           |                               +-0e.0-[2b-2f]--
 |                                           |                               +-0f.0-[30]----00.0
 |                                           |                               +-14.0-[31-35]--
 |                                           |                               \-17.0-[36]----00.0
 |                                           +-06.0-[37-3b]--
 |                                           \-07.0-[3c-40]--

After:

-+-[0030:00]---00.0-[01-fe]----00.0-[02-fd]--+-04.0-[03-41]----00.0-[04-40]--+-08.0-[05-0a]----00.0
 |                                           |                               +-09.0-[0b-10]----00.0
 |                                           |                               +-0a.0-[11-16]--
 |                                           |                               +-0b.0-[17-1c]--
 |                                           |                               +-10.0-[1d-22]----00.0
 |                                           |                               +-11.0-[23-28]----00.0
 |                                           |                               +-12.0-[29-2e]--
 |                                           |                               +-13.0-[2f-34]--
 |                                           |                               +-15.0-[35-3a]----00.0
 |                                           |                               \-16.0-[3b-40]----00.0
 |                                           +-05.0-[42-80]----00.0-[43-7f]--+-04.0-[44-49]----00.0
 |                                           |                               +-05.0-[4a-4f]--
 |                                           |                               +-06.0-[50-55]----00.0
 |                                           |                               +-07.0-[56-5b]----00.0
 |                                           |                               +-0c.0-[5c-61]--
 |                                           |                               +-0d.0-[62-67]----00.0
 |                                           |                               +-0e.0-[68-6d]--
 |                                           |                               +-0f.0-[6e-73]----00.0
 |                                           |                               +-14.0-[74-79]--
 |                                           |                               \-17.0-[7a-7f]----00.0
 |                                           +-06.0-[81-bf]--
 |                                           \-07.0-[c0-fe]--

This does however have the disadvantage that we can't really support
deep rather than wide topologies. In the above example you can see that
when we hit the lowest level there is only 5 buses available per port,
so if we had an architecture where downstream storage was daisy-chained
we would run out bus numbers pretty quickly.

These are solvable problems, but I figure I should see what people think
before spending a lot of time on this.

Not-Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
Cc: Sergey Miroshnichenko <s.miroshnichenko at yadro.com>
---
Sergey, would something like this work for the NVMe drawers you've been
working with? I think we'll need to support bus-number reassignment at
eventually, but if we could kick that can down the road a bit it'd be
helpful.
---
 core/pci.c | 44 ++++++++++++++++----------------------------
 1 file changed, 16 insertions(+), 28 deletions(-)

diff --git a/core/pci.c b/core/pci.c
index 454b50102e59..d5537b6a376b 100644
--- a/core/pci.c
+++ b/core/pci.c
@@ -743,9 +743,10 @@ uint8_t pci_scan_bus(struct phb *phb, uint8_t bus, uint8_t max_bus,
 		     bool scan_downstream)
 {
 	struct pci_device *pd = NULL, *rc = NULL;
-	uint8_t dev, fn, next_bus, max_sub, save_max;
+	uint8_t dev, fn, next_bus, max_sub;
 	uint32_t scan_map;
 	bool use_max;
+	int bridges = 0, buses_per_bridge;
 
 	/* Decide what to scan  */
 	scan_map = parent ? parent->scan_map : phb->scan_map;
@@ -810,7 +811,18 @@ uint8_t pci_scan_bus(struct phb *phb, uint8_t bus, uint8_t max_bus,
 
 	next_bus = bus + 1;
 	max_sub = bus;
-	save_max = max_bus;
+
+	list_for_each(list, pd, link)
+		if (pd->is_bridge)
+			bridges++;
+
+	buses_per_bridge = max_bus - next_bus - 1;
+	if (bridges)
+		buses_per_bridge /= bridges;
+
+	PCIERR(phb, pd->bdfn, "found %d [%x:%x] downstream bridges, %sscanning down, %d\n",
+		bridges, next_bus, max_bus,  scan_downstream ? "" : "not ",
+		buses_per_bridge);
 
 	/* Scan down bridges */
 	list_for_each(list, pd, link) {
@@ -819,32 +831,8 @@ uint8_t pci_scan_bus(struct phb *phb, uint8_t bus, uint8_t max_bus,
 		if (!pd->is_bridge)
 			continue;
 
-		/* We need to figure out a new bus number to start from.
-		 *
-		 * This can be tricky due to our HW constraints which differ
-		 * from bridge to bridge so we are going to let the phb
-		 * driver decide what to do. This can return us a maximum
-		 * bus number to assign as well
-		 *
-		 * This function will:
-		 *
-		 *  - Return the bus number to use as secondary for the
-		 *    bridge or 0 for a failure
-		 *
-		 *  - "max_bus" will be adjusted to represent the max
-		 *    subordinate that can be associated with the downstream
-		 *    device
-		 *
-		 *  - "use_max" will be set to true if the returned max_bus
-		 *    *must* be used as the subordinate bus number of that
-		 *    bridge (when we need to give aligned powers of two's
-		 *    on P7IOC). If is is set to false, we just adjust the
-		 *    subordinate bus number based on what we probed.
-		 *
-		 */
-		max_bus = save_max;
-		next_bus = phb->ops->choose_bus(phb, pd, next_bus,
-						&max_bus, &use_max);
+		use_max = 1;
+		max_bus = next_bus + buses_per_bridge;
 
 		/* Configure the bridge with the returned values */
 		if (next_bus <= bus) {
-- 
2.20.1



More information about the Skiboot mailing list