[Skiboot] [PATCH 1/2] npu2: Rework phb-index assignments for virtual PHBs

Fri Aug 2 18:10:00 AEST 2019

On Thu, 2019-08-01 at 10:54 +0200, Frederic Barrat wrote:
> Until now, opencapi PHBs were not using the 'ibm,phb-index' property,
> as it was thought unnecessary. For nvlink, a phb-index was associated
> to the npu when parsing hdat data, and the nvlink PHB was reusing the
> same value.
> 
> It turns out it helps to have the 'ibm,phb-index' property for
> opencapi PHBs after all. Otherwise it can lead to wrong results on
> platforms like mihawk when trying to match entries in the slot
> table. We end up with an opencapi device inheriting wrong properties
> in the device tree, because match_slot_phb_entry() default to
> phb-index 0 if it cannot find the property. Though it doesn't seem to
> cause any harm, it's wrong and a future patch is expected to start
> using the slot table for opencapi, so it needs fixing.

Yeah, that sounds broken.

> The twist is that with opencapi, we can have multiple virtual PHBs for
> a single NPU, at least on P9. Therefore there's no 1-to-1 mapping
> between the NPU and PHB index. It doesn't really make sense to
> associate a phb-index to a npu.

This isn't all that different to how the actual PHBs work. The PCI
equivilent of an NPU would be a PEC (PowerBus<->PHB glue) and PECs can
support multiple PHBs behind them. For P9 the structure is:

PEC0:
	PHB0
PEC1:
	PHB1
	PHB2
PEC3:
	PHB3
	PHB4
	PHB5

We use the phb-index for normal PHBs to calculate what the OPAL PHB ID
(linux domain number) should be for that PHB. The virtual PHBs will
register the first number available rather than claiming a fixed number
based on the phb-index and I think this is a problem.

Currently we use fixed numbers for real PHBs based on this:

> #define PHB4_PER_CHIP 6 /* Max 6 PHBs per chipon p9 */
> 
> static inline int phb4_get_opal_id(unsigned int chip_id, unsigned int > index)
> {
>         return chip_id * PHB4_PER_CHIP + index;
> }

and:

> pci_register_phb(&p->phb, phb4_get_opal_id(p->chip_id, p->index));

While the virtual PHBs use:

pci_register_phb(&npu->phb_nvlink, OPAL_DYNAMIC_PHB_ID);

So the virtual PHBs take the first PHB ID that's free. On most systems
we use chip-as-node mode so the 2nd chip has chip ID 8 and the virtual
PHBs use PHB IDs 7 or 8.

Now, if we want to use fixed phb-index values for the virtual PHBs then
I'm all for it, but right now we don't have room to add a set of
virtual PHBs to each chip without changing the PCI domain number for
all the real PHBs. Changing the domain number also changes the systemd
persistent device naming of network adapters so I don't think we can
change any of this for currently GAed platforms, but it's something I'd
like to sort out for Axone.

If we're going to have a fixed phb-index for the virtual PHBs too, I
think you need to work out how that's going to map to a fixed ibm,opal-
phbid too.