[Skiboot] [PATCH 1/2] npu2: Rework phb-index assignments for virtual PHBs

Frederic Barrat fbarrat at linux.ibm.com
Fri Aug 2 23:37:02 AEST 2019



Le 02/08/2019 à 10:10, Oliver O'Halloran a écrit :
> On Thu, 2019-08-01 at 10:54 +0200, Frederic Barrat wrote:
>> Until now, opencapi PHBs were not using the 'ibm,phb-index' property,
>> as it was thought unnecessary. For nvlink, a phb-index was associated
>> to the npu when parsing hdat data, and the nvlink PHB was reusing the
>> same value.
>>
>> It turns out it helps to have the 'ibm,phb-index' property for
>> opencapi PHBs after all. Otherwise it can lead to wrong results on
>> platforms like mihawk when trying to match entries in the slot
>> table. We end up with an opencapi device inheriting wrong properties
>> in the device tree, because match_slot_phb_entry() default to
>> phb-index 0 if it cannot find the property. Though it doesn't seem to
>> cause any harm, it's wrong and a future patch is expected to start
>> using the slot table for opencapi, so it needs fixing.
> 
> Yeah, that sounds broken.
> 
>> The twist is that with opencapi, we can have multiple virtual PHBs for
>> a single NPU, at least on P9. Therefore there's no 1-to-1 mapping
>> between the NPU and PHB index. It doesn't really make sense to
>> associate a phb-index to a npu.
> 
> This isn't all that different to how the actual PHBs work. The PCI
> equivilent of an NPU would be a PEC (PowerBus<->PHB glue) and PECs can
> support multiple PHBs behind them. For P9 the structure is:
> 
> PEC0:
> 	PHB0
> PEC1:
> 	PHB1
> 	PHB2
> PEC3:
> 	PHB3
> 	PHB4
> 	PHB5


Yes, what we do for opencapi deviates from nvlink (1 PHB per link as 
opposed to 1 per NPU). That's unfortunate and confusing but 1) we had to 
because of hardware considerations related to BAR assignments and 2) the 
opencapi layout is the similar to real PCIs as you describe above, so we 
didn't feel too bad about it.


> We use the phb-index for normal PHBs to calculate what the OPAL PHB ID
> (linux domain number) should be for that PHB. The virtual PHBs will
> register the first number available rather than claiming a fixed number
> based on the phb-index and I think this is a problem.
> 
> 
> Currently we use fixed numbers for real PHBs based on this:
> 
>> #define PHB4_PER_CHIP 6 /* Max 6 PHBs per chipon p9 */
>>
>> static inline int phb4_get_opal_id(unsigned int chip_id, unsigned int > index)
>> {
>>          return chip_id * PHB4_PER_CHIP + index;
>> }
> 
> and:
> 
>> pci_register_phb(&p->phb, phb4_get_opal_id(p->chip_id, p->index));
> 
> While the virtual PHBs use:
> 
> pci_register_phb(&npu->phb_nvlink, OPAL_DYNAMIC_PHB_ID);
> 
> So the virtual PHBs take the first PHB ID that's free. On most systems
> we use chip-as-node mode so the 2nd chip has chip ID 8 and the virtual
> PHBs use PHB IDs 7 or 8.
> 
> Now, if we want to use fixed phb-index values for the virtual PHBs then
> I'm all for it, but right now we don't have room to add a set of
> virtual PHBs to each chip without changing the PCI domain number for
> all the real PHBs. Changing the domain number also changes the systemd
> persistent device naming of network adapters so I don't think we can
> change any of this for currently GAed platforms, but it's something I'd
> like to sort out for Axone.
> 
> If we're going to have a fixed phb-index for the virtual PHBs too, I
> think you need to work out how that's going to map to a fixed ibm,opal-
> phbid too.


I agree we should shoot for fixed phb-index and opal-phbid, as it makes 
things simpler. It's also a bit troubling to see those virtual PHBs with 
low opal IDs, when they are related to devices plugged to the 2nd chip.

For Axone, it seems doable: 6 PHBs + 3 NPUs per chip (I'm not aware of 
PCI changes, if any, for Axone. Is it still 3 PECs and 6 PHBs per 
chip?). In the recently merged npu3 patches, we have a 1-to-1 mapping 
between NPU and PHB for nvlink, and the upcoming opencapi patches will 
keep it that way.

For P9, it becomes ugly really fast if we want to preserve compatibility 
with current phb/domain IDs seen by userland (and we should). At the 
same time, while desirable, I don't think we really *have to* have fixed 
phbid either. On a given platform, the dynamic IDs end up always being 
the same anyway.
So if I do the right thing for Axone, but keep things as is (with this 
patch) on P9, would that fly?

   Fred





More information about the Skiboot mailing list