[RFC] Efficiency of the phandle_cache on ppc64/SLOF
Segher Boessenkool
segher at kernel.crashing.org
Wed Dec 4 05:35:31 AEDT 2019
Hi!
On Tue, Dec 03, 2019 at 03:03:22PM +1100, Michael Ellerman wrote:
> Sebastian Andrzej Siewior <bigeasy at linutronix.de> writes:
> I've certainly heard it said that on some OF's the phandle was just ==
> the address of the internal representation, and I guess maybe for SLOF
> that is true.
It is (or was). In many OFs it is just the effective address of some
node structure. SLOF runs with translation off normally.
> They seem to vary wildly though, eg. on an Apple G5:
Apple OF runs with translation on usually. IIRC these are effective
addresses as well.
The OF they have on G5 machines is mostly 32-bit, for compatibility is my
guess (for userland things dealing with addresses from OF, importantly).
> $ find /proc/device-tree/ -name phandle | xargs lsprop | head -10
> /proc/device-tree/vsp@0,f9000000/veo@f9180000/phandle ff970848
> /proc/device-tree/vsp@0,f9000000/phandle ff970360
> /proc/device-tree/vsp@0,f9000000/veo@f9080000/phandle ff970730
> /proc/device-tree/nvram@0,fff04000/phandle ff967fb8
> /proc/device-tree/xmodem/phandle ff9655e8
> /proc/device-tree/multiboot/phandle ff9504f0
> /proc/device-tree/diagnostics/phandle ff965550
> /proc/device-tree/options/phandle ff893cf0
> /proc/device-tree/openprom/client-services/phandle ff8925b8
> /proc/device-tree/openprom/phandle ff892458
>
> That machine does not have enough RAM for those to be 32-bit real
> addresses. I think Apple OF is running in virtual mode though (?), so
> maybe they are pointers?
Yes, I think the default is to have 8MB ram at the top of 4GB (which is
the physical address of the bootrom, btw) for OF.
> And on an IBM pseries machine they're a bit all over the place:
>
> /proc/device-tree/cpus/PowerPC,POWER8@40/ibm,phandle 10000040
> /proc/device-tree/cpus/l2-cache@2005/ibm,phandle 00002005
> /proc/device-tree/cpus/PowerPC,POWER8@30/ibm,phandle 10000030
> /proc/device-tree/cpus/PowerPC,POWER8@20/ibm,phandle 10000020
> /proc/device-tree/cpus/PowerPC,POWER8@10/ibm,phandle 10000010
> /proc/device-tree/cpus/l2-cache@2003/ibm,phandle 00002003
> /proc/device-tree/cpus/l2-cache@200a/ibm,phandle 0000200a
> /proc/device-tree/cpus/l3-cache@3108/ibm,phandle 00003108
> /proc/device-tree/cpus/l2-cache@2001/ibm,phandle 00002001
> /proc/device-tree/cpus/l3-cache@3106/ibm,phandle 00003106
> /proc/device-tree/cpus/ibm,phandle fffffff8
> /proc/device-tree/cpus/l3-cache@3104/ibm,phandle 00003104
> /proc/device-tree/cpus/l2-cache@2008/ibm,phandle 00002008
> /proc/device-tree/cpus/l3-cache@3102/ibm,phandle 00003102
> /proc/device-tree/cpus/l2-cache@2006/ibm,phandle 00002006
> /proc/device-tree/cpus/l3-cache@3100/ibm,phandle 00003100
> /proc/device-tree/cpus/PowerPC,POWER8@8/ibm,phandle 10000008
> /proc/device-tree/cpus/l2-cache@2004/ibm,phandle 00002004
> /proc/device-tree/cpus/PowerPC,POWER8@48/ibm,phandle 10000048
> /proc/device-tree/cpus/PowerPC,POWER8@38/ibm,phandle 10000038
> /proc/device-tree/cpus/l2-cache@2002/ibm,phandle 00002002
> /proc/device-tree/cpus/PowerPC,POWER8@28/ibm,phandle 10000028
> /proc/device-tree/cpus/l3-cache@3107/ibm,phandle 00003107
> /proc/device-tree/cpus/PowerPC,POWER8@18/ibm,phandle 10000018
> /proc/device-tree/cpus/l2-cache@2000/ibm,phandle 00002000
> /proc/device-tree/cpus/l3-cache@3105/ibm,phandle 00003105
> /proc/device-tree/cpus/l3-cache@3103/ibm,phandle 00003103
> /proc/device-tree/cpus/l3-cache@310a/ibm,phandle 0000310a
> /proc/device-tree/cpus/PowerPC,POWER8@0/ibm,phandle 10000000
> /proc/device-tree/cpus/l2-cache@2007/ibm,phandle 00002007
> /proc/device-tree/cpus/l3-cache@3101/ibm,phandle 00003101
> /proc/device-tree/pci@80000002000001b/ibm,phandle 2000001b
Some (the 1000xxxx) look like addresses as well.
> > So the hash array has 64 entries out of which only 8 are populated. Using
> > hash_32() populates 29 entries.
> On the G5 it's similarly inefficient:
> [ 0.007379] OF: of_populate_phandle_cache(242) Used entries: 31, hashed: 111
> And some output from a "real" pseries machine (IBM OF), which is
> slightly better:
> [ 0.129467] OF: of_populate_phandle_cache(242) Used entries: 39, hashed: 81
> So yeah using hash_32() is quite a bit better in both cases.
Yup, no surprise there. And hash_32 is very cheap to compute.
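
To illustrate why, here is a minimal userspace sketch (not the actual
drivers/of code). It assumes the cache slot is currently picked from the
low bits of the phandle, and compares that with a multiplicative hash of
the same form as the kernel's generic hash_32(). The names
hash_32_sketch(), mask_index() and used_slots() are made up for this
example. Address-like phandles are aligned, so their low bits are mostly
zero and masking can only ever reach a fraction of the slots.

```c
#include <stddef.h>
#include <stdint.h>

/* Same constant as GOLDEN_RATIO_32 in the kernel's include/linux/hash.h. */
#define GOLDEN_RATIO_32 0x61C88647u

/* Multiplicative hash: keep the top 'bits' bits of the product.
 * Requires 1 <= bits <= 32. */
static inline uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Assumed current scheme: index by the low 'bits' bits of the phandle. */
static inline uint32_t mask_index(uint32_t phandle, unsigned int bits)
{
	return phandle & ((1u << bits) - 1);
}

/*
 * Count how many distinct slots a set of phandles lands in.
 * Only valid for 1 <= bits <= 6 (at most 64 slots), since the "seen"
 * set is a single 64-bit bitmap.
 */
static unsigned int used_slots(const uint32_t *ph, size_t n,
			       unsigned int bits, int use_hash)
{
	uint64_t seen = 0;
	unsigned int count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t idx = use_hash ? hash_32_sketch(ph[i], bits)
					: mask_index(ph[i], bits);

		if (!(seen & (1ull << idx))) {
			seen |= 1ull << idx;
			count++;
		}
	}
	return count;
}
```

Feeding the ten G5 phandles quoted above through used_slots() with
bits = 6 and use_hash = 0 can never fill more than 8 of the 64 slots,
because every one of those values is a multiple of 8.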
> And if I'm reading your patch right it would be a single line change to
> switch, so that seems like it's worth doing to me.
Agreed!
Btw, some OFs mangle the phandles in some way, to make it easier to catch
people using them as addresses (and similarly, mangle ihandles differently,
so you catch confusion between ihandles and phandles as well). A simple
xor with some constant, preferably an odd number, will do. You should
assume *nothing* about phandles; they are opaque identifiers.
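
A rough sketch of what such mangling could look like, assuming 32-bit
handles. The two key values and all four function names are hypothetical
and not taken from any particular OF implementation.

```c
#include <stdint.h>

/* Hypothetical keys: both odd, and different from each other. */
#define PHANDLE_XOR_KEY 0x5a5a5a5bu
#define IHANDLE_XOR_KEY 0xa5a5a5a7u

/* Firmware side: hand out a mangled value instead of the node address. */
static inline uint32_t node_to_phandle(uint32_t node_addr)
{
	return node_addr ^ PHANDLE_XOR_KEY;
}

/* Firmware side: xor is its own inverse, so the same key undoes it. */
static inline uint32_t phandle_to_node(uint32_t phandle)
{
	return phandle ^ PHANDLE_XOR_KEY;
}

/* Instances use a different key, so an ihandle passed where a phandle
 * is expected does not decode to the original address. */
static inline uint32_t instance_to_ihandle(uint32_t inst_addr)
{
	return inst_addr ^ IHANDLE_XOR_KEY;
}

static inline uint32_t ihandle_to_instance(uint32_t ihandle)
{
	return ihandle ^ IHANDLE_XOR_KEY;
}
```

With an odd key, the handle for any aligned structure comes out odd, so
a client that treats it as a pointer ends up with a misaligned address,
which is presumably easier to spot than an access that silently works.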
Segher