[PATCH] powerpc/vdso64: inline __get_datapage()
Nathan Lynch
nathanl at linux.ibm.com
Thu Aug 22 01:58:57 AEST 2019
Christophe Leroy <christophe.leroy at c-s.fr> writes:
> On 21/08/2019 at 11:29, Santosh Sivaraj wrote:
>> __get_datapage() is only a few instructions to retrieve the
>> address of the page where the kernel stores data to the VDSO.
>>
>> By inlining this function into its users, a bl/blr pair and
>> a mflr/mtlr pair are avoided, plus a few reg moves.
>>
>> clock-gettime-monotonic: syscall: 514 nsec/call 396 nsec/call
>> clock-gettime-monotonic: libc: 25 nsec/call 24 nsec/call
>> clock-gettime-monotonic: vdso: 20 nsec/call 20 nsec/call
>> clock-getres-monotonic: syscall: 347 nsec/call 372 nsec/call
>> clock-getres-monotonic: libc: 19 nsec/call 19 nsec/call
>> clock-getres-monotonic: vdso: 10 nsec/call 10 nsec/call
>> clock-gettime-monotonic-coarse: syscall: 511 nsec/call 396 nsec/call
>> clock-gettime-monotonic-coarse: libc: 23 nsec/call 21 nsec/call
>> clock-gettime-monotonic-coarse: vdso: 15 nsec/call 13 nsec/call
>> clock-gettime-realtime: syscall: 526 nsec/call 405 nsec/call
>> clock-gettime-realtime: libc: 24 nsec/call 23 nsec/call
>> clock-gettime-realtime: vdso: 18 nsec/call 18 nsec/call
>> clock-getres-realtime: syscall: 342 nsec/call 372 nsec/call
>> clock-getres-realtime: libc: 19 nsec/call 19 nsec/call
>> clock-getres-realtime: vdso: 10 nsec/call 10 nsec/call
>> clock-gettime-realtime-coarse: syscall: 515 nsec/call 373 nsec/call
>> clock-gettime-realtime-coarse: libc: 23 nsec/call 22 nsec/call
>> clock-gettime-realtime-coarse: vdso: 14 nsec/call 13 nsec/call
>
> I think you should only put the measurements on vdso calls, and only the
> ones that are impacted by the change. For example, the getres function
> doesn't use __get_datapage so showing it here is pointless.

I agree with this point, but I would also caution against using
vdsotest's benchmark function for anything like rigorous performance
analysis. The intention was to roughly confirm the VDSO's relative
performance vs the in-kernel implementations, not to compare one VDSO
implementation of (say) clock_gettime to another.

I suggest using perf to confirm the expected effects of the change, if
possible.
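
As a rough sketch of the kind of workload that could be run under perf,
a tight loop around clock_gettime() is enough to exercise the VDSO path.
The helper name ns_per_call and the iteration count are illustrative
assumptions; this is not vdsotest code and not part of the patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() under strict -std= */
#endif
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper: average wall-clock cost, in nanoseconds, of one
 * clock_gettime(clk, ...) call over 'iters' calls. Returns a negative
 * value if 'iters' is not positive or the clock cannot be read.
 */
double ns_per_call(clockid_t clk, long iters)
{
    struct timespec start, end, ts;
    long i;

    if (iters <= 0 || clock_gettime(clk, &ts) != 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iters; i++)
        clock_gettime(clk, &ts);  /* libc normally routes this to the VDSO */
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* CLOCK_MONOTONIC must not go backwards between the two samples. */
    assert(end.tv_sec >= start.tv_sec);

    return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
            (double)(end.tv_nsec - start.tv_nsec)) / (double)iters;
}
```

Calling this from a small main() (for example with CLOCK_MONOTONIC and a
few million iterations) and running the binary under
"perf stat -e instructions,cycles" on kernels with and without the patch
would let the counts be compared directly. The nsec/call figure on its
own has the same limitations as the vdsotest numbers.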