[PATCH] powerpc/kernel: Avoid redundancies on giveup_all
Breno Leitao
leitao at debian.org
Sat Jun 24 03:51:44 AEST 2017
Hi Cyril,
On Fri, Jun 23, 2017 at 04:03:12PM +1000, Cyril Bur wrote:
> On Thu, 2017-06-22 at 17:27 -0300, Breno Leitao wrote:
> > Currently giveup_all() calls __giveup_fpu(), __giveup_altivec(), and
> > __giveup_vsx(). But __giveup_vsx() also calls __giveup_fpu() and
> > __giveup_altivec() again, in a redundant manner.
> >
> > Besides giving up FP and Altivec, __giveup_vsx() also clears MSR_VSX
> > in the MSR, but this is already done by __giveup_{fpu,altivec}().
> > Since VSX cannot be enabled in the MSR without FP and/or VEC also
> > being enabled, this is redundant as well. VSX is never enabled alone
> > because every path that enables it, such as load_up_vsx() and
> > restore_math(), enables FP at the same time.
> >
> > This change improves giveup_all() by just 3% on average, but since
> > giveup_all() is called very frequently, around 8 times per CPU per
> > second on an idle machine, it might show some noticeable improvement.
> >
>
> So I totally agree except this makes me quite nervous. I know we're
> quite good at always disabling VSX when we disable FPU and ALTIVEC and
> we do always turn VSX on when we enable FPU AND ALTIVEC. But still, if
> we ever get that wrong...
Right, I understand your point: we can consider this code a
'fallback' in case we somehow forget to disable VSX when disabling
FPU/ALTIVEC. Good point.
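For reference, this is roughly the structure we are talking about (a condensed
sketch of the code in arch/powerpc/kernel/process.c, paraphrased rather than
quoted verbatim; the __giveup_fpu()/__giveup_altivec() bodies and the
MSR-enabling/TM/CONFIG_* details are omitted):

  static void __giveup_vsx(struct task_struct *tsk)
  {
          /* Redundant when called from giveup_all(): FP and VEC have
           * already been given up there. */
          __giveup_fpu(tsk);
          __giveup_altivec(tsk);
          /* Redundant too: __giveup_{fpu,altivec}() already cleared MSR_VSX. */
          tsk->thread.regs->msr &= ~MSR_VSX;
  }

  void giveup_all(struct task_struct *tsk)
  {
          unsigned long usermsr = tsk->thread.regs->msr;

          if ((usermsr & msr_all_available) == 0)
                  return;

          if (usermsr & MSR_FP)
                  __giveup_fpu(tsk);      /* clears MSR_FP and MSR_VSX */
          if (usermsr & MSR_VEC)
                  __giveup_altivec(tsk);  /* clears MSR_VEC and MSR_VSX */
          if (usermsr & MSR_VSX)
                  __giveup_vsx(tsk);      /* the redundant call the patch avoids */
  }
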
> I'm more interested in how this improves giveup_all() performance by so
> much, but then hardware often surprises - I guess that's the cost of a
> function call.
I got this number using ftrace. I used the function_graph tracer with the
'funcgraph-duration' trace option enabled, and with set_ftrace_filter set to
giveup_all() so only that function is traced.
There is also a tool that helps with this if you wish. It uses exactly the
same mechanism, just in a more automated way: funcgraph, by Brendan Gregg.
https://github.com/brendangregg/perf-tools/blob/master/kernel/funcgraph
> Perhaps caching the thread.regs->msr isn't a good idea.
Yes, I looked at it, but it seems the compiler already optimizes this, keeping
the value in r30 (a non-volatile register) rather than spilling it to the
stack. This is the code being generated here, where r9 contains the task
pointer:
usermsr = tsk->thread.regs->msr;
c0000000000199c4: 08 01 c9 eb ld r30,264(r9)
if ((usermsr & msr_all_available) == 0)
c0000000000199c8: 60 5f 2a e9 ld r9,24416(r10)
c0000000000199cc: 39 48 ca 7f and. r10,r30,r9
c0000000000199d0: 20 00 82 40 bne c0000000000199f0 <giveup_all+0x60>
> If we could
> branch over in the common case but still have the call to the
> function in case something goes horribly wrong?
Yes, we can revisit it at a future opportunity. Thanks for sharing your opinion.
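If we do revisit it, one possible shape of that idea (just a rough, untested
sketch on top of the current code) would be to re-read thread.regs->msr after
the FP/VEC giveups and only fall into __giveup_vsx() if VSX is somehow still
set:

          /*
           * Sketch of the "branch over in the common case" idea: by this
           * point __giveup_fpu()/__giveup_altivec() should have cleared
           * MSR_VSX, so this branch is normally not taken.  If it is
           * taken, warn and still do the full giveup as a safety net.
           */
          if (unlikely(tsk->thread.regs->msr & MSR_VSX)) {
                  WARN_ON_ONCE(1);
                  __giveup_vsx(tsk);
          }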