[PATCH 0/2] Enable MSR_TM lazily

Nicholas Piggin npiggin at gmail.com
Wed Sep 14 22:12:22 AEST 2016


On Wed, 14 Sep 2016 21:46:39 +1000
Michael Neuling <mikey at neuling.org> wrote:

> On Wed, 2016-09-14 at 21:28 +1000, Nicholas Piggin wrote:
> > Cc'ing Carlos
> > 
> > On Wed, 14 Sep 2016 18:02:14 +1000
> > Cyril Bur <cyrilbur at gmail.com> wrote:
> >   
> > > 
> > > Currently the kernel checks to see if the hardware is transactional
> > > memory capable and always enables the MSR_TM bit. The problem with
> > > this is that the TM related SPRs become available to userspace,
> > > requiring them to be switched between processes. It turns out these
> > > SPRs are expensive to read and write and if a thread doesn't use TM
> > > (or worse yet isn't even TM aware) then context switching incurs this
> > > penalty for nothing.
> > > 
> > > The solution here is to leave the MSR_TM bit disabled and enable it
> > > more 'on demand'. Leaving MSR_TM disabled causes a thread to take a
> > > facility unavailable fault if and when it does decide to use TM. As
> > > with recent updates to the FPU, VMX and VSX units the MSR_TM bit will
> > > be enabled upon taking the fault and left on for some time afterwards
> > > as the assumption is that if a thread used TM once it may well use it
> > > again. The kernel will turn the MSR_TM bit off after some number of
> > > context switches of that thread.
> > > 
> > > Performance numbers haven't been completely gathered as yet but early
> > > runs of tools/testing/selftests/powerpc/benchmarks/context_switch
> > > (which doesn't use TM) yield a jump from ~160000 switches per second
> > > to ~180000 switches per second with patch 3/3 applied.  
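A minimal user-space model of the lazy scheme described above, for illustration only; it is not the patch's code. The threshold TM_LAZY_SWITCHES, the struct thread_sim type and both functions are hypothetical. The patch only says MSR_TM is turned off "after some number of context switches", so the value 8 is an arbitrary assumption.

```c
#include <stdbool.h>

/* Hypothetical: number of context switches MSR_TM stays on after a
 * TM facility unavailable fault. The patch does not give a value. */
#define TM_LAZY_SWITCHES 8

struct thread_sim {
    bool msr_tm;          /* models the thread's MSR_TM bit */
    unsigned tm_budget;   /* switches left before MSR_TM is dropped */
    unsigned spr_swaps;   /* times the TM SPRs were saved/restored */
};

/* Facility unavailable fault: the thread used TM while MSR_TM was clear. */
static void tm_unavailable_fault(struct thread_sim *t)
{
    t->msr_tm = true;
    t->tm_budget = TM_LAZY_SWITCHES;
}

/* Context switch: the expensive TM SPR save/restore is only paid while
 * MSR_TM is on; once the budget runs out, MSR_TM is turned off again. */
static void context_switch(struct thread_sim *t)
{
    if (!t->msr_tm)
        return;               /* TM-unaware thread: no TM SPR traffic */
    t->spr_swaps++;           /* models the TM SPR save/restore */
    if (--t->tm_budget == 0)
        t->msr_tm = false;    /* assume TM is no longer in use */
}
```

If a thread whose MSR_TM was dropped uses TM again, it just takes another facility unavailable fault and the budget is refilled, so guessing wrong costs one extra trap rather than correctness.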
> > Cool!
> > 
> > Question: glibc when built with lock elision seems like it will
> > execute tabort. before every syscall, to work around old kernel
> > behaviour. That's always going to fault TM on, isn't it?  
> 
> I think we might be able to detect this case in the kernel. If it's a tabort
> that's trapped on, we can't have been transactional.  Hence we can safely PC+=4
> and leave TM off.
> 
> It would cost us a get_user(inst, regs->nip); but it might be worth it for this
> special but common case.

That would take an extra trap for every syscall, I think.
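A sketch of Michael's idea, assuming the faulting instruction word has already been fetched (in the kernel that would be the get_user(inst, regs->nip) he mentions). struct regs_sim and tm_fault_skip_tabort() are hypothetical names. The tabort. encoding (X-form, primary opcode 31, XO 910, Rc=1) and the CR0 result are our reading of the Power ISA and should be checked against it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of "tabort. RA" with the RA field zeroed. */
#define TABORT_DOT     0x7c00071du
#define RA_FIELD_MASK  (0x1fu << 16)

struct regs_sim {
    uint64_t nip;   /* address of the faulting instruction */
    uint32_t ccr;   /* condition register; CR0 is the top nibble */
};

static bool is_tabort_dot(uint32_t inst)
{
    return (inst & ~RA_FIELD_MASK) == TABORT_DOT;
}

/* Called on a TM facility unavailable fault with the instruction word at
 * regs->nip. MSR_TM was off, so no transaction can be active: emulate the
 * tabort. and step over it, leaving TM disabled. Returns false when the
 * instruction is something else and MSR_TM should be enabled instead. */
static bool tm_fault_skip_tabort(struct regs_sim *regs, uint32_t inst)
{
    if (!is_tabort_dot(inst))
        return false;
    /* Assumption: outside a transaction tabort. sets CR0 to
     * 0b0 || MSR[TS] || 0b0, i.e. all zero. */
    regs->ccr &= 0x0fffffffu;
    regs->nip += 4;
    return true;
}
```

As Nick points out, this keeps MSR_TM off, but every tabort. still costs a facility unavailable trap, so a TM-unaware program doing many syscalls would trade TM SPR switching for one trap per syscall.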


> > How common is it for glibc to be built with elision?
> 
> IIRC Ubuntu uses it on 16.04 (and maybe 15.10).

Ah yes, but I was wrong: it also has to be linked against -lpthread
because it depends on r13 != 0. That's why I couldn't see it in my
trace. With -lpthread I do see it, on 16.04.


> > We should probably be testing PPC_FEATURE2_HTM_NOSC to skip the
> > tabort.  
> 
> Agreed, that would be ideal: binary patching glibc at runtime.
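A sketch of the runtime check, using glibc's getauxval(AT_HWCAP2). The helper names are hypothetical. The PPC_FEATURE2_* values are, to our knowledge, those of the powerpc uapi <asm/cputable.h>; they are only defined locally as a fallback so the sketch also builds on other architectures.

```c
#include <stdbool.h>
#include <sys/auxv.h>

#ifndef PPC_FEATURE2_HTM
#define PPC_FEATURE2_HTM       0x40000000
#endif
#ifndef PPC_FEATURE2_HTM_NOSC
#define PPC_FEATURE2_HTM_NOSC  0x01000000
#endif

/* A tabort. before sc is only needed when the CPU has HTM and the kernel
 * does not advertise that it aborts active transactions on syscall entry. */
static bool need_tabort_before_syscall(unsigned long hwcap2)
{
    return (hwcap2 & PPC_FEATURE2_HTM) && !(hwcap2 & PPC_FEATURE2_HTM_NOSC);
}

/* Evaluated once at startup; the result would feed whatever flag or
 * patched code sequence the syscall path uses. */
bool need_tabort_on_this_system(void)
{
    return need_tabort_before_syscall(getauxval(AT_HWCAP2));
}
```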

That would be nice. Does glibc support binary patching? I'm not very
familiar with the code. Current syscall code ends up something like
this:

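    cmpdi    13,0              # r13 (thread pointer) == 0? not linked with -lpthread
    beq      1f                # then skip the abort
    lwz      0,TM_CAPABLE(13)  # load the TM capable flag from the TCB
    cmpwi    0,0
    beq      1f                # not TM capable: skip the abort
    li       11,_ABORT_SYSCALL # failure cause passed to tabort.
    tabort.  11                # abort any transaction before the syscall
    .align 4
1:
    li 0,syscall               # syscall number in r0
    sc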

Without runtime patching, if we had another variable that meant
we are TM capable *and* need to issue a tabort., we could do the
same sequence without extra instructions. That might be the first
step.
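A C model of the syscall prologue above with the proposed combined flag, assuming a hypothetical field tm_abort_before_syscall stored next to TM_CAPABLE in the thread control block. Only the meaning of the flag comes from the mail; struct tcb_sim, the helper names and the counter standing in for tabort. are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread control block. */
struct tcb_sim {
    int tm_capable;              /* existing: CPU has HTM */
    int tm_abort_before_syscall; /* proposed: HTM and a tabort. is needed */
};

static unsigned taborts_issued;  /* counts simulated tabort. instructions */

static void init_tcb(struct tcb_sim *tcb, bool has_htm, bool kernel_has_nosc)
{
    tcb->tm_capable = has_htm;
    tcb->tm_abort_before_syscall = has_htm && !kernel_has_nosc;
}

/* Same shape as the asm: test r13, test one word, maybe tabort.
 * Only the word being loaded changes, so no instructions are added. */
static void syscall_prologue(const struct tcb_sim *tcb)
{
    if (tcb == NULL)                        /* cmpdi 13,0; beq 1f */
        return;
    if (tcb->tm_abort_before_syscall == 0)  /* lwz; cmpwi; beq 1f */
        return;
    taborts_issued++;                       /* li 11,...; tabort. 11 */
}
```

On a kernel that advertises PPC_FEATURE2_HTM_NOSC the flag is zero, so the tabort. is never reached and, with MSR_TM enabled lazily, a TM-unaware program linked against -lpthread would no longer fault TM on at every syscall.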

Thanks,
Nick


More information about the Linuxppc-dev mailing list