[RFC PATCH v2 00/14] New TM Model

Breno Leitao leitao at debian.org
Wed Nov 7 06:31:55 AEDT 2018


Hi Florian,

On 11/06/2018 04:32 PM, Florian Weimer wrote:
> * Breno Leitao:
> 
>> This  patchset for the hardware transactional memory (TM) subsystem
>> aims to avoid spending a lot of time on TM suspended mode in kernel
>> space.  It basically changes where the reclaim/recheckpoint will be
>> executed.
> 
> I assumed that we want to abort on every system call these days?
> 
> We have this commit in glibc:
> 
> commit f0458cf4f9ff3d870c43b624e6dccaaf657d5e83
> Author: Adhemerval Zanella <adhemerval.zanella at linaro.org>
> Date:   Mon Aug 27 09:42:50 2018 -0300
> 
>     powerpc: Only enable TLE with PPC_FEATURE2_HTM_NOSC
>     
>     Linux from 3.9 through 4.2 does not abort HTM transaction on syscalls,
>     instead it suspend and resume it when leaving the kernel.  The
>     side-effects of the syscall will always remain visible, even if the
>     transaction is aborted.  This is an issue when transaction is used along
>     with futex syscall, on pthread_cond_wait for instance, where the futex
>     call might succeed but the transaction is rolled back leading the
>     pthread_cond object in an inconsistent state.
>     
>     Glibc used to prevent it by always aborting a transaction before issuing
>     a syscall.  Linux 4.2 also decided to abort active transaction in
>     syscalls which makes the glibc workaround superfluous.  Worse, glibc
>     transaction abortion leads to a performance issue on recent kernels
>     where the HTM state is saved/restore lazily (v4.9).  By aborting a
>     transaction on every syscalls, regardless whether a transaction has being
>     initiated before, glibc makes the kernel always save/restore HTM state
>     (it can not even lazily disable it after a certain number of syscall
>     iterations).
>     
>     Because of this shortcoming, Transactional Lock Elision is just enabled
>     when it has been explicitly set (either by tunables of by a configure
>     switch) and if kernel aborts HTM transactions on syscalls
>     (PPC_FEATURE2_HTM_NOSC).  It is reported that using simple benchmark [1],
>     the context-switch is about 5% faster by not issuing a tabort in every
>     syscall in newer kernels.
> 
> I wonder how the new TM model interacts with the assumption we currently
> have in glibc.

This new TM model is almost transparent to userspace. My patchset basically
changes where recheckpoint and reclaim happen inside kernel space, and
should not change userspace behavior.

I say "almost transparent" because it might cause some very specific
transactions to have a higher doom rate. See patch 14/14 for more detailed
information, and also for a reference to glibc's "tabort prior to system
calls" behavior.

Regarding Adhemerval's patch, it is unaffected by this new model. Prior to
kernel 4.2, the kernel executed a syscall regardless of the TM state,
which caused undesired side effects, hence glibc's decision to abort a
transaction prior to calling a syscall.

Later, the kernel system call mechanism became aware of the TM state, and
this glibc workaround was no longer necessary.
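
For illustration only, here is a minimal sketch of how userspace could test
for the abort-on-syscall behavior through the PPC_FEATURE2_HTM_NOSC bit in
AT_HWCAP2. The helper names tle_allowed() and tle_allowed_on_this_system()
are hypothetical, and this is not the actual glibc code; the fallback
#define only exists so the sketch also builds on non-powerpc systems, where
the result is meaningless.

```c
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP2 */

/* Bit value used on powerpc; defined here only as a fallback so the
 * sketch compiles on other architectures as well. */
#ifndef PPC_FEATURE2_HTM_NOSC
#define PPC_FEATURE2_HTM_NOSC 0x01000000UL
#endif

/* Hypothetical helper: allow lock elision only when it was explicitly
 * requested AND the kernel reports that it aborts HTM transactions on
 * syscalls (PPC_FEATURE2_HTM_NOSC set in hwcap2). */
bool tle_allowed(unsigned long hwcap2, bool explicitly_enabled)
{
    return explicitly_enabled && (hwcap2 & PPC_FEATURE2_HTM_NOSC) != 0;
}

/* Hypothetical wrapper that reads the real auxiliary vector.
 * Only meaningful when running on powerpc. */
bool tle_allowed_on_this_system(bool explicitly_enabled)
{
    return tle_allowed(getauxval(AT_HWCAP2), explicitly_enabled);
}
```

With such a check, no tabort is needed before a syscall: either the kernel
aborts the transaction itself, or elision is simply left disabled.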

More than that, this workaround started to cause performance degradation on
context switches, mainly when the TM facility became lazily enabled, i.e.,
the TM facility would be enabled only on demand (when a task uses TM
explicitly). This happens because the "abort prior to every system call"
workaround triggered the TM facility to be enabled for every task that
makes system calls.
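
To make the effect concrete, here is a toy model of it. It is not kernel
code, and every name in it (toy_task, toy_tm_insn, and so on) is
hypothetical. It only assumes what is described above: any TM instruction,
including a tabort, turns the facility on for the task, and the context
switch path has to handle TM state only for tasks that have it on.

```c
#include <stdbool.h>

/* Toy model, NOT kernel code: all names here are hypothetical. */
struct toy_task {
    bool in_transaction;   /* task is currently inside a transaction */
    bool tm_enabled;       /* TM facility is enabled for this task   */
    unsigned tm_enables;   /* times the facility had to be enabled   */
};

/* Any TM instruction executed with the facility off enables it on demand. */
void toy_tm_insn(struct toy_task *t)
{
    if (!t->tm_enabled) {
        t->tm_enabled = true;
        t->tm_enables++;
    }
}

/* Old userspace workaround: issue a tabort before every syscall,
 * whether or not a transaction was ever started. */
void toy_syscall_always_abort(struct toy_task *t)
{
    toy_tm_insn(t);             /* the tabort itself touches the facility */
    t->in_transaction = false;
}

/* Kernel aborts on syscall entry: userspace issues no TM instruction. */
void toy_syscall_kernel_aborts(struct toy_task *t)
{
    t->in_transaction = false;  /* any active transaction is doomed */
}

/* Context switch needs to save/restore TM state only if it is enabled. */
bool toy_context_switch_saves_tm(const struct toy_task *t)
{
    return t->tm_enabled;
}
```

In this model, a task that never starts a transaction still ends up with the
facility enabled after toy_syscall_always_abort(), so every later context
switch pays for TM state; with toy_syscall_kernel_aborts() it does not.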

In fact, I was the one who identified this performance degradation issue
and reported it to Adhemerval, who kindly fixed it with
f0458cf4f9ff3d870c43b624e6dccaaf657d5e83.

Anyway, I think we are safe here.

Thanks for bringing this up.
Breno








