[PATCH 1/4] powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocks

Sat May 1 11:22:07 AEST 2021

Excerpts from Naveen N. Rao's message of April 27, 2021 11:43 pm:
> Nicholas Piggin wrote:
>> The paravit queued spinlock slow path adds itself to the queue then
>> calls pv_wait to wait for the lock to become free. This is implemented
>> by calling H_CONFER to donate cycles.
>> 
>> When hcall tracing is enabled, this H_CONFER call can lead to a spin
>> lock being taken in the tracing code, which will result in the lock to
>> be taken again, which will also go to the slow path because it queues
>> behind itself and so won't ever make progress.
>> 
>> An example trace of a deadlock:
>> 
>>   __pv_queued_spin_lock_slowpath
>>   trace_clock_global
>>   ring_buffer_lock_reserve
>>   trace_event_buffer_lock_reserve
>>   trace_event_buffer_reserve
>>   trace_event_raw_event_hcall_exit
>>   __trace_hcall_exit
>>   plpar_hcall_norets_trace
>>   __pv_queued_spin_lock_slowpath
>>   trace_clock_global
>>   ring_buffer_lock_reserve
>>   trace_event_buffer_lock_reserve
>>   trace_event_buffer_reserve
>>   trace_event_raw_event_rcu_dyntick
>>   rcu_irq_exit
>>   irq_exit
>>   __do_irq
>>   call_do_irq
>>   do_IRQ
>>   hardware_interrupt_common_virt
>> 
>> Fix this by introducing plpar_hcall_norets_notrace(), and using that to
>> make SPLPAR virtual processor dispatching hcalls by the paravirt
>> spinlock code.
>> 
>> Fixes: 20c0e8269e9d ("powerpc/pseries: Implement paravirt qspinlocks for SPLPAR")
>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>> ---
>>  arch/powerpc/include/asm/hvcall.h       |  3 +++
>>  arch/powerpc/include/asm/paravirt.h     | 22 +++++++++++++++++++---
>>  arch/powerpc/platforms/pseries/hvCall.S | 10 ++++++++++
>>  arch/powerpc/platforms/pseries/lpar.c   |  4 ++--
>>  4 files changed, 34 insertions(+), 5 deletions(-)
> 
> Thanks for the fix! Some very minor nits below, but none the less:
> Reviewed-by: Naveen N. Rao <naveen.n.rao at linux.vnet.ibm.com>
> 
>> 
>> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
>> index ed6086d57b22..0c92b01a3c3c 100644
>> --- a/arch/powerpc/include/asm/hvcall.h
>> +++ b/arch/powerpc/include/asm/hvcall.h
>> @@ -446,6 +446,9 @@
>>   */
>>  long plpar_hcall_norets(unsigned long opcode, ...);
>> 
>> +/* Variant which does not do hcall tracing */
>> +long plpar_hcall_norets_notrace(unsigned long opcode, ...);
>> +
>>  /**
>>   * plpar_hcall: - Make a pseries hypervisor call
>>   * @opcode: The hypervisor call to make.
>> diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
>> index 5d1726bb28e7..3c13c2ec70a9 100644
>> --- a/arch/powerpc/include/asm/paravirt.h
>> +++ b/arch/powerpc/include/asm/paravirt.h
>> @@ -30,17 +30,33 @@ static inline u32 yield_count_of(int cpu)
>> 
>>  static inline void yield_to_preempted(int cpu, u32 yield_count)
>>  {
> 
> It looks like yield_to_preempted() is only used by simple spin locks 
> today. I wonder if it makes more sense to put the below comment in 
> yield_to_any() which is used by the qspinlock code.

Yeah, I just put it above the functions entirely because it refers to 
all of them.

> 
>> -	plpar_hcall_norets(H_CONFER, get_hard_smp_processor_id(cpu), yield_count);
>> +	/*
>> +	 * Spinlock code yields and prods, so don't trace the hcalls because
>> +	 * tracing code takes spinlocks which could recurse.
>> +	 *
>> +	 * These calls are made while the lock is not held, the lock slowpath
>> +	 * yields if it can not acquire the lock, and unlock slow path might
>> +	 * prod if a waiter has yielded). So this did not seem to be a problem
>> +	 * for simple spin locks because technically it didn't recuse on the
> 							       ^^^^^^
> 							       recurse
> 
>> +	 * lock. However the queued spin lock contended path is more strictly
>> +	 * ordered: the H_CONFER hcall is made after the task has queued itself
>> +	 * on the lock, so then recursing on the lock will queue up behind that
>> +	 * (or worse: queued spinlocks uses tricks that assume a context never
>> +	 * waits on more than one spinlock, so that may cause random
>> +	 * corruption).
>> +	 */
>> +	plpar_hcall_norets_notrace(H_CONFER,
>> +				   get_hard_smp_processor_id(cpu), yield_count);
> 
> This can all be on a single line.

Should it though? Linux in general allegedly changed to 100 column 
lines for checkpatch, but it seems to still be frowned upon to go
beyond 80 deliberately. What about arch/powerpc?

Thanks,
Nick