[PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux

Fri Apr 3 10:11:50 AEDT 2015

On Fri, 2015-03-27 at 19:07 +0200, Purcareata Bogdan wrote:
> On 27.02.2015 03:05, Scott Wood wrote:
> > On Thu, 2015-02-26 at 14:31 +0100, Sebastian Andrzej Siewior wrote:
> >> On 02/26/2015 02:02 PM, Paolo Bonzini wrote:
> >>>
> >>>
> >>> On 24/02/2015 00:27, Scott Wood wrote:
> >>>> This isn't a host PIC driver.  It's guest PIC emulation, some of which
> >>>> is indeed not suitable for a rawlock (in particular, openpic_update_irq
> >>>> which loops on the number of vcpus, with a loop body that calls
> >>>> IRQ_check() which loops over all pending IRQs).
> >>>
> >>> The question is what behavior is wanted of code that isn't quite
> >>> RT-ready.  What is preferred, bugs or bad latency?
> >>>
> >>> If the answer is bad latency (which can be avoided simply by not running
> >>> KVM on a RT kernel in production), patch 1 can be applied.  If the
> >> can be applied *but* makes no difference if applied or not.
> >>
> >>> answer is bugs, patch 1 is not upstream material.
> >>>
> >>> I myself prefer to have bad latency; if something takes a spinlock in
> >>> atomic context, that spinlock should be raw.  If it hurts (latency),
> >>> don't do it (use the affected code).
> >>
> >> The problem that is fixed by this s/spin_lock/raw_spin_lock/ exists
> >> only in -RT. There is no change upstream. In general we fix such things
> >> in -RT first and forward the patches upstream if possible. This
> >> conversion would be possible.
> >> Bug fixing comes before latency no matter if RT or not. Converting
> >> every lock into a rawlock is not always the answer.
> >> Last thing I read from Scott is that he is not entirely sure if this is
> >> the right approach or not and patch #1 was not acked-by him either.
> >>
> >> So for now I wait for Scott's feedback and maybe a backtrace :)
> >
> > Obviously leaving it in a buggy state is not what we want -- but I lean
> > towards a short term "fix" of putting "depends on !PREEMPT_RT" on the
> > in-kernel MPIC emulation (which is itself just an optimization -- you
> > can still use KVM without it).  This way people don't enable it with RT
> > without being aware of the issue, and there's more of an incentive to
> > fix it properly.
> >
> > I'll let Bogdan supply the backtrace.
> 
> So about the backtrace. Wasn't really sure how to "catch" this, so what 
> I did was to start a 24 VCPUs guest on a 24 CPU board, and in the guest 
> run 24 netperf flows with an external back to back board of the same 
> kind. I assumed this would provide sufficient VCPUs and external
> interrupts to expose an alleged culprit.
> 
> With regard to measuring the latency, I thought of using ftrace,
> specifically the preemptirqsoff latency histogram. Unfortunately, I 
> wasn't able to capture any major differences between running a guest 
> with in-kernel MPIC emulation (with the openpic raw_spinlock_conversion 
> applied) vs. no in-kernel MPIC emulation. Function profiling
> (trace_stat) shows that in the second case far more time is spent in
> kvm_handle_exit (100x), but overall, the maximum latencies for
> preemptirqsoff don't look that much different.
> 
> Here are the max numbers (preemptirqsoff) for the 24 CPUs, on the host 
> RT Linux, sorted in descending order, expressed in microseconds:
> 
> In-kernel MPIC		QEMU MPIC
> 3975			5105

What are you measuring?  Latency in the host, or in the guest?

-Scott