[PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule

mihai.caraman at freescale.com mihai.caraman at freescale.com
Tue Jun 17 22:00:21 EST 2014


> -----Original Message-----
> From: Alexander Graf [mailto:agraf at suse.de]
> Sent: Tuesday, June 17, 2014 12:09 PM
> To: Wood Scott-B07421
> Cc: Caraman Mihai Claudiu-B02008; kvm-ppc at vger.kernel.org;
> kvm at vger.kernel.org; linuxppc-dev at lists.ozlabs.org
> Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
> on vcpu schedule
> 
> 
> On 13.06.14 21:42, Scott Wood wrote:
> > On Fri, 2014-06-13 at 16:55 +0200, Alexander Graf wrote:
> >> On 13.06.14 16:43, mihai.caraman at freescale.com wrote:
> >>>> -----Original Message-----
> >>>> From: Alexander Graf [mailto:agraf at suse.de]
> >>>> Sent: Thursday, June 12, 2014 8:05 PM
> >>>> To: Caraman Mihai Claudiu-B02008
> >>>> Cc: kvm-ppc at vger.kernel.org; kvm at vger.kernel.org; linuxppc-
> >>>> dev at lists.ozlabs.org; Wood Scott-B07421
> >>>> Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation
> >>>> condition on vcpu schedule
> >>>>
> >>>> On 06/12/2014 04:00 PM, Mihai Caraman wrote:
> >>>>> On vcpu schedule, the condition checked for tlb pollution is too
> >>>>> tight. The tlb entries of one vcpu are polluted when a different
> >>>>> vcpu from the same partition runs in-between. Relax the current
> >>>>> tlb invalidation condition taking into account the lpid.
> > Can you quantify the performance improvement from this?  We've had bugs
> > in this area before, so let's make sure it's worth it before making this
> > more complicated.
> >
> >>>>> Signed-off-by: Mihai Caraman <mihai.caraman <at> freescale.com>
> >>>> Your mailer is broken? :)
> >>>> This really should be an @.
> >>>>
> >>>> I think this should work. Scott, please ack.
> >>> Alex, you were right. I screwed up the patch description by inverting
> >>> relax and tight terms :) It should have been more like this:
> >>>
> >>> KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
> >>>
> >>> On vcpu schedule, the condition checked for tlb pollution is too
> >>> loose. The tlb entries of a vcpu are polluted (vs stale) only when a
> >>> different vcpu within the same logical partition runs in-between.
> >>> Optimize the tlb invalidation condition taking into account the lpid.
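
To make that concrete: instead of tracking only the last vcpu that ran on a
physical cpu, track the last vcpu per (cpu, lpid) and flush only when another
vcpu of the same lpid ran in-between. A rough sketch of the vcpu load path
(function name and array bound are illustrative, not the exact patch):

static DEFINE_PER_CPU(struct kvm_vcpu *[KVMPPC_NR_LPIDS], last_vcpu_of_lpid);

static void kvmppc_e500mc_check_tlb_pollution(struct kvm_vcpu *vcpu)
{
	struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
	int lpid = vcpu->kvm->arch.lpid;

	/*
	 * A vcpu from a different lpid cannot pollute our entries (it can
	 * only make them stale), so flush only when we changed physical
	 * cpus or when another vcpu of the same lpid ran on this cpu.
	 */
	if (vcpu->arch.oldpir != mfspr(SPRN_PIR) ||
	    __get_cpu_var(last_vcpu_of_lpid)[lpid] != vcpu) {
		kvmppc_e500_tlbil_all(vcpu_e500);
		__get_cpu_var(last_vcpu_of_lpid)[lpid] = vcpu;
	}
}
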
> >> Can't we give every vcpu its own lpid? Or don't we trap on global
> >> invalidates?
> > That would significantly increase the odds of exhausting LPIDs,
> > especially on large chips like t4240 with similarly large VMs.  If we
> > were to do that, the LPIDs would need to be dynamically assigned (like
> > PIDs), and should probably be a separate numberspace per physical core.
> 
> True, I didn't realize we only have so few of them. It would however
> save us from most flushing as long as we have spare LPIDs available :).

Yes, we had this proposal on the table for the e6500 multithreaded core. This
core lacks a tlb write conditional instruction, so an OS needs to use locks
to protect itself against concurrent tlb writes executed from sibling threads.
When we expose hw threads as single-threaded vcpus (useful when the user opts
not to pin vcpus), the guest can no longer protect itself optimally (it could
serialize tlb writes across all of its vcpus, but that is not acceptable).
So instead we found a solution at the hypervisor level, by assigning different
logical partition ids to a guest's vcpus running simultaneously on sibling hw
threads. Currently in the FSL SDK we allocate two lpids to each guest.
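
Roughly, the idea looks like this (illustrative names, not the actual SDK
code): each guest gets an even/odd pair of lpids, and a vcpu uses the one
matching the hw thread it is scheduled on, so sibling threads never write
tlb entries under the same lpid:

static inline int kvmppc_get_thread_lpid(struct kvm_vcpu *vcpu)
{
	/* base lpid is allocated in even/odd pairs, low bit selects the thread */
	return vcpu->kvm->arch.lpid | cpu_thread_in_core(smp_processor_id());
}

LPIDR would then be loaded with this value on vcpu load instead of
kvm->arch.lpid directly, and guest tlb invalidations have to cover both lpids.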

I am also a proponent of using the whole LPID space (63 values) per
(multi-threaded) physical core, which would lead to fewer invalidations on
vcpu schedule and would accommodate the solution described above.
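
A per-core allocator for that could look something like the sketch below
(hypothetical helpers, not an existing interface; a real version would key
off the physical core id and invalidate the core's tlb entries whenever an
lpid is recycled):

#define NR_LPIDS	64		/* lpid 0 stays reserved for the host */

struct core_lpid_pool {
	DECLARE_BITMAP(in_use, NR_LPIDS);
};

static struct core_lpid_pool core_lpids[NR_CPUS];	/* one pool per core */

static int alloc_lpid_on_core(int core)
{
	unsigned long *map = core_lpids[core].in_use;
	int lpid = find_next_zero_bit(map, NR_LPIDS, 1);

	if (lpid >= NR_LPIDS)
		return -ENOSPC;		/* 63 guests already active on this core */
	set_bit(lpid, map);
	return lpid;
}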

-Mike

