[PATCH 0/4] powernv: kvm: numa fault improvement

Liu ping fan kernelfans at gmail.com
Tue Jan 21 13:30:28 EST 2014

On Mon, Jan 20, 2014 at 11:45 PM, Aneesh Kumar K.V
<aneesh.kumar at linux.vnet.ibm.com> wrote:
> Liu ping fan <kernelfans at gmail.com> writes:
>> On Thu, Jan 9, 2014 at 8:08 PM, Alexander Graf <agraf at suse.de> wrote:
>>> On 11.12.2013, at 09:47, Liu Ping Fan <kernelfans at gmail.com> wrote:
>>>> This series is based on Aneesh's series "[PATCH -V2 0/5] powerpc: mm: Numa faults support for ppc64".
>>>> For this series, I apply the same idea as in the previous thread "[PATCH 0/3] optimize for powerpc _PAGE_NUMA"
>>>> (for which I am still trying to get a machine to show numbers).
>>>> But for this series, I think I have a good justification: the heavy cost of switching context between guest and host,
>>>> which is well known.
>>> This cover letter isn't really telling me anything. Please put a proper description of what you're trying to achieve, why you're trying to achieve what you're trying and convince your readers that it's a good idea to do it the way you do it.
>> Sorry for the unclear message. After the introduction of _PAGE_NUMA,
>> kvmppc_do_h_enter() cannot fill in the hpte for the guest. Instead, it
>> has to rely on the host's kvmppc_book3s_hv_page_fault() to call
>> do_numa_page() to do the numa fault check. This incurs the overhead
>> of exiting from rmode to vmode. My idea is that in
>> kvmppc_do_h_enter() we do a quick check: if the page is correctly placed,
>> there is no need to exit to vmode (i.e. saving htab, slab switching).
> Can you explain more? Are we looking at an hcall from the guest, with the
> hypervisor handling it in real mode? If so, why would the guest issue an
> hcall on a pte entry that has PAGE_NUMA set? Or is this about the
> hypervisor handling a missing hpte because the host swapped this page
> out? In that case, how do we end up in h_enter? IIUC for that case we
> should get to kvmppc_hpte_hv_fault.
After setting _PAGE_NUMA, we have to flush all the hptes for that page,
both in the host's htab and in the guest's. So when the guest next
accesses the memory, the host finds that there is no hpte ready for it
in the guest's htab, and raises a dsi to the guest. That is how the
guest ends up in h_enter. And as you can see in the current code, we
also try this quick path first. Only if it fails do we fall back to the
slow path, kvmppc_hpte_hv_fault.
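
To make the fast/slow split concrete, here is a rough user-space sketch
of the decision I have in mind. It is only a toy model, not kernel code:
struct toy_page, toy_page_well_placed() and toy_h_enter() are made-up
names for illustration, and clearing the mark in the fast path is my
assumption about what the quick check would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a host page; not a real kernel structure. */
struct toy_page {
    int  node;        /* NUMA node the page currently lives on */
    bool numa_marked; /* models _PAGE_NUMA being set on the host pte */
};

enum toy_result {
    TOY_HPTE_INSERTED, /* handled in the quick path, no exit to vmode */
    TOY_EXIT_TO_VMODE  /* must fall back to the slow path */
};

/* Quick check: is the page already on the node of the accessing CPU? */
static bool toy_page_well_placed(const struct toy_page *pg, int cpu_node)
{
    return pg->node == cpu_node;
}

/* Toy model of the decision made on the h_enter path. */
static enum toy_result toy_h_enter(struct toy_page *pg, int cpu_node)
{
    assert(pg != NULL);

    /* No NUMA mark: nothing special to do, insert the hpte. */
    if (!pg->numa_marked)
        return TOY_HPTE_INSERTED;

    /* Marked but correctly placed: clear the mark (assumption) and
     * insert the hpte without leaving the quick path. */
    if (toy_page_well_placed(pg, cpu_node)) {
        pg->numa_marked = false;
        return TOY_HPTE_INSERTED;
    }

    /* Marked and misplaced: the full numa fault handling is needed. */
    return TOY_EXIT_TO_VMODE;
}
```

Only the last case pays the cost of the rmode to vmode exit; the other
two stay in the quick path.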

Thanks and regards,
>>>> If my assumption is correct, I will CC kvm at vger.kernel.org from the next version.
>>> This translates to me as "This is an RFC"?
>> Yes, I am not quite sure about it. I have no bare-metal machine to verify it on.
>> So I hope that it is at least correct in theory.
> -aneesh
