[PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect

Mon Apr 14 19:01:28 EST 2014

On 14.04.14 10:08, liu ping fan wrote:
> On Mon, Apr 14, 2014 at 2:43 PM, Alexander Graf <agraf at suse.de> wrote:
>> On 13.04.14 04:27, Liu ping fan wrote:
>>> On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf at suse.de> wrote:
>>>> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank at linux.vnet.ibm.com>
>>>> wrote:
>>>>
>>>>> When we mark pte with _PAGE_NUMA we already call
>>>>> mmu_notifier_invalidate_range_start and
>>>>> mmu_notifier_invalidate_range_end, which will mark existing guest hpte
>>>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>>>> guest hpte entries.
>>>> What happens when we don't? Why do we need the check? Why isn't it done
>>>> implicitly? What happens when we treat a NUMA marked page as non-present?
>>>> Why does it work out for us?
>>>>
>>>> Assume you have no idea what PAGE_NUMA is, but try to figure out what
>>>> this patch does and whether you need to cherry-pick it into your downstream
>>>> kernel. The description as it stands is still not very helpful for that. It
>>>> doesn't even explain what really changes with this patch applied.
>>>>
>>> Yeah. What about appending the following description? Does it make
>>> the context clear?
>>> "The guest should not set up an hpte for a page whose pte is marked with
>>> _PAGE_NUMA, so that on the host the numa-fault mechanism can take effect
>>> and check whether the page is placed correctly or not."
>>
>> Try to come up with a text that answers the following questions in order:
>>
> I divided them into 3 groups and answer them in 3 sections. Together
> they seem to tell the whole story :)
> Please take a look.
>
>>    - What does _PAGE_NUMA mean?
> Group 1 -> section 2
>
>>    - How does page migration with _PAGE_NUMA work?
>>    -> Why should we not map pages when _PAGE_NUMA is set?
> Group 2 -> section 1
> (Note: for the 1st question in this group, I am not sure about the
> details, except that we can fix numa balancing by moving the task or
> moving the page.  So I describe it as "migration should be involved to
> cut down the distance between the cpu and pages")
>
>>    - Which part of what needs to be done did the previous _PAGE_NUMA patch
>> address?
>>    - What's the situation without this patch?
>>    - Which scenario does this patch fix?
>>
> Group 3 -> section 3
>
>
> Numa fault is a mechanism that helps to achieve automatic numa balancing.
> When such a page fault takes place, the page fault handler checks
> whether the page is placed correctly. If not, migration should be
> involved to cut down the distance between the cpu and pages.
>
> A pte marked with _PAGE_NUMA helps to implement numa fault. The marker
> means the MMU is not allowed to access the page directly, so a page fault
> is triggered and the numa fault handler gets the opportunity to run its
> placement check.
>
> As for MMU access, we need special handling for the powernv guest.
> When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to
> invalidate it in the guest's htab, but when we try to re-insert it,
> we first try to fix it in real mode. Only after this fails do we fall back
> to virt mode, and most importantly, we run the numa fault handler in virt
> mode.  This patch guards the real-mode path to ensure that if a pte is
> marked with _PAGE_NUMA, it will NOT be fixed in real mode; instead, it will
> be fixed in virt mode and have the opportunity to be checked for placement.

s/fixed/mapped/g

Otherwise works as patch description for me :).

Alex
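
For readers who want the control flow of the description above in code form,
here is a minimal sketch in plain C. It is not the patch itself and does not
use the kernel's real definitions: the flag values, the function names
(real_mode_can_map, virt_mode_numa_fault, handle_guest_fault) and the enum are
all hypothetical, chosen only to illustrate "refuse the mapping in real mode
when the pte carries the NUMA marker, then let virt mode handle it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; not the kernel's values. */
#define SKETCH_PAGE_PRESENT 0x1UL
#define SKETCH_PAGE_NUMA    0x2UL

/* Which path ended up handling the guest fault. */
enum map_path { MAP_REAL_MODE, MAP_VIRT_MODE };

/*
 * Real-mode guard: only insert a guest hpte when the host pte is present
 * and does NOT carry the NUMA marker.
 */
static bool real_mode_can_map(unsigned long pte)
{
    if (!(pte & SKETCH_PAGE_PRESENT))
        return false;
    if (pte & SKETCH_PAGE_NUMA)
        return false;
    return true;
}

/*
 * Stand-in for the virt-mode numa fault handling (assumption): the real
 * handler would check page placement and possibly migrate the page. Here
 * we only clear the marker so that later faults can use the fast path.
 */
static void virt_mode_numa_fault(unsigned long *pte)
{
    assert(pte != NULL);
    *pte &= ~SKETCH_PAGE_NUMA;
}

/* Try real mode first; fall back to virt mode when the guard refuses. */
static enum map_path handle_guest_fault(unsigned long *pte)
{
    if (real_mode_can_map(*pte))
        return MAP_REAL_MODE;

    virt_mode_numa_fault(pte);
    return MAP_VIRT_MODE;
}
```

In this sketch, a pte with only the present bit set is handled in real mode,
while a pte that also carries the NUMA marker takes the virt-mode path once and
can use the real-mode path on a later fault.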