[PATCH v3] powerpc: kvm: make _PAGE_NUMA take effect

Mon Apr 14 18:08:51 EST 2014

On Mon, Apr 14, 2014 at 2:43 PM, Alexander Graf <agraf at suse.de> wrote:
>
> On 13.04.14 04:27, Liu ping fan wrote:
>>
>> On Fri, Apr 11, 2014 at 10:03 PM, Alexander Graf <agraf at suse.de> wrote:
>>>
>>> On 11.04.2014, at 13:45, Liu Ping Fan <pingfank at linux.vnet.ibm.com>
>>> wrote:
>>>
>>>> When we mark pte with _PAGE_NUMA we already call
>>>> mmu_notifier_invalidate_range_start
>>>> and mmu_notifier_invalidate_range_end, which will mark existing guest
>>>> hpte
>>>> entry as HPTE_V_ABSENT. Now we need to do that when we are inserting new
>>>> guest hpte entries.
>>>
>>> What happens when we don't? Why do we need the check? Why isn't it done
>>> implicitly? What happens when we treat a NUMA marked page as non-present?
>>> Why does it work out for us?
>>>
>>> Assume you have no idea what PAGE_NUMA is, but try to figure out what
>>> this patch does and whether you need to cherry-pick it into your downstream
>>> kernel. The description as is still is not very helpful for that. It doesn't
>>> even explain what really changes with this patch applied.
>>>
>> Yeah.  what about appending the following description?  Can it make
>> the context clear?
>> "Guest should not setup a hpte for the page whose pte is marked with
>> _PAGE_NUMA, so on the host, the numa-fault mechanism can take effect
>> to check whether the page is placed correctly or not."
>
>
> Try to come up with a text that answers the following questions in order:
>
I divided them into 3 groups and answer them in 3 sections below. I think
that gives the whole story :)
Please take a look.

>   - What does _PAGE_NUMA mean?
Group 1 -> section 2

>   - How does page migration with _PAGE_NUMA work?
>   -> Why should we not map pages when _PAGE_NUMA is set?
Group 2 -> section 1
(Note: for the 1st question in this group, I am not sure about the
details, except that we can fix numa balancing by moving the task or
moving the page.  So I wrote "migration should be involved to cut
down the distance between the cpu and pages")

>   - Which part of what needs to be done did the previous _PAGE_NUMA patch
> address?
>   - What's the situation without this patch?
>   - Which scenario does this patch fix?
>
Group 3 -> section 3

Numa fault is a mechanism which helps to achieve automatic numa balancing.
When such a page fault takes place, the page fault handler checks
whether the page is placed correctly. If not, migration should be
involved to cut down the distance between the cpu and pages.
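
To illustrate the placement check, here is a rough model in plain C. It
is not kernel code: struct page_model and both functions are names I made
up for this sketch, and it only models moving the page (not moving the
task).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page: only tracks which node holds it. */
struct page_model {
    int node;   /* NUMA node that currently holds the page */
};

/* Returns true if the page already sits on the faulting cpu's node. */
static bool page_placed_correctly(const struct page_model *page, int cpu_node)
{
    return page->node == cpu_node;
}

/*
 * Model of the check done on a numa fault: if the page is misplaced,
 * move it to the faulting cpu's node. Returns true if a move happened.
 */
static bool numa_fault_check(struct page_model *page, int cpu_node)
{
    assert(page);
    if (page_placed_correctly(page, cpu_node))
        return false;
    page->node = cpu_node;  /* stands in for the real page migration */
    return true;
}
```

A page on node 1 that is touched from a cpu on node 0 is moved once; a
second fault from the same cpu finds it correctly placed.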

A pte marked with _PAGE_NUMA helps to implement numa fault. It means that
the MMU is not allowed to access the page directly. So a page fault is
triggered and the numa fault handler gets the opportunity to run the
placement check.
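
The effect of the marker can be modelled like this. The bit values and
all MODEL_*/model_* names are invented for the sketch, and clearing the
marker once the check has run is my assumption about what the handler
does; this is not the real pte layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define MODEL_PTE_VALID 0x1u
#define MODEL_PTE_NUMA  0x2u

enum access_result { ACCESS_OK, ACCESS_NUMA_FAULT, ACCESS_NO_MAPPING };

/* Model of an MMU access: a NUMA-marked pte always ends in a fault. */
static enum access_result model_mmu_access(uint32_t pte)
{
    if (!(pte & MODEL_PTE_VALID))
        return ACCESS_NO_MAPPING;
    if (pte & MODEL_PTE_NUMA)
        return ACCESS_NUMA_FAULT;
    return ACCESS_OK;
}

/* Assumed behaviour: once the check has run, the marker is cleared. */
static uint32_t model_numa_fault_done(uint32_t pte)
{
    assert(pte & MODEL_PTE_NUMA);
    return pte & ~MODEL_PTE_NUMA;
}
```

So the first access to a marked pte faults, and after the handler has run
the same access goes through.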

As for MMU access, we need special handling for powernv guests.
When we mark a pte with _PAGE_NUMA, we already call the mmu_notifier to
invalidate it in the guest's htab, but when we try to re-insert the entry,
we first try to fix it up in real mode. Only if that fails do we fall back
to virt mode, and most importantly, the numa fault handler runs in virt
mode.  This patch guards the real-mode path to ensure that if a pte is
marked with _PAGE_NUMA, it will NOT be fixed up in real mode. Instead, it
will be fixed up in virt mode, where its placement can be checked.
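
The control flow can be sketched as the following model. It is a
simplification and not the actual patch: every MODEL_*/model_* name, the
bit values and the stats counter are hypothetical and only show which
path gets to handle a NUMA-marked pte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define MODEL_PTE_VALID 0x1u
#define MODEL_PTE_NUMA  0x2u

enum insert_path { INSERTED_REAL_MODE, INSERTED_VIRT_MODE, INSERT_FAILED };

struct model_stats {
    int numa_checks;    /* how often the placement check got to run */
};

/* Real-mode path with the guard: it refuses NUMA-marked ptes. */
static bool model_real_mode_insert(uint32_t pte)
{
    return (pte & MODEL_PTE_VALID) && !(pte & MODEL_PTE_NUMA);
}

/* Virt-mode path: runs the numa check first, then inserts. */
static bool model_virt_mode_insert(uint32_t *pte, struct model_stats *stats)
{
    if (!(*pte & MODEL_PTE_VALID))
        return false;
    if (*pte & MODEL_PTE_NUMA) {
        stats->numa_checks++;       /* placement check would run here */
        *pte &= ~MODEL_PTE_NUMA;    /* assumed: marker cleared afterwards */
    }
    return true;
}

/* Try real mode first; fall back to virt mode only if that fails. */
static enum insert_path model_insert_hpte(uint32_t *pte,
                                          struct model_stats *stats)
{
    assert(pte && stats);
    if (model_real_mode_insert(*pte))
        return INSERTED_REAL_MODE;
    if (model_virt_mode_insert(pte, stats))
        return INSERTED_VIRT_MODE;
    return INSERT_FAILED;
}
```

Without the !(pte & MODEL_PTE_NUMA) test in model_real_mode_insert(), a
marked pte would be inserted in real mode and the check would never run.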

Thx,
Fan

> Once you have a text that answers those, you should have a good patch
> description :).
>
> Alex