tlb flushing on Power

Tue Mar 6 04:56:33 EST 2012

Hey Ben,

Thanks for the help!  I was wondering if you could take a look at something
for me.

I've been working on this staging driver (zsmalloc memory allocator)
that does virtual mapping of two pages.

I have a github repo with the driver and the unsubmitted changes.  I'm
trying to get the pte/tlb stuff working in a portable way:

git://github.com/spartacus06/linux.git (portable branch)

The experimental commits are the top 5 and the branch is based on
Greg's staging-next + frontswap-v11 patches.

Could you take a look at the zs_map_object() and zs_unmap_object()
in drivers/staging/zsmalloc/zsmalloc-main.c and see if they should
work for PPC64?

I'm using set_pte_at() to map.  Then I'm using ptep_get_and_clear()
and local_flush_tlb_kernel_page() to unmap (I #defined
local_flush_tlb_kernel_page() to local_flush_tlb_page() for PPC64, where
it is a no-op).

It works most of the time, but eventually I get a crash and the
mapped memory isn't what I expect.  I know the cause lies in the
virtual mapping because if I reduce the max_zspage_order to
1, preventing the "else" branch in zs_[un]map_object() from being
called, everything is stable.

Any feedback you (or others) can provide would be appreciated!

Thanks,
Seth

On 02/16/2012 02:31 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2012-02-16 at 11:11 -0600, Seth Jennings wrote:
> 
>> Just wanted to bump you again about this.  You mentioned that if I wanted to
>> do a cpu-local flush of a single tlb entry, that there would have to be a new
>> hook.  Is that right?
>>
>> I've been looking through the powerpc arch and I thought that I might have
>> found the power analog to __flush_tlb_one() for x86 in local_flush_tlb_page()
>> with a NULL vma argument.  This doesn't seem to work though, as indicated
>> by a crash when I tried it.  After looking in tlbflush.h again, it seems
>> that local_flush_tlb_page() is a no-op when CONFIG_PPC_STD_MMU_64 is set,
>> as are almost ALL of the tlb flushing functions... which makes no sense to
>> me.
>>
>> Any help you (or anyone) can give me would be greatly appreciated.
> 
> On ppc64 with hash-table MMU, we handle the flushes very differently.
> PTEs that are modified are added to a list at the time of the
> modification and either flushed immediately if no lazy tlb batching is
> in progress or flushed when leaving the lazy tlb op.
> 
> This is to avoid a problem where we might otherwise, under some
> circumstances, create a new TLB which can be hashed in to the hash table
> before the previous one has been flushed out. That would lead to a dup
> in the hash table which is architecturally illegal.
> 
> This happens via the call to hpte_need_flush() in pte_update().
> 
> Unfortunately, it will always consider all kernel mappings as global,
> so the per-cpu "optimization" won't be usable in this case, at least
> not until we add some kind of additional argument to that function.
> 
> Cheers,
> Ben.
> 
>
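
To make the batching Ben describes above a bit more concrete, here is a
rough user-space sketch in plain C.  It is only a toy model, not the
powerpc implementation: every name in it (toy_batch, toy_pte_update(),
toy_enter_lazy(), toy_leave_lazy(), BATCH_MAX) is made up for
illustration, and the "flush" just bumps a counter.  The one thing it
tries to show is the rule from the quoted text: a modified PTE is
flushed immediately when no lazy batching is in progress, and otherwise
it is queued and flushed when the lazy section ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX 8  /* hypothetical capacity, chosen arbitrarily */

/* Toy stand-in for a per-CPU flush batch; not a kernel structure. */
struct toy_batch {
    bool active;                    /* inside a lazy batching section? */
    size_t count;                   /* addresses queued so far */
    unsigned long addr[BATCH_MAX];  /* queued virtual addresses */
    size_t flushed;                 /* total invalidations performed */
};

/* Pretend to invalidate one translation; here we only count it. */
static void toy_flush_one(struct toy_batch *b, unsigned long addr)
{
    (void)addr;
    b->flushed++;
}

/* Invalidate everything that is currently queued. */
static void toy_flush_pending(struct toy_batch *b)
{
    for (size_t i = 0; i < b->count; i++)
        toy_flush_one(b, b->addr[i]);
    b->count = 0;
}

static void toy_enter_lazy(struct toy_batch *b)
{
    b->active = true;
}

/* Leaving the lazy section drains the queue. */
static void toy_leave_lazy(struct toy_batch *b)
{
    toy_flush_pending(b);
    b->active = false;
}

/* Called whenever a PTE is modified. */
static void toy_pte_update(struct toy_batch *b, unsigned long addr)
{
    if (!b->active) {
        /* No batching in progress: flush right away. */
        toy_flush_one(b, addr);
        return;
    }
    if (b->count == BATCH_MAX)
        toy_flush_pending(b);       /* queue full: drain it first */
    assert(b->count < BATCH_MAX);
    b->addr[b->count++] = addr;
}
```

With a zero-initialized struct toy_batch, a toy_pte_update() outside a
lazy section bumps the flushed counter at once, while updates made
between toy_enter_lazy() and toy_leave_lazy() only show up in the
counter after toy_leave_lazy() returns.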