Understanding how kernel updates MMU hash table

Thu Dec 6 04:14:23 EST 2012

Hi Ben.

Thanks for your input. Please find my comments inline.

Benjamin Herrenschmidt wrote:
> 
> On Tue, 2012-12-04 at 21:56 -0800, Pegasus11 wrote:
>> Hello.
>> 
>> I've been trying to understand how a hash PTE is updated. I'm on a
>> PPC970MP machine which uses the IBM PowerPC 604e core.
> 
> Ben: Ah no, the 970 is a ... 970 core :-) It's a derivative of POWER4+
> which is quite different from the old 32-bit 604e.
> 
> Peg: So the 970 is a 64-bit core whereas the 604e is a 32-bit core. The
> former is used in the embedded segment whereas the latter is for the
> server market, right?
> 
>> My Linux version is 2.6.10 (I am sorry, I cannot migrate at the moment.
>> Management issues, and I can't help it :-(( )
>> 
>> Now onto the problem:
>> hpte_update is invoked to sync the on-chip MMU cache which Linux uses as
>> its TLB.
> 
> Ben: It's actually an in-memory cache. There's also an on-chip TLB.
> Peg: An in-memory cache of what? You mean the kernel caches the PTEs in
> its own software cache as well? And is this cache not related in any way to
> the on-chip TLB? If that is indeed the case, then I've read a paper on some
> of the MMU tricks for the PPC by Cort Dougan which says Linux uses (or
> perhaps used to, when he wrote that) the MMU hardware cache as the hardware
> TLB. What is that all about? It's called: Optimizing the Idle Task and
> Other MMU Tricks - Usenix
> 
>> So whenever a change is made to the PTE, it has to be propagated to the
>> corresponding TLB entry. And this uses hpte_update for the same. Am I
>> right here?
> 
> Ben: hpte_update takes care of tracking whether a Linux PTE was also
> cached into the hash, in which case the hash is marked for invalidation. I
> don't remember precisely how we did it in 2.6.10 but it's possible that
> the actual invalidation of the hash and the corresponding TLB
> invalidations are delayed.
> Peg: But in 2.6.10, I've seen the code first check for the existence of the
> HASHPTE flag in a given PTE, and only if it exists is this hpte_update
> function called. Could you, for the love of tux, elaborate a bit on how
> the hash and the underlying TLB entries are related? I'll then try to see
> how it was done back then, since it would probably be quite similar, at
> least conceptually (if I am lucky :jumping:)
> 
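The guard Peg describes can be pictured with a small user-space sketch. Everything prefixed sk_ or SK_ is hypothetical: the bit values are illustrative rather than copied from the 2.6.10 headers, and sk_hpte_update only counts calls where the real hpte_update would queue a hash invalidation. The point is only that the hash is touched when the old PTE value says a hash entry exists.

```c
#include <stdint.h>

/* Illustrative bit values only (assumption); the real definitions live
 * in the kernel's ppc64 page-table headers and may differ. */
#define SK_PAGE_ACCESSED 0x0100ULL
#define SK_PAGE_HASHPTE  0x0400ULL

/* Counts calls to the stand-in below so the guard can be observed. */
static int sk_hpte_update_calls;

/* Stand-in for hpte_update(); the real one would mark the hash entry
 * for invalidation. Here it only records that it was called. */
static void sk_hpte_update(uint64_t *ptep, uint64_t old, int wrprot)
{
    (void)ptep;
    (void)old;
    (void)wrprot;
    sk_hpte_update_calls++;
}

/* Clear the accessed bit in the Linux PTE. Only if the OLD value had
 * the HASHPTE bit set is there a hash entry worth invalidating. */
static int sk_test_and_clear_young(uint64_t *ptep)
{
    uint64_t old = *ptep;

    *ptep = old & ~SK_PAGE_ACCESSED;
    if (old & SK_PAGE_HASHPTE)
        sk_hpte_update(ptep, old, 0);
    return (old & SK_PAGE_ACCESSED) != 0;
}
```

A PTE that was never inserted into the hash skips the call entirely, which is why the flag is tested before hpte_update is reached.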
>> Now hpte_update (http://lxr.linux.no/linux-bk+*/+code=hpte_update) is
>> declared as
>>  
>> ' void hpte_update(pte_t *ptep, unsigned long pte, int wrprot) '. 
>> The arguments to this function are a POINTER to the PTE entry (needed to
>> make a change persistent across the function call, right?), the PTE entry
>> (as in the value), as well as the wrprot flag.
>> 
>> Now the code snippet that's bothering me is this:
>> '
>>   86        ptepage = virt_to_page(ptep);
>>   87        mm = (struct mm_struct *) ptepage->mapping;
>>   88        addr = ptepage->index +
>>   89                (((unsigned long)ptep & ~PAGE_MASK) * PTRS_PER_PTE);
>> '
>> 
>> On line 86, we get the page structure for a given PTE but we pass the
>> pointer to PTE not the PTE itself whereas virt_to_page is a macro defined
>> as:
> 
> Ben: I don't remember why we did that in 2.6.10 however...
> 
>> #define virt_to_page(kaddr)   pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
>> 
>> Why are we passing the POINTER to the PTE here? I mean, are we looking for
>> the PAGE that is described by the PTE or are we looking for the PAGE which
>> contains the PTE itself (the one ptep points to)? Methinks it is the
>> latter, since the former is given by the VALUE of the PTE, not its
>> POINTER. Right?
> 
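A small user-space sketch of what virt_to_page(ptep) computes, assuming a linear kernel mapping and 4 KiB pages. The sk_ names and the SK_PAGE_OFFSET base are made up for illustration and are not kernel APIs. Only the address of the PTE matters, so every PTE stored in the same page resolves to the same frame.

```c
#include <stdint.h>

#define SK_PAGE_SHIFT  12
/* Hypothetical linear-map base, standing in for the kernel's PAGE_OFFSET. */
#define SK_PAGE_OFFSET 0xc000000000000000ULL

/* Stand-in for __pa(): kernel linear address -> physical address. */
static uint64_t sk_pa(uint64_t kaddr)
{
    return kaddr - SK_PAGE_OFFSET;
}

/* Frame number of the page that CONTAINS kaddr. In the kernel,
 * pfn_to_page() would then turn this number into a struct page. */
static uint64_t sk_virt_to_pfn(uint64_t kaddr)
{
    return sk_pa(kaddr) >> SK_PAGE_SHIFT;
}
```

Passing the value of the PTE instead would describe the page the PTE maps, which is a different page altogether.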
> Ben: The above gets the page that contains the PTEs indeed, in order to
> get the associated mapping pointer which points to the struct mm_struct,
> and the index, which together are used to re-constitute the virtual
> address, probably in order to perform the actual invalidation. Nowadays,
> we just pass the virtual address down from the call site.
> Peg: Re-constitute the virtual address of what exactly? The virtual
> address that led us to the PTE is the most natural thought that comes to
> mind. However, the page which contains all these PTEs would typically be
> categorized as a page directory, right? So are we trying to get the page
> directory here? Sorry for sounding a bit hazy on this one, but I really
> am on this :confused:
> 
> 
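The arithmetic in the addr line can be tried in user space. This is a sketch under assumptions, not the kernel code: it assumes 4 KiB pages and 8-byte PTEs (so 512 PTEs per PTE page), and it assumes ptepage->index holds the virtual address mapped by the first PTE in that page. The sk_ names are hypothetical. Under those assumptions the byte offset of the PTE within its page is slot * 8, and multiplying by 512 gives slot * 4096, i.e. one page of virtual address space per PTE slot.

```c
#include <stdint.h>

/* Assumed geometry: 4 KiB pages, 8-byte PTEs, 512 PTEs per PTE page. */
#define SK_PAGE_SHIFT   12
#define SK_PAGE_SIZE    (1ULL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK    (~(SK_PAGE_SIZE - 1))
#define SK_PTRS_PER_PTE (SK_PAGE_SIZE / sizeof(uint64_t))

/*
 * page_index: virtual address covered by the first PTE in the PTE page
 *             (what ptepage->index is assumed to hold).
 * ptep:       address of the PTE itself.
 */
static uint64_t sk_rebuild_addr(uint64_t page_index, uint64_t ptep)
{
    /* Byte offset of the PTE inside its page, equal to slot * 8. */
    uint64_t byte_off = ptep & ~SK_PAGE_MASK;

    /* slot * 8 * 512 == slot * 4096: the virtual-address offset of
     * the page that this PTE slot maps. */
    return page_index + byte_off * SK_PTRS_PER_PTE;
}
```

For example, a PTE at byte offset 0x18 is slot 3, so with page_index 0x10000000 the rebuilt address is 0x10003000. With this geometry the multiplication is equivalent to a left shift by 9, not 12.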
>> So if it is indeed the latter, what trickery are we after here? Perhaps
>> following the snippet will make us understand? As I see from above, after
>> that we get the 'address space object' associated with this page. 
>> 
>> What I don't understand is the following line:
>>  addr = ptepage->index + (((unsigned long)ptep & ~PAGE_MASK) * PTRS_PER_PTE);
>> 
>> First we get the index of the page in the file, i.e. the number of pages
>> preceding the page which holds the address of PTEP. Then we get the lower
>> 12 bits of ptep. Then we shift these bits to the left by 12 again and add
>> the above index to the result. What is this doing?
>> 
>> There are other things in this function that I do not understand. I'd be
>> glad if someone could give me a heads up on this.
> 
> Ben: It's gross, the point is to rebuild the virtual address. You should
> *REALLY* update to a more recent kernel, that ancient code is broken in
> many ways as far as I can tell.
> Peg: Well Ben, if I could I would, but you do know the higher-ups and the
> way those baldies think now, don't you? It's hard enough to work with
> them. Handing them a platter of such goodies would only mean that one
> is trying to undermine them (or so they'll think). So I'm between a rock
> and a hard place here; hence I'd rather go with the hard place and
> hope nice folks like yourself will help make my life just a little bit
> easier :handshake:
> 
> Thanks again.
> 
> Pegasus
> 
> Cheers,
> Ben.
