[Cbe-oss-dev] 64K Page Support for Kexec

Benjamin Herrenschmidt benh at kernel.crashing.org
Mon Mar 26 07:11:31 EST 2007


On Sun, 2007-03-25 at 11:00 -0300, luke wrote:
> There are some XXX comments pertaining to large page support in
> slot2va() that I am not clear about.  
> 
> I wrote the following routine to determine the large page size encoding.
> 
> static inline int hpte_decode_lpsize(unsigned long pa)
> {
>         int i;
> 
>         for (i = MMU_PAGE_COUNT - 1; i > 0; i--) {
>                 unsigned int penc = mmu_psize_defs[i].penc;
>                 if ((penc & pa) == penc)
>                         break;
>         }
>         return i;
> }

Looks good.

> It is invoked as follows:
> 
> static void native_hpte_clear(void)
> {
>         unsigned long slot, slots, flags;
>         hpte_t *hptep = htab_address;
>         unsigned long hpte_v;
>         unsigned long pteg_count;
>         int psize;
> 
>         pteg_count = htab_hash_mask + 1;
> 
>         local_irq_save(flags);
> 
>         /* we take the tlbie lock and hold it.  Some hardware will
>          * deadlock if we try to tlbie from two processors at once.
>          */
>         spin_lock(&native_tlbie_lock);
> 
>         slots = pteg_count * HPTES_PER_GROUP;
> 
>         for (slot = 0; slot < slots; slot++, hptep++) {
>                 /*
>                  * we could lock the pte here, but we are the only cpu
>                  * running,  right?  and for crash dump, we probably
>                  * don't want to wait for a maybe bad cpu.
>                  */
>                 hpte_v = hptep->v;
> 
>                 /*
>                  * Call __tlbie() here rather than tlbie() since we
>                  * already hold the native_tlbie_lock.
>                  */
>                 if (hpte_v & HPTE_V_VALID) {
>                         if (!(hpte_v & HPTE_V_LARGE))
>                                 psize = MMU_PAGE_4K;
>                         else
>                                 psize = hpte_decode_lpsize(hptep->r);
>                         hptep->v = 0;
>                         __tlbie(slot2va(hpte_v, slot), psize);
>                 }
>         }
> 
>         asm volatile("eieio; tlbsync; ptesync":::"memory");
>         spin_unlock(&native_tlbie_lock);
>         local_irq_restore(flags);
> }

Looks good too.

> I am confused about the following comment.
> 
> /*
>  * XXX This need fixing based on page size. It's only used by
>  * native_hpte_clear() for now which needs fixing too so they
>  * make a good pair...
>  */
> static unsigned long slot2va(unsigned long hpte_v, unsigned long slot)
> {
>         unsigned long avpn = HPTE_V_AVPN_VAL(hpte_v);
>         unsigned long va;
> 
>         va = avpn << 23;
> 
>         if (! (hpte_v & HPTE_V_LARGE)) {
>                 unsigned long vpi, pteg;
> 
>                 pteg = slot / HPTES_PER_GROUP;
>                 if (hpte_v & HPTE_V_SECONDARY)
>                         pteg = ~pteg;
> 
>                 vpi = ((va >> 28) ^ pteg) & htab_hash_mask;
> 
>                 va |= vpi << PAGE_SHIFT;
>         }
> 
>         return va;
> }
> 
> The routine above looks OK to me.  What am I missing?

A quick look without my brain's math function enabled says it ought to
be right for 256M segments (we don't quite do 1T segments yet anyway)
when the page size is the native one (page size == PAGE_SIZE). You may
want to double-check the non-native cases though.

For example, with PAGE_SHIFT set to 12 (4K base page size) you can have
64K pages in the hash table (with the 64K LS hack) or 16M pages (with
hugetlbfs), and you can also have 16M pages with a PAGE_SHIFT of 16. I'm
not sure the va |= vpi << PAGE_SHIFT statement is correct in that
context. Also, for 16M pages, I suppose avpn << 23 is correct (no need
to mask out the low bit) because the avpn in the hash pte already has
the low bit clear, but it might be worth double-checking.
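
To make the PAGE_SHIFT concern concrete, here is a small user-space
model, not kernel code. It assumes 256M segments, a hash of the form
"VSID xor page index within the segment", and an AVPN that holds the VA
bits from bit 23 up, taken from the page-aligned address (so the low
bit is already clear for 16M pages). All model_* names and the
MODEL_HASH_MASK value are made up for illustration. It rebuilds the VA
with an explicit page shift instead of PAGE_SHIFT, so the two can be
made to disagree on purpose.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model constants; none of these are kernel symbols. */
#define MODEL_SID_SHIFT   28          /* 256M segments */
#define MODEL_AVPN_SHIFT  23          /* AVPN holds VA bits 23 and up */
#define MODEL_HASH_MASK   0x7ffffULL  /* assumed stand-in for htab_hash_mask */

/* Assumed primary hash: VSID xor page index within the 256M segment. */
static uint64_t model_hash(uint64_t va, unsigned int shift)
{
        uint64_t vsid, page_index;

        assert(shift < MODEL_SID_SHIFT);
        vsid = va >> MODEL_SID_SHIFT;
        page_index = (va & ((1ULL << MODEL_SID_SHIFT) - 1)) >> shift;
        return vsid ^ page_index;
}

/* Rebuild a page's VA from its AVPN and PTEG index for a given page shift. */
static uint64_t model_slot2va(uint64_t avpn, uint64_t pteg, int secondary,
                              unsigned int shift)
{
        uint64_t va = avpn << MODEL_AVPN_SHIFT;
        uint64_t vpi;

        if (secondary)
                pteg = ~pteg;

        /* Xor with the VSID undoes the hash and leaves the page index. */
        vpi = ((va >> MODEL_SID_SHIFT) ^ pteg) & MODEL_HASH_MASK;
        return va | (vpi << shift);
}

/*
 * Hash a page using hash_shift, rebuild its VA using rebuild_shift, and
 * return 1 if the result is the page-aligned address we started from.
 */
static int model_roundtrip(uint64_t va, unsigned int hash_shift,
                           unsigned int rebuild_shift, int secondary)
{
        uint64_t page_va = va & ~((1ULL << hash_shift) - 1);
        uint64_t avpn = page_va >> MODEL_AVPN_SHIFT;
        uint64_t hash = model_hash(page_va, hash_shift);
        uint64_t pteg = (secondary ? ~hash : hash) & MODEL_HASH_MASK;

        return model_slot2va(avpn, pteg, secondary, rebuild_shift) == page_va;
}
```

In this model, model_roundtrip(va, 12, 12, 0) and
model_roundtrip(va, 16, 16, 0) both succeed, while
model_roundtrip(va, 16, 12, 0) does not: a 64K entry rebuilt with a
shift of 12 puts the page index bits in the wrong place. If the model's
assumptions match the real hash, that suggests passing the entry's
actual page shift rather than relying on PAGE_SHIFT.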

Ben.

