[Cbe-oss-dev] [PATCH] 64K page support for kexec
Luke Browning
lukebr at linux.vnet.ibm.com
Wed Apr 25 23:06:00 EST 2007
On Wed, 2007-04-25 at 08:48 +1000, Benjamin Herrenschmidt wrote:
> Getting better :-)
>
> Sorry for the constant nagging, let's say I'm a bit perfectionist...
>
so am I. That is why I redid the patch instead of going with the
previous version.
> > -static unsigned long slot2va(unsigned long hpte_v, unsigned long slot)
> > -{
> > - unsigned long avpn = HPTE_V_AVPN_VAL(hpte_v);
> > - unsigned long va;
> > -
> > - va = avpn << 23;
> > -
> > - if (! (hpte_v & HPTE_V_LARGE)) {
> > - unsigned long vpi, pteg;
> > -
> > - pteg = slot / HPTES_PER_GROUP;
> > - if (hpte_v & HPTE_V_SECONDARY)
> > - pteg = ~pteg;
>
> Hrm... hpte_decode ends up being a pretty big function... I suppose
> that's ok.
>
It is bigger because it combines two functions. I think it is better
this way as it will make it easier to add support for 16G pages and 1T
segments in the future.
> > +#define LP_SHIFT 12
> > +#define LP_BITS 8
> > +#define LP_MASK(i) ((((1 << LP_BITS) - 1) >> (i)) << LP_SHIFT)
> > +
> > +static void hpte_decode(hpte_t *hpte, unsigned long slot,
> > + int *psize, unsigned long *va)
> > +{
> > + unsigned long hpte_r = hpte->r;
> > + unsigned long hpte_v = hpte->v;
> > + unsigned long avpn;
> > + int i, size, shift, penc, avpnm_bits;
> > +
> > + if (!(hpte_v & HPTE_V_LARGE))
> > + size = MMU_PAGE_4K;
> > +#if 0
> > + else if (hpte_v & 0x4000000000000000UL)
> > + size = MMU_PAGE_16G;
> > +#endif
>
> Remove the above. I don't think it's right anyway. We'll deal with 16G
> pages when we start using them.
OK. I put it there mainly as a place holder. How many page sizes do
you expect to support in a 1TB segment?
>
> > + else if (!(hpte_r & LP_MASK(0)))
> > + size = MMU_PAGE_16M;
>
> Is that correct ? (The above).
> I haven't quite noticed in the previous
> instances of the patch, sorry about that but I don't think that's the
> way to detect the "old style" 16M pages... I -suspect- that the normal
> algorithm will work for them, that is, they'll have a penc of 0 which
> will match what's in the mmu_psize_defs[MMU_PAGE_16M] but it's worth
> actually testing it.
Ok.
>
> Now that I think about it, it's possible that I lead you on the wrong
> track there initially... Sorry about that.
>
> I think your above code will treat anything with a penc of 0 as a 16M
> page, which might be true with current implementations, is also, I
> think, not mandated by the arch, is it ? (I don't have my 2.03 at hand
> as I'm writing this email).
right. The encoding is implementation dependent. There is a minimum
size requirement that comes into play, but that works the other way.
penc of zero has a minimum page size requirement of 8K.
I will remove the test and use the array for the comparison.
>
> > + else {
> > + for (i = 0; i < LP_BITS; i++) {
> > + if ((hpte_r & LP_MASK(i+1)) == LP_MASK(i+1))
> > + break;
> > + }
> > + penc = LP_MASK(i+1) >> LP_SHIFT;
> > + for (size = MMU_PAGE_64K; size < MMU_PAGE_16M; size++) {
> > + if (!mmu_psize_defs[size].shift)
> > + continue;
> > + if (penc == mmu_psize_defs[size].penc)
> > + break;
> > + }
> > + }
> >
> > - vpi = ((va >> 28) ^ pteg) & htab_hash_mask;
> > + /*
> > + * FIXME, this could be made more efficient by storing the type
> > + * of hash algorithm in mmu_psize_defs[]. The code below assumes
> > + * the number of bits in the va representing the offset in the
> > + * page is less than 23. This affects the hash algorithm that is
> > + * used. When 16G pages are supported, a new hash algorithm
> > + * needs to be provided. See POWER ISA Book III.
> > + *
> > + * The code below works for 16M, 64K, and 4K pages.
> > + */
>
> I'm not 100% certain about your comment. The by type of hash algorithm
> you mean the segment size right ? This is not directly related to the
> page size. While 1T segments are mandatory for 16G pages, they can also
> hold normal page sizes... If we're going to implement support for 1T
> segment, we should get the segment size (and thus the hash algorithm)
> from the B bit of the PTE.
yes. what you said is correct. I was thinking that you might want to
double the number of array elements so that you could apply a different
hash algorithm based on the type of segments.
>
> In fact, when doing 1T segments, we'll have to deal with them regardless
> of the page size (the hashing will be different for all page sizes).
>
> > + shift = mmu_psize_defs[size].shift;
> > + if (mmu_psize_defs[size].avpnm)
> > + avpnm_bits = __ilog2_u64(mmu_psize_defs[size].avpnm) + 1;
> > + else
> > + avpnm_bits = 0;
> > + if (shift - avpnm_bits <= 23) {
> > + avpn = HPTE_V_AVPN_VAL(hpte_v) << 23;
> > +
> > + if (shift < 23) {
> > + unsigned long vpi, pteg;
> > +
> > + pteg = slot / HPTES_PER_GROUP;
> > + if (hpte_v & HPTE_V_SECONDARY)
> > + pteg = ~pteg;
> > + vpi = ((avpn >> 28) ^ pteg) & htab_hash_mask;
> > + avpn |= (vpi << mmu_psize_defs[size].shift);
> > + }
> > + }
> > +#if 0
> > + /* 16GB page hash, p > 23 */
> > + else {
> >
> > - va |= vpi << PAGE_SHIFT;
> > }
> > +#endif
>
> Just don't keep the code in #if 0, just a comment about something
> needing to be done for 16G ...
>
ok.
> > - return va;
> > + *va = avpn;
> > + *psize = size;
> > }
> >
> > /*
> > @@ -374,8 +420,6 @@ static unsigned long slot2va(unsigned lo
> > *
> > * TODO: add batching support when enabled. remember, no dynamic memory here,
> > * athough there is the control page available...
> > - *
> > - * XXX FIXME: 4k only for now !
> > */
> > static void native_hpte_clear(void)
> > {
> > @@ -383,6 +427,7 @@ static void native_hpte_clear(void)
> > hpte_t *hptep = htab_address;
> > unsigned long hpte_v;
> > unsigned long pteg_count;
> > + int psize;
> >
> > pteg_count = htab_hash_mask + 1;
> >
> > @@ -408,8 +453,9 @@ static void native_hpte_clear(void)
> > * already hold the native_tlbie_lock.
> > */
> > if (hpte_v & HPTE_V_VALID) {
> > + hpte_decode(hptep, slot, &psize, &hpte_v);
> > hptep->v = 0;
> > - __tlbie(slot2va(hpte_v, slot), MMU_PAGE_4K);
> > + __tlbie(hpte_v, psize);
> > }
> > }
> >
> >
>
More information about the cbe-oss-dev
mailing list