[PATCH -V1 09/24] powerpc: Decode the pte-lp-encoding bits correctly.
Aneesh Kumar K.V
aneesh.kumar at linux.vnet.ibm.com
Mon Mar 4 22:41:53 EST 2013
Paul Mackerras <paulus at samba.org> writes:
> On Tue, Feb 26, 2013 at 01:34:59PM +0530, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar at linux.vnet.ibm.com>
>>
>> We look at both the segment base page size and actual page size and store
>> the pte-lp-encodings in an array per base page size.
>>
>> We also update all relevant functions to take actual page size argument
>> so that we can use the correct PTE LP encoding in HPTE. This should also
>> get the basic Multiple Page Size per Segment (MPSS) support. This is needed
>> to enable THP on ppc64.
>
> Mostly looks OK, comments below...
>
>> +/*
>> + * HPTE LP details
>> + */
>> +#define LP_SHIFT 12
>> +#define LP_BITS 8
>> +#define LP_MASK(i) ((0xFF >> (i)) << LP_SHIFT)
>
> The reader might be wondering at this point what "LP" is; be kind and
> make it "large page" in the comment for them.
Fixed.
>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 71d0c90..48f6d99 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1515,7 +1515,7 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
>> (*sps)->page_shift = def->shift;
>> (*sps)->slb_enc = def->sllp;
>> (*sps)->enc[0].page_shift = def->shift;
>> - (*sps)->enc[0].pte_enc = def->penc;
>> + (*sps)->enc[0].pte_enc = def->penc[linux_psize];
>> (*sps)++;
>> }
>
> This will only return the entries where actual page size == base page
> size, which basically means that KVM guests won't be able to use
> MPSS. We will need to return multiple entries in that case.
I did that as a the follow up patch.
[PATCH -V1 10/24] powerpc: Return all the valid pte ecndoing in
KVM_PPC_GET_SMMU_INFO ioct
>
>> +static inline int hpte_actual_psize(struct hash_pte *hptep, int psize)
>> +{
>> + unsigned int mask;
>> + int i, penc, shift;
>> + /* Look at the 8 bit LP value */
>> + unsigned int lp = (hptep->r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
>> +
>> + penc = 0;
>> + for (i = 0; i < MMU_PAGE_COUNT; i++) {
>> + /* valid entries have a shift value */
>> + if (!mmu_psize_defs[i].shift)
>> + continue;
>> +
>> + /* encoding bits per actual page size */
>> + shift = mmu_psize_defs[i].shift - 11;
>> + if (shift > 9)
>> + shift = 9;
>> + mask = (1 << shift) - 1;
>> + if ((lp & mask) == mmu_psize_defs[psize].penc[i])
>> + return i;
>> + }
>> + return -1;
>> +}
>
> This doesn't look right to me. First, it's not clear what the 11 and
> 9 refer to, and I think the 9 should be LP_BITS (i.e. 8). Secondly,
> the mask for the comparison needs to depend on the actual page size
> not the base page size.
That 11 should be 12.That depends on the fact that we have below mapping
rrrr rrrz ≥8KB
Yes, that 9 should be LP_BITs.
We are generating mask based on actual page size above (variable i in
the for loop).
>
> I strongly suggest you pull out this code together with
> native_hpte_insert into a little userspace test program that runs
> through all the possible page size combinations, creating an HPTE and
> then decoding it with hpte_actual_psize() to check that you get back
> the correct actual page size.
>
will do.
>> static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
>> - int *psize, int *ssize, unsigned long *vpn)
>> + int *psize, int *apsize, int *ssize, unsigned long *vpn)
>> {
>> unsigned long avpn, pteg, vpi;
>> unsigned long hpte_r = hpte->r;
>> unsigned long hpte_v = hpte->v;
>> unsigned long vsid, seg_off;
>> - int i, size, shift, penc;
>> + int i, size, a_size = MMU_PAGE_4K, shift, penc;
>>
>> if (!(hpte_v & HPTE_V_LARGE))
>> size = MMU_PAGE_4K;
>> @@ -395,12 +422,13 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
>> /* valid entries have a shift value */
>> if (!mmu_psize_defs[size].shift)
>> continue;
>> -
>> - if (penc == mmu_psize_defs[size].penc)
>> - break;
>> + for (a_size = 0; a_size < MMU_PAGE_COUNT; a_size++)
>> + if (penc == mmu_psize_defs[size].penc[a_size])
>> + goto out;
>
> Once again I don't think this is correct, since the number of bits in
> the page size encoding depends on the page size. In fact the
> calculation of penc in that function looks completely bogus to me (not
> that that is code that you have written or modified, but it looks to
> me like it needs fixing).
I am fixing that in the later patch
powerpc: Fix hpte_decode to use the correct decoding for page sizes
But that will also need fixing as you suggested above.
>
>> static int __init htab_dt_scan_page_sizes(unsigned long node,
>> const char *uname, int depth,
>> void *data)
>> @@ -294,60 +318,57 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
>> size /= 4;
>> cur_cpu_spec->mmu_features &= ~(MMU_FTR_16M_PAGE);
>> while(size > 0) {
>> - unsigned int shift = prop[0];
>> + unsigned int base_shift = prop[0];
>> unsigned int slbenc = prop[1];
>> unsigned int lpnum = prop[2];
>> - unsigned int lpenc = 0;
>> struct mmu_psize_def *def;
>> - int idx = -1;
>> + int idx, base_idx;
>>
>> size -= 3; prop += 3;
>> - while(size > 0 && lpnum) {
>> - if (prop[0] == shift)
>> - lpenc = prop[1];
>> + base_idx = get_idx_from_shift(base_shift);
>> + if (base_idx < 0) {
>> + /*
>> + * skip the pte encoding also
>> + */
>> prop += 2; size -= 2;
>> - lpnum--;
>> + continue;
>> }
>> - switch(shift) {
>> - case 0xc:
>> - idx = MMU_PAGE_4K;
>> - break;
>> - case 0x10:
>> - idx = MMU_PAGE_64K;
>> - break;
>> - case 0x14:
>> - idx = MMU_PAGE_1M;
>> - break;
>> - case 0x18:
>> - idx = MMU_PAGE_16M;
>> + def = &mmu_psize_defs[base_idx];
>> + if (base_idx == MMU_PAGE_16M)
>> cur_cpu_spec->mmu_features |= MMU_FTR_16M_PAGE;
>> - break;
>> - case 0x22:
>> - idx = MMU_PAGE_16G;
>> - break;
>> - }
>> - if (idx < 0)
>> - continue;
>> - def = &mmu_psize_defs[idx];
>> - def->shift = shift;
>> - if (shift <= 23)
>> +
>> + def->shift = base_shift;
>> + if (base_shift <= 23)
>> def->avpnm = 0;
>> else
>> - def->avpnm = (1 << (shift - 23)) - 1;
>> + def->avpnm = (1 << (base_shift - 23)) - 1;
>> def->sllp = slbenc;
>> - def->penc = lpenc;
>> - /* We don't know for sure what's up with tlbiel, so
>> + /*
>> + * We don't know for sure what's up with tlbiel, so
>> * for now we only set it for 4K and 64K pages
>> */
>> - if (idx == MMU_PAGE_4K || idx == MMU_PAGE_64K)
>> + if (base_idx == MMU_PAGE_4K || base_idx == MMU_PAGE_64K)
>> def->tlbiel = 1;
>> else
>> def->tlbiel = 0;
>>
>> - DBG(" %d: shift=%02x, sllp=%04lx, avpnm=%08lx, "
>> - "tlbiel=%d, penc=%d\n",
>> - idx, shift, def->sllp, def->avpnm, def->tlbiel,
>> - def->penc);
>> + while (size > 0 && lpnum) {
>> + unsigned int shift = prop[0];
>> + unsigned int penc = prop[1];
>> +
>> + prop += 2; size -= 2;
>> + lpnum--;
>> +
>> + idx = get_idx_from_shift(shift);
>> + if (idx < 0)
>> + continue;
>> +
>> + def->penc[idx] = penc;
>> + DBG(" %d: shift=%02x, sllp=%04lx, "
>> + "avpnm=%08lx, tlbiel=%d, penc=%d\n",
>> + idx, shift, def->sllp, def->avpnm,
>> + def->tlbiel, def->penc[idx]);
>> + }
>
> I don't see where in this function you set the penc[] elements for
> invalid actual page sizes to -1.
We do the below
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -125,7 +125,7 @@ static struct mmu_psize_def mmu_psize_defaults_old[] = {
[MMU_PAGE_4K] = {
.shift = 12,
.sllp = 0,
- .penc = 0,
+ .penc = { [0 ... MMU_PAGE_COUNT - 1] = -1 },
.avpnm = 0,
-aneesh
More information about the Linuxppc-dev
mailing list