[PATCH 4/7] powerpc: Free up four 64K PTE bits in 64K backed HPTE pages
Ram Pai
linuxram at us.ibm.com
Tue Oct 24 10:42:46 AEDT 2017
On Mon, Oct 23, 2017 at 02:22:44PM +0530, Aneesh Kumar K.V wrote:
> Benjamin Herrenschmidt <benh at kernel.crashing.org> writes:
>
> > On Fri, 2017-09-08 at 15:44 -0700, Ram Pai wrote:
> >> The second part of the PTE will hold
> >> (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bits 60, 61, 62 and 63.
> >> NOTE: None of the bits in the secondary PTE were used
> >> by 64k-HPTE backed PTE.
> >
> > Have you measured the performance impact of this? The second part of
> > the PTE being in a different cache line there could be one...
> >
>
> I am also looking at a patch series removing the slot tracking
> completely. With address randomization turned off, no swap in guest/host,
> and making sure we touched most of guest RAM, I don't find much impact
> on performance when we don't track the slot at all. I will post the
> patch series with numbers in a day or two. My test was:
>
> while (5000) {
> mmap(128M)
> touch every page of 2048 pages
> munmap()
> }
>
> It could also be the best case in my run because I might have always
> found the hash PTE slot in the primary. In one measurement with swap on
> and address randomization enabled, I did find a 50% impact. But then I
> was not able to recreate that again. So it could be something I did wrong
> in the test setup.
>
> Ram,
>
> Will you be able to get a test run with the above loop?
Yes. Results with the patch look good; better than without the patch.
/--------------------------------------------\
| Iteration | secs w/ patch | secs w/o patch |
|--------------------------------------------|
| 1         | 45.572621     | 49.046994      |
| 2         | 46.049545     | 49.378756      |
| 3         | 46.103657     | 49.223591      |
| 4         | 46.298903     | 48.991245      |
| 5         | 46.353202     | 48.988033      |
| 6         | 45.440878     | 49.175846      |
| 7         | 46.860373     | 49.008395      |
| 8         | 46.221390     | 49.236964      |
| 9         | 45.794993     | 49.171927      |
| 10        | 46.569491     | 48.995628      |
|--------------------------------------------|
| average   | 46.1265053    | 49.1217379     |
\--------------------------------------------/
The code is as follows:
diff --git a/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c b/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c
index 8d084a2..ef2ad87 100644
--- a/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c
+++ b/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c
@@ -10,14 +10,14 @@
 #include "utils.h"
-#define ITERATIONS 5000000
+#define ITERATIONS 5000
 #define MEMSIZE (128 * 1024 * 1024)
 int test_mmap(void)
 {
 	struct timespec ts_start, ts_end;
-	unsigned long i = ITERATIONS;
+	unsigned long i = ITERATIONS, j;
 	clock_gettime(CLOCK_MONOTONIC, &ts_start);
@@ -25,6 +25,10 @@ int test_mmap(void)
 		char *c = mmap(NULL, MEMSIZE, PROT_READ|PROT_WRITE,
 			       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
 		FAIL_IF(c == MAP_FAILED);
+
+		for (j=0; j < (MEMSIZE >> 16); j++)
+			c[j<<16] = 0xf;
+
 		munmap(c, MEMSIZE);
 	}