[PATCH] powerpc: Set a smaller value for RECLAIM_DISTANCE to enable zone reclaim
Mel Gorman
mel at csn.ul.ie
Wed Feb 24 03:23:12 EST 2010
On Tue, Feb 23, 2010 at 12:55:51PM +1100, Anton Blanchard wrote:
>
> Hi Mel,
>
I'm afraid I'm on vacation at the moment. This mail is costing me shots with
penalties every minute it's open. It'll be early next week before I can look
at this closely.
Sorry.
> > You're pretty much on the button here. Only one thread at a time enters
> > zone_reclaim. The others back off and try the next zone in the zonelist
> > instead. I'm not sure what the original intention was but most likely it
> > was to prevent too many parallel reclaimers in the same zone potentially
> > dumping out way more data than necessary.
> >
> > > I'm not sure if there is an easy way to fix this without penalising other
> > > workloads though.
> > >
> >
> > You could experiment with waiting on the bit if the GFP flags allow it? The
> > expectation would be that the reclaim operation does not take long. Wait
> > on the bit and, if you are making forward progress, recheck the
> > watermarks before continuing.
>
> Thanks to you and Christoph for some suggestions to try. Attached is a
> chart showing the results of the following tests:
>
>
> baseline.txt
> The current ppc64 default of zone_reclaim_mode = 0. As expected we see
> no change in remote node memory usage even after 10 iterations.
>
> zone_reclaim_mode.txt
> Now we set zone_reclaim_mode = 1. On each iteration we continue to improve,
> but even after 10 runs of stream we have > 10% remote node memory usage.
>
> reclaim_4096_pages.txt
> Instead of reclaiming 32 pages at a time, we try for a much larger batch
> of 4096. The slope is much steeper but it still takes around 6 iterations
> to get almost all local node memory.
>
> wait_on_busy_flag.txt
> Here we busy wait if the ZONE_RECLAIM_LOCKED flag is set. As you suggest
> we would need to check the GFP flags etc., but so far it looks like the most
> promising. We only get a few percent of remote node memory on the first
> iteration and get all local node by the second.
>
>
> Perhaps a combination of larger batch size and waiting on the busy
> flag is the way to go?
>
> Anton
> --- mm/vmscan.c~ 2010-02-21 23:47:14.000000000 -0600
> +++ mm/vmscan.c 2010-02-22 03:22:01.000000000 -0600
> @@ -2534,7 +2534,7 @@
> .may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
> .may_swap = 1,
> .nr_to_reclaim = max_t(unsigned long, nr_pages,
> - SWAP_CLUSTER_MAX),
> + 4096),
> .gfp_mask = gfp_mask,
> .swappiness = vm_swappiness,
> .order = order,
> --- mm/vmscan.c~ 2010-02-21 23:47:14.000000000 -0600
> +++ mm/vmscan.c 2010-02-21 23:47:31.000000000 -0600
> @@ -2634,8 +2634,8 @@
> if (node_state(node_id, N_CPU) && node_id != numa_node_id())
> return ZONE_RECLAIM_NOSCAN;
>
> - if (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED))
> - return ZONE_RECLAIM_NOSCAN;
> + while (zone_test_and_set_flag(zone, ZONE_RECLAIM_LOCKED))
> + cpu_relax();
>
> ret = __zone_reclaim(zone, gfp_mask, order);
> zone_clear_flag(zone, ZONE_RECLAIM_LOCKED);
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
More information about the Linuxppc-dev mailing list