Infinite looping observed in __offline_pages
Rashmica
rashmica.g at gmail.com
Wed Aug 1 11:37:05 AEST 2018
On 26/07/18 04:11, John Allen wrote:
> Hi All,
>
> Under heavy stress and constant memory hot add/remove, I have observed
> the following loop to occasionally loop infinitely:
>
> mm/memory_hotplug.c:__offline_pages
>
> repeat:
> 	/* start memory hot removal */
> 	ret = -EINTR;
> 	if (signal_pending(current))
> 		goto failed_removal;
>
> 	cond_resched();
> 	lru_add_drain_all();
> 	drain_all_pages(zone);
>
> 	pfn = scan_movable_pages(start_pfn, end_pfn);
> 	if (pfn) { /* We have movable pages */
> 		ret = do_migrate_range(pfn, end_pfn);
> 		goto repeat;
> 	}
>
What is CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE set to for you?
I have also observed this when hot removing and adding memory. However, I
have only seen it when my kernel has
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n (when memory is onlined
automatically I do not have this issue), so I assumed that I wasn't
onlining the memory properly...
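
For reference, with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n a hot-added
memory block has to be onlined by hand, by writing "online" to its sysfs
state file (/sys/devices/system/memory/memoryN/state). Below is a minimal
userspace sketch of that step, not necessarily what I was running. The
helper names memory_state_path() and online_memory_block() are
hypothetical, and the sysfs root is passed in as a parameter only so the
helper can be exercised against a scratch directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<sysfs_root>/memory<block>/state" into buf.
 * Returns 0 on success, -1 if the path does not fit in buf.
 * (Hypothetical helper, for illustration only.)
 */
static int memory_state_path(char *buf, size_t len,
			     const char *sysfs_root, unsigned int block)
{
	int n;

	assert(buf != NULL && sysfs_root != NULL);

	n = snprintf(buf, len, "%s/memory%u/state", sysfs_root, block);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Ask the kernel to online one memory block by writing "online" to its
 * state file. Returns 0 on success, -1 on any failure (path too long,
 * file missing, write rejected).
 * (Hypothetical helper, for illustration only.)
 */
static int online_memory_block(const char *sysfs_root, unsigned int block)
{
	char path[256];
	FILE *f;
	int ok;

	if (memory_state_path(path, sizeof(path), sysfs_root, block) != 0)
		return -1;

	f = fopen(path, "w");
	if (f == NULL)
		return -1;

	ok = fputs("online", f) >= 0;
	/* The write may only be rejected when the buffer is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

On a real system this would be called as
online_memory_block("/sys/devices/system/memory", N) for each newly added
block N, run as root.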
> What appears to be happening in this case is that do_migrate_range
> returns a failure code which is being ignored. The failure is stemming
> from migrate_pages returning "1" which I'm guessing is the result of
> us hitting the following case:
>
> mm/migrate.c: migrate_pages
>
> 	default:
> 		/*
> 		 * Permanent failure (-EBUSY, -ENOSYS, etc.):
> 		 * unlike -EAGAIN case, the failed page is
> 		 * removed from migration page list and not
> 		 * retried in the next outer loop.
> 		 */
> 		nr_failed++;
> 		break;
> 	}
>
> Does a failure in do_migrate_range indicate that the range is
> unmigratable and the loop in __offline_pages should terminate and goto
> failed_removal? Or should we allow a certain number of retries before we
> give up on migrating the range?
>
> This issue was observed on a ppc64le lpar on a 4.18-rc6 kernel.
>
> -John
>
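
To make the "certain number of retries" option concrete, here is a
minimal userspace model of the repeat loop with a retry budget. It is
only a sketch under assumptions and not a proposed patch:
MAX_MIGRATE_RETRIES, struct range_model, model_scan(), model_migrate()
and offline_range_model() are all hypothetical names, and the two model_*
functions merely stand in for scan_movable_pages() and
do_migrate_range(). Returning -EBUSY once the budget is spent is also an
assumption, not what the kernel does today.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical retry budget; not a value taken from the kernel. */
#define MAX_MIGRATE_RETRIES 5

/* Userspace stand-in for the page state of a memory range. */
struct range_model {
	int movable_left;	/* pages that can still be migrated */
	int stuck;		/* pages that fail to migrate every time */
};

/* Stand-in for scan_movable_pages(): nonzero while pages remain. */
static int model_scan(const struct range_model *r)
{
	return r->movable_left + r->stuck;
}

/*
 * Stand-in for do_migrate_range(): moves every migratable page and
 * returns the number of pages that could not be moved (0 on success).
 */
static int model_migrate(struct range_model *r)
{
	r->movable_left = 0;
	return r->stuck;
}

/*
 * Model of the repeat loop that checks the migration result.
 * Returns 0 once the range is empty, or -EBUSY after
 * MAX_MIGRATE_RETRIES failed migration attempts.
 */
static int offline_range_model(struct range_model *r)
{
	int retries = 0;

	assert(r->movable_left >= 0 && r->stuck >= 0);

	while (model_scan(r)) {
		int ret = model_migrate(r);

		if (ret == 0)
			continue;	/* everything moved, rescan */

		if (++retries >= MAX_MIGRATE_RETRIES)
			return -EBUSY;	/* give up instead of spinning */
	}

	return 0;
}
```

With a range of { .movable_left = 8, .stuck = 0 } the model returns 0
after one pass; with { .movable_left = 8, .stuck = 2 } it returns -EBUSY
after five failed attempts instead of looping forever.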