[PATCH 1/2] powerpc/mm: Enable HugeTLB page migration

Anshuman Khandual khandual at linux.vnet.ibm.com
Tue Feb 2 17:19:55 AEDT 2016


On 01/29/2016 02:45 PM, Anshuman Khandual wrote:
> On 01/28/2016 08:14 PM, Aneesh Kumar K.V wrote:
>> > Anshuman Khandual <khandual at linux.vnet.ibm.com> writes:
>> > 
>>> >> This enables HugeTLB page migration for PPC64_BOOK3S systems which implement
>>> >> HugeTLB page at the PMD level. It enables the kernel configuration option
>>> >> CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION by default which turns on the function
>>> >> hugepage_migration_supported() during migration. After the recent changes
>>> >> to the PTE format, HugeTLB page migration happens successfully.
>>> >>
>>> >> Signed-off-by: Anshuman Khandual <khandual at linux.vnet.ibm.com>
>>> >> ---
>>> >>  arch/powerpc/Kconfig | 4 ++++
>>> >>  1 file changed, 4 insertions(+)
>>> >>
>>> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> >> index e4824fd..65d52a0 100644
>>> >> --- a/arch/powerpc/Kconfig
>>> >> +++ b/arch/powerpc/Kconfig
>>> >> @@ -82,6 +82,10 @@ config GENERIC_HWEIGHT
>>> >>  config ARCH_HAS_DMA_SET_COHERENT_MASK
>>> >>          bool
>>> >>
>>> >> +config ARCH_ENABLE_HUGEPAGE_MIGRATION
>>> >> +	def_bool y
>>> >> +	depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
>>> >> +
>>> >>  config PPC
>>> >>  	bool
>>> >>  	default y
>> > 
>> > 
>> > Are you sure this is all that is needed ? We will get a FOLL_GET with hugetlb
>> > migration and our follow_huge_addr will BUG_ON on that. Look at
>> > e66f17ff71772b209eed39de35aaa99ba819c93d (" mm/hugetlb: take page table
>> > lock in follow_huge_pmd()").
> HugeTLB page migration was successful without any error and data integrity
> check passed on them as well. But yes there might be some corner cases which
> trigger the race condition we have not faced yet. Will try to understand the
> situation there and get back.

Aneesh,

The current test which passed successfully in moving HugeTLB pages is
driven from the soft offline sysfs interface taking PFN (from pagemap
interface) as the argument. Its kind of directly calls migrate_pages()
handing out the page struct list as the argument. But the sample test
case in commit e66f17ff71772b ("mm/hugetlb: take page table lock in
follow_huge_pmd()") is able to crash the kernel at the BUG_ON as you
had mentioned before.

CPU: 2 PID: 6386 Comm: movepages Not tainted 4.5.0-rc2-00002-gc3ac0a3 #3
task: c0000003e8792400 ti: c0000003f65cc000 task.ti: c0000003f65cc000
NIP: c0000000001f485c LR: c0000000001f4844 CTR: 0000000000000000
REGS: c0000003f65cfa20 TRAP: 0700   Not tainted  (4.5.0-rc2-00002-gc3ac0a3)
MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28044488  XER: 00000000
CFAR: c000000000050310 SOFTE: 1 
GPR00: c0000000001f4844 c0000003f65cfca0 c000000000d82e00 f000000000ec0000 
GPR04: 00003efff6000000 c0000003f65cfc74 c0000003f65cfc70 0000000000000001 
GPR08: f000000000000000 0000000000000000 fffffffffffff000 0000000000000000 
GPR12: 0000000000004400 c00000000ea70800 fffffffffffffff2 c0000003d3330000 
GPR16: 0000000000000a00 c0000003f65cfd30 0000000000000000 c0000003d3330000 
GPR20: c000000000239f50 0000000000000100 fffffffffffff000 0000000001320122 
GPR24: c0000003ed267ee8 c0000003f65cfd70 00003efff6000000 c0000003f65cfd80 
GPR28: 000000000000008c c0000003ed267e80 c0000003ed2299d0 0000000000000004 
NIP [c0000000001f485c] follow_page_mask+0x7c/0x440
LR [c0000000001f4844] follow_page_mask+0x64/0x440
Call Trace:
[c0000003f65cfca0] [c0000000001f4844] follow_page_mask+0x64/0x440 (unreliable)
[c0000003f65cfd10] [c00000000023d2d8] SyS_move_pages+0x518/0x820
[c0000003f65cfe30] [c000000000009260] system_call+0x38/0xd0
Instruction dump:
7cdb3378 7c9a2378 eba30040 91260000 7fa3eb78 4be5b9f9 60000000 3940f000 
7fa35040 41dd0044 57ff077a 7bff0020 <0b1f0000> 38210070 e8010010 eae1ffb8

In the function follow_page_mask() we have this code block right at the
front where it fails.

	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
	if (!IS_ERR(page)) {
		BUG_ON(flags & FOLL_GET);
		return page;
	}

As you mentioned, currently we dont implement CONFIG_ARCH_WANT_GENERAL_HUGETLB.
So we define follow_huge_addr() function which returns a valid page struct
for any given address. But then dont understand why we bug on for FOLL_GET.
move_pages() had called follow_page() with FOLL_GET flag at the first place.
So this condition is going to be true all the time. I am still digging into
this but meanwhile if you can throw some light on why we have BUG_ON for
FOLL_GET flag it will really be helpful. Did not get much clues looking into
the previous commit which added this BUG_ON.



More information about the Linuxppc-dev mailing list