[PATCH v3 04/13] powerpc: assert_pte_locked() use pte_offset_map_nolock()

Fri Jul 21 23:13:41 AEST 2023

On Jul 19 2023, Aneesh Kumar K V wrote:
> On 7/19/23 10:34 AM, Hugh Dickins wrote:
> > On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote:
> >> Hugh Dickins <hughd at google.com> writes:
> >>
> >>> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock()
> >>> in assert_pte_locked().  BUG if pte_offset_map_nolock() fails: this is
> >>> stricter than the previous implementation, which skipped when pmd_none()
> >>> (with a comment on khugepaged collapse transitions): but wouldn't we want
> >>> to know, if an assert_pte_locked() caller can be racing such transitions?
> >>>
> >>
> >> The reason we had that pmd_none check there was to handle khugpaged. In
> >> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear.
> >> ppc64 had the assert_pte_locked check inside that ptep_clear.
> >>
> >> _pmd = pmdp_collapse_flush(vma, address, pmd);
> >> ..
> >> ptep_clear()
> >> -> asset_ptep_locked()
> >> ---> pmd_none
> >> -----> BUG
> >>
> >>
> >> The problem is how assert_pte_locked() verify whether we are holding
> >> ptl. It does that by walking the page table again and in this specific
> >> case by the time we call the function we already had cleared pmd .
> > 
> > Aneesh, please clarify, I've spent hours on this.
> > 
> > From all your use of past tense ("had"), I thought you were Acking my
> > patch; but now, after looking again at v3.11 source and today's,
> > I think you are NAKing my patch in its present form.
> > 
> 
> Sorry for the confusion my reply created. 
> 
> > You are pointing out that anon THP's __collapse_huge_page_copy_succeeded()
> > uses ptep_clear() at a point after pmdp_collapse_flush() already cleared
> > *pmd, so my patch now leads that one use of assert_pte_locked() to BUG.
> > Is that your point?
> > 
> 
> Yes. I haven't tested this yet to verify that it is indeed hitting that BUG.
> But a code inspection tells me we will hit that BUG on powerpc because of
> the above details.
>
Hi Aneesh,

After testing it, I can confirm that it encountered a BUG on powerpc.
Log report as attached

Thanks,
Jay Patel 
> > I can easily restore that khugepaged comment (which had appeared to me
> > out of date at the time, but now looks still relevant) and pmd_none(*pmd)
> > check: but please clarify.
> > 
> 
> That is correct. if we add that pmd_none check back we should be good here.
> 
> 
> -aneesh
-------------- next part --------------
[   53.513058][  T105] ------------[ cut here ]------------
[   53.513080][  T105] kernel BUG at arch/powerpc/mm/pgtable.c:327!
[   53.513090][  T105] Oops: Exception in kernel mode, sig: 5 [#1]
[   53.513099][  T105] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   53.513109][  T105] Modules linked in: bonding pseries_rng rng_core vmx_crypto gf128mul ibmveth crc32c_vpmsum fuse autofs4                                                                              
[   53.513135][  T105] CPU: 3 PID: 105 Comm: khugepaged Not tainted 6.5.0-rc1-gebfaf626e99f-dirty #1
[   53.513146][  T105] Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries                                                                             
[   53.513156][  T105] NIP:  c000000000079478 LR: c00000000007946c CTR: 0000000000000000
[   53.513165][  T105] REGS: c000000008e9b930 TRAP: 0700   Not tainted  (6.5.0-rc1-gebfaf626e99f-dirty)                                                                                                    
[   53.513175][  T105] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002882  XER: 20040000                                                                                               
[   53.513202][  T105] CFAR: c000000000412a30 IRQMASK: 0
[   53.513202][  T105] GPR00: c00000000007946c c000000008e9bbd0 c0000000012d3500 0000000000000001
[   53.513202][  T105] GPR04: 0000000011000000 c000000008e9bbb0 c000000008e9bbf0 ffffffffffffffff
[   53.513202][  T105] GPR08: 00000000000003ff 0000000000000000 0000000000000000 000000000000000a
[   53.513202][  T105] GPR12: 00000000497b0000 c00000001ec9d480 c00c00000016fe00 c000000051455000
[   53.513202][  T105] GPR16: 0000000000000000 ffffffffffffffff 0000000000000001 0000000000000001
[   53.513202][  T105] GPR20: c000000002912430 c000000051455000 0000000000000000 c00000000946e650
[   53.513202][  T105] GPR24: c0000000029800e8 0000000011000000 c00c000000145168 c000000002980180
[   53.513202][  T105] GPR28: 0000000011000000 8603f85b000080c0 c000000008e9bc70 c00000001bd0d680
[   53.513304][  T105] NIP [c000000000079478] assert_pte_locked.part.18+0x168/0x1b0
[   53.513318][  T105] LR [c00000000007946c] assert_pte_locked.part.18+0x15c/0x1b0
[   53.513329][  T105] Call Trace:
[   53.513335][  T105] [c000000008e9bbd0] [c00000000007946c] assert_pte_locked.part.18+0x15c/0x1b0 (unreliable)                                                                                            
[   53.513350][  T105] [c000000008e9bc00] [c00000000048e10c] collapse_huge_page+0x11dc/0x1700
[   53.513362][  T105] [c000000008e9bd40] [c00000000048ed18] hpage_collapse_scan_pmd+0x6e8/0x850
[   53.513374][  T105] [c000000008e9be30] [c000000000492544] khugepaged+0x7e4/0xb70
[   53.513386][  T105] [c000000008e9bf90] [c00000000015f038] kthread+0x138/0x140
[   53.513399][  T105] [c000000008e9bfe0] [c00000000000dd58] start_kernel_thread+0x14/0x18
[   53.513411][  T105] Code: 7c852378 7c844c36 794a1564 7c894038 794af082 79241f24 78eaf00e 7c8a2214 48399541 60000000 7c630074 7863d182 <0b030000> e9210020 81290000 7d290074                             
[   53.513448][  T105] ---[ end trace 0000000000000000 ]---
[   53.516544][  T105]
[   53.516551][  T105] note: khugepaged[105] exited with irqs disabled
[  182.648447][ T1952] mconf[1952]: segfault (11) at 110efa38 nip 1001e97c lr 1001e8a8 code 1
[  182.648482][ T1952] mconf[1952]: code: 60420000 4bfffd59 4bffffd0 60000000 60420000 e93f0070 2fa90000 409e0014
[  182.648494][ T1952] mconf[1952]: code: 480000cc e9290000 2fa90000 419e00c0 <81490008> 2f8a0005 409effec e9490028
[  962.694079][T39811] sda2: Can't mount, would change RO state