[PATCH v3 04/25] KVM: x86/mmu: Add dedicated API to map guest_memfd pfn into TDP MMU
Yan Zhao
yan.y.zhao at intel.com
Thu Oct 30 19:34:06 AEDT 2025
On Wed, Oct 22, 2025 at 12:53:53PM +0800, Yan Zhao wrote:
> On Thu, Oct 16, 2025 at 05:32:22PM -0700, Sean Christopherson wrote:
> > Link: https://lore.kernel.org/all/20250709232103.zwmufocd3l7sqk7y@amd.com
>
> Hi Sean,
>
> Will you post [1] to fix the AB-BA deadlock issue for huge page in-place
> conversion as well?
>
> Without it, the "WARNING: possible circular locking dependency detected" would
> still appear due to
>
> - lock(mapping.invalidate_lock#4) --> lock(&mm->mmap_lock)
> for init mem on non-in-place-conversion guest_memfd
> - rlock(&mm->mmap_lock) --> rlock(mapping.invalidate_lock#4)
> for faulting shared pages on in-place-conversion guest_memfd
>
> [1] https://lore.kernel.org/all/aHEwT4X0RcfZzHlt@google.com/
[2] https://lore.kernel.org/all/cover.1760731772.git.ackerleytng@google.com/
Note: [1] is still required even with [2].
Consider the following scenario (assuming vm_memory_attributes=Y):
1. Create a TDX VM with non-in-place-conversion guest_memfd.
In the init mem path, the lock sequence is
lock(mapping.invalidate_lock#4) --> lock(&mm->mmap_lock)
2. Create a normal VM with in-place-conversion guest_memfd, with guest_memfd
memory defaulting to shared by specifying flags
GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED.
(Since kvm_arch_supports_gmem_init_shared() returns true for normal VMs due
to kvm->arch.has_private_mem == false, GUEST_MEMFD_FLAG_INIT_SHARED is a
valid flag).
Accessing the mmap'ed VA of this guest_memfd invokes
kvm_gmem_fault_user_mapping().
The lock sequence in this path is
rlock(&mm->mmap_lock) --> rlock(mapping.invalidate_lock#4)
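The two orderings above form a classic AB-BA inversion, which is exactly the kind of dependency lockdep tracks. As an illustrative sketch only (plain Python, not kernel code; the lock names are just labels for the two kernel locks involved), a minimal order tracker that records "held -> acquired" edges and rejects an acquisition that would close a cycle behaves like this:

```python
class LockOrderTracker:
    """Toy model of lockdep's ordering check: record every observed
    'held -> acquired' edge; a new acquisition is unsafe if the lock
    already held is reachable from the lock being acquired."""

    def __init__(self):
        self.edges = set()  # (held, acquired) pairs seen so far

    def acquire(self, held, acquired):
        # Closing a cycle means an AB-BA deadlock is possible.
        if self._reachable(acquired, held):
            return False  # circular locking dependency detected
        self.edges.add((held, acquired))
        return True

    def _reachable(self, src, dst):
        # DFS over the recorded dependency edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(b for a, b in self.edges if a == node)
        return False

tracker = LockOrderTracker()
# Path 1 (TDX init mem): invalidate_lock taken, then mmap_lock -- recorded.
assert tracker.acquire("mapping.invalidate_lock", "mm->mmap_lock")
# Path 2 (gmem user fault): mmap_lock held, invalidate_lock wanted -- cycle.
assert not tracker.acquire("mm->mmap_lock", "mapping.invalidate_lock")
```

Real lockdep tracks lock *classes* with full read/write-lock semantics, so this is only the shape of the check, not its implementation.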
Running 1 & 2 in the same process would trigger a circular locking warning:
[ 297.090165][ T3469] ======================================================
[ 297.099976][ T3469] WARNING: possible circular locking dependency detected
[ 297.109830][ T3469] 6.17.0-rc7-upstream+ #109 Tainted: G S
[ 297.119825][ T3469] ------------------------------------------------------
[ 297.129795][ T3469] tdx_vm_huge_pag/3469 is trying to acquire lock:
[ 297.139032][ T3469] ff110004a0625c70 (mapping.invalidate_lock#4){++++}-{4:4}, at: kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm]
[ 297.156463][ T3469]
[ 297.156463][ T3469] but task is already holding lock:
[ 297.169168][ T3469] ff110004db628d80 (&mm->mmap_lock){++++}-{4:4}, at: lock_mm_and_find_vma+0x2d/0x520
[ 297.184330][ T3469]
[ 297.184330][ T3469] which lock already depends on the new lock.
[ 297.184330][ T3469]
[ 297.202954][ T3469]
[ 297.202954][ T3469] the existing dependency chain (in reverse order) is:
[ 297.217582][ T3469]
[ 297.217582][ T3469] -> #1 (&mm->mmap_lock){++++}-{4:4}:
[ 297.230618][ T3469] __lock_acquire+0x5ba/0xa20
[ 297.238730][ T3469] lock_acquire.part.0+0xb4/0x240
[ 297.247200][ T3469] lock_acquire+0x60/0x130
[ 297.254942][ T3469] gup_fast_fallback+0x1fb/0x390
[ 297.263269][ T3469] get_user_pages_fast+0x8f/0xd0
[ 297.271610][ T3469] tdx_gmem_post_populate+0x163/0x640 [kvm_intel]
[ 297.281603][ T3469] kvm_gmem_populate+0x53b/0x960 [kvm]
[ 297.290663][ T3469] tdx_vcpu_init_mem_region+0x33b/0x530 [kvm_intel]
[ 297.300978][ T3469] tdx_vcpu_unlocked_ioctl+0x16f/0x250 [kvm_intel]
[ 297.311245][ T3469] vt_vcpu_mem_enc_unlocked_ioctl+0x6b/0xa0 [kvm_intel]
[ 297.322045][ T3469] kvm_arch_vcpu_unlocked_ioctl+0x50/0x80 [kvm]
[ 297.332167][ T3469] kvm_vcpu_ioctl+0x27b/0xf30 [kvm]
[ 297.341084][ T3469] __x64_sys_ioctl+0x13c/0x1d0
[ 297.349416][ T3469] x64_sys_call+0x10ee/0x20d0
[ 297.357566][ T3469] do_syscall_64+0xc9/0x400
[ 297.365507][ T3469] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 297.375053][ T3469]
[ 297.375053][ T3469] -> #0 (mapping.invalidate_lock#4){++++}-{4:4}:
[ 297.389364][ T3469] check_prev_add+0x8b/0x4c0
[ 297.397442][ T3469] validate_chain+0x367/0x440
[ 297.405580][ T3469] __lock_acquire+0x5ba/0xa20
[ 297.413664][ T3469] lock_acquire.part.0+0xb4/0x240
[ 297.422123][ T3469] lock_acquire+0x60/0x130
[ 297.429836][ T3469] down_read+0x9f/0x540
[ 297.437187][ T3469] kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm]
[ 297.446895][ T3469] __do_fault+0xf8/0x690
[ 297.454304][ T3469] do_shared_fault+0x8a/0x3b0
[ 297.462205][ T3469] do_fault+0xf0/0xb80
[ 297.469355][ T3469] handle_pte_fault+0x499/0x9a0
[ 297.477294][ T3469] __handle_mm_fault+0x98d/0x1100
[ 297.485449][ T3469] handle_mm_fault+0x1e2/0x500
[ 297.493288][ T3469] do_user_addr_fault+0x4f3/0xf20
[ 297.501419][ T3469] exc_page_fault+0x5d/0xc0
[ 297.509027][ T3469] asm_exc_page_fault+0x27/0x30
[ 297.517003][ T3469]
[ 297.517003][ T3469] other info that might help us debug this:
[ 297.517003][ T3469]
[ 297.534317][ T3469] Possible unsafe locking scenario:
[ 297.534317][ T3469]
[ 297.546565][ T3469] CPU0 CPU1
[ 297.554486][ T3469] ---- ----
[ 297.562385][ T3469] rlock(&mm->mmap_lock);
[ 297.569203][ T3469] lock(mapping.invalidate_lock#4);
[ 297.579871][ T3469] lock(&mm->mmap_lock);
[ 297.589429][ T3469] rlock(mapping.invalidate_lock#4);
[ 297.597345][ T3469]
[ 297.597345][ T3469] *** DEADLOCK ***
[ 297.597345][ T3469]
[ 297.611988][ T3469] 1 lock held by tdx_vm_huge_pag/3469:
[ 297.619863][ T3469] #0: ff110004db628d80 (&mm->mmap_lock){++++}-{4:4}, at: lock_mm_and_find_vma+0x2d/0x520
[ 297.634775][ T3469]
[ 297.634775][ T3469] stack backtrace:
[ 297.645161][ T3469] CPU: 7 UID: 0 PID: 3469 Comm: tdx_vm_huge_pag Tainted: G S 6.17.0-rc7-upstream+ #109 PREEMPT(voluntary) cdf4eff053c68cc34a4de47b373cdf3e020105d7
[ 297.645166][ T3469] Tainted: [S]=CPU_OUT_OF_SPEC
[ 297.645167][ T3469] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0101.D29.2303301937 03/30/2023
[ 297.645168][ T3469] Call Trace:
[ 297.645170][ T3469] <TASK>
[ 297.645171][ T3469] dump_stack_lvl+0x81/0xe0
[ 297.645176][ T3469] dump_stack+0x10/0x20
[ 297.645178][ T3469] print_circular_bug+0xf3/0x120
[ 297.645181][ T3469] check_noncircular+0x135/0x150
[ 297.645186][ T3469] check_prev_add+0x8b/0x4c0
[ 297.645189][ T3469] validate_chain+0x367/0x440
[ 297.645192][ T3469] __lock_acquire+0x5ba/0xa20
[ 297.645196][ T3469] lock_acquire.part.0+0xb4/0x240
[ 297.645198][ T3469] ? kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm 92b56a1aeace799385454e64f4d853f860f01956]
[ 297.645279][ T3469] lock_acquire+0x60/0x130
[ 297.645281][ T3469] ? kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm 92b56a1aeace799385454e64f4d853f860f01956]
[ 297.645360][ T3469] down_read+0x9f/0x540
[ 297.645363][ T3469] ? kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm 92b56a1aeace799385454e64f4d853f860f01956]
[ 297.645441][ T3469] ? __pfx_down_read+0x10/0x10
[ 297.645444][ T3469] ? __this_cpu_preempt_check+0x13/0x20
[ 297.645447][ T3469] kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm 92b56a1aeace799385454e64f4d853f860f01956]
[ 297.645527][ T3469] __do_fault+0xf8/0x690
[ 297.645530][ T3469] do_shared_fault+0x8a/0x3b0
[ 297.645532][ T3469] do_fault+0xf0/0xb80
[ 297.645534][ T3469] ? __this_cpu_preempt_check+0x13/0x20
[ 297.645537][ T3469] handle_pte_fault+0x499/0x9a0
[ 297.645541][ T3469] ? __pfx_handle_pte_fault+0x10/0x10
[ 297.645545][ T3469] __handle_mm_fault+0x98d/0x1100
[ 297.645547][ T3469] ? mt_find+0x3e3/0x5d0
[ 297.645552][ T3469] ? __pfx___handle_mm_fault+0x10/0x10
[ 297.645557][ T3469] ? __this_cpu_preempt_check+0x13/0x20
[ 297.645560][ T3469] handle_mm_fault+0x1e2/0x500
[ 297.645563][ T3469] ? __pfx_handle_mm_fault+0x10/0x10
[ 297.645566][ T3469] ? down_read_trylock+0x49/0x60
[ 297.645571][ T3469] do_user_addr_fault+0x4f3/0xf20
[ 297.645575][ T3469] exc_page_fault+0x5d/0xc0
[ 297.645577][ T3469] asm_exc_page_fault+0x27/0x30
[ 297.645579][ T3469] RIP: 0033:0x41fba0
[ 297.645581][ T3469] Code: f8 41 89 f0 48 8d 3c 17 48 89 c1 48 85 d2 74 2a 48 89 fa 48 29 c2 83 e2 01 74 0f 48 8d 48 01 40 88 71 ff 48 39 cf 74 13 66 90 <44> 88 01 48 83 c1 02 44 88 41 ff 48 39 cf 75 f0 c3 c3 66 66 2e 0f
[ 297.645583][ T3469] RSP: 002b:00007ffc8037f1c8 EFLAGS: 00010246
[ 297.645585][ T3469] RAX: 00007f604ee9d000 RBX: 00007f604ee906a8 RCX: 00007f604ee9d000
[ 297.645587][ T3469] RDX: 0000000000000000 RSI: 00000000000000aa RDI: 00007f604ee9e000
[ 297.645588][ T3469] RBP: 00007f604ee9d000 R08: 00000000000000aa R09: 0000000000426886
[ 297.645589][ T3469] R10: 0000000000000001 R11: 0000000000000246 R12: 000000003b5502a0
[ 297.645591][ T3469] R13: 0000000000001000 R14: 0000000000000200 R15: 00007f604eee4000
[ 297.645595][ T3469] </TASK>