[PATCH v3 04/25] KVM: x86/mmu: Add dedicated API to map guest_memfd pfn into TDP MMU

Yan Zhao yan.y.zhao at intel.com
Thu Oct 30 19:34:06 AEDT 2025


On Wed, Oct 22, 2025 at 12:53:53PM +0800, Yan Zhao wrote:
> On Thu, Oct 16, 2025 at 05:32:22PM -0700, Sean Christopherson wrote:
> > Link: https://lore.kernel.org/all/20250709232103.zwmufocd3l7sqk7y@amd.com
> 
> Hi Sean,                                                                         
> 
> Will you post [1] to fix the AB-BA deadlock issue for huge page in-place
> conversion as well?
> 
> Without it, the "WARNING: possible circular locking dependency detected" would
> still appear due to
> 
> - lock(mapping.invalidate_lock#4) --> lock(&mm->mmap_lock)
>   for init mem on non-in-place-conversion guest_memfd
> - rlock(&mm->mmap_lock) --> rlock(mapping.invalidate_lock#4)
>   for faulting shared pages on in-place-conversion guest_memfd
> 
> [1] https://lore.kernel.org/all/aHEwT4X0RcfZzHlt@google.com/
[2] https://lore.kernel.org/all/cover.1760731772.git.ackerleytng@google.com/

Note: [1] is still required even with [2].

Consider the following scenario (assuming vm_memory_attributes=Y):

1. Create a TDX VM with non-in-place-conversion guest_memfd.

   In the init mem path, the lock sequence is
   lock(mapping.invalidate_lock#4) --> lock(&mm->mmap_lock)

2. Create a normal VM with in-place-conversion guest_memfd, with guest_memfd
   memory defaulting to shared by specifying flags
   GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED.
   (GUEST_MEMFD_FLAG_INIT_SHARED is a valid flag here because
    kvm_arch_supports_gmem_init_shared() returns true for normal VMs, where
    kvm->arch.has_private_mem == false.)

   Accessing the mmap'ed VA of this guest_memfd invokes
   kvm_gmem_fault_user_mapping().
   
   The lock sequence in this path is
   rlock(&mm->mmap_lock) --> rlock(mapping.invalidate_lock#4)
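
To make the inversion concrete, here is a toy userspace analogue of the
lockdep scenario below (pthread rwlocks standing in for
mapping.invalidate_lock and mm->mmap_lock; an illustration only, not kernel
code). Note that the real populate path takes mmap_lock for read inside
gup_fast_fallback(); the write acquisition here merely makes the hang
deterministic:

/* Build with: gcc -pthread abba.c -- hangs on the AB-BA inversion. */
#include <pthread.h>
#include <unistd.h>

static pthread_rwlock_t inval   = PTHREAD_RWLOCK_INITIALIZER; /* invalidate_lock */
static pthread_rwlock_t mmap_lk = PTHREAD_RWLOCK_INITIALIZER; /* mmap_lock       */

/* CPU1 in the lockdep scenario: the init-mem populate path (step 1).
 * Holds invalidate_lock, then needs mmap_lock for GUP. */
static void *populate_path(void *arg)
{
	pthread_rwlock_wrlock(&inval);
	usleep(100000);                  /* let fault_path grab mmap_lk */
	pthread_rwlock_wrlock(&mmap_lk); /* blocks: fault_path holds it */
	pthread_rwlock_unlock(&mmap_lk);
	pthread_rwlock_unlock(&inval);
	return NULL;
}

/* CPU0: the user-mapping fault path (step 2). Enters with mmap_lock
 * held, then needs invalidate_lock. */
static void *fault_path(void *arg)
{
	pthread_rwlock_rdlock(&mmap_lk);
	usleep(100000);                 /* let populate_path grab inval */
	pthread_rwlock_rdlock(&inval);  /* blocks: populate_path holds it */
	pthread_rwlock_unlock(&inval);
	pthread_rwlock_unlock(&mmap_lk);
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	pthread_create(&a, NULL, populate_path, NULL);
	pthread_create(&b, NULL, fault_path, NULL);
	pthread_join(a, NULL);          /* never returns: AB-BA deadlock */
	pthread_join(b, NULL);
	return 0;
}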

Running 1 & 2 in the same process would trigger a circular locking warning:

[  297.090165][ T3469] ======================================================
[  297.099976][ T3469] WARNING: possible circular locking dependency detected
[  297.109830][ T3469] 6.17.0-rc7-upstream+ #109 Tainted: G S
[  297.119825][ T3469] ------------------------------------------------------
[  297.129795][ T3469] tdx_vm_huge_pag/3469 is trying to acquire lock:
[  297.139032][ T3469] ff110004a0625c70 (mapping.invalidate_lock#4){++++}-{4:4}, at: kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm]
[  297.156463][ T3469]
[  297.156463][ T3469] but task is already holding lock:
[  297.169168][ T3469] ff110004db628d80 (&mm->mmap_lock){++++}-{4:4}, at: lock_mm_and_find_vma+0x2d/0x520
[  297.184330][ T3469]
[  297.184330][ T3469] which lock already depends on the new lock.
[  297.184330][ T3469]
[  297.202954][ T3469]
[  297.202954][ T3469] the existing dependency chain (in reverse order) is:
[  297.217582][ T3469]
[  297.217582][ T3469] -> #1 (&mm->mmap_lock){++++}-{4:4}:
[  297.230618][ T3469]        __lock_acquire+0x5ba/0xa20
[  297.238730][ T3469]        lock_acquire.part.0+0xb4/0x240
[  297.247200][ T3469]        lock_acquire+0x60/0x130
[  297.254942][ T3469]        gup_fast_fallback+0x1fb/0x390
[  297.263269][ T3469]        get_user_pages_fast+0x8f/0xd0
[  297.271610][ T3469]        tdx_gmem_post_populate+0x163/0x640 [kvm_intel]
[  297.281603][ T3469]        kvm_gmem_populate+0x53b/0x960 [kvm]
[  297.290663][ T3469]        tdx_vcpu_init_mem_region+0x33b/0x530 [kvm_intel]
[  297.300978][ T3469]        tdx_vcpu_unlocked_ioctl+0x16f/0x250 [kvm_intel]
[  297.311245][ T3469]        vt_vcpu_mem_enc_unlocked_ioctl+0x6b/0xa0 [kvm_intel]
[  297.322045][ T3469]        kvm_arch_vcpu_unlocked_ioctl+0x50/0x80 [kvm]
[  297.332167][ T3469]        kvm_vcpu_ioctl+0x27b/0xf30 [kvm]
[  297.341084][ T3469]        __x64_sys_ioctl+0x13c/0x1d0
[  297.349416][ T3469]        x64_sys_call+0x10ee/0x20d0
[  297.357566][ T3469]        do_syscall_64+0xc9/0x400
[  297.365507][ T3469]        entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  297.375053][ T3469]
[  297.375053][ T3469] -> #0 (mapping.invalidate_lock#4){++++}-{4:4}:
[  297.389364][ T3469]        check_prev_add+0x8b/0x4c0
[  297.397442][ T3469]        validate_chain+0x367/0x440
[  297.405580][ T3469]        __lock_acquire+0x5ba/0xa20
[  297.413664][ T3469]        lock_acquire.part.0+0xb4/0x240
[  297.422123][ T3469]        lock_acquire+0x60/0x130
[  297.429836][ T3469]        down_read+0x9f/0x540
[  297.437187][ T3469]        kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm]
[  297.446895][ T3469]        __do_fault+0xf8/0x690
[  297.454304][ T3469]        do_shared_fault+0x8a/0x3b0
[  297.462205][ T3469]        do_fault+0xf0/0xb80
[  297.469355][ T3469]        handle_pte_fault+0x499/0x9a0
[  297.477294][ T3469]        __handle_mm_fault+0x98d/0x1100
[  297.485449][ T3469]        handle_mm_fault+0x1e2/0x500
[  297.493288][ T3469]        do_user_addr_fault+0x4f3/0xf20
[  297.501419][ T3469]        exc_page_fault+0x5d/0xc0
[  297.509027][ T3469]        asm_exc_page_fault+0x27/0x30
[  297.517003][ T3469]
[  297.517003][ T3469] other info that might help us debug this:
[  297.517003][ T3469]
[  297.534317][ T3469]  Possible unsafe locking scenario:
[  297.534317][ T3469]
[  297.546565][ T3469]        CPU0                    CPU1
[  297.554486][ T3469]        ----                    ----
[  297.562385][ T3469]   rlock(&mm->mmap_lock);
[  297.569203][ T3469]                                lock(mapping.invalidate_lock#4);
[  297.579871][ T3469]                                lock(&mm->mmap_lock);
[  297.589429][ T3469]   rlock(mapping.invalidate_lock#4);
[  297.597345][ T3469]
[  297.597345][ T3469]  *** DEADLOCK ***
[  297.597345][ T3469]
[  297.611988][ T3469] 1 lock held by tdx_vm_huge_pag/3469:
[  297.619863][ T3469]  #0: ff110004db628d80 (&mm->mmap_lock){++++}-{4:4}, at: lock_mm_and_find_vma+0x2d/0x520
[  297.634775][ T3469]
[  297.634775][ T3469] stack backtrace:
[  297.645161][ T3469] CPU: 7 UID: 0 PID: 3469 Comm: tdx_vm_huge_pag Tainted: G S                  6.17.0-rc7-upstream+ #109 PREEMPT(voluntary)  cdf4eff053c68cc34a4de47b373cdf3e020105d7
[  297.645166][ T3469] Tainted: [S]=CPU_OUT_OF_SPEC
[  297.645167][ T3469] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.SYS.0101.D29.2303301937 03/30/2023
[  297.645168][ T3469] Call Trace:
[  297.645170][ T3469]  <TASK>
[  297.645171][ T3469]  dump_stack_lvl+0x81/0xe0
[  297.645176][ T3469]  dump_stack+0x10/0x20
[  297.645178][ T3469]  print_circular_bug+0xf3/0x120
[  297.645181][ T3469]  check_noncircular+0x135/0x150
[  297.645186][ T3469]  check_prev_add+0x8b/0x4c0
[  297.645189][ T3469]  validate_chain+0x367/0x440
[  297.645192][ T3469]  __lock_acquire+0x5ba/0xa20
[  297.645196][ T3469]  lock_acquire.part.0+0xb4/0x240
[  297.645198][ T3469]  ? kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm 92b56a1aeace799385454e64f4d853f860f01956]
[  297.645279][ T3469]  lock_acquire+0x60/0x130
[  297.645281][ T3469]  ? kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm 92b56a1aeace799385454e64f4d853f860f01956]
[  297.645360][ T3469]  down_read+0x9f/0x540
[  297.645363][ T3469]  ? kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm 92b56a1aeace799385454e64f4d853f860f01956]
[  297.645441][ T3469]  ? __pfx_down_read+0x10/0x10
[  297.645444][ T3469]  ? __this_cpu_preempt_check+0x13/0x20
[  297.645447][ T3469]  kvm_gmem_fault_user_mapping+0xfc/0x4c0 [kvm 92b56a1aeace799385454e64f4d853f860f01956]
[  297.645527][ T3469]  __do_fault+0xf8/0x690
[  297.645530][ T3469]  do_shared_fault+0x8a/0x3b0
[  297.645532][ T3469]  do_fault+0xf0/0xb80
[  297.645534][ T3469]  ? __this_cpu_preempt_check+0x13/0x20
[  297.645537][ T3469]  handle_pte_fault+0x499/0x9a0
[  297.645541][ T3469]  ? __pfx_handle_pte_fault+0x10/0x10
[  297.645545][ T3469]  __handle_mm_fault+0x98d/0x1100
[  297.645547][ T3469]  ? mt_find+0x3e3/0x5d0
[  297.645552][ T3469]  ? __pfx___handle_mm_fault+0x10/0x10
[  297.645557][ T3469]  ? __this_cpu_preempt_check+0x13/0x20
[  297.645560][ T3469]  handle_mm_fault+0x1e2/0x500
[  297.645563][ T3469]  ? __pfx_handle_mm_fault+0x10/0x10
[  297.645566][ T3469]  ? down_read_trylock+0x49/0x60
[  297.645571][ T3469]  do_user_addr_fault+0x4f3/0xf20
[  297.645575][ T3469]  exc_page_fault+0x5d/0xc0
[  297.645577][ T3469]  asm_exc_page_fault+0x27/0x30
[  297.645579][ T3469] RIP: 0033:0x41fba0
[  297.645581][ T3469] Code: f8 41 89 f0 48 8d 3c 17 48 89 c1 48 85 d2 74 2a 48 89 fa 48 29 c2 83 e2 01 74 0f 48 8d 48 01 40 88 71 ff 48 39 cf 74 13 66 90 <44> 88 01 48 83 c1 02 44 88 41 ff 48 39 cf 75 f0 c3 c3 66 66 2e 0f
[  297.645583][ T3469] RSP: 002b:00007ffc8037f1c8 EFLAGS: 00010246
[  297.645585][ T3469] RAX: 00007f604ee9d000 RBX: 00007f604ee906a8 RCX: 00007f604ee9d000
[  297.645587][ T3469] RDX: 0000000000000000 RSI: 00000000000000aa RDI: 00007f604ee9e000
[  297.645588][ T3469] RBP: 00007f604ee9d000 R08: 00000000000000aa R09: 0000000000426886
[  297.645589][ T3469] R10: 0000000000000001 R11: 0000000000000246 R12: 000000003b5502a0
[  297.645591][ T3469] R13: 0000000000001000 R14: 0000000000000200 R15: 00007f604eee4000
[  297.645595][ T3469]  </TASK>
