<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Nov 13, 2019, at 10:46 AM, Greg Kurz <<a href="mailto:groug@kaod.org" class="">groug@kaod.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">The EQ page is allocated by the guest and then passed to the hypervisor<br class="">with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page<br class="">before handing it over to the HW. This reference is dropped either when<br class="">the guest issues the H_INT_RESET hcall or when the KVM device is released.<br class="">But the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times,<br class="">either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot).<br class="">In both cases the existing EQ page reference is leaked because we simply<br class="">overwrite it in the XIVE queue structure without calling put_page().<br class=""><br class="">This is especially visible when the guest memory is backed with huge pages:<br class="">boot a VM up to guest userspace, then either reboot it or unplug a vCPU,<br class="">and quit QEMU. The leak is observed by comparing the value of HugePages_Free<br class="">in /proc/meminfo before and after the VM is run.<br class=""><br class="">Ideally we'd want the XIVE code to handle the EQ page de-allocation at the<br class="">platform level. This isn't the case right now because the various XIVE<br class="">drivers have different allocation needs. 
It could maybe worth introducing<br class="">hooks for this purpose instead of exposing XIVE internals to the drivers,<br class="">but this is certainly a huge work to be done later.<br class=""><br class="">In the meantime, for easier backport, fix both vCPU unplug and guest reboot<br class="">leaks by introducing a wrapper around xive_native_configure_queue() that<br class="">does the necessary cleanup.<br class=""><br class="">Reported-by: Satheesh Rajendran <<a href="mailto:sathnaga@linux.vnet.ibm.com" class="">sathnaga@linux.vnet.ibm.com</a>><br class="">Cc: <a href="mailto:stable@vger.kernel.org" class="">stable@vger.kernel.org</a> # v5.2<br class="">Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")<br class="">Signed-off-by: Cédric Le Goater <<a href="mailto:clg@kaod.org" class="">clg@kaod.org</a>><br class="">Signed-off-by: Greg Kurz <<a href="mailto:groug@kaod.org" class="">groug@kaod.org</a>><br class=""></div></div></blockquote><div><br class=""></div><div>Tested-by: Lijun Pan <<a href="mailto:ljp@linux.ibm" class="">ljp@linux.ibm</a>.com></div><br class=""><blockquote type="cite" class=""><div class=""><div class="">---<br class="">v2: use wrapper as suggested by Cedric<br class="">---<br class=""> arch/powerpc/kvm/book3s_xive_native.c |   31 ++++++++++++++++++++++---------<br class=""> 1 file changed, 22 insertions(+), 9 deletions(-)<br class=""><br class="">diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c<br class="">index 34bd123fa024..0e1fc5a16729 100644<br class="">--- a/arch/powerpc/kvm/book3s_xive_native.c<br class="">+++ b/arch/powerpc/kvm/book3s_xive_native.c<br class="">@@ -50,6 +50,24 @@ static void kvmppc_xive_native_cleanup_queue(struct kvm_vcpu *vcpu, int prio)<br class=""> <span class="Apple-tab-span" style="white-space:pre"> </span>}<br class=""> }<br class=""><br class="">+static int kvmppc_xive_native_configure_queue(u32 vp_id, struct xive_q *q,<br 
class="">+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>      u8 prio, __be32 *qpage,<br class="">+<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>      u32 order, bool can_escalate)<br class="">+{<br class="">+<span class="Apple-tab-span" style="white-space:pre">   </span>int rc;<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span>__be32 *qpage_prev = q->qpage;<br class="">+<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span>rc = xive_native_configure_queue(vp_id, q, prio, qpage, order,<br class="">+<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> can_escalate);<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span>if (rc)<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>return rc;<br class="">+<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span>if (qpage_prev)<br class="">+<span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>put_page(virt_to_page(qpage_prev));<br class="">+<br 
class="">+<span class="Apple-tab-span" style="white-space:pre">     </span>return rc;<br class="">+}<br class="">+<br class=""> void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)<br class=""> {<br class=""> <span class="Apple-tab-span" style="white-space:pre">       </span>struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;<br class="">@@ -575,19 +593,14 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,<br class=""> <span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>q->guest_qaddr  = 0;<br class=""> <span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>q->guest_qshift = 0;<br class=""><br class="">-<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span>rc = xive_native_configure_queue(xc->vp_id, q, priority,<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> NULL, 0, true);<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span><span class="Apple-tab-span" style="white-space:pre">    </span>rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span 
class="Apple-tab-span" style="white-space:pre">    </span>NULL, 0, true);<br class=""> <span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (rc) {<br class=""> <span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>pr_err("Failed to reset queue %d for VCPU %d: %d\n",<br class=""> <span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>       priority, xc->server_num, rc);<br class=""> <span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>return rc;<br class=""> <span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class=""><br class="">-<span class="Apple-tab-span" style="white-space:pre">        </span><span class="Apple-tab-span" style="white-space:pre">    </span>if (q->qpage) {<br class="">-<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>put_page(virt_to_page(q->qpage));<br class="">-<span class="Apple-tab-span" style="white-space:pre">  </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>q->qpage = NULL;<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span>}<br class="">-<br class=""> <span class="Apple-tab-span" style="white-space:pre">       </span><span class="Apple-tab-span" 
style="white-space:pre">    </span>return 0;<br class=""> <span class="Apple-tab-span" style="white-space:pre">     </span>}<br class=""><br class="">@@ -646,8 +659,8 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,<br class=""> <span class="Apple-tab-span" style="white-space:pre">       </span>  * OPAL level because the use of END ESBs is not supported by<br class=""> <span class="Apple-tab-span" style="white-space:pre">   </span>  * Linux.<br class=""> <span class="Apple-tab-span" style="white-space:pre">       </span>  */<br class="">-<span class="Apple-tab-span" style="white-space:pre">     </span>rc = xive_native_configure_queue(xc->vp_id, q, priority,<br class="">-<span class="Apple-tab-span" style="white-space:pre">   </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span> (__be32 *) qaddr, kvm_eq.qshift, true);<br class="">+<span class="Apple-tab-span" style="white-space:pre">      </span>rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,<br class="">+<span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" style="white-space:pre">    </span>(__be32 *) qaddr, kvm_eq.qshift, true);<br class=""> <span class="Apple-tab-span" style="white-space:pre">       </span>if (rc) {<br class=""> <span class="Apple-tab-span" style="white-space:pre">     </span><span class="Apple-tab-span" style="white-space:pre">    </span>pr_err("Failed to configure queue %d for VCPU %d: %d\n",<br class=""> <span class="Apple-tab-span" style="white-space:pre">    </span><span class="Apple-tab-span" 
style="white-space:pre">    </span>       priority, xc->server_num, rc);<br class=""><br class=""></div></div></blockquote></div><br class=""></body></html>