[PATCH 3/3] KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code
Alexander Graf
agraf at suse.de
Wed Aug 3 00:47:08 EST 2011
On 07/23/2011 09:42 AM, Paul Mackerras wrote:
> With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
> core), whenever a CPU goes idle, we have to pull all the other
> hardware threads in the core out of the guest, because the H_CEDE
> hcall is handled in the kernel. This is inefficient.
>
> This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
> in real mode. When a guest vcpu does an H_CEDE hcall, we now only
> exit to the kernel if all the other vcpus in the same core are also
> idle. Otherwise we mark this vcpu as napping, save state that could
> be lost in nap mode (mainly GPRs and FPRs), and execute the nap
> instruction. When the thread wakes up, because of a decrementer or
> external interrupt, we come back in at kvm_start_guest (from the
> system reset interrupt vector), find the `napping' flag set in the
> paca, and go to the resume path.
>
> This has some other ramifications. First, when starting a core, we
> now start all the threads, both those that are immediately runnable and
> those that are idle. This is so that we don't have to pull all the
> threads out of the guest when an idle thread gets a decrementer interrupt
> and wants to start running. In fact the idle threads will all start
> with the H_CEDE hcall returning; being idle they will just do another
> H_CEDE immediately and go to nap mode.
>
> This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
> These functions have been restructured to make them simpler and clearer.
> We introduce a level of indirection in the wait queue that gets woken
> when external and decrementer interrupts get generated for a vcpu, so
> that we can have the 4 vcpus in a vcore using the same wait queue.
> We need this because the 4 vcpus are being handled by one thread.
>
> Secondly, when we need to exit from the guest to the kernel, we now
> have to generate an IPI for any napping threads, because an HDEC
> interrupt doesn't wake up a napping thread.
>
> Thirdly, we now need to be able to handle virtual external interrupts
> and decrementer interrupts becoming pending while a thread is napping,
> and deliver those interrupts to the guest when the thread wakes.
> This is done in kvmppc_cede_reentry, just before fast_guest_return.
>
> Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
> and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
> from kvm_arch_vcpu_runnable.
>
> Signed-off-by: Paul Mackerras <paulus at samba.org>
> ---
> arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
> arch/powerpc/include/asm/kvm_host.h | 19 ++-
> arch/powerpc/kernel/asm-offsets.c | 6 +
> arch/powerpc/kvm/book3s_hv.c | 335 ++++++++++++++++-------------
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 297 ++++++++++++++++++++++---
> arch/powerpc/kvm/powerpc.c | 21 +-
> 6 files changed, 483 insertions(+), 196 deletions(-)
>
>
[...]
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index a107c9b..cd0e3e5 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -39,12 +39,8 @@
>
> int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> {
> -#ifndef CONFIG_KVM_BOOK3S_64_HV
> return !(v->arch.shared->msr& MSR_WE) ||
> !!(v->arch.pending_exceptions);
> -#else
> - return !(v->arch.ceded) || !!(v->arch.pending_exceptions);
> -#endif
> }
>
> int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
> @@ -258,6 +254,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> {
> struct kvm_vcpu *vcpu;
> vcpu = kvmppc_core_vcpu_create(kvm, id);
> + vcpu->arch.wqp =&vcpu->wq;
> if (!IS_ERR(vcpu))
> kvmppc_create_vcpu_debugfs(vcpu, id);
> return vcpu;
> @@ -289,8 +286,8 @@ static void kvmppc_decrementer_func(unsigned long data)
>
> kvmppc_core_queue_dec(vcpu);
>
> - if (waitqueue_active(&vcpu->wq)) {
> - wake_up_interruptible(&vcpu->wq);
> + if (waitqueue_active(vcpu->arch.wqp)) {
> + wake_up_interruptible(vcpu->arch.wqp);
> vcpu->stat.halt_wakeup++;
> }
> }
> @@ -543,13 +540,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>
> int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
> {
> - if (irq->irq == KVM_INTERRUPT_UNSET)
> + if (irq->irq == KVM_INTERRUPT_UNSET) {
> kvmppc_core_dequeue_external(vcpu, irq);
> - else
> - kvmppc_core_queue_external(vcpu, irq);
> + return 0;
> + }
Not sure I understand this part. Mind explaining?
Alex
> +
> + kvmppc_core_queue_external(vcpu, irq);
>
> - if (waitqueue_active(&vcpu->wq)) {
> - wake_up_interruptible(&vcpu->wq);
> + if (waitqueue_active(vcpu->arch.wqp)) {
> + wake_up_interruptible(vcpu->arch.wqp);
> vcpu->stat.halt_wakeup++;
> } else if (vcpu->cpu != -1) {
> smp_send_reschedule(vcpu->cpu);