[PATCH] arch/powerpc/kvm: kvmppc_core_vcpu_create_hv: check for kzalloc failure

Thu Mar 30 15:20:46 AEDT 2023

On 2023-03-30 10:59:19, Michael Ellerman wrote:
> Kautuk Consul <kconsul at linux.vnet.ibm.com> writes:
> > On 2023-03-28 23:02:09, Michael Ellerman wrote:
> >> Kautuk Consul <kconsul at linux.vnet.ibm.com> writes:
> >> > On 2023-03-28 15:44:02, Kautuk Consul wrote:
> >> >> On 2023-03-28 20:44:48, Michael Ellerman wrote:
> >> >> > Kautuk Consul <kconsul at linux.vnet.ibm.com> writes:
> >> >> > > kvmppc_vcore_create() might not be able to allocate memory through
> >> >> > > kzalloc. In that case the kvm->arch.online_vcores shouldn't be
> >> >> > > incremented.
> >> >> > 
> >> >> > I agree that looks wrong.
> >> >> > 
> >> >> > Have you tried to test what goes wrong if it fails? It looks like it
> >> >> > will break the LPCR update, which likely will cause the guest to crash
> >> >> > horribly.
> >> > Also, are you referring to the code in kvmppc_update_lpcr()?
> >> > That code will not crash as it checks for the vc before trying to
> >> > dereference it.
> >> 
> >> Yeah that's what I was looking at. I didn't mean it would crash, but
> >> that it would bail out early when it sees a NULL vcore, leaving other
> >> vcores with the wrong LPCR value.
> >> 
> >> But as you say it doesn't happen because qemu quits on the first ENOMEM.
> >> 
> >> And regardless if qemu does something that means the guest is broken
> >> that's just a qemu bug, no big deal as far as the kernel is concerned.
> 
> > But there could be another user-mode application other than qemu that
> > actually tries to create a vcpu after it gets a -ENOMEM for another
> > vcpu. Shouldn't the kernel be independent of qemu?
> 
> Yes, the kernel is independent of qemu.
> 
> On P8 we had kvmtool, and there's several other VMMs these days, though
> most don't support Power.
> 
> I didn't mean qemu specifically above. If any VMM continues blindly
> after getting ENOMEM back from the KVM API then that's a bug in that
> VMM.
> 
> >> > But the following 2 places that utilize the arch.online_vcores will have
> >> > problems in logic if the usermode test-case doesn't pull down the
> >> > kvm context after the -ENOMEM vcpu allocation failure:
> >> > book3s_hv.c:3030:       if (!kvm->arch.online_vcores) {
> >> > book3s_hv_rm_mmu.c:44:  if (kvm->arch.online_vcores == 1 && local_paca->kvm_hstate.kvm_vcpu)
> >> 
> >> OK. Both of those look harmless to the host.
> 
> > Harmless to the host in terms of a crash, not in terms of behavior.
> > For example in the case of kvmhv_set_smt_mode:
> > If we got a kzalloc failure once (and online_vcores was wrongly incremented), 
> > then if kvmhv_set_smt_mode() is called after that then it would be
> > not be setting the arch.smt_mode and arch.emul_smt_mode correctly and it
> > would be wrongly returning with -EBUSY instead of 0.
> 
> But again that bug only affects that VM, which the VMM should have
> terminated when it got the ENOMEM back.
> 
> It's definitely a bug that we increment online_vcores incorrectly, but
> it only affects that VM, and a correctly operating VMM will terminate
> the VM anyway because of the ENOMEM.
Okay, I understand. I used to earlier try to contribute to other
mailing lists and they were very particular about correcting code
that was doing something wrong (just by code review) irrespective of whether
it would actually result in a bug/crash or misbehaviour. I guess maintainers
look at the generic part of the kernel in a different way than arch or
device specific kernel/driver code.
> 
> cheers