// a kdump hang caused by PPC pci patch series

Tue Nov 22 14:29:41 AEDT 2022

Hi Cédric,

I appreciate your insight. Please see my comments inline below.

On Mon, Nov 21, 2022 at 8:57 PM Cédric Le Goater <clg at kaod.org> wrote:
>
> On 11/21/22 12:57, Pingfan Liu wrote:
> > Sorry, I forgot the subject.
> >
> > On Mon, Nov 21, 2022 at 7:54 PM Pingfan Liu <kernelfans at gmail.com> wrote:
> >>
> >> Hello Powerpc folks,
> >>
> >> I encountered a kdump bug, which I bisected and pinned to commit
> >> 174db9e7f775 ("powerpc/pseries/pci: Add support of MSI domains to PHB
> >> hotplug"). In that case, using Fedora 36 as the host, the mentioned
> >> commit as the guest kernel, and a virtio-block disk, the kdump kernel
> >> will hang:
>
> The host kernel should be using the PowerNV platform and not pseries
> or are you running a nested L2 guest on KVM/pseries L1 ?
>

The host kernel runs on P9 bare metal, and PowerKVM is used here.

> And as far as I remember, the patch above only impacts the IBM PowerVM
> hypervisor, not KVM, and PHB hotplug, or kdump induces some hot-plugging
> I am not aware of.
>

Sorry that my information was not clear.
The suspect series is "[PATCH 00/31] powerpc: Modernize the PCI/MSI
support", which in mainline begins at commit 786e5b102a00
("powerpc/pseries/pci: Introduce __find_pe_total_msi()").

I tried to bisect, but commit a5f3d2c17b07 ("powerpc/pseries/pci:
Add MSI domains") hangs even the first kernel. So I moved on to the
next functional change on pseries, which is commit 174db9e7f775
("powerpc/pseries/pci: Add support of MSI domains to PHB hotplug").

> Also, if indeed, this is a L2 guest, the XIVE interrupt controller is
> emulated in QEMU, "info pic" should return:
>
>    ...
>    irqchip: emulated
>
> >>
> >> [    0.000000] Kernel command line: elfcorehdr=0x22c00000
> >> no_timer_check net.ifnames=0 console=tty0 console=hvc0,115200n8
> >> irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory
> >>       numa=off udev.children-max=2 ehea.use_mcs=0 panic=10
> >> kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd
> >> hugetlb_cma=0
> >>      ...
> >>      [    7.763260] virtio_blk virtio2: 32/0/0 default/read/poll queues
> >>      [    7.771391] virtio_blk virtio2: [vda] 20971520 512-byte logical
> >> blocks (10.7 GB/10.0 GiB)
> >>      [   68.398234] systemd-udevd[187]: virtio2: Worker [190]
> >> processing SEQNUM=1193 is taking a long time
> >>      [  188.398258] systemd-udevd[187]: virtio2: Worker [190]
> >> processing SEQNUM=1193 killed
> >>
> >>
> >> During my test, I found that in very rare cases kdump can succeed
> >> (I guess it may be due to the cpu id).  With either maxcpus=2 or a
> >> scsi-disk, kdump also succeeds.  And before the mentioned commit,
> >> kdump also succeeds.
> >>
> >> The attachment contains the xml to reproduce that bug.
> >>
> >> Do you have any ideas?
>
> Most certainly an interrupt not being delivered. You can check the status
> on the host with :
>
>    virsh qemu-monitor-command --hmp <domain>  "info pic"
>

OK, I will try to get hold of a P9 machine and give it a try. I will
update the info later.
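
For reference, the check suggested above could be scripted roughly as
follows. This is only a sketch: the helper names (irqchip_mode,
query_info_pic) are made up for illustration, and the parser assumes
nothing about the "info pic" output beyond a line of the form
"irqchip: <value>", as in the excerpt quoted earlier.

```python
import subprocess


def irqchip_mode(info_pic_output):
    """Return the value of the first 'irqchip:' line, or None if absent.

    Assumption: the monitor output contains a line such as
    'irqchip: emulated'; all other lines are ignored.
    """
    for line in info_pic_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == "irqchip":
            return value.strip()
    return None


def query_info_pic(domain):
    """Run the HMP 'info pic' command via virsh and return its text output."""
    result = subprocess.run(
        ["virsh", "qemu-monitor-command", "--hmp", domain, "info pic"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Example use on the host (requires libvirt and a running domain):
#   print(irqchip_mode(query_info_pic("<domain>")))
```

Running it once before triggering the crash and once while the kdump
kernel is hung would make it easy to compare the two states.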

Thanks,

Pingfan
>
>
> Thanks,
>
> C.