// a kdump hang caused by PPC pci patch series

Thu Nov 24 19:44:50 AEDT 2022

On 11/24/22 09:31, Pingfan Liu wrote:
> On Mon, Nov 21, 2022 at 8:57 PM Cédric Le Goater <clg at kaod.org> wrote:
>>
>> On 11/21/22 12:57, Pingfan Liu wrote:
>>> Sorry, I forgot the subject.
>>>
>>> On Mon, Nov 21, 2022 at 7:54 PM Pingfan Liu <kernelfans at gmail.com> wrote:
>>>>
>>>> Hello Powerpc folks,
>>>>
>>>> I encountered a kdump bug, which I bisected to commit 174db9e7f775
>>>> ("powerpc/pseries/pci: Add support of MSI domains to PHB hotplug").
>>>> With Fedora 36 as the host, the mentioned commit as the guest
>>>> kernel, and a virtio-blk disk, the kdump kernel hangs:
>>
>> The host kernel should be using the PowerNV platform, not pseries.
>> Or are you running a nested L2 guest on a KVM/pseries L1?
>>
>> As far as I remember, the patch above only impacts the IBM PowerVM
>> hypervisor (not KVM) and PHB hotplug, unless kdump induces some
>> hot-plugging I am not aware of.
>>
>> Also, if this is indeed an L2 guest, the XIVE interrupt controller is
>> emulated in QEMU, and "info pic" should return:
>>
>>     ...
>>     irqchip: emulated
>>
>>>>
>>>>       [    0.000000] Kernel command line: elfcorehdr=0x22c00000 no_timer_check net.ifnames=0 console=tty0 console=hvc0,115200n8 irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd hugetlb_cma=0
>>>>       ...
>>>>       [    7.763260] virtio_blk virtio2: 32/0/0 default/read/poll queues
>>>>       [    7.771391] virtio_blk virtio2: [vda] 20971520 512-byte logical blocks (10.7 GB/10.0 GiB)
>>>>       [   68.398234] systemd-udevd[187]: virtio2: Worker [190] processing SEQNUM=1193 is taking a long time
>>>>       [  188.398258] systemd-udevd[187]: virtio2: Worker [190] processing SEQNUM=1193 killed
>>>>
>>>>
>>>> During my tests, I found that in very rare cases kdump succeeds
>>>> (I guess it may depend on the CPU id).  With either maxcpus=2 or a
>>>> scsi-disk, kdump also succeeds.  And before the mentioned commit,
>>>> kdump succeeds as well.
>>>>
>>>> The attachment contains the XML to reproduce the bug.
>>>>
>>>> Do you have any ideas?
>>
>> This is most certainly an interrupt not being delivered. You can check
>> the status on the host with:
>>
>>     virsh qemu-monitor-command --hmp <domain>  "info pic"
>>
> 
> Please pick it up from the attachment.

Nothing looks wrong on the guest side: there are no pending interrupts,
either before or after kdump. The next step is to look at KVM. I suggest
you file a bug.
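
For the bug report, it may help to attach a diff of the "info pic"
output taken before and after the crash. Below is a minimal sketch of
how that comparison could be scripted on the host. It only wraps the
virsh command quoted above; the helper names and the "wait" callback
are hypothetical, and it assumes virsh is in PATH and that the caller
triggers the crash in the guest between the two captures.

```python
import difflib
import subprocess


def info_pic_command(domain):
    """Build the virsh invocation quoted earlier in this thread."""
    return ["virsh", "qemu-monitor-command", "--hmp", domain, "info pic"]


def capture_info_pic(domain):
    """Run the command on the host and return its text output."""
    result = subprocess.run(
        info_pic_command(domain),
        capture_output=True,
        text=True,
        check=True,  # raise if virsh reports an error
    )
    return result.stdout


def diff_snapshots(before, after):
    """Return unified-diff lines between two 'info pic' dumps."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before-kdump",
            tofile="after-kdump",
            lineterm="",
        )
    )


def compare_around_kdump(domain, wait=input):
    """Capture 'info pic', wait for the crash to be triggered, capture again.

    'wait' is a hypothetical hook; by default it blocks on Enter.
    """
    before = capture_info_pic(domain)
    wait("Trigger the crash in the guest, then press Enter: ")
    after = capture_info_pic(domain)
    return diff_snapshots(before, after)
```

Calling compare_around_kdump("<domain>") returns the diff as a list of
lines; an empty list means the two dumps were identical.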

Thanks,

C.