[PATCH v3 1/6] PCI: pnv_php: Properly clean up allocated IRQs on unplug

Timothy Pearson tpearson at raptorengineering.com
Wed Sep 10 00:21:27 AEST 2025



----- Original Message -----
> From: "Jiri Slaby" <jirislaby at kernel.org>
> To: "Timothy Pearson" <tpearson at raptorengineering.com>
> Cc: "linuxppc-dev" <linuxppc-dev at lists.ozlabs.org>, "linux-kernel" <linux-kernel at vger.kernel.org>, "linux-pci"
> <linux-pci at vger.kernel.org>, "Madhavan Srinivasan" <maddy at linux.ibm.com>, "Michael Ellerman" <mpe at ellerman.id.au>,
> "christophe leroy" <christophe.leroy at csgroup.eu>, "Naveen N Rao" <naveen at kernel.org>, "Bjorn Helgaas"
> <bhelgaas at google.com>, "Shawn Anastasio" <sanastasio at raptorengineering.com>
> Sent: Tuesday, September 9, 2025 4:00:41 AM
> Subject: Re: [PATCH v3 1/6] PCI: pnv_php: Properly clean up allocated IRQs on unplug

> On 15. 07. 25, 23:36, Timothy Pearson wrote:
>> In cases where the root of a nested PCIe bridge configuration is
>> unplugged, the pnv_php driver would leak the allocated IRQ resources for
>> the child bridges' hotplug event notifications, resulting in a panic.
>> Fix this by walking all child buses and deallocating all it's IRQ
>> resources before calling pci_hp_remove_devices.
>> 
>> Also modify the lifetime of the workqueue at struct pnv_php_slot::wq so
>> that it is only destroyed in pnv_php_free_slot, instead of
>> pnv_php_disable_irq. This is required since pnv_php_disable_irq will now
>> be called by workers triggered by hot unplug interrupts, so the
>> workqueue needs to stay allocated.
>> 
>> The abridged kernel panic that occurs without this patch is as follows:
>> 
>>    WARNING: CPU: 0 PID: 687 at kernel/irq/msi.c:292
>>    msi_device_data_release+0x6c/0x9c
>>    CPU: 0 UID: 0 PID: 687 Comm: bash Not tainted 6.14.0-rc5+ #2
>>    Call Trace:
>>     msi_device_data_release+0x34/0x9c (unreliable)
>>     release_nodes+0x64/0x13c
>>     devres_release_all+0xc0/0x140
>>     device_del+0x2d4/0x46c
>>     pci_destroy_dev+0x5c/0x194
>>     pci_hp_remove_devices+0x90/0x128
>>     pci_hp_remove_devices+0x44/0x128
>>     pnv_php_disable_slot+0x54/0xd4
>>     power_write_file+0xf8/0x18c
>>     pci_slot_attr_store+0x40/0x5c
>>     sysfs_kf_write+0x64/0x78
>>     kernfs_fop_write_iter+0x1b0/0x290
>>     vfs_write+0x3bc/0x50c
>>     ksys_write+0x84/0x140
>>     system_call_exception+0x124/0x230
>>     system_call_vectored_common+0x15c/0x2ec
>> 
>> Signed-off-by: Shawn Anastasio <sanastasio at raptorengineering.com>
>> Signed-off-by: Timothy Pearson <tpearson at raptorengineering.com>
>> ---
>>   drivers/pci/hotplug/pnv_php.c | 94 ++++++++++++++++++++++++++++-------
>>   1 file changed, 75 insertions(+), 19 deletions(-)
>> 
>> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
>> index 573a41869c15..aec0a6d594ac 100644
>> --- a/drivers/pci/hotplug/pnv_php.c
>> +++ b/drivers/pci/hotplug/pnv_php.c
> ...
>> @@ -647,6 +702,15 @@ static struct pnv_php_slot *pnv_php_alloc_slot(struct
>> device_node *dn)
> 
> This is preceded by:
>         php_slot = kzalloc(sizeof(*php_slot), GFP_KERNEL);
> 
> Ie. php_slot is zeroed.
> 
>>   		return NULL;
>>   	}
>>   
>> +	/* Allocate workqueue for this slot's interrupt handling */
>> +	php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name);
>> +	if (!php_slot->wq) {
>> +		SLOT_WARN(php_slot, "Cannot alloc workqueue\n");
> 
> I believe this introduced a (unlikely) NULL ptr dereference.
> 
>> +		kfree(php_slot->name);
>> +		kfree(php_slot);
>> +		return NULL;
>> +	}
>> +
>>   	if (dn->child && PCI_DN(dn->child))
>>   		php_slot->slot_no = PCI_SLOT(PCI_DN(dn->child)->devfn);
>>   	else
> 
> This continues:
>         php_slot->pdev                  = bus->self;
>         php_slot->bus                   = bus;
> 
> 
> And SLOT_WARN() is defined as:
> #define SLOT_WARN(sl, x...) \
>         ((sl)->pdev ? pci_warn((sl)->pdev, x) :
> dev_warn(&(sl)->bus->dev, x))
> 
> The else branch is alkays taken in the 'if' above, which still
> dereferences NULLed (sl)->bus here.
> 
>> @@ -843,14 +907,6 @@ static void pnv_php_init_irq(struct pnv_php_slot *php_slot,
>> int irq)
>>   	u16 sts, ctrl;
>>   	int ret;
>>   
>> -	/* Allocate workqueue */
>> -	php_slot->wq = alloc_workqueue("pciehp-%s", 0, 0, php_slot->name);
>> -	if (!php_slot->wq) {
>> -		SLOT_WARN(php_slot, "Cannot alloc workqueue\n");
> 
> Here, php_slot used to have both ->pdev and ->bus assigned at this point.
> 
>> -		pnv_php_disable_irq(php_slot, true);
>> -		return;
>> -	}
>> -
> 
> Right?

That does look like an unlikely but definitely possible dereference -- good catch!

I can submit a patch to change

SLOT_WARN(php_slot, "Cannot alloc workqueue\n");

to

dev_warn(bus->dev, "Cannot alloc workqueue\n");

Would that work?


More information about the Linuxppc-dev mailing list