[PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

Krishna Kumar krishnak at linux.ibm.com
Mon Jul 7 18:01:00 AEST 2025


Thanks all for the review and Thanks a bunch to Timothy for fixing the PE Freeze issue. The hotplug issues are like you fix N issue and N+1 th issue will come with new HW.

We had a meeting of around 1.5 -2.0 hr with demo, code review and log review and we decided to let these fixes go ahead. I am consolidating what we discussed -


1. Let these fixes go ahead as it solves wider and much needed customer issue and its urgently needed for time being. We have verified in live demo that nothing is broken from legacy flow and 

PE/PHB freeze, race condition, hung, oops etc has been solved correctly. Basically it fixes below issues -

root-port(phb) -> switch(having4 port)--> 2 nvme devices

1st case - only removal of EP-nvme device (surprise hotplug disables msi at root port), soft removal may work
2nd case  - If we remove switch itself (surprise hotplug disable msi at root port) .


Some explanation of problem -

PHB Freeze (Interrupt is not reaching there) - Fence - Need to reset ||
EP device on removal generated msi - goes to cpu (via root port and then apic mmio region to cpu ),
PHB/root port itself got freeze and cpu never get interrupt - No wq/event going to work - driver is noit working


One area what I thought to fix it with OPAL call is below piece of code-

ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+    if (ret) {
+        SLOT_WARN(php_slot, "PCI slot activation failed with error code %d, possible frozen PHB", ret);
+        SLOT_WARN(php_slot, "Attempting complete PHB reset before retrying slot activation\n");
+        for (i = 0; i < 3; i++) {
+            /* Slot activation failed, PHB may be fenced from a prior device failure
+             * Use the OPAL fundamental reset call to both try a device reset and clear
+             * any potentially active PHB fence / freeze
+             */
+            SLOT_WARN(php_slot, "Try %d...\n", i + 1);
+            pci_set_pcie_reset_state(php_slot->pdev, pcie_warm_reset);
+            msleep(250);
+            pci_set_pcie_reset_state(php_slot->pdev, pcie_deassert_reset);
+
+            ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+            if (!ret)
+                break;
+        }


I would like to see this fix in skiboot, the opal call should reset it and it should work. Normally opal call is responsible for  link training and reset, so ideally it should  happens from there. As if now, Timothy has some explanation for it, so its fine for now. He can add his points for record.


2. In future we have decided to work on items like - removal of Work-queue with threaded irq, addition of dpc support in this driver.

3. We have also discussed if we want to move pciehpc.c driver in future, we have to keep below things in mind, Timothy can add some more points for record.

Device Node (DTB) & its relationship with slot, Can we decouple it and will pciehpc.c going to work correctly for this ?
Driver binding and unbinding based on hotplug event and its relationship with slot. Role of DTB in this. Our driver  also depends on OPAL call for link reset etc, can we decouple from it ? If we add some PPC specific code in pciehpc.c, how will it gets handled (by VFT/Function-Pointer or #ifdef or by seperate files ?)


Lukas has some points for above and I am somewhat aligned with below points, but it needs to be tested to see conceptually it fixes above issues, I am consolidating his points and he can add more  if needed-

Only the host bridge
has to be enumerated in the devicetree or DSDT.

pnv_php.c seems to search the devicetree for hotplug slots and
instantiates them.

We've traditionally dealt with such issues by inserting pcibios_*()
hooks in generic code, with a __weak implementation (which is usually
an empty stub) and a custom implementation in arch/powerpc/.

The linker then chooses the custom implementation over the __weak one.

You can find the existing hooks with:

git grep "__weak .*pcibios" -- drivers/pci
git grep pcibios -- arch/powerpc

An alternative method is to add a callback to struct pci_host_bridge.

pciehp is used on all kinds of arches, it's just an implementation of the PCIe Base Spec.



Best Regards,

Krishna




On 6/25/25 4:04 AM, Bjorn Helgaas wrote:
> On Wed, Jun 18, 2025 at 07:37:54PM -0500, Timothy Pearson wrote:
>> ----- Original Message -----
>>> From: "Bjorn Helgaas" <helgaas at kernel.org>
>>> To: "Timothy Pearson" <tpearson at raptorengineering.com>
>>> Cc: "linuxppc-dev" <linuxppc-dev at lists.ozlabs.org>, "linux-kernel" <linux-kernel at vger.kernel.org>, "linux-pci"
>>> <linux-pci at vger.kernel.org>, "Madhavan Srinivasan" <maddy at linux.ibm.com>, "Michael Ellerman" <mpe at ellerman.id.au>,
>>> "christophe leroy" <christophe.leroy at csgroup.eu>, "Naveen N Rao" <naveen at kernel.org>, "Bjorn Helgaas"
>>> <bhelgaas at google.com>, "Shawn Anastasio" <sanastasio at raptorengineering.com>
>>> Sent: Wednesday, June 18, 2025 2:01:46 PM
>>> Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
>>> On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote:
>>>>  state
>>> Weird wrapping of last word of subject to here.
>> I'll need to see what's up with my git format-patch setup. Apologies
>> for that across the multiple series.
> No worries.  If you can figure out how to make your mailer use the
> normal "On xxx, somebody wrote:" attribution instead of duplicating
> all those headers, that would be far more useful :)
>
>>>> +static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8
>>>> *state)
>>>> +{
>>>> +	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
>>>> +	struct pci_dev *bridge = php_slot->pdev;
>>>> +	u16 status;
>>>> +
>>>> +	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status);
>>>> +	*state = (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
>>> Should be able to do this with FIELD_GET().
>> I used the same overall structure as the pciehp_hpc driver here.  Do
>> you want me to also fix up that driver with FIELD_GET()?
> Nope, I think it's fine to keep this looking like pciehp for now.
> If somebody wants to use FIELD_GET() in pciehp, I'd probably be OK
> with that, but no need for you to open that can of worms.
>
>>> Is the PCI_EXP_SLTCTL_PIC part needed?  It wasn't there before, commit
>>> log doesn't mention it, and as far as I can tell, this would be the
>>> only driver to do that.  Most expose only the attention status (0=off,
>>> 1=on, 2=identify/blink).
>>>
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +
>>>>  static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *state)
>>>>  {
>>>>  	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
>>>>  
>>>> +	pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state);
>>> This is a change worth noting.  Previously we didn't read the AIC
>>> state from PCI_EXP_SLTCTL at all; we used php_slot->attention_state to
>>> keep track of whatever had been previously set via
>>> pnv_php_set_attention_state().
>>>
>>> Now we read the current state from PCI_EXP_SLTCTL.  It's not clear
>>> that php_slot->attention_state is still needed at all.
>> It probably isn't.  It's unclear why IBM took this path at all,
>> given pciehp's attention handlers predate pnv-php's by many years.
>>
>>> Previously, the user could write any value at all to the sysfs
>>> "attention" file and then read that same value back.  After this
>>> patch, the user can still write anything, but reads will only return
>>> values with PCI_EXP_SLTCTL_AIC and PCI_EXP_SLTCTL_PIC.
>>>
>>>>  	*state = php_slot->attention_state;
>>>>  	return 0;
>>>>  }
>>>> @@ -461,7 +474,7 @@ static int pnv_php_set_attention_state(struct hotplug_slot
>>>> *slot, u8 state)
>>>>  	mask = PCI_EXP_SLTCTL_AIC;
>>>>  
>>>>  	if (state)
>>>> -		new = PCI_EXP_SLTCTL_ATTN_IND_ON;
>>>> +		new = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);
>>> This changes the behavior in some cases:
>>>
>>>  write 0: previously turned indicator off, now writes reserved value
>>>  write 2: previously turned indicator on, now sets to blink
>>>  write 3: previously turned indicator on, now turns it off
>> If we're looking at normalizing with pciehp with an eye toward
>> eventually deprecating / removing pnv-php, I can't think of a better
>> time to change this behavior.  I suspect we're the only major user
>> of this code path at the moment, with most software expecting to see
>> pciehp-style handling.  Thoughts?
> I'm OK with changing this, but I do think it would be worth calling
> out the different behavior in the commit log.
>
> Bjorn
>


More information about the Linuxppc-dev mailing list