[PPC] Boot problems after the pci-v6.18-changes
Manivannan Sadhasivam
mani at kernel.org
Thu Oct 23 19:49:46 AEDT 2025
On Thu, Oct 23, 2025 at 09:38:13AM +0200, Herve Codina wrote:
> Hi Manivannan,
>
> On Wed, 15 Oct 2025 18:20:22 +0530
> Manivannan Sadhasivam <mani at kernel.org> wrote:
>
> > Hi Herve,
> >
> > On Wed, Oct 15, 2025 at 01:58:11PM +0200, Herve Codina wrote:
> > > Hi Christian,
> > >
> > > On Wed, 15 Oct 2025 13:30:44 +0200
> > > Christian Zigotzky <chzigotzky at xenosoft.de> wrote:
> > >
> > > > Hello Herve,
> > > >
> > > > > On 15 October 2025 at 10:39 am, Herve Codina <herve.codina at bootlin.com> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I also observed issues with the commit f3ac2ff14834 ("PCI/ASPM: Enable all
> > > > > ClockPM and ASPM states for devicetree platforms")
> > > >
> > > > Thanks for reporting.
> > > >
> > > > >
> > > > > Also tried the quirk proposed in this discussion (quirk_disable_aspm_all)
> > > > > and the quirk also fixes the timing issue.
> > > >
> > > > Where have you added quirk_disable_aspm_all?
> > >
> > > --- 8< ---
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 214ed060ca1b..a3808ab6e92e 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -2525,6 +2525,17 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> > > */
> > > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> > >
> > > +static void quirk_disable_aspm_all(struct pci_dev *dev)
> > > +{
> > > +	pci_info(dev, "Disabling ASPM\n");
> > > +	pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
> >
> > Could you please try disabling L1SS and L0s separately to see which one is
> > causing the issue? Like,
> >
> > pci_disable_link_state(dev, PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2);
> >
> > pci_disable_link_state(dev, PCIE_LINK_STATE_L0S);
> >
>
> I ran the tests; here are the results:
>
> - quirk pci_disable_link_state(dev, PCIE_LINK_STATE_ALL)
>   Issue not present.
>
> - quirk pci_disable_link_state(dev, PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2)
>   Issue present; timings similar to those already reported
>   (hundreds of ms).
>
> - quirk pci_disable_link_state(dev, PCIE_LINK_STATE_L0S)
>   Issue present; timings still incorrect, but lower:
> 64 bytes from 192.168.32.100: seq=10 ttl=64 time=16.738 ms
> 64 bytes from 192.168.32.100: seq=11 ttl=64 time=39.500 ms
> 64 bytes from 192.168.32.100: seq=12 ttl=64 time=62.178 ms
> 64 bytes from 192.168.32.100: seq=13 ttl=64 time=84.709 ms
> 64 bytes from 192.168.32.100: seq=14 ttl=64 time=107.484 ms
>
This is weird. It looks like all the ASPM states (L0s, L1SS) are contributing to
the increased latency, which is far more than those states should ever add. That
makes me inclined to skip inspecting the L0s/L1 exit latency fields :/
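For reference, if you want to look at them anyway: the exit latencies a device
advertises show up in the LnkCap line of "lspci -vv", or they can be read from
the Link Capabilities register. A minimal sketch of such a dump (this helper and
where you would call it are illustrative, not from any posted patch):

#include <linux/pci.h>

/*
 * Sketch: print the L0s/L1 exit latency fields a device advertises.
 * Field encodings per the PCIe spec:
 *   L0s: 0 => <64 ns ... 7 => >4 us
 *   L1:  0 => <1 us  ... 7 => >64 us
 */
static void dump_aspm_exit_latencies(struct pci_dev *dev)
{
	u32 lnkcap;

	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
	pci_info(dev, "L0s exit latency field: %u, L1 exit latency field: %u\n",
		 (lnkcap & PCI_EXP_LNKCAP_L0SEL) >> 12,
		 (lnkcap & PCI_EXP_LNKCAP_L1EL) >> 15);
}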
Bjorn sent out a patch [1] that enables only L0s and L1 by default, but it might
not help you. I honestly don't know how you are seeing this much latency. It
could be due to an issue in a PCI component (host or endpoint), or even the
board routing. Identifying which one is causing the issue is going to be tricky,
as it would require some experimentation.
If you are motivated, we can start by trying to isolate this issue to the
endpoint. Is it possible for you to connect a different PCI card to your host
and check whether you are still seeing the increased latency? If the different
PCI card does not exhibit the same behavior, then the current device is the
culprit and we should be able to quirk it.
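Should it come to that, the quirk would be the same code as in the snipped diff
above, registered for the endpoint's IDs with a fixup entry. A sketch with
placeholder IDs (0x1234/0x5678 are hypothetical; substitute the real
vendor/device ID from "lspci -nn"):

/*
 * Sketch: disable all ASPM states for one misbehaving endpoint.
 * The vendor/device IDs below are placeholders, not a real device.
 */
static void quirk_disable_aspm_all(struct pci_dev *dev)
{
	pci_info(dev, "Disabling ASPM\n");
	pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
}
DECLARE_PCI_FIXUP_FINAL(0x1234, 0x5678, quirk_disable_aspm_all);

That is the same function you already tested; only the DECLARE_PCI_FIXUP_FINAL()
registration with the real IDs is new.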
- Mani
[1] https://lore.kernel.org/linux-pci/20251020221217.1164153-1-helgaas@kernel.org/
--
மணிவண்ணன் சதாசிவம்