[PPC] Boot problems after the pci-v6.18-changes

Bjorn Helgaas helgaas at kernel.org
Fri Oct 24 03:59:57 AEDT 2025


On Thu, Oct 23, 2025 at 11:19:47AM +0200, Herve Codina wrote:
> On Thu, 23 Oct 2025 14:19:46 +0530
> Manivannan Sadhasivam <mani at kernel.org> wrote:
> > On Thu, Oct 23, 2025 at 09:38:13AM +0200, Herve Codina wrote:
> > > On Wed, 15 Oct 2025 18:20:22 +0530
> > > Manivannan Sadhasivam <mani at kernel.org> wrote:
> > > > On Wed, Oct 15, 2025 at 01:58:11PM +0200, Herve Codina wrote:  
> > > > > On Wed, 15 Oct 2025 13:30:44 +0200
> > > > > Christian Zigotzky <chzigotzky at xenosoft.de> wrote:
> > > > > > > On 15 October 2025 at 10:39 am, Herve Codina <herve.codina at bootlin.com> wrote:
> > > > > > > I also observed issues with the commit f3ac2ff14834
> > > > > > > ("PCI/ASPM: Enable all ClockPM and ASPM states for
> > > > > > > devicetree platforms")      

> > > I did tests and here are the results:
> > > 
> > >   - quirk pci_disable_link_state(dev, PCIE_LINK_STATE_ALL)
> > >     Issue not present
> > > 
> > >   - quirk pci_disable_link_state(dev, PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2)
> > >     Issue present, timings similar to timings already reported
> > >     (hundreds of ms).
> > > 
> > >   - quirk pci_disable_link_state(dev, PCIE_LINK_STATE_L0S);
> > >     Issue present, timings still incorrect but lower
> > >       64 bytes from 192.168.32.100: seq=10 ttl=64 time=16.738 ms
> > >       64 bytes from 192.168.32.100: seq=11 ttl=64 time=39.500 ms
> > >       64 bytes from 192.168.32.100: seq=12 ttl=64 time=62.178 ms
> > >       64 bytes from 192.168.32.100: seq=13 ttl=64 time=84.709 ms
> > >       64 bytes from 192.168.32.100: seq=14 ttl=64 time=107.484 ms
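
(Side note for anyone reproducing these tests: a quirk along those lines
is just a device fixup calling pci_disable_link_state().  A rough sketch
follows; 0x1234/0x5678 are placeholder vendor/device IDs, not the actual
device from this report:)

  /* Rough sketch only -- 0x1234/0x5678 are placeholder IDs.
   * Drops ClockPM and all ASPM/L1SS states for the matching device. */
  #include <linux/pci.h>

  static void quirk_disable_aspm(struct pci_dev *dev)
  {
          pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
          pci_info(dev, "ASPM disabled by quirk\n");
  }
  DECLARE_PCI_FIXUP_FINAL(0x1234, 0x5678, quirk_disable_aspm);

Restricting the mask (e.g. to PCIE_LINK_STATE_L0S only) gives the
intermediate cases measured above.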
> > 
> > This is weird. It looks like all the ASPM states (L0s, L1ss) are
> > contributing to the increased latency, which is more than should
> > occur. That makes me inclined to skip inspecting the L0s/L1 exit
> > latency fields :/
> > 
> > Bjorn sent out a patch [1] that enables only L0s and L1 by
> > default, but it might not help you. I honestly don't know how
> > you are seeing this much latency. It could be due to an issue
> > in the PCI component (host or endpoint), or even the board
> > routing. Identifying which one is causing the issue is going to
> > be tricky, as it will require some experimentation.
> 
> I've just tested the patch from Bjorn and I confirm that it doesn't
> fix my issue.

You should be able to control ASPM at runtime via sysfs:

  What:           /sys/bus/pci/devices/.../link/clkpm
		  /sys/bus/pci/devices/.../link/l0s_aspm
		  /sys/bus/pci/devices/.../link/l1_aspm
		  /sys/bus/pci/devices/.../link/l1_1_aspm
		  /sys/bus/pci/devices/.../link/l1_2_aspm
		  /sys/bus/pci/devices/.../link/l1_1_pcipm
		  /sys/bus/pci/devices/.../link/l1_2_pcipm
  Date:           October 2019
  Contact:        Heiner Kallweit <hkallweit1 at gmail.com>
  Description:    If ASPM is supported for an endpoint, these files can be
		  used to disable or enable the individual power management
		  states. Write y/1/on to enable, n/0/off to disable.

I assume you're using CONFIG_PCIEASPM_DEFAULT=y; with v6.18-rc1 plus
the patch at [1], we should be enabling l0s_aspm and l1_aspm at most.

If the sysfs knobs work correctly, maybe we can isolate the slowdown
to either L0s or L1?
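
For example, something like this (just a sketch; "0000:01:00.0" is a
placeholder BDF for whatever endpoint shows the latency) could flip one
state at a time before each ping run:

  /* Sketch: toggle individual ASPM states via sysfs from userspace.
   * "0000:01:00.0" is a placeholder BDF; substitute the endpoint
   * that shows the latency problem. */
  #include <stdio.h>

  static int set_state(const char *bdf, const char *file, int enable)
  {
          char path[128];
          FILE *f;

          snprintf(path, sizeof(path),
                   "/sys/bus/pci/devices/%s/link/%s", bdf, file);
          f = fopen(path, "w");
          if (!f) {
                  perror(path);
                  return -1;
          }
          fputs(enable ? "1" : "0", f);
          fclose(f);
          return 0;
  }

  int main(void)
  {
          const char *bdf = "0000:01:00.0";       /* placeholder */

          /* Run 1: L0s off, L1 on.  Swap the two for run 2. */
          set_state(bdf, "l0s_aspm", 0);
          set_state(bdf, "l1_aspm", 1);
          return 0;
  }

I.e., disable l0s_aspm with l1_aspm left on, re-run the ping test, then
swap the two, and see which state correlates with the hundreds-of-ms
RTTs.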

[1] https://lore.kernel.org/linux-pci/20251020221217.1164153-1-helgaas@kernel.org/

