[PPC] Boot problems after the pci-v6.18-changes

Herve Codina herve.codina at bootlin.com
Wed Oct 15 19:13:04 AEDT 2025


Hi All,

On Sat, 11 Oct 2025 08:11:42 -0700
Manivannan Sadhasivam <mani at kernel.org> wrote:

> On Sat, Oct 11, 2025 at 07:25:26AM +0200, Lukas Wunner wrote:
> > [cc += Mani]
> > 
> > On Sat, Oct 11, 2025 at 07:12:49AM +0200, Christian Zigotzky wrote:  
> > > On 09 October 2025 at 07:37 am, Lukas Wunner wrote:  
> > > > On Thu, Oct 09, 2025 at 06:54:58AM +0200, Christian Zigotzky wrote:  
> > > > > On 08 October 2025 at 09:51 pm, Bjorn Helgaas wrote:  
> > > > > > On Wed, Oct 08, 2025 at 06:35:42PM +0200, Christian Zigotzky wrote:  
> > > > > > > Our PPC boards [1] have boot problems since the pci-v6.18-changes. [2]
> > > > > > > 
> > > > > > > Without the pci-v6.18-changes, the PPC boards boot without any problems.
> > > > > > > 
> > > > > > > Boot log with error messages:
> > > > > > > https://github.com/user-attachments/files/22782016/Kernel_6.18_with_PCI_changes.log
> > > > > > > 
> > > > > > > Further information: https://github.com/chzigotzky/kernels/issues/17  
> > > > > > Do you happen to have a similar log from a recent working kernel,
> > > > > > e.g., v6.17, that we could compare with?  
> > > > > Thanks for your answer. Here is a similar log from the kernel 6.17.0:
> > > > > https://github.com/user-attachments/files/22789946/Kernel_6.17.0_Cyrus_Plus_board_P5040.log  
> > > > These lines are added in v6.18:
> > > > 
> > > >    pci 0000:01:00.0: ASPM: DT platform, enabling L0s-up L0s-dw L1 ASPM-L1.1 ASPM-L1.2 PCI-PM-L1.1 PCI-PM-L1.2
> > > >    pci 0000:01:00.0: ASPM: DT platform, enabling ClockPM
> > > >    pci 0001:01:00.0: ASPM: DT platform, enabling L0s-up L0s-dw L1 ASPM-L1.1 ASPM-L1.2 PCI-PM-L1.1 PCI-PM-L1.2
> > > >    pci 0001:01:00.0: ASPM: DT platform, enabling ClockPM
> > > >    pci 0001:03:00.0: ASPM: DT platform, enabling L0s-up L0s-dw L1 ASPM-L1.1 ASPM-L1.2 PCI-PM-L1.1 PCI-PM-L1.2
> > > >    pci 0001:03:00.0: ASPM: DT platform, enabling ClockPM
> > > > 
> > > > Possible candidate:
> > > > 
> > > > f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")  
> > > 
> > > After reverting the commit f3ac2ff14834, the kernel boots without any
> > > problems.
> > > 
> > > f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree
> > > platforms") is the bad commit.  
> > 
> > Hi Mani, your commit f3ac2ff14834 is causing a regression on certain
> > powerpc machines.  Any ideas?
> >   
> 
> Hi Lukas,
> 
> Thanks for looping me in. The referenced commit forcefully enables ASPM on all
> DT platforms as we decided to bite the bullet finally.
> 
> Looks like the device (0000:01:00.0) doesn't play nice with ASPM even though it
> advertises ASPM capability.
> 
> Christian, could you please test the below change and see if it fixes the issue?
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 214ed060ca1b..e006b0560b39 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2525,6 +2525,15 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
>   */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
>  
> +
> +static void quirk_disable_aspm_all(struct pci_dev *dev)
> +{
> +       pci_info(dev, "Disabling ASPM\n");
> +       pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
> +}
> +
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6738, quirk_disable_aspm_all);
> +
>  /*
>   * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
>   * Link bit cleared after starting the link retrain process to allow this
> 
> 
> Going forward, we should be quirking the devices if they behave erratically.
> 
> - Mani
> 

I am also seeing a regression with commit f3ac2ff14834 ("PCI/ASPM: Enable
all ClockPM and ASPM states for devicetree platforms").

My system is an ARM board (Marvell Armada 3720 DB)
  https://elixir.bootlin.com/linux/v6.17.1/source/arch/arm64/boot/dts/marvell/armada-3720-db.dts

I use a LAN966x PCI board
  https://elixir.bootlin.com/linux/v6.17.1/source/drivers/misc/lan966x_pci.c

Normally, when I ping through the PCI board, I get roughly the following
timings:
   # ping 192.168.32.100
   PING 192.168.32.100 (192.168.32.100): 56 data bytes
   64 bytes from 192.168.32.100: seq=0 ttl=64 time=3.328 ms
   64 bytes from 192.168.32.100: seq=1 ttl=64 time=2.636 ms
   64 bytes from 192.168.32.100: seq=2 ttl=64 time=2.928 ms
   64 bytes from 192.168.32.100: seq=3 ttl=64 time=2.649 ms

But with a vanilla v6.18-rc1 kernel, those timings become awful:
   # ping 192.168.32.100
   PING 192.168.32.100 (192.168.32.100): 56 data bytes
   64 bytes from 192.168.32.100: seq=0 ttl=64 time=656.634 ms
   64 bytes from 192.168.32.100: seq=1 ttl=64 time=551.812 ms
   64 bytes from 192.168.32.100: seq=2 ttl=64 time=702.966 ms
   64 bytes from 192.168.32.100: seq=3 ttl=64 time=725.904 ms

Reverting commit f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states
for devicetree platforms") fixes my timing issues.

I also tried the quirk proposed in this discussion (quirk_disable_aspm_all),
and it fixes the timing issue as well.

I used the same PCI board on an x86 system and no timing issues were
observed.

I am not sure that the quirk_disable_aspm_all quirk is the right solution,
though: the issue could be at the PCIe controller level rather than in the
PCIe device.

What would be the best solution?
Is something missing on device-tree based systems that would allow commit
f3ac2ff14834 to be applied without regressions?

Best regards,
Hervé

