[PPC] Boot problems after the pci-v6.18-changes
Hongxing Zhu
hongxing.zhu at nxp.com
Thu Nov 6 19:48:16 AEDT 2025
> -----Original Message-----
> From: Bjorn Helgaas <helgaas at kernel.org>
> Sent: 2025年11月6日 6:09
> To: Christian Zigotzky <chzigotzky at xenosoft.de>
> Cc: Manivannan Sadhasivam <mani at kernel.org>; Bjorn Helgaas
> <bhelgaas at google.com>; linux-pci at vger.kernel.org; mad skateman
> <madskateman at gmail.com>; R.T.Dickinson <rtd2 at xtra.co.nz>; Christian
> Zigotzky <info at xenosoft.de>; linuxppc-dev <linuxppc-dev at lists.ozlabs.org>;
> hypexed at yahoo.com.au; Darren Stevens <darren at stevens-zone.net>;
> debian-powerpc at lists.debian.org; John Paul Adrian Glaubitz
> <glaubitz at physik.fu-berlin.de>; Lukas Wunner <lukas at wunner.de>;
> regressions at lists.linux.dev; luigi burdo <intermediadc at hotmail.com>; Al
> <al at datazap.net>; Roland <rol7and at gmx.com>
> Subject: Re: [PPC] Boot problems after the pci-v6.18-changes
>
> [+cc luigi, Al, Roland; we broke X5000 in v6.18-rc1 by enabling ASPM too
> aggressively; Christian's original report is at:
> https://lore.ke/
> rnel.org%2Fr%2Fdb5c95a1-cf3e-46f9-8045-a1b04908051a%40xenosoft.de&da
> ta=05%7C02%7Chongxing.zhu%40nxp.com%7C6aa6d9be1c7c41d19f5908de1c
> b80084%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63897977378
> 4461344%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOi
> IwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0
> %7C%7C%7C&sdata=Nirv5c1ySjVdMa9HjkOlBwwfK7LT2xkCGS%2BScoT%2FrJg
> %3D&reserved=0
> I grubbed around in lore for other X5000 users who might possibly be able
> to help test our attempts to fix it]
>
> On Mon, Nov 03, 2025 at 07:28:19PM +0100, Christian Zigotzky wrote:
> > On 11/01/2025 06:06 PM, Manivannan Sadhasivam wrote:
> > ...
>
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index
> > > d97335a40193..74d8596b3f62 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -2524,6 +2524,7 @@ static void quirk_disable_aspm_l0s_l1(struct
> > > pci_dev
> > *dev)
> > > * disable both L0s and L1 for now to be safe.
> > > */
> > > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080,
> > quirk_disable_aspm_l0s_l1);
> > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_FREESCALE, 0x0451,
> > quirk_disable_aspm_l0s_l1);
> > >
> > > /*
> > > * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe
> > >Retrain
>
> > I tested your patch with the RC4 of kernel 6.18 today. Unfortunately
> > it doesn't solve the boot issue.
>
> Thanks for testing that. I see now why that approach doesn't work:
> quirk_disable_aspm_l0s_l1() calls pci_disable_link_state(), which updates
> the permissible ASPM link states, but pci_disable_link_state() only works for
> devices at the downstream end of a link. It doesn't work at all for Root
> Ports, which are at the upstream end of a link.
>
> Christian, you originally reported that both X5000 and X1000 were broken.
> I suspect X1000 may have been fixed in v6.18-rc3 by df5192d9bb0e
> ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms"), but I would
> love to have confirmation of that.
>
> The patches below read the PCIe Link Capabilities that tell us what ASPM link
> states the hardware supports, cache it, and use a quirk to remove the states
> we think are broken.
>
> Since these are just for testing, I didn't want to start a new thread.
> If anybody is able to test this, you should be able to just apply this whole
> email as a patch on top of v6.18-rc4.
Hi Bjorn:
I tested these patches on i.MX95 EVK board with NVME storage device. Because
that i.MX95 PCIe RC failed enter into L2 when one NVME device is connected to
the port if ASPM L1 is enabled in default.
These patches work as expected, the l0s and l1 can be disabled after adding
the following quirk.
"DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PHILIPS, PCI_ANY_ID, quirk_disable_aspm_l0s_l1_cap);"
Best Regards
Richard Zhu
>
> Would appreciate the complete dmesg log of the boot if possible. This
> patch adds a quirk for the [1957:0451] Root Ports in Christian's X5000. If
> ASPM is still broken for other Root Ports, we will need the Vendor and
> Device IDs to add quirks for them.
>
> commit 9d089f88fd66 ("PCI/ASPM: Cache Link Capabilities to enable quirks
> to override")
> Author: Bjorn Helgaas <bhelgaas at google.com>
> Date: Wed Nov 5 12:23:38 2025 -0600
>
> PCI/ASPM: Cache Link Capabilities to enable quirks to override
>
> Cache the PCIe Link Capabilities register in struct pci_dev so quirks can
> remove features to avoid hardware defects. The idea is:
>
> - set_pcie_port_type() reads PCIe Link Capabilities and caches it in
> dev->lnkcap
>
> - EARLY or HEADER quirks can update the cached dev->lnkcap to
> remove
> advertised features that don't work correctly
>
> - pcie_aspm_cap_init() relies on dev->lnkcap and ignores any features
> not
> advertised there
>
> Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
>
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index
> 79b965158473..c2ecfa4e92e5 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -378,15 +378,13 @@ static void pcie_set_clkpm(struct pcie_link_state
> *link, int enable) static void pcie_clkpm_cap_init(struct pcie_link_state
> *link, int blacklist) {
> int capable = 1, enabled = 1;
> - u32 reg32;
> u16 reg16;
> struct pci_dev *child;
> struct pci_bus *linkbus = link->pdev->subordinate;
>
> /* All functions should have the same cap and state, take the worst */
> list_for_each_entry(child, &linkbus->devices, bus_list) {
> - pcie_capability_read_dword(child, PCI_EXP_LNKCAP, ®32);
> - if (!(reg32 & PCI_EXP_LNKCAP_CLKPM)) {
> + if (!(child->lnkcap & PCI_EXP_LNKCAP_CLKPM)) {
> capable = 0;
> enabled = 0;
> break;
> @@ -567,7 +565,7 @@ static void encode_l12_threshold(u32 threshold_us,
> u32 *scale, u32 *value)
>
> static void pcie_aspm_check_latency(struct pci_dev *endpoint) {
> - u32 latency, encoding, lnkcap_up, lnkcap_dw;
> + u32 latency, encoding;
> u32 l1_switch_latency = 0, latency_up_l0s;
> u32 latency_up_l1, latency_dw_l0s, latency_dw_l1;
> u32 acceptable_l0s, acceptable_l1;
> @@ -592,14 +590,10 @@ static void pcie_aspm_check_latency(struct
> pci_dev *endpoint)
> struct pci_dev *dev = pci_function_0(link->pdev->subordinate);
>
> /* Read direction exit latencies */
> - pcie_capability_read_dword(link->pdev, PCI_EXP_LNKCAP,
> - &lnkcap_up);
> - pcie_capability_read_dword(dev, PCI_EXP_LNKCAP,
> - &lnkcap_dw);
> - latency_up_l0s = calc_l0s_latency(lnkcap_up);
> - latency_up_l1 = calc_l1_latency(lnkcap_up);
> - latency_dw_l0s = calc_l0s_latency(lnkcap_dw);
> - latency_dw_l1 = calc_l1_latency(lnkcap_dw);
> + latency_up_l0s = calc_l0s_latency(link->pdev->lnkcap);
> + latency_up_l1 = calc_l1_latency(link->pdev->lnkcap);
> + latency_dw_l0s = calc_l0s_latency(dev->lnkcap);
> + latency_dw_l1 = calc_l1_latency(dev->lnkcap);
>
> /* Check upstream direction L0s latency */
> if ((link->aspm_capable & PCIE_LINK_STATE_L0S_UP) && @@ -814,7
> +808,7 @@ static void pcie_aspm_override_default_link_state(struct
> pcie_link_state *link) static void pcie_aspm_cap_init(struct pcie_link_state
> *link, int blacklist) {
> struct pci_dev *child = link->downstream, *parent = link->pdev;
> - u32 parent_lnkcap, child_lnkcap;
> + u32 lnkcap;
> u16 parent_lnkctl, child_lnkctl;
> struct pci_bus *linkbus = parent->subordinate;
>
> @@ -829,9 +823,7 @@ static void pcie_aspm_cap_init(struct pcie_link_state
> *link, int blacklist)
> * If ASPM not supported, don't mess with the clocks and link,
> * bail out now.
> */
> - pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, &parent_lnkcap);
> - pcie_capability_read_dword(child, PCI_EXP_LNKCAP, &child_lnkcap);
> - if (!(parent_lnkcap & child_lnkcap & PCI_EXP_LNKCAP_ASPMS))
> + if (!(parent->lnkcap & child->lnkcap & PCI_EXP_LNKCAP_ASPMS))
> return;
>
> /* Configure common clock before checking latencies */ @@ -841,10
> +833,18 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int
> blacklist)
> * Re-read upstream/downstream components' register state after
> * clock configuration. L0s & L1 exit latencies in the otherwise
> * read-only Link Capabilities may change depending on common clock
> - * configuration (PCIe r5.0, sec 7.5.3.6).
> + * configuration (PCIe r5.0, sec 7.5.3.6). Update only the exit
> + * latencies in the cached dev->lnkcap because quirks may have
> + * removed broken features advertised by the device.
> */
> - pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, &parent_lnkcap);
> - pcie_capability_read_dword(child, PCI_EXP_LNKCAP, &child_lnkcap);
> + pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, &lnkcap);
> + parent->lnkcap &= ~(PCI_EXP_LNKCAP_L0SEL | PCI_EXP_LNKCAP_L1EL);
> + parent->lnkcap |= lnkcap & (PCI_EXP_LNKCAP_L0SEL |
> +PCI_EXP_LNKCAP_L1EL);
> +
> + pcie_capability_read_dword(child, PCI_EXP_LNKCAP, &lnkcap);
> + child->lnkcap &= ~(PCI_EXP_LNKCAP_L0SEL | PCI_EXP_LNKCAP_L1EL);
> + child->lnkcap |= lnkcap & (PCI_EXP_LNKCAP_L0SEL |
> +PCI_EXP_LNKCAP_L1EL);
> +
> pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &parent_lnkctl);
> pcie_capability_read_word(child, PCI_EXP_LNKCTL, &child_lnkctl);
>
> @@ -864,7 +864,7 @@ static void pcie_aspm_cap_init(struct pcie_link_state
> *link, int blacklist)
> * given link unless components on both sides of the link each
> * support L0s.
> */
> - if (parent_lnkcap & child_lnkcap & PCI_EXP_LNKCAP_ASPM_L0S)
> + if (parent->lnkcap & child->lnkcap & PCI_EXP_LNKCAP_ASPM_L0S)
> link->aspm_support |= PCIE_LINK_STATE_L0S;
>
> if (child_lnkctl & PCI_EXP_LNKCTL_ASPM_L0S) @@ -873,7 +873,7 @@
> static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> link->aspm_enabled |= PCIE_LINK_STATE_L0S_DW;
>
> /* Setup L1 state */
> - if (parent_lnkcap & child_lnkcap & PCI_EXP_LNKCAP_ASPM_L1)
> + if (parent->lnkcap & child->lnkcap & PCI_EXP_LNKCAP_ASPM_L1)
> link->aspm_support |= PCIE_LINK_STATE_L1;
>
> if (parent_lnkctl & child_lnkctl & PCI_EXP_LNKCTL_ASPM_L1) diff --git
> a/drivers/pci/probe.c b/drivers/pci/probe.c index
> 0ce98e18b5a8..7dab61347244 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1633,7 +1633,6 @@ void set_pcie_port_type(struct pci_dev *pdev) {
> int pos;
> u16 reg16;
> - u32 reg32;
> int type;
> struct pci_dev *parent;
>
> @@ -1652,8 +1651,8 @@ void set_pcie_port_type(struct pci_dev *pdev)
> pci_read_config_dword(pdev, pos + PCI_EXP_DEVCAP, &pdev->devcap);
> pdev->pcie_mpss = FIELD_GET(PCI_EXP_DEVCAP_PAYLOAD,
> pdev->devcap);
>
> - pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, ®32);
> - if (reg32 & PCI_EXP_LNKCAP_DLLLARC)
> + pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &pdev->lnkcap);
> + if (pdev->lnkcap & PCI_EXP_LNKCAP_DLLLARC)
> pdev->link_active_reporting = 1;
>
> parent = pci_upstream_bridge(pdev);
> diff --git a/include/linux/pci.h b/include/linux/pci.h index
> d1fdf81fbe1e..ec4133ae9cae 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -361,6 +361,7 @@ struct pci_dev {
> struct pci_dev *rcec; /* Associated RCEC device */
> #endif
> u32 devcap; /* PCIe Device Capabilities */
> + u32 lnkcap; /* PCIe Link Capabilities */
> u16 rebar_cap; /* Resizable BAR capability offset */
> u8 pcie_cap; /* PCIe capability offset */
> u8 msi_cap; /* MSI capability offset */
> commit ad3106e7d1bf ("PCI/ASPM: Add pcie_aspm_disable_cap() to remove
> advertised features")
> Author: Bjorn Helgaas <bhelgaas at google.com>
> Date: Wed Nov 5 09:11:21 2025 -0600
>
> PCI/ASPM: Add pcie_aspm_disable_cap() to remove advertised features
>
> Add pcie_aspm_disable_cap(), which can be used by quirks to remove
> advertised features that don't work correctly.
>
> Removing advertised features prevents aspm.c from enabling them,
> even if
> users try to enable them via sysfs or by building the kernel with
> CONFIG_PCIEASPM_POWERSAVE or
> CONFIG_PCIEASPM_POWER_SUPERSAVE.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
>
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index
> 4492b809094b..858d2d6c9304 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -958,6 +958,7 @@ void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
>
> #ifdef CONFIG_PCIEASPM
> +void pcie_aspm_disable_cap(struct pci_dev *pdev, int state);
> void pcie_aspm_init_link_state(struct pci_dev *pdev); void
> pcie_aspm_exit_link_state(struct pci_dev *pdev); void
> pcie_aspm_pm_state_change(struct pci_dev *pdev, bool locked); @@ -965,6
> +966,7 @@ void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> void pci_configure_ltr(struct pci_dev *pdev); void
> pci_bridge_reconfigure_ltr(struct pci_dev *pdev); #else
> +static inline void pcie_aspm_disable_cap(struct pci_dev *pdev, int
> +state) { }
> static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { } static
> inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { } static inline
> void pcie_aspm_pm_state_change(struct pci_dev *pdev, bool locked) { } diff
> --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index
> c2ecfa4e92e5..4e9e3503c569 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1530,6 +1530,18 @@ int pci_enable_link_state_locked(struct pci_dev
> *pdev, int state) } EXPORT_SYMBOL(pci_enable_link_state_locked);
>
> +void pcie_aspm_disable_cap(struct pci_dev *pdev, int state) {
> + if (state & PCIE_LINK_STATE_L0S)
> + pdev->lnkcap &= ~PCI_EXP_LNKCAP_ASPM_L0S;
> + if (state & PCIE_LINK_STATE_L1)
> + pdev->lnkcap &= ~PCI_EXP_LNKCAP_ASPM_L1;
> + if (state & (PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1))
> + pci_info(pdev, "ASPM:%s%s removed from Link Capabilities to work
> around device defect\n",
> + state & PCIE_LINK_STATE_L0S ? " L0s" : "",
> + state & PCIE_LINK_STATE_L1 ? " L1" : ""); }
> +
> static int pcie_aspm_set_policy(const char *val,
> const struct kernel_param *kp)
> {
> commit 69184484d69e ("PCI/ASPM: Prevent use of ASPM on Freescale Root
> Ports")
> Author: Bjorn Helgaas <bhelgaas at google.com>
> Date: Wed Nov 5 12:18:07 2025 -0600
>
> PCI/ASPM: Prevent use of ASPM on Freescale Root Ports
>
> Christian reported that f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM
> and
> ASPM states for devicetree platforms") broke booting on the A-EON
> X5000 and
> AmigaOne X1000.
>
> Fixes: f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for
> devicetree platforms")
> Fixes: df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree
> platforms"
> )
> Reported-by: Christian Zigotzky <chzigotzky at xenosoft.de>
> Link:
> https://lore.ke/
> rnel.org%2Fr%2Fdb5c95a1-cf3e-46f9-8045-a1b04908051a%40xenosoft.de&da
> ta=05%7C02%7Chongxing.zhu%40nxp.com%7C6aa6d9be1c7c41d19f5908de1c
> b80084%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63897977378
> 4486225%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOi
> IwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0
> %7C%7C%7C&sdata=FMTDk9MiwZbvwhB3opSyoWVI1BeIhiy2tc07UO0rYL4%3
> D&reserved=0
> Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
>
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index
> 214ed060ca1b..8cf182c28d2b 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2525,6 +2525,16 @@ static void quirk_disable_aspm_l0s_l1(struct
> pci_dev *dev)
> */
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080,
> quirk_disable_aspm_l0s_l1);
>
> +/*
> + * Remove ASPM L0s and L1 support from cached copy of Link Capabilities
> +so
> + * aspm.c won't try to enable them.
> + */
> +static void quirk_disable_aspm_l0s_l1_cap(struct pci_dev *dev) {
> + pcie_aspm_disable_cap(dev, PCIE_LINK_STATE_L0S |
> PCIE_LINK_STATE_L1);
> +} DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, 0x0451,
> +quirk_disable_aspm_l0s_l1_cap);
> +
> /*
> * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe
> Retrain
> * Link bit cleared after starting the link retrain process to allow this
More information about the Linuxppc-dev
mailing list