[PATCH v7 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

Tue Apr 14 02:35:28 AEST 2026

Hi Brian,

On Wed, Apr 08, 2026 at 06:58:34PM -0700, Brian Norris wrote:
> Hi Richard and Mani,
> 
> For the record, I've been using a form of an earlier version of this
> patchset in my environment for some time now, and I've run across
> problems that *might* relate to what Richard is reporting, but I'm not
> quite sure at the moment. Details below.
> 
> On Wed, Mar 25, 2026 at 07:06:49AM +0000, Hongxing Zhu wrote:
> > Hi Mani:
> > I've accidentally encountered a new issue based on the reset root port patch-set.
> > After performing a few hot-reset operations, the PCIe link enters a continuous up/down cycling pattern.
> > 
> > I found that calling pci_reset_secondary_bus() first in pcibios_reset_secondary_bus() appears to resolve this issue.
> > Have you experienced a similar problem?
> > 
> > "
> > ...
> > [  141.897701] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> > [  142.086341] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> > [  142.092038] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000c00) link down detected
> > ...
> > "
> > 
> > Platform: i.MX95 EVK board plus local Root Ports reset supports based on the #1 and #2 patches of v7 patch-set.
> > Notes of the logs:
> > - One Gen3 NVME device is connected.
> > - "./memtool 4c341058=0;./memtool 4c341058=1;" is used to toggle the LTSSM_EN bit to trigger the link down.
> > - Toggle BIT6 of Bridge Control Register to trigger hot reset by "./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff;"
> > - The Root Port reset patches works correctly at first.
> > However, after several hot-reset triggers, the link enters a repeated down/up cycling state.
> > 
> > Logs:
> > [    3.553188] imx6q-pcie 4c300000.pcie: host bridge /soc/pcie at 4c300000 ranges:
> > [    3.560308] imx6q-pcie 4c300000.pcie:       IO 0x006ff00000..0x006fffffff -> 0x0000000000
> > [    3.568525] imx6q-pcie 4c300000.pcie:      MEM 0x0910000000..0x091fffffff -> 0x0010000000
> > [    3.577314] imx6q-pcie 4c300000.pcie: config reg[1] 0x60100000 == cpu 0x60100000
> > [    3.796029] imx6q-pcie 4c300000.pcie: iATU: unroll T, 128 ob, 128 ib, align 4K, limit 1024G
> > [    4.003746] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> > [    4.009553] imx6q-pcie 4c300000.pcie: PCI host bridge to bus 0000:00
> > root at imx95evk:~#
> > root at imx95evk:~#
> > root at imx95evk:~# ./memtool 4c341058=0;./memtool 4c341058=1; Writing 32-bit value 0x0 to address 0x4C341058
> > Writing 32-bit v
> > [   87.265348] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d01) link down detected
> > alue 0x1 to adder
> > [   87.273106] imx6q-pcie 4c300000.pcie: Stop root bus and handle link down
> > ss 0x4C341058
> > [   87.281264] pcieport 0000:00:00.0: Recovering Root Port due to Link Down
> > [   87.289245] pci 0000:01:00.0: AER: can't recover (no error_detected callback)
> > root at imx95evk:~# [   87.514216] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) link up detected
> > [   87.702968] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> > [   87.834983] pcieport 0000:00:00.0: Root Port has been reset
> > [   87.840714] pcieport 0000:00:00.0: AER: device recovery failed
> > [   87.846592] imx6q-pcie 4c300000.pcie: Rescan bus after link up is detected
> > [   87.855947] pcieport 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> 
> I've seen this same line ("bridge configuration invalid") before, and I
> believe that's because the saved state (pci_save_state(); more about
> this below) is invalid -- it contains 0 values in places where they
> should be non-zero. So when those values are restored
> (pci_restore_state()), we get confused.
> 
> I believe we've pinned down one reason this invalid state occurs -- it's
> because of an automatic (mis)feature in the DesignWare PCIe hardware.
> Specifically, it's because of what the controller does during a surprise
> link-down error.
> 
> From the Designware docs:
> 
>   "[...] during normal operation, the link might fail and go down. After
>   this link-down event, the controller requests the DWC_pcie_clkrst.v
>   module to hot-reset the controller. There is no difference in the
>   handling of a link-down reset or a hot reset; the controller asserts
>   the link_req_rst_not output requesting the DWC_pcie_clkrst.v module to
>   reset the controller."
> 
> In some of the adjacent documentation (and confirmed in local testing),
> it suggests that this automatic reset will also reset various DBI (i.e.,
> PCIe config space) registers. It also seems as if there's not really a
> good way to completely stop this automatic reset -- the docs mention
> some SW methods prevent the reset, but they all seem racy or incomplete.
> 
> Anyway, I think this implies that patch 1 is somewhat wrong [1]. It
> includes some code like this:
> 
> 		pci_save_state(dev);
> 		ret = host->reset_root_port(host, dev);
> 		if (ret)
> 			pci_err(dev, "Failed to reset Root Port: %d\n", ret);
> 		else
> 			/* Now restore it on success */
> 			pci_restore_state(dev);
> 
> That first line (pci_save_state()) is prone to saving invalid state,
> depending on whether the link-down event has finished flushing and
> resetting the controller yet or not. The resulting impact is a bit hard
> to judge, depending on what (mis)configuration you end up with.
> 

Thanks a lot for your investigation. I think your observation makes sense and
could be the culprit in saving the corrupted state. Even on non-DWC controllers,
there is no guarantee that the Root Port config registers state will be
preserved after LDn (before Root Port reset).

> I also noticed commit a2f1e22390ac ("PCI/ERR: Ensure error
> recoverability at all times") was merged recently. With that change, I
> believe it is now safe to perform pci_restore_state() even without
> pci_save_state() here.
> 
> So ... can we remove pci_save_state() from
> pcibios_reset_secondary_bus()? Might that help?

I think so. I will also test it locally and report back soon.

> It sounds like my above
> observations *may* match Richard's reports, but I'm not sure. And
> anyway, the documented hardware behavior is racy, so it's hard to
> propose a foolproof solution.
> 

@Richard: Can you confirm if removing 'pci_save_state(dev);' from
pcibios_reset_secondary_bus() fixes your issue?

- Mani

-- 
மணிவண்ணன் சதாசிவம்