[PATCH v2 1/3] PCI/PM: Reinstate clearing state_saved in legacy and !pm codepaths
Lukas Wunner
lukas at wunner.de
Wed Nov 26 23:49:06 AEDT 2025
On Tue, Nov 25, 2025 at 05:18:46PM -0600, Bjorn Helgaas wrote:
> On Wed, Nov 19, 2025 at 09:50:01AM +0100, Lukas Wunner wrote:
> > But there are two corner cases where the PCI core neglects to clear the
> > flag before commencing the suspend sequence:
> >
> > * If a driver has legacy PCI PM callbacks, pci_legacy_suspend() neglects
> > to clear the flag. The (stale) flag is subsequently queried by
> > pci_legacy_suspend() itself and pci_legacy_suspend_late().
> >
> > * If a device has no driver or its driver has no PCI PM callbacks,
> > pci_pm_freeze() neglects to clear the flag. The (stale) flag is
> > subsequently queried by pci_pm_freeze_noirq().
> >
> > The flag may be set prior to suspend if the device went through error
> > recovery: Drivers commonly invoke pci_restore_state() + pci_save_state()
> > to restore Config Space after reset.
>
> I guess the only point of pci_save_state() in this case is to set
> state_saved again so a future pci_restore_state() will work, right?
>
> The actual state being *saved* is pointless, assuming pci_save_state()
> saves exactly the same data that pci_restore_state() restored.
>
> And these are the pci_save_state() calls you removed with "treewide:
> Drop pci_save_state() after pci_restore_state()". Too bad we have to
> document the behavior we're about to change, but that's what we need
> to do. It's just a little clutter to keep in mind for this release.
Yes. All of your comments above are correct.
> > The flag may also be set if drivers call pci_save_state() on probe to
> > allow for recovery from subsequent errors.
> >
> > The result is that pci_legacy_suspend_late() and pci_pm_freeze_noirq()
> > don't call pci_save_state() and so the state that will be restored on
> > resume is the one recorded on last error recovery or on probe, not the one
> > that the device had on suspend. If the two states happen to be identical,
> > there's no problem.
>
> So IIUC the effect is that after this change and the "treewide"
> change,
>
> - If the driver uses legacy PM, the state restored on resume will be
> the state from suspend instead of the state on probe.
Right.
>
> - For devices with no driver or a driver without PM, if the device
> has already been runtime-suspended, we avoid a pointless
> pci_save_state(), so it's an optimization and not logically
> related to the legacy PM case.
It's slightly different:
- For devices with no driver or a driver without PM, the state restored
on "thaw" and "restore" will be the state from "freeze" instead of the
state on probe.
So the same problem that we have for drivers using legacy PM, we also
have for devices with no driver or a driver without (modern) PM callbacks,
but only in the "freeze" codepath (for hibernation).
In the patch, I made the "pci_dev->state_saved = false" assignment
conditional on !pm_runtime_suspended() in the "freeze" codepath.
I didn't do the same in the legacy codepath because none of the
drivers using legacy PM callbacks seem to be using runtime PM.
The purpose of making it conditional on !pm_runtime_suspended()
is just that we'd otherwise call pci_save_state() twice: Once in
pci_pm_runtime_suspend() and once more in pci_pm_freeze().
That would be pointless.
In the commit message, I provided a rationale for the conditionality,
but inadvertently caused confusion.
> I'm thinking of something like this for the merge commit and eventual
> pull request; please correct me if this isn't right:
>
> Restore the suspend config state, not the state from probe or error
> recovery, for drivers using legacy PCI suspend.
>
> Avoid saving config state again for devices without driver PM if
> their state was already saved by runtime suspend.
I'd suggest instead (feel free to wordsmith as you see fit):
Restore the suspend config state, not the state from probe or error
recovery, for drivers using legacy PCI suspend. [ <- unmodified ]
Same for devices with no driver or a driver without PM callbacks
when the system is hibernated. [ <- replacement ]
Mentioning the runtime PM conditionality in the high-level changelog
is probably not worth it.
Was I able to clarify all questions? Please ask again if not.
Also, in case the meaning of "freeze", "thaw", "restore" isn't clear,
here's the order of a hibernation sequence (suspend to disk):
pci_pm_prepare()
pci_pm_freeze()
pci_pm_freeze_noirq()
<system image is generated>
pci_pm_thaw_noirq()
pci_pm_thaw()
pci_pm_complete()
pci_pm_prepare()
pci_pm_poweroff()
pci_pm_poweroff_late()
pci_pm_poweroff_noirq()
<system is asleep, then restarted with boot kernel>
pci_pm_prepare()
pci_pm_freeze()
pci_pm_freeze_noirq()
<system image is restored, replacing the boot kernel>
pci_pm_restore_noirq()
pci_pm_restore()
pci_pm_complete()
Note that "freeze" happens twice in the whole sequence.
Thanks,
Lukas
More information about the Linuxppc-dev
mailing list