[PATCH AUTOSEL 6.17-5.4] powerpc/eeh: Use result of error_detected() in uevent
Sasha Levin
sashal at kernel.org
Sun Oct 26 02:54:48 AEDT 2025
From: Niklas Schnelle <schnelle at linux.ibm.com>
[ Upstream commit 704e5dd1c02371dfc7d22e1520102b197a3b628b ]
Ever since uevent support was added for AER and EEH with commit
856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume"), it
reported PCI_ERS_RESULT_NONE as uevent when recovery begins.
Commit 7b42d97e99d3 ("PCI/ERR: Always report current recovery status for
udev") subsequently amended AER to report the actual return value of
error_detected().
Make the same change to EEH to align it with AER and s390.
Suggested-by: Lukas Wunner <lukas at wunner.de>
Link: https://lore.kernel.org/linux-pci/aIp6LiKJor9KLVpv@wunner.de/
Signed-off-by: Niklas Schnelle <schnelle at linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
Reviewed-by: Lukas Wunner <lukas at wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy at linux.intel.com>
Acked-by: Mahesh Salgaonkar <mahesh at linux.ibm.com>
Link: https://patch.msgid.link/20250807-add_err_uevents-v5-3-adf85b0620b0@linux.ibm.com
Signed-off-by: Sasha Levin <sashal at kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
Rationale
- Fixes incorrect uevent status at start of EEH recovery: the code
currently emits a uevent with `PCI_ERS_RESULT_NONE` regardless of what
the driver reported via `error_detected()`. This misrepresents the
actual recovery status to user space.
- The fix makes EEH behave like AER (already fixed by commit
7b42d97e99d3) and s390, improving cross-arch consistency and user
space expectations.
Evidence in code
- Current EEH behavior: emits BEGIN_RECOVERY unconditionally at error
  detection
  - `pci_uevent_ers(pdev, PCI_ERS_RESULT_NONE);` is called after
    `error_detected()` even if the driver “votes” differently (e.g.,
    DISCONNECT/NEED_RESET): arch/powerpc/kernel/eeh_driver.c:337
- Proposed change: pass actual driver result
  - Changes the above call to `pci_uevent_ers(pdev, rc);`, where `rc` is
    the result of `driver->err_handler->error_detected()` captured just
    above: arch/powerpc/kernel/eeh_driver.c:337
- uevent mapping semantics (what user space sees) are centralized in
  `pci_uevent_ers()`:
  - NONE/CAN_RECOVER -> `ERROR_EVENT=BEGIN_RECOVERY`, `DEVICE_ONLINE=0`
  - RECOVERED -> `ERROR_EVENT=SUCCESSFUL_RECOVERY`, `DEVICE_ONLINE=1`
  - DISCONNECT -> `ERROR_EVENT=FAILED_RECOVERY`, `DEVICE_ONLINE=0`
  - Others (e.g., NEED_RESET) -> no immediate uevent (consistent with
    AER)
  - drivers/pci/pci-driver.c:1595
- AER already reports actual `error_detected()` return value to udev:
  - `pci_uevent_ers(dev, vote);` after computing `vote` in
    `report_error_detected()`: drivers/pci/pcie/err.c:83
- EEH already emits final-stage uevents correctly (unchanged by this
  patch):
  - Success at resume: `pci_uevent_ers(edev->pdev,
    PCI_ERS_RESULT_RECOVERED);` arch/powerpc/kernel/eeh_driver.c:432
  - Failure path: `pci_uevent_ers(pdev, PCI_ERS_RESULT_DISCONNECT);`
    arch/powerpc/kernel/eeh_driver.c:462
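The mapping listed above, and the effect of swapping the argument from
`PCI_ERS_RESULT_NONE` to `rc`, can be sketched as a small user-space
model. This is not kernel code: every `model_`/`MODEL_` name is
hypothetical, the enum values are illustrative, and the mapping is
transcribed from the list above rather than from `pci_uevent_ers()`
itself, so it may not match every kernel version.

```c
#include <stddef.h>

/* Hypothetical stand-in for enum pci_ers_result (values are illustrative). */
enum model_ers_result {
	MODEL_ERS_NONE,
	MODEL_ERS_CAN_RECOVER,
	MODEL_ERS_NEED_RESET,
	MODEL_ERS_DISCONNECT,
	MODEL_ERS_RECOVERED,
};

/* What user space would see; error_event == NULL means "no uevent". */
struct model_uevent {
	const char *error_event;
	int device_online;
};

/* Model of the mapping described in the list above. */
struct model_uevent model_uevent_ers(enum model_ers_result rc)
{
	struct model_uevent ev = { NULL, 0 };

	switch (rc) {
	case MODEL_ERS_NONE:
	case MODEL_ERS_CAN_RECOVER:
		ev.error_event = "BEGIN_RECOVERY";
		break;
	case MODEL_ERS_RECOVERED:
		ev.error_event = "SUCCESSFUL_RECOVERY";
		ev.device_online = 1;
		break;
	case MODEL_ERS_DISCONNECT:
		ev.error_event = "FAILED_RECOVERY";
		break;
	default:
		/* e.g. NEED_RESET: no immediate uevent in this model */
		break;
	}
	return ev;
}

/* Before the patch: the driver's vote is ignored, NONE is always passed. */
struct model_uevent model_eeh_initial_before(enum model_ers_result rc)
{
	(void)rc;
	return model_uevent_ers(MODEL_ERS_NONE);
}

/* After the patch: the driver's vote is forwarded unchanged. */
struct model_uevent model_eeh_initial_after(enum model_ers_result rc)
{
	return model_uevent_ers(rc);
}
```

In this model a driver vote of `MODEL_ERS_DISCONNECT` yields
`BEGIN_RECOVERY` from the "before" helper and `FAILED_RECOVERY` from the
"after" helper, which is the behavioral difference the rationale
describes.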
Why this is a bugfix suitable for stable
- User-visible correctness: With the current code, user space always
sees “BEGIN_RECOVERY” even when drivers have already indicated an
unrecoverable state (e.g., DISCONNECT). The patch ensures uevents
reflect the true state immediately, matching AER behavior introduced
by 7b42d97e99d3.
- Minimal, contained change: One-line change in a single architecture-
specific file (PowerPC EEH). No API/ABI changes; only corrects the
parameter passed to an existing helper.
- No architectural change: Keeps existing EEH flow; only adjusts the
uevent status emitted at a single step.
- Low regression risk: AER has used these semantics since 7b42d97e99d3,
and `pci_uevent_ers()` already handles every possible `rc` value. EEH
already emits RECOVERED/DISCONNECT at later stages; this change makes
the initial event consistent with them.
- Aligns cross-arch semantics: Consistent uevent reporting across AER,
EEH, and s390 reduces user space special-casing and potential errors.
Potential side effects and why acceptable
- For drivers returning `PCI_ERS_RESULT_DISCONNECT` at
`error_detected()`, user space will now see `FAILED_RECOVERY`
immediately instead of a misleading `BEGIN_RECOVERY`. This is a
correctness fix.
- For returns like `PCI_ERS_RESULT_NEED_RESET`, no initial uevent is
emitted (consistent with AER); user space will still receive final
RECOVERED/FAILED, as today. Any scripts that strictly expected an
initial BEGIN_RECOVERY for all cases are already inconsistent with AER
and should not rely on that behavior.
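A user-space consumer that tolerates the side effects above could track
device state without assuming that `BEGIN_RECOVERY` always arrives
first. The following is only a sketch of that idea: the `model_` names
and the three-state model are assumptions made for illustration, and
only the `ERROR_EVENT` string values come from the mapping described
earlier.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device states tracked by a user-space monitor. */
enum model_dev_state {
	MODEL_DEV_ONLINE,
	MODEL_DEV_RECOVERING,
	MODEL_DEV_FAILED,
};

/*
 * Advance the tracked state from the value of an ERROR_EVENT key.
 * Terminal events are accepted from any state, so a FAILED_RECOVERY
 * or SUCCESSFUL_RECOVERY that was not preceded by BEGIN_RECOVERY is
 * still handled. Unknown or missing values leave the state unchanged.
 */
enum model_dev_state model_track_error_event(enum model_dev_state cur,
					     const char *error_event)
{
	if (error_event == NULL)
		return cur;
	if (strcmp(error_event, "BEGIN_RECOVERY") == 0)
		return MODEL_DEV_RECOVERING;
	if (strcmp(error_event, "SUCCESSFUL_RECOVERY") == 0)
		return MODEL_DEV_ONLINE;
	if (strcmp(error_event, "FAILED_RECOVERY") == 0)
		return MODEL_DEV_FAILED;
	return cur;
}
```

Keeping the transitions independent of the previous state is the design
choice here: it works both when the first event is `FAILED_RECOVERY`
(driver voted DISCONNECT) and when no initial event is emitted at all.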
Historical context
- Uevent support was added by 856e1eb9bdd4 (“PCI/AER: Add uevents in AER
and EEH error/resume”), initially emitting `NONE` at error detection
for both AER and EEH.
- AER was corrected by 7b42d97e99d3 (“PCI/ERR: Always report current
recovery status for udev”) to emit the actual `error_detected()`
result.
- This patch brings EEH to parity with that established AER behavior.
Conclusion
- This is a targeted, low-risk correctness fix that improves user space
observability and cross-arch consistency without changing kernel-side
recovery logic. It fits stable backport rules (important bugfix,
minimal change, low regression risk, confined to a subsystem).
arch/powerpc/kernel/eeh_driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 48ad0116f3590..ef78ff77cf8f2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -334,7 +334,7 @@ static enum pci_ers_result eeh_report_error(struct eeh_dev *edev,
 	rc = driver->err_handler->error_detected(pdev, pci_channel_io_frozen);
 	edev->in_error = true;
-	pci_uevent_ers(pdev, PCI_ERS_RESULT_NONE);
+	pci_uevent_ers(pdev, rc);
 	return rc;
 }
--
2.51.0