[PATCH 0/2] PCI/AER: Consistently use _OSC to determine who owns AER

Alex_Gagniuc at Dellteam.com Alex_Gagniuc at Dellteam.com
Tue Nov 20 07:16:59 AEDT 2018


On 11/19/2018 01:32 PM, Sinan Kaya wrote:
> ACPI 6.2:
> 
> 18.3.2.4 PCI Express Root Port AER Structure
> 
> Flags:
> 
> Bit [0] - FIRMWARE_FIRST: If set, this bit indicates to the OSPM that system
> firmware will handle errors from this source first.
> Bit [1] - GLOBAL: If set, indicates that the settings contained in this
> structure apply globally to all PCI Express Devices.
> All other bits must be set to zero.
> 
> It doesn't say shall, may or might. It says will.

It says "system firmware will handle errors". It does not say "system
firmware owns AER registers". In the absence of any descriptive text on the
meaning of these tables, this really looks to me like it should be
interpreted as a descriptor of APEI error sources, not a mutex on who
writes to certain bits, AER in this case.

I don't think that is contradictory or inconsistent.
I also wasn't able to find any reference to HEST in UEFI 2.7, only in
the ACPI spec.

> I think It depends on your PCI topology.
> 
> For other topologies with multiple PCI root complexes, I can see this being
> used per root complex flag to indicate which root complex needs firmware first
> and which one doesn't.

_OSC is per root bus, so it's already granular enough, right? Why would 
it depend on PCI topology?


>> I'd like to see how exactly we break one of those elusive systems with _OSC. I
>> suspect _OSC and HEST end up having the same information, and that's why we
>> didn't see any real-life issue with mixing the approaches.
> 
> I'm already aware of two systems that rely on HEST table to pass information to
> the OS that firmware first is enabled. Both of the systems do not change their
> _OSC bits during this assuming HEST table has priority over _OSC for firmware
> first.

Are those hax86 systems?
It seems like the systems have broken firmware. I see several ways to 
handle broken systems like those:
  - Parse both HEST and _OSC, and decide AER ownership with root bridge 
granularity. i.e. host_bridge->native_aer is authoritative, but is 
derived from both HEST and _OSC
  - Add quirks for the broken systems
  - Keep doing what we're doing until current code breaks a new system
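
A minimal sketch of the first option, assuming ownership is decided once
per host bridge. The struct and function names below are hypothetical and
are not existing kernel code; only the two flag bits come from the ACPI
6.2 text quoted above. The policy shown (the OS owns AER only when _OSC
granted it and HEST does not claim firmware-first) is one possible way to
combine the two sources, not an established rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* HEST AER structure flag bits, as quoted from ACPI 6.2, 18.3.2.4 */
#define HEST_FIRMWARE_FIRST (1u << 0)
#define HEST_GLOBAL         (1u << 1)

/* Hypothetical per-host-bridge summary; not a real kernel structure. */
struct bridge_aer_info {
	bool osc_granted_aer;  /* _OSC granted AER control to the OS */
	uint8_t hest_flags;    /* flags of the HEST entry covering this
				* bridge, 0 if there is none */
};

/*
 * Assumed policy: the OS handles AER natively only if _OSC granted
 * control and no applicable HEST entry sets FIRMWARE_FIRST.
 */
static bool bridge_native_aer(const struct bridge_aer_info *info)
{
	if (!info->osc_granted_aer)
		return false;
	if (info->hest_flags & HEST_FIRMWARE_FIRST)
		return false;
	return true;
}
```

With this policy, the systems described above (FIRMWARE_FIRST set in HEST,
_OSC left unchanged) would resolve to firmware ownership without a quirk.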

> If we add this patch, OS will try to claim the AER address space while firmware
> wants exclusive access.

Yay! FFS wants exclusive access, but does not claim it. Oh, FFS!


> As I said in my previous email, the right place to talk about this is UEFI
> forum.

The way I would present the problem to the spec writers is that, although 
the spec appears to be consistent, we've seen firmware vendors that made 
the wrong assumptions about HEST/_OSC. Instead of describing AER 
ownership with _OSC, they attempted to do it with HEST. So we should add 
an implementation note, or clarification about this.

Alex

