[Skiboot] PCIe training failure on PLX PCI bridge
tpearson at raptorengineering.com
Thu Jun 6 22:07:48 AEST 2019
----- Original Message -----
> From: "Oliver" <oohall at gmail.com>
> To: "Timothy Pearson" <tpearson at raptorengineering.com>
> Cc: "skiboot" <skiboot at lists.ozlabs.org>
> Sent: Thursday, June 6, 2019 1:32:48 AM
> Subject: Re: [Skiboot] PCIe training failure on PLX PCI bridge
> On Thu, Jun 6, 2019 at 2:24 PM Timothy Pearson
> <tpearson at raptorengineering.com> wrote:
>> Popped in a couple of new tuner cards into our "oddball hardware"
>> testing unit (Talos II w/ two P9 DD2.2 CPUs). These use a Pericom
>> bridge chip, PI7C9X2G304, as the PCIe interface.
>> The links train but are degraded so are taken back offline. Both cards
>> are brand new and show the exact same problem.
> What skiboot version are you using? The workaround in 02a683bf09d9
> ("hw/phb4: Assert Link Disable bit after ETU init") might help.
Git hash d318cdb. I'm not sure whether that patch is included; I'd have to go check.
>> Log from PHB#0 in Mata mode:
> Mata mode doesn't appear to be enabled since there's no TRACE: lines
> in the output. Turn on the link trace output with:
> nvram -p ibm,skiboot --update-config pci-tracing=true
> It should look something like this:
> PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
> PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
> PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
> PHB#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config
> PHB#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery
> PHB#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery
> PHB#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0
> PHB#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0
You're right. I learned the hard way that the NVRAM setting doesn't take effect after a fast reboot. It took a traditional cold power-down and re-IPL before trace mode was activated, and by that time the cards had already been removed again. I'll see what I can do to get a proper trace, but if this is the same Pericom fault already known to the hardware team in Austin, I'm not sure the trace is going to help much.
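For anyone sifting through a captured log for the TRACE: lines quoted above, here is a minimal Python sketch that pulls out the elapsed time, status words, link speed, width and state. The pattern is inferred only from the sample lines in this thread, not from the skiboot source, and the names parse_trace_line and final_link are made up for illustration.

```python
import re

# Pattern inferred from the sample TRACE: lines quoted in this thread.
# It is an assumption about the format, not taken from the skiboot source.
TRACE_RE = re.compile(
    r"PHB#(?P<phb>[0-9A-Fa-f]{4})\[\d+:\d+\]: "
    r"TRACE:(?P<reg>0x[0-9A-Fa-f]{16})\s+"
    r"(?P<ms>\d+)ms\s+"
    r"(?P<flags>.*?)\s*"
    r"GEN(?P<gen>\d):x(?P<width>\d+):(?P<state>\w+)\s*$"
)


def parse_trace_line(line):
    """Return a dict for one TRACE: line, or None if the line is not one."""
    m = TRACE_RE.search(line)  # search() tolerates "> " quoting or log prefixes
    if m is None:
        return None
    return {
        "phb": m.group("phb"),
        "reg": m.group("reg"),
        "ms": int(m.group("ms")),
        "flags": m.group("flags").split(),  # e.g. ["presence"], ["trained"], or []
        "gen": int(m.group("gen")),
        "width": int(m.group("width")),
        "state": m.group("state"),
    }


def final_link(lines):
    """Return the last TRACE: entry found in an iterable of log lines, or None."""
    last = None
    for line in lines:
        entry = parse_trace_line(line)
        if entry is not None:
            last = entry
    return last
```

Running final_link() over a saved console log gives the last reported speed and width, which can then be compared by hand against what the slot and card are expected to train at.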