[Skiboot] [PATCH 1/2] hw/phb4: Assert Link Disable bit after ETU init
Stewart Smith
stewart at linux.ibm.com
Mon Jun 3 12:10:13 AEST 2019
"Oliver O'Halloran" <oohall at gmail.com> writes:
> The cursed RAID card in ozrom1 has a bug where it ignores PERST being
> asserted. The PCIe Base spec is a little vague about what happens
> while PERST is asserted, but it does clearly specify that when
> PERST is de-asserted the Link Training and Status State Machine
> (LTSSM) of a device should return to the initial state (Detect)
> defined in the spec and the link training process should restart.
>
> This bug was worked around in 9078f8268922 ("phb4: Delay training till
> after PERST is deasserted") by setting the link disable bit at the
> start of the FRESET process and clearing it after PERST was
> de-asserted. Although this fixed the bug, the patch offered no
> explaination of why the fix worked.
>
> In b8b4c79d4419 ("hw/phb4: Factor out PERST control") the link disable
> workaround was moved into phb4_assert_perst(). This is called
> always in the CRESET case, but a following patch resulted in
> assert_perst() not being called if phb4_freset() was entered following a
> CRESET since p->skip_perst was set in the CRESET handler. This is bad
> since a side-effect of the CRESET is that the Link Disable bit is
> cleared.
>
> This, combined with the RAID card ignoring PERST results in the PCIe
> link being trained by the PHB while we're waiting out the 100ms
> ETU reset time. If we hack skiboot to print a DLP trace after returning
> from phb4_hw_init() we get:
>
> PHB#0001[0:1]: Initialization complete
> PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
> PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
> PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
> PHB#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config
> PHB#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery
> PHB#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery
> PHB#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0
> PHB#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0
> PHB#0001[0:1]: CRESET: wait_time = 100
> PHB#0001[0:1]: FRESET: Starts
> PHB#0001[0:1]: FRESET: Prepare for link down
> PHB#0001[0:1]: FRESET: Assert skipped
> PHB#0001[0:1]: FRESET: Deassert
> PHB#0001[0:1]: TRACE:0x0000154883000000 0ms trained GEN3:x08:L0
> PHB#0001[0:1]: TRACE: Reached target state
> PHB#0001[0:1]: LINK: Start polling
> PHB#0001[0:1]: LINK: Electrical link detected
> PHB#0001[0:1]: LINK: Link is up
> PHB#0001[0:1]: LINK: Went down waiting for stabilty
> PHB#0001[0:1]: LINK: DLP train control: 0x0000105101000000
> PHB#0001[0:1]: CRESET: Starts
>
> What has happened here is that the link is trained to 8x Gen3 33ms after
> we return from phb4_init_hw(), and before we've waitined to 100ms
> that we normally wait after re-initialising the ETU. When we "deassert"
> PERST later on in the FRESET handler the link in L0 (normal) state. At
> this point we try to read from the Vendor/Device ID register to verify
> that the link is stable and immediately get a PHB fence due to a PCIe
> Completion Timeout. Skiboot attempts to recover by doing another CRESET,
> but this will encounter the same issue.
>
> This patch fixes the problem by setting the Link Disable bit (by calling
> phb4_assert_perst()) immediately after we return from phb4_init_hw().
> This prevents the link from being trained while PERST is asserted which
> seems to avoid the Completion Timeout. With the patch applied we get:
>
> PHB#0001[0:1]: Initialization complete
> PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
> PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
> PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
> PHB#0001[0:1]: TRACE:0x0000909101000000 29ms presence GEN1:x16:disabled
> PHB#0001[0:1]: CRESET: wait_time = 100
> PHB#0001[0:1]: FRESET: Starts
> PHB#0001[0:1]: FRESET: Prepare for link down
> PHB#0001[0:1]: FRESET: Assert skipped
> PHB#0001[0:1]: FRESET: Deassert
> PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
> PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
> PHB#0001[0:1]: TRACE:0x0000001101000000 24ms GEN1:x16:detect
> PHB#0001[0:1]: TRACE:0x0000102101000000 36ms presence GEN1:x16:polling
> PHB#0001[0:1]: TRACE:0x0000183101000000 97ms training GEN1:x16:config
> PHB#0001[0:1]: TRACE:0x00001c5881000000 97ms training GEN1:x08:recovery
> PHB#0001[0:1]: TRACE:0x00001c5883000000 97ms training GEN3:x08:recovery
> PHB#0001[0:1]: TRACE:0x0000144883000000 99ms presence GEN3:x08:L0
> PHB#0001[0:1]: TRACE: Reached target state
> PHB#0001[0:1]: LINK: Start polling
> PHB#0001[0:1]: LINK: Electrical link detected
> PHB#0001[0:1]: LINK: Link is up
> PHB#0001[0:1]: LINK: Link is stable
> PHB#0001[0:1]: LINK: Card [9005:028c] Optimal Retry:disabled
> PHB#0001[0:1]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3
> PHB#0001[0:1]: LINK: Width Train:x08 PHB:x08 DEV:x08
> PHB#0001[0:1]: LINK: RX Errors Now:0 Max:8 Lane:0x0000
>
> Cc: Michael Neuling <mikey at neuling.org>
> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
> ---
> hw/phb4.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
Merged to master as of 02a683bf09d94757a72eec00e602c5609aa8d754.
--
Stewart Smith
OPAL Architect, IBM.
More information about the Skiboot
mailing list