[Linuxppc-users] Fedora 28-1.1 taking 30 seconds to discover/enable PCIe adapter after link disable/enable

Mike Bieker mike.bieker at broadcom.com
Thu May 31 04:12:33 AEST 2018


Hi Brian,



Thanks for looking at this!  I will check dmesg for the same.



Mike



From: Brian King [mailto:brking at linux.ibm.com]
Sent: Wednesday, May 30, 2018 12:06 PM
To: Mike Bieker; linuxppc-users at lists.ozlabs.org
Subject: Re: [Linuxppc-users] Fedora 28-1.1 taking 30 seconds to discover/enable PCIe adapter after link disable/enable



Hi Mike,

When I try that on a Power 9 system of mine, the link disable itself
puts the PHB into EEH state, which is essentially the PHB freezing in
response to an unexpected error of some sort. Lots of things can cause
this: a bad DMA address, PCIe link errors, etc. In this case it's the
act of disabling the link.

If you check dmesg, my guess is that you will see errors related to EEH.
The kernel will then attempt to recover from this state. In fact, on my
system I don't even need to clear the link disable state, because EEH
recovery in the kernel ends up clearing it.
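
A minimal sketch of the dmesg check suggested above. The function names are made up for illustration, and the filter simply assumes that the relevant kernel messages contain the string "EEH"; the exact message text will vary by kernel version.

```shell
# Print only the log lines on stdin that mention EEH (case-insensitive).
eeh_lines() {
    grep -i 'eeh'
}

# Count EEH-related lines in the current kernel log.
# Reading the kernel log may require root on some systems.
count_eeh_events() {
    dmesg | eeh_lines | wc -l
}
```

Running 'dmesg | eeh_lines' right after the link disable should show whether the PHB was frozen and whether recovery was attempted. Comparing 'count_eeh_events' before and after the test is a quick way to confirm that the link disable is what triggered it.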

Thanks,

Brian

On 05/30/2018 11:00 AM, Mike Bieker wrote:

On an x86 system, discovering/enabling a PCIe adapter after a PCIe link
disable/enable takes less than a second.  However, on Power Systems it
takes 30 seconds or more.



Here is the process we are using to test:

1)        Boot the system and verify that the link is up between the IBM Root
Port and our Atlas PCIe Gen4x16 switch with no errors - 'lspci -s 034:01:00.0 -vvv'

2)        Set the Link Disable bit (Bit 4) in the PCIe Link Control register
of the Root Port - 'setpci -s 034:00:00.0 58.w=0018'.

3)        Verify that the link is disabled between Root Port and Atlas -
'setpci -s 034:00:00.0 58.w' should show that the Link Disable bit is set.
Can also execute 'lspci' and see that the link is down between Root Port and
Atlas.

4)        Clear the Link Disable bit in the PCIe Link Control register of the
Root Port - 'setpci -s 034:00:00.0 58.w=0008'

5)        Wait 5 seconds - 'sleep 5'

6)        Check that the link between Root Port and Atlas is enabled and at
the proper rate and width (Gen4x16) - 'lspci -s 034:01:00.0 -vvv'.  This is
where the error occurs, because the link is not up.  If I keep retrying
lspci, the port returns valid data after 30 to 60 seconds.  Why does Fedora
on Power Systems take so long to link up and discover the adapter after a
link disable/enable?
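
The steps above can be sketched as a small script that times the recovery instead of using a fixed sleep. This is only an illustration under stated assumptions: the device addresses and the 58.w offset are copied from the steps above, the function names and the 90-second timeout are made up, and "the downstream device answers a vendor ID read with something other than ffff" is used as a stand-in for the lspci check in step 6. It must run as root.

```shell
ROOT_PORT="034:00:00.0"     # Root Port address, as given in the steps above
DOWNSTREAM="034:01:00.0"    # Atlas switch address, as given in the steps above
LINK_CONTROL="58.w"         # Link Control register offset used above
LINK_DISABLE_MASK=16        # 0x10, i.e. bit 4 (Link Disable)

# Given a Link Control value in hex, print it with Link Disable set.
with_link_disabled() {
    printf '%04x\n' $(( 0x$1 | $LINK_DISABLE_MASK ))
}

# Given a Link Control value in hex, print it with Link Disable cleared.
with_link_enabled() {
    printf '%04x\n' $(( 0x$1 & ~$LINK_DISABLE_MASK & 0xffff ))
}

# Run a command once per second until it succeeds or the timeout expires.
# Prints the number of seconds waited; returns 1 on timeout.
poll_until() {
    timeout=$1; shift
    waited=0
    until "$@"; do
        [ "$waited" -ge "$timeout" ] && { echo "$waited"; return 1; }
        sleep 1
        waited=$((waited + 1))
    done
    echo "$waited"
}

# Assumption: an all-ones vendor ID (ffff) means the device is not responding.
downstream_responds() {
    id=$(setpci -s "$DOWNSTREAM" 00.w 2>/dev/null) || return 1
    [ -n "$id" ] && [ "$id" != "ffff" ]
}

# Steps 2 to 6: disable the link, re-enable it, then time the recovery.
toggle_and_time() {
    current=$(setpci -s "$ROOT_PORT" "$LINK_CONTROL") || return 1
    setpci -s "$ROOT_PORT" "$LINK_CONTROL=$(with_link_disabled "$current")"
    setpci -s "$ROOT_PORT" "$LINK_CONTROL=$(with_link_enabled "$current")"
    poll_until 90 downstream_responds
}
```

Calling 'toggle_and_time' prints how many seconds passed before the downstream device answered again. The read-modify-write in 'with_link_disabled' and 'with_link_enabled' changes only bit 4, so it gives 0018 and 0008 when the register starts at 0008, matching the values written in steps 2 and 4.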



Thanks,

Mike









-- 
Brian King
Power Linux I/O
IBM Linux Technology Center