[PASEMI] Nemo board doesn't recognize any ATA disks with the pci-v5.16 updates

Damien Le Moal damien.lemoal at opensource.wdc.com
Wed Nov 10 14:52:09 AEDT 2021


On 2021/11/10 7:40, Krzysztof Wilczyński wrote:
> [+CC Adding Jens and Damien to get their opinion about the problem at hand]
> 
> Hello Jens and Damien,
> 
> Sorry to bother both of you, but we are having a problem that most
> definitely requires someone with extensive expertise in storage,
> as per the quoted message from Christian below:
> 
>>>> The Nemo board [1] doesn't recognize any ATA disks with the pci-v5.16
>>>> updates [2].
>>>>
>>>> Error messages:
>>>>
>>>> ata4.00: qc timeout cmd 0xec
>>>> ata4.00: failed to IDENTIFY (I/O error, error_mask=0x4)
>>>> ata1.00: qc timeout cmd 0xec
>>>> ata1.00: failed to IDENTIFY (I/O error, error_mask=0x4)
>>>> ata3.00: qc timeout cmd 0xec
>>>> ata3.00: failed to IDENTIFY (I/O error, error_mask=0x4)

IDENTIFY is the first command sent to a device when it is being probed. This
means that the AHCI (is it AHCI?) adapter at least found the ports and the
drives connected to them. But the qc timeout indicates that there is no
response from the drive. This could be due to interrupts not being received
for the command completion. One thing to try would be to increase the identify
command timeout to see whether things simply got slow (for whatever reason) or
whether there is indeed no response at all. Note that after the first timeout,
the port is normally reset and the command retried. That does not seem to be
the case here. Weird...
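
For reference, the two numbers in the log can be decoded by hand. The sketch
below is a small user-space illustration, not kernel code: describe_err_mask()
and cmd_name() are hypothetical helpers, and the bit names are assumed to
follow the AC_ERR_* flags in include/linux/libata.h (bit 2 being the timeout
flag, which is what error_mask=0x4 sets).

```c
#include <stdio.h>
#include <string.h>

/* Names indexed by bit position; assumed to match the AC_ERR_* flags. */
static const char *const err_names[] = {
	"DEV", "HSM", "TIMEOUT", "MEDIA", "ATA_BUS",
	"HOST_BUS", "SYSTEM", "INVALID", "OTHER",
};

/* Hypothetical helper: write the names of the set bits into buf. */
static const char *describe_err_mask(unsigned int mask, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(err_names) / sizeof(err_names[0]); i++) {
		int n;

		if (!(mask & (1u << i)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", err_names[i]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* buffer too small; stop appending */
		used += (size_t)n;
	}
	return buf;
}

/* Hypothetical helper: name the two ATA identify opcodes. */
static const char *cmd_name(unsigned char cmd)
{
	switch (cmd) {
	case 0xec:
		return "IDENTIFY DEVICE";
	case 0xa1:
		return "IDENTIFY PACKET DEVICE";
	default:
		return "unknown";
	}
}
```

Calling describe_err_mask(0x4, buf, sizeof(buf)) yields "TIMEOUT" and
cmd_name(0xec) yields "IDENTIFY DEVICE", which matches the reading above: the
IDENTIFY command was issued and never completed.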

Maybe try something like this:

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 1d4a6f1e88cd..16e105bcb899 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -79,7 +79,7 @@ enum {
  * take an exceptionally long time to recover from reset.
  */
 static const unsigned long ata_eh_reset_timeouts[] = {
-       10000,  /* most drives spin up by 10sec */
+       30000,  /* most drives spin up by 10sec */
        10000,  /* > 99% working drives spin up before 20sec */
        35000,  /* give > 30 secs of idleness for outlier devices */
         5000,  /* and sweet one last chance */
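
To make the effect of the one-line change concrete, here is a rough user-space
sketch of how a per-attempt timeout table like the one in the diff could be
walked. It is an illustration under assumptions, not the libata code: the
helper names are hypothetical, and the ULONG_MAX terminator is assumed because
the hunk above stops before the end of the array.

```c
#include <limits.h>
#include <stddef.h>

/* Per-attempt timeouts in milliseconds, with the patched first entry. */
static const unsigned long reset_timeouts_ms[] = {
	30000,		/* first attempt, raised from 10000 by the diff */
	10000,
	35000,
	 5000,
	ULONG_MAX,	/* assumed sentinel: no attempts left */
};

/*
 * Hypothetical helper: timeout for the given 0-based attempt, or
 * ULONG_MAX once the table is exhausted.
 */
static unsigned long timeout_for_attempt(const unsigned long *table,
					 size_t attempt)
{
	size_t i;

	for (i = 0; table[i] != ULONG_MAX; i++)
		if (i == attempt)
			return table[i];
	return ULONG_MAX;
}

/* Hypothetical helper: total time allowed across all attempts. */
static unsigned long total_budget_ms(const unsigned long *table)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; table[i] != ULONG_MAX; i++)
		sum += table[i];
	return sum;
}
```

With the patched values, the first attempt is allowed 30000 ms instead of
10000 ms, and the sum over all four entries grows from 60000 ms to 80000 ms.
If the drives still fail with the longer first deadline, slowness is an
unlikely explanation.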

Also note that I posted a patch a couple of days ago fixing a qc timeout for
read log commands during device probe. This is not what you are hitting here
though. I have not yet sent this to Linus.

https://lore.kernel.org/linux-ide/20211105073106.422623-1-damien.lemoal@opensource.wdc.com/



> 
> The error message is also not very detailed, and we aren't really sure
> what in the PCI sub-system might be causing or leading to this.
> 
>>>>
>>>> I was able to revert the new pci-v5.16 updates [2]. After recompiling,
>>>> the kernel recognizes all ATA disks correctly.
>>>>
>>>> Could you please check the pci-v5.16 updates [2]?
>>>>
>>>> Please find attached the kernel config.
>>>>
>>>> Thanks,
>>>> Christian
>>>>
>>>> [1] https://en.wikipedia.org/wiki/AmigaOne_X1000
>>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c5c62ddf88c34bc83b66e4ac9beb2bb0e1887d4
>>
>> Sorry for the breakage, and thank you very much for the report.  Can
>> you please collect the complete dmesg logs before and after the
>> pci-v5.16 changes and the "sudo lspci -vv" output from before the
>> changes?
>>
>> You can attach them at https://bugzilla.kernel.org if you don't have
>> a better place to put them.
>>
>> You could attach the kernel config there, too, since it didn't make it
>> to the mailing list (vger may discard them -- see
>> http://vger.kernel.org/majordomo-info.html).
> 
> Bjorn and I looked at which of the commits in our recent pull request
> might be causing this, but we are a little bit at a loss, and were
> hoping that you could give us a hand in troubleshooting this.
> 
> Thank you in advance!
> 
> 	Krzysztof
> 
> 


-- 
Damien Le Moal
Western Digital Research

