Since 6.10 - kernel oops/panics on G4 macmini due to change in drivers/ata/pata_macio.c

Michael Ellerman mpe at ellerman.id.au
Tue Aug 13 22:32:36 AEST 2024


Niklas Cassel <cassel at kernel.org> writes:
> Hello Jonáš, Kolbjørn,
>
> thank you for the report.
>
> On Tue, Aug 13, 2024 at 07:49:34AM +0200, Jonáš Vidra wrote:
>> On Tue 13. Aug 2024 0:32:37 CEST, Kolbjørn Barmen wrote:
>> > Ever since 6.10, my macmini G4 behaved unstable when dealing with lots of
>> > I/O activity, such as sync'ing of Gentoo portage tree, unpacking kernel
>> > source tarball, building large software packages (or kernel) etc.
>> > 
>> > After a bit of testing, and patient kernel rebuilding (while crashing) I
>> > found the cuplit to be this commit/change
>> > 
>> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/diff/?id=09fe2bfa6b83f865126ce3964744863f69a4a030
>> 
>> I've been able to reproduce this pata_macio bug on a desktop PowerMac G4
>> with the 6.10.3 kernel version. Reverting the linked change
>> ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") makes
>> the errors go away.
>
> Michael, as the author of the this commit, could you please look into
> this issue?

I can. My commit was really just working around the warning in the SCSI
core which appeared after afd53a3d8528, it was supposed to just fix the
warning without changing behaviour. Though obviously it did for 4KB
PAGE_SIZE kernels.

I don't have easy access to my mac-mini so it would be helpful if you
can test changes Jonáš and/or Kolbjørn.

> We could revert your patch, which appears to work for some users,
> but that would again break setups with PAGE_SIZE == 64K.
> (I assume that Jonáš and Kolbjørn are not building with PAGE_SIZE == 64K.)

Yes they are using 4K, it says so in the oops.

>> ------------[ cut here ]------------
>> kernel BUG at drivers/ata/pata_macio.c:544!
>
> https://github.com/torvalds/linux/blob/v6.11-rc3/drivers/ata/pata_macio.c#L544
>
> It seems that the
> while (sg_len) loop does not play nice with the new .max_segment_size.

Right, but only for 4KB kernels for some reason. Is there some limit
elsewhere that prevents the bug tripping on 64KB kernels, or is it just
luck that no one has hit it?

I wonder if the best solution is something like below. It effectively
reverts to the old behaviour for 4KB page size, and should avoid the
same bug happening on 64KB page size kernels.

cheers


diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
index 1b85e8bf4ef9..eaffa510de49 100644
--- a/drivers/ata/pata_macio.c
+++ b/drivers/ata/pata_macio.c
@@ -208,6 +208,19 @@ static const char* macio_ata_names[] = {
 /* Don't let a DMA segment go all the way to 64K */
 #define MAX_DBDMA_SEG		0xff00
 
+#ifdef CONFIG_PAGE_SIZE_64KB
+/*
+ * The SCSI core requires the segment size to cover at least a page, so
+ * for 64K page size kernels it must be at least 64K. However the
+ * hardware can't handle 64K, so pata_macio_qc_prep() will split large
+ * requests. To handle the split requests the tablesize must be halved.
+ */
+#define MAX_SEGMENT_SIZE SZ_64K
+#define SG_TABLESIZE (MAX_DCMDS / 2)
+#else
+#define MAX_SEGMENT_SIZE MAX_DBDMA_SEG
+#define SG_TABLESIZE MAX_DCMDS
+#endif
 
 /*
  * Wait 1s for disk to answer on IDE bus after a hard reset
@@ -912,16 +925,10 @@ static int pata_macio_do_resume(struct pata_macio_priv *priv)
 
 static const struct scsi_host_template pata_macio_sht = {
 	__ATA_BASE_SHT(DRV_NAME),
-	.sg_tablesize		= MAX_DCMDS,
+	.sg_tablesize		= SG_TABLESIZE,
 	/* We may not need that strict one */
 	.dma_boundary		= ATA_DMA_BOUNDARY,
-	/*
-	 * The SCSI core requires the segment size to cover at least a page, so
-	 * for 64K page size kernels this must be at least 64K. However the
-	 * hardware can't handle 64K, so pata_macio_qc_prep() will split large
-	 * requests.
-	 */
-	.max_segment_size	= SZ_64K,
+	.max_segment_size	= MAX_SEGMENT_SIZE,
 	.device_configure	= pata_macio_device_configure,
 	.sdev_groups		= ata_common_sdev_groups,
 	.can_queue		= ATA_DEF_QUEUE,



More information about the Linuxppc-dev mailing list