[Skiboot] [PATCH] witherspoon: Squash spurious I2C errors

Oliver O'Halloran oohall at gmail.com
Wed Nov 27 15:47:58 AEDT 2019

On witherspoon there's an I2C bus connecting the each chip to the GPUs
connected to that chip via NVLink. That bus has a 750kOhm pullup on the
system planar with prevents the bus from operating correctly.

Each GPU has a smaller pullup which makes the bus usable when a GPU is
plugged in, but on systems without GPUs we get a lot of spurious I2C
master errors. Specificly, because of the oversized pullup the SDA and
SCL for that bus cannot return to '1' fast enough, so the master assumes
that another master is driving the bus and that it should stop.

Work around this by disabling the affected port when there's no GPUs
detected in the system.

Cc: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
Tested on a system with a GPU on each socket and on a system with none.
Couldn't find a mixed system.
 platforms/astbmc/witherspoon.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/platforms/astbmc/witherspoon.c b/platforms/astbmc/witherspoon.c
index edf84fb89601..081996684603 100644
--- a/platforms/astbmc/witherspoon.c
+++ b/platforms/astbmc/witherspoon.c
@@ -465,6 +465,7 @@ static void npu2_phb_nvlink_dt(struct phb *npuphb)
 static void witherspoon_finalise_dt(bool is_reboot)
 	struct dt_node *np;
+	struct proc_chip *c;
 	if (is_reboot)
@@ -479,6 +480,30 @@ static void witherspoon_finalise_dt(bool is_reboot)
+	/*
+	 * The I2C bus on used to talk to the GPUs has a 750K pullup
+	 * which is way too big. If there's no GPUs connected to the
+	 * chip all I2C transactions fail with an Arb loss error since
+	 * SCL/SDA don't return to the idle state fast enough. Disable
+	 * the port to squash the errors.
+	 */
+	for (c = next_chip(0); c; c = next_chip(c)) {
+		bool detected = false;
+		int i;
+		np = dt_find_by_path(c->devnode, "i2cm at a1000/i2c-bus at 4");
+		if (!np)
+			continue;
+		for (i = 0; i < 3; i++)
+			detected |= occ_get_gpu_presence(c, i);
+		if (!detected) {
+			dt_check_del_prop(np, "status");
+			dt_add_property_string(np, "status", "disabled");
+		}
+	}
 /* The only difference between these is the PCI slot handling */

More information about the Skiboot mailing list