[Skiboot] [EXTERNAL] [PATCH] witherspoon: Squash spurious I2C errors

Andrew Donnellan ajd at linux.ibm.com
Wed Nov 27 16:04:33 AEDT 2019


On 27/11/19 3:47 pm, Oliver O'Halloran wrote:
> On witherspoon there's an I2C bus connecting the each chip to the GPUs
> connected to that chip via NVLink. That bus has a 750kOhm pullup on the
> system planar with prevents the bus from operating correctly.
> 
> Each GPU has a smaller pullup which makes the bus usable when a GPU is
> plugged in, but on systems without GPUs we get a lot of spurious I2C
> master errors. Specificly, because of the oversized pullup the SDA and
> SCL for that bus cannot return to '1' fast enough, so the master assumes
> that another master is driving the bus and that it should stop.
> 
> Work around this by disabling the affected port when there's no GPUs
> detected in the system.
> 
> Cc: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>

Looks reasonable.

Reviewed-by: Andrew Donnellan <ajd at linux.ibm.com>

> ---
> Tested on a system with a GPU on each socket and on a system with none.
> Couldn't find a mixed system.
> ---
>   platforms/astbmc/witherspoon.c | 25 +++++++++++++++++++++++++
>   1 file changed, 25 insertions(+)
> 
> diff --git a/platforms/astbmc/witherspoon.c b/platforms/astbmc/witherspoon.c
> index edf84fb89601..081996684603 100644
> --- a/platforms/astbmc/witherspoon.c
> +++ b/platforms/astbmc/witherspoon.c
> @@ -465,6 +465,7 @@ static void npu2_phb_nvlink_dt(struct phb *npuphb)
>   static void witherspoon_finalise_dt(bool is_reboot)
>   {
>   	struct dt_node *np;
> +	struct proc_chip *c;
>   
>   	if (is_reboot)
>   		return;
> @@ -479,6 +480,30 @@ static void witherspoon_finalise_dt(bool is_reboot)
>   			continue;
>   		npu2_phb_nvlink_dt(npphb);
>   	}
> +
> +	/*
> +	 * The I2C bus on used to talk to the GPUs has a 750K pullup
> +	 * which is way too big. If there's no GPUs connected to the
> +	 * chip all I2C transactions fail with an Arb loss error since
> +	 * SCL/SDA don't return to the idle state fast enough. Disable
> +	 * the port to squash the errors.
> +	 */
> +	for (c = next_chip(0); c; c = next_chip(c)) {
> +		bool detected = false;
> +		int i;
> +
> +		np = dt_find_by_path(c->devnode, "i2cm at a1000/i2c-bus at 4");
> +		if (!np)
> +			continue;
> +
> +		for (i = 0; i < 3; i++)
> +			detected |= occ_get_gpu_presence(c, i);
> +
> +		if (!detected) {
> +			dt_check_del_prop(np, "status");
> +			dt_add_property_string(np, "status", "disabled");
> +		}
> +	}
>   }
>   
>   /* The only difference between these is the PCI slot handling */
> 

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd at linux.ibm.com             IBM Australia Limited



More information about the Skiboot mailing list