[Skiboot] [EXTERNAL] [PATCH] witherspoon: Squash spurious I2C errors
Andrew Donnellan
ajd at linux.ibm.com
Wed Nov 27 16:04:33 AEDT 2019
On 27/11/19 3:47 pm, Oliver O'Halloran wrote:
> On witherspoon there's an I2C bus connecting the each chip to the GPUs
> connected to that chip via NVLink. That bus has a 750kOhm pullup on the
> system planar with prevents the bus from operating correctly.
>
> Each GPU has a smaller pullup which makes the bus usable when a GPU is
> plugged in, but on systems without GPUs we get a lot of spurious I2C
> master errors. Specificly, because of the oversized pullup the SDA and
> SCL for that bus cannot return to '1' fast enough, so the master assumes
> that another master is driving the bus and that it should stop.
>
> Work around this by disabling the affected port when there's no GPUs
> detected in the system.
>
> Cc: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
Looks reasonable.
Reviewed-by: Andrew Donnellan <ajd at linux.ibm.com>
> ---
> Tested on a system with a GPU on each socket and on a system with none.
> Couldn't find a mixed system.
> ---
> platforms/astbmc/witherspoon.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/platforms/astbmc/witherspoon.c b/platforms/astbmc/witherspoon.c
> index edf84fb89601..081996684603 100644
> --- a/platforms/astbmc/witherspoon.c
> +++ b/platforms/astbmc/witherspoon.c
> @@ -465,6 +465,7 @@ static void npu2_phb_nvlink_dt(struct phb *npuphb)
> static void witherspoon_finalise_dt(bool is_reboot)
> {
> struct dt_node *np;
> + struct proc_chip *c;
>
> if (is_reboot)
> return;
> @@ -479,6 +480,30 @@ static void witherspoon_finalise_dt(bool is_reboot)
> continue;
> npu2_phb_nvlink_dt(npphb);
> }
> +
> + /*
> + * The I2C bus on used to talk to the GPUs has a 750K pullup
> + * which is way too big. If there's no GPUs connected to the
> + * chip all I2C transactions fail with an Arb loss error since
> + * SCL/SDA don't return to the idle state fast enough. Disable
> + * the port to squash the errors.
> + */
> + for (c = next_chip(0); c; c = next_chip(c)) {
> + bool detected = false;
> + int i;
> +
> + np = dt_find_by_path(c->devnode, "i2cm at a1000/i2c-bus at 4");
> + if (!np)
> + continue;
> +
> + for (i = 0; i < 3; i++)
> + detected |= occ_get_gpu_presence(c, i);
> +
> + if (!detected) {
> + dt_check_del_prop(np, "status");
> + dt_add_property_string(np, "status", "disabled");
> + }
> + }
> }
>
> /* The only difference between these is the PCI slot handling */
>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd at linux.ibm.com IBM Australia Limited
More information about the Skiboot
mailing list