[PATCH 3/3] hwmon: (occ) Provide the SBEFIFO FFDC in binary sysfs

Jeremy Kerr jk at codeconstruct.com.au
Thu Sep 16 10:17:41 AEST 2021


Hi Eddie,

> Save any FFDC provided by the OCC driver, and provide it to userspace
> through a binary sysfs entry. Do some basic state management to
> ensure that userspace can always collect the data if there was an
> error. Notify polling userspace when there is an error too.

Super! Some comments inline:

> +enum sbe_error_state {
> +       SBE_ERROR_NONE = 0,
> +       SBE_ERROR_PENDING,
> +       SBE_ERROR_COLLECTED
> +};
> +
>  struct p9_sbe_occ {
>         struct occ occ;
> +       int sbe_error;

Use the enum here?

> +       void *ffdc;
> +       size_t ffdc_len;
> +       size_t ffdc_size;
> +       struct mutex sbe_error_lock;    /* lock access to ffdc data */
> +       u32 no_ffdc_magic;
>         struct device *sbe;
>  };
>  
>  #define to_p9_sbe_occ(x)       container_of((x), struct p9_sbe_occ, occ)
>  
> +static ssize_t sbe_error_read(struct file *filp, struct kobject *kobj,
> +                             struct bin_attribute *battr, char *buf,
> +                             loff_t pos, size_t count)
> +{
> +       ssize_t rc = 0;
> +       struct occ *occ = dev_get_drvdata(kobj_to_dev(kobj));
> +       struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
> +
> +       mutex_lock(&ctx->sbe_error_lock);
> +       if (ctx->sbe_error == SBE_ERROR_PENDING) {
> +               rc = memory_read_from_buffer(buf, count, &pos, ctx->ffdc,
> +                                            ctx->ffdc_len);
> +               ctx->sbe_error = SBE_ERROR_COLLECTED;
> +       }
> +       mutex_unlock(&ctx->sbe_error_lock);
> +
> +       return rc;
> +}

So any read from this file will clear out the FFDC data, making partial
reads impossible. As a least-intrusive change, could we set
SBE_ERROR_COLLECTED on write instead?

Or is there a better interface (a pipe?) that allows multiple FFDC
captures, destroyed on full consume, without odd read/write side
effects?

>         rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> -       if (rc < 0)
> +       if (rc < 0) {
> +               if (resp_len) {
> +                       bool notify = false;
> +
> +                       mutex_lock(&ctx->sbe_error_lock);
> +                       if (ctx->sbe_error != SBE_ERROR_PENDING)
> +                               notify = true;
> +                       ctx->sbe_error = SBE_ERROR_PENDING;

                          [...]

> +                       ctx->ffdc_len = resp_len;
> +                       memcpy(ctx->ffdc, resp, resp_len);

This will clear out the previous error it if hasn't been collected by
userspace. Is that really what you want for *first* fail data capture?
:)

Cheers,


Jeremy



More information about the linux-fsi mailing list