[PATCH 3/3] hwmon: (occ) Provide the SBEFIFO FFDC in binary sysfs

Sat Sep 18 04:44:01 AEST 2021

On Thu, 2021-09-16 at 08:17 +0800, Jeremy Kerr wrote:
> Hi Eddie,
> 
> > Save any FFDC provided by the OCC driver, and provide it to
> > userspace
> > through a binary sysfs entry. Do some basic state management to
> > ensure that userspace can always collect the data if there was an
> > error. Notify polling userspace when there is an error too.
> 
> Super! Some comments inline:
> 
> > +enum sbe_error_state {
> > +       SBE_ERROR_NONE = 0,
> > +       SBE_ERROR_PENDING,
> > +       SBE_ERROR_COLLECTED
> > +};
> > +
> >  struct p9_sbe_occ {
> >         struct occ occ;
> > +       int sbe_error;
> 
> Use the enum here?

Yep :) Though I think I will switch to a bool; as I wrote to Guenter,
the extra "collected" state isn't necessary if I stop trying to report
two things (the FFDC itself and whether or not there is an error at
all) with one interface.

> 
> > +       void *ffdc;
> > +       size_t ffdc_len;
> > +       size_t ffdc_size;
> > +       struct mutex sbe_error_lock;    /* lock access to ffdc data
> > */
> > +       u32 no_ffdc_magic;
> >         struct device *sbe;
> >  };
> >  
> >  #define to_p9_sbe_occ(x)       container_of((x), struct
> > p9_sbe_occ, occ)
> >  
> > +static ssize_t sbe_error_read(struct file *filp, struct kobject
> > *kobj,
> > +                             struct bin_attribute *battr, char
> > *buf,
> > +                             loff_t pos, size_t count)
> > +{
> > +       ssize_t rc = 0;
> > +       struct occ *occ = dev_get_drvdata(kobj_to_dev(kobj));
> > +       struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
> > +
> > +       mutex_lock(&ctx->sbe_error_lock);
> > +       if (ctx->sbe_error == SBE_ERROR_PENDING) {
> > +               rc = memory_read_from_buffer(buf, count, &pos, ctx-
> > >ffdc,
> > +                                            ctx->ffdc_len);
> > +               ctx->sbe_error = SBE_ERROR_COLLECTED;
> > +       }
> > +       mutex_unlock(&ctx->sbe_error_lock);
> > +
> > +       return rc;
> > +}
> 
> So any read from this file will clear out the FFDC data, making
> partial
> reads impossible. As a least-intrusive change, could we set
> SBE_ERROR_COLLECTED on write instead?

Good point. Write would work. How about resetting the error flag once
pos >= size though, for the sake of simplicity?

> 
> Or is there a better interface (a pipe?) that allows multiple FFDC
> captures, destroyed on full consume, without odd read/write side
> effects?

I tried to look into pipes, but I'm pretty unclear on how to set one
up. Doesn't appear as though they are used in kernel much, especially
as an interface to userspace.
I'm just not sure its worth having more than one FFDC packet available.
There would always have to be a maximum number of them anyway to put a
bound on memory usage. We're somewhat helped here by the fact that OCC
operations will go out a maximum of once a second, so that's hopefully
plenty of time for userspace to notice the error and pick up the FFDC.

> 
> >         rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> > -       if (rc < 0)
> > +       if (rc < 0) {
> > +               if (resp_len) {
> > +                       bool notify = false;
> > +
> > +                       mutex_lock(&ctx->sbe_error_lock);
> > +                       if (ctx->sbe_error != SBE_ERROR_PENDING)
> > +                               notify = true;
> > +                       ctx->sbe_error = SBE_ERROR_PENDING;
> 
>                           [...]
> 
> > +                       ctx->ffdc_len = resp_len;
> > +                       memcpy(ctx->ffdc, resp, resp_len);
> 
> This will clear out the previous error it if hasn't been collected by
> userspace. Is that really what you want for *first* fail data
> capture?
> :)

Ah, good point. I will fix that!

Thanks for the review Jeremy!

Eddie

> 
> Cheers,
> 
> 
> Jeremy
>