[PATCH 3/3] hwmon: (occ) Provide the SBEFIFO FFDC in binary sysfs
Jeremy Kerr
jk at codeconstruct.com.au
Thu Sep 16 10:17:41 AEST 2021
Hi Eddie,
> Save any FFDC provided by the OCC driver, and provide it to userspace
> through a binary sysfs entry. Do some basic state management to
> ensure that userspace can always collect the data if there was an
> error. Notify polling userspace when there is an error too.
Super! Some comments inline:
> +enum sbe_error_state {
> +	SBE_ERROR_NONE = 0,
> +	SBE_ERROR_PENDING,
> +	SBE_ERROR_COLLECTED
> +};
> +
> struct p9_sbe_occ {
> 	struct occ occ;
> +	int sbe_error;
Use the enum here?
> +	void *ffdc;
> +	size_t ffdc_len;
> +	size_t ffdc_size;
> +	struct mutex sbe_error_lock; /* lock access to ffdc data */
> +	u32 no_ffdc_magic;
> 	struct device *sbe;
> };
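For illustration, a minimal userspace sketch of what the enum buys us: typing the field as `enum sbe_error_state` rather than `int` documents the valid states, and lets the compiler (via `-Wswitch`, which `-Wall` enables) flag a `switch` that misses a state. The cut-down `struct sbe_error_ctx` and the `sbe_error_name()` helper are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Same states as the patch's enum sbe_error_state. */
enum sbe_error_state {
	SBE_ERROR_NONE = 0,
	SBE_ERROR_PENDING,
	SBE_ERROR_COLLECTED
};

/* Hypothetical cut-down struct p9_sbe_occ: only the state field,
 * now typed with the enum instead of a plain int. */
struct sbe_error_ctx {
	enum sbe_error_state sbe_error;
};

/* Hypothetical helper: with an enum-typed argument and no default case,
 * -Wswitch warns if a new state is added but not handled here. */
static const char *sbe_error_name(enum sbe_error_state s)
{
	switch (s) {
	case SBE_ERROR_NONE:
		return "none";
	case SBE_ERROR_PENDING:
		return "pending";
	case SBE_ERROR_COLLECTED:
		return "collected";
	}
	return "invalid";	/* out-of-range value stored in the field */
}
```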
>
> #define to_p9_sbe_occ(x) container_of((x), struct p9_sbe_occ, occ)
>
> +static ssize_t sbe_error_read(struct file *filp, struct kobject *kobj,
> +			      struct bin_attribute *battr, char *buf,
> +			      loff_t pos, size_t count)
> +{
> +	ssize_t rc = 0;
> +	struct occ *occ = dev_get_drvdata(kobj_to_dev(kobj));
> +	struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
> +
> +	mutex_lock(&ctx->sbe_error_lock);
> +	if (ctx->sbe_error == SBE_ERROR_PENDING) {
> +		rc = memory_read_from_buffer(buf, count, &pos, ctx->ffdc,
> +					     ctx->ffdc_len);
> +		ctx->sbe_error = SBE_ERROR_COLLECTED;
> +	}
> +	mutex_unlock(&ctx->sbe_error_lock);
> +
> +	return rc;
> +}
So the first read from this file marks the FFDC data as collected, and
every later read (including one continuing a partial read) returns
nothing, making partial reads impossible. As the least-intrusive change,
could we set SBE_ERROR_COLLECTED on write instead?
Or is there a better interface (a pipe?) that allows multiple FFDC
captures, each destroyed once fully consumed, without odd read/write
side effects?
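A rough userspace model of the write-to-acknowledge idea, assuming a fixed-size buffer and no locking (the driver would still take `sbe_error_lock`). `read_from_buffer()` is a hypothetical stand-in that mirrors the kernel's `memory_read_from_buffer()`; `struct ffdc_model`, `model_read()` and `model_write()` are also hypothetical. Reads have no side effects, so userspace can pull the FFDC in several chunks at increasing offsets, and only a write marks it collected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Same states as the patch's enum sbe_error_state. */
enum sbe_error_state {
	SBE_ERROR_NONE = 0,
	SBE_ERROR_PENDING,
	SBE_ERROR_COLLECTED
};

/* Hypothetical model of the FFDC state kept in struct p9_sbe_occ. */
struct ffdc_model {
	enum sbe_error_state state;
	char ffdc[64];
	size_t ffdc_len;
};

/* Hypothetical stand-in mirroring memory_read_from_buffer(): copy up to
 * count bytes starting at *pos, then advance *pos. */
static ssize_t read_from_buffer(char *to, size_t count, long long *pos,
				const char *from, size_t available)
{
	long long p = *pos;

	if (p < 0)
		return -EINVAL;
	if ((size_t)p >= available)
		return 0;
	if (count > available - (size_t)p)
		count = available - (size_t)p;
	memcpy(to, from + p, count);
	*pos = p + (long long)count;
	return (ssize_t)count;
}

/* Proposed read(): returns FFDC while it is pending but leaves the
 * state alone, so partial reads at increasing offsets work. */
static ssize_t model_read(struct ffdc_model *m, char *buf, long long pos,
			  size_t count)
{
	if (m->state != SBE_ERROR_PENDING)
		return 0;
	return read_from_buffer(buf, count, &pos, m->ffdc, m->ffdc_len);
}

/* Proposed write(): any write acknowledges the pending FFDC. */
static ssize_t model_write(struct ffdc_model *m, size_t count)
{
	if (m->state == SBE_ERROR_PENDING)
		m->state = SBE_ERROR_COLLECTED;
	return (ssize_t)count;
}
```

Userspace would then read the attribute to EOF in as many chunks as it likes, and write to it once it has saved the data.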
> 	rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
> -	if (rc < 0)
> +	if (rc < 0) {
> +		if (resp_len) {
> +			bool notify = false;
> +
> +			mutex_lock(&ctx->sbe_error_lock);
> +			if (ctx->sbe_error != SBE_ERROR_PENDING)
> +				notify = true;
> +			ctx->sbe_error = SBE_ERROR_PENDING;
[...]
> +			ctx->ffdc_len = resp_len;
> +			memcpy(ctx->ffdc, resp, resp_len);
This will overwrite the previous error if it hasn't yet been collected
by userspace. Is that really what you want for *first* failure data
capture? :)
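One way to keep the *first* failure, sketched as a userspace model under the same assumptions as before (a fixed-size buffer that truncates oversized responses, no locking; the patch's real buffer handling is elided above): only store new FFDC when nothing is pending, and report whether pollers should be notified. `struct ffdc_model` and `model_record_ffdc()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same states as the patch's enum sbe_error_state. */
enum sbe_error_state {
	SBE_ERROR_NONE = 0,
	SBE_ERROR_PENDING,
	SBE_ERROR_COLLECTED
};

/* Hypothetical model of the FFDC state kept in struct p9_sbe_occ. */
struct ffdc_model {
	enum sbe_error_state state;
	char ffdc[64];
	size_t ffdc_len;
};

/* Hypothetical: record FFDC from a failed command, keeping the first
 * uncollected failure. Returns true if userspace should be notified. */
static bool model_record_ffdc(struct ffdc_model *m, const char *resp,
			      size_t resp_len)
{
	if (!resp_len)
		return false;		/* nothing to capture */
	if (m->state == SBE_ERROR_PENDING)
		return false;		/* don't overwrite the first failure */
	if (resp_len > sizeof(m->ffdc))
		resp_len = sizeof(m->ffdc);	/* model only: truncate */
	memcpy(m->ffdc, resp, resp_len);
	m->ffdc_len = resp_len;
	m->state = SBE_ERROR_PENDING;
	return true;			/* new pending error: notify pollers */
}
```

The trade-off is that later failures are dropped until userspace collects the pending one; the pipe-like queue of captures mentioned above would keep them instead.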
Cheers,
Jeremy
More information about the linux-fsi mailing list