[PATCH] powerpc/powernv: Read opal error log and export it through sysfs interface.
Mahesh Jagannath Salgaonkar
mahesh at linux.vnet.ibm.com
Tue Feb 25 16:02:11 EST 2014
On 02/21/2014 05:41 AM, Stewart Smith wrote:
> Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
> > This patch adds support to read error logs from OPAL and export them
> > to userspace through sysfs interface /sys/firmware/opa/opal-elog.
Hi Stewart,
Thanks for the review. This code definitely needs improvement.
>
> I think we could provide a better interface with instead having a file
> per log message appear in sysfs. We're never going to have more than 128
> of these at any one time on the Linux side, so it's not going to bee too
> many files.
It is not just about 128 files, we may be adding/removing sysfs node for
every new log id that gets informed to kernel and ack-ed. In worst case,
when we have flood of elog errors with user daemon consuming it and
ack-ing back to get ready for next log in a tight poll, we may
continuously add/remove the sysfs node for each new <id>.
>
> e.g. /sys/firmware/opal/elog/<id>
>
> that way, any new file in /sys/firmware/opal/elog/ means there's a new
> log entry available. I believe there's
>
> To ack a log, you could just echo 'ack' to the file.
>
> The other option woudl be to more closely follow what sysfs is meant to
> be - ascii text. This would mean having more (any) of the parser in
> kernel for the error logs - which may/may not be a bad idea.
>
> However, it would make the end user code for consuming them much much
> simpler, and that may be a good thing.
>
> Having some way of getting some information out without a userspace
> parser is probably good though, I'm pretty sure having only a binary
> interface in /sys is at least partially frowned upon.
>
> > This is what user space tool would do:
> > - Read error log from /sys/firmware/opa/opal-elog.
> > - Save it to the disk.
> > - Send an acknowledgement on successful consumption by writing error log
> > id to /sys/firmware/opa/opal-elog-ack.
>
> A userspace tool may want to explicitly *not* ack the log too, or only
> ack some entries, so the interface sohuld be sane for this use case too.
>
> e.g. we could display them in petitboot.
>
> > diff --git a/arch/powerpc/platforms/powernv/opal-elog.c
> > b/arch/powerpc/platforms/powernv/opal-elog.c
> [ 2 more citation lines. Click/Enter to show. ]
> > new file mode 100644
> > index 0000000..fc891ae
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/powernv/opal-elog.c
> > @@ -0,0 +1,309 @@
> <snip>
> > +/* Maximum size of a single log on FSP is 16KB */
> > +#define OPAL_MAX_ERRLOG_SIZE 16384
>
> I've seen some conflicting things on this - is it 2kb or 16kb?
We choose 16kb because we want to pull all the log data and not partial.
>
> > +
> > +struct opal_err_log {
> > + struct list_head link;
> > + uint64_t opal_log_id;
>
> why is this uint64_t and not uint32_t? It appears that the log id is 32bits.
Agree, This needs to be changed to uint_32.
>
> > + size_t opal_log_size;
> > + uint8_t data[OPAL_MAX_ERRLOG_SIZE];
> > +};
> > +
> > +/* Pre-allocated temp buffer to pull error log from opal. */
> > +static uint8_t err_log_data[OPAL_MAX_ERRLOG_SIZE];
>
> Why do we need temporary space? Why not just store directly into struct
> opal_err_log?
>
> > +/* Protect err_log_data buf */
> > +static DEFINE_MUTEX(err_log_data_mutex);
> [ 15 more citation lines. Click/Enter to show. ]
> > +
> > +static uint64_t total_log_size;
> > +static bool opal_log_available;
> > +static LIST_HEAD(elog_list);
> > +static LIST_HEAD(elog_ack_list);
> > +
> > +/* lock to protect elog_list and elog-ack_list. */
> > +static DEFINE_SPINLOCK(opal_elog_lock);
> > +
> > +static DECLARE_WAIT_QUEUE_HEAD(opal_log_wait);
> > +
> > +/*
> > + * Interface for user to acknowledge the error log.
> > + *
> > + * Once user acknowledge the log, we delete that record entry from the
> > + * list and move it ack list.
> > + */
> > +void opal_elog_ack(uint64_t ack_id)
>
> s/ack_id/log_id/
Yup. makes sense.
>
> > +
> > +static ssize_t elog_ack_store(struct kobject *kobj,
> [ 7 more citation lines. Click/Enter to show. ]
> > + struct kobj_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + uint32_t log_ack_id;
> > + log_ack_id = *(uint32_t *) buf;
> > +
> > + /* send acknowledgment to FSP */
> > + opal_elog_ack(log_ack_id);
> > + return 0;
> > +}
>
> This function has a few problems:
>
> Consider the following actions:
> $ echo 1 > /sys/firmware/opal/opal-elog-ack
> $ echo 'abcde' > /sys/firmware/opal/opal-elog-ack
>
> The former will read undefined memory and the latter will make a kernel
> thread, rsyslogd and systemd-journal all each a CPU each.
>
> Basically, the problems are:
> 1) not endian safe
> 2) not following store API of returning nr bytes read
> 3) binary interface. Use sscanf to read numbers instead.
>
> > +/*
> > + * Show error log records to user.
> [ 9 more citation lines. Click/Enter to show. ]
> > + */
> > +static ssize_t opal_elog_show(struct file *filp, struct kobject *kobj,
> > + struct bin_attribute *bin_attr, char *buf,
> > + loff_t pos, size_t count)
> > +{
> > + unsigned long flags;
> > + struct opal_err_log *record, *next;
> > + size_t size = 0;
> > + size_t data_to_copy = 0;
> > + int error = 0;
> > +
> > + /* Display one log at a time. */
>
> use words, not @.
>
> > + if (count > OPAL_MAX_ERRLOG_SIZE)
> > + count = OPAL_MAX_ERRLOG_SIZE;
> [ 23 more citation lines. Click/Enter to show. ]
> > + spin_lock_irqsave(&opal_elog_lock, flags);
> > + /* Align the pos to point within total errlog size. */
> > + if (total_log_size && pos > total_log_size)
> > + pos = pos % total_log_size;
> > +
> > + /*
> > + * if pos goes beyond total_log_size then we know we don't have any
> > + * new record to show.
> > + */
> > + if (total_log_size == 0 || pos >= total_log_size) {
> > + opal_log_available = 0;
> > + if (filp->f_flags & O_NONBLOCK) {
> > + spin_unlock_irqrestore(&opal_elog_lock, flags);
> > + error = -EAGAIN;
> > + goto out;
> > + }
> > + spin_unlock_irqrestore(&opal_elog_lock, flags);
> > + pos = 0;
> > +
> > + /* Wait until we get log from sapphire */
> > + error = wait_event_interruptible(opal_log_wait,
> > + opal_log_available);
> > + if (error)
> > + goto out;
> > + spin_lock_irqsave(&opal_elog_lock, flags);
> > + }
>
> Why should we wait for there to be a log message? If there's not one
> then there's not one and that's fine.
>
> I also wonder if we really need total_log_size and opal_log_available,
> this information seems readily available from the list of events.
>
> Instead, for notification (as i understand it) we should be using
> sysfs_notify() from kernel and then in userspace we can just call
> select() to wait for something to happen.
>
> > +/*
> > + * Pre-allocate a buffer to hold handful of error logs until user space
> [ 5 more citation lines. Click/Enter to show. ]
> > + * consumes it.
> > + */
> > +static int init_err_log_buffer(void)
> > +{
> > + int i = 0;
> > + struct opal_err_log *buf_ptr;
> > +
> > + buf_ptr = vmalloc(sizeof(struct opal_err_log) * MAX_NUM_RECORD);
>
> This means we constantly use 128 * sizeof(struct opal_err_log) which
> equates to somewhere north of 2MB of memory (due to list overhead).
>
> I don't think we need to statically allocate this, we can probably just
> allocate on-demand as in a typical system you're probably quite
> unlikely to have too many of these sitting around (besides, if for
> whatever reason we cannot allocate memory at some point, that's okay
> because we can read it again later).
The reason we choose to go for static allocation is, we can not afford
to drop or delay a critical error log due to memory allocation failure.
OR we can keep static allocations for critical errors and follow dynamic
allocation for informative error logs. What do you say?
Thanks,
-Mahesh.
More information about the Linuxppc-dev
mailing list