[Skiboot] [PATCH v4 04/10] opal-msg: Enhance opal-get-msg API

Stewart Smith stewart at linux.ibm.com
Tue May 21 17:20:00 AEST 2019


Vasant Hegde <hegdevasant at linux.vnet.ibm.com> writes:

> Linux uses the opal_get_msg (OPAL_GET_MSG) API to get OPAL messages. This interface
> supports up to 8 params (64 bytes). We have a requirement to send bigger data to
> Linux. This patch enhances OPAL to send bigger data to Linux.
>
> - Linux will use "opal-msg-size" device tree property to allocate memory for
>   OPAL messages (previous patch increased "opal-msg-size" to 64K).
>
> - Replaced `reserved` field in "struct opal_msg" with `size`. So that Linux
>   side opal_get_msg user can detect actual data size.
>
> - If buffer size < actual message size, then opal_get_msg will copy partial
>   data and return OPAL_PARTIAL to Linux.

Looking through the Linux code, we should probably very carefully
document the expected behaviour for larger messages and the impact.

Specifically around what happens with OPAL_PARTIAL, and ensuring we
don't just print error messages from the kernel forever.

Specifically, we seem to have two places where we consume messages in
Linux:

in opal.c:
static void opal_handle_message(void)
{
....
        ret = opal_get_msg(__pa(&msg), sizeof(msg));
        /* No opal message pending. */
        if (ret == OPAL_RESOURCE)
                return;

        /* check for errors. */
        if (ret) {
                pr_warn("%s: Failed to retrieve opal message, err=%lld\n",
                        __func__, ret);
                return;
        }

So this gives me pause: if we have a large message needing to be
retrieved, existing kernels will never clear it, and will spew pr_warns
from here to eternity.

and in opal-hmi.c:

        if (unrecoverable) {
                /* Pull all HMI events from OPAL before we panic. */
                while (opal_get_msg(__pa(&msg), sizeof(msg)) == OPAL_SUCCESS) {
                      .....
                      print_hmi_event_info(hmi_evt);


So if we have a large event in the middle of a bunch of fatal HMI info,
we'll lose the HMI event info.

So I think we need to ensure two things:
1) we don't lose unrecoverable HMIs
2) existing kernels degrade gracefully.

We can probably solve 1 with something only casually ugly (remove every
event that isn't an unrecoverable HMI, or don't return OPAL_PARTIAL
while an unrecoverable HMI event is present).

For 2, I'm trying to think of what's good to do. My ideas go down two
paths:
- if a message has gotten OPAL_PARTIAL twice, drop it.
- drop the message after the first OPAL_PARTIAL if the opal_get_msg size
  parameter is sizeof(msg) rather than what we put in the device tree.

Either way, we should very clearly document this (and the reasoning
behind it) in the OPAL API docs.

I haven't looked at what FreeBSD does, and we should probably do that if
only to get into the habit of doing so.


> - Add a new variable "extended" to the "opal_msg_entry" structure to keep
>   track of messages that have more than 64 bytes of data. We will allocate
>   separate memory for these messages and release it once the kernel
>   consumes the message.
>
> Cc: Jeremy Kerr <jk at ozlabs.org>
> Cc: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> Cc: Oliver O'Halloran <oohall at gmail.com>
> Signed-off-by: Vasant Hegde <hegdevasant at linux.vnet.ibm.com>
> Acked-by: Jeremy Kerr <jk at ozlabs.org>
> ---
>  core/opal-msg.c                | 66 ++++++++++++++++++++++++++++++------------
>  core/test/run-msg.c            |  6 ++--
>  doc/opal-api/opal-messages.rst |  2 +-
>  include/opal-api.h             |  2 +-
>  4 files changed, 53 insertions(+), 23 deletions(-)
>
> diff --git a/core/opal-msg.c b/core/opal-msg.c
> index 907a9e0af..af1ec7d00 100644
> --- a/core/opal-msg.c
> +++ b/core/opal-msg.c
> @@ -25,6 +25,7 @@
>  struct opal_msg_entry {
>  	struct list_node link;
>  	void (*consumed)(void *data, int status);
> +	bool extended;
>  	void *data;
>  	struct opal_msg msg;
>  };
> @@ -39,37 +40,47 @@ int _opal_queue_msg(enum opal_msg_type msg_type, void *data,
>  		    size_t params_size, const void *params)
>  {
>  	struct opal_msg_entry *entry;
> +	uint64_t entry_size;
> +
> +	if ((params_size + OPAL_MSG_HDR_SIZE) > OPAL_MSG_SIZE) {
> +		prlog(PR_DEBUG, "param_size (0x%x) > opal_msg param size (0x%x)\n",
> +		      (u32)params_size, (u32)(OPAL_MSG_SIZE - OPAL_MSG_HDR_SIZE));
> +		return OPAL_PARAMETER;
> +	}
>  
>  	lock(&opal_msg_lock);
>  
> -	entry = list_pop(&msg_free_list, struct opal_msg_entry, link);
> -	if (!entry) {
> -		prerror("No available node in the free list, allocating\n");
> -		entry = zalloc(sizeof(struct opal_msg_entry));
> +	if (params_size > OPAL_MSG_FIXED_PARAMS_SIZE) {
> +		entry_size = sizeof(struct opal_msg_entry) + params_size;
> +		entry_size -= OPAL_MSG_FIXED_PARAMS_SIZE;
> +		entry = zalloc(entry_size);
> +		if (entry)
> +			entry->extended = true;
> +	} else {
> +		entry = list_pop(&msg_free_list, struct opal_msg_entry, link);
>  		if (!entry) {
> -			prerror("Allocation failed\n");
> -			unlock(&opal_msg_lock);
> -			return OPAL_RESOURCE;
> +			prerror("No available node in the free list, allocating\n");
> +			entry = zalloc(sizeof(struct opal_msg_entry));

I'm tempted to say we should switch to using the pool allocator for
these and hard failing when we run out. This could come in a separate
patch though.

We've (more than once) gotten everything fairly wrong and eaten up all
of the skiboot heap with messages that aren't being consumed, causing
everything to explode even more horribly than it was already exploding.


-- 
Stewart Smith
OPAL Architect, IBM.
