[Skiboot] [PATCH v4 04/10] opal-msg: Enhance opal-get-msg API

Vasant Hegde hegdevasant at linux.vnet.ibm.com
Tue May 28 14:56:35 AEST 2019


On 05/21/2019 12:50 PM, Stewart Smith wrote:
> Vasant Hegde <hegdevasant at linux.vnet.ibm.com> writes:
> 
>> Linux uses opal_get_msg (OPAL_GET_MSG) API to get OPAL messages. This interface
>> supports upto 8 params (64 bytes). We have a requirement to send bigger data to
>> Linux. This patch enhances OPAL to send bigger data to Linux.
>>
>> - Linux will use "opal-msg-size" device tree property to allocate memory for
>>    OPAL messages (previous patch increased "opal-msg-size" to 64K).
>>
>> - Replaced `reserved` field in "struct opal_msg" with `size`. So that Linux
>>    side opal_get_msg user can detect actual data size.
>>
>> - If buffer size < actual message size, then opal_get_msg will copy partial
>>    data and return OPAL_PARTIAL to Linux.
> 
> Looking through the Linux code, we should probably very carefully
> document the expected behaviour for larger messages and the impact.

Correct. It's already documented in OPAL:

OPAL_PARTIAL
    If pending opal message is greater than supplied buffer.
    In this case the message is *DISCARDED* by OPAL.

...and this patch retains the above behaviour, i.e. we remove the entry from the
list and return OPAL_PARTIAL to the kernel.

> 
> Specifically around what happens with OPAL_PARTIAL and ensuring we don't
> just print error messages out from the kernel forever.

The kernel will print the error message once and move on, as OPAL removes the
entry from the list (same as the existing behaviour).

> 
> Specifically, we seem to have two places where we consume messages in
> Linux:
> 
> in opal.c:
> static void opal_handle_message(void)
> {
> ....
>          ret = opal_get_msg(__pa(&msg), sizeof(msg));
>          /* No opal message pending. */
>          if (ret == OPAL_RESOURCE)
>                  return;
> 
>          /* check for errors. */
>          if (ret) {
>                  pr_warn("%s: Failed to retrieve opal message, err=%lld\n",
>                          __func__, ret);
>                  return;
>          }
> 
> So this gives me pause, as if we have a large message needing to be
> retrieved, existing kernels never will clear it, and spew pr_warns from
> here to eternity.
> 
> and in opal-hmi.c:
> 
>          if (unrecoverable) {
>                  /* Pull all HMI events from OPAL before we panic. */
>                  while (opal_get_msg(__pa(&msg), sizeof(msg)) == OPAL_SUCCESS) {
>                        .....
>                        print_hmi_event_info(hmi_evt);
> 
> 
> So if we have a large event in the middle of a bunch of fatal HMI info,
> we'll lose the HMI event info.
> 
> So I think we need to ensure two things:
> 1) we don't lose unrecoverable HMIs
> 2) existing kernels degrade gracefully.

I think both are taken care of.

> 
> We can probably solve 1 by something only casually ugly (remove every
> event that isn't unrecoverable hmi or don't return OPAL_PARTIAL when
> there's an unrecoverable HMI event present).

This is not an issue because the existing interface already handles HMI events
properly. And with an old kernel / new OPAL, for bigger messages we will return
OPAL_PARTIAL *and* remove the message from the list, so we will not have an
infinite loop. We are good.

> 
> For 2, I'm trying to think of what's good to do.. my ideas go down two
> paths:
> - if a message has gotten OPAL_PARTIAL twice, drop it.
> - Drop message after first OPAL_PARTIAL if opal_get_msg size parameter
>    is the sizeof(msg) rather than what we put in the device tree.

The only way for the kernel to know the maximum message size is through the
device tree, so it makes sense to drop the message if the kernel is not
passing a big enough buffer (i.e. the old kernel / new OPAL combination).

> 
> Either way, we should very clearly document this (and the reasoning
> behind it) in the OPAL API docs.
> 
> I haven't looked at what FreeBSD does, and we should probably do that if
> only to get into the habit of doing so.
> 
> 
>> - Add new variable "extended" to "opal_msg_entry" structure to keep track
>>    of messages that has more than 64byte data. We will allocate separate
>>    memory for these messages and once kernel consumes message we will
>>    release that memory.
>>
>> Cc: Jeremy Kerr <jk at ozlabs.org>
>> Cc: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>> Cc: Oliver O'Halloran <oohall at gmail.com>
>> Signed-off-by: Vasant Hegde <hegdevasant at linux.vnet.ibm.com>
>> Acked-by: Jeremy Kerr <jk at ozlabs.org>
>> ---
>>   core/opal-msg.c                | 66 ++++++++++++++++++++++++++++++------------
>>   core/test/run-msg.c            |  6 ++--
>>   doc/opal-api/opal-messages.rst |  2 +-
>>   include/opal-api.h             |  2 +-
>>   4 files changed, 53 insertions(+), 23 deletions(-)
>>
>> diff --git a/core/opal-msg.c b/core/opal-msg.c
>> index 907a9e0af..af1ec7d00 100644
>> --- a/core/opal-msg.c
>> +++ b/core/opal-msg.c
>> @@ -25,6 +25,7 @@
>>   struct opal_msg_entry {
>>   	struct list_node link;
>>   	void (*consumed)(void *data, int status);
>> +	bool extended;
>>   	void *data;
>>   	struct opal_msg msg;
>>   };
>> @@ -39,37 +40,47 @@ int _opal_queue_msg(enum opal_msg_type msg_type, void *data,
>>   		    size_t params_size, const void *params)
>>   {
>>   	struct opal_msg_entry *entry;
>> +	uint64_t entry_size;
>> +
>> +	if ((params_size + OPAL_MSG_HDR_SIZE) > OPAL_MSG_SIZE) {
>> +		prlog(PR_DEBUG, "param_size (0x%x) > opal_msg param size (0x%x)\n",
>> +		      (u32)params_size, (u32)(OPAL_MSG_SIZE - OPAL_MSG_HDR_SIZE));
>> +		return OPAL_PARAMETER;
>> +	}
>>   
>>   	lock(&opal_msg_lock);
>>   
>> -	entry = list_pop(&msg_free_list, struct opal_msg_entry, link);
>> -	if (!entry) {
>> -		prerror("No available node in the free list, allocating\n");
>> -		entry = zalloc(sizeof(struct opal_msg_entry));
>> +	if (params_size > OPAL_MSG_FIXED_PARAMS_SIZE) {
>> +		entry_size = sizeof(struct opal_msg_entry) + params_size;
>> +		entry_size -= OPAL_MSG_FIXED_PARAMS_SIZE;
>> +		entry = zalloc(entry_size);
>> +		if (entry)
>> +			entry->extended = true;
>> +	} else {
>> +		entry = list_pop(&msg_free_list, struct opal_msg_entry, link);
>>   		if (!entry) {
>> -			prerror("Allocation failed\n");
>> -			unlock(&opal_msg_lock);
>> -			return OPAL_RESOURCE;
>> +			prerror("No available node in the free list, allocating\n");
>> +			entry = zalloc(sizeof(struct opal_msg_entry));
> 
> I'm tempted to say we should switch to using the pool allocator for
> these and hard failing when we run out. This could come in a separate
> patch though.

Agreed. It makes sense to bound the number of allocations. I will send a
separate patch for that.

-Vasant
