[Skiboot] [PATCH v4 04/10] opal-msg: Enhance opal-get-msg API
Stewart Smith
stewart at linux.ibm.com
Thu May 30 13:40:58 AEST 2019
Vasant Hegde <hegdevasant at linux.vnet.ibm.com> writes:
> On 05/21/2019 12:50 PM, Stewart Smith wrote:
>> Vasant Hegde <hegdevasant at linux.vnet.ibm.com> writes:
>>
>>> Linux uses opal_get_msg (OPAL_GET_MSG) API to get OPAL messages. This interface
>>> supports upto 8 params (64 bytes). We have a requirement to send bigger data to
>>> Linux. This patch enhances OPAL to send bigger data to Linux.
>>>
>>> - Linux will use "opal-msg-size" device tree property to allocate memory for
>>> OPAL messages (previous patch increased "opal-msg-size" to 64K).
>>>
>>> - Replaced `reserved` field in "struct opal_msg" with `size`. So that Linux
>>> side opal_get_msg user can detect actual data size.
>>>
>>> - If buffer size < actual message size, then opal_get_msg will copy partial
>>> data and return OPAL_PARTIAL to Linux.
>>
>> Looking through the linux code, we should probably very carfully
>> document the expected behaviour for larger messages an the impact.
>
> Correct. Its already documented in OPAL.
>
> OPAL_PARTIAL
> If pending opal message is greater than supplied buffer.
> In this case the message is *DISCARDED* by OPAL.
>
> ...and this patch retains above behavior. i.e. we remove entry from list and return
> OPAL_PARTIAL to kernel.
Ahh excellent.
Oh look, some guy named Stewart wrote that nearly 3 years ago.
>> Specifically around what happens with OPAL_PARTIAL and ensuring we don't
>> just print error messages out from the kernel forever.
>
> It will print error message once and move on....as OPAL removes entry from list.
> (Same as existing behavior).
>
>>
>> Specifically, we seem to have two places where we consume messages in
>> Linux:
>>
>> in opal.c:
>> static void opal_handle_message(void)
>> {
>> ....
>> ret = opal_get_msg(__pa(&msg), sizeof(msg));
>> /* No opal message pending. */
>> if (ret == OPAL_RESOURCE)
>> return;
>>
>> /* check for errors. */
>> if (ret) {
>> pr_warn("%s: Failed to retrieve opal message, err=%lld\n",
>> __func__, ret);
>> return;
>> }
>>
>> So this gives me pause, as if we have a large message needing to be
>> retreived, existing kernels never will clear it, and spew pr_warns from
>> here to eternity.
>>
>> and in opal-hmi.c:
>>
>> if (unrecoverable) {
>> /* Pull all HMI events from OPAL before we panic. */
>> while (opal_get_msg(__pa(&msg), sizeof(msg)) == OPAL_SUCCESS) {
>> .....
>> print_hmi_event_info(hmi_evt);
>>
>>
>> So if we have a large event in the middle of a bunch of fatal HMI info,
>> we'll lose the HMI event info.
>>
>> So I think we need to ensure two things:
>> 1) we don't lose unrecoverable HMIs
>> 2) existing kernels degrade gracefully.
>
> I think both are taken care.
Has it been tested?
>> I'm tempted to say we should switch to using the pool allocator for
>> these and hard failing when we run out. This could come in a separate
>> patch though.
>
> Agreed. Makes sense to control number of allocation. I will send separate patch
> for that.
ok.
--
Stewart Smith
OPAL Architect, IBM.
More information about the Skiboot
mailing list