[Skiboot] [PATCH] FSP/CONSOLE: Workaround for unresponsive ipmi daemon

Stewart Smith stewart at linux.vnet.ibm.com
Fri Jun 9 15:13:48 AEST 2017


Vasant Hegde <hegdevasant at linux.vnet.ibm.com> writes:
> On 06/07/2017 12:20 PM, Vasant Hegde wrote:
>> We use TCE mapped area to write data to console. Console header
>> (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates
>> next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer).
>>
>> The kernel makes the opal_console_write() OPAL call to write data to the
>> console. OPAL writes data to the TCE mapped area and sends an MBOX command to FSP.
>> If our console becomes full and we have data to write to console,
>> we keep on waiting until FSP reads data.
>>
>> In some corner cases, where FSP is active but not responding to
>> console MBOX messages (due to buggy IPMI) and the kernel is doing heavy
>> console writes, our console buffer eventually becomes full. At this
>> point OPAL starts returning OPAL_BUSY_EVENT to the kernel, and the kernel
>> keeps retrying. This causes kernel soft lockups. In the extreme case
>> where every CPU is trying to write to the console, the user cannot ssh
>> in and thinks the system is hung.
>>
>> If we reset FSP or restart IPMI daemon on FSP, system recovers and
>> everything becomes normal.
>>
>> This patch adds a workaround for the above issue by returning OPAL_HARDWARE
>> when the console is full. A side effect of this patch is that we may end up
>> dropping the latest console data. But it is better to drop console data than
>> to hang the system.
>>
>> An alternative approach is to drop old data from the console buffer to make
>> space for new data. But under normal conditions only FSP can update the
>> 'next_out' pointer, and if we touch that pointer, it may introduce other
>> race conditions. Hence we decided to just drop the new console write request.
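
The behaviour described above can be sketched roughly as follows. This is
only an illustration, not the code from the patch: struct serbuf,
SER_BUF_SIZE, serbuf_space() and console_write() are hypothetical names, the
buffer is a plain array rather than a TCE mapped area, and no MBOX message is
sent. The return codes use the values from the OPAL API.

```c
#include <stddef.h>
#include <stdint.h>

/* Return codes, with the values used by the OPAL API. */
#define OPAL_SUCCESS      0
#define OPAL_HARDWARE    -6
#define OPAL_BUSY_EVENT  -12  /* what a full console returned before the workaround */

#define SER_BUF_SIZE 16       /* hypothetical size, tiny for illustration */

/* Simplified stand-in for fsp_serbuf_hdr plus its data area. */
struct serbuf {
	uint16_t next_in;     /* advanced only by the writer (OPAL) */
	uint16_t next_out;    /* advanced only by the reader (FSP) */
	uint8_t data[SER_BUF_SIZE];
};

/* Free bytes; one slot stays empty so "full" and "empty" differ. */
static size_t serbuf_space(const struct serbuf *sb)
{
	return (sb->next_out + SER_BUF_SIZE - sb->next_in - 1) % SER_BUF_SIZE;
}

/*
 * Copy up to *len bytes into the ring and report how many were taken.
 * When there is no room at all, drop the new data and return
 * OPAL_HARDWARE so the caller does not spin on OPAL_BUSY_EVENT.
 * next_out is never touched here; only the reader moves it.
 */
static int64_t console_write(struct serbuf *sb, const uint8_t *buf, size_t *len)
{
	size_t space = serbuf_space(sb);
	size_t n = *len < space ? *len : space;
	size_t i;

	if (*len > 0 && space == 0) {
		*len = 0;
		return OPAL_HARDWARE;
	}

	for (i = 0; i < n; i++) {
		sb->data[sb->next_in] = buf[i];
		sb->next_in = (sb->next_in + 1) % SER_BUF_SIZE;
	}

	*len = n;
	return OPAL_SUCCESS;
}
```

Because the writer only ever advances next_in, this avoids the race
mentioned above that would come from OPAL moving next_out to discard old
data.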
>
> Stewart,
>
> We have to backport this patch to 860.30 release as well.
>
> I think it will apply cleanly on 830.30 branch. Let me know if you want
> me to send a backported patch for the 860.30 series.

860.30 is on skiboot 5.4.x

-- 
Stewart Smith
OPAL Architect, IBM.


