[Skiboot] [PATCH 1/2] opal: Fix hang in time_wait* calls on HMI for TB errors.

Stewart Smith stewart at linux.vnet.ibm.com
Tue Sep 15 11:19:16 AEST 2015


Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>
> On TOD/TB errors, the timebase register stops/freezes until HMI error recovery
> gets TOD/TB back into a running state. However, while HMI recovery is in
> progress, some code path may invoke time_wait*() calls, which depend on a
> running TB value. If TB is not moving, the time_wait* calls keep looping,
> resulting in a hang on that CPU.
>
> On OpenPower systems we are seeing a system hang on TOD/TB errors. The hang
> is seen inside the OPAL HMI handler while invoking prlog/perror(). The reason
> is that on OpenPower systems, prlog/perror() depends on the LPC UART console
> driver to flush log messages to the console. UART read/write calls invoke
> time_wait_nopoll() inside the opb_[read|write]() functions. When TB is
> stopped, this causes a hang in prlog/perror() calls.
>
> This patch fixes the issue by modifying time_wait_[no]poll() to check
> for TB validity and return immediately if the TB is not valid.
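To make the failure mode described above concrete, here is a minimal, self-contained C sketch. It is not skiboot code: `struct cpu_state`, `read_tb()`, `time_wait_sketch()` and the `tb_invalid` flag are hypothetical stand-ins for the per-CPU state, the mftb timebase read, and the time_wait_[no]poll() loop. It simulates a timebase that stops advancing and compares a wait loop without a validity check against one that returns immediately when TB is flagged invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; these are not skiboot's actual structures. */
enum wait_result {
    WAIT_DONE,      /* timebase advanced past the deadline */
    WAIT_SKIPPED,   /* TB flagged invalid, returned immediately */
    WAIT_HUNG       /* demo guard tripped: a real loop would spin forever */
};

struct cpu_state {
    uint64_t tb;        /* simulated timebase register */
    bool tb_frozen;     /* simulated TOD/TB error: TB stops advancing */
    bool tb_invalid;    /* assumed flag, set while HMI recovery is pending */
};

static struct cpu_state cpu;

/* Simulated timebase read: advances one tick per read unless frozen. */
static uint64_t read_tb(void)
{
    if (!cpu.tb_frozen)
        cpu.tb++;
    return cpu.tb;
}

/*
 * Busy-wait for 'ticks' timebase ticks, in the style of a nopoll wait.
 * With check_tb_valid set, the wait bails out when TB is known to be
 * stopped, mirroring the fix described in the patch. max_spins exists
 * only so the unfixed behaviour can be observed without really hanging.
 */
static enum wait_result time_wait_sketch(uint64_t ticks, bool check_tb_valid,
                                         uint64_t max_spins)
{
    uint64_t end, spins = 0;

    assert(max_spins > 0);

    if (check_tb_valid && cpu.tb_invalid)
        return WAIT_SKIPPED;

    end = read_tb() + ticks;
    while (read_tb() < end) {
        if (++spins >= max_spins)
            return WAIT_HUNG;
    }
    return WAIT_DONE;
}
```

A real wait loop has no spin limit, so the WAIT_HUNG case corresponds to the CPU never leaving the loop. The early return gives up the requested delay in exchange for forward progress while HMI recovery brings the timebase back.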

Thanks! Applied as of 1764f24 and heading to skiboot-5.1.3