[Skiboot] skiboot 6.0.4

Stewart Smith stewart at linux.ibm.com
Mon May 28 17:51:16 AEST 2018


skiboot-6.0.4
*************

skiboot 6.0.4 was released on Monday May 28th, 2018. It replaces
skiboot-6.0.3 as the current stable release in the 6.0.x series.

It is recommended that 6.0.4 be used instead of any previous 6.0.x
version.

Over skiboot-6.0.3, we have two bug fixes: one helps with performance
(especially in HPC environments), and one is an opal-prd fix.

Changes are:

* SLW: Remove stop1_lite and stop2_lite

  stop1_lite has been removed since it adds no additional benefit over
  stop0_lite. stop2_lite has been removed since currently it adds
  minimal benefit over stop2. However, the benefit is eclipsed by the
  time required to ungate the clocks

  Moreover, Lite states don’t give up the SMT resources, can
  potentially have a performance impact on sibling threads.

  Since current OSs (Linux) aren’t smart enough to make good decisions
  with these stop states, we’re (temporarly) removing them from what
  we expose to the OS, the idea being to bring them back in a new DT
  representation so that only an OS that knows what to do will do
  things with them.

* opal-prd: Do not error out on first failure for soft/hard offline.

  The memory errors (CEs and UEs) that are detected as part of
  background memory scrubbing are reported by PRD asynchronously to
  opal-prd along with affected memory ranges. hservice_memory_error()
  converts these ranges into page granularity before hooking up them
  to soft/hard offline-ing infrastructure.

  But the current implementation of hservice_memory_error() does not
  hookup all the pages to soft/hard offline-ing if any of the page
  offline action fails. e.g hard offline can fail for:

  * Pages that are not part of buddy managed pool.

  * Pages that are reserved by kernel using memblock_reserved()

  * Pages that are in use by kernel.

  But for the pages that are in use by user space application, the
  hard offline marks the page as hwpoison, sends SIGBUS signal to kill
  the affected application as recovery action and returns success.

  Hence, It is possible that some of the pages in that memory range
  are in use by application or free. By stopping on first error we
  loose the opportunity to hwpoison the subsequent pages which may be
  free or in use by application. This patch fixes this issue.

-- 
Stewart Smith
OPAL Architect, IBM.



More information about the Skiboot mailing list