[Skiboot] skiboot 5.4.0-rc1 tagged

Stewart Smith stewart at linux.vnet.ibm.com
Mon Oct 17 15:40:33 AEDT 2016


Hi all,

I just tagged and pushed skiboot-5.4.0-rc1.

Release notes below:

skiboot-5.4.0-rc1
=================

skiboot-5.4.0-rc1 was released on Monday October 17th 2016. It is the first
release candidate of skiboot 5.4, which will become the new stable release
of skiboot following the 5.3 release, first released August 2nd 2016.

skiboot-5.4.0-rc1 contains all bug fixes as of :ref:`skiboot-5.3.7`
and :ref:`skiboot-5.1.18` (the currently maintained stable releases).

For how the skiboot stable releases work, see :ref:`stable-rules` for details.

The current plan is to release a new release candidate every week until we
feel good about it. The aim is for skiboot-5.4.x to be in op-build v1.13, which
is due by November 23rd 2016.

Over skiboot-5.3, we have the following changes:

New Features
------------
- Initial Trusted Boot support (see :ref:`stb-overview`).
  There are several limitations with this initial release:

    - CAPP partition is not measured correctly
    - Only Nuvoton TPM 2.0 is supported
    - Requires hardware rework on late revision Habanero or Firestone boards
      in order to install TPM.

  - Add i2c Nuvoton TPM 2.0 Driver
  - romcode driver for POWER8 secure ROM
  - See Device tree docs for tpm and ibm,secureboot nodes
  - See main secure and trusted boot documentation.


- Fast reboot for P8

  This makes reboot take an *awful* lot less time, somewhere between four
  and ten times faster than a full IPL. It is currently experimental and not
  enabled by default.
  You can enable the experimental support via nvram option: ::

   # nvram -p ibm,skiboot --update-config experimental-fast-reset=feeling-lucky

  **WARNING**: This has *known* bugs. For example, if you have used a device
  in CAPI mode, we will currently *NOT* reset it back to plain PCI. There
  are also some known issues in most simulators.

- Support ``ibm,skiboot`` NVRAM partition with skiboot configuration options.

  - These should generally only be used if you either completely know what
    you are doing or need to work around a skiboot bug. They are **not**
    intended for end users.
  - Add support for supplying the kernel boot arguments from the ``bootargs``
    configuration string in the ``ibm,skiboot`` NVRAM partition.
  - Enabling the experimental fast reset feature is done via this method.

- Add support for nap mode on P8 while in skiboot

  - While nap has been exposed to the Operating System since day 1, we have
    not utilized low power states when in skiboot itself, leading to higher
    power consumption during boot.
    We only enable the functionality after the 0x100 vector has been
    patched, and we disable it before transferring control to Linux.

- libflash: add 128MB MX66L1G45G part

- Pointer validation of OPAL API call arguments.

  - If the kernel called an OPAL API with vmalloc'd address
    or any other address range in real mode, we would hit
    a problem with aliasing. Since the top 4 bits are ignored
    in real mode, pointers from 0xc.. and 0xd.. (and other ranges)
    could collide and lead to hard to solve bugs. This patch
    adds the infrastructure for pointer validation and a simple
    test case for testing the API
  - The checks validate pointers sent in using ``opal_addr_valid()``

Documentation
-------------

There have been a number of documentation fixes this release. Most prominent
is the switch to Sphinx (from the Python project) and ReStructured Text (RST)
as the documentation format. RST and Sphinx enable both production of pretty
documentation in HTML and PDF formats while remaining readable in their raw
form to those with no knowledge of RST.

You can build a HTML site by doing the following: ::

 cd doc/
 make html

As always, documentation patches are very, *very* welcome as we attempt to
document the OPAL API, the device tree bindings and important parts of
OPAL internals.

We would like the Device Tree documentation to follow the style that can be
included in the Device Tree Specification.


General
-------
- Make console-log time more readable: seconds rather than timebase
  Log format is now ``[SECONDS.(tb%512000000),LEVEL]``

- Flash (PNOR) code improvements

  - flash: Make size 64 bit safe
    This makes the size of flash 64 bit safe so that we can have flash
    devices greater than 4GB. This is especially useful for mambo disks
    passed through to Linux.
  - core/flash.c: load actual partition size
    We are downloading 0x20000 bytes from PNOR for CAPP, but currently the
    CAPP lid is only 40K.
  - flash: Rework error paths and messages for multiple flash controllers
    Now that we have mambo bogusdisk flash, we can have many flash chips.
    This is resulting in some confusing output messages.

- core/init: Fix "failure of getting node in the free list" warning on boot.
- slw: improve error message for SLW timer stuck

- Centaur / XSCOM error handling

  - print message on disabling xscoms to centaur due to many errors
  - Mark centaur offline after 10 consecutive access errors

- XSCOM improvements

  - xscom: Map all HMER status codes to OPAL errors
  - xscom: Initialize the data to a known value in ``xscom_read``
    In case of error, don't leave the data random. It helps debugging when
    the user fails to check the error code. This happens due to a bug in the
    PRD wrapper app.
  - chip: Add a quirk for when core direct control XSCOMs are missing

- p8-i2c: Don't crash if a centaur errored out

- cpu: Make endian switch message more informative
- cpu: Display number of started CPUs during boot
- core/init: ensure that HRMOR is zero at boot
- asm: Fix backtrace for unexpected exception

- cpu: Remove pollers calling heuristics from ``cpu_wait_job``
  This will be handled by ``time_wait_ms()``. Also remove a useless
  ``smt_medium()``.
  Note that this introduce a difference in behaviour: time_wait
  will only call the pollers on the boot CPU while ``cpu_wait_job()``
  could call them on any. However, I can't think of a case where
  this is a problem.

- cpu: Remove global job queue
  Instead, target a specific CPU for a global job at queuing time.
  This will allow us to wake up the target using an interrupt when
  implementing nap mode.
  The algorithm used is to look for idle primary threads first, then
  idle secondaries, and finally the less loaded thread. If nothing can
  be found, we fallback to a synchronous call.
- lpc: Log LPC SYNC errors as unrecoverable ones for manufacturing
- lpc: Optimize SerIRQ dispatch based on which PSI IRQ fired
- interrupts: Add new source ``->attributes()`` callback
    This allows a given source to provide per-interrupt attributes
    such as whether it targets OPAL or Linux and it's estimated
    frequency.

    The former allows to get rid of the double set of ops used to
    decide which interrupts go where on some modules like the PHBs
    and the latter will be eventually used to implement smart
    caching of the source lookups.
- opal/hmi: Fix a TOD HMI failure during a race condition.
- platform: Add BT to Generic platform


NVRAM
-----
- Support ``ibm,skiboot`` partition for skiboot specific configuration options
- flash: Size NVRAM based on ECC for OpenPOWER platforms
    If NVRAM has ECC (as per the ffs header) then the actual size of the
    partition is less than reported by the ffs header in the PNOR then the
    actual size of the partition is less than reported by the ffs header.

NVLink/NPU
----------

- Fix reserved PE#
- NPU bdfn allocation bugfix
- Fix bad PE number check
    NPUs have 4 PEs which are zero indexed, so {0, 1, 2, 3}.  A bad PE number
    check in npu_err_inject checks if the PE number is greater than 4 as a
    fail case, so it would wrongly perform operations on a non-existant PE 4.
- Use PCI virtual device
- assert the NPU irq min is aligned.
- program NPU BUID reg properly
- npu: reword "error" to indicate it's actually a warning
   Incorrect FWTS annotation.
   Without this patch, you get spurious FirmWare Test Suite (FWTS) warnings
   about NVLink not working on machines that aren't fully populated with
   GPUs.
- external: NPU hardware procedure script
   Performing NPU hardware procedures requires some config space magic.
   Put all that magic into a script, so you can just specify the target
   device and the procedure number.

PCI
---

- Generic fixes

  - Claim surprise hotplug capability
  - Reserve PCI buses for RC's slot
  - Update PCI topology after power change
  - Return slot cached power state
  - Cache power state on slot without power control
  - Avoid hot resets at boot time
  - Fix initial PCIe slot power state
  - Print CRS retry times
    It's useful to know the CRS retry times before the PCI device is
    detected successfully. In PCI hot add case, it usually indicates
    time consumed for the adapter's firmware to be partially ready
    (responsive PCI config space).
  - core/pci: Fix the power-off timeout in ``pci_slot_power_off()``
    The timeout should be 1000ms instead of 1000 ticks while powering
    off PCI slot in ``pci_slot_power_off()``. Otherwise, it's likely to
    hit timeout powering off the PCI slot as below skiboot logs reveal: ::

      [5399576870,5] PHB#0005:02:11.0 Timeout powering off slot

- PHB3

  - Override root slot's ``prepare_link_change()`` with PHB's
  - Disable surprise link down event on PCI slots
  - Disable ECRC on Broadcom adapter behind PMC switch

- astbmc platforms

  - Support dynamic PCI slot. We might insert a PCIe switch to PHB direct slot
    and the downstream ports of the PCIe switch supports PCI hotplug.


CAPI
----

- hw/phb3: Update capi initialization sequence
    The capi initialization sequence was revised in a circumvention
    document when a 'link down' error was converted from fatal to Endpoint
    Recoverable. Other, non-capi, register setup was corrected even before
    the initial open-source release of skiboot, but a few capi-related
    registers were not updated then, so this patch fixes it.

IPMI
----

- core/ipmi: Set interrupt-parent property
    This allows ipmi-opal to properly use the OPAL irqchip rather than
    falling back to the event interface in Linux.

Mambo Simulator
---------------

- Helpers for POWER9 Mambo.
- mambo: Advertise available RADIX page sizes
- mambo: Add section for kernel command line boot args
  Users can set kernel command line boot arguments for Mambo in a tcl
  script.
- mambo: add exception and qtrace helpers
- external/mambo: Update skiboot.tcl to add page-sizes nodes to device tree

Simics Simulator
----------------

- chiptod: Enable ChipTOD in SIMICS

Utilities
---------

- pflash

  - fix harmless buffer overflow: ``fl_total_size`` was ``uint32_t`` not ``uint64_t``.
  - Don't try to write protect when writing to flash file
  - Misc small improvements to code and code style
  - makefile bug fixes


- external/boot_tests

  - remove lid from the BMC after flashing
  - add the nobooting option -N
  - add arbitrary lid option -F

- ``getscom`` / ``getsram`` / ``putscom``: Parse chip-id as hex
    We print the chip-id in hex (without a leading 0x), but we fail to
    parse that same value correctly in ``getscom`` / ``getsram`` / ``putscom`` ::

     # getscom -l
     ...
     80000000 | DD2.0 | Centaur memory buffer
     # getscom -c 80000000 201140a
     Error -19 reading XSCOM

    Fix this by assuming base 16 when parsing chip-id.

PRD
---

- opal-prd: Fix error code from ``scom_read`` and ``scom_write``
- opal-prd: Add get_interface_capabilities to host interfaces
- opal-prd: fix for 64-bit pnor sizes
- occ/prd/opal-prd: Queue OCC_RESET event message to host in OpenPOWER
    During an OCC reset cycle the system is forced to Psafe pstate.
    When OCC becomes active, the system has to be restored to its
    last pstate as requested by host. So host needs to be notified
    of OCC_RESET event or else system will continue to remian in
    Psafe state until host requests a new pstate after the OCC
    reset cycle.

IBM FSP Based Platforms
-----------------------

- fsp/console: Allocate irq for each hvc console
    Allocate an irq number for each hvc console and set its interrupt-parent
    property so that Linux can use the opal irqchip instead of the
    OPAL_EVENT_CONSOLE_INPUT interface.
- platforms/firenze: Fix clock frequency dt property: ::

    [ 1.212366090,3] DT: Unexpected property length /xscom at 3fc0000000000/i2cm at a0020/clock-frequency

- HDAT: Fix typo in nest-frequency property
    nest-frquency -> nest-frequency
- platforms/ibm-fsp: Use power_ctl bit when determining slot reset method
    The power_ctl bit is used to represent if power management is available.
    If power_ctl is set to true, then the I2C based external power management
    functionality will be populated on the PCI slot. Otherwise we will try to
    use the inband PERST as the fundamental reset, as before.
- FSP/ELOG: Fix elog timeout issue
    Presently we set timeout value as soon as we add elog to queue. If
    we have multiple elogs to write, it doesn't consider queue wait time.
    Instead set timeout value when we are actually sending elog to FSP.
- FSP/ELOG: elog_enable flag should be false by default
    This issue is one of the corner case, which is related to recent change
    went upstream and only observed in the petitboot prompt, where we see
    only one error log instead of getting all error log in
    ``/sys/firmware/opal/elog``.



POWER9
------

- mambo: Make POWER9 look like DD2
- flash: Move flash node under ``ibm,opal/flash/``
    This changes the boot ABI, so it's only active for P9 and later systems,
    even though it's unrelated to hardware changes. There is an associated
    Linux change to properly search for this node as well.
- core/cpu.c: Add OPAL call to setup Nest MMU
- psi: On p9, create an interrupt-map for routing PSI interrupts
- lpc: Add P9 LPC interrupts support
- chiptod: Basic P9 support
- psi: Add P9 support

Testing and Debugging
---------------------

- test/qemu: bump qemu version used in CI, adds IPMI support
- platform/qemu: add BT and IPMI support
  Enables testing BT and IPMI functionality in the Qemu simulator
- init: In debug builds, enable debug output to console
- mem_region: Be a bit smarter about poisoning
    Don't poison chunks that are already free and poison regions on
    first allocation. This speeds things up dramatically.
- libc: Use 8-bytes stores for non-0 memset too
    Memory poisoning hammers this, so let's be a bit smart about it and
    avoid falling back to byte stores when the data is not 0
- fwts: add annotation for manufacturing mode
- check: Fix bugs in mem region tests
- Don't set -fstack-protector-all unconditionally
    We set it already in DEBUG builds and we use -fstack-protector-strong
    in release builds which provides most of the benefits and is more
    efficient.
- Build host programs (and checks) with debug enabled
    This enables memory poisoning in allocations and list checking
    among other things.
- Add global DEBUG make flag


Contributors
------------

Extending the analysis done for the last few releases, we can see our trends
in code review across versions:

======== ====== ======= ======= ======  ========
Release	 csets	Ack	Reviews	Tested	Reported
======== ====== ======= ======= ======  ========
5.0	 329	 15	     20	     1	       0
5.1	 372	 13	     38	     1	       4
5.2-rc1	 334	 20	     34	     6	      11
5.3-rc1  302     36          53      4         5
5.4-rc1  278      8          19      0         4
======== ====== ======= ======= ======  ========

This release has fewer changesets over previous 5.x first release candidates,
but that is not indicative of the size or complexity of these changes.


Processed 278 csets from 31 developers
A total of 17052 lines added, 4745 removed (delta 12307)

Developers with the most changesets

=========================== == =======
=========================== == =======
Stewart Smith               71 (25.5%)
Benjamin Herrenschmidt      50 (18.0%)
Claudio Carvalho            38 (13.7%)
Gavin Shan                  20 (7.2%)
Oliver O'Halloran           18 (6.5%)
Mukesh Ojha                  9 (3.2%)
Cyril Bur                    7 (2.5%)
Russell Currey               7 (2.5%)
Vasant Hegde                 7 (2.5%)
Pridhiviraj Paidipeddi       6 (2.2%)
Michael Neuling              6 (2.2%)
Alistair Popple              4 (1.4%)
Sam Mendoza-Jonas            3 (1.1%)
Vipin K Parashar             3 (1.1%)
Balbir Singh                 3 (1.1%)
Mahesh Salgaonkar            3 (1.1%)
Frederic Barrat              3 (1.1%)
Chris Smart                  2 (0.7%)
Jack Miller                  2 (0.7%)
Patrick Williams             2 (0.7%)
Jeremy Kerr                  2 (0.7%)
Suraj Jitindar Singh         2 (0.7%)
Milton Miller                2 (0.7%)
Shilpasri G Bhat             1 (0.4%)
Frederic Bonnard             1 (0.4%)
Joel Stanley                 1 (0.4%)
Breno Leitao                 1 (0.4%)
Anton Blanchard              1 (0.4%)
Nicholas Piggin              1 (0.4%)
Nageswara R Sastry           1 (0.4%)
Cédric Le Goater             1 (0.4%)
=========================== == =======

Developers with the most changed lines

========================= ==== =======
========================= ==== =======
Claudio Carvalho          6817 (38.2%)
Stewart Smith             4677 (26.2%)
Benjamin Herrenschmidt    2586 (14.5%)
Gavin Shan                1005 (5.6%)
Cyril Bur                  509 (2.9%)
Mukesh Ojha                361 (2.0%)
Oliver O'Halloran          343 (1.9%)
Russell Currey             343 (1.9%)
Balbir Singh               227 (1.3%)
Pridhiviraj Paidipeddi     194 (1.1%)
Michael Neuling            121 (0.7%)
Cédric Le Goater           115 (0.6%)
Vipin K Parashar            68 (0.4%)
Alistair Popple             66 (0.4%)
Vasant Hegde                65 (0.4%)
Shilpasri G Bhat            45 (0.3%)
Suraj Jitindar Singh        41 (0.2%)
Nicholas Piggin             34 (0.2%)
Sam Mendoza-Jonas           33 (0.2%)
Jack Miller                 32 (0.2%)
Nageswara R Sastry          32 (0.2%)
Jeremy Kerr                 23 (0.1%)
Mahesh Salgaonkar           21 (0.1%)
Chris Smart                 20 (0.1%)
Milton Miller               19 (0.1%)
Patrick Williams            11 (0.1%)
Frederic Barrat              6 (0.0%)
Anton Blanchard              3 (0.0%)
Frederic Bonnard             2 (0.0%)
Joel Stanley                 2 (0.0%)
Breno Leitao                 2 (0.0%)
========================= ==== =======

Developers with the most lines removed

========================= ==== =======
========================= ==== =======
Cyril Bur                  299 (6.3%)
========================= ==== =======

Developers with the most signoffs (total 226)

========================= ==== =======
========================= ==== =======
Stewart Smith              219 (96.9%)
Alistair Popple              4 (1.8%)
Cyril Bur                    1 (0.4%)
Jeremy Kerr                  1 (0.4%)
Benjamin Herrenschmidt       1 (0.4%)
========================= ==== =======

Developers with the most reviews (total 19)

========================= ==== =======
========================= ==== =======
Mukesh Ojha                  5 (26.3%)
Andrew Donnellan             4 (21.1%)
Vasant Hegde                 3 (15.8%)
Russell Currey               3 (15.8%)
Balbir Singh                 2 (10.5%)
Cyril Bur                    1 (5.3%)
Vaidyanathan Srinivasan      1 (5.3%)
========================= ==== =======

Developers with the most test credits (total 0)

Developers who gave the most tested-by credits (total 0)

Developers with the most report credits (total 4)

========================= ==== =======
========================= ==== =======
Benjamin Herrenschmidt       1 (25.0%)
Li Meng                      1 (25.0%)
Pridhiviraj Paidipeddi       1 (25.0%)
Gavin Shan                   1 (25.0%)
========================= ==== =======

Developers who gave the most report credits (total 4)

========================= ==== =======
========================= ==== =======
Gavin Shan                   1 (25.0%)
Vasant Hegde                 1 (25.0%)
Russell Currey               1 (25.0%)
Stewart Smith                1 (25.0%)
========================= ==== =======

-- 
Stewart Smith
OPAL Architect, IBM.



More information about the Skiboot mailing list