[Skiboot] [PATCH v2 00/12] opal: Improve TOD topology failover.

Mahesh J Salgaonkar mahesh at linux.vnet.ibm.com
Sat Jun 6 04:04:46 AEST 2015

This patch series improves handling of TOD topology failover. The existing
code does not handle TOD error recovery efficiently: it triggers a
topology switch even when one is not required. This patchset fixes that.
It introduces detection of sync/step network status and improved checks
to decide whether a topology switch is required.

A master-slave relationship among the chips is assigned at IPL time
based on the system configuration. The master chip drives the step pulses
to the slave chips. Similarly, primary/secondary topology configurations
on selected chips are also assigned at IPL time. One of the topologies acts
as Active master and the other as Backup master, providing redundancy during
TOD topology failure.
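
As a rough illustration of the bookkeeping this implies, the sketch below
models two topologies and a per-chip role in each. All type and function
names here are hypothetical and chosen for this example; they are not taken
from the patches.

```c
#include <stddef.h>

/* Hypothetical names for illustration only; not from the patch series. */
enum tod_topology { TOPO_PRIMARY = 0, TOPO_SECONDARY = 1, TOPO_COUNT };
enum tod_role { ROLE_UNKNOWN, ROLE_MASTER, ROLE_SLAVE };

struct tod_chip {
	int chip_id;
	enum tod_role role[TOPO_COUNT];	/* role of this chip in each topology */
};

struct tod_state {
	enum tod_topology active;	/* topology currently acting as Active master */
	struct tod_chip *chips;
	size_t nr_chips;
};

/* The backup topology is simply the one that is not active. */
static enum tod_topology tod_backup_topology(const struct tod_state *s)
{
	return s->active == TOPO_PRIMARY ? TOPO_SECONDARY : TOPO_PRIMARY;
}

/* Return the chip id of the master in topology t, or -1 if none is found. */
static int tod_find_master(const struct tod_state *s, enum tod_topology t)
{
	size_t i;

	for (i = 0; i < s->nr_chips; i++)
		if (s->chips[i].role[t] == ROLE_MASTER)
			return s->chips[i].chip_id;
	return -1;
}
```

With this layout, the Active and Backup masters are found by calling
tod_find_master() with s->active and tod_backup_topology(s) respectively.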

In the event of failure, check the status of the sync/step network on the
Active master. If it is not running, trigger a topology switch to recover.
Once the topology switch is over, inform the FSP/PRD so that it can analyze
the TOD error on the old master and fix/re-configure the new backup master
topology.
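
The decision described above can be sketched as a small pure function. This
is a minimal sketch under stated assumptions: the names are hypothetical, and
the "unrecoverable" outcome for an invalid backup master is an assumption
made for the example (the series only says the backup master is checked for
validity before a switch).

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not from the patch series. */
enum tod_action {
	TOD_NO_SWITCH,		/* network still running: no switch needed */
	TOD_SWITCH_TOPOLOGY,	/* switch to backup, then inform FSP/PRD */
	TOD_UNRECOVERABLE,	/* assumed outcome: no valid backup master */
};

struct tod_status {
	bool active_network_running;	/* sync/step network on Active master */
	bool backup_master_valid;
};

static enum tod_action tod_decide_recovery(const struct tod_status *st)
{
	/* Avoid the unnecessary switch the old code would have triggered. */
	if (st->active_network_running)
		return TOD_NO_SWITCH;

	/* Only switch when there is a usable backup to switch to. */
	if (!st->backup_master_valid)
		return TOD_UNRECOVERABLE;

	return TOD_SWITCH_TOPOLOGY;
}
```

Keeping the decision separate from the hardware accesses makes the
"switch only when required" rule easy to check in isolation.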

Once the FSP fixes/re-configures the new backup master topology, it sends
out a mailbox command (xE6, s/c 0x06, mod 0) to enable the new backup
topology. (NOTE: We need to decide how to handle this on non-FSP based
systems.)

The OPAL layer now maintains additional information about the TOD topology
configurations, which improves TOD error recovery.

Additionally, this patch series implements recovery for TOD register parity
errors.
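
One plausible way to recover from a register parity error is to rewrite the
affected register from a cached known-good value; the series includes a
patch that caches chipTOD control register values. The sketch below shows
that idea against a plain array standing in for the hardware registers. The
register count, the error mask layout and all names are assumptions made for
this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_TOD_CTRL_REGS 4	/* illustrative count, not the real register set */

/* Hypothetical cache of last known-good control register values. */
struct tod_reg_cache {
	uint64_t val[NR_TOD_CTRL_REGS];
	bool valid[NR_TOD_CTRL_REGS];
};

/* Record a known-good value for register idx. */
static void tod_cache_store(struct tod_reg_cache *c, unsigned int idx,
			    uint64_t v)
{
	if (idx >= NR_TOD_CTRL_REGS)
		return;
	c->val[idx] = v;
	c->valid[idx] = true;
}

/*
 * Rewrite every register flagged in parity_err_mask (bit i = register i)
 * from the cache. Returns false if a flagged register has no cached value.
 */
static bool tod_parity_recover(const struct tod_reg_cache *c, uint64_t *regs,
			       uint32_t parity_err_mask)
{
	bool ok = true;
	unsigned int i;

	for (i = 0; i < NR_TOD_CTRL_REGS; i++) {
		if (!(parity_err_mask & (1u << i)))
			continue;
		if (c->valid[i])
			regs[i] = c->val[i];
		else
			ok = false;
	}
	return ok;
}
```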

The initial set of patches (1 through 5) sets up the framework to capture
the necessary topology info to keep OPAL up-to-date with the current
topology.

Patches 6 through 10 refactor the TOD failover recovery mechanism, which
includes detection of sync/step network status, topology switch and
fixing the new backup topology.

The last two patches add recovery for parity errors on TOD control
registers.

Changes in v2:
- Fixed a few issues with TOD topology switch [07/12]
  - Stop TODs on all slave chips in backup topology (except backup
    master chip TOD) before triggering a switch.
  - During the topology switch, step checkers are disabled and stay
    disabled even after the switch. This causes future step errors to go
    undetected. To fix this, enable step checkers on all TODs after the
    switch.
- Moved FSP-specific code under hw/fsp/fsp-chiptod.c [10/12]
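
The ordering fixed in [07/12] can be sketched against a simulated set of
chips. This is only a model of the sequence described in the changelog; the
struct, its fields and the function name are hypothetical, and the actual
switch and any later restart of the stopped TODs are hardware-specific and
omitted here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-chip model for illustration only. */
struct sim_chip {
	int chip_id;
	bool backup_master;		/* master in the backup topology */
	bool tod_running;
	bool step_check_enabled;
};

static void tod_switch_sequence(struct sim_chip *chips, size_t n)
{
	size_t i;

	/* 1. Stop TODs on backup-topology slaves; keep the backup master. */
	for (i = 0; i < n; i++)
		if (!chips[i].backup_master)
			chips[i].tod_running = false;

	/* 2. Step checkers are disabled for the duration of the switch. */
	for (i = 0; i < n; i++)
		chips[i].step_check_enabled = false;

	/* 3. The topology switch itself would happen here (omitted). */

	/* 4. Re-enable step checkers so later step errors are detected. */
	for (i = 0; i < n; i++)
		chips[i].step_check_enabled = true;
}
```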


Mahesh Salgaonkar (12):
      opal: Query current TOD topology during chiptod init.
      opal: Identify chip role for a given topology.
      opal: Query chip TOD status for a given topology.
      opal: Introduce a function to check sync/step network status.
      opal: Modify chiptod_running_check() function to take chip id as argument.
      opal: Re-sync failed chiptod with neighboring chiptod.
      opal: Refactor TOD topology failover recovery.
      opal: Check if backup master is valid before topology switch.
      opal: Inform fsp about the topology switch.
      opal: Enable backup topology.
      opal: Cache chipTOD control register values.
      opal: Recover from TOD register parity errors.

 hw/chiptod.c               |  803 +++++++++++++++++++++++++++++++++++++++++++-
 hw/fsp/Makefile.inc        |    2 
 hw/fsp/fsp-chiptod.c       |   74 ++++
 include/chiptod.h          |    7 
 include/fsp.h              |    9 
 platforms/ibm-fsp/common.c |    3 
 6 files changed, 876 insertions(+), 22 deletions(-)
 create mode 100644 hw/fsp/fsp-chiptod.c

