[Skiboot] [PATCH 00/12] opal: Improve TOD topology failover.

Mahesh J Salgaonkar mahesh at linux.vnet.ibm.com
Sat Mar 28 20:34:17 AEDT 2015


This patch series improves the handling of TOD topology failover. The existing
code does not handle TOD error recovery efficiently: it triggers a topology
switch even when one is not required. This patch series fixes that by
introducing detection of the sync/step network status and improved checks
that decide whether a topology switch is actually required.

A master-slave relationship among the chips is assigned at IPL time based on
the system configuration. The master chip drives the step pulses to the slave
chips. Similarly, primary/secondary topology configurations on selected chips
are also assigned at IPL time. One topology acts as the active master and the
other as the backup master, providing redundancy in case of a TOD topology
failure.
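To make the roles concrete, here is a minimal C sketch of how per-chip roles
for the two topologies might be recorded and queried, in the spirit of patch 2
("Identify chip role for a given topology"). Every name in it (struct
tod_chip_info, set_chip_roles(), chip_role(), topology_master(), MAX_CHIPS and
the plain master/slave enum) is hypothetical and not taken from hw/chiptod.c;
the actual code may distinguish finer-grained roles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of TOD roles; names do not come from hw/chiptod.c. */
enum tod_topology { TOPO_PRIMARY = 0, TOPO_SECONDARY = 1, TOPO_COUNT = 2 };

enum tod_chip_role { ROLE_UNKNOWN = 0, ROLE_MASTER, ROLE_SLAVE };

#define MAX_CHIPS 8 /* hypothetical upper bound on chips */

struct tod_chip_info {
	bool present;
	enum tod_chip_role role[TOPO_COUNT]; /* one role per topology, set at IPL */
};

static struct tod_chip_info chips[MAX_CHIPS];

/* Currently active topology; the other one is the backup. */
static enum tod_topology active_topo = TOPO_PRIMARY;

/* Record a chip's roles, as the IPL-time configuration would. */
static void set_chip_roles(unsigned int chip_id, enum tod_chip_role primary,
			   enum tod_chip_role secondary)
{
	assert(chip_id < MAX_CHIPS);
	chips[chip_id].present = true;
	chips[chip_id].role[TOPO_PRIMARY] = primary;
	chips[chip_id].role[TOPO_SECONDARY] = secondary;
}

static enum tod_topology backup_topology(void)
{
	return active_topo == TOPO_PRIMARY ? TOPO_SECONDARY : TOPO_PRIMARY;
}

/* Role of chip_id in topology topo, or ROLE_UNKNOWN if the chip is absent. */
static enum tod_chip_role chip_role(unsigned int chip_id, enum tod_topology topo)
{
	assert(chip_id < MAX_CHIPS && topo < TOPO_COUNT);
	if (!chips[chip_id].present)
		return ROLE_UNKNOWN;
	return chips[chip_id].role[topo];
}

/* Chip id of the master in topology topo, or -1 if none is recorded. */
static int topology_master(enum tod_topology topo)
{
	for (unsigned int i = 0; i < MAX_CHIPS; i++)
		if (chip_role(i, topo) == ROLE_MASTER)
			return (int)i;
	return -1;
}
```

With chip 0 as master of the primary topology and chip 1 as master of the
secondary one, topology_master(backup_topology()) yields the chip that would
take over after a switch.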

In the event of a failure, OPAL checks the status of the sync/step network on
the active master. If it is not running, OPAL triggers a topology switch to
recover. Once the topology switch is complete, OPAL informs the FSP/PRD so that
it can analyze the TOD error on the old master and fix/reconfigure the new
backup master topology.
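The decision described above can be sketched as a small C function. It assumes
OPAL can read whether the step and sync networks of the active master are
running, and whether the backup master is valid (the check patch 8 adds before
a switch). Treating the network as "running" only when both step and sync are
up is an assumption, and the struct and enum names are hypothetical. When no
switch is needed, the failed chip would instead be resynced from a neighbor
(compare patch 6).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the active master's sync/step network. */
struct tod_net_status {
	bool step_running; /* step pulses are being driven to the slaves */
	bool sync_running; /* sync network is healthy */
};

enum failover_action {
	FAILOVER_NONE,          /* network fine: no switch, resync the failed chip */
	FAILOVER_SWITCH,        /* switch topology, then inform FSP/PRD */
	FAILOVER_UNRECOVERABLE, /* network down and backup master not usable */
};

/* Switch topology only when the active master's network is not running,
 * and only if the backup master has been validated. */
static enum failover_action tod_failover_decide(const struct tod_net_status *active,
						bool backup_valid)
{
	assert(active != NULL);
	if (active->step_running && active->sync_running)
		return FAILOVER_NONE;
	if (!backup_valid)
		return FAILOVER_UNRECOVERABLE;
	return FAILOVER_SWITCH;
}
```

Keeping the decision separate from the switch itself makes it explicit that a
topology switch happens only when the active master's network is actually down.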

Once the FSP has fixed/reconfigured the new backup master topology, it sends a
mailbox command (0xE6, s/c 0x06, mod 0) to enable the new backup topology.
(NOTE: We still need to decide how to handle this on non-FSP based systems.)
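A rough C sketch of how OPAL might react to that mailbox command. The class,
sub-command and modifier values come from the text above; packing them into a
single 32-bit word as class << 16 | subcmd << 8 | mod, and the names
fsp_cmd_word(), tod_handle_fsp_cmd() and backup_topology_enabled, are
assumptions for illustration and not necessarily the encoding or handler used
in include/fsp.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Command fields quoted in the cover letter. */
#define TOD_CMD_CLASS  0xE6
#define TOD_CMD_SUBCMD 0x06
#define TOD_CMD_MOD    0x00

/* Hypothetical packing of class/subcmd/mod into one word (assumed layout). */
static uint32_t fsp_cmd_word(uint8_t cls, uint8_t sub, uint8_t mod)
{
	return ((uint32_t)cls << 16) | ((uint32_t)sub << 8) | mod;
}

/* Hypothetical flag: may a later failover switch to the backup topology? */
static bool backup_topology_enabled;

/* Returns true if the command was the "enable backup topology" request. */
static bool tod_handle_fsp_cmd(uint32_t cmd)
{
	if (cmd != fsp_cmd_word(TOD_CMD_CLASS, TOD_CMD_SUBCMD, TOD_CMD_MOD))
		return false;
	/* FSP has fixed/reconfigured the new backup topology: mark it usable. */
	backup_topology_enabled = true;
	return true;
}
```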

The OPAL layer now maintains additional information about the TOD topology
configuration, which improves TOD error recovery.

Additionally, this patch series implements recovery from TOD register parity
errors.
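One plausible shape for that recovery, in line with the last two patches
("Cache chipTOD control register values" and "Recover from TOD register parity
errors"), is to remember every value written to a control register and rewrite
it from that cache when a parity error is reported. The sketch below simulates
the registers with an array instead of real SCOM accesses; the register count,
the error-mask layout and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOD_NUM_CTRL_REGS 4 /* hypothetical number of cached control registers */

struct tod_reg_cache {
	uint64_t val[TOD_NUM_CTRL_REGS];
	bool valid[TOD_NUM_CTRL_REGS]; /* has this register ever been written? */
};

/* Simulated hardware registers standing in for real SCOM accesses. */
static uint64_t hw_regs[TOD_NUM_CTRL_REGS];

/* Write a control register and remember the value in the cache. */
static void tod_ctrl_write(struct tod_reg_cache *c, unsigned int reg, uint64_t v)
{
	assert(reg < TOD_NUM_CTRL_REGS);
	hw_regs[reg] = v;
	c->val[reg] = v;
	c->valid[reg] = true;
}

/* Rewrite each register flagged in parity_err_mask (bit r = register r)
 * from its cached value. Returns the number of registers restored. */
static int tod_recover_parity(const struct tod_reg_cache *c, uint32_t parity_err_mask)
{
	int restored = 0;

	for (unsigned int r = 0; r < TOD_NUM_CTRL_REGS; r++) {
		if (!(parity_err_mask & (1u << r)) || !c->valid[r])
			continue; /* not flagged, or nothing cached to restore */
		hw_regs[r] = c->val[r];
		restored++;
	}
	return restored;
}
```

A register that was never written through the cache cannot be restored this
way, which is why the sketch skips entries without a valid cached value.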

The initial set of patches (1 through 5) sets up the framework to capture the
topology information needed to keep OPAL up to date with the current topology
configuration.

Patches 6 through 10 refactor the TOD failover recovery mechanism, including
detection of the sync/step network status, the topology switch, and fixing the
new backup topology.

The last two patches add recovery for parity errors on the TOD control
registers.

Thanks,
-Mahesh.

---

Mahesh Salgaonkar (12):
      opal: Query current TOD topology during chiptod init.
      opal: Identify chip role for a given topology.
      opal: Query chip TOD status for a given topology.
      opal: Introduce a function to check sync/step network status.
      opal: Modify chiptod_running_check() function to take chip id as argument.
      opal: Re-sync failed chiptod with neighboring chiptod.
      opal: Refactor TOD topology failover recovery.
      opal: Check if backup master is valid before topology switch.
      opal: Inform fsp about the topology switch.
      opal: Enable backup topology.
      opal: Cache chipTOD control register values.
      opal: Recover from TOD register parity errors.


 hw/chiptod.c               |  784 +++++++++++++++++++++++++++++++++++++++++++-
 include/fsp.h              |    9 +
 platforms/ibm-fsp/common.c |    3 
 3 files changed, 775 insertions(+), 21 deletions(-)
