Proposal for operations on isolated hardware units using Redfish logging

dhruvaraj S dhruvaraj at gmail.com
Fri Dec 11 01:55:31 AEDT 2020


Hi,
Please find below a proposal for operations on isolated hardware units using
Redfish logging.


Hardware Isolation
On systems with multiple processor units and other redundant vital resources,
system downtime can be prevented by isolating faulty hardware units.
Most of the actions required to isolate a part depend on the architecture and
are executed in the host. But the BMC needs to support a few steps: providing
a way for users to query the units in isolation, clearing isolation, isolating
a suspected part, and isolating a part when the host is down due to a fault in
a critical unit.
Since a user interface is needed for the above actions, I am proposing a
method that uses the Redfish log service to carry them out.

Requirements
Isolate a hardware unit when the user requests it.
Get the list of all isolated resources.
Remove the isolation of a hardware unit.
Remove all existing isolations.

Isolating a hardware unit:
redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware
{
  "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware",
  "@odata.type": "#LogService.v1_2_0.LogService",
  "Actions": {
    "#LogService.CollectDiagnosticData": {
      "target": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Actions/LogService.CollectDiagnosticData"
    }
  },
  "Description": "Isolated Hardware",
  "Entries": {
    "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries"
  },
  "Id": "IsolatedHardware",
  "Name": "Isolated Hardware LogService",
  "OverWritePolicy": "WrapsWhenFull"
}

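As a rough illustration of how a client might consume the LogService resource above, here is a minimal Python sketch that reads the action target and the Entries link out of the JSON. The helper names (action_target, entries_uri) are made up for this example, and the proposal does not define the request body for the action, so nothing is posted here.

```python
import json

# Trimmed copy of the IsolatedHardware LogService resource from the proposal.
LOG_SERVICE = json.loads("""
{
  "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware",
  "Actions": {
    "#LogService.CollectDiagnosticData": {
      "target": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Actions/LogService.CollectDiagnosticData"
    }
  },
  "Entries": {
    "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries"
  }
}
""")


def action_target(service, action="#LogService.CollectDiagnosticData"):
    """Return the URI of an advertised action, or None if it is absent.

    Hypothetical helper: the name is not part of the proposal.
    """
    return service.get("Actions", {}).get(action, {}).get("target")


def entries_uri(service):
    """Return the URI of the log entry collection (hypothetical helper)."""
    return service["Entries"]["@odata.id"]


if __name__ == "__main__":
    print(action_target(LOG_SERVICE))
    print(entries_uri(LOG_SERVICE))
```

Reading the target from the resource, rather than hard-coding it, keeps the client working if the service is exposed under a different path.
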
Listing isolated hardware units:
redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware >> Entries
{
  "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries",
  "@odata.type": "#LogEntryCollection.LogEntryCollection",
  "Description": "Collection of Isolated Hardware Components",
  "Members": [
    {
      "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries/1",
      "@odata.type": "#LogEntry.v1_7_0.LogEntry",
      "Created": "2020-10-15T10:30:08+00:00",
      "EntryType": "Event",
      "Id": "1",
      "Resolved": false,
      "Name": "Processor 1",
      "Links": {
        "OriginOfCondition": {
          "@odata.id": "/redfish/v1/Systems/system/Processors/cpu1"
        }
      },
      "Severity": "Critical",
      "SensorType": "Processor",
      "AdditionalDataURI": "/redfish/v1/Systems/system/LogServices/EventLog/attachment/111",
      "AdditionalDataSizeBytes": 1024
    }
  ],
  "Members@odata.count": 1,
  "Name": "Isolated Hardware Entries"
}

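To show how the list of units still in isolation could be derived from such a collection, here is a small Python sketch. It assumes "Resolved" is a JSON boolean and that the link object is named "Links". The second entry (Processor 2) and the helper name active_isolations are invented for illustration only.

```python
import json

# Entry 1 follows the example in the proposal; entry 2 is hypothetical and
# exists only to show a resolved entry being filtered out.
COLLECTION = json.loads("""
{
  "Members": [
    {
      "Id": "1",
      "Name": "Processor 1",
      "Resolved": false,
      "Links": {
        "OriginOfCondition": {
          "@odata.id": "/redfish/v1/Systems/system/Processors/cpu1"
        }
      }
    },
    {
      "Id": "2",
      "Name": "Processor 2",
      "Resolved": true,
      "Links": {
        "OriginOfCondition": {
          "@odata.id": "/redfish/v1/Systems/system/Processors/cpu2"
        }
      }
    }
  ],
  "Members@odata.count": 2
}
""")


def active_isolations(collection):
    """Return (Id, Name, origin URI) for entries not marked as resolved."""
    active = []
    for entry in collection.get("Members", []):
        if entry.get("Resolved") is True:
            continue  # unit was serviced and is back in service
        origin = (
            entry.get("Links", {})
            .get("OriginOfCondition", {})
            .get("@odata.id")
        )
        active.append((entry["Id"], entry["Name"], origin))
    return active


if __name__ == "__main__":
    for entry_id, name, origin in active_isolations(COLLECTION):
        print(entry_id, name, origin)
```
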
Users will be able to delete any single entry or all the entries. If an
isolated unit is serviced, that unit is returned to service; in such cases
the "Resolved" property of the corresponding entry will be set to true.
"AdditionalDataURI": a link to the error log associated with this isolation
action.
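
The proposal does not spell out the HTTP operations for deleting entries, so the following Python sketch is only one possible reading: it assumes a single entry is removed with an HTTP DELETE on its URI, and that "all entries" is handled by one DELETE per member. The function name clear_requests is hypothetical, and no requests are actually sent.

```python
ENTRIES_URI = "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries"


def clear_requests(member_uris, entry_id=None):
    """Return (method, uri) pairs that would clear isolation records.

    Assumption: deletion maps to HTTP DELETE on each entry URI. If entry_id
    is given, only that entry is targeted; otherwise every member is.
    """
    if entry_id is not None:
        return [("DELETE", f"{ENTRIES_URI}/{entry_id}")]
    return [("DELETE", uri) for uri in member_uris]


if __name__ == "__main__":
    members = [f"{ENTRIES_URI}/1", f"{ENTRIES_URI}/2"]
    print(clear_requests(members, entry_id="1"))  # remove one isolation
    print(clear_requests(members))                # remove all isolations
```
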
--------------
Dhruvaraj S

