Platform telemetry and health monitoring

Neeraj Ladkani neladk at microsoft.com
Tue Jun 25 17:43:48 AEST 2019


This is good stuff, Paul. Thank you for the detailed explanation in today's call. If you can share the mockup, it would be great.

I'll share notes soon. 

Neeraj

-----Original Message-----
From: openbmc <openbmc-bounces+neladk=microsoft.com at lists.ozlabs.org> On Behalf Of Paul.Vancil at dell.com
Sent: Monday, June 24, 2019 12:34 PM
To: openbmc at lists.ozlabs.org
Subject: Re: Platform telemetry and health monitoring

Re Redfish support for Telemetry,
Deepak noted that Redfish had a Telemetry schema that is a work-in-progress (wip).
Actually, Redfish Telemetry was released as part of the 2018.2 release in August 2018, and is being implemented by some BMCs now.
See:  https://www.dmtf.org/content/new-redfish-release-adds-openapi-30-support-telemetry
And  https://www.dmtf.org/sites/default/files/Redfish_2018_Release_2_Overview.pdf    (slides 2 and 7)
The white paper Deepak referred to was released as a wip earlier in the year. It has not been updated since, but it is still accurate as a general overview.
There is also a public-telemetry-mockup that is useful for understanding the model, but it has not been published yet. We could push the DMTF to publish it soon.
In Redfish, there are:
   --MetricDefinitions -- define a metric property (e.g. minimumConsumedWatts being the minimum of power consumption over an interval)
   --MetricReportDefinitions -- define a metric report: the set of MetricProperties it contains, what triggers generation of the report (e.g. scheduled, on trigger...), and how the report is delivered (logged to the MetricReports collection, sent as a RedfishEvent, etc.)
   --MetricReports -- the report itself, which is either logged or sent as an event
   --MetricTriggers -- define conditions that can trigger creation of a metric report, e.g. a sensor crossing a threshold
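To make the relationships between these four resource types concrete, here is a minimal Python sketch. The class names, fields, and the build_report/trigger_fires helpers are hypothetical simplifications, not the DMTF schema; the enum strings ("Minimum", "Periodic", "LogToMetricReportsCollection", ...) are my reading of the 2018.2 schema and should be checked against the published version.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, simplified stand-ins for the Redfish telemetry resources.
# Field names loosely echo the schema but are not an exact copy of it.


@dataclass
class MetricDefinition:
    metric_id: str            # e.g. "minimumConsumedWatts"
    units: str                # e.g. "W"
    collection_function: str  # "Minimum", "Maximum", "Average" or "Summation"


@dataclass
class MetricReportDefinition:
    report_id: str
    metric_ids: List[str]      # MetricDefinitions carried by this report
    report_type: str           # e.g. "Periodic" or "OnChange"
    report_actions: List[str]  # e.g. ["LogToMetricReportsCollection", "RedfishEvent"]


@dataclass
class Trigger:
    metric_id: str
    upper_threshold: float
    report_id: str  # report to generate when the threshold is crossed


COLLECTION_FUNCTIONS = {
    "Minimum": min,
    "Maximum": max,
    "Average": lambda xs: sum(xs) / len(xs),
    "Summation": sum,
}


def collect(defn: MetricDefinition, samples: List[float]) -> float:
    """Reduce one interval's raw samples with the definition's collection function."""
    return COLLECTION_FUNCTIONS[defn.collection_function](samples)


def build_report(rdef: MetricReportDefinition,
                 defs: Dict[str, MetricDefinition],
                 samples: Dict[str, List[float]]) -> dict:
    """Build a MetricReport-like dict that carries values, not descriptive metadata."""
    values = [{"MetricId": mid, "MetricValue": str(collect(defs[mid], samples[mid]))}
              for mid in rdef.metric_ids]
    return {"Id": rdef.report_id, "MetricValues": values}


def trigger_fires(trig: Trigger, value: float) -> bool:
    """True when the observed value crosses the trigger's upper threshold."""
    return value > trig.upper_threshold


# Example: minimum power over one interval, reported periodically.
defs = {"minimumConsumedWatts": MetricDefinition("minimumConsumedWatts", "W", "Minimum")}
rdef = MetricReportDefinition("PowerReport", ["minimumConsumedWatts"], "Periodic",
                              ["LogToMetricReportsCollection"])
report = build_report(rdef, defs, {"minimumConsumedWatts": [250.0, 240.5, 260.0]})
alarm = trigger_fires(Trigger("minimumConsumedWatts", 300.0, "PowerReport"), 240.5)
```

Here the report carries "240.5" for minimumConsumedWatts, and the trigger does not fire because 240.5 W is below the 300 W threshold.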

Metric data can be collected by the BMC, and then read by a client with Redfish GET requests, or can be sent autonomously as RedfishEvents.
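The two delivery paths can be sketched from the BMC side as follows. This is a hypothetical in-memory model, not OpenBMC code: the publish/handle_get helpers and the subscriber callbacks are illustrative, and the collection URI is my assumption based on the Redfish TelemetryService layout.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory BMC state: the MetricReports collection that clients
# poll with GET, and the subscribers that receive pushed reports.
REPORTS_URI = "/redfish/v1/TelemetryService/MetricReports"
metric_reports: Dict[str, dict] = {}
subscribers: List[Callable[[str], None]] = []


def publish(report: dict, report_actions: List[str]) -> None:
    """Deliver a freshly generated report according to its ReportActions."""
    uri = f"{REPORTS_URI}/{report['Id']}"
    body = {**report, "@odata.id": uri}
    if "LogToMetricReportsCollection" in report_actions:
        metric_reports[uri] = body  # pull model: stored until a client GETs it
    if "RedfishEvent" in report_actions:
        payload = json.dumps(body)
        for send in subscribers:    # push model: sent without a client request
            send(payload)


def handle_get(uri: str) -> Tuple[int, str]:
    """Minimal GET handler: 200 plus the stored report, or 404."""
    if uri in metric_reports:
        return 200, json.dumps(metric_reports[uri])
    return 404, ""


# Example: one subscriber, and a report that is both logged and pushed.
received: List[str] = []
subscribers.append(received.append)
publish({"Id": "PowerReport", "MetricValues": []},
        ["LogToMetricReportsCollection", "RedfishEvent"])
status, body = handle_get(f"{REPORTS_URI}/PowerReport")
```

The same report definition can request both actions, so a client may poll the collection while an event subscriber receives the identical payload.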

The data is JSON encoded and formatted along the lines of Redfish responses, but the reports generally contain only the relevant telemetry data (with minimal descriptive metadata), since the descriptive metadata is all defined by the MetricReportDefinitions and MetricDefinitions associated with the report.
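The split between compact reports and separately defined metadata means a client typically fetches the definitions once and joins them with each incoming report. Below is a small Python sketch of that join; the report body and the METRIC_DEFINITIONS cache are hypothetical sample data, and the property names only approximate the Redfish MetricReport schema.

```python
import json

# Hypothetical MetricReport body: values plus the identifiers needed to look up
# their meaning, but no units or descriptions.
REPORT_JSON = """
{
  "Id": "PowerReport",
  "MetricReportDefinition": {"@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/PowerReport"},
  "MetricValues": [
    {"MetricId": "minimumConsumedWatts", "MetricValue": "240.5", "Timestamp": "2019-06-24T12:00:00Z"},
    {"MetricId": "maximumConsumedWatts", "MetricValue": "310.0", "Timestamp": "2019-06-24T12:00:00Z"}
  ]
}
"""

# Hypothetical metadata a client would read once from the MetricDefinitions and
# cache, instead of receiving it again in every report.
METRIC_DEFINITIONS = {
    "minimumConsumedWatts": {"Units": "W", "Description": "Minimum power consumption over the interval"},
    "maximumConsumedWatts": {"Units": "W", "Description": "Maximum power consumption over the interval"},
}


def annotate(report_json: str, definitions: dict) -> list:
    """Join each reported value with its cached definition."""
    report = json.loads(report_json)
    rows = []
    for mv in report["MetricValues"]:
        meta = definitions.get(mv["MetricId"], {})
        rows.append((mv["MetricId"], float(mv["MetricValue"]),
                     meta.get("Units", "?"), mv["Timestamp"]))
    return rows


rows = annotate(REPORT_JSON, METRIC_DEFINITIONS)
```

Keeping units and descriptions out of the report keeps each periodic payload small, at the cost of the client having to resolve definitions before it can interpret the values.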
 
I think the Redfish Telemetry model is very general in nature, and thus supports just about any type of metric or telemetry data one might want.
So it is worth strong consideration as the basis for OpenBMC telemetry.

Note that the model does allow users to define metric reports (based on supported Redfish properties); however, it does not require implementations to support user-defined custom reports (which could be complicated to implement).
I think most early implementations will support some set of predefined MetricReportDefinitions.
However, the DMTF has not officially published any 'standard' report definitions.
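An implementation that ships only predefined definitions could look roughly like the Python sketch below. The catalog contents, the allow_custom flag, and the HTTP status codes are hypothetical choices for illustration; the property names approximate the Redfish MetricReportDefinition schema and should be verified against it.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog of predefined, read-only MetricReportDefinitions that a
# BMC might ship. Property names approximate the Redfish schema.
PREDEFINED: Dict[str, dict] = {
    "PowerReport": {
        "MetricReportDefinitionType": "Periodic",
        "Schedule": {"RecurrenceInterval": "PT60S"},  # ISO 8601 duration
        "ReportActions": ["LogToMetricReportsCollection"],
    },
    "ThermalEvents": {
        "MetricReportDefinitionType": "OnChange",
        "ReportActions": ["RedfishEvent"],
    },
}


def list_definitions() -> List[str]:
    """What a client would discover by reading the definitions collection."""
    return sorted(PREDEFINED)


def create_definition(name: str, body: dict,
                      allow_custom: bool = False) -> Tuple[int, str]:
    """Handle a request to add a definition; status codes are illustrative."""
    if not allow_custom:
        return 405, "custom MetricReportDefinitions are not supported"
    if name in PREDEFINED:
        return 409, "definition already exists"
    PREDEFINED[name] = body
    return 201, name
```

With allow_custom left False, clients can only select from the shipped catalog, which avoids validating arbitrary user-defined property sets on the BMC.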

Thanks,  Paul Vancil   --Dell ESI


----------------------------------------------------------------------

Message: 1
Date: Thu, 20 Jun 2019 14:54:35 +0530
From: Deepak Kodihalli <dkodihal at linux.vnet.ibm.com>
To: Neeraj Ladkani <neladk at microsoft.com>, OpenBMC Maillist
	<openbmc at lists.ozlabs.org>
Subject: Re: Platform telemetry and health monitoring
Message-ID: <582a29cf-e3bf-f7d3-2e78-c743c3a6a2d2 at linux.vnet.ibm.com>

On 19/06/19 11:11 AM, Neeraj Ladkani wrote:
> In the last meeting, we discussed that telemetry data can be collected 
> using "tools" and exported using binary "blobs".
> 
> Should we define a standard data format so that information can be parsed through a standard mechanism and help in taking specific actions?
> 
> Host CPU
> Memory
> Network Adapter
> GPUs/IPUs
> BMCs
> 

Redfish has a work-in-progress [1] Telemetry schema, and I believe that via this schema one can pull telemetry information, as well as have the BMC push telemetry information via server-sent events. Are you looking at this as an option?

[1]
https://www.dmtf.org/documents/redfish-spmf/redfish-telemetry-white-paper-010a
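For the server-sent events path, a client reads a text/event-stream body in which events are separated by blank lines and the payload is carried in "data:" lines. Below is a simplified Python parser for that framing; it ignores reconnection, retry, and last-event-ID handling, and it assumes (not confirmed here) that the BMC places a JSON report in each event's data field.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from a text/event-stream (simplified framing)."""
    event: Dict[str, str] = {}
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                  # blank line ends the current event
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):        # comment, often used as a keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):       # one leading space is not part of the value
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field in ("id", "event"):
            event[field] = value


# Example stream: a keep-alive comment, then one event whose JSON spans two data lines.
stream = [
    ": keep-alive\n",
    "id: 1\n",
    'data: {"Id": "PowerReport",\n',
    'data:  "MetricValues": []}\n',
    "\n",
]
events = list(parse_sse(stream))
report = json.loads(events[0]["data"])
```

Multi-line data fields are rejoined with newlines before decoding, so a JSON payload split across several "data:" lines still parses as one document.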

Regards,
Deepak

> Thanks
> Neeraj
