Platform telemetry and health monitoring

Matthew Barth msbarth at linux.ibm.com
Sat Jul 13 02:37:31 AEST 2019


A proposed design and the associated dbus interfaces for power 
metrics are in gerrit, and I'd welcome any feedback on them. 
The interfaces were created to provide average and maximum power 
consumption metrics over a configurable duration of time. They are 
intended to be flexible, providing these metrics to the end user through 
any protocol able to access the dbus properties calculated by a BMC-side 
application.
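
As a rough illustration of "average and maximum over a configurable 
duration", here is a minimal Python sketch of a sliding time window over 
power samples. It is not taken from the proposed design or interfaces; 
the class name PowerWindow, its methods, and the choice to average 
samples without time weighting are all assumptions made for this example.

```python
from collections import deque


class PowerWindow:
    """Hypothetical helper: average and maximum power over a time window."""

    def __init__(self, duration_s):
        # Configurable window length in seconds (assumed unit).
        self.duration_s = duration_s
        self.samples = deque()  # (timestamp_s, watts), oldest first

    def add(self, timestamp_s, watts):
        """Record a sample and drop samples that fell out of the window."""
        self.samples.append((timestamp_s, watts))
        cutoff = timestamp_s - self.duration_s
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def average(self):
        """Unweighted mean of the samples in the window, or None if empty."""
        if not self.samples:
            return None
        return sum(w for _, w in self.samples) / len(self.samples)

    def maximum(self):
        """Largest sample in the window, or None if empty."""
        if not self.samples:
            return None
        return max(w for _, w in self.samples)


window = PowerWindow(duration_s=10)
window.add(0, 100.0)
window.add(5, 200.0)
window.add(12, 300.0)  # the sample taken at t=0 is now outside the window
print(window.average(), window.maximum())
```

A BMC-side application could publish values like these as properties; 
how the window duration is configured and exposed is left to the design 
under review.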

Design: https://gerrit.openbmc-project.xyz/23493

Dbus Interfaces: https://gerrit.openbmc-project.xyz/23405

Matt

On 6/25/19 2:43 AM, Neeraj Ladkani wrote:
> This is good stuff, Paul. Thank you for the detailed explanation in today's call. If you can share the mockup, it would be great.
> 
> I'll share notes soon.
> 
> Neeraj
> 
> -----Original Message-----
> From: openbmc <openbmc-bounces+neladk=microsoft.com at lists.ozlabs.org> On Behalf Of Paul.Vancil at dell.com
> Sent: Monday, June 24, 2019 12:34 PM
> To: openbmc at lists.ozlabs.org
> Subject: Re: Platform telemetry and health monitoring
> 
> Re Redfish support for Telemetry,
> Deepak noted that Redfish had a Telemetry schema that is a work-in-progress (wip).
> Actually, Redfish Telemetry was released as part of the 2018.2 release in August 2018, and is being implemented by some BMCs now.
> See:  https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.dmtf.org%2Fcontent%2Fnew-redfish-release-adds-openapi-30-support-telemetry&data=02%7C01%7Cneladk%40microsoft.com%7Ca3197c4ea71c426c110f08d6f921c441%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C1%7C636970320773102040&sdata=2U11E0iOrhnRGxl8FgmBrjyaPzHKbWajeT509U5tmXw%3D&reserved=0
> And  https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.dmtf.org%2Fsites%2Fdefault%2Ffiles%2FRedfish_2018_Release_2_Overview.pdf&data=02%7C01%7Cneladk%40microsoft.com%7Ca3197c4ea71c426c110f08d6f921c441%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C1%7C636970320773102040&sdata=%2B4JjaeCJ870NOUS%2F7%2FhKSu6TFeexSayU45N08YYj0dM%3D&reserved=0    (slides 2, 7 )
> The White Paper Deepak referred to was released as a wip earlier in the year.   It was not updated but is accurate as a general overview.
> There is a public-telemetry-mockup that is nice for understanding the model, but not yet published.   We could push them to do that soon.
> In Redfish, there are:
>     --MetricDefinitions -- that define a metric property (eg minimumConsumedWatts being the min of power consumption over an interval)
>     --MetricReportDefinitions -- define a metric report consisting of a set of MetricProperties, what triggers the generation of the report (eg scheduled, on trigger...), and how to send the report (log to metricReports Collection, send RedfishEvent, etc)
>     --MetricReports -- the report, which can be logged or sent as an event
>     --MetricTriggers -- define triggers that can trigger a metric report creation, e.g. a sensor crossing a threshold
> 
> Metric data can be collected by the BMC, and then read by a client with Redfish GET requests, or can be sent autonomously as RedfishEvents.
> 
> The data is JSON encoded and formatted along the lines of Redfish responses, but the reports generally contain only the relevant telemetry data (with minimal describing metadata), since the descriptive metadata is all defined by the MetricReportDefinitions and MetricDefinitions that are associated with the report.
>   
> The Redfish Telemetry model is very general in nature I think--and thus supports about any type of metric or telemetry data one might want.
> So this is worth a strong consideration as the basis for OpenBMC telemetry.
> 
> Note that the model does support allowing users to define metric reports (based on supported Redfish properties), however it does not require allowing users to define custom reports (which could be complicated to implement).
> I think most early implementations will support some set of pre-defined MetricReportDefinitions.
> However the DMTF has not officially published any 'standard' Report Definitions.
> 
> Thanks,  Paul Vancil   --Dell ESI
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 20 Jun 2019 14:54:35 +0530
> From: Deepak Kodihalli <dkodihal at linux.vnet.ibm.com>
> To: Neeraj Ladkani <neladk at microsoft.com>, OpenBMC Maillist
> 	<openbmc at lists.ozlabs.org>
> Subject: Re: Platform telemetry and health monitoring
> Message-ID: <582a29cf-e3bf-f7d3-2e78-c743c3a6a2d2 at linux.vnet.ibm.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
> 
> On 19/06/19 11:11 AM, Neeraj Ladkani wrote:
>> In last meeting, we discussed that telemetry data can be collected
>> using "tools" and exported using binary "blobs",
>>
>> Should we define a standard data format so that information can be parsed through a standard mechanism and help in taking specific actions?
>>
>> Host CPU
>> Memory
>> Network Adapter
>> GPUs/IPUs
>> BMCs
>>
> 
> Redfish has a work-in-progress [1] Telemetry schema, and I believe via this schema, one can pull out telemetry information, as well as have the BMC push out Telemetry information via server-sent events. Are you looking at this as an option?
> 
> [1]
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.dmtf.org%2Fdocuments%2Fredfish-spmf%2Fredfish-telemetry-white-paper-010a&data=02%7C01%7Cneladk%40microsoft.com%7Ca3197c4ea71c426c110f08d6f921c441%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C1%7C636970320773102040&sdata=YhukH6juS%2FbWyFd7PngBHmbHwsY98%2FEVe2zYuYb2d%2Fs%3D&reserved=0
> 
> Regards,
> Deepak
> 
>> Thanks
>> Neeraj
> 
> 


