Platform telemetry and health monitoring

Neeraj Ladkani neladk at microsoft.com
Wed Jun 19 15:41:26 AEST 2019


In the last meeting, we discussed that telemetry data can be collected using “tools” and exported as binary “blobs”.

Should we define a standard data format so that this information can be parsed through a standard mechanism and used to take specific actions? This would cover data from:

Host CPU
Memory
Network Adapter 
GPUs/IPUs
BMCs
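To make the question concrete, here is a minimal sketch of what a common format could look like: every reading, whatever the component, goes into one record that serializes to JSON. The field names (`component`, `instance`, `metric`, `value`, `unit`, `timestamp`) and the category labels are hypothetical, derived from the list above rather than from any agreed spec.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class Component(str, Enum):
    """Component categories taken from the list above (names are hypothetical)."""
    HOST_CPU = "host_cpu"
    MEMORY = "memory"
    NETWORK_ADAPTER = "network_adapter"
    GPU_IPU = "gpu_ipu"
    BMC = "bmc"


@dataclass(frozen=True)
class TelemetryRecord:
    """One reading in a hypothetical common telemetry format."""
    component: Component
    instance: str    # which device of that kind, e.g. "cpu0" or "dimm3"
    metric: str      # e.g. "temperature", "power", "ecc_errors"
    value: float
    unit: str        # e.g. "Cel" or "W"
    timestamp: int   # seconds since the Unix epoch

    def to_json(self) -> str:
        """Serialize with sorted keys so the output is deterministic."""
        fields = asdict(self)
        fields["component"] = self.component.value
        return json.dumps(fields, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "TelemetryRecord":
        """Parse a record produced by to_json; unknown components raise ValueError."""
        fields = json.loads(text)
        fields["component"] = Component(fields["component"])
        return cls(**fields)
```

A consumer receiving these records can dispatch on `component` and `metric` without knowing which tool produced the reading, which is the "standard mechanism" part of the question.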


Thanks
Neeraj

From: openbmc <openbmc-bounces+neladk=microsoft.com at lists.ozlabs.org> On Behalf Of Neeraj Ladkani
Sent: Tuesday, June 18, 2019 1:59 PM
To: OpenBMC Maillist <openbmc at lists.ozlabs.org>
Subject: RE: Platform telemetry and health monitoring

1. How do we define what data is to be collected, and how? We need a way to let the BMC know:
a. What data to read?
b. When to read it?
c. How to read it?

2. Does Redfish support pulling telemetry from the system?
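Questions 1a to 1c could be answered with a declarative collection policy handed to the BMC. A minimal sketch, assuming a hypothetical policy format in which each rule names the metric to read (what), a polling interval in seconds (when), and a reader method label such as "dbus", "ipmi", or "peci" (how; the labels are illustrative only). The scheduler below only decides which rules are due; the actual readers are out of scope.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CollectionRule:
    """Hypothetical policy entry answering what / when / how to read."""
    metric: str      # what: e.g. "cpu0_temperature"
    interval_s: int  # when: polling period in seconds
    method: str      # how: illustrative label such as "dbus", "ipmi", "peci"


class Scheduler:
    """Tracks when each rule last ran and reports which rules are due."""

    def __init__(self, rules: List[CollectionRule]) -> None:
        self.rules = rules
        self.last_run: Dict[str, float] = {}

    def due(self, now: float) -> List[CollectionRule]:
        """Return rules whose interval has elapsed and mark them as run at `now`."""
        ready = []
        for rule in self.rules:
            last = self.last_run.get(rule.metric)
            if last is None or now - last >= rule.interval_s:
                ready.append(rule)
                self.last_run[rule.metric] = now
        return ready
```

With a 10-second CPU rule and a 60-second DIMM rule, both are due at t=0, nothing at t=5, only the CPU rule at t=10, and both again at t=60. Resetting `last_run` to `now` (rather than to `last + interval`) lets slow readers drift instead of bunching up; either choice is defensible.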

Neeraj


On 6/12/19 11:58 AM, Neeraj Ladkani wrote:
Thanks, Kun, for summarizing the notes.
 
For detailed notes: https://github.com/openbmc/openbmc/wiki/Platform-telemetry-and-health-monitoring-Work-Group
 
Neeraj
 
From: openbmc <openbmc-bounces+neladk=microsoft.com at lists.ozlabs.org> On Behalf Of Kun Yi
Sent: Tuesday, June 11, 2019 11:24 AM
To: Alexander Amelkin <a.amelkin at yadro.com>
Cc: OpenBMC Maillist <openbmc at lists.ozlabs.org>
Subject: Re: Platform telemetry and health monitoring
 
Neeraj mentioned he will send out the meeting minutes. He will also look into setting up a wiki page to hold the contents as well as the minutes.
 
A few quick notes off the top of my head from the kick-off meeting:
- did a round table; all the orgs have similar requirements
- need to look into how the existing infra fits the needs and where it falls short
- will have workstreams for:
    - what to collect
    - how to collect
    - how to store
    - how to export
- collectd sounds interesting and promising for collecting metrics
- IPMI SELs have limitations as an event reporting mechanism; we may need a new event or error log reporting mechanism to aggregate fault logs from different components
- will need to look into Redfish and extend the specs as necessary to fit our needs
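Aggregating fault logs from different components can be pictured as merging per-component logs, each already in time order, into a single timeline. A minimal sketch using Python's `heapq.merge`; the entry layout (`timestamp`, `source`, `message`) and the sample entries are hypothetical, not taken from SEL or any existing OpenBMC log format.

```python
import heapq
from typing import Iterable, List, NamedTuple


class FaultEntry(NamedTuple):
    """Hypothetical fault-log entry reported by one component."""
    timestamp: int  # seconds since the Unix epoch
    source: str     # reporting component, e.g. "cpu0" or "psu0"
    message: str


def aggregate(*logs: Iterable[FaultEntry]) -> List[FaultEntry]:
    """Merge logs that are each sorted by timestamp into one sorted list.

    heapq.merge is lazy and never re-sorts the inputs; entries with equal
    timestamps keep the order in which their logs were passed in.
    """
    return list(heapq.merge(*logs, key=lambda e: e.timestamp))
```

Because each source only needs to keep its own log ordered, new components can be added without changing the aggregator.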
 
On Tue, Jun 11, 2019 at 2:02 AM Alexander Amelkin <a.amelkin at yadro.com> wrote:
I second the idea of reusing collectd. It's pretty standard and popular.
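As a rough illustration of how little glue collectd may need, its Exec plugin runs an external program and reads lines in collectd's plain-text protocol (`PUTVAL "host/plugin-instance/type-instance" interval=N N:value`) from that program's stdout. A minimal sketch; the plugin name `bmc`, the instance names, and the `read_sensor` stub are hypothetical, while `COLLECTD_HOSTNAME` and `COLLECTD_INTERVAL` are the environment variables the Exec plugin provides. `temperature` is assumed to be available in collectd's `types.db`.

```python
import os
import time


def putval_line(host: str, plugin_instance: str, type_instance: str,
                interval: float, value: float) -> str:
    """Format one PUTVAL line for a 'temperature' value.

    'N' tells collectd to timestamp the value with the current time.
    """
    identifier = f"{host}/bmc-{plugin_instance}/temperature-{type_instance}"
    return f'PUTVAL "{identifier}" interval={interval:g} N:{value:g}'


def read_sensor() -> float:
    """Hypothetical stub; a real script would read a BMC sensor here."""
    return 42.5


def main() -> None:
    """Emit one reading per interval, forever, as an Exec plugin program would."""
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = float(os.environ.get("COLLECTD_INTERVAL", "10"))
    while True:
        # flush=True so collectd sees each line immediately through the pipe
        print(putval_line(host, "host", "cpu0", interval, read_sensor()), flush=True)
        time.sleep(interval)
```

To use it as an Exec program, the script would end with a call to `main()` and be listed in the `exec` plugin section of collectd's configuration.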

With best regards,
Alexander Amelkin,
Leading BMC Software Engineer, YADRO
https://yadro.com

05.06.2019 15:49, Brad Bishop wrote:
> On Tue, Jun 04, 2019 at 12:35:05PM -0700, Kun Yi wrote:
>> FYI: Srinivas, Neeraj, and I are finalizing a time slot for the kick off
>> meeting. We are thinking about a bi-weekly discussion.
>>
>> Also, I'm drafting a version of a BMC metrics collection daemon. The first
>> draft is up on https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/22257,
>> which we will probably go over during the meeting.
>
> I just wanted to point out the collectd project: https://collectd.org/
>
> I'm not sure if it is suitable or not but it seems like a pretty close match to what you are trying to do and it would be a lot of code you don't have to write.
>
> Just something to consider.
>
> thx - brad


 
-- 
Regards,
Kun

