RFC for Telemetry data collection
Deepak Kodihalli
dkodihal at linux.vnet.ibm.com
Wed Mar 14 01:50:33 AEDT 2018
On 13/03/18 7:53 pm, Kurt Taylor wrote:
> Hi,
>
> I'd like to bump this topic and add some more details. I'd like to
> discuss design proposals/directions for a couple of things:
>
> 1) A short/mid term proposal for telemetry requirements specific to
> IBM labs (which need to be implemented in a relatively short span of
> time, so there may not be the bandwidth to write an entirely new
> application not based on D-Bus or the OpenBMC REST API).
> 2) Industry standard methods for storing and retrieving telemetry
> data - thoughts on how to get here.
>
>
> 1) Telemetry requirements specific to IBM labs
> Here are the requirements and a design proposal.
>
> a) Instantaneous readings, such as temperatures, currents, errors,
> events etc. Let's call this Layer 0.
>
> Proposal:
> - The D-Bus model is the source for instantaneous readings. This
> means there would be D-Bus objects representing this data, and hence
> an OpenBMC REST API around it.
> - These D-Bus objects would not necessarily implement the same D-Bus
> interfaces.
> - Interested clients can read these D-Bus objects via the OpenBMC
> REST API.
> - If clients are interested in being notified about "changes" to the
> readings, that's possible via the existing event notification over
> WebSockets mechanism.
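
As a rough illustration of the Layer 0 read path, here is a minimal sketch of how a client might unpack a reading from a REST response body. It assumes the response is a JSON envelope with the object's properties under a "data" key, and that the object exposes Value and Scale properties. The helper name parse_reading and the sample payload are hypothetical and not taken from a real BMC.

```python
import json

def parse_reading(body: str) -> float:
    """Return the scaled reading from a REST response body.

    Assumes an envelope like {"data": {"Value": ..., "Scale": ...}};
    this shape is an assumption of the sketch, not a documented contract.
    """
    envelope = json.loads(body)
    props = envelope["data"]
    # Scale is treated as a power-of-ten exponent applied to the raw Value.
    return props["Value"] * 10 ** props.get("Scale", 0)

# Hypothetical response body for an ambient temperature object.
sample = '{"data": {"Value": 23500, "Scale": -3}, "status": "ok"}'
print(parse_reading(sample))
```

Keeping the parsing separate from the HTTP call means the same helper could be reused for readings pushed over the WebSocket notification path, if those carry the same properties.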
>
>
> This would also map well into an OBMC MIB extension for example.
>
>
>
> b) Instantaneous aggregations - this would mostly apply to, but may
> not be limited to, readings such as temperatures and currents. Let's
> call this Layer 1. This is basically to answer questions such as
> "what is the min/max/average over the last X seconds?". We have a
> requirement to do such aggregations on the BMC.
>
>
> I would be interested in why aggregations (and historical - Layer 2) are
> a requirement and not just handled by the monitoring/event management
> app as done in network management. If this work is to be done in the
> BMC, it needs to be user definable and able to be turned off for
> resource-critical situations.
Right, it should be possible to turn off the Layer 1 and 2 aggregation
apps, and even to leave them out of the BMC image altogether.
As for why the aggregations need to be done on the BMC: I think that's
the expectation of some of the IBM monitoring tools. I'm sure Todd
Rosedahl would have a better answer here.
>
> Proposal:
> - Aggregations are represented as D-Bus objects, created by a
> telemetry app. For example, if we need to know the min/max/avg
> ambient temp for the last 5 minutes, and say the ambient temp is
> usually at temps/ambient, the aggregation could be at
> temps/aggregations/ambient.
> - Implement D-Bus interfaces to denote aggregations, e.g. the
> temps/aggregations/ambient object could implement a D-Bus interface
> describing min/max/avg properties.
> - Aggregation objects will have the values as described in the D-Bus
> interface (such as min/max/avg), and a timestamp, as properties.
> - Enable a config (eg JSON) to let the telemetry app know things
> like : What (supported) aggregations should be performed
> (min/max/avg)? What D-Bus objects should be aggregated? How
> frequently should they be aggregated? What should be the paths of
> the aggregations? Potentially add a REST API to allow changing the
> (JSON) config at run-time.
> - It will be possible to read all aggregation objects, or
> aggregation objects of a specific type via one REST call.
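
To make the Layer 1 proposal a bit more concrete, here is a minimal sketch of a sliding-window aggregator driven by a JSON config. The config keys (source, path, window_seconds, aggregations) and the class name WindowAggregator are hypothetical; no schema has been agreed on. It only covers the aggregation logic, and publishing the result as D-Bus properties is left out.

```python
import json
import time
from collections import deque

# Hypothetical config; key names are illustrative, not an agreed schema.
CONFIG = json.loads("""
{
  "source": "temps/ambient",
  "path": "temps/aggregations/ambient",
  "window_seconds": 300,
  "aggregations": ["min", "max", "avg"]
}
""")

class WindowAggregator:
    """Keeps (timestamp, value) samples and aggregates over the last N seconds."""

    def __init__(self, window_seconds, aggregations):
        self.window = window_seconds
        self.aggregations = aggregations
        self.samples = deque()

    def add(self, value, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, value))
        self._expire(now)

    def _expire(self, now):
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= now - self.window:
            self.samples.popleft()

    def properties(self, now=None):
        """Return the values an aggregation object would expose as properties."""
        now = time.time() if now is None else now
        self._expire(now)
        values = [v for _, v in self.samples]
        if not values:
            return {"timestamp": now}
        funcs = {"min": min, "max": max, "avg": lambda xs: sum(xs) / len(xs)}
        result = {name: funcs[name](values) for name in self.aggregations}
        result["timestamp"] = now
        return result

agg = WindowAggregator(CONFIG["window_seconds"], CONFIG["aggregations"])
for t, temp in [(0, 22.0), (100, 24.0), (200, 26.0), (400, 28.0)]:
    agg.add(temp, now=t)
print(CONFIG["path"], agg.properties(now=400))
```

In this run the samples taken at t=0 and t=100 have aged out of the 300 second window by t=400, so only the last two readings contribute to min/max/avg.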
>
>
> c) Historical aggregations or snapshot. Let's call this Layer 2.
> This is to meet needs such as "a reading corresponding to every X
> minutes in a period of Y hours". This can be a snapshot of Layer 1
> or Layer 0 D-Bus objects. We have a requirement to store this
> snapshot on the BMC.
>
> Proposal:
> - The snapshot will be represented as a set of D-Bus objects. For eg
> if one needs an hourly reading for a period of 24 hours, the objects
> could be at temps/aggregations/ambient/per-hour/{1..24}.
> - Enable a config to let a telemetry app know things like: What
> D-Bus objects should I keep a history of? What is the duration of
> the snapshot? At what frequency should entries be added into the
> snapshot? Once the snapshot is full, should the entries roll, or
> should we restart? Potentially add a REST API to allow changing the
> (JSON) config at run-time.
> - The historical aggregations can be read via one REST call. For the
> REST server it would most likely be a single D-Bus call as well, e.g.
> if there's an object manager at temps/aggregations/ambient/per-hour.
> - These objects in the snapshot will implement the same interfaces
> as Layer 1 objects, so they will have the same properties (eg
> min/max/avg, timestamp).
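
The "roll or restart" question for Layer 2 can be sketched as below. This is only an illustration under assumptions: the class name Snapshot, the policy names "roll" and "restart", and the numbering of entries from 1 (oldest first) are hypothetical choices, and the entries are plain dicts standing in for D-Bus objects.

```python
from collections import deque

class Snapshot:
    """Fixed-size history of aggregation entries.

    policy "roll": the oldest entry is dropped once the snapshot is full.
    policy "restart": the snapshot is cleared and filling starts over.
    """

    def __init__(self, base_path, capacity, policy="roll"):
        if policy not in ("roll", "restart"):
            raise ValueError("unknown policy: " + policy)
        self.base_path = base_path
        self.capacity = capacity
        self.policy = policy
        self.entries = deque()

    def add(self, entry):
        if len(self.entries) == self.capacity:
            if self.policy == "roll":
                self.entries.popleft()
            else:
                self.entries.clear()
        self.entries.append(entry)

    def objects(self):
        """Map object paths (base_path/1 .. base_path/N) to entries, oldest first."""
        return {
            "{}/{}".format(self.base_path, i): entry
            for i, entry in enumerate(self.entries, start=1)
        }

snap = Snapshot("temps/aggregations/ambient/per-hour", capacity=3, policy="roll")
for hour in range(5):
    snap.add({"min": 20 + hour, "max": 25 + hour,
              "avg": 22.5 + hour, "timestamp": hour * 3600})
for path, entry in snap.objects().items():
    print(path, entry)
```

One consequence of renumbering on every roll is that a given path does not refer to a fixed hour, so clients would need to rely on the timestamp property rather than the path index.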
>
>
> d) Some notes
> - With the proposal above, the API to retrieve the telemetry data is
> the current OpenBMC REST API, so it may not readily work with
> telemetry tools relying on an industry-standard API (see point 2
> below), but it seems to be the feasible option for implementing
> IBM's requirements in the expected timelines.
> - Layer 1 and Layer 2 telemetry apps would be different processes,
> and can function independent of each other.
>
>
>
> 2) Industry standard methods for storing and retrieving telemetry data
>
> - With the proposal above, the instantaneous readings are D-Bus
> objects, the instantaneous and historical aggregations are D-Bus
> objects as well. The API is the OpenBMC REST API.
> - Typically, aggregations may not have to happen on the BMC, in
> which case one can turn off layers 1 and 2.
> - This is regarding how the telemetry data is presented, and how
> we'd eventually not use the current OpenBMC REST API in production.
> I've heard (mostly from people on the To: list) of the following
> industry-standard ways to represent/retrieve telemetry data. This
> would mean transforming layer 0 D-Bus objects into these :
> - Via Redfish (events) API
> - Via IPMI events/PEF
>
>
> Meh. I'd stick with Redfish/OBMC REST API over this one.
>
> - Via SNMP traps
>
>
> If there is interest here, I have experience designing MIB extensions
> and sub-agents to support them.
>
> - Via an sqlite db, and have something like Logstash parse it
>
>
> Seems very heavy for BMC.
I tend to agree.
> Kurt Taylor (krtaylor)
>
Regards,
Deepak