RFC for Telemetry data collection
Rick Altherr
raltherr at google.com
Fri Sep 8 04:41:48 AEST 2017
I have many opinions on telemetry data formats and APIs. What I'm seeing
in your proposal looks pretty good with some subtlety in the details. For
example, I expect to collect most data at least once-per-second, not log
anything locally, and not alert. I'll do all aggregation and thresholding
at a higher level in the software stack. I also, ideally, want very
descriptive information about where in the system the sensor is. I've
attached a screenshot of what our existing host-based reporting software
makes available to higher-level software. This is a view via the
human-readable web interface; the data is normally served via protobufs.
Rick
On Thu, Sep 7, 2017 at 8:20 AM, tomjose <tomjose at linux.vnet.ibm.com> wrote:
> Hello,
>
> I am working on the issue (https://github.com/openbmc/openbmc/issues/1957)
> to design a telemetry application for OpenBMC. Below I outline a
> rough idea of how we plan to go about it. Please share your thoughts and
> feedback on this proposal. This issue depends on the design evolving
> out of the following issues, since this app would use the capabilities
> they provide (https://github.com/openbmc/openbmc/issues/1856,
> https://github.com/openbmc/openbmc/issues/2102).
>
> Summary of the requirements we came across that are relevant to this
> discussion:
>
>
> 1) BMC telemetry data (for example, VRM rail voltages), where the data is
> collected at different rates depending on the data and aggregated by the
> BMC app (minimum, maximum, and average). The metrics are logged at the
> requested collection frequency so that the user can fetch them for
> analytics.
>
> 2) Users should be able to set thresholds for the temperature limits and
> receive alerts. This would allow the user to plan cooling needs.
>
> 3) The BMC would act as a route for OCC metrics to be sent to the user. The
> OCC would send telemetry data down to the BMC, and the BMC should figure
> out a way to alert the user to consume this data.
>
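As a rough illustration of the aggregation described in requirement 1, the
sketch below groups timestamped samples into fixed windows and reports the
minimum, maximum, and average of each. The names `Summary` and `summarize`
and the fixed-window scheme are assumptions made for this example; they are
not part of the proposal or of any existing OpenBMC code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Summary:
    """Aggregate of all samples that fell into one window (hypothetical type)."""
    start: float    # start time of the window, in seconds
    minimum: float
    maximum: float
    average: float


def summarize(samples: Iterable[Tuple[float, float]],
              window_s: float) -> List[Summary]:
    """Group (timestamp, value) samples into fixed windows of window_s seconds."""
    buckets: Dict[float, List[float]] = {}
    for timestamp, value in samples:
        # Align each sample to the start of the window it falls in.
        start = (timestamp // window_s) * window_s
        buckets.setdefault(start, []).append(value)
    return [
        Summary(start, min(values), max(values), sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]
```

For example, `summarize([(0.0, 1.0), (1.0, 3.0), (2.0, 2.0)], 10.0)` yields a
single window starting at 0.0 with minimum 1.0, maximum 3.0, and average 2.0.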
>
> We will keep the focus of this discussion on requirement 1.
> This proposal presupposes that all the resources (for example, VRM rail
> voltages and ambient temperature) that the telemetry app is interested in
> are populated as D-Bus objects, which can be queried to read the
> instantaneous values. The phosphor-hwmon application exposes many of the
> resources of interest.
>
> The idea is to have a YAML-based approach in which the policy of the
> telemetry app is expressed. The application would consume the YAML and
> start the telemetry data collection. The YAML would express the following:
>
> a) D-Bus info (object, interface, property) associated with the resource.
> b) Units associated with the value (e.g. Celsius) and the associated
> scaling factor.
> c) Granularity - the time between two measurements.
> d) Aggregation methods - min, max, avg, etc.
> e) Logging policy - frequency for creating an event and alerting the user.
>
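To make items (a) through (e) concrete, here is one way a single policy entry
might look once the YAML has been parsed. This is only a sketch: the class
`MetricPolicy`, its field names, and the example D-Bus path are hypothetical,
and it assumes the scaling factor is a power-of-ten exponent applied to the
raw reading, which the proposal does not specify.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MetricPolicy:
    """One parsed policy entry; all field names are hypothetical."""
    dbus_object: str                # (a) D-Bus object path
    dbus_interface: str             # (a) D-Bus interface
    dbus_property: str              # (a) D-Bus property holding the raw value
    unit: str                       # (b) unit of the scaled value
    scale: int                      # (b) assumed power-of-ten exponent
    granularity_s: float            # (c) seconds between two measurements
    aggregations: Tuple[str, ...]   # (d) e.g. ("min", "max", "avg")
    log_interval_s: float           # (e) seconds between logged events

    def to_unit(self, raw: int) -> float:
        """Convert a raw reading to the declared unit."""
        return raw * 10 ** self.scale

    def samples_per_log(self) -> int:
        """Number of measurements aggregated into each logged event."""
        return max(1, round(self.log_interval_s / self.granularity_s))


# Illustrative entry: ambient temperature sampled once per second and
# logged once per minute. The path and interface are examples only.
ambient = MetricPolicy(
    dbus_object="/xyz/openbmc_project/sensors/temperature/ambient",
    dbus_interface="xyz.openbmc_project.Sensor.Value",
    dbus_property="Value",
    unit="Celsius",
    scale=-3,
    granularity_s=1.0,
    aggregations=("min", "max", "avg"),
    log_interval_s=60.0,
)
```

With this entry, a raw reading of 25000 converts to about 25 degrees Celsius,
and each logged event would aggregate 60 measurements.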
> The application would operate based on the policy and log the telemetry
> data. The details of logging would evolve as we progress on the related
> issue.
>
> Regards,
> Tom
>
>
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zaius-fan-telemetry.png
Type: image/png
Size: 73436 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/openbmc/attachments/20170907/0b6d9706/attachment-0001.png>