<font size=2 face="sans-serif">We do need to provide live reads of a subset

of the data, which I think is what Rick is describing below.  For

instance, fan speeds, 30-second power averages, component temperatures,

etc.  Much like IPMI/DCMI implementations out there today.  And

these need to alert based on trip levels that are set.   Higher layers

of software can then act on this data or log it away as they see fit.  I

like the idea of rich meta-data around these values, but I would think

we would use Redfish as the method of exporting this data.</font><br><br><font size=2 face="sans-serif">We also need deep traces where the data

is gathered, processed, and logged locally by the BMC.  Then the BMC

should alert every X hours and the log should be collected by the higher

layer entity.  This would be for things like VRM currents on every

output (hourly min, max, average).  It is not required that any other

company use these deep telemetry logs, but they are required on our systems.</font><br><font size=2 face="sans-serif"><br>As for #3 below, this should not be a new requirement.  Just export

the OCC telemetry log in the same way that you export all OCC/HOST logs.<br><br>Todd Rosedahl<br>IBM Power and Thermal Management<br>(507) 250-3275<br>rosedahl@us.ibm.com</font><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">Rick Altherr <raltherr@google.com></font><br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">tomjose <tomjose@linux.vnet.ibm.com></font><br><font size=1 color=#5f5f5f face="sans-serif">Cc:      

 </font><font size=1 face="sans-serif">OpenBMC Maillist <openbmc@lists.ozlabs.org>,

thalerj@us.ibm.com, jkeusema@us.ibm.com, rosedahl@us.ibm.com</font><br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">09/07/2017 01:41 PM</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">Re: RFC for

Telemetry data collection</font><br><hr noshade><br><br><br><font size=3>I have many opinions on telemetry data formats and APIs. 

What I'm seeing in your proposal looks pretty good with some subtlety in

the details.  For example, I expect to collect most data at least

once-per-second, not log anything locally, and not alert.  I'll do

all aggregation and thresholding at a higher level in the software stack. 

I also, ideally, want very descriptive information about where in the system

the sensor is.  I've attached a screenshot of what our existing host-based

reporting software makes available to higher-level software.  This

is a view via the human-readable web interface; the data is normally served

via protobufs.</font><br><br><font size=3>Rick</font><br><br><font size=3>On Thu, Sep 7, 2017 at 8:20 AM, tomjose <</font><a href=mailto:tomjose@linux.vnet.ibm.com target=_blank><font size=3 color=blue><u>tomjose@linux.vnet.ibm.com</u></font></a><font size=3>>

wrote:</font><br><font size=3>Hello,<br><br>I am working on the issue (</font><a href="https://github.com/openbmc/openbmc/issues/1957" target=_blank><font size=3 color=blue><u>https://github.com/openbmc/openbmc/issues/1957</u></font></a><font size=3>)

to design a telemetry application for OpenBMC. I will outline

a rough idea of how we plan to go about it. Please share your thoughts and

feedback on this proposal. This issue depends on the design that evolves

out of the following issues, since this app would use the capabilities

provided. (</font><a href="https://github.com/openbmc/openbmc/issues/1856" target=_blank><font size=3 color=blue><u>https://github.com/openbmc/openbmc/issues/1856</u></font></a><font size=3>,

</font><a href="https://github.com/openbmc/openbmc/issues/2102" target=_blank><font size=3 color=blue><u>https://github.com/openbmc/openbmc/issues/2102</u></font></a><font size=3>).<br><br>Summary of the requirements we came across that are relevant to this discussion:<br><br><br>1) BMC telemetry data (for example, VRM rail voltages), where the data is collected

at different rates depending on the data and aggregated by the BMC app 

(minimum, maximum, and average). Based on the requested collection frequency,

the metrics are logged so that the user can fetch them for analytics.<br><br>2)  Users should be able to set temperature thresholds

and receive alerts. This would allow users to plan their cooling needs.<br><br>3)  The BMC would act as a route for the OCC metrics to be sent to the user.

The OCC would send telemetry data down to the BMC, and the BMC should figure

out a way to alert the user to consume this data.<br><br><br>We would keep the focus of this discussion on requirement 1.<br>This proposal presupposes that all the resources (for example, VRM rail voltages,

ambient temperature) that the telemetry app is interested in are

populated as D-Bus objects, which can be queried to read the instantaneous values. The phosphor-hwmon application

exposes many of the resources of interest.<br><br>The idea is to have a YAML-based approach, in which the policy of the telemetry

app will be expressed. The application would consume the YAML

and initiate the telemetry data collection. The YAML would express the following:<br><br>a) D-Bus info (object, interface, property) associated with the resource.<br>b) Units associated with the value (for example, Celsius) and the associated scaling

factor.<br>c) Granularity - the time between two measurements.<br>d) Aggregation methods - min, max, avg, etc.<br>e) Logging policy - the frequency for creating an event and alerting the user.<br><br>The application would operate based on the policy and log the telemetry

data. The details of logging would evolve as we progress on the related

issue.<br><br>Regards,<br>Tom<br><br><br><br></font><br><font size=1 face="sans-serif">[attachment "zaius-fan-telemetry.png"

deleted by Todd Rosedahl/Rochester/IBM] </font><br><br><BR>
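<br><font size=3>The policy fields (a) through (e) in Tom's proposal could be sketched roughly as follows. This is only an illustration under stated assumptions, not the design that was adopted: the policy is written as a Python dict standing in for one parsed YAML record, and every key name, the D-Bus object path, the scale value and the helper functions are hypothetical. The one-second granularity and hourly logging interval are borrowed from the figures mentioned in the replies above.</font><br><br>

```python
from statistics import mean

# Hypothetical policy entry, standing in for one parsed YAML record.
# All key names, the D-Bus path and the numbers are assumptions for illustration.
POLICY = {
    # (a) D-Bus info for the resource
    "dbus": {
        "object": "/xyz/openbmc_project/sensors/temperature/ambient",
        "interface": "xyz.openbmc_project.Sensor.Value",
        "property": "Value",
    },
    # (b) unit and scaling factor: value = raw reading * 10**scale
    "unit": "celsius",
    "scale": -3,
    # (c) granularity: seconds between two measurements
    "granularity_s": 1,
    # (d) aggregation methods to apply to each window of samples
    "aggregations": ["min", "max", "avg"],
    # (e) logging policy: seconds between logged records
    "log_every_s": 3600,
}

# Map the aggregation names used in the policy to Python callables.
AGGREGATORS = {"min": min, "max": max, "avg": mean}


def samples_per_record(policy):
    """Number of measurements that feed one logged record."""
    return policy["log_every_s"] // policy["granularity_s"]


def aggregate(raw_samples, policy):
    """Scale raw readings and apply each aggregation named in the policy."""
    factor = 10 ** policy["scale"]
    scaled = [sample * factor for sample in raw_samples]
    return {name: AGGREGATORS[name](scaled) for name in policy["aggregations"]}


if __name__ == "__main__":
    readings = [24500, 25000, 26250]  # made-up raw readings in millidegrees
    print(samples_per_record(POLICY), aggregate(readings, POLICY))
```

<font size=3>Keeping the aggregation names in a lookup table means a new method could be added to the policy without changing the collection loop, which seems to fit the policy-driven approach described in the proposal.</font><br>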