Prioritizing URIs with tight performance requirements in OpenBMC with bmcweb

Ed Tanous edtanous at google.com
Fri Jun 9 03:18:51 AEST 2023


On Sat, Jun 3, 2023 at 1:49 AM Rohit Pai <ropai at nvidia.com> wrote:
>
> Hello Ed,

The below is all really great data.

>
> The thermal metric URI has around 100 sensors and a tight latency requirement of 500 ms.
> The stats/counter metric URI has around 2500 properties to fetch from the backend, which uses the GetManagedObjects API.
> The time analysis was done on the latency of the stats/counter URI, as this impacts the latency of the thermal metric URI given bmcweb's current single-threaded nature.

What two queries are you testing with?

>
>
>
> Method 1 - ObjectManager call to the backend service; the bmcweb handler code processes the response and prepares the required JSON objects.
> a. Backend dbus call turnaround time - 584 ms

This is quite high.  Have you looked at reducing this?  This would
imply that you're doing blocking calls in your backend daemon.
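
If it helps, the pattern I'd generally push for in a backend daemon
looks roughly like the untested sketch below (the bus name, object
path, and readSensorHardware() are made-up stand-ins): do the blocking
hardware access on a worker thread, touch the sdbusplus objects only
from the io_context thread, and let GetManagedObjects be answered from
cached property values instead of waiting on hardware.

    #include <boost/asio/io_context.hpp>
    #include <boost/asio/post.hpp>
    #include <boost/asio/thread_pool.hpp>
    #include <sdbusplus/asio/connection.hpp>
    #include <sdbusplus/asio/object_server.hpp>

    #include <memory>

    // Hypothetical blocking hardware access; stands in for whatever
    // the daemon does today (sysfs, i2c, MCTP, ...).
    double readSensorHardware();

    int main()
    {
        boost::asio::io_context io;
        auto conn = std::make_shared<sdbusplus::asio::connection>(io);
        conn->request_name("xyz.openbmc_project.Example.Sensors");

        sdbusplus::asio::object_server server(conn);
        std::shared_ptr<sdbusplus::asio::dbus_interface> iface =
            server.add_interface(
                "/xyz/openbmc_project/sensors/temperature/example",
                "xyz.openbmc_project.Sensor.Value");
        iface->register_property("Value", 0.0);
        iface->initialize();

        // The slow part runs on a worker thread; the bus only ever
        // sees a cheap set_property from the io_context thread.
        boost::asio::thread_pool workers(1);
        boost::asio::post(workers, [&io, iface] {
            double value = readSensorHardware();
            boost::asio::post(io, [iface, value] {
                iface->set_property("Value", value);
            });
        });

        io.run();
        return 0;
    }

If the daemon currently reads hardware inline in its main loop or in a
property handler, that alone could explain a lot of the 584 ms.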

> b. Logic in bmcweb route handler code to prepare response - 365 ms

This could almost certainly be reduced with some targeted changes.  You
didn't answer me on whether you're using TLS in this example, so based
on your numbers I'm going to assume you're not.  I would've expected
crypto to be a significant part of your profile.

> c. Total URI latency - 1019 ms

a + b != c.  Is the rest the time spent writing to the socket?  Where
does the extra time go?

>
> Method 2 - The backend populates all the needed properties in a single aggregate property.
> a. Backend dbus call turnaround time - 161 ms

This is still higher than I would like to see, but in the realm of
what I would expect.
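
For reference, on the bmcweb side I'd expect Method 2 to boil down to
a single async Properties.Get, roughly the untested sketch below (the
service, object path, and property names are made up):

    #include <boost/system/error_code.hpp>
    #include <sdbusplus/asio/connection.hpp>

    #include <memory>
    #include <string>
    #include <variant>

    // conn is the shared sdbusplus::asio::connection bmcweb already
    // owns; the names below are hypothetical.
    void getAggregateMetrics(
        const std::shared_ptr<sdbusplus::asio::connection>& conn)
    {
        conn->async_method_call(
            [](const boost::system::error_code& ec,
               const std::variant<std::string>& value) {
                if (ec)
                {
                    // log and return an error response here
                    return;
                }
                const std::string* json = std::get_if<std::string>(&value);
                if (json == nullptr)
                {
                    return;
                }
                // parse *json and fill the Redfish response here
            },
            "xyz.openbmc_project.Example.Metrics",
            "/xyz/openbmc_project/metrics/counters",
            "org.freedesktop.DBus.Properties", "Get",
            "xyz.openbmc_project.Example.Metrics", "AggregatedJson");
    }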

> b. Logic in bmcweb route handler code to prepare response - 71 ms

I would've expected to see this in the single-digit milliseconds for a
single property.  Can you profile here and see what's taking so long?
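
Even coarse timestamps around the parse and the move into the response
would narrow it down.  Rough sketch of the kind of thing I mean:

    #include <nlohmann/json.hpp>

    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <utility>

    // aggregate is the single property value fetched over dbus.
    void fillResponse(const std::string& aggregate,
                      nlohmann::json& response)
    {
        using clock = std::chrono::steady_clock;
        auto ms = [](auto d) {
            return std::chrono::duration_cast<std::chrono::milliseconds>(d)
                .count();
        };

        auto t0 = clock::now();
        nlohmann::json parsed = nlohmann::json::parse(
            aggregate, nullptr, /*allow_exceptions=*/false);
        auto t1 = clock::now();

        if (parsed.is_discarded())
        {
            return;
        }
        response["Metrics"] = std::move(parsed);
        auto t2 = clock::now();

        std::fprintf(stderr, "parse=%lldms fill=%lldms\n",
                     static_cast<long long>(ms(t1 - t0)),
                     static_cast<long long>(ms(t2 - t1)));
    }

If the parse dominates, the cost is mostly the size of the aggregate
string, not anything specific to the route handler.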

> c. Total URI latency - 291 ms
>
> Method 3 - bmcweb reads all the properties from a file fd. The goal here is to eliminate the latency and load that come from using dbus as the IPC for large payloads.
> a. fd read call in bmcweb - 64 ms

This is roughly equivalent to the dbus call, so if we figure out where
the bottleneck is in Method 1b above, we could probably get this
comparable.

> b. JSON object population from the read file contents - 96 ms

This seems really high.

> c. Total URI latency - 254 ms
> The file contents were in JSON format. If we can replace this with a more efficient data structure that can be used with fd passing, then I think we can further optimize point b.

In Method 3 you've essentially invented a new internal OpenBMC API.  I
would love to foster discussions of how to handle that, but we need to
treat it holistically in the system, and understand how:
1. The schemas will be managed
2. Concurrency will be managed
3. Blocking will be managed (presumably you did a blocking filesystem
read to get the data; see the sketch at the end of this reply)

I'm happy to have those discussions, and the data you have above is
interesting, but any sort of change would require much larger project
buy-in.
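
On point 3 specifically, anything along these lines would need the
read to go through the io_context rather than a blocking read().  A
very rough, untested sketch of what I mean, assuming the backend hands
over an already-open fd and the payload is CBOR (purely as an example
of a denser encoding than JSON text):

    #include <boost/asio/error.hpp>
    #include <boost/asio/io_context.hpp>
    #include <boost/asio/posix/stream_descriptor.hpp>
    #include <boost/asio/read.hpp>
    #include <nlohmann/json.hpp>

    #include <functional>
    #include <memory>
    #include <string>

    // fd: descriptor handed over by the backend (hypothetical);
    // callback gets the decoded payload.
    void asyncReadMetrics(boost::asio::io_context& io, int fd,
                          std::function<void(nlohmann::json)> callback)
    {
        auto stream = std::make_shared<
            boost::asio::posix::stream_descriptor>(io, fd);
        auto buffer = std::make_shared<std::string>();

        boost::asio::async_read(
            *stream, boost::asio::dynamic_buffer(*buffer),
            [stream, buffer, callback = std::move(callback)](
                const boost::system::error_code& ec, std::size_t) {
                if (ec && ec != boost::asio::error::eof)
                {
                    // log and bail out
                    return;
                }
                // Same data model as JSON, cheaper to decode; returns
                // a discarded value on error when exceptions are off.
                nlohmann::json payload = nlohmann::json::from_cbor(
                    *buffer, /*strict=*/true, /*allow_exceptions=*/false);
                if (payload.is_discarded())
                {
                    return;
                }
                callback(std::move(payload));
            });
    }

That still leaves the schema and concurrency questions from points 1
and 2 open.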

> Optimizing the CPU-bound logic in the handler code would certainly help the latency of the other requests pending in the queue.

Is it CPU bound or memory-bandwidth bound?  Most of the time I've seen
the latter.  How did you collect the measurements on CPU versus IO
versus memory bound?

>
> I will try the multi-threaded solution you proposed in the coming days and share the results.
>

Sounds good.  Thanks for the input.
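
So we're talking about the same thing when you try it, the rough,
untested shape I'd start from is below: push the CPU-heavy
serialization of the big payload onto a worker pool and post the
finished string back to the main io_context, so small requests like
the thermal URI don't queue behind it.  The names are placeholders,
not actual bmcweb APIs.

    #include <boost/asio/io_context.hpp>
    #include <boost/asio/post.hpp>
    #include <boost/asio/thread_pool.hpp>
    #include <nlohmann/json.hpp>

    #include <functional>
    #include <string>
    #include <utility>

    // Serialize a large response body on a worker thread so the main
    // io_context stays free for latency-sensitive requests.
    void serializeOffThread(boost::asio::io_context& io,
                            boost::asio::thread_pool& workers,
                            nlohmann::json payload,
                            std::function<void(std::string)> sendResponse)
    {
        boost::asio::post(
            workers, [&io, payload = std::move(payload),
                      sendResponse = std::move(sendResponse)]() mutable {
                // CPU-heavy step runs off the main thread.
                std::string body = payload.dump();
                // Hop back to the io_context thread before touching
                // the connection/response objects.
                boost::asio::post(
                    io, [sendResponse = std::move(sendResponse),
                         body = std::move(body)]() {
                        sendResponse(std::move(body));
                    });
            });
    }

The invariant to keep is that connection and response objects are only
ever touched from the io_context thread.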

