bmcweb multi-threaded solution

Rohit Pai ropai at nvidia.com
Thu Sep 7 19:36:37 AEST 2023


Hello All,

This previous thread<https://lists.ozlabs.org/pipermail/openbmc/2023-May/033495.html> captures the motive behind our interest in pursuing a multi-threaded solution for bmcweb.
Thanks to Ed for putting up this initial patch. https://gerrit.openbmc.org/c/openbmc/bmcweb/+/63710

We have been testing this patch recently, and I wanted to share a summary of our observations.


  1.  The original patch did not create any explicit threads, and we did not find boost::asio creating them for us.

So, as per this article<https://theboostcpplibraries.com/boost.asio-scalability> from Boost, I modified the patch to create a thread pool and share the same IO context among all threads.
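
For reference, the change looked roughly like this (a minimal sketch of the shared-io_context thread pool, not the actual bmcweb patch):

    // Minimal sketch: one io_context run by a pool of threads, so any
    // thread may pick up any completion handler.
    #include <boost/asio/executor_work_guard.hpp>
    #include <boost/asio/io_context.hpp>
    #include <algorithm>
    #include <thread>
    #include <vector>

    int main()
    {
        boost::asio::io_context io;
        // Keep run() from returning while there is no pending work yet.
        auto workGuard = boost::asio::make_work_guard(io);

        std::vector<std::thread> pool;
        const unsigned n = std::max(2u, std::thread::hardware_concurrency());
        for (unsigned i = 0; i < n; ++i)
        {
            pool.emplace_back([&io] { io.run(); });
        }

        // ... register the HTTP server and async handlers on `io` here ...

        for (auto& t : pool)
        {
            t.join();
        }
    }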

When I tested this change, I found two problems.

  a.  Sharing the same IO context between multiple threads does not work.

I have logged this issue on the boost::asio GitHub page, with sample code to reproduce it: https://github.com/chriskohlhoff/asio/issues/1353

It would be great if someone else could test this sample code and share the results from their platform.

  b.  Sharing the dbus connection across threads is not safe.
When the same IO context is shared between multiple threads, an async job posted by one thread can be picked up by another thread.
If thread1 calls crow::connections::systemBus->async_method_call, the response lambda can execute in thread2's context.
While thread2 is reading from the dbus connection, thread1 can issue a new request on the same bus connection as part of handling another URI request.
sd-bus is not thread-safe when the connection object is shared between multiple threads that perform read/write operations.
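
To make the failure mode concrete, the pattern looks roughly like this (illustrative sketch only; the service/path names are made up and this is not actual bmcweb code):

    #include <boost/asio/io_context.hpp>
    #include <boost/system/error_code.hpp>
    #include <sdbusplus/asio/connection.hpp>
    #include <variant>

    // Two URI handlers sharing one sdbusplus connection while the io_context
    // is run by several threads.
    void handleUriA(sdbusplus::asio::connection& bus)
    {
        // Issued from thread1; with a shared io_context the completion
        // lambda may run on thread2, which then reads from the connection.
        bus.async_method_call(
            [](boost::system::error_code ec, const std::variant<double>& value) {
                // thread2: reading the reply from the shared connection
            },
            "xyz.openbmc_project.ExampleSensor",          // illustrative names
            "/xyz/openbmc_project/sensors/power/example",
            "org.freedesktop.DBus.Properties", "Get",
            "xyz.openbmc_project.Sensor.Value", "Value");
    }

    void handleUriB(sdbusplus::asio::connection& bus)
    {
        // ...while thread1, handling another URI, writes a new request on the
        // very same connection. sd-bus does not support concurrent read/write
        // on one connection, so this can corrupt state or crash.
        bus.async_method_call(
            [](boost::system::error_code ec, const std::variant<double>& value) {},
            "xyz.openbmc_project.ExampleSensor",
            "/xyz/openbmc_project/sensors/temperature/example",
            "org.freedesktop.DBus.Properties", "Get",
            "xyz.openbmc_project.Sensor.Value", "Value");
    }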



  2.  IO context per thread.

Since sharing one IO context did not work, I took the second approach mentioned in this article<https://theboostcpplibraries.com/boost.asio-scalability>, which is to dedicate one IO context per thread.
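
In this model each worker thread owns and runs its own io_context, roughly like this (sketch only; the context and thread names are illustrative):

    #include <boost/asio/executor_work_guard.hpp>
    #include <boost/asio/io_context.hpp>
    #include <thread>

    // One io_context per thread: handlers queued on a given context never
    // run concurrently with each other.
    boost::asio::io_context mainCtx; // HTTP front end: accept, parse, respond
    boost::asio::io_context mrdCtx;  // heavy aggregate (MRD) work only

    int main()
    {
        auto mainGuard = boost::asio::make_work_guard(mainCtx);
        auto mrdGuard = boost::asio::make_work_guard(mrdCtx);

        std::thread mrdThread([] { mrdCtx.run(); }); // MRD handler thread
        mainCtx.run();                               // main thread

        mrdThread.join();
    }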

The major design challenge with this approach is deciding which jobs must be executed in which IO context.

I started by dedicating one thread/IO context to managing all incoming requests and returning responses to the clients.

I dedicated another thread/IO context solely to the aggregate URIs (MRDs), which have 1K+ sensor responses to populate and do not have tight latency requirements.

Our goal is a faster response on the power/thermal URIs, which are served by the main thread and are not blocked by the large response handling required by the aggregate URIs, which is managed by the secondary thread.

From our previous performance experiments, we had found that JSON response preparation for 5K+ sensors took around 250 to 300 ms in bmcweb, during which the power/thermal URIs were blocked.



     ┌──────────┐          ┌──────────────────┐
     │MainThread│          │MRD_Handler_Thread│
     └────┬─────┘          └────────┬─────────┘
          │   asio::post(request)   │
          │ ───────────────────────>│
          │                         │
          │  asio::post(response)   │
          │<─────────────────────── │
     ┌────┴─────┐          ┌────────┴─────────┐
     │MainThread│          │MRD_Handler_Thread│
     └──────────┘          └──────────────────┘



Based on the URI, the main thread decides whether to process the request itself or offload it to the MRD handler thread.

The response received from the MRD thread is returned to the client by the main thread.
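
In code, the offload/return path shown in the diagram looks roughly like this (a sketch that assumes the two per-thread io_contexts from the earlier snippet; offloadMrdRequest and the response type are made up for illustration):

    #include <boost/asio/io_context.hpp>
    #include <boost/asio/post.hpp>
    #include <memory>
    #include <string>

    extern boost::asio::io_context mainCtx; // run by the main thread
    extern boost::asio::io_context mrdCtx;  // run by the MRD handler thread

    // Called on the main thread when a request for an aggregate (MRD) URI
    // arrives; `res` stands in for bmcweb's response object.
    void offloadMrdRequest(std::shared_ptr<std::string> res)
    {
        boost::asio::post(mrdCtx, [res] {
            // MRD thread: build the large JSON payload here so the main
            // thread stays free to serve power/thermal URIs.
            *res = "{ \"Members\": [ /* 1K+ sensor readings */ ] }";

            boost::asio::post(mainCtx, [res] {
                // Back on the main thread: complete the HTTP response and
                // write it to the client.
            });
        });
    }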
The performance results with this solution are great. We see almost a 50% improvement in the performance of the power/thermal URIs.
Here, performance is measured as the worst-case latency seen on the power/thermal URIs when concurrent clients access both power/thermal and MRD URIs.

However, this solution seems to have some stability issues in overnight long-run tests.
The crash is seen around boost::asio::post APIs in a multi-threaded context. I have logged a separate bug against boost::asio to demonstrate this problem: https://github.com/chriskohlhoff/asio/issues/1352
I will follow up to check whether Boost can help us with a fix.

What I am looking for:

  1.  Does anyone have a different proposal for sharing an IO context between threads that can work on our BMC platform?
  2.  Feedback on handling the dbus connection between multiple threads in the context of bmcweb.
  3.  Is dedicating threads per use case a good model, given that we have not been able to make IO context sharing between threads work well?
  4.  Is there a better way to post asio jobs across threads and make it stable?

Thanks
Rohit PAI


