bmcweb multi-threaded solution

Rohit Pai ropai at nvidia.com
Wed Sep 13 17:11:26 AEST 2023


Hello Ed, 

Here is the pull request for the MRD based thread model. 
https://gerrit.openbmc.org/c/openbmc/bmcweb/+/66583
Let's discus on Gerrit. 

thanks 
Rohit PAI 

-----Original Message-----
From: Ed Tanous <ed at tanous.net> 
Sent: Friday, September 8, 2023 11:39 PM
To: Rohit Pai <ropai at nvidia.com>
Cc: Ed Tanous <edtanous at google.com>; openbmc at lists.ozlabs.org
Subject: Re: bmcweb multi-threaded solution

External email: Use caution opening links or attachments


On Fri, Sep 8, 2023 at 5:57 AM Rohit Pai <ropai at nvidia.com> wrote:
>
> Hello Ed,
>
> Sharing the code snippets for the two approaches we have tested.
>
> 1. Original Patch + Thread Pool
>
> webserver_main.cpp 
> ----------------------------------------------------------------------
> ------------------------------------------------------------
>
> void runIOService()
> {
>     BMCWEB_LOG_INFO << "Starting ASIO worker thread";
>     boost::asio::io_context& io = crow::connections::getIoContext();
>     io.run();
>     BMCWEB_LOG_INFO << "Exiting ASIO worker thread"; }
>
> static int run()
> {
>     .......
>     app.run();
>
>     // Create a vector of threads
>     std::vector<std::thread> threads;
>
>     //Create and launch the threads
>     // 2 threads would be created for AST2600
>     for (unsigned int i = 0; i < boost::thread::hardware_concurrency(); ++i)
>     {
>         threads.emplace_back(runIOService);
>     }
>
>     // Wait for all threads to finish
>     for (auto& thread : threads) {
>         thread.join();
>     }
>
>     return 0;
> }
>
> With this approach we are facing the issue of dbus connections being shared between threads issue which I have explained previously.
>
>
>
> 2. Original Patch + Dedicate thread for special MRD URIs
>
> webserver_main.cpp 
> ----------------------------------------------------------------------
> ------------------------------------------------------------
>
> void runIOService()
> {
>     BMCWEB_LOG_INFO << "Starting ASIO worker thread";
>     boost::asio::io_context& io = crow::connections::getNextIoContext();
>     io.run();
>     BMCWEB_LOG_INFO << "Exiting ASIO worker thread"; }
>
> static int run()
> {
>     crow::Logger::setLogLevel(
>         static_cast<crow::LogLevel>(bmcwebLogLevel));
>
>     boost::asio::io_context& io = crow::connections::getIoContext();
>     App app(io);
>     // Create a work object to prevent ioContext.run() from returning immediately
>     auto work = 
> make_work_guard(crow::connections::getNextIoContext());
> .....................
>     app.run();
>
>     // Create a vector of threads
>     std::vector<std::thread> threads;
>
>     //Create and launch the threads
>    // Test code with one MRD handler thread
>     for (unsigned int i = 0; i < 1; ++i)
>     {
>         threads.emplace_back(runIOService);
>     }
>     io.run();
>     work.reset();
>     // Wait for all threads to finish
>     for (auto& thread : threads) {
>         thread.join();
>     }
> }
>
> dbus_singleton.cpp----------------------------------------------------
> ----------------------------------------------------------------------
> ---------
>
> boost::asio::io_context& getNextIoContext() {
>     int threadCount = 4;
>     static boost::asio::io_context io(threadCount);
>     return io;
> }
>
> dbus_singleton.hpp----------------------------------------------------
> ----------------------------------------------------------------------
> --------- boost::asio::io_context& getNextIoContext();
>
>
> Platform Specific MRD URI handling
> metric_report.hpp-----------------------------------------------------
> ----------------------------------------------------------------------
> ---------
>
> inline void
>     getPlatforMetrics(const crow::Request& req, const std::shared_ptr<bmcweb::AsyncResp>& asyncResp,
>                       const std::string& metricId,
>                       const uint64_t& requestTimestamp = 0) {
>             boost::asio::post(
>                 crow::connections::getNextIoContext(), [req, 
> asyncResp]() {
>
>                     nlohmann::json& resArray =
>                         asyncResp->res.jsonValue["MetricValues"];
>
>
>              // Test code which populates 5K objects for the response
>                    // In our actual implementation code we read this data from shared memory based backend API
>               nlohmann::json thisMetric = nlohmann::json::object();
>                     for (int i=0; i < 5000; i++) {
>                         thisMetric["MetricValue"] = 0;
>                         thisMetric["MetricProperty"] = "/redfish/v1/Fabrics/System_0/XYZ ";
>                         thisMetric["Timestamp"] = "2020-03-27T16:50:58.516+00:00";
>                         resArray.push_back(thisMetric);
>                     }
>
>                     boost::asio::post(*req.ioService, [asyncResp](){
>                         messages::success(asyncResp->res);
>                     });
>                 });
> }
>
> This code works well for some period but has stability issues.
> I was not sure if cross posting asio jobs between context is stable or not hence I wanted to test it with just simple boost::asio code.
> I created this defect https://github.com/chriskohlhoff/asio/issues/1352 as I was able to repro the issue with boost::asio code which is present in the bug description.
>

Please submit patches to gerrit, mark them WIP, and make sure they build.  What you wrote above won't build due to logging changes made a while ago on mainline, so it's not very helpful.  With that said, if the above is the only code you wrote, you're missing multi-threading locks in quite a few places, which is likely why your code is crashing but let's discuss that on gerrit, where we can talk line by line in the diff.

>
> Thanks
> Rohit PAI
>
> -----Original Message-----
> From: Ed Tanous <edtanous at google.com>
> Sent: Friday, September 8, 2023 1:20 AM
> To: Rohit Pai <ropai at nvidia.com>
> Cc: openbmc at lists.ozlabs.org
> Subject: Re: bmcweb multi-threaded solution
>
> External email: Use caution opening links or attachments
>
>
> On Thu, Sep 7, 2023 at 2:36 AM Rohit Pai <ropai at nvidia.com> wrote:
> >
> > Hello All,
> >
> >
> >
> > This previous thread captures the motive behind our interest in chasing multi-threaded solution for bmcweb.
> >
> > Thanks to Ed for putting up this initial patch.
> > https://gerrit.openbmc.org/c/openbmc/bmcweb/+/63710
> >
> >
> >
> > We have been testing this patch in the recent times and I wanted to put a summary of our observations.
> >
> >
> >
> > The original patch was not creating any explicit threads and we did not find boost::asio creating them for us.
> >
> > So as per this article from boost I modified the patch to create a thread pool and share the same IO context among all threads.
> >
> > When I tested this change, I found two problems.
> >
> > Sharing same IO context between multiple threads does not work.
> >
> > I have logged this issue  https://github.com/chriskohlhoff/asio/issues/1353  in boost::asio git hub page with sample code to reproduce the issue.
> >
> > It would be great if someone else test this sample code and share the results based on their platform.
> >
> > Sharing dbus connection across threads is not safe:
> >
> > when we share same IO context between multiple threads, it’s possible that the async job posted by one thread, can be picked up by some other thread.
> >
> > If thread1 makes crow::connections::systemBus().async_method_call then the response lambda can get executed in thead2’s context.
> >
> > When thread2 is trying to read from the dbus connection, thread1 can make a new request on the same bus connection as part of handling another URI request.
> >
> > Sdbus is not thread safe when connection object is shared between multiple threads which can perform read/write operations.
> >
> >
> >
> > IO Context per thread.
> >
> > Since sharing IO context was not working I took the second approach mentioned in this article which is to dedicate IO context per threads.
> >
> > Major design challenge with this approach is to decide which jobs must be executed in which IO context.
> >
> > I started with dedicating one thread/IO context to manage all the incoming requests and handling responses back to the clients.
> >
> > I dedicated another thread/IO context to only manage aggregate URIs which have 1K+ sensors response (MRDs) to populate and does not have tighter latency requirements.
> >
> > Our goal is to have faster response on the power/thermal URIs which is served by the main thread and is not blocked by huge response handling required by aggregate URIs which is managed by the secondary thread.
> >
> > From our previous performance experiments, we had found that JSON response preparation for 5K+ sensors was taking around 250 to 300ms in bmcweb during which power/thermals URIs were blocked.
> >
> >
> >
> >      ┌──────────┐          ┌──────────────────┐
> >
> >      │MainThread│          │MRD_Handler_Thread│
> >
> >      └────┬─────┘          └────────┬─────────┘
> >
> >                 │   asio::post(request)        │
> >
> >                 │ ───────────────────>
> >
> >                 │                                            │
> >
> >                 │   asio::post(response)     │
> >
> >                 │ <───────────────────
> >
> >      ┌────┴─────┐          ┌────────┴─────────┐
> >
> >      │MainThread│          │MRD_Handler_Thread│
> >
> >      └──────────┘          └──────────────────┘
> >
> >
> >
> > Based on the URI main thread decides to continue to process the request or offload it to the MRD handler thread.
> >
> > The response received from the MRD thread is returned to the client by the main thread.
> >
> >                The performance results with the solution are great. We see almost 50% improvement in the performance of power/thermal URIs.
> >
> >                Here is performance is measured based on worst case latency seen on power thermal URIs when there are concurrent clients accessing power/thermal + MRD URIs.
> >
> >
> >
> >                However, this solution seems to have some stability issues in the overnight long run tests.
> >
> > The crash is seen around boost:post APIs in multi-threading context. 
> > I have logged a different bug in boost::asio to demonstrate this 
> > problem. https://github.com/chriskohlhoff/asio/issues/1352
> >
> > I will follow up to check if boost can help us with this fix.
> >
> >
> >
> > What I am looking for
> >
> > Does anyone have any different proposal for sharing IO context between threads which can work our bmc platform?
> > Feedback on handling dbus connection between multiple threads in the context of bmcweb?
> > Is this a good model to dedicate threads based on the use case as we are not able to make IO sharing between threads work well?
> > Any better way to Post asio jobs across threads and make it stable?
> >
> >
>
> The above is great, but if you don't post the code you're using, it's really difficult to provide any input.  Your stability issues could be simply due to missing locks, or unintended multi-threaded sharing, but it's hard to know without being able to look at the code you're using.
> I would highly recommend building your binary with thread sanitizer.
>
> >
> > Thanks
> >
> > Rohit PAI
> >
> >


More information about the openbmc mailing list