bmcweb multi-threaded solution

Rohit Pai ropai at nvidia.com
Fri Sep 8 22:56:05 AEST 2023


Hello Ed, 

Sharing the code snippets for the two approaches we have tested. 

1. Original Patch + Thread Pool 

webserver_main.cpp ---------------------------------------------------------------------------------------------------------------------------------- 

void runIOService()
{
    BMCWEB_LOG_INFO << "Starting ASIO worker thread";
    boost::asio::io_context& io = crow::connections::getIoContext();
    io.run();
    BMCWEB_LOG_INFO << "Exiting ASIO worker thread";
}

static int run()
{
    .......
    app.run();

    // Create a vector of threads
    std::vector<std::thread> threads;

    // Create and launch the threads
    // (2 threads would be created for the AST2600)
    for (unsigned int i = 0; i < boost::thread::hardware_concurrency(); ++i)
    {
        threads.emplace_back(runIOService);
    }

    // Wait for all threads to finish
    for (auto& thread : threads)
    {
        thread.join();
    }

    return 0;
}

With this approach we hit the issue I explained previously: the dbus connection ends up being shared between threads.
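
One direction we have considered (but not yet tested) to get around this is to stop sharing the connection entirely and give each worker thread its own sdbusplus connection, bound to the io_context that thread runs. A rough sketch, assuming sdbusplus::asio::connection can simply be constructed once per thread (the helper name is hypothetical, not part of the patch above):

#include <boost/asio/io_context.hpp>
#include <sdbusplus/asio/connection.hpp>

// Hypothetical helper: hand each worker thread its own dbus connection so
// the connection object is never shared across threads.
inline sdbusplus::asio::connection&
    threadLocalSystemBus(boost::asio::io_context& io)
{
    // Constructed once per thread, on first use from that thread.
    static thread_local sdbusplus::asio::connection conn(io);
    return conn;
}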



2. Original Patch + Dedicated thread for special MRD URIs 

webserver_main.cpp ---------------------------------------------------------------------------------------------------------------------------------- 

void runIOService()
{
    BMCWEB_LOG_INFO << "Starting ASIO worker thread";
    boost::asio::io_context& io = crow::connections::getNextIoContext();
    io.run();
    BMCWEB_LOG_INFO << "Exiting ASIO worker thread";
}

static int run()
{
    crow::Logger::setLogLevel(
        static_cast<crow::LogLevel>(bmcwebLogLevel));

    boost::asio::io_context& io = crow::connections::getIoContext();
    App app(io);
    // Work guard to keep the MRD io_context's run() from returning when it
    // has no pending work
    auto work = boost::asio::make_work_guard(crow::connections::getNextIoContext());
.....................
    app.run();

    // Create a vector of threads
    std::vector<std::thread> threads;

    // Create and launch the threads
    // Test code with one MRD handler thread
    for (unsigned int i = 0; i < 1; ++i)
    {
        threads.emplace_back(runIOService);
    }
    io.run();
    work.reset();
    // Wait for all threads to finish
    for (auto& thread : threads)
    {
        thread.join();
    }

    return 0;
}

dbus_singleton.cpp-----------------------------------------------------------------------------------------------------------------------------------

boost::asio::io_context& getNextIoContext()
{
    int threadCount = 4;
    static boost::asio::io_context io(threadCount);
    return io;
}

dbus_singleton.hpp-----------------------------------------------------------------------------------------------------------------------------------
boost::asio::io_context& getNextIoContext();
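
The test code above simply parks all MRD work on one extra io_context. If we ever need more than one MRD handler thread, getNextIoContext could instead hand out contexts round-robin from a small pool; a rough sketch of that idea (hypothetical, not what we tested):

#include <boost/asio/io_context.hpp>

#include <array>
#include <atomic>

boost::asio::io_context& getNextIoContext()
{
    // Hypothetical round-robin pool: each context would be run by its own
    // worker thread, and callers receive the contexts in rotation.
    static std::array<boost::asio::io_context, 4> pool;
    static std::atomic<std::size_t> next{0};
    return pool[next++ % pool.size()];
}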


Platform-specific MRD URI handling 
metric_report.hpp------------------------------------------------------------------------------------------------------------------------------------

inline void getPlatforMetrics(
    const crow::Request& req,
    const std::shared_ptr<bmcweb::AsyncResp>& asyncResp,
    const std::string& metricId, const uint64_t& requestTimestamp = 0)
{
    boost::asio::post(
        crow::connections::getNextIoContext(), [req, asyncResp]() {
            nlohmann::json& resArray = asyncResp->res.jsonValue["MetricValues"];

            // Test code which populates 5K objects for the response.
            // In our actual implementation this data is read from a
            // shared-memory based backend API.
            nlohmann::json thisMetric = nlohmann::json::object();
            for (int i = 0; i < 5000; i++)
            {
                thisMetric["MetricValue"] = 0;
                thisMetric["MetricProperty"] = "/redfish/v1/Fabrics/System_0/XYZ";
                thisMetric["Timestamp"] = "2020-03-27T16:50:58.516+00:00";
                resArray.push_back(thisMetric);
            }

            // Post the completion back to the main (request-handling) io_context
            boost::asio::post(*req.ioService, [asyncResp]() {
                messages::success(asyncResp->res);
            });
        });
}

This code works well for some time but has stability issues. 
I was not sure whether cross-posting asio jobs between contexts is stable, so I wanted to test it with plain boost::asio code. 
I created this defect https://github.com/chriskohlhoff/asio/issues/1352 since I was able to reproduce the issue with the plain boost::asio code that is in the bug description. 
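
For reference, here is the cross-posting pattern stripped down to plain boost::asio (a simplified, self-contained sketch; this is not the exact reproducer attached to the bug):

#include <boost/asio/executor_work_guard.hpp>
#include <boost/asio/io_context.hpp>
#include <boost/asio/post.hpp>

#include <iostream>
#include <thread>

int main()
{
    boost::asio::io_context mainIo;   // plays the role of the request thread
    boost::asio::io_context workerIo; // plays the role of the MRD thread

    auto mainWork = boost::asio::make_work_guard(mainIo);
    auto workerWork = boost::asio::make_work_guard(workerIo);

    std::thread worker([&] { workerIo.run(); });

    // The main context offloads a job to the worker context ...
    boost::asio::post(workerIo, [&] {
        // ... heavy response preparation would happen here ...
        // ... and the completion is posted back to the main context.
        boost::asio::post(mainIo, [&] {
            std::cout << "response ready" << std::endl;
            mainWork.reset();
            workerWork.reset();
        });
    });

    mainIo.run();
    worker.join();
    return 0;
}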


Thanks 
Rohit PAI 

-----Original Message-----
From: Ed Tanous <edtanous at google.com> 
Sent: Friday, September 8, 2023 1:20 AM
To: Rohit Pai <ropai at nvidia.com>
Cc: openbmc at lists.ozlabs.org
Subject: Re: bmcweb multi-threaded solution


On Thu, Sep 7, 2023 at 2:36 AM Rohit Pai <ropai at nvidia.com> wrote:
>
> Hello All,
>
>
>
> This previous thread captures the motivation behind our interest in pursuing a multi-threaded solution for bmcweb.
>
> Thanks to Ed for putting up this initial patch. 
> https://gerrit.openbmc.org/c/openbmc/bmcweb/+/63710
>
>
>
> We have been testing this patch recently and I wanted to put together a summary of our observations.
>
>
>
> The original patch was not creating any explicit threads and we did not find boost::asio creating them for us.
>
> So, as per this article from Boost, I modified the patch to create a thread pool and share the same IO context among all threads.
>
> When I tested this change, I found two problems.
>
> Sharing the same IO context between multiple threads does not work.
>
> I have logged this issue https://github.com/chriskohlhoff/asio/issues/1353 on the boost::asio GitHub page with sample code to reproduce the issue.
>
> It would be great if someone else could test this sample code and share the results from their platform.
>
> Sharing the dbus connection across threads is not safe:
>
> When we share the same IO context between multiple threads, an async job posted by one thread can be picked up by another thread.
>
> If thread1 makes a crow::connections::systemBus().async_method_call, the response lambda can get executed in thread2's context.
>
> When thread2 is trying to read from the dbus connection, thread1 can make a new request on the same bus connection as part of handling another URI request.
>
> Sdbus is not thread-safe when the connection object is shared between multiple threads that can perform read/write operations.
>
>
>
> IO Context per thread.
>
> Since sharing the IO context was not working, I took the second approach mentioned in the article, which is to dedicate an IO context per thread.
>
> The major design challenge with this approach is deciding which jobs must be executed in which IO context.
>
> I started by dedicating one thread/IO context to manage all the incoming requests and handle the responses back to the clients.
>
> I dedicated another thread/IO context to manage only the aggregate URIs (MRDs), which have 1K+ sensor values to populate in the response and do not have tight latency requirements.
>
> Our goal is a faster response on the power/thermal URIs, which are served by the main thread and are not blocked by the huge response handling required by the aggregate URIs, which are managed by the secondary thread.
>
> From our previous performance experiments, we had found that JSON response preparation for 5K+ sensors was taking around 250 to 300 ms in bmcweb, during which the power/thermal URIs were blocked.
>
>
>
>      ┌──────────┐          ┌──────────────────┐
>      │MainThread│          │MRD_Handler_Thread│
>      └────┬─────┘          └────────┬─────────┘
>           │   asio::post(request)   │
>           │ ────────────────────────>
>           │                         │
>           │   asio::post(response)  │
>           │ <────────────────────────
>      ┌────┴─────┐          ┌────────┴─────────┐
>      │MainThread│          │MRD_Handler_Thread│
>      └──────────┘          └──────────────────┘
>
>
>
> Based on the URI, the main thread decides whether to continue processing the request or to offload it to the MRD handler thread.
>
> The response received from the MRD thread is returned to the client by the main thread.
>
> The performance results with this solution are great. We see almost 50% improvement in the performance of the power/thermal URIs.
>
> Here, performance is measured as the worst-case latency seen on the power/thermal URIs when there are concurrent clients accessing power/thermal + MRD URIs.
>
>
>
> However, this solution seems to have some stability issues in the overnight long-run tests.
>
> The crash is seen around the boost::asio::post APIs in a multi-threaded context. I have logged a different bug in boost::asio to demonstrate this problem: https://github.com/chriskohlhoff/asio/issues/1352
>
> I will follow up to check if boost can help us with this fix.
>
>
>
> What I am looking for
>
> Does anyone have a different proposal for sharing the IO context between threads that can work on our BMC platform?
> Feedback on handling the dbus connection across multiple threads in the context of bmcweb?
> Is dedicating threads based on the use case a good model, given that we are not able to make IO context sharing between threads work well?
> Is there a better way to post asio jobs across threads and make it stable?
>
>

The above is great, but if you don't post the code you're using, it's really difficult to provide any input. Your stability issues could simply be due to missing locks or unintended multi-threaded sharing, but it's hard to know without being able to look at the code you're using.
I would highly recommend building your binary with thread sanitizer.
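(With bmcweb's meson build, thread sanitizer can typically be enabled with something like "meson setup builddir -Db_sanitize=thread", assuming the toolchain supports it.)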

>
> Thanks
>
> Rohit PAI
>
>

