Redfish requests for dbus-sensors data are needlessly slow - analysis

Ed Tanous ed at tanous.net
Fri Jul 30 00:53:38 AEST 2021


On Thu, Jul 29, 2021 at 2:46 AM Paul Fertser <fercerpav at gmail.com> wrote:
>
> Hello,
>
> Short version:
>
> Our operations team complains about very high (seconds) latency of
> requesting sensor data used to monitor health of the servers via
> Redfish. Boards are OCP Tioga Pass with AST2500 BMC. Software
> involved: OpenBMC bmcweb, dbus-sensors, ObjectMapper, EntityManager.
>
> In this mail I provide details about the testing methodology, the
> complete call chain (including RPC) involved in handling Redfish
> requests, and a quick-and-dirty patch that caches some D-Bus replies
> in bmcweb that results in cutting down response times by more than 4
> times. It also gives significantly less CPU overhead.
>
> My impression so far is that the current way is rather wasteful and
> can be considerably improved.
>
> Our question is basically whether what we see is normal, if we should
> change the way servers are monitored, if bmcweb should really be
> better optimised and if yes, how exactly?
>
>
> Longer version:
>
> Our operations team is normally using Zabbix for monitoring, and they
> developed a set of scripts and templates to handle OpenBMC machines:
> https://github.com/Gagar-in/redfish-bmc-lld .
>
> For easier reproduction and analysis I instead use plain curl command
> to request data for a single specific sensor. The following text
> assumes we're interested to get output current of the first DRAM VR
> (handled by pxe1610 kernel driver and PSUSensor OpenBMC daemon which
> caches the values periodically read from hwmon nodes).
>
> To make sure I'm measuring just the time it takes to answer the GET
> request itself (and not counting any TLS session establishment
> overhead as curl is reusing an open connection) I'm running this
> command:
>
> $ curl -w '\n\nTime spent: %{time_total}\n\n\n' -k $(for i in $(seq 5); do echo https://root:0penBmc@$BMCIP/redfish/v1/Chassis/TiogaPass_Baseboard/Sensors/CPU0_VDDQ_A_IOUT; done)
>
> I typically get over 2 seconds for each request made. With the patch
> at the end of the mail the same test consistently shows times below
> half a second.
>
> The major part of the delay comes from numerous (and probably
> suboptimal) D-Bus calls, plus the overhead to process the responses.
>
> Here follows D-Bus profiling data typical for each Redfish sensors
> request made. "mc" is the method call performed by "bmcweb", "mr" is
> the response. I've added the second column (by processing dbus-monitor
> --profile output through awk -vOFS='\t' '{ d=$2-prev; prev=$2;
> $2=d"\t"$2; print $0 }' ) to show the difference between the current
> timestamp and the previous one so it's visible how much time it takes
> for the daemons to answer each request and how much time bmcweb needs
> to process the result.
>
> mc      ---------       1625473150.291539       1665    :1.79   xyz.openbmc_project.User.Manager        /xyz/openbmc_project/user       xyz.openbmc_project.User.Manager        GetUserInfo
> mr      0.0463991       1625473150.337938       469     :1.36   :1.79   1665
> mc      0.00556493      1625473150.343503       1666    :1.79   xyz.openbmc_project.ObjectMapper        /xyz/openbmc_project/object_mapper      xyz.openbmc_project.ObjectMapper        GetSubTree
> err     0.0143089       1625473150.357812       470     :1.36   :1.79   1665
> mr      0.0478861       1625473150.405698       5891    :1.49   :1.79   1666
> mc      0.0611379       1625473150.466836       1667    :1.79   xyz.openbmc_project.ObjectMapper        /xyz/openbmc_project/object_mapper      xyz.openbmc_project.ObjectMapper        GetSubTree
> mr      0.0716419       1625473150.538478       5893    :1.49   :1.79   1667
> mc      0.0977321       1625473150.636210       1668    :1.79   xyz.openbmc_project.ObjectMapper        /xyz/openbmc_project/object_mapper      xyz.openbmc_project.ObjectMapper        GetSubTree
> mr      0.0268672       1625473150.663077       5895    :1.49   :1.79   1668
> mc      0.0232749       1625473150.686352       1669    :1.79   xyz.openbmc_project.ObjectMapper        /       org.freedesktop.DBus.ObjectManager      GetManagedObjects
> mr      0.156963        1625473150.843315       5896    :1.49   :1.79   1669
> mc      0.428404        1625473151.271719       1670    :1.79   xyz.openbmc_project.ObjectMapper        /xyz/openbmc_project/object_mapper      xyz.openbmc_project.ObjectMapper        GetSubTree
> mr      0.030153        1625473151.301872       5898    :1.49   :1.79   1670
> mc      0.123439        1625473151.425311       1671    :1.79   xyz.openbmc_project.EntityManager       /       org.freedesktop.DBus.ObjectManager      GetManagedObjects
> mr      0.284928        1625473151.710239       1408    :1.63   :1.79   1671
> mc      0.108877        1625473151.819116       1672    :1.79   xyz.openbmc_project.ObjectMapper        /xyz/openbmc_project/object_mapper      xyz.openbmc_project.ObjectMapper        GetSubTree
> mr      0.0525708       1625473151.871687       5900    :1.49   :1.79   1672
> mc      0.0175102       1625473151.889197       1673    :1.79   xyz.openbmc_project.PSUSensor   /       org.freedesktop.DBus.ObjectManager      GetManagedObjects
> mr      0.119283        1625473152.008480       124202  :1.104  :1.79   1673
>
>
> To provide more insight into what and why bmcweb is doing I made the
> following call chain description by carefully reading
> redfish-core/lib/sensors.hpp source code:
>
> 0. requestRoutesSensor() is called to process the GET request.
>
> For further processing a D-Bus call is made. For experimenting on a
> running system I'm providing copy-pasteable "busctl" invocations for
> all the calls involved.
>
> # busctl --json=pretty call xyz.openbmc_project.ObjectMapper /xyz/openbmc_project/object_mapper xyz.openbmc_project.ObjectMapper GetSubTree sias /xyz/openbmc_project/sensors 2 1 xyz.openbmc_project.Sensor.Value
>
> The returned array is searched for the requested sensor name, a
> flat_set is created with a single string containing D-Bus sensor path
> (in this example
> "/xyz/openbmc_project/sensors/current/CPU0_VDDQ_A_IOUT"). The flat_set
> is passed to processSensorList().
>
>
> 1. processSensorList() calls getConnections() which calls
> getObjectsWithConnection() (the latter can also be called from
> setSensorsOverride()).
>
> It does this call (exactly the same as previous!):
>
> # busctl --json=pretty call xyz.openbmc_project.ObjectMapper /xyz/openbmc_project/object_mapper xyz.openbmc_project.ObjectMapper GetSubTree sias /xyz/openbmc_project/sensors 2 1 xyz.openbmc_project.Sensor.Value
>
> For each returned subtree it determines if it's one of the sensors we
> requested. In this case the bus names are added to a "connections"
> flat_set (in this example it's just "xyz.openbmc_project.PSUSensor")
> and a set of tuples "objectsWithConnection" is created (in this
> example it's
> ("/xyz/openbmc_project/sensors/current/CPU0_VDDQ_A_IOUT","xyz.openbmc_project.PSUSensor")
> . "connections" is passed to the callback provided by
> processSensorList(), "objectsWithConnection" is ignored.
>
>
> 2. getConnectionCb() defined in processSensorList() calls getObjectManagerPaths().
>
> The following call is made:
>
> # busctl --json=pretty call xyz.openbmc_project.ObjectMapper /xyz/openbmc_project/object_mapper xyz.openbmc_project.ObjectMapper GetSubTree sias / 0 1 org.freedesktop.DBus.ObjectManager
>
> The result is converted to a "objectMgrPaths" associative array, with
> connection used as a key and path as a value (in this example the
> essential element would be
> objectMgrPaths["xyz.openbmc_project.PSUSensor"]="/" and
> objectMgrPaths["xyz.openbmc_project.ObjectMapper"]="/"). This map is
> passed to getObjectManagerPathsCb() defined inside processSensorList()
>
>
> 3. getInventoryItems() calls getInventoryItemAssociations() which uses
> the objectMgrPaths map to look for the path for
> "xyz.openbmc_project.ObjectMapper". If it's not found, "/" is assumed.
>
> The D-Bus call is made:
>
> # busctl --json=pretty call xyz.openbmc_project.ObjectMapper / org.freedesktop.DBus.ObjectManager GetManagedObjects
>
> For each sensor path we're interested in we're trying to find a
> corresponding inventory object (in this case
> "/xyz/openbmc_project/sensors/current/CPU0_VDDQ_A_IOUT/inventory") to
> get the list of endpoints for its associations. The first endopoint
> path is used in a call to addInventoryItem() (in this example it's
> "/xyz/openbmc_project/inventory/system/board/TiogaPass_Baseboard" for
> "/xyz/openbmc_project/sensors/current/CPU0_VDDQ_A_IOUT").
>
> Then we again walk the list returned from D-Bus for each entry
> checking if it represents a LED of any item stored in the current
> inventory list, and if found the first associated endpoint path is
> stored for this inventory item. In this example no associated LEDs
> exist.
>
> The resulting inventory items set is passed to the callback
> getInventoryItemAssociationsCb() defined in getInventoryItems().
>
>
> 4. getInventoryItemsConnections() is called which makes the following
> D-Bus call:
>
> # busctl --json=pretty call xyz.openbmc_project.ObjectMapper /xyz/openbmc_project/object_mapper xyz.openbmc_project.ObjectMapper GetSubTree sias /xyz/openbmc_project/inventory 0 4 xyz.openbmc_project.Inventory.Item xyz.openbmc_project.Inventory.Item.PowerSupply xyz.openbmc_project.Inventory.Decorator.Asset xyz.openbmc_project.State.Decorator.OperationalStatus
>
> For each item returned we see if it's present in our inventory and if
> yes, add all connections for it (in this example we only add
> "xyz.openbmc_project.EntityManager" for
> "/xyz/openbmc_project/inventory/system/board/TiogaPass_Baseboard"). Then
> it provides all the gathered inventory connections to the
> "getInventoryItemsConnectionsCb" callback provided by
> getInventoryItems().
>
>
> 5. getInventoryItemsData() gets called which for every stored
> inventory connection tries to find a corresponding path from the
> object manager data returned earlier (uses "/" if not found) and then
> performs a D-Bus call (in this example just one):
>
> # busctl --json=pretty call xyz.openbmc_project.EntityManager / org.freedesktop.DBus.ObjectManager GetManagedObjects
>
> For each inventory path it saves the result with
> storeInventoryItemData() and then calls getInventoryItemsData()
> recursively for all the other connections (in this example there was
> just one, "xyz.openbmc_project.EntityManager"). After processing them
> all, calls getInventoryItemsDataCb() defined in getInventoryItems().
>
>
> 6. getInventoryLeds() is called which does this D-Bus call:
>
> # busctl --json=pretty call xyz.openbmc_project.ObjectMapper /xyz/openbmc_project/object_mapper xyz.openbmc_project.ObjectMapper GetSubTree sias /xyz/openbmc_project 0 1 xyz.openbmc_project.Led.Physical
>
> The result is parsed to see if there's any LED path corresponding to
> an inventory item, and stores everything found. The collected data is
> passed to getInventoryLedData() which performs similarly to
> getInventoryItemsData(). In the example no suitable LEDs are found so
> it proceeds to calling getInventoryLedsCb() defined in
> getInventoryItems().
>
>
> 7. getPowerSupplyAttributes() is called but in this example (since
> it's requesting a sensor, not power node) it does nothing but proceeds
> to calling getInventoryItemsCb() defined in processSensorList().
>
>
> 8. getSensorData() is called which for every sensor connection finds
> objectMgrPath (if any, else "/" is used) and calls D-Bus (in this
> example just once since we have a single sensor connection):
>
> # busctl --json=pretty call xyz.openbmc_project.PSUSensor / org.freedesktop.DBus.ObjectManager GetManagedObjects
>
> This provides all the current data (including all the associations,
> OperationalStatus and Availability!) for all the sensors handled by
> this bus (so much more than we actually need in this example).
>
> It finally proceeds to building the JSON response (based on the data
> returned by the call and in some cases the inventory data obtained
> earlier) and exits; the response (values for the single sensor
> requested) is sent back to the client.
>
>
> Here follows my naive exploratory patch. It works for the sensors
> we're interested in (unless they appear after the first Redfish
> request made to bmcweb as the cache for the expensive D-Bus calls is
> never invalidated). It doesn't handle complex inventory associations.

The "unless" is kind of important here.  Sensors can show up or be
removed at any time, especially on an entity-manager enabled system.

As a general rule, bmcweb has avoided having a cache at all.  There
have been several attempts at adding one, but most failed because of
the same thing you found: cache invalidation is hard.  If we're going
to implement a cache, I'd really like to see it done at a higher level
(somewhere in the connection class) and done more generically than
this is here, which would ensure that all endpoints are sped up, not
just sensors.  With that said, if this really solves the problem and
solves it well, I can't very well ignore it.

>
> I would be willing to work on doing the right thing but for that I'm
> lacking the understanding of the complete picture involved in handling
> all types of sensors and interfaces, so I'm kindly asking for your
> help with it.
>
> diff --git a/redfish-core/lib/sensors.hpp b/redfish-core/lib/sensors.hpp

Can you please submit the below to gerrit as a WIP instead.  There's
much better tooling there for reviewing and testing patches.  I can
review it more there.

FWIW, it's on my list to look into the mapper shared-memory caching
layer that @arj wrote a while back.  It might also be able to solve
this problem;  I don't have the link offhand, but I think it was
trying to solve a similar problem, but handled the cache invalidation
issue as well.

> index ac987359ce5b..266dd25b27e0 100644
> --- a/redfish-core/lib/sensors.hpp
> +++ b/redfish-core/lib/sensors.hpp
> @@ -51,6 +51,9 @@ static constexpr std::string_view sensors = "Sensors";
>  static constexpr std::string_view thermal = "Thermal";
>  } // namespace node
>
> +static boost::container::flat_map<std::string, std::string> objectMgrPathsCache;
> +static GetSubTreeType subTreeCache;
> +
>  namespace dbus
>  {
>  static const boost::container::flat_map<std::string_view,
> @@ -244,6 +247,7 @@ void getObjectsWithConnection(
>              return;
>          }
>
> +        sensors::subTreeCache = subtree;
>          BMCWEB_LOG_DEBUG << "Found " << subtree.size() << " subtrees";
>
>          // Make unique list of connections only for requested sensor types and
> @@ -281,11 +285,19 @@ void getObjectsWithConnection(
>          callback(std::move(connections), std::move(objectsWithConnection));
>          BMCWEB_LOG_DEBUG << "getObjectsWithConnection resp_handler exit";
>      };
> -    // Make call to ObjectMapper to find all sensors objects
> -    crow::connections::systemBus->async_method_call(
> -        std::move(respHandler), "xyz.openbmc_project.ObjectMapper",
> -        "/xyz/openbmc_project/object_mapper",
> -        "xyz.openbmc_project.ObjectMapper", "GetSubTree", path, 2, interfaces);
> +    if (sensors::subTreeCache.empty())
> +    {
> +        // Make call to ObjectMapper to find all sensors objects
> +        crow::connections::systemBus->async_method_call(
> +            std::move(respHandler), "xyz.openbmc_project.ObjectMapper",
> +            "/xyz/openbmc_project/object_mapper",
> +            "xyz.openbmc_project.ObjectMapper", "GetSubTree", path, 2,
> +            interfaces);
> +    }
> +    else
> +    {
> +        respHandler(boost::system::error_code{}, sensors::subTreeCache);
> +    }
>      BMCWEB_LOG_DEBUG << "getObjectsWithConnection exit";
>  }
>
> @@ -606,15 +618,26 @@ void getObjectManagerPaths(std::shared_ptr<SensorsAsyncResp> SensorsAsyncResp,
>                                   << objectPath;
>              }
>          }
> +        sensors::objectMgrPathsCache = *objectMgrPaths;
>          callback(objectMgrPaths);
>          BMCWEB_LOG_DEBUG << "getObjectManagerPaths respHandler exit";
>      };
>
> -    // Query mapper for all DBus object paths that implement ObjectManager
> -    crow::connections::systemBus->async_method_call(
> -        std::move(respHandler), "xyz.openbmc_project.ObjectMapper",
> -        "/xyz/openbmc_project/object_mapper",
> -        "xyz.openbmc_project.ObjectMapper", "GetSubTree", "/", 0, interfaces);
> +    if (sensors::objectMgrPathsCache.empty())
> +    {
> +        // Query mapper for all DBus object paths that implement ObjectManager
> +        crow::connections::systemBus->async_method_call(
> +            std::move(respHandler), "xyz.openbmc_project.ObjectMapper",
> +            "/xyz/openbmc_project/object_mapper",
> +            "xyz.openbmc_project.ObjectMapper", "GetSubTree", "/", 0,
> +            interfaces);
> +    }
> +    else
> +    {
> +        callback(std::make_shared<
> +                 boost::container::flat_map<std::string, std::string>>(
> +            sensors::objectMgrPathsCache));
> +    }
>      BMCWEB_LOG_DEBUG << "getObjectManagerPaths exit";
>  }
>
> @@ -2628,9 +2669,11 @@ inline void processSensorList(
>                          };
>
>                      // Get inventory items associated with sensors
> -                    getInventoryItems(SensorsAsyncResp, sensorNames,
> -                                      objectMgrPaths,
> -                                      std::move(getInventoryItemsCb));
> +                    // getInventoryItems(SensorsAsyncResp, sensorNames,
> +                    //                  objectMgrPaths,
> +                    //                  std::move(getInventoryItemsCb));
> +                    getInventoryItemsCb(
> +                        std::make_shared<std::vector<InventoryItem>>());
>
>                      BMCWEB_LOG_DEBUG << "getObjectManagerPathsCb exit";
>                  };
> @@ -3105,66 +3148,75 @@ class Sensor : public Node
>          const std::array<const char*, 1> interfaces = {
>              "xyz.openbmc_project.Sensor.Value"};
>
> -        // Get a list of all of the sensors that implement Sensor.Value
> -        // and get the path and service name associated with the sensor
> -        crow::connections::systemBus->async_method_call(
> -            [asyncResp, sensorName](const boost::system::error_code ec,
> -                                    const GetSubTreeType& subtree) {
> -                BMCWEB_LOG_DEBUG << "respHandler1 enter";
> -                if (ec)
> -                {
> -                    messages::internalError(asyncResp->res);
> -                    BMCWEB_LOG_ERROR << "Sensor getSensorPaths resp_handler: "
> -                                     << "Dbus error " << ec;
> -                    return;
> -                }
> +        auto respHandler = [asyncResp,
> +                            sensorName](const boost::system::error_code ec,
> +                                        const GetSubTreeType& subtree) {
> +            BMCWEB_LOG_DEBUG << "respHandler1 enter";
> +            if (ec)
> +            {
> +                messages::internalError(asyncResp->res);
> +                BMCWEB_LOG_ERROR << "Sensor getSensorPaths resp_handler: "
> +                                 << "Dbus error " << ec;
> +                return;
> +            }
>
> -                GetSubTreeType::const_iterator it = std::find_if(
> -                    subtree.begin(), subtree.end(),
> -                    [sensorName](
> -                        const std::pair<
> -                            std::string,
> -                            std::vector<std::pair<std::string,
> -                                                  std::vector<std::string>>>>&
> -                            object) {
> -                        std::string_view sensor = object.first;
> -                        std::size_t lastPos = sensor.rfind("/");
> -                        if (lastPos == std::string::npos ||
> -                            lastPos + 1 >= sensor.size())
> -                        {
> -                            BMCWEB_LOG_ERROR << "Invalid sensor path: "
> -                                             << sensor;
> -                            return false;
> -                        }
> -                        std::string_view name = sensor.substr(lastPos + 1);
> +            sensors::subTreeCache = subtree;
>
> -                        return name == sensorName;
> -                    });
> +            GetSubTreeType::const_iterator it = std::find_if(
> +                subtree.begin(), subtree.end(),
> +                [sensorName](
> +                    const std::pair<
> +                        std::string,
> +                        std::vector<std::pair<
> +                            std::string, std::vector<std::string>>>>& object) {
> +                    std::string_view sensor = object.first;
> +                    std::size_t lastPos = sensor.rfind("/");
> +                    if (lastPos == std::string::npos ||
> +                        lastPos + 1 >= sensor.size())
> +                    {
> +                        BMCWEB_LOG_ERROR << "Invalid sensor path: " << sensor;
> +                        return false;
> +                    }
> +                    std::string_view name = sensor.substr(lastPos + 1);
>
> -                if (it == subtree.end())
> -                {
> -                    BMCWEB_LOG_ERROR << "Could not find path for sensor: "
> -                                     << sensorName;
> -                    messages::resourceNotFound(asyncResp->res, "Sensor",
> -                                               sensorName);
> -                    return;
> -                }
> -                std::string_view sensorPath = (*it).first;
> -                BMCWEB_LOG_DEBUG << "Found sensor path for sensor '"
> -                                 << sensorName << "': " << sensorPath;
> +                    return name == sensorName;
> +                });
>
> -                const std::shared_ptr<boost::container::flat_set<std::string>>
> -                    sensorList = std::make_shared<
> -                        boost::container::flat_set<std::string>>();
> +            if (it == subtree.end())
> +            {
> +                BMCWEB_LOG_ERROR << "Could not find path for sensor: "
> +                                 << sensorName;
> +                messages::resourceNotFound(asyncResp->res, "Sensor",
> +                                           sensorName);
> +                return;
> +            }
> +            std::string_view sensorPath = (*it).first;
> +            BMCWEB_LOG_DEBUG << "Found sensor path for sensor '" << sensorName
> +                             << "': " << sensorPath;
>
> -                sensorList->emplace(sensorPath);
> -                processSensorList(asyncResp, sensorList);
> -                BMCWEB_LOG_DEBUG << "respHandler1 exit";
> -            },
> -            "xyz.openbmc_project.ObjectMapper",
> -            "/xyz/openbmc_project/object_mapper",
> -            "xyz.openbmc_project.ObjectMapper", "GetSubTree",
> -            "/xyz/openbmc_project/sensors", 2, interfaces);
> +            const std::shared_ptr<boost::container::flat_set<std::string>>
> +                sensorList =
> +                    std::make_shared<boost::container::flat_set<std::string>>();
> +
> +            sensorList->emplace(sensorPath);
> +            processSensorList(asyncResp, sensorList);
> +            BMCWEB_LOG_DEBUG << "respHandler1 exit";
> +        };
> +
> +        if (sensors::subTreeCache.empty())
> +        {
> +            // Get a list of all of the sensors that implement Sensor.Value
> +            // and get the path and service name associated with the sensor
> +            crow::connections::systemBus->async_method_call(
> +                std::move(respHandler), "xyz.openbmc_project.ObjectMapper",
> +                "/xyz/openbmc_project/object_mapper",
> +                "xyz.openbmc_project.ObjectMapper", "GetSubTree",
> +                "/xyz/openbmc_project/sensors", 2, interfaces);
> +        }
> +        else
> +        {
> +            respHandler(boost::system::error_code{}, sensors::subTreeCache);
> +        }
>      }
>  };
>
>
> Thank you for your attention.
>
> --
> Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
> mailto:fercerpav at gmail.com


More information about the openbmc mailing list