$expand on sensors slower than individual gets

Ed Tanous ed at tanous.net
Wed Apr 26 03:14:50 AEST 2023


On Fri, Apr 21, 2023 at 2:23 AM Aishwary Joshi <aishwaryj at nvidia.com> wrote:
>
> Hi All,
>
>
>
> We would like to get feedback on the following performance issue that we have observed with $expand on the /redfish/v1/Chassis/<ChassisId>/Sensors URI, compared to using GET on the individual Sensor URIs (/redfish/v1/Chassis/<ChassisId>/Sensors/<SensorName>), on some Chassis.
>
>
>
> A little background about the system:
>
> 1. Number of sensors present on Chassis_X (where we see the performance drop with $expand): 7
>
> 2. Number of sensors present on Chassis_Y (where we do NOT see the performance drop with $expand): 31
>
> 3. We have a common service that hosts 24 Chassis (including Chassis_X and Chassis_Y)
>
> 4. Total number of sensors supported by the service that hosts the 24 Chassis instances: 102
>
> 5. Time taken with $expand on Chassis_X sensors ('/redfish/v1/Chassis/Chassis_X/Sensors?$expand=*($levels=1)'): 0.48 secs
>
> 6. Total time taken querying the 7 individual sensor URIs present on Chassis_X: 0.6 secs
>
> 7. Time taken with $expand on Chassis_Y sensors ('/redfish/v1/Chassis/Chassis_Y/Sensors?$expand=*($levels=1)'): 0.48 secs
>
> 8. Total time taken querying the 31 individual sensor URIs present on Chassis_Y: 0.91 secs
>
>
>
> We see an advantage in using $expand on Chassis_Y but not on Chassis_X.
>
> Based on our analysis of $expand, it looks like the performance of $expand on sensors is tied to the number of sensors hosted by the backend service and not to the number of sensors present on a Chassis. This is because of the "GetManagedObjects" call made to the backend service, which returns 102 sensors in our case irrespective of the number of sensors present on the requested chassis.
>
> Code Ref: https://github.com/openbmc/bmcweb/blob/master/redfish-core/lib/sensors.hpp#L2471
>
>
>
> Because of this issue, we are noticing a significant perf drop when using $expand:
>
> 9. Total time to query the individual sensor URIs (101): 3.08 secs
>
> 10. Total time to query sensors with $expand (24 URIs): 12 secs

Could you please try to bisect this down to the commit that caused the
regression?  928fefb9a542b816d7c0418077def2b3874d1b0f might be of
note, because I think that's the one that did the GetManagedObjects?
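
For reference, here is a quick sketch (in Python, purely as a
calculator) of the per-request costs implied by the timings quoted
above. The figures are the ones from the report; the variable and
function names are mine and not anything from bmcweb.

```python
# Timings copied from the report quoted above, in seconds.
# Names are illustrative only; nothing here comes from bmcweb.

def per_request(total_seconds: float, requests: int) -> float:
    """Average wall-clock time per request."""
    return total_seconds / requests

# Individual GETs on sensor URIs.
per_get_chassis_x = per_request(0.6, 7)     # 7 sensors on Chassis_X
per_get_chassis_y = per_request(0.91, 31)   # 31 sensors on Chassis_Y
per_get_overall = per_request(3.08, 101)    # 101 individual sensor URIs

# $expand requests, one per chassis Sensors collection.
per_expand_overall = per_request(12.0, 24)  # 24 $expand URIs

print(f"per GET, Chassis_X: {per_get_chassis_x:.4f} s")
print(f"per GET, Chassis_Y: {per_get_chassis_y:.4f} s")
print(f"per GET, overall:   {per_get_overall:.4f} s")
print(f"per $expand:        {per_expand_overall:.4f} s")
```

The average of 12 secs over 24 $expand requests is close to the 0.48
secs reported for a single $expand, which would be consistent with each
$expand costing about the same regardless of how many sensors the
chassis has.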

>
> We would like to know
>
> 1. Is this the expected current behaviour of bmcweb?

Obviously anything that gives worse performance is not ideal and
should be looked into. In terms of correctness, though, it sounds like
bmcweb is giving the correct responses in both cases, so this is a
performance regression rather than a bug.

>
> 2. Is the community also experiencing a similar performance drop in the case mentioned above, and if so, what has been done to resolve it?
>
> We would also like to know whether any recent $expand enhancements have been made in the sensor area that might help with this performance issue. Please do let me know.

See the SHA1 above.


My suspicion is that we need to either:
1. Roll back the efficient expand patches until they no longer cause a
performance regression, or
2. Determine the crossover point, i.e. the number of sensors above
which GetManagedObjects is faster than GetAll, and have bmcweb pick
between the two paths depending on how many sensors are present.
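
To make option 2 a bit more concrete, here is a minimal sketch of what
picking between the two paths could look like. It is written in Python
only to illustrate the idea; it is not bmcweb code. The linear cost
model, the CostModel fields, pick_path() and the numbers in the example
are all assumptions and placeholders, and the real costs would have to
be measured on the target system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical per-call costs in seconds; to be measured, not assumed."""
    get_all_per_sensor: float          # one GetAll call for one sensor
    managed_objects_fixed: float       # fixed overhead of GetManagedObjects
    managed_objects_per_object: float  # cost per object the service returns


def estimate_get_all(model: CostModel, sensors_on_chassis: int) -> float:
    # One GetAll per sensor that is actually on the requested chassis.
    return sensors_on_chassis * model.get_all_per_sensor


def estimate_managed_objects(model: CostModel, sensors_in_service: int) -> float:
    # GetManagedObjects returns every object the service hosts, so its
    # cost scales with the service-wide count, not the chassis count.
    return (model.managed_objects_fixed
            + sensors_in_service * model.managed_objects_per_object)


def pick_path(model: CostModel, sensors_on_chassis: int,
              sensors_in_service: int) -> str:
    """Return the name of the path with the lower estimated cost."""
    get_all = estimate_get_all(model, sensors_on_chassis)
    managed = estimate_managed_objects(model, sensors_in_service)
    return "GetAll" if get_all <= managed else "GetManagedObjects"


# Placeholder costs, NOT measurements from any real system.
example = CostModel(get_all_per_sensor=0.03,
                    managed_objects_fixed=0.05,
                    managed_objects_per_object=0.004)

print(pick_path(example, sensors_on_chassis=7, sensors_in_service=102))
print(pick_path(example, sensors_on_chassis=31, sensors_in_service=102))
```

The point of the sketch is only that the decision depends on two
counts: the sensors on the requested chassis and the sensors hosted by
the backing service. Whether a linear model is good enough is something
the measurements would have to show.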

