Redfish requests for dbus-sensors data are needlessly slow - analysis
ed at tanous.net
Fri Jul 30 02:08:30 AEST 2021
On Thu, Jul 29, 2021 at 8:29 AM Paul Fertser <fercerpav at gmail.com> wrote:
> Hello Ed,
> Thank you for the reply.
> On Thu, Jul 29, 2021 at 07:53:38AM -0700, Ed Tanous wrote:
> > On Thu, Jul 29, 2021 at 2:46 AM Paul Fertser <fercerpav at gmail.com> wrote:
> > > Here follows my naive exploratory patch. It works for the sensors
> > > we're interested in (unless they appear after the first Redfish
> > > request made to bmcweb as the cache for the expensive D-Bus calls is
> > > never invalidated). It doesn't handle complex inventory associations.
> > The "unless" is kind of important here. Sensors can show up or be
> > removed at any time, especially on an entity-manager enabled system.
> So bmcweb can subscribe to the signals from entity-manager and
> invalidate the caches just in case if any EM event registered.
You have to monitor signals from all connections. bmcweb doesn't only
run against entity-manager. FWIW, what you described is essentially
entirely what the mapper does; Handling all the race conditions, add
events, remove events, with and without object manager support was
non-trivial, and this is why Andrews attempt tries to put it into a
library so we don't lose that code.
None of this is to say "don't try".... please, give it an attempt and
let's see if we can tease something good out here, but just understand
that there are race conditions there.
> > As a general rule, bmcweb has avoided having a cache at all. There
> > have been several attempts at adding one, but most failed because of
> > the same thing you found: cache invalidation is hard.
> Even if it's just dropped on any ObjectMapper or EntityManager signal?
Those aren't the only signals you need to monitor. If you're
interested in going down this route, I'd start by looking at the
mapper implementation. You'd also have to handle connections,
disconnections, and other stuff. At some point, what you've done
starts to look a lot like the mapper.
> > If we're going to implement a cache, I'd really like to see it done
> > at a higher level (somewhere in the connection class) and done more
> > generically than this is here, which would ensure that all endpoints
> > are sped up, not just sensors. With that said, if this really
> > solves the problem and solves it well, I can't very well ignore it.
> But is it really a problem? I'm still not sure; probably it's just
> that we invented a weird usecase not worth optimising for? Have the
> others faced it and do they consider it to be a problem?
It's for sure a problem, although your 4 second response times sound
worse than most. On an ast2500 I've normally seen a 700ms (ish)
response time, but depending on how many sensors you have, and how
fast the backends are for each one, I could easily see it getting up
there pretty easily.
Just one thing to double check; You don't have any custom sensor
daemons running on your system do you? Ideally everything should be
implementing object manager. If it doesn't, the code paths to build
the sensor schema has to take the very slow path (ie calling Get on
each property individually).
> > > I would be willing to work on doing the right thing but for that I'm
> > > lacking the understanding of the complete picture involved in handling
> > > all types of sensors and interfaces, so I'm kindly asking for your
> > > help with it.
> > >
> > > diff --git a/redfish-core/lib/sensors.hpp b/redfish-core/lib/sensors.hpp
> > Can you please submit the below to gerrit as a WIP instead. There's
> > much better tooling there for reviewing and testing patches. I can
> > review it more there.
> Sure, will do. I didn't mean to offer it as a solution, it's just to
> prove my understanding of where the bottlenecks are was about correct.
> My real point is that apparently bmcweb is (in some cases) requesting
> way too much data in way too many calls, using only a tiny fraction of
> the obtained values in the end. That's why I made that lengthy
> description of how data for a single sensor is obtained. Caching can
> hide this to a degree, but shouldn't the root cause be fixed as well?
Yes, we should come at it from both angles, because a cache will still
have to invalidate quite a bit, and nobody likes a bimodal
distribution of response times. The next thing on my personal list is
to get a new version of GetSubtree added to the mapper, that accepts a
list of optional interfaces as well as required. I talked about this
a while back on discord. This might clean up a couple of the extra
calls in the sensor code, but the primary target was inventory
In a perfect world where software is easy and architectures are
beautiful, ideally I'd like to see most redfish calls go:
Call mapper once with a list of interfaces; Find all connections and
objects that implement this interface.
Loop over connections, calling GetManagedObjects on each to read in
the data and write directly to redfish tree.
> Does bmcweb really _need_ to do all the calls mentioned? I think the
> fact it performs two identical D-Bus calls in a row gives a clear "no"
You are almost certainly right, two identical calls is wasteful, and
we should clean that up if we can. I'm certain there's probably
several speedups to be gained by someone understanding the sensor code
and optimizing it a little.
> After reading my outline of the process can you tell that's
> exactly the way the OpenBMC D-Bus API is designed to be used?
I'm not sure I quite follow this question. The sensor code you're
looking at evolved significantly over time into the behemoth you see
today. It quite likely has several anti-patterns and duplicated calls
that can be consolidated, but that didn't really show up in the
individual diffs as features were added. I'm really happy someone has
gone through the full flow. Ideally we can start cleaning up the
obvious stuff (duplicated calls, ect) in some patchsets.
> Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
> mailto:fercerpav at gmail.com
More information about the openbmc