[entity-manager] Issue about entity-manager getting stuck

Scron Chang (張仲延) Scron.Chang at quantatw.com
Mon Feb 8 17:53:51 AEDT 2021



> -----Original Message-----
> From: Ed Tanous <edtanous at google.com>
> Sent: Thursday, February 4, 2021 12:11 AM
> To: Scron Chang (張仲延) <Scron.Chang at quantatw.com>
> Cc: openbmc at lists.ozlabs.org
> Subject: Re: [entity-manager] Issue about entity-manager getting stuck
> 
> On Tue, Feb 2, 2021 at 11:22 PM Scron Chang (張仲延)
> <Scron.Chang at quantatw.com> wrote:
> >
> > Hi Ed,
> > Thanks for your reply.
> >
> > In my case, I have a script that uses the following command to check the
> > host status and then resets the peci module based on the result:
> > busctl get-property xyz.openbmc_project.State.Chassis \
> >   /xyz/openbmc_project/state/os \
> >   xyz.openbmc_project.State.OperatingSystem.Status OperatingSystemState
> 
> This sounds like something that should be using a match expression rather
> than polling every second.  If you did that, your problem would likely go
> away (and your system would be better off as a whole for it).
> 

Ok, thanks for your suggestion.
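
For reference, here is a minimal sketch of what such a match expression could look like for the property above. The helper name build_match_rule is hypothetical, and I am assuming the state change is announced through the standard org.freedesktop.DBus.Properties PropertiesChanged signal on that object path. The code only builds the rule string; it does not talk to D-Bus.

```python
def build_match_rule(path: str, interface: str) -> str:
    """Build a D-Bus match rule for PropertiesChanged signals on one object.

    For PropertiesChanged, the first argument (arg0) is the name of the
    interface whose properties changed, so it can be used as a filter.
    """
    fields = [
        ("type", "signal"),
        ("interface", "org.freedesktop.DBus.Properties"),
        ("member", "PropertiesChanged"),
        ("path", path),
        ("arg0", interface),
    ]
    return ",".join(f"{key}='{value}'" for key, value in fields)


# Object path and interface taken from the busctl command quoted above.
RULE = build_match_rule(
    "/xyz/openbmc_project/state/os",
    "xyz.openbmc_project.State.OperatingSystem.Status",
)
print(RULE)
```

The resulting string could be handed to something like "busctl monitor --match=..." or registered through whichever D-Bus binding the script uses, so the script reacts to the signal instead of opening a new bus connection every second.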

> >
> > Now I understand why entity-manager catches the nameOwnerChanged signal.
> > However, please allow me to discuss one further question: how does
> > entity-manager define the waiting time for the system to become ready?
> > According to the source code, the current waiting time is 5 seconds.
> > (Please refer to this line:
> > https://github.com/openbmc/entity-manager/blob/f094125cd3bdbc8737dc8035a6e9ac252f6e8840/src/EntityManager.cpp#L1687)
> >
> > If the waiting time were changed to 1 second, entity-manager would respond
> > faster and would rarely get stuck. I found that entity-manager did use 1
> > second before this PR.
> > (Please refer to this PR:
> > https://gerrit.openbmc-project.xyz/c/openbmc/entity-manager/+/25193)
> >
> > The PR does not have much commentary. May I ask the reason for changing
> > the waiting time? And what should I be concerned about if entity-manager
> > uses a shorter waiting time?
> 
> All the properties changed events can take more than a second to process.
> 5 seconds is on the safe side.
> 
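
To check my understanding of the timer behaviour, here is a small simulation. It is only a sketch, not the entity-manager code: debounce_fire_times is a hypothetical helper, times are in milliseconds, and I assume that every signal cancels the pending wait and starts a new one. The 1.2 second interval is also just an assumed stand-in for "busctl plus sleep 1".

```python
def debounce_fire_times(event_times_ms, delay_ms):
    """Return the times at which a restart-on-every-event timer expires.

    Each event cancels the pending wait (if any) and starts a new one that
    would expire delay_ms later. A wait only completes if no further event
    arrives before its deadline (a tie is treated as "completes" here).
    """
    events = sorted(event_times_ms)
    fired = []
    for i, t in enumerate(events):
        deadline = t + delay_ms
        next_event = events[i + 1] if i + 1 < len(events) else None
        if next_event is None or next_event >= deadline:
            fired.append(deadline)
    return fired


# Assumed workload: one bus connection roughly every 1.2 s for a minute.
EVENTS = list(range(0, 60_000, 1_200))

FIVE_SECOND = debounce_fire_times(EVENTS, 5_000)
ONE_SECOND = debounce_fire_times(EVENTS, 1_000)
print(len(FIVE_SECOND), len(ONE_SECOND))
```

Under these assumptions the 5 second wait completes only once, after the last event, while the 1 second wait completes after every event. The sketch does not model how long the property change processing itself takes.
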
> 
> PS, please don't top post.   This mailing list prefers inline replies.

OK, understood. Thank you for the reminder.

> 
> >
> > Scron Chang
> > E-Mail  Scron.Chang at quantatw.com
> > Ext.    11936
> >
> >
> > -----Original Message-----
> > From: Ed Tanous <edtanous at google.com>
> > Sent: Thursday, January 28, 2021 1:07 AM
> > To: Scron Chang (張仲延) <Scron.Chang at quantatw.com>
> > Cc: openbmc at lists.ozlabs.org
> > Subject: Re: [entity-manager] Issue about entity-manager getting stuck
> >
> > On Tue, Jan 26, 2021 at 10:34 PM Scron Chang (張仲延)
> > <Scron.Chang at quantatw.com> wrote:
> > >
> > > Hi all,
> > >
> > > I am using openbmc/entity-manager at this revision:
> > > "f094125cd3bdbc8737dc8035a6e9ac252f6e8840", and I found that calling
> > > D-Bus makes entity-manager get stuck.
> > >
> > > Reproduce this with the following steps:
> > > 1. systemctl stop xyz.openbmc_project.EntityManager
> > > 2. Open another terminal and run this while-loop:
> > >    "while true; do busctl ; sleep 1; done"
> > > 3. systemctl start xyz.openbmc_project.EntityManager
> > >
> > > I think the root cause is this function: "nameOwnerChangedMatch".
> > > (Please refer to this line:
> > > https://github.com/openbmc/entity-manager/blob/f094125cd3bdbc8737dc8035a6e9ac252f6e8840/src/EntityManager.cpp#L1859)
> >
> > My first thought is: Don't run an empty busctl in a loop then, but I'm
> > guessing that's not what you're really trying to do.  If we had more ideas
> > about what you were really hoping to accomplish, we might have some
> > better advice for how to proceed.
> >
> > The intent of that code is to reconfigure entity-manager when interfaces
> > are changed, so if you're constantly attaching and detaching to dbus,
> > entity-manager (and object manager) never sees the system as "up" and
> > keeps waiting for the system to finish stabilizing before it runs the
> > config logic.
> >
> > In your specific case above, the code could be a little smarter, and ignore
> > unique names in that check, only caring about newly-defined well known
> > names, but without knowing your real use case, it's hard to know if that
> > would help.
> >
> > >
> > > Manually calling D-Bus, or calling it from a script, generates a
> > > NameOwnerChanged signal and thus triggers the function
> > > "propertiesChangedCallback" repeatedly. Meanwhile, the async_wait in
> > > propertiesChangedCallback returns with operation_aborted.
> >
> > Personal opinion: Don't call busctl continuously in a script.  It's
> > inefficient, and causes problems like this.
> >
> > > So here is the conclusion:
> > > Manually calling D-Bus at an interval shorter than 5 seconds makes
> > > entity-manager keep starting a new async_wait and aborting the old one,
> > > so the async_wait never completes.
> > >
> > > Is this a bug in entity-manager, or am I getting something wrong? Please
> > > help me with this.
> >
> > IMO, entity-manager is working as intended, but let's try to figure out what
> > you're really trying to do, and see if we can find you a solution.
> >
> > >
> > > Scron Chang
> > > E-Mail  Scron.Chang at quantatw.com
> > >

