[entity-manager] Issue about entity-manager getting stuck
Ed Tanous
edtanous at google.com
Fri Jan 29 05:39:44 AEDT 2021
On Thu, Jan 28, 2021 at 1:56 AM Konstantin Klubnichkin
<kitsok at yandex-team.ru> wrote:
>
> Hello, Ed!
>
> I'm not sure if my issue is relevant to what Scron discovered, but it may be.
> Sometimes (not on every BMC reboot) dbus gets stuck during startup.
That sounds slightly different, but certainly concerning.
> systemctl and dbus calls fail with timeouts, services get stuck trying to start, dbus-broker consumes a lot of CPU, and dbus monitor shows a storm of "Property Changed" events from an anonymous application.
Can you track down which anonymous application it is?
> The workaround I've found is to kill dbus-broker and dbus-broker-launch; then I can at least issue "reboot" without "-f", and usually (8 times out of 10) the BMC starts normally next time.
This doesn't really seem workable long term.
> Unfortunately, I don't know how to reproduce the issue reliably. It happens more often when the BMC has no network and has no time source such as NTP or a date/time saved in the RTC.
> So I suspect calling busctl in a loop is not the only way to get the system stuck.
Let's see if we can get this debugged. I know I haven't seen anything
similar, so I'm not sure I can be much help to you, but good luck
hunting it down.
>
> Thank you!
>
> 27.01.2021, 20:08, "Ed Tanous" <edtanous at google.com>:
>
> On Tue, Jan 26, 2021 at 10:34 PM Scron Chang (張仲延)
> <Scron.Chang at quantatw.com> wrote:
>
>
> Hi all,
>
> I am using openbmc/entity-manager at this version: "f094125cd3bdbc8737dc8035a6e9ac252f6e8840", and I found that calling D-Bus makes entity-manager get stuck.
>
> Reproduce this with the following steps:
> 1. systemctl stop xyz.openbmc_project.EntityManager
> 2. open another terminal and do this while-loop: "while true; do busctl ; sleep 1; done"
> 3. systemctl start xyz.openbmc_project.EntityManager
> I think the root cause is this function: "nameOwnerChangedMatch." (Please refer to this line: https://github.com/openbmc/entity-manager/blob/f094125cd3bdbc8737dc8035a6e9ac252f6e8840/src/EntityManager.cpp#L1859.)
>
>
> My first thought is: Don't run an empty busctl in a loop then, but I'm
> guessing that's not what you're really trying to do. If we had more
> ideas about what you were really hoping to accomplish, we might have
> some better advice for how to proceed.
>
> The intent of that code is to reconfigure entity-manager when
> interfaces are changed, so if you're constantly attaching and
> detaching to dbus, entity-manager (and object manager) never sees the
> system as "up" and keeps waiting for the system to finish stabilizing
> before it runs the config logic.
>
> In your specific case above, the code could be a little smarter, and
> ignore unique names in that check, only caring about newly-defined
> well known names, but without knowing your real use case, it's hard to
> know if that would help.
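A minimal sketch of the "ignore unique names" idea, assuming only the D-Bus rule that unique connection names start with ':' (for example ":1.42") while well-known names never do. The helper names isUniqueName and shouldRescan are hypothetical and are not part of entity-manager; the real NameOwnerChanged handler would need to be adapted separately.

```cpp
#include <string_view>

// Hypothetical helper, not taken from entity-manager.
// D-Bus unique connection names always begin with ':' (e.g. ":1.42");
// well-known names such as "xyz.openbmc_project.EntityManager" never do.
constexpr bool isUniqueName(std::string_view busName)
{
    return !busName.empty() && busName.front() == ':';
}

// Hypothetical filter: should a NameOwnerChanged signal for `busName`
// restart the settle timer? A short-lived client such as a bare `busctl`
// only ever owns a unique name, so its connect/disconnect is ignored.
constexpr bool shouldRescan(std::string_view busName)
{
    return !busName.empty() && !isUniqueName(busName);
}
```

With a filter like this, the busctl loop from the reproduction steps would no longer reset the wait, while a daemon claiming a new well-known name still would. Whether that is acceptable depends on the real use case.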
>
>
>
> Calling D-Bus manually or from a script generates a NameOwnerChanged signal and thus triggers the function "propertiesChangedCallback" repeatedly. Meanwhile, the async_wait in propertiesChangedCallback returns early with operation_aborted.
>
>
> Personal opinion: Don't call busctl continuously in a script. It's
> inefficient, and causes problems like this.
>
>
> So here is the conclusion:
> Calling D-Bus at intervals shorter than 5 seconds makes entity-manager keep starting a new async_wait and aborting the old one, so no async_wait ever completes.
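The conclusion above can be checked with a small model of a restartable timer. This is a simplified sketch and not entity-manager's code: the function firstQuietExpiry and its parameters are hypothetical, times are whole seconds, and each event is assumed to cancel the pending wait and arm a new one, which is the behavior described in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Simplified model (an assumption for illustration only).
// Every event cancels the pending wait and arms a new one that expires
// `window` seconds later. `eventTimes` must be sorted ascending and
// `horizon` is when we stop observing. Returns the expiry time of the first
// wait that is not cancelled, or std::nullopt if none completes in time.
inline std::optional<int> firstQuietExpiry(const std::vector<int>& eventTimes,
                                           int window, int horizon)
{
    assert(window > 0);
    for (std::size_t i = 0; i < eventTimes.size(); ++i)
    {
        const int expiry = eventTimes[i] + window;
        const bool isLast = (i + 1 == eventTimes.size());
        // The wait survives only if no later event lands before it expires.
        if (isLast || eventTimes[i + 1] >= expiry)
        {
            if (expiry <= horizon)
            {
                return expiry;
            }
            return std::nullopt; // later waits would expire even later
        }
    }
    return std::nullopt; // no events, so no wait was ever armed
}
```

In this model, one event per second (the "sleep 1" loop) with a 5-second window never yields a completed wait while the loop keeps running, whereas events spaced 10 seconds apart let the first wait complete 5 seconds after the first event.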
>
> Is this a bug in entity-manager, or am I getting something wrong? Please help me with this.
>
>
> IMO, entity-manager is working as intended, but let's try to figure out
> what you're really trying to do, and see if we can find you a
> solution.
>
>
>
> Scron Chang
> E-Mail Scron.Chang at quantatw.com
>
>
>
>
> --
> Best regards,
> Konstantin Klubnichkin,
> lead firmware engineer,
> server hardware R&D group,
> Yandex Moscow office.
> tel: +7-903-510-33-33
>