phosphor-network terminated due to SIGBUS
William Kennington
wak at google.com
Sat Sep 9 06:38:17 AEST 2023
It should be fixed now :)
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/66533
On Thu, Sep 7, 2023 at 6:24 PM Chhabra, DipinderSingh <
Dipinder.Chhabra at dell.com> wrote:
> It seems to have something to do with the callback running in the timer
> context. As a temporary workaround, I have removed the inline
> implementation of reloadConfigs and moved the complete block of code from
> inside reload.setCallback directly into reloadConfigs (including
> reloadPreHooks, the actual D-Bus call, and reloadPostHooks). This works
> well and there is no more SIGBUS.
>
> Depending on the scenario, this will cause multiple Reload calls to
> systemd-networkd (unlike the timer case, where it would always be a single
> call), but I guess it may be OK in the interim.
>
> I will continue investigating on my end too.
>
> Thanks
> Dipinder
>
> *From:* William Kennington <wak at google.com>
> *Sent:* Thursday, September 7, 2023 5:07 PM
> *To:* Chhabra, DipinderSingh <Dipinder_Chhabra at Dell.com>
> *Cc:* openbmc at lists.ozlabs.org
> *Subject:* Re: phosphor-network terminated due to SIGBUS
>
> We are investigating the same issue on our side; I'm running some other
> tests to figure out why the references aren't working as expected.
>
> On Thu, Sep 7, 2023 at 1:27 PM Chhabra, DipinderSingh <
> Dipinder.Chhabra at dell.com> wrote:
>
> Yes.
>
> *From:* William Kennington <wak at google.com>
> *Sent:* Thursday, September 7, 2023 2:55 PM
> *To:* Chhabra, DipinderSingh <Dipinder_Chhabra at Dell.com>
> *Cc:* openbmc at lists.ozlabs.org
> *Subject:* Re: phosphor-network terminated due to SIGBUS
>
> Do you happen to be using aarch64?
>
> On Thu, Sep 7, 2023 at 12:52 PM Chhabra, DipinderSingh <
> Dipinder.Chhabra at dell.com> wrote:
>
> Hi There
>
> Recently we updated our OpenBMC distro to tag 2.14.0 (phosphor-network
> SRCREV f78a415e154bac274e1d07ce8128c69e9d1cd710).
>
> Since then, we are seeing the phosphor-network service crash with SIGBUS
> after a configuration change:
>
> Sep 07 09:51:45 bmc phosphor-network-manager[627]: Wrote networkd file: /etc/systemd/network/00-bmc-end1.network
> Sep 07 09:51:45 bmc phosphor-network-manager[627]: Wrote networkd file: /etc/systemd/network/00-bmc-end0.network
> Sep 07 09:51:49 bmc systemd[1]: xyz.openbmc_project.Network.service: Main process exited, code=dumped, status=7/BUS
> Sep 07 09:51:49 bmc systemd[1]: xyz.openbmc_project.Network.service: Failed with result 'core-dump'.
> Sep 07 09:51:49 bmc systemd[1]: xyz.openbmc_project.Network.service: Consumed 1.365s CPU time.
> Sep 07 09:51:50 bmc systemd[1]: xyz.openbmc_project.Network.service: Scheduled restart job, restart counter is at 1.
> Sep 07 09:51:50 bmc systemd[1]: Stopped Phosphor Network Manager.
> Sep 07 09:51:50 bmc systemd[1]: xyz.openbmc_project.Network.service: Consumed 1.365s CPU time.
> Sep 07 09:51:50 bmc systemd[1]: Starting Phosphor Network Manager...
>
> Based on my debugging, I can confirm that the timer gets scheduled
> correctly after the config write and that the registered callback does get
> invoked. The crash happens in the following D-Bus call in
> network_manager.cpp:
>
> try
> {
>     bus.get()
>         .new_method_call("org.freedesktop.network1",
>                          "/org/freedesktop/network1",
>                          "org.freedesktop.network1.Manager", "Reload")
>         .call();
>     lg2::info("Reloaded systemd-networkd");
> }
> // (catch clause omitted from this excerpt)
>
> I have looked for fixes to this in later commits but did not find any.
>
> I also tried changing it to call_noreply, but that does not help; I still
> get the same SIGBUS:
>
> try
> {
>     lg2::info("Try systemd-networkd reload...");
>     auto method = bus.get().new_method_call(NETWORKD_BUSNAME, NETWORKD_PATH,
>                                             NETWORKD_INTERFACE, "Reload");
>     bus.get().call_noreply(method);
>     lg2::info("Reloaded systemd-networkd");
> }
> // (catch clause omitted from this excerpt)
>
> When I invoke the same method manually from the shell, it succeeds:
>
> root@bmc:~# busctl call org.freedesktop.network1 /org/freedesktop/network1 org.freedesktop.network1.Manager Reload
> root@bmc:~# echo $?
> 0
>
> Is anyone else seeing this issue with phosphor-network, or does anyone
> have an idea why this could be happening?
>
> Thanks
> Dip
>
> Internal Use - Confidential