phosphor-network terminated due to SIGBUS

Chhabra, DipinderSingh Dipinder.Chhabra at dell.com
Fri Sep 8 11:24:04 AEST 2023


It seems to have something to do with the callback running in the timer context. As a temporary workaround, I have removed the inline implementation of reloadConfigs and moved the complete block of code from inside reload.setCallback directly into reloadConfigs (including reloadPreHooks, the actual dbus call, and reloadPostHooks). This works well, with no more SIGBUS.
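
A minimal sketch of the shape of that workaround, not the actual phosphor-networkd code. The names ManagerSketch, reloadCall, runAndClear and traceOneReload are hypothetical, the D-Bus call is replaced by a plain callable so the example builds with only the standard library, and it assumes the hooks are one-shot and dropped once they have run:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified manager; the real class talks to D-Bus.
class ManagerSketch
{
  public:
    using Hook = std::function<void()>;

    explicit ManagerSketch(std::function<void()> reloadCall) :
        reloadCall(std::move(reloadCall))
    {}

    void addReloadPreHook(Hook h) { reloadPreHooks.push_back(std::move(h)); }
    void addReloadPostHook(Hook h) { reloadPostHooks.push_back(std::move(h)); }

    // Workaround: do the work synchronously instead of arming a timer.
    void reloadConfigs()
    {
        runAndClear(reloadPreHooks);
        reloadCall(); // stands in for the "Reload" D-Bus method call
        runAndClear(reloadPostHooks);
    }

  private:
    static void runAndClear(std::vector<Hook>& hooks)
    {
        // Move the hooks out first so a hook may safely register new ones.
        std::vector<Hook> pending;
        pending.swap(hooks);
        for (auto& hook : pending)
        {
            hook();
        }
    }

    std::function<void()> reloadCall;
    std::vector<Hook> reloadPreHooks;
    std::vector<Hook> reloadPostHooks;
};

// Records the order in which the three stages run for one reload.
std::vector<std::string> traceOneReload()
{
    std::vector<std::string> trace;
    ManagerSketch m([&trace]() { trace.push_back("reload"); });
    m.addReloadPreHook([&trace]() { trace.push_back("pre"); });
    m.addReloadPostHook([&trace]() { trace.push_back("post"); });
    m.reloadConfigs();
    return trace;
}
```

Calling traceOneReload() records the pre hook, the reload, and the post hook in that order, all within the caller's context rather than a timer callback.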

Depending on the scenario, this will cause multiple Reload calls to systemd-networkd (unlike the timer case, where there would always be a single call), but I guess that may be OK in the interim.
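
To illustrate that trade-off, here is a rough standalone sketch that counts reloads for both approaches. DeferredCall, ReloadCounter, reloadsWithTimer and reloadsDirect are hypothetical names; DeferredCall only imitates a one-shot timer that is fired by hand, and no real event loop or D-Bus connection is involved:

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for a one-shot timer: schedule() only arms it, and
// the callback runs once when the event loop later calls fire().
class DeferredCall
{
  public:
    explicit DeferredCall(std::function<void()> cb) : cb(std::move(cb)) {}

    // Arming an already armed timer changes nothing, so requests coalesce.
    void schedule() { armed = true; }

    // Simulates the timer expiring in the event loop.
    void fire()
    {
        if (armed)
        {
            armed = false;
            cb();
        }
    }

  private:
    std::function<void()> cb;
    bool armed = false;
};

// Counts how many times "Reload" would be sent to systemd-networkd.
struct ReloadCounter
{
    int calls = 0;
    void reload() { ++calls; }
};

// Timer-based path: several config changes, then a single expiry.
int reloadsWithTimer(int changes)
{
    ReloadCounter counter;
    DeferredCall timer([&counter]() { counter.reload(); });
    for (int i = 0; i < changes; ++i)
    {
        timer.schedule();
    }
    timer.fire();
    return counter.calls;
}

// Workaround path: every config change reloads immediately.
int reloadsDirect(int changes)
{
    ReloadCounter counter;
    for (int i = 0; i < changes; ++i)
    {
        counter.reload();
    }
    return counter.calls;
}
```

With three back-to-back changes, reloadsWithTimer(3) yields one reload while reloadsDirect(3) yields three, which is the extra load on systemd-networkd the workaround accepts.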

Will continue investigating further from my end too.

Thanks
Dipinder

From: William Kennington <wak at google.com>
Sent: Thursday, September 7, 2023 5:07 PM
To: Chhabra, DipinderSingh <Dipinder_Chhabra at Dell.com>
Cc: openbmc at lists.ozlabs.org
Subject: Re: phosphor-network terminated due to SIGBUS


We are investigating the same issue on our side. I'm trying some other tests to figure out why the references aren't working as expected.

On Thu, Sep 7, 2023 at 1:27 PM Chhabra, DipinderSingh <Dipinder.Chhabra at dell.com> wrote:
Yes.

From: William Kennington <wak at google.com>
Sent: Thursday, September 7, 2023 2:55 PM
To: Chhabra, DipinderSingh <Dipinder_Chhabra at Dell.com>
Cc: openbmc at lists.ozlabs.org
Subject: Re: phosphor-network terminated due to SIGBUS


Do you happen to be using aarch64?

On Thu, Sep 7, 2023 at 12:52 PM Chhabra, DipinderSingh <Dipinder.Chhabra at dell.com> wrote:
Hi There

Recently we updated our OpenBMC distro to tag 2.14.0 (phosphor-network SRCREV f78a415e154bac274e1d07ce8128c69e9d1cd710).

Since then, the phosphor-network service has been crashing with SIGBUS after a configuration change.


Sep 07 09:51:45 bmc phosphor-network-manager[627]: Wrote networkd file: /etc/systemd/network/00-bmc-end1.network
Sep 07 09:51:45 bmc phosphor-network-manager[627]: Wrote networkd file: /etc/systemd/network/00-bmc-end0.network
Sep 07 09:51:49 bmc systemd[1]: xyz.openbmc_project.Network.service: Main process exited, code=dumped, status=7/BUS
Sep 07 09:51:49 bmc systemd[1]: xyz.openbmc_project.Network.service: Failed with result 'core-dump'.
Sep 07 09:51:49 bmc systemd[1]: xyz.openbmc_project.Network.service: Consumed 1.365s CPU time.
Sep 07 09:51:50 bmc systemd[1]: xyz.openbmc_project.Network.service: Scheduled restart job, restart counter is at 1.
Sep 07 09:51:50 bmc systemd[1]: Stopped Phosphor Network Manager.
Sep 07 09:51:50 bmc systemd[1]: xyz.openbmc_project.Network.service: Consumed 1.365s CPU time.
Sep 07 09:51:50 bmc systemd[1]: Starting Phosphor Network Manager...

Based on my debugging, I can confirm that the timer gets scheduled correctly after the config write and that the registered callback does get invoked. The crash happens in the following dbus call in network_manager.cpp.

        try
        {
            bus.get()
                .new_method_call("org.freedesktop.network1",
                                 "/org/freedesktop/network1",
                                 "org.freedesktop.network1.Manager", "Reload")
                .call();
            lg2::info("Reloaded systemd-networkd");
        }

I have looked for a fix in the later commits but did not find one.

I also tried changing it to call_noreply, but that does not help; I get the same BUS error.


        try
        {
            lg2::info("Try systemd-networkd reload...");
            auto method = bus.get().new_method_call(
                NETWORKD_BUSNAME, NETWORKD_PATH, NETWORKD_INTERFACE, "Reload");
            bus.get().call_noreply(method);
            lg2::info("Reloaded systemd-networkd");
        }

When I invoke the same method manually from the shell, it succeeds.


root@bmc:~# busctl call org.freedesktop.network1 /org/freedesktop/network1 org.freedesktop.network1.Manager Reload
root@bmc:~# echo $?
0

Is anyone else seeing this issue with phosphor-network, or does anyone have an idea why it could be happening?

Thanks
Dip


Internal Use - Confidential


Internal Use - Confidential


Internal Use - Confidential