Checking for network online
Milton Miller II
miltonm at us.ibm.com
Wed Mar 2 06:56:23 AEDT 2022
On Feb 23, 2022, Johnathan Mantey wrote:
>On 2/23/22 09:44, Jiaqing Zhao wrote:
>> On 2022-02-23 21:48, Patrick Williams wrote:
>>> On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
>>>> I think a solution is to set RequiredForOnline=no
>>>> (https://www.freedesktop.org/software/systemd/man/systemd.network.html#RequiredForOnline=)
>>>> in all network interface configs. This option skips the interface
>>>> when running systemd-networkd-wait-online.service. Canonical netplan
>>>> (used in Ubuntu server) also uses this option to skip the online
>>>> check for a given interface
>>>> (https://github.com/canonical/netplan/blob/main/src/networkd.c#L636-L639).
>>>>
>>>> I'll submit a patch to phosphor-networkd later.
>>>
>>> I really don't think this is appropriate for all systems. Services have
>>> dependencies on network-online.target for a reason. If the side-effect of
>>> having the BMC network cable unplugged is that the host doesn't boot, that
>>> might be entirely reasonable behavior in some environments.
>>>
>>> We use rsyslog as the mechanism to offload our BMC logging data to an
>>> aggregation point. When you have a very large scale deployment, it is
>>> actually better for the system to not come online than for us to lose out
>>> on that data, since we have spare capacity to take its place.
>>
>> My understanding is that in OpenBMC, the purpose of rsyslog is to
>> format the Redfish and IPMI SEL logs from the system journal. The "r"
>> of rsyslogd is not used in most cases. I think "network not
>> available" can be handled the same as "server misconfigured" in
>> rsyslogd, as in both cases it fails to connect to the server, and may
>> exit or print some error messages? (not tried yet)
>>
>> Johnathan mentions in his initial email that the 120s wait blocks
>> multi-user.target. Considering that there is no BMC serial port on
>> most production hardware, when the BMC has no network connection, the
>> only way to interact with the BMC is to use IPMI from the host. However,
>> IPMI services are started in multi-user.target; if the BMC waits
>> indefinitely for the network to come online, there would be no way to
>> debug the issue.
>>
>>> Note that the Canonical netplan only applies this option if the
>>> configuration indicates that the interface is optional, which is entirely
>>> appropriate. The way you wrote it could have been interpreted that they
>>> set this on *every* interface, which is what it seems like you're
>>> proposing to do to phosphor-networkd.
>>>
>>> If this is desired behavior for someone, can't you supply a wildcard
>>> .network file that adds this option, rather than modifying
>>> phosphor-networkd to manually add it to each network interface that it is
>>> managing?
>>
>> Maybe we can add a DBus property similar to what netplan does?
>> Reading/writing systemd-networkd config files is feasible in
>> phosphor-networkd. The default value can be assigned via a build option.
>>
>>> I believe some designs use a USB network device to connect two internal
>>> pieces of the system and those interfaces are not necessarily managed by
>>> phosphor-networkd (interfaces that, for example, connect BMC-to-BMC or
>>> BMC-to-Host). While it is obviously up to the system designer to work
>>> through this bug, by applying this configuration as you proposed you are
>>> causing unusual default behavior in that networkd is going to start
>>> waiting for these internal connections to come online instead of the
>>> external interface.
>>
>> I think this is an extremely rare case; internal interfaces should
>> be configurable. For example, the host OS can change the IP of its
>> BMC-Host virtual interface, so the BMC should also be able to change
>> its own, and for BMC-to-BMC interfaces, it is impossible to assign a
>> fixed LAN IP without conflicts in manufacturing. The easiest way to
>> configure them is to use phosphor-networkd.
>>
>> Even if an interface is not managed by phosphor-networkd, keeping the
>> default RequiredForOnline=yes will cause the 120s wait on BMC boot.
>> Developers can simply search for it and find the solution. I remember
>> it shows a timer with a message on the BMC serial console; that's how I
>> found I should set "optional" on my Ubuntu server.
>
>FWIW, my experimentation with systemd-networkd-wait-online was not
>successful in doing much to change the 120 second timeout.
>
>Setting the RequiredForOnline entry to false in systemd.network did not
>prevent the 120 second timeout from elapsing.
>
>Setting any of the following switches in the service file failed to
>eliminate the timeout:
>--ignore=eth0
>--interface=eth0:no-carrier # overrides RequiredForOnline
>--interface=eth0:no-carrier:no-carrier # <- probably a bad setting in
> # hindsight
>
>It appears systemd-networkd-wait-online expects some state greater than
>no-carrier to consider the link online, thus allowing it to exit with a
>SUCCESS exit code. This holds even when it is explicitly instructed that
>no-carrier counts as "online".
>
>The only switch that seemed to perform as expected in this instance was
>--timeout. Assigning a value less than 120 to the --timeout control did
>reduce the wait period. It does assign a SUCCESS exit code upon timing
>out, which is expected behavior.
>
>systemd-networkd-wait-online appears to have logic preventing the
>no-carrier state from being assigned as the "network online" value.
>
>The rsyslog service file lists both a network and a network-online target.
>If the network-online target is removed then systemd-networkd-wait-online
>doesn't run, and any configuration of that service appears to be
>pointless. The conclusion I draw from that is that network-online.target
>is a valid configuration option for a service to assign.
>
>There may be OpenBMC-powered servers that do use the distributed logging
>provided by rsyslogd. If there are, then globally removing network-online
>from the rsyslog service file is undesirable. I consider the same to be
>true of assigning a default RequiredForOnline=false.
>
>Based on the above, it's my opinion that how to configure
>rsyslog/systemd-networkd-wait-online is a vendor-based decision.
>
I just wanted to point out that for those using the kernel NCSI stack,
the networks always show as online with link up because of how
the stack was created. My reading is that it would take a new slave
interface to overcome this limitation.
Milton
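
For reference, here is a minimal sketch of the two knobs discussed in the
quoted thread, written as systemd drop-in files. The file names, the
interface name (eth0), the parent .network file name, the path to the
systemd-networkd-wait-online binary, and the 30 second value are all
assumptions for illustration and will differ per system. Johnathan reports
above that RequiredForOnline did not shorten the wait in his experiments
while --timeout did, and neither setting addresses the NCSI case where the
link always appears up.

```ini
# Hypothetical drop-in for an existing per-interface file, e.g.
#   /etc/systemd/network/00-bmc-eth0.network.d/optional.conf
# (the parent file name 00-bmc-eth0.network is an assumption).
# Marks the interface as not required for network-online.target.
[Link]
RequiredForOnline=no

# Hypothetical drop-in for the wait-online service, e.g.
#   /etc/systemd/system/systemd-networkd-wait-online.service.d/timeout.conf
# The empty ExecStart= clears the packaged command before replacing it.
# The binary path varies by distribution; 30 is an illustrative value.
[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --timeout=30
```

The two fragments belong in separate files; they are shown together only
for brevity. A per-file drop-in is used rather than a catch-all .network
file because networkd applies only the first .network file that matches an
interface, so a broad match can shadow the existing configuration.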