Checking for network online

Johnathan Mantey johnathanx.mantey at intel.com
Thu Feb 24 07:04:12 AEDT 2022



On 2/23/22 09:44, Jiaqing Zhao wrote:
> On 2022-02-23 21:48, Patrick Williams wrote:
>> On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
>>> I think a solution is to set RequiredForOnline=no (https://www.freedesktop.org/software/systemd/man/systemd.network.html#RequiredForOnline=) in every network interface config. This option makes systemd-networkd-wait-online.service skip the interface. Canonical netplan (used in Ubuntu Server) also uses this option to skip the online check for a given interface (https://github.com/canonical/netplan/blob/main/src/networkd.c#L636-L639).
>>>
>>> I'll submit a patch to phosphor-networkd later.
>>
>> I really don't think this is appropriate for all systems.  Services have
>> dependencies on network-online.target for a reason.  If the side-effect of
>> having the BMC network cable unplugged is that the host doesn't boot, that might
>> be entirely reasonable behavior in some environments.
>>
>> We use rsyslog as the mechanism to offload our BMC logging data to an
>> aggregation point.  When you have a very large scale deployment, it is actually
>> better for the system to not come online than for us to lose out on that data,
>> since we have spare capacity to take its place.
> 
> My understanding is that in OpenBMC, the purpose of using rsyslog is to format the Redfish and IPMI SEL logs from the system journal. The "r" in rsyslogd is not used in most cases. I think "network not available" can be handled the same way as "server misconfigured" in rsyslogd, since in both cases it fails to connect to the server and may exit or print some error messages? (not tried yet)
> 
> Johnathan mentions in his initial email that the 120s wait blocks multi-user.target. Considering that most production hardware has no BMC serial port, when the BMC has no network connection the only way to interact with it is IPMI from the host. However, the IPMI services are started in multi-user.target, so if the BMC waits indefinitely for the network to come online, there would be no way to debug the issue.
> 
>> Note that the Canonical netplan only applies this option if the configuration
>> indicates that the interface is optional, which is entirely appropriate.  The
>> way you wrote it could have been interpreted that they set this on *every*
>> interface, which is what it seems like you're proposing to do to
>> phosphor-networkd
>>
>> If this is desired behavior for someone, can't you supply a wildcard .network
>> file that adds this option, rather than modifying phosphor-networkd to manually
>> add it to each network interface that it is managing?
> 
> Maybe we can add a DBus property similar to what netplan does? Reading/writing systemd-networkd config files is feasible in phosphor-networkd. The default value can be assigned via a build option.
>   
>> I believe some designs use a USB network device to connect two internal pieces
>> of the system and those interfaces are not necessarily managed by
>> phosphor-networkd (interfaces that, for example connect BMC-to-BMC or
>> BMC-to-Host).  While it is obviously up to the system designer to work through
>> this bug, by applying this configuration as you proposed you are causing
>> unusual default behavior in that networkd is going to start waiting for these
>> internal connections to come online instead of the external interface.
> 
> I think this is an extremely rare case; internal interfaces should be configurable. For example, the host OS can change the IP of its BMC-Host virtual interface, so the BMC should also be able to change its own, and for BMC-to-BMC interfaces it is impossible to assign a fixed LAN IP without conflicts in manufacturing. The easiest way to configure it is to use phosphor-networkd.
> 
> Even if it is not managed by phosphor-networkd, keeping the default RequiredForOnline=yes will cause the 120s wait on BMC boot. Developers can simply search for it and find the solution. I remember it shows a timer with a message on the BMC serial console; that's how I found that I should set "optional" on my Ubuntu server.

FWIW, my experimentation with systemd-networkd-wait-online was not 
successful in doing much to change the 120 second timeout.

Setting the RequiredForOnline entry to false in systemd.network did not 
prevent the 120 second timeout from elapsing.
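
For reference, such an entry would look roughly like the sketch below. 
The file name and the eth0 match are assumptions for illustration; the 
option itself belongs in the [Link] section per systemd.network(5).

```ini
# /etc/systemd/network/00-bmc-eth0.network (hypothetical file name)
[Match]
Name=eth0

[Link]
# Ask systemd-networkd-wait-online to skip this link
RequiredForOnline=no
```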

Setting any of the following switches in the service file failed to 
eliminate the timeout:
--ignore=eth0
--interface=eth0:no-carrier            # overrides RequiredForOnline
--interface=eth0:no-carrier:no-carrier # <- probably a bad setting in
                                        # hindsight

It appears systemd-networkd-wait-online expects some state greater than 
no-carrier before it considers the link online and exits with a SUCCESS 
exit code. This holds even when it is explicitly told that no-carrier 
counts as "online".

The only switch that seemed to perform as expected in this instance was 
--timeout. Assigning a value less than 120 to the --timeout switch did 
reduce the wait period. It does exit with a SUCCESS exit code upon timing 
out, which is expected behavior.
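
As an illustration, a shorter timeout can be set with a drop-in for the 
service. This is a sketch only: the drop-in path, the 30 second value, 
and the binary location (which differs between distributions) are 
assumptions, not taken from a specific OpenBMC image.

```ini
# /etc/systemd/system/systemd-networkd-wait-online.service.d/timeout.conf
# (hypothetical drop-in file)
[Service]
# Clear the ExecStart= inherited from the stock unit, then redefine it
ExecStart=
# Binary path is an assumption; check the stock unit on the target image
ExecStart=/lib/systemd/systemd-networkd-wait-online --timeout=30
```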

systemd-networkd-wait-online appears to have logic preventing no-carrier 
state from being assigned as the "network online" value.

The rsyslogd service file lists both the network and network-online 
targets. If the network-online target is removed then 
systemd-networkd-wait-online doesn't run, and any configuration of that 
service appears to be pointless. The conclusion I draw from that is that 
network-online.target is a valid dependency for a service to declare.
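
For context, a service typically expresses that dependency in its unit 
file along these lines. This is a generic sketch of the systemd syntax, 
not a copy of the rsyslog unit shipped in any particular image.

```ini
[Unit]
# Pull in network-online.target and order this service after it;
# pulling in that target is what causes systemd-networkd-wait-online to run
Wants=network-online.target
After=network.target network-online.target
```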

There may be OpenBMC-powered servers that do use the distributed logging 
provided by rsyslogd. If there are, then globally removing network-online 
from the rsyslog service file is undesirable. I consider the same to be 
true of assigning a default RequiredForOnline=false.

Based on the above, it's my opinion that how to configure 
rsyslog/systemd-networkd-wait-online is a vendor-specific decision.

-- 
Johnathan Mantey
Senior Software Engineer
*azad technology partners*
Contributing to Technology Innovation since 1992
Phone: (503) 712-6764
Email: johnathanx.mantey at intel.com
