Checking for network online
Patrick Williams
patrick at stwcx.xyz
Thu Feb 24 05:55:59 AEDT 2022
On Thu, Feb 24, 2022 at 01:44:18AM +0800, Jiaqing Zhao wrote:
> On 2022-02-23 21:48, Patrick Williams wrote:
> > On Wed, Feb 23, 2022 at 10:09:19AM +0800, Jiaqing Zhao wrote:
> >> I think a solution is to set RequiredForOnline=no (https://www.freedesktop.org/software/systemd/man/systemd.network.html#RequiredForOnline=) in all network interface configs. This option skips the interface when running systemd-networkd-wait-online.service. Canonical netplan (used in Ubuntu Server) also uses this option to skip the online check for a given interface (https://github.com/canonical/netplan/blob/main/src/networkd.c#L636-L639).
> >>
> >> I'll submit a patch to phosphor-networkd later.
> >
> > I really don't think this is appropriate for all systems. Services have
> > dependencies on network-online.target for a reason. If the side-effect of
> > having the BMC network cable unplugged is that the host doesn't boot, that might
> > be entirely reasonable behavior in some environments.
> >
> > We use rsyslog as the mechanism to offload our BMC logging data to an
> > aggregation point. When you have a very large scale deployment, it is actually
> > better for the system to not come online than for us to lose out on that data,
> > since we have spare capacity to take its place.
>
> My understanding is that in OpenBMC, the purpose of using rsyslog is to format the Redfish and IPMI SEL logs from the system journal. The "r" of rsyslogd is not used in most cases.
I may have left some ambiguity about 'we' in this context. I meant 'the
deployments I am working on'. I believe at least one other company leverages
this as well.
> I think "network not available" can be handled the same way as "server misconfigured" in rsyslogd: in both cases it fails to connect to the server, and it may exit or print some error messages (I have not tried this yet).
That is probably true, but it means that I can't offload any data about the
system in the meantime. Like I said, I'd rather leave the system out of my
deployment if it is degraded.
>
> In his initial email, Jonathan mentions that the 120s wait blocks multi-user.target. Since most production hardware has no BMC serial port, when the BMC has no network connection the only way to interact with it is via IPMI from the host.
Your assertion "no BMC serial port in most production hardware" might be true
globally speaking. It isn't necessarily true for any particular deployment.
With the 120s wait time, is rsyslog actually running after that? Or is it
failed? I guess since it has a Wants and not a Requires on network-online,
it'll still start up after the 120s timeout of systemd-networkd-wait-online.
My understanding of systemd-networkd's defaults here is that it waits for DHCP
in order for network-online.target to pass. You can have the IPv6-LL address
configured still, which can allow remote access, even if the IPv6-DHCP address
is not assigned.
> However, IPMI services are started in multi-user.target; if the BMC waits indefinitely for the network to come online, there is no way to debug the issue.
Sure, but the BMC doesn't wait forever, does it? It just waits 120s.
I'm not suggesting this isn't the right solution for your systems, or even that
it might not be the right solution for most systems, but I don't think it is the
right solution for _all_ systems so we need to ensure it can be opt-out.
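One way such a choice could be expressed, as a minimal sketch only: the hypothetical helper below renders the [Link] section of a generated .network file and writes RequiredForOnline=no only when the interface is marked as not required. REQUIRED_FOR_ONLINE_DEFAULT is an assumed stand-in for a build-time option; neither it nor render_link_section exists in phosphor-networkd today.

```python
# Hypothetical build-time default; not an existing phosphor-networkd option.
# True keeps systemd-networkd's own default (the interface must be online).
REQUIRED_FOR_ONLINE_DEFAULT = True


def render_link_section(required_for_online: bool = REQUIRED_FOR_ONLINE_DEFAULT) -> str:
    """Return the [Link] section text for a generated .network file.

    When the interface is required for online, nothing is written, so
    systemd-networkd's built-in default (RequiredForOnline=yes) applies.
    """
    if required_for_online:
        return ""
    # Ask systemd-networkd-wait-online to ignore this interface.
    return "[Link]\nRequiredForOnline=no\n"
```

Keeping the key out of the file in the default case means a system that never flips the option behaves exactly as it does now.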
>
> > Note that the Canonical netplan only applies this option if the configuration
> > indicates that the interface is optional, which is entirely appropriate. The
> > way you wrote it could have been interpreted as saying that they set this on
> > *every* interface, which is what it seems like you're proposing to do to
> > phosphor-networkd.
> >
> > If this is desired behavior for someone, can't you supply a wildcard .network
> > file that adds this option, rather than modifying phosphor-networkd to manually
> > add it to each network interface that it is managing?
>
> Maybe we can add a DBus property similar to what netplan does? Reading/writing systemd-networkd config files is feasible in phosphor-networkd. The default value can be assigned via a build option.
I'm not sure if it belongs as a DBus property or not. I'd have to see what
you're proposing and think about it. I think this is a system design constraint
and not really configurable by users (hence why exposing a DBus property might
not make sense) but maybe I'm wrong on this.
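For a per-system setting that leaves phosphor-networkd untouched, systemd also merges drop-in files from a foo.network.d/ directory (*.conf) after foo.network. A minimal sketch, assuming that mechanism: the hypothetical helper below marks the interface configured by one .network file as not required for online. The example file name "00-bmc-eth0.network" and the drop-in name are illustrative assumptions, not names taken from this thread.

```python
from pathlib import Path


def write_optional_dropin(network_dir: Path, network_file: str) -> Path:
    """Mark the interface configured by `network_file` as not required
    for online, via a drop-in merged after the main .network file.

    Hypothetical helper: `network_file` is the name of an existing
    .network file in `network_dir`, e.g. "00-bmc-eth0.network".
    """
    dropin_dir = network_dir / f"{network_file}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    # Drop-ins are applied in lexical order; the "50-" prefix is arbitrary.
    path = dropin_dir / "50-not-required-for-online.conf"
    path.write_text("[Link]\nRequiredForOnline=no\n")
    return path
```

A system image recipe could call this at build time for the interfaces its designers consider optional, which keeps the decision with the system design rather than with end users.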
> > I believe some designs use a USB network device to connect two internal pieces
> > of the system and those interfaces are not necessarily managed by
> > phosphor-networkd (interfaces that, for example, connect BMC-to-BMC or
> > BMC-to-Host). While it is obviously up to the system designer to work through
> > this bug, by applying this configuration as you proposed you are causing
> > unusual default behavior in that networkd is going to start waiting for these
> > internal connections to come online instead of the external interface.
>
> I think this is an extremely rare case; internal interfaces should be configurable. For example, the host OS can change the IP of its BMC-Host virtual interface, so the BMC should also be able to change its own, and for BMC-to-BMC interfaces it is impossible to assign a fixed LAN IP without conflicts in manufacturing.
I don't follow your concern here. We can (and do) easily assign a static IP
address for the BMC-to-BMC interfaces based on position information fed into the
BMC via GPIO signals.
> The easiest way to configure it is to utilize the phosphor-networkd.
>
> Even if it is not managed by phosphor-networkd, keeping the default RequiredForOnline=yes will cause the 120s wait on BMC boot. Developers can simply search for it and find the solution. I remember that a timer with a message is shown on the BMC serial console; that's how I found out I should set "optional" on my Ubuntu server.
Agreed. Someone _can_ find it and debug it. To me, though, it is not an obvious
or easy thing to work out, because in my experience automated "network down"
test cases are not often run.
--
Patrick Williams