Out-of-band NIC management
Justin.Lee1 at Dell.com
Thu Jul 18 05:44:42 AEST 2019
Hi Ben,
I have a few questions about item 2.c below.
> For the CLI tool and the management & monitoring daemon, I was initially thinking of using NC-SI over RMII/RBT, mainly because the kernel already supports this and provides a netlink interface for userspace to send/receive commands.
> But I think we can make our management tool transport-agnostic, so for NC-SI over RMII/RBT it communicates with the kernel NC-SI driver over netlink, and for NC-SI over MCTP it uses the mechanism provided by libmctp.
>
> > > And in kernel 5.x, the NC-SI driver supports a Netlink interface for
> > > communicating with userspace processes.
> > >
> > > I'm thinking of adding the following tools to OpenBMC as a starting
> > > point and building from there:
> > >
> > > 1. A command line utility (e.g. ncsi-util) to send raw NC-SI
> > > commands, useful for debugging and initial NIC bring up,
> > > For example:
> > > ncsi-util -eth0 -ch 0 <raw NC-SI command>
> > >
> > > We can further extend this command line tool to support other
> > > management interfaces, e.g. sending MCTP or PLDM commands to the NIC.
> > >
> > > 2. A daemon running on OpenBMC (e.g. ncsid) monitoring NIC
> > > status, for example:
> > > a. Query and log NIC capability and current parameter
> > > setting
> > > b. Periodically check NIC link status, re-initialize NC-SI
> > > link if NIC is unreachable, log the status
> > > c. Enable and monitor NIC Asynchronous Event Notifications
> > > (AENs)
For selected channels, AEN is enabled by default. Do you plan to enable AENs for non-selected channels too?
If so, what approach do you plan to take: enabling them from userspace, or modifying the NC-SI driver?
We are planning to monitor all channels but are still looking for the best way to do it.
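To make the userspace option concrete, here is a rough sketch of the bytes a tool could build for an AEN Enable command aimed at one channel. The helper name and macro names are hypothetical, the field offsets and the command type value follow the DSP0222 control packet layout as I understand it, and I am assuming the checksum and padding are filled in by whatever transport sends the packet, so please treat this as an illustration rather than a tested implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values below are my reading of DSP0222; verify against the spec. */
#define NCSI_HDR_LEN          16    /* control packet header size */
#define NCSI_HDR_REV          0x01  /* header revision */
#define NCSI_CMD_AEN_ENABLE   0x08  /* AEN Enable command type */
#define NCSI_AEN_CTRL_LSC     (1u << 0)  /* Link Status Change */
#define NCSI_AEN_CTRL_CR      (1u << 1)  /* Configuration Required */
#define NCSI_AEN_CTRL_HNCDSC  (1u << 2)  /* Host NC Driver Status Change */

/*
 * Hypothetical helper: write an AEN Enable command (header plus 8-byte
 * payload) into buf. Returns the number of bytes written, or 0 if the
 * buffer is too small or the package/channel index is out of range.
 * Checksum and padding are NOT added here (assumed to be done elsewhere).
 */
static size_t ncsi_build_aen_enable(uint8_t *buf, size_t len,
                                    uint8_t package, uint8_t channel,
                                    uint8_t iid, uint32_t mode)
{
    const size_t total = NCSI_HDR_LEN + 8;

    if (len < total || package > 7 || channel > 0x1f)
        return 0;

    memset(buf, 0, total);
    buf[0] = 0x00;                    /* MC ID */
    buf[1] = NCSI_HDR_REV;
    buf[3] = iid;                     /* instance ID chosen by the caller */
    buf[4] = NCSI_CMD_AEN_ENABLE;
    buf[5] = (uint8_t)((package << 5) | channel);  /* package[7:5], channel[4:0] */
    buf[6] = 0x00;                    /* payload length, big-endian */
    buf[7] = 0x08;

    /* Payload: 3 reserved bytes, AEN MC ID, 32-bit big-endian enable mask */
    buf[NCSI_HDR_LEN + 3] = 0x00;
    buf[NCSI_HDR_LEN + 4] = (uint8_t)(mode >> 24);
    buf[NCSI_HDR_LEN + 5] = (uint8_t)(mode >> 16);
    buf[NCSI_HDR_LEN + 6] = (uint8_t)(mode >> 8);
    buf[NCSI_HDR_LEN + 7] = (uint8_t)mode;

    return total;
}
```

A daemon could call this once per non-selected channel, e.g. with NCSI_AEN_CTRL_LSC | NCSI_AEN_CTRL_CR as the mask, and pass the buffer to whichever send path it uses.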
To deliver AENs to userspace, we currently use a generic netlink multicast group (mcgrps) in a local patch, which we plan to upstream:
	enum ncsi_genl_multicast_groups {
		NCSI_GENL_MCGRP_AEN,
	};

	static const struct genl_multicast_group ncsi_genl_mcgrps[] = {
		[NCSI_GENL_MCGRP_AEN] = { .name = NCSI_GENL_MCGRP_AEN_NAME },
	};

	static struct genl_family ncsi_genl_family __ro_after_init = {
		.name = "NCSI",
		.version = 0,
		.maxattr = NCSI_ATTR_MAX,
		.module = THIS_MODULE,
		.ops = ncsi_ops,
		.n_ops = ARRAY_SIZE(ncsi_ops),
		.mcgrps = ncsi_genl_mcgrps,
		.n_mcgrps = ARRAY_SIZE(ncsi_genl_mcgrps),
	};
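On the receiving side, a userspace listener would need to classify each AEN it gets. The sketch below assumes the multicast message hands userspace the raw AEN packet starting at the NC-SI control header, which is only an assumption about the message format. The function name is hypothetical, and the type codes are the standard ones from DSP0222 as I understand them.

```c
#include <stddef.h>
#include <stdint.h>

#define NCSI_HDR_LEN  16    /* control packet header size */
#define NCSI_PKT_AEN  0xFF  /* control packet type used for AENs */

/*
 * Hypothetical helper: return a readable name for the AEN carried in pkt,
 * or NULL if the buffer is too short or is not an AEN packet.
 * Assumes pkt points at the NC-SI control header (no Ethernet header).
 */
static const char *ncsi_aen_name(const uint8_t *pkt, size_t len)
{
    uint8_t type;

    /* Need the header plus 3 reserved bytes and the AEN type byte */
    if (len < NCSI_HDR_LEN + 4 || pkt[4] != NCSI_PKT_AEN)
        return NULL;

    type = pkt[NCSI_HDR_LEN + 3];
    switch (type) {
    case 0x00:
        return "Link Status Change";
    case 0x01:
        return "Configuration Required";
    case 0x02:
        return "Host NC Driver Status Change";
    default:
        /* 0x80 and above are OEM-specific; the rest is lumped together */
        return type >= 0x80 ? "OEM-specific" : "Reserved or other";
    }
}
```

The daemon could log the returned name together with the channel ID taken from byte 5 of the header.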
> > > i. such as Link Status Change, Configuration
> > > required, Host driver status change
> > > ii. there are OEM-specific AENs that BMC may also
> > > enable and monitor
> > > iii. either log these events, and/or perform
> > > recovery and remediation as needed
> > > d. Additional monitoring such as
> > > i. temperature (not in standard NC-SI command yet),
> > > ii. firmware version, update event, network traffic
> > > statistics
> > >
> > > Both the CLI tool and the monitoring daemon can either communicate
> > > with the kernel driver directly via Netlink independently, or we can
> > > have the ncsi daemon act as a command serializer between the kernel
> > > and other userspace processes.
> > > These are just some of my initial thoughts and I'd love to hear some
> > > feedback if these would be useful to OpenBMC.
> > >
> > > If anyone is interested in collaborating on these, we can discuss
> > > features and design details further.
> > I am interested in collaborating on the design details.
>
> Great! I can put a draft on Gerrit and we can work together on this. Do you have additional use cases you're looking for?
>
> Regards
> -Ben
Thanks,
Justin