Out-of-band NIC management
Ben Wei
benwei at fb.com
Wed Jul 17 07:45:35 AEST 2019
Hi all,
Would anyone be interested in collaborating on out-of-band NIC management and monitoring?
DMTF has an NC-SI spec (https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf) that defines a standard interface for BMCs to manage NICs.
In kernel 5.x, the NC-SI driver supports a Netlink interface for communicating with userspace processes.
I'm thinking of adding the following tools to OpenBMC as a starting point and building from there:
1. A command-line utility (e.g. ncsi-util) to send raw NC-SI commands, useful for debugging and initial NIC bring-up.
For example:
ncsi-util -eth0 -ch 0 <raw NC-SI command>
We can further extend this command-line tool to support other management interfaces, e.g. sending MCTP or PLDM commands to the NIC.
2. A daemon running on OpenBMC (e.g. ncsid) that monitors NIC status, for example:
a. Query and log NIC capabilities and current parameter settings
b. Periodically check NIC link status, re-initialize the NC-SI link if the NIC is unreachable, and log the status
c. Enable and monitor NIC Asynchronous Event Notifications (AENs)
i. such as Link Status Change, Configuration Required, Host Driver Status Change
ii. there are OEM-specific AENs that the BMC may also enable and monitor
iii. log these events and/or perform recovery and remediation as needed
d. Additional monitoring such as
i. temperature (not yet covered by a standard NC-SI command),
ii. firmware version, update event, network traffic statistics
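To make the two tools above a bit more concrete, here is a rough Python sketch, not a proposed implementation. build_ncsi_command packs a raw NC-SI command the way a tool like ncsi-util might, using the 16-byte control packet header layout as I read it in DSP0222 (please check the field order and the Get Link Status type value against the spec). LinkMonitor shows one polling step for item 2b. The names build_ncsi_command, LinkMonitor, probe and reinit are all made up for illustration, and the trailing checksum is left out on the assumption that whatever transmits the packet fills it in.

```python
import struct
from typing import Callable, Optional

NCSI_HEADER_REVISION = 0x01
CMD_GET_LINK_STATUS = 0x0A  # command type as I read it in DSP0222


def build_ncsi_command(cmd_type: int, package: int, channel: int,
                       payload: bytes = b"", iid: int = 1) -> bytes:
    """Pack a 16-byte NC-SI control packet header plus a padded payload.

    The trailing checksum is not included; this sketch assumes the layer
    that transmits the packet adds it.
    """
    if not 0 <= package <= 0x7 or not 0 <= channel <= 0x1F:
        raise ValueError("package must fit in 3 bits, channel in 5 bits")
    if len(payload) > 0x0FFF:
        raise ValueError("payload length must fit in 12 bits")
    channel_id = (package << 5) | channel
    # Fields: MC ID, header revision, reserved, IID, command type,
    # channel ID, payload length, then 8 reserved bytes.
    header = struct.pack(">BBBBBBH8x", 0x00, NCSI_HEADER_REVISION, 0x00,
                         iid & 0xFF, cmd_type, channel_id, len(payload))
    padding = b"\x00" * (-len(payload) % 4)  # pad payload to 32 bits
    return header + payload + padding


class LinkMonitor:
    """One polling step of a hypothetical ncsid link check."""

    def __init__(self, probe: Callable[[], Optional[bool]],
                 reinit: Callable[[], None], max_failures: int = 3):
        self._probe = probe      # True = link up, False = down, None = no reply
        self._reinit = reinit    # hypothetical hook that re-initializes NC-SI
        self._max_failures = max_failures
        self._failures = 0

    def poll_once(self) -> str:
        link_up = self._probe()
        if link_up is None:
            # NIC did not answer; re-initialize after repeated failures.
            self._failures += 1
            if self._failures >= self._max_failures:
                self._reinit()
                self._failures = 0
                return "reinitialized"
            return "unreachable"
        self._failures = 0
        return "up" if link_up else "down"
```

In a real daemon, probe would send build_ncsi_command(CMD_GET_LINK_STATUS, ...) through the kernel and decode the response, and the string returned by poll_once would go to the log. The failure threshold of 3 is an arbitrary placeholder.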
Both the CLI tool and the monitoring daemon can each communicate with the kernel driver directly via Netlink, or we can have the ncsi daemon act as a command serializer between the kernel and other user space processes.
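As an illustration of the second option, a command serializer could look roughly like the Python sketch below: callers queue their commands and a single worker thread forwards them one at a time. CommandSerializer and the send callable are made-up names; send stands in for whatever actually performs the Netlink exchange with the kernel driver, which is not shown here.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Callable


class CommandSerializer:
    """Forward commands from many callers to the NIC one at a time."""

    def __init__(self, send: Callable[[bytes], bytes]):
        self._send = send  # hypothetical blocking send/receive function
        self._queue: "queue.Queue" = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, packet: bytes) -> Future:
        """Queue a command; the returned Future resolves to the response."""
        future: Future = Future()
        self._queue.put((packet, future))
        return future

    def _run(self) -> None:
        # Single consumer, so only one command is in flight at any time.
        while True:
            packet, future = self._queue.get()
            try:
                future.set_result(self._send(packet))
            except Exception as exc:  # report the failure to the caller
                future.set_exception(exc)
```

With this shape, the CLI tool and the monitoring code would both call submit(packet).result(timeout=...) instead of opening their own Netlink sockets. The trade-off is that the CLI then depends on the daemon being up.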
These are just some of my initial thoughts, and I'd love to hear feedback on whether these would be useful to OpenBMC.
If anyone is interested in collaborating on these, we can discuss features and design details further.
Regards,
-Ben