support NVMe drive health monitoring
Ed Tanous
ed at tanous.net
Fri Nov 13 03:25:08 AEDT 2020
On Wed, Nov 11, 2020 at 9:56 PM Jet Li <Jet.Li at ibm.com> wrote:
>
> Hi Ed, Rashmi,
>
> IBM are working on supporting NVMe drive health monitoring for some new system designs and have been posting some patches to dbus-sensors (e.g. [1]).
>
> [1] https://gerrit.openbmc-project.xyz/c/openbmc/dbus-sensors/+/38058/
>
> As some background, our high-level requirements are that we need to:
>
> 1. Monitor NVMe drive health to supply data for fan management and error logging
> 2. Support arbitrary drive configurations in the platform(s), where the drives are plugged into removable IO cards
>
> As you're no doubt aware, NVMe drive state is exposed via two interfaces:
>
> 1. A Basic Management Command over SMBus (see the sketch below)
> 2. NVMe-MI over MCTP, via the SMBus MCTP binding
>
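The Basic Management Command read itself is small. A rough sketch
using Linux i2c-dev and libi2c follows; the 7-bit address 0x6a, the
bus number, and the response byte offsets are from one reading of the
spec's Basic Management Command appendix, so treat them as
assumptions:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/i2c.h>
    #include <linux/i2c-dev.h>
    #include <i2c/smbus.h>          /* libi2c, link with -li2c */

    int main(void)
    {
        int fd = open("/dev/i2c-3", O_RDWR);  /* bus number is illustrative */
        if (fd < 0 || ioctl(fd, I2C_SLAVE, 0x6a) < 0)
            return 1;

        /* Block read from command code 0; the drive returns a
         * length-prefixed status structure, and libi2c strips the
         * length byte for us. */
        uint8_t buf[I2C_SMBUS_BLOCK_MAX];
        int len = i2c_smbus_read_block_data(fd, 0, buf);
        if (len < 3)
            return 1;

        printf("status flags:   0x%02x\n", buf[0]);
        printf("smart warnings: 0x%02x\n", buf[1]);
        /* Special temperature values for errors/no-data ignored here. */
        printf("composite temp: %d C\n", (int8_t)buf[2]);
        close(fd);
        return 0;
    }

Note that some drives only answer plain I2C block reads rather than
SMBus block reads, so the exact transfer type may need tweaking per
drive.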
> And in OpenBMC there are two corresponding NVMe monitoring implementations:
>
> 1. phosphor-nvme: Basic Management Command, static configuration
> 2. dbus-sensors: NVMe-MI over MCTP, dynamic configuration
>
> IBM are migrating our BMC designs towards entity-manager and dbus-sensors, and our newer system designs would benefit from dbus-sensors' dynamic configuration via entity-manager (we have reservations about phosphor-nvme's drive-presence GPIO configuration with respect to our system designs).
> Zooming out briefly: where possible, we're looking to use upstream code and avoid forks. However, as dbus-sensors uses NVMe-MI over MCTP via libmctp, we have some concerns:
>
> 1. Upstream libmctp (openbmc/libmctp) currently implements just two bindings
> a. Serial (DSP0253)
> b. ASTLPC (vendor-specific binding)
> 2. Intel have forked libmctp (intel-bmc/libmctp) and implemented a further two bindings
> a. SMBus (DSP0237)
> b. PCIe VDM (DSP0238)
> 3. Both the SMBus and PCIe VDM binding implementations in intel-bmc/libmctp require kernel patches that only exist in Intel's OpenBMC tree
Nit: the SMBus side requires one patch, which was written by Aspeed
and was part of the original Aspeed BSP. That doesn't exist "only" in
the Intel tree.
It should also be noted that this patch was in fact submitted to the
kernel, and it looks like it just needs some minor cleanups that never
got done:
https://lkml.org/lkml/2018/4/23/835
PCIe VDM, I agree with you, is only in the Intel tree.
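For anyone following along, bringing up a bus on one of the two
upstream bindings is only a few calls. A sketch against
openbmc/libmctp's serial binding (API as of late 2020; the device
path, local EID, and rx callback signature are assumptions that may
need adjusting):

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <libmctp.h>
    #include <libmctp-serial.h>

    /* Called for every reassembled MCTP message addressed to us. */
    static void rx_message(uint8_t eid, void *data, void *msg, size_t len)
    {
        printf("MCTP message from EID %d, %zu bytes\n", eid, len);
    }

    int main(void)
    {
        struct mctp *mctp = mctp_init();
        struct mctp_binding_serial *serial = mctp_serial_init();
        assert(mctp && serial);

        /* Device path and local EID (8) are illustrative. */
        mctp_serial_open_path(serial, "/dev/ttyUSB0");
        mctp_register_bus(mctp, mctp_binding_serial_core(serial), 8);
        mctp_set_rx_all(mctp, rx_message, NULL);

        for (;;)
            mctp_serial_read(serial);
    }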
>
> Finally, for the moment, the data provided by the NVMe Basic Management Command meets IBM's requirements, and we're using drives that support it.
>
> So the direction we've chosen is to use entity-manager and dbus-sensors for NVMe drive monitoring, and
> to implement support for the Basic Management Command over SMBus in the NVMeSensor application. To get there, as far as I have determined, we should do the following:
>
> 1. Make NVMeSensor's dependency on the forked libmctp optional
> 2. Add a compile-time flag to {en,dis}able the MCTP NVMe-MI backend
> 3. Add a compile-time flag to {en,dis}able the Basic Management Command backend (sketched below)
> 4. Patch intel-bmc/openbmc to configure NVMeSensor with MCTP NVMe-MI enabled
> 5. Change the default build configuration of NVMeSensor to use the Basic Management Command
> 6. Enable out-of-tree builds of NVMeSensor by default
> 7. Add NVMeSensor unit tests
> 8. Enable CI for dbus-sensors where we can / where it's necessary
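To make the backend split concrete, here is a minimal sketch of the
compile-time selection; the HAVE_NVME_MI_MCTP / HAVE_NVME_BASIC
defines and the backend entry points are names invented for
illustration:

    #include <stdio.h>

    /* Hypothetical backend entry points, stubbed out here. */
    static void start_mctp_backend(void)  { puts("NVMe-MI over MCTP"); }
    static void start_basic_backend(void) { puts("Basic Management Command"); }

    int main(void)
    {
    #ifdef HAVE_NVME_MI_MCTP    /* e.g. -DHAVE_NVME_MI_MCTP from the build */
        start_mctp_backend();
    #endif
    #ifdef HAVE_NVME_BASIC      /* the proposed default */
        start_basic_backend();
    #endif
        return 0;
    }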
>
>
You've listed quite a few things here, but I'm not following "why" we
need two NVMe implementations. MCTP NVMe-MI is more feature-rich and
much better supported than its MI Basic brethren, and generally
provides more information. Considering we have patches to do it, why
don't we just work on getting the required patch upstreamed? On top
of that, the committee that writes the NVMe-MI spec keeps threatening
to deprecate the Basic Management Command, so we might end up with
code that's used for one generation, then never again. I really don't
like the idea of having a second NVMe subsystem just because one is
_slightly_ easier to write. If there are other good technical reasons
why a user would prefer MI Basic, and we can get those reasons written
down, I'm happy to hear them, but if the overall reason is "we don't
want to upstream code to the kernel", that doesn't seem like a good
enough justification to build out support for both, especially
considering the MI Basic functionality would probably best be done as
a kernel driver.
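To give a feel for that kernel path, here's a rough skeleton of what
an MI Basic hwmon driver might look like; the driver name, transfer
type, and byte offsets are assumptions, not a working driver:

    #include <linux/err.h>
    #include <linux/hwmon.h>
    #include <linux/i2c.h>
    #include <linux/module.h>

    static int nvme_basic_read(struct device *dev,
                               enum hwmon_sensor_types type,
                               u32 attr, int channel, long *val)
    {
        struct i2c_client *client = dev_get_drvdata(dev);
        u8 buf[I2C_SMBUS_BLOCK_MAX];
        /* Basic Management Command: block read of the drive's status
         * structure from command code 0. */
        int len = i2c_smbus_read_block_data(client, 0, buf);

        if (len < 3)
            return -EIO;
        *val = (s8)buf[2] * 1000;  /* composite temp, millidegrees C */
        return 0;
    }

    static umode_t nvme_basic_is_visible(const void *data,
                                         enum hwmon_sensor_types type,
                                         u32 attr, int channel)
    {
        return 0444;
    }

    static const struct hwmon_ops nvme_basic_ops = {
        .is_visible = nvme_basic_is_visible,
        .read = nvme_basic_read,
    };

    static const struct hwmon_channel_info *nvme_basic_info[] = {
        HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT),
        NULL,
    };

    static const struct hwmon_chip_info nvme_basic_chip = {
        .ops = &nvme_basic_ops,
        .info = nvme_basic_info,
    };

    static int nvme_basic_probe(struct i2c_client *client)
    {
        struct device *hwmon = devm_hwmon_device_register_with_info(
            &client->dev, "nvmebasic", client, &nvme_basic_chip, NULL);

        return PTR_ERR_OR_ZERO(hwmon);
    }

    static const struct i2c_device_id nvme_basic_id[] = {
        { "nvmebasic", 0 },
        { },
    };
    MODULE_DEVICE_TABLE(i2c, nvme_basic_id);

    static struct i2c_driver nvme_basic_driver = {
        .driver = { .name = "nvmebasic" },
        .probe_new = nvme_basic_probe,
        .id_table = nvme_basic_id,
    };
    module_i2c_driver(nvme_basic_driver);

    MODULE_DESCRIPTION("NVMe Basic Management Command hwmon sketch");
    MODULE_LICENSE("GPL");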
It should also be noted that if we end up doing this, I'd probably
advocate for it to be its own separate sensor, distinct from the MCTP
one, because from a BMC perspective it has very little in common with
the existing NVMe sensor (different protocols, different IO loops,
etc), and it would probably be a good candidate for adding to hwmon.
Also, it would need to be distinctly selectable from the
entity-manager JSON, as some drives don't support MI Basic, and we
need to keep track of which protocol to use in which case.
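On the entity-manager side I'd imagine something roughly like the
fragment below, where the "Protocol" key is invented for illustration
(existing NVMe1000 configs just carry Bus/Name/Type):

    {
        "Bus": "$bus",
        "Name": "$index nvme",
        "Protocol": "basic",
        "Type": "NVME1000"
    }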
Thanks,
-Ed