support NVMe drive health monitoring

Andrew Jeffery andrew at aj.id.au
Fri Nov 13 09:56:24 AEDT 2020


Hey Ed,

On Fri, 13 Nov 2020, at 02:55, Ed Tanous wrote:
> On Wed, Nov 11, 2020 at 9:56 PM Jet Li <Jet.Li at ibm.com> wrote:
> >
> > Hi Ed, Rashmi,
> >
> > IBM are working on supporting NVMe drive health monitoring for some new system designs and have been posting some patches to dbus-sensors (e.g. [1]).
> >
> > [1] https://gerrit.openbmc-project.xyz/c/openbmc/dbus-sensors/+/38058/
> >
> > As some background our high level requirements are that we need to:
> >
> > 1. Monitor NVMe drive health to supply data for fan management and error logging
> > 2. Support arbitrary drive configurations in the platform(s), which are plugged into removable IO cards
> >
> > As you're no doubt aware, NVMe drive state is exposed via two interfaces:
> >
> > 3. A Basic Management Command over SMBus
> > 4. NVMe-MI over MCTP, via the SMBus MCTP binding
> >
> > And in OpenBMC there are two corresponding NVMe monitoring implementations:
> >
> > 5. phosphor-nvme: Basic Management Command, static configuration
> > 6. dbus-sensors: NVMe-MI over MCTP, dynamic configuration
> >
> > IBM are migrating our BMC designs towards entity-manager and dbus-sensors, and our newer system designs would benefit from dbus-sensor's dynamic configuration via entity-manager (we have reservations about the drive presence GPIO configuration in the phosphor-nvme with respect to our system designs).
> > Zooming out briefly we're looking to, where possible, use upstream code and avoid the use of forks. However, as dbus-sensors uses NVMe-MI over MCTP via libmctp, we have some concerns:
> >
> > 7. Upstream libmctp (openbmc/libmctp) currently implements just two bindings
> >     a. Serial (DSP0253)
> >     b. ASTLPC (Vendor-specific binding)
> > 8. Intel have forked libmctp (intel-bmc/libmctp) and implemented a further two bindings
> >     a. SMBus (DSP0237)
> >     b. PCIe VDM (DSP0238)
> > 9. Both the SMBus and PCIe VDM binding implementations in intel-bmc/libmctp require kernel patches that only exist in Intel's OpenBMC tree
> 
> Nit;  the SMBus side requires one patch which was written by aspeed,
> and was part of the Aspeed original BSP.  That doesn't exist "only" in
> the Intel tree.

The broader point was that it doesn't exist in trees that we care about: 
openbmc/linux or Torvalds' tree. It probably could have been worded a bit 
better.

> 
> It should also be noted, this patch was in fact submitted to the
> kernel, and hopefully looks like it just needs some minor cleanups
> that never got done.
> https://lkml.org/lkml/2018/4/23/835

There's also the issue of the Intel fork of libmctp, so there's a bit of a 
dependency chain of out-of-tree patches going on here. No-one has tried to 
upstream the SMBus binding implementation, probably because no-one has 
completed the job of upstreaming the required kernel interface.

There was a short route to getting the information we want while avoiding 
forks, which was to go via the basic management command (which I acknowledge 
also has its issues).

> 
> PCIe VDM, I agree with you, is only in the intel tree.
> 
> >
> > Finally, for the moment, the data provided by the NVMe Basic Management Command meets IBM's current requirements, and we're using drives that support the Basic Management Command.
> >
> > So the direction we chose is to use entity-manager and dbus-sensors for NVMe drive monitoring, and
> > implement support for the Basic Management Command over SMBus in the NVMeSensor application. To get there, as far as I have determined, we should do the following:
> >
> > 10. Make optional the dependency of NVMeSensor on the forked libmctp
> > 11. Add a compile-time flag to {en,dis}able the MCTP NVMe-MI backend
> > 12. Add a compile-time flag to {en,dis}able the Basic Management Command backend
> > 13. Patch intel-bmc/openbmc to configure NVMeSensor with MCTP NVMe-MI enabled
> > 14. Change the default build configuration of NVMeSensor use the Basic Management Command
> > 15. Enable out-of-tree builds of NVMeSensor by default
> > 16. Add NVMeSensor unit tests
> > 17. Enable CI for dbus-sensors where we can / is necessary
> >
> >
> 
> You've listed quite a few things here, but I'm not following "why" we
> need two NVMe implementations.  MCTP NVMe-MI is more feature rich, and
> is much better supported than its MI basic brethren, and generally
> provides more information.  Considering we have patches to do it, why
> don't we just work on getting the required patch upstreamed?

As above, it's not just the one required patch to the kernel, we also have to 
get the libmctp SMBus binding implementation upstream. Putting on my libmctp 
maintainer hat, I'd like to see some effort from Intel on both of those fronts.

>  On top
> of that, the committee that writes the NVMe-MI spec keeps threatening
> to deprecate it, so we might end up with code that's used for one
> generation, then never again.  I really don't like the idea of having
> a second NVMe subsystem just because one is _slightly_ easier to
> write. 

I disagree with "_slightly_" if you account for the MCTP stack that goes with 
it, and the fact that the kernel and libmctp code is not yet upstream.

> If there's other good technical reasons why a user would
> prefer MI-basic, and we can get those reasons written down, I'm happy
> to hear them, but if the overall reason is "we don't want to upstream
> code to the kernel" that doesn't seem like a good enough justification
> to build out support for both;

Implementing the SMBus MCTP binding around the i2c-slave-mqueue kernel 
interface is just a stop-gap solution in place of the (continually deferred) 
in-kernel socket-based MCTP implementation (no-one besides Jeremy has really 
put significant effort into collaborating with me on that concept). So if we're 
aiming for stop-gaps, why not implement support for the basic management 
command given the rest of the code is out-of-tree?

The trade-offs here seem to be the crux of the discussion.

>  Especially considering the MI basic
> functionality probably would best be done as a kernel driver.

Right, that's an interesting idea.

> 
> It should also be noted if we end up doing this, I'd probably advocate
> for it to be its own separate sensor, distinct from the MCTP one,
> because from a BMC perspective, it has very little in common with the
> existing NVMe sensor (different protocols, different IO loops, ect),

The difference in IO loop does have an impact here - I hadn't completely 
appreciated how we'd make the synchronous call fit into the current code. The 
hack approach was to just not worry about it for the moment. However, 
implementing the basic command backend isn't actually that much of a change; 
it's a bit of a reorganisation of the NVMeContext class and a small encoder / 
decoder for the SMBus commands.

> and would probably be a good candidate for adding the hwmon.  Also, it
> would need to be distinctly selectable from the entity-manager json,
> as some drives don't possess support for NVMe-MI basic, and we need to
> keep track of which protocol to use in which case.

If both backends were built in, there's never a reason to use the basic 
management command. If only one of the backends is built in then you don't need 
to keep track unless the interface presented on DBus was different depending on 
the backend, but I was hoping we could avoid that.

Thanks for the response!

Andrew


More information about the openbmc mailing list