support NVMe drive health monitoring
Andrew Jeffery
andrew at aj.id.au
Fri Nov 13 12:29:08 AEDT 2020
> > > >
> > > > So the direction we chose is to use entity-manager and dbus-sensors for NVMe drive monitoring, and
> > > > implement support for the Basic Management Command over SMBus in the NVMeSensor application. To get there, as far as I have determined, we should do the following:
> > > >
> > > > 10. Make optional the dependency of NVMeSensor on the forked libmctp
> > > > 11. Add a compile-time flag to {en,dis}able the MCTP NVMe-MI backend
> > > > 12. Add a compile-time flag to {en,dis}able the Basic Management Command backend
> > > > 13. Patch intel-bmc/openbmc to configure NVMeSensor with MCTP NVMe-MI enabled
> > > > 14. Change the default build configuration of NVMeSensor use the Basic Management Command
> > > > 15. Enable out-of-tree builds of NVMeSensor by default
> > > > 16. Add NVMeSensor unit tests
> > > > 17. Enable CI for dbus-sensors where we can / is necessary
> > > >
> > > >
> > >
> > > You've listed quite a few things here, but I'm not following "why" we
> > > need two NVMe implementations. MCTP NVMe-MI is more feature rich, and
> > > is much better supported than its MI basic brethren, and generally
> > > provides more information. Considering we have patches to do it, why
> > > don't we just work on getting the required patch upstreamed?
> >
> > As above, it's not just the one required patch to the kernel, we also have to
> > get the libmctp SMBus binding implementation upstream. Putting on my libmctp
> > maintainer hat, I'd like to see some effort from Intel on both of those fronts.
>
> Fair point. Considering I wrote the original libmctp smbus patch, I'm
> far from blameless in this situation, despite my change in email
> address.
:)
>
> >
> > > On top
> > > of that, the committee that writes the NVMe-MI spec keeps threatening
> > > to deprecate it, so we might end up with code that's used for one
> > > generation, then never again. I really don't like the idea of having
> > > a second NVMe subsystem just because one is _slightly_ easier to
> > > write.
> >
> > I disagree with "_slightly_" if you account for the MCTP stack that goes with
> > it, and the fact that the kernel and libmctp code is not yet upstream.
>
> Let me rephrase, if we were to write this as a hwmon driver in
> torvalds/linux, would this be more or less effort than trying to get
> the mctp driver stack upstreamed along with the mslave device driver?
> I had assumed more, but maybe not?
Probably the hardest bit of the i2c-slave-mqueue patch is the energy it
generally takes to argue for new userspace interfaces (answering the question
of whether it's the right abstraction). I imagine it's generally preferred to
implement slave behaviour in-kernel (which is why a userspace interface doesn't
already exist).
This is the attraction of the in-kernel MCTP implementation: we'll have a much
easier time arguing for the abstraction because if you squint, MCTP was
designed at a high level to look like an IP network, and with this approach we
won't have incoherent per-device interfaces sticking out of the kernel to
support the binding implementations in libmctp.
If the MI-basic data fits into the hwmon abstractions, I expect that might
almost be an easier route.
>
> In re-reading my last email, it made me sound like I'm anti- MI-Basic.
> To be clear, I really have no problem with adding MI-Basic support in
> dbus-sensors. I'd like to see it made an explicit and separate option
> in the per-device configuration, as some devices don't support it, and
> to clear the way for if we ever get the existing (or maybe a new)
> nvme-sensor buildable by more than just my former teams fork.
Okay.
>
> >
> > > If there's other good technical reasons why a user would
> > > prefer MI-basic, and we can get those reasons written down, I'm happy
> > > to hear them, but if the overall reason is "we don't want to upstream
> > > code to the kernel" that doesn't seem like a good enough justification
> > > to build out support for both;
> >
> > Implementing the SMBus MCTP binding around the i2c-slave-mqueue kernel
> > interface is just a stop-gap solution in place of the (continually deferred)
> > in-kernel socket-based MCTP implementation (no-one besides Jeremy has really
> > put significant effort into collaborating with me on that concept). So if we're
> > aiming for stop-gaps, why not implement support for the basic management
> > command given the rest of the code is out-of-tree?
>
> It should be noted, it's our intention to start working on an
> upstreamable mctp socket kernel driver in the near (1-2 quarters)
> future.
Fantastic! This might even align with my (continually revised) timeline. Please
do get in touch when you start to look at the problem.
> That doesn't really change this discussion, but figured it
> was worth pointing out. If we land MI-basic, and have a little bit of
> luck, maybe it doesn't have to live for long.
Yep, I'd be happy to forget about MI-basic once we have appropriate MCTP
support.
>
> >
> > The trade-offs here seem to be the crux of the discussion.
> >
> > > Especially considering the MI basic
> > > functionality probably would best be done as a kernel driver.
> >
> > Right, that's an interesting idea.
>
> This was just an idea; A userspace-only nvme-mi setup would probably
> be fine IMO as well, depending on what aligns with your schedule.
>
> >
> > >
> > > It should also be noted if we end up doing this, I'd probably advocate
> > > for it to be its own separate sensor, distinct from the MCTP one,
> > > because from a BMC perspective, it has very little in common with the
> > > existing NVMe sensor (different protocols, different IO loops, ect),
> >
> > The difference in IO loop does have an impact here - I hadn't completely
> > appreciated how we'd make the synchronous call fit into the current code. The
> > hack approach was to just not worry about it for the moment. However,
> > implementing the basic command backend isn't actually that much of a change;
> > it's a bit of a reorganisation of the NVMeContext class and a small encoder /
> > decoder for the SMBus commands.
>
> If we can fit it cleanly into the application that's there, great; If
> it makes more sense as another app, fine too. I haven't looked at it
> in depth.
>
> >
> > > and would probably be a good candidate for adding the hwmon. Also, it
> > > would need to be distinctly selectable from the entity-manager json,
> > > as some drives don't possess support for NVMe-MI basic, and we need to
> > > keep track of which protocol to use in which case.
> >
> > If both backends were built in, there's never a reason to use the basic
> > management command. If only one of the backends is built in then you don't need
> > to keep track unless the interface presented on DBus was different depending on
> > the backend, but I was hoping we could avoid that.
>
> See above. I'd really prefer if this were a per-device setting, not a
> per-build setting.
Right, my sense in reading what you wrote was that you were pushing back on the
idea on the basis that we'd have to differentiate in the entity-manager config.
But it sounds like you're actually not fussed about that and even consider the
idea in a positive light given what you've written below:
> nvme-mi support is spotty at best, and buggy
> nvme-mi mctp implementations are also common. I'd rather we
> explicitly select one that works well per-device type, rather making
> it a build time option.
Andrew
More information about the openbmc
mailing list