dbus-sensors:hwmontemp: additional attribute proposal

Tue Aug 11 10:08:53 AEST 2020

Hi Ed and James,

Is it acceptable for a device to be listed in both HwmonTempSensor and
PSUSensor?

- Alex Qiu

On Fri, Aug 7, 2020 at 2:53 PM Ed Tanous <ed at tanous.net> wrote:

> On Fri, Aug 7, 2020 at 2:05 PM Jason Ling <jasonling at google.com> wrote:
> >>
> >>  don't like the merging of base lists with extended lists, as it adds
> >> a dependency between how we represent that, and implies that we have a
> >> published plugin interface, which we definitely don't, nor do we want
> >> to maintain it at the entity manager level.  It also means that
> >> upstream never tests the "extended" list, which means it's a lot more
> >> likely to break.
> >
> > My stance on devices added to the extended list is the same as devices
> added by downstream patches. Upstream maintainers aren't responsible for
> testing those, if you're patching in devices then you take the
> responsibility of testing those.
>
> Sure, but upstream does bear the responsibility for testing that this
> "extended types" system works?
>
> > My objective is to make it so devices that we don't want to upstream yet
> can be maintained more easily.
>
> Ever?  To be frank for a moment, you're going to have a hard time
> pushing that one forward in an open source project.  If there are a
> lot of users that are planning on maintaining private forks forever,
> maybe there's a case for this, but I think most uses of private forks
> tends to be temporary, even if it's longer term, and the long term
> intention is to upstream.
>
> > First idea was a json file that extends types/lists but there are
> concerns with runtime parsing for devices for the purposes of exposing
> telemetry (I'd think runtime parsing of things like PID configurations
> would be more worrisome).
> > So second idea was just to move data structures around in
> dbus-sensors/PSUSensor to make upstream changes less likely to result in
> merge conflicts for those who are maintaining downstream patches.
> Furthermore just split the data structure up into a portion that is
> upstream and downstream.
> > Yes, it's definitely making it friendlier for those who want to maintain
> downstream patches to extend devices. I don't see how this increases
> maintenance or testing burden for the maintainers though.
>
> It increases burden because any time the maintainer wants to change
> the name of that file, change the name of the structure in that file,
> add a field, rename fields, or change compiler flags, that file will
> break in your downstream, and the maintainers will have no idea.  It
> definitely increases the maintenance burden, and all of the things
> I've mentioned and more have happened over the short life of entity
> manager.  I'm sure they will continue to happen as they evolve (or
> until EM is replaced by something better).
>
> > If however , the intent is to explicitly send the message
> > "You shouldn't try to use this service for any devices that do not have
> explicit upstream support. Any patches that make it easier to do so will be
> rejected."
> > then I agree with your earlier suggestion that maybe the best approach
> is to create another service for those devices.
>
> I'm not the maintainer of this project anymore, so my opinion is just
> that, and I have no ability to reject patches :)  I would rephrase my
> position as: Modifying the source code directly is not an adequate
> long term API for making permanent, never to be upstreamed changes.
> The closest guarantee to that kind of API that is dbus, the second
> closest is an Entity manager config, each of which have their own pros
> and cons.  If we as a project can do anything to make the transition
> between downstream code to upstream code easier, like trying to make
> merge conflicts less likely on commonly modified files, without the
> expense of maintainability or complexity, I'm absolutely for it, but
> creating explicit data structures and hard guarantees about downstream
> code on a binary boundary needs a much larger discussion, and speaks
> to some of the project's main principles about the "Open" part of
> OpenBMC.
>
> >
> >> I originally wrote a big long idea about how to make the above work,
> >> but got to the end, and realized that I'm still trying to understand
> >> what we're trying to avoid here with the downstream/upstream lists
> >> thing.  It's easy enough to patch the existing list to add your new
> >> custom types, then when it comes time to merge because the
> >> project/component is public, the patch is ready and good to upstream.
> >> What are we buying by moving that info to a non-patch format?
> >
> > You get the benefit of separating devices into two classes..
> > (1) types that are upstream , have been tested by someone else and they
> are ready to go without additional work.
> > (2) types that are not upstream, because the devices are not public yet
> or may never be public and need to be kept downstream for a long period of
> time (or forever).
>
> Don't you already have that distinction today?  Devices that are in
> your downstream patch fall into category 2, devices that are on
> openbmc master fall into category 1.  Maybe I'm missing something?
>
> >>
> >> I think
> >> moving it to a file means it's a lot less likely to be upstreamed, as
> >> it requires the next person to understand it to use it, and modify the
> >> patch rather than simply cleaning up the commit message, testing it,
> >> and firing it at gerrit.
> >
> > Yes, the file would be for those things that are never meant to be
> upstreamed or won't be upstreamed for a long while.
>
> See above.
>
> >>
> >> Having done this pattern many times
> >> (including with that list specifically) I think the only thing we
> >> could improve would be to move that list to its own file (but still
> >> C++ code), so it gets fewer merge conflicts, and you can completely
> >> replace it if you like, but even that doesn't solve the problem if
> >> code is never upstreamed.
> >
> > Yup, the problem here are the following
> > "I have patches I keep downstream and they keep getting broken whenever
> the types/device list gets updated. I wish these data structures were in a
> file that doesn't get patched often"
>
> Moving that structure to its own file sounds totally reasonable,
> although (if I were the maintainer) it would have no guarantees
> granted.  Said file may change name, structure, naming, or compiler
> magic that will cause downstream to break.  Funnily enough, in the act
> of implementing that, you will ironically break a lot of peoples
> downstreams, and I'm personally ok with that.  You (and your team)
> needs to be ok when people break your downstream for similar reasons.
>
> On a personal note, if you haven't already, I highly recommend
> spending some learning time on getting a good setup and mental model
> for merging conflicting patches.  As a useful skill, it comes up
> ridiculously often in software, especially if you're a maintainer.  If
> you have the ability to resolve conflicts quickly and correctly it
> puts you at a significant time advantage to your peers that don't.
>
> > and
> > "I have patches to add devices to the type/list data structure and I
> can't upstream them for a long time (or ever). I don't want to waste time
> constantly fixing my broken patches everytime someone adds a new public
> supported type."
> > Both approaches (parsing json and extending the list runtime and
> separating the data structs into a separate file + returning the union of
> base + extended) accomplish the same thing. One requires a recipe to copy a
> ExtendPSUSensorConfig.json in a recipe somewhere to usr/share/PSUSensors
> (or something) and the other is just a patch that gets applied. Talking it
> through, I now realize that the slight code refactoring approach is a lot
> less work and a lot more simple..and something that I'd actually have time
> to contribute.
>
> EXCELLENT!  Add me to the patch, and I'd be happy to review it for you
> (With that said, James is pretty fast and sometimes beats me to the
> punch).
>
> >
> >
> >> Have you filed a bug, or asked on the mailing list before now?  This
> >> is the kind of feedback the authors of that sensor need (Ideally
> >> before you move over to another subsystem like hwmontemp).
> >
> > I never really considered hwmontemp a different sub-system since they
> live in the same repo and seemed to be more specific towards monitoring
> temperature telemetry.
>
> Ignore the above comment about subsystem.  I thought you were talking
> about phosphor-hwmon.  My bad.
>
> > As far as bugs go, I CC'ed Alex Qiu who has more experience with the
> performance of PSUSensors. I haven't personally ran into this problem; just
> know about it from talk amongst the larger team.
>
> Excellent.  Looking forward to details.
>
> >>
> >> If I
> >> didn't see your message/bug and you did post it, I apologize, I'm not
> >> trying to call you out.
> >> If you have specifics, like the value of N, and the details around
> >> what chips you're interacting with and whatever debugging you've done,
> >> it would be helpful to put that in a bug for triage.
> >
> > Alex, maybe you can add some color here?
> >>
> >> Keep in mind, PSUSensor by default has a 1 second scan rate.
> >>
> https://github.com/openbmc/dbus-sensors/blob/41061e2c3198c0f597d4f6bb702b690a273ab45d/include/PSUSensor.hpp#L38
> >> If it's not obeying that, clearly there's a bug to fix somewhere.
> >> On some platforms, I have seen very high rate polling of power values
> >> on the PSU I2c bus by other devices, and that tends to hold up
> >> transactions for other components.  If that bus is misbehaving or
> >> overloaded on your platform, it might have triggered a weird condition
> >> within the PSU sensor (like the scans taking longer than the scan
> >> rate).
> >> If you have any more details here, it's quite possible someone will
> >> have an idea where to look, or know exactly where the problem is.
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ozlabs.org/pipermail/openbmc/attachments/20200810/aced47df/attachment.htm>