dbus-sensors:hwmontemp: additional attribute proposal

Sat Aug 8 07:53:42 AEST 2020

On Fri, Aug 7, 2020 at 2:05 PM Jason Ling <jasonling at google.com> wrote:
>>
>>  don't like the merging of base lists with extended lists, as it adds
>> a dependency between how we represent that, and implies that we have a
>> published plugin interface, which we definitely don't, nor do we want
>> to maintain it at the entity manager level.  It also means that
>> upstream never tests the "extended" list, which means it's a lot more
>> likely to break.
>
> My stance on devices added to the extended list is the same as devices added by downstream patches. Upstream maintainers aren't responsible for testing those, if you're patching in devices then you take the responsibility of testing those.

Sure, but upstream does bear the responsibility for testing that this
"extended types" system works?

> My objective is to make it so devices that we don't want to upstream yet can be maintained more easily.

Ever?  To be frank for a moment, you're going to have a hard time
pushing that one forward in an open source project.  If there are a
lot of users that are planning on maintaining private forks forever,
maybe there's a case for this, but I think most uses of private forks
tends to be temporary, even if it's longer term, and the long term
intention is to upstream.

> First idea was a json file that extends types/lists but there are concerns with runtime parsing for devices for the purposes of exposing telemetry (I'd think runtime parsing of things like PID configurations would be more worrisome).
> So second idea was just to move data structures around in dbus-sensors/PSUSensor to make upstream changes less likely to result in merge conflicts for those who are maintaining downstream patches. Furthermore just split the data structure up into a portion that is upstream and downstream.
> Yes, it's definitely making it friendlier for those who want to maintain downstream patches to extend devices. I don't see how this increases maintenance or testing burden for the maintainers though.

It increases burden because any time the maintainer wants to change
the name of that file, change the name of the structure in that file,
add a field, rename fields, or change compiler flags, that file will
break in your downstream, and the maintainers will have no idea.  It
definitely increases the maintenance burden, and all of the things
I've mentioned and more have happened over the short life of entity
manager.  I'm sure they will continue to happen as they evolve (or
until EM is replaced by something better).

> If however , the intent is to explicitly send the message
> "You shouldn't try to use this service for any devices that do not have explicit upstream support. Any patches that make it easier to do so will be rejected."
> then I agree with your earlier suggestion that maybe the best approach is to create another service for those devices.

I'm not the maintainer of this project anymore, so my opinion is just
that, and I have no ability to reject patches :)  I would rephrase my
position as: Modifying the source code directly is not an adequate
long term API for making permanent, never to be upstreamed changes.
The closest guarantee to that kind of API that is dbus, the second
closest is an Entity manager config, each of which have their own pros
and cons.  If we as a project can do anything to make the transition
between downstream code to upstream code easier, like trying to make
merge conflicts less likely on commonly modified files, without the
expense of maintainability or complexity, I'm absolutely for it, but
creating explicit data structures and hard guarantees about downstream
code on a binary boundary needs a much larger discussion, and speaks
to some of the project's main principles about the "Open" part of
OpenBMC.

>
>> I originally wrote a big long idea about how to make the above work,
>> but got to the end, and realized that I'm still trying to understand
>> what we're trying to avoid here with the downstream/upstream lists
>> thing.  It's easy enough to patch the existing list to add your new
>> custom types, then when it comes time to merge because the
>> project/component is public, the patch is ready and good to upstream.
>> What are we buying by moving that info to a non-patch format?
>
> You get the benefit of separating devices into two classes..
> (1) types that are upstream , have been tested by someone else and they are ready to go without additional work.
> (2) types that are not upstream, because the devices are not public yet or may never be public and need to be kept downstream for a long period of time (or forever).

Don't you already have that distinction today?  Devices that are in
your downstream patch fall into category 2, devices that are on
openbmc master fall into category 1.  Maybe I'm missing something?

>>
>> I think
>> moving it to a file means it's a lot less likely to be upstreamed, as
>> it requires the next person to understand it to use it, and modify the
>> patch rather than simply cleaning up the commit message, testing it,
>> and firing it at gerrit.
>
> Yes, the file would be for those things that are never meant to be upstreamed or won't be upstreamed for a long while.

See above.

>>
>> Having done this pattern many times
>> (including with that list specifically) I think the only thing we
>> could improve would be to move that list to its own file (but still
>> C++ code), so it gets fewer merge conflicts, and you can completely
>> replace it if you like, but even that doesn't solve the problem if
>> code is never upstreamed.
>
> Yup, the problem here are the following
> "I have patches I keep downstream and they keep getting broken whenever the types/device list gets updated. I wish these data structures were in a file that doesn't get patched often"

Moving that structure to its own file sounds totally reasonable,
although (if I were the maintainer) it would have no guarantees
granted.  Said file may change name, structure, naming, or compiler
magic that will cause downstream to break.  Funnily enough, in the act
of implementing that, you will ironically break a lot of peoples
downstreams, and I'm personally ok with that.  You (and your team)
needs to be ok when people break your downstream for similar reasons.

On a personal note, if you haven't already, I highly recommend
spending some learning time on getting a good setup and mental model
for merging conflicting patches.  As a useful skill, it comes up
ridiculously often in software, especially if you're a maintainer.  If
you have the ability to resolve conflicts quickly and correctly it
puts you at a significant time advantage to your peers that don't.

> and
> "I have patches to add devices to the type/list data structure and I can't upstream them for a long time (or ever). I don't want to waste time constantly fixing my broken patches everytime someone adds a new public supported type."
> Both approaches (parsing json and extending the list runtime and separating the data structs into a separate file + returning the union of base + extended) accomplish the same thing. One requires a recipe to copy a ExtendPSUSensorConfig.json in a recipe somewhere to usr/share/PSUSensors (or something) and the other is just a patch that gets applied. Talking it through, I now realize that the slight code refactoring approach is a lot less work and a lot more simple..and something that I'd actually have time to contribute.

EXCELLENT!  Add me to the patch, and I'd be happy to review it for you
(With that said, James is pretty fast and sometimes beats me to the
punch).

>
>
>> Have you filed a bug, or asked on the mailing list before now?  This
>> is the kind of feedback the authors of that sensor need (Ideally
>> before you move over to another subsystem like hwmontemp).
>
> I never really considered hwmontemp a different sub-system since they live in the same repo and seemed to be more specific towards monitoring temperature telemetry.

Ignore the above comment about subsystem.  I thought you were talking
about phosphor-hwmon.  My bad.

> As far as bugs go, I CC'ed Alex Qiu who has more experience with the performance of PSUSensors. I haven't personally ran into this problem; just know about it from talk amongst the larger team.

Excellent.  Looking forward to details.

>>
>> If I
>> didn't see your message/bug and you did post it, I apologize, I'm not
>> trying to call you out.
>> If you have specifics, like the value of N, and the details around
>> what chips you're interacting with and whatever debugging you've done,
>> it would be helpful to put that in a bug for triage.
>
> Alex, maybe you can add some color here?
>>
>> Keep in mind, PSUSensor by default has a 1 second scan rate.
>> https://github.com/openbmc/dbus-sensors/blob/41061e2c3198c0f597d4f6bb702b690a273ab45d/include/PSUSensor.hpp#L38
>> If it's not obeying that, clearly there's a bug to fix somewhere.
>> On some platforms, I have seen very high rate polling of power values
>> on the PSU I2c bus by other devices, and that tends to hold up
>> transactions for other components.  If that bus is misbehaving or
>> overloaded on your platform, it might have triggered a weird condition
>> within the PSU sensor (like the scans taking longer than the scan
>> rate).
>> If you have any more details here, it's quite possible someone will
>> have an idea where to look, or know exactly where the problem is.
>
>