Feedback on Current OpenBMC and Proposing Some Improvements ---- "Improvements" Ideas

Wed Jul 1 07:28:37 AEST 2020

On Mon, Jun 29, 2020 at 6:28 PM Ed Tanous <ed at tanous.net> wrote:
>
> On Mon, Jun 29, 2020 at 3:09 PM Alex Qiu <xqiu at google.com> wrote:
> >
> > On Mon, Jun 29, 2020 at 7:53 AM Ed Tanous <ed at tanous.net> wrote:
> > >
> > >
> > > On Thu, Jun 25, 2020 at 6:08 PM Alex Qiu <xqiu at google.com> wrote:
> > > > Yes, there are some restrictions in my current demo, and I'm afraid
> > > > that I may not have the bandwidth to cover it further alone. My point
> > > > is that, sometimes hardwares is designed with some unexpected
> > > > complexity on topology (EEPROM behind MUX for example).
> > > To my understanding this case is already handled.  Assign the mux to the parent FRU config file, and the eeprom behind it will be detected correctly.  With that said, this type of hardware (optional mux with an eeprom behind it) is difficult to identify automatically with no other impact, hence needing to explicitly add it to the parent board.  Can you think of any other examples of unexpected topology that aren't covered?
> >
> > There's no parent FRU in this case; the MUX belongs to the specific
> > FRU, and its EEPROM is behind the MUX.
>
> I called the baseboard a FRU, that was my bad and I suspect you got
> confused.  I should've said baseboard "entity".  The FRU you're trying
> to detect is plugged into _something_ else.  If it's not detectable by
> other means, you need to add the circuity to the parent component.  If
> you've implemented entity manager as intended, you would have a
> configuration file for your baseboard/motherboard/primary comms board.
> That is the one I was suggesting you should put it in.  This is the
> exact reason the baseboard is a first class component in EM.
> Look at one of the *_baseboard.json as an example.  I believe Wolf
> Pass handles this exact case for a PCIe riser (although I'm not sure
> about the state of it in EM).

Ah, I see. So basically it's a workaround to register the MUX that may
be plugged onto the baseboard? On the other hand, I just realized
today that our current workaround to statically assign these possible
MUX in the device tree could make these logical I2C bus numbers fixed,
which is very friendly for engineers to issue raw I2C commands with
i2ctools. Non-BMC engineers would probably have a headache when they
are told how to find the bus number in sysfs for a device instead of
being given a formula to calculate (which is already a headache to
explain).

>
> > Unfortunately, maybe usually we
> > only realize some hardware design is problematic to the software until
> > we see it? :) I haven't started in Google when these boards were
> > designed, and I'm not so sure if I could point it out even if I had
> > been started in Google.
>
> To reiterate, I think this case is handled, in a similar way to what
> you'd have to implement in control flow code, given the fact that the
> card itself is undetectable through conventional scanning.  You have
> to make some assumption about "If I'm on Platform X, I MIGHT have an
> undetectable card behind a mux at address Y.  If I see something
> there, assume it's that type, and try to scan behind it.
>
> With that said, I'm still interested to see if there's a way to make
> your hardcoded approach viable for the things we need entity manager
> to do.  Again, if you have time, start hacking on it and see what you
> can integrate.
>
> >
> > >
> > >
> > > > Having the
> > > > ability to aid the topology discovery with code, and having the
> > > > topology info available to other functionalities can help a lot. JSON
> > > > config files are having a hard time bearing these logics, and any
> > > > extra logic implemented in JSON config files requires some kind of
> > > > script parser in daemons processing them.
> > > The majority of the config parsing is also able to be done at compile time, it just isn't implemented today.  With that said, the config file parsing in practice takes up very little CPU time in the last profile I did, so it hasn't been a priority.
> >
> > I'm not quite concerned about CPU time on the parsing, but more on the
> > burden of developing. Because right now I feel like we need to
> > implement a parser per daemon for what it's consuming.
>
> I have no clue where you got that idea.  There is, by design, one json
> parser, and it lives in entity manager.  Entity manager runs the
> detection and posts the relevant interfaces to Dbus.  You do not have
> to reimplement it for every single daemon.  Can you point out any
> example of a json parser in one of the existing sensor daemons?  I
> suspect you're not quite understanding the code you're looking at.

I think I just got this idea from the virtual sensor design doc, which
is against the design principle of EM in your opinion...

>
> If you're talking about the sensor daemons "parsing" dbus, I agree,
> dbus interfaces are relatively complicated and error prone, but at
> this point, a non-dbus OpenBMC is probably a massive undertaking
> (although I'm sure you'd get a lot of support if you did it).
>
> > Unless we agree
> > on a format and implement an OpenBMC library for it. Take the Virtual
> > Sensor design doc under review for example:
> > https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/32345/ I think it
> > will also have its own parser to deal with the "Algo" attribute.
> Yes, I agree.  If I'm honest, I think the virtual sensor design goes
> against some of the principles that EM was built on, as it moves large
> amounts of complexity into config files (in exactly the way you've
> noted), essentially ignores the dynamic nature of system topology, is
> parsing "code" at runtime and makes debugging issues difficult.  I
> think it will be hard to build, and even harder to maintain.  I (nor
> any of the EM/dbus-sensors maintainers last time I looked) have
> weighed in on it, so it's far from done (update, I just did).  Clearly
> I should've left feedback on it earlier, but I, like you, don't have
> much time for openbmc these days, so I pick my battles carefully.
> > To
> > make more fragments, right now entity-manager does the calculation
> > without support for parenthesis and does not follow arithmetic order
> > of operations, and we are trying to come up with one supporting
> > parenthesis without breaking the compatibility.
>
> Again, remember that you're looking at something not on master.  I had
> a bunch of comments staged on that review that I just pushed.  I'm
> glad to see you left some similar comments to what I posted.
> If you're talking about the parser in entity manager, I'm confused.
> There aren't any arithmetic operations (besides one hack), nor is it
> doing any DSL level parsing at that level.  That would go against a
> lot of the intent.

For the parser, I'm referring to the function templateCharReplace() in
https://github.com/openbmc/entity-manager/blob/master/src/Utils.cpp#L154.
We found it unintuitive that it does not support parenthesis and does
not follow arithmetic order of operations. If we try to improve it to
support parenthesis and arithmetic order of operations, it will break
compatibility if we don't watch it carefully.

>
> One thing to remember is that so long as you update the relevant
> config files, you should feel free to change semantics of how some of
> these things work.  There aren't that many config files on master.
>
> >
> > >
> > >
> > > > Based on your replies, the
> > > > concept for functionally extensions that I was asking for should be
> > > > implemented as daemons either standalone or plugged onto dbus?
> > >
> > > I'm not understanding the distinction of standalone vs plugged into dbus, but I''ll hazard a guess, and say yes, the dbus interfaces to the rest of the system is (one of) the project's intended extension points.  You can either manipulate them from an existing daemon, or create an all new daemon that has exactly the behavior you want.
> > >
> > >
> > > >
> > > > On "reading sensors within the BMC console", I'm actually using a
> > > > script to directly read from hwmon right now, because we are having
> > > > sensor number limit on IPMI and performance issues with IPMI and dbus.
> > > > We are still actively investigating these performance issues now to
> > > > unblock the project, but based on the current findings, I think it's
> > > > better to have this tool before the protocol layers.
> > > Have you considered opening a review with this tool to make it available to others?  I'd recommend opening a review to put it in here:
> > > https://github.com/openbmc/openbmc-tools
> > > This repo is much less formal, but gives people a place for these "might be useful to others" type scripts.  Write up a commit message with something to the effect of "I wrote this tool, this is how you use it, I find it makes platform development easier because X." and get it checked in.
> >
> > It had topology information and sensor information that we would like
> > baked in as its major part, so unfortunately it's not an upstream-able
> > script...
> Here is yet another opportunity to make things better, and I feel like
> you're squandering it.  I like to complain about the current state as
> much as anyone, but if we're not putting up patchsets, it will never
> improve, and the next person will just come in with the same
> complaints.  If you have tools that you think are better, or provide
> the start to a better tool, consider putting them up under your
> username so other people can benefit, and see the ideas encapsulated
> within it.

Sorry about that, but we've been really doing a lot of
platform-specific scripts or hacks, and it's non-trivial or losing a
lot of its core to upstream them.

>
> >
> > >
> > >
> > > >
> > > > On issues like uint8_t, yes, we've noted them down, but they are still
> > > > tech debts on our backlog, and dealing with the performance issue
> > > > described above remains as our priority right now.
> > >
> > > It sounds like you're swamped for time, which I can respect.  With that said, If you start by making technical improvements on small things like the above, you're much more likely to have feedback (and help) when you propose more wide sweeping changes, like your python example.
> > > If you ever get free time, and want to continue moving your proposal more toward an actionable change we can make, I'm happy to help discuss options.  To be clear, I think if you can resolve some of the technical limitations of your proposal, and put together a patchset that implements it in a language that the project can use on a majority of platforms, I think it could be a better developer experience.  We just can't remove some of the user facing features that are implemented and/or planned already.
> >
> > Makes sense. We'll see if we could gather enough resources at some
> > time to actually make it a concrete product, or we can come up with a
> > plan to improve the existing ones bit by bit. It's been a pleasure to
> > hear from you on what I haven't realized or taken into account yet,
> > because my team was more hardware project focused and had less
> > exposure to the general OpenBMC discussions or design philosophies.
> > Thank you!
> >
> You're very welcome.

- Alex Qiu