Feedback on Current OpenBMC and Proposing Some Improvements ---- "Improvements" Ideas

Tue Jun 30 11:28:18 AEST 2020

On Mon, Jun 29, 2020 at 3:09 PM Alex Qiu <xqiu at google.com> wrote:
>
> On Mon, Jun 29, 2020 at 7:53 AM Ed Tanous <ed at tanous.net> wrote:
> >
> >
> > On Thu, Jun 25, 2020 at 6:08 PM Alex Qiu <xqiu at google.com> wrote:
> > > Yes, there are some restrictions in my current demo, and I'm afraid
> > > that I may not have the bandwidth to cover it further alone. My point
> > > is that, sometimes hardwares is designed with some unexpected
> > > complexity on topology (EEPROM behind MUX for example).
> > To my understanding this case is already handled.  Assign the mux to the parent FRU config file, and the eeprom behind it will be detected correctly.  With that said, this type of hardware (optional mux with an eeprom behind it) is difficult to identify automatically with no other impact, hence needing to explicitly add it to the parent board.  Can you think of any other examples of unexpected topology that aren't covered?
>
> There's no parent FRU in this case; the MUX belongs to the specific
> FRU, and its EEPROM is behind the MUX.

I called the baseboard a FRU, that was my bad and I suspect you got
confused.  I should've said baseboard "entity".  The FRU you're trying
to detect is plugged into _something_ else.  If it's not detectable by
other means, you need to add the circuity to the parent component.  If
you've implemented entity manager as intended, you would have a
configuration file for your baseboard/motherboard/primary comms board.
That is the one I was suggesting you should put it in.  This is the
exact reason the baseboard is a first class component in EM.
Look at one of the *_baseboard.json as an example.  I believe Wolf
Pass handles this exact case for a PCIe riser (although I'm not sure
about the state of it in EM).

> Unfortunately, maybe usually we
> only realize some hardware design is problematic to the software until
> we see it? :) I haven't started in Google when these boards were
> designed, and I'm not so sure if I could point it out even if I had
> been started in Google.

To reiterate, I think this case is handled, in a similar way to what
you'd have to implement in control flow code, given the fact that the
card itself is undetectable through conventional scanning.  You have
to make some assumption about "If I'm on Platform X, I MIGHT have an
undetectable card behind a mux at address Y.  If I see something
there, assume it's that type, and try to scan behind it.

With that said, I'm still interested to see if there's a way to make
your hardcoded approach viable for the things we need entity manager
to do.  Again, if you have time, start hacking on it and see what you
can integrate.

>
> >
> >
> > > Having the
> > > ability to aid the topology discovery with code, and having the
> > > topology info available to other functionalities can help a lot. JSON
> > > config files are having a hard time bearing these logics, and any
> > > extra logic implemented in JSON config files requires some kind of
> > > script parser in daemons processing them.
> > The majority of the config parsing is also able to be done at compile time, it just isn't implemented today.  With that said, the config file parsing in practice takes up very little CPU time in the last profile I did, so it hasn't been a priority.
>
> I'm not quite concerned about CPU time on the parsing, but more on the
> burden of developing. Because right now I feel like we need to
> implement a parser per daemon for what it's consuming.

I have no clue where you got that idea.  There is, by design, one json
parser, and it lives in entity manager.  Entity manager runs the
detection and posts the relevant interfaces to Dbus.  You do not have
to reimplement it for every single daemon.  Can you point out any
example of a json parser in one of the existing sensor daemons?  I
suspect you're not quite understanding the code you're looking at.

If you're talking about the sensor daemons "parsing" dbus, I agree,
dbus interfaces are relatively complicated and error prone, but at
this point, a non-dbus OpenBMC is probably a massive undertaking
(although I'm sure you'd get a lot of support if you did it).

> Unless we agree
> on a format and implement an OpenBMC library for it. Take the Virtual
> Sensor design doc under review for example:
> https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/32345/ I think it
> will also have its own parser to deal with the "Algo" attribute.
Yes, I agree.  If I'm honest, I think the virtual sensor design goes
against some of the principles that EM was built on, as it moves large
amounts of complexity into config files (in exactly the way you've
noted), essentially ignores the dynamic nature of system topology, is
parsing "code" at runtime and makes debugging issues difficult.  I
think it will be hard to build, and even harder to maintain.  I (nor
any of the EM/dbus-sensors maintainers last time I looked) have
weighed in on it, so it's far from done (update, I just did).  Clearly
I should've left feedback on it earlier, but I, like you, don't have
much time for openbmc these days, so I pick my battles carefully.
> To
> make more fragments, right now entity-manager does the calculation
> without support for parenthesis and does not follow arithmetic order
> of operations, and we are trying to come up with one supporting
> parenthesis without breaking the compatibility.

Again, remember that you're looking at something not on master.  I had
a bunch of comments staged on that review that I just pushed.  I'm
glad to see you left some similar comments to what I posted.
If you're talking about the parser in entity manager, I'm confused.
There aren't any arithmetic operations (besides one hack), nor is it
doing any DSL level parsing at that level.  That would go against a
lot of the intent.

One thing to remember is that so long as you update the relevant
config files, you should feel free to change semantics of how some of
these things work.  There aren't that many config files on master.

>
> >
> >
> > > Based on your replies, the
> > > concept for functionally extensions that I was asking for should be
> > > implemented as daemons either standalone or plugged onto dbus?
> >
> > I'm not understanding the distinction of standalone vs plugged into dbus, but I''ll hazard a guess, and say yes, the dbus interfaces to the rest of the system is (one of) the project's intended extension points.  You can either manipulate them from an existing daemon, or create an all new daemon that has exactly the behavior you want.
> >
> >
> > >
> > > On "reading sensors within the BMC console", I'm actually using a
> > > script to directly read from hwmon right now, because we are having
> > > sensor number limit on IPMI and performance issues with IPMI and dbus.
> > > We are still actively investigating these performance issues now to
> > > unblock the project, but based on the current findings, I think it's
> > > better to have this tool before the protocol layers.
> > Have you considered opening a review with this tool to make it available to others?  I'd recommend opening a review to put it in here:
> > https://github.com/openbmc/openbmc-tools
> > This repo is much less formal, but gives people a place for these "might be useful to others" type scripts.  Write up a commit message with something to the effect of "I wrote this tool, this is how you use it, I find it makes platform development easier because X." and get it checked in.
>
> It had topology information and sensor information that we would like
> baked in as its major part, so unfortunately it's not an upstream-able
> script...
Here is yet another opportunity to make things better, and I feel like
you're squandering it.  I like to complain about the current state as
much as anyone, but if we're not putting up patchsets, it will never
improve, and the next person will just come in with the same
complaints.  If you have tools that you think are better, or provide
the start to a better tool, consider putting them up under your
username so other people can benefit, and see the ideas encapsulated
within it.

>
> >
> >
> > >
> > > On issues like uint8_t, yes, we've noted them down, but they are still
> > > tech debts on our backlog, and dealing with the performance issue
> > > described above remains as our priority right now.
> >
> > It sounds like you're swamped for time, which I can respect.  With that said, If you start by making technical improvements on small things like the above, you're much more likely to have feedback (and help) when you propose more wide sweeping changes, like your python example.
> > If you ever get free time, and want to continue moving your proposal more toward an actionable change we can make, I'm happy to help discuss options.  To be clear, I think if you can resolve some of the technical limitations of your proposal, and put together a patchset that implements it in a language that the project can use on a majority of platforms, I think it could be a better developer experience.  We just can't remove some of the user facing features that are implemented and/or planned already.
>
> Makes sense. We'll see if we could gather enough resources at some
> time to actually make it a concrete product, or we can come up with a
> plan to improve the existing ones bit by bit. It's been a pleasure to
> hear from you on what I haven't realized or taken into account yet,
> because my team was more hardware project focused and had less
> exposure to the general OpenBMC discussions or design philosophies.
> Thank you!
>
You're very welcome.