Feedback on Current OpenBMC and Proposing Some Improvements ---- "Improvements" Ideas

Alex Qiu xqiu at google.com
Thu Jul 2 03:06:33 AEST 2020


On Tue, Jun 30, 2020 at 7:00 PM Ed Tanous <ed at tanous.net> wrote:
>
> On Tue, Jun 30, 2020 at 2:28 PM Alex Qiu <xqiu at google.com> wrote:
> >
> > On Mon, Jun 29, 2020 at 6:28 PM Ed Tanous <ed at tanous.net> wrote:
> > >
> > > On Mon, Jun 29, 2020 at 3:09 PM Alex Qiu <xqiu at google.com> wrote:
> > > >
> > > > On Mon, Jun 29, 2020 at 7:53 AM Ed Tanous <ed at tanous.net> wrote:
> > > > >
> > > > >
> > > > > On Thu, Jun 25, 2020 at 6:08 PM Alex Qiu <xqiu at google.com> wrote:
> > > > > > Yes, there are some restrictions in my current demo, and I'm afraid
> > > > > > that I may not have the bandwidth to cover it further alone. My point
> > > > > > is that, sometimes hardwares is designed with some unexpected
> > > > > > complexity on topology (EEPROM behind MUX for example).
> > > > > To my understanding this case is already handled.  Assign the mux to the parent FRU config file, and the eeprom behind it will be detected correctly.  With that said, this type of hardware (optional mux with an eeprom behind it) is difficult to identify automatically with no other impact, hence needing to explicitly add it to the parent board.  Can you think of any other examples of unexpected topology that aren't covered?
> > > >
> > > > There's no parent FRU in this case; the MUX belongs to the specific
> > > > FRU, and its EEPROM is behind the MUX.
> > >
> > > I called the baseboard a FRU, that was my bad and I suspect you got
> > > confused.  I should've said baseboard "entity".  The FRU you're trying
> > > to detect is plugged into _something_ else.  If it's not detectable by
> > > other means, you need to add the circuity to the parent component.  If
> > > you've implemented entity manager as intended, you would have a
> > > configuration file for your baseboard/motherboard/primary comms board.
> > > That is the one I was suggesting you should put it in.  This is the
> > > exact reason the baseboard is a first class component in EM.
> > > Look at one of the *_baseboard.json as an example.  I believe Wolf
> > > Pass handles this exact case for a PCIe riser (although I'm not sure
> > > about the state of it in EM).
> >
> > Ah, I see. So basically it's a workaround to register the MUX that may
> > be plugged onto the baseboard?
>
> Correct.
>
> > On the other hand, I just realized
> > today that our current workaround to statically assign these possible
> > MUX in the device tree could make these logical I2C bus numbers fixed,
> > which is very friendly for engineers to issue raw I2C commands with
> > i2ctools.
> For a given configuration, entity manager will give consistent bus
> numbers as well, and also provides helpful symlinks in the filesystem,
> for example, /dev/PCIE_SLOT1 points to the bus of the first PCIe slot,
> be it a root bus plugged directly into the bmc, or 3 levels of mux
> connected through several boards.  I believe the i2c tools can also
> use the symlink to interact directly with that in a named way that's
> friendly.
> > Non-BMC engineers would probably have a headache when they
> > are told how to find the bus number in sysfs for a device instead of
> > being given a formula to calculate (which is already a headache to
> > explain).
> I'm not following that statement.  "find the bus number" would occur
> whether or not you have the busses hardcoded.  Are you advocating for
> not using hwmon sensors here?  Needing to do a calculation for the new
> part you're adding would need to be done regardless.  If you turn it
> into a hwmon sensor, you could have the kernel do the math for you,
> and keep your debugability.

Hardware engineers want to set the output voltage for voltage
regulators for debugging, which is not covered by hwmon interface or
drivers, so we need to send raw I2C commands. When a system is not
fully populated, I believe the kernel always assigns the largest
sequential numbers to newly created MUX channels, so that number will
vary based on the debug system configurations. On the other hand, our
current workaround to populate the MUXes in the device tree while they
may not exist fixes the bus numbers which can be calculated from a
formula, instead of tracing symlinks.

>
> >
> > >
> > > If you're talking about the sensor daemons "parsing" dbus, I agree,
> > > dbus interfaces are relatively complicated and error prone, but at
> > > this point, a non-dbus OpenBMC is probably a massive undertaking
> > > (although I'm sure you'd get a lot of support if you did it).
> > >
> > > > Unless we agree
> > > > on a format and implement an OpenBMC library for it. Take the Virtual
> > > > Sensor design doc under review for example:
> > > > https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/32345/ I think it
> > > > will also have its own parser to deal with the "Algo" attribute.
> > > Yes, I agree.  If I'm honest, I think the virtual sensor design goes
> > > against some of the principles that EM was built on, as it moves large
> > > amounts of complexity into config files (in exactly the way you've
> > > noted), essentially ignores the dynamic nature of system topology, is
> > > parsing "code" at runtime and makes debugging issues difficult.  I
> > > think it will be hard to build, and even harder to maintain.  I (nor
> > > any of the EM/dbus-sensors maintainers last time I looked) have
> > > weighed in on it, so it's far from done (update, I just did).  Clearly
> > > I should've left feedback on it earlier, but I, like you, don't have
> > > much time for openbmc these days, so I pick my battles carefully.
> > > > To
> > > > make more fragments, right now entity-manager does the calculation
> > > > without support for parenthesis and does not follow arithmetic order
> > > > of operations, and we are trying to come up with one supporting
> > > > parenthesis without breaking the compatibility.
> > >
> > > Again, remember that you're looking at something not on master.  I had
> > > a bunch of comments staged on that review that I just pushed.  I'm
> > > glad to see you left some similar comments to what I posted.
> > > If you're talking about the parser in entity manager, I'm confused.
> > > There aren't any arithmetic operations (besides one hack), nor is it
> > > doing any DSL level parsing at that level.  That would go against a
> > > lot of the intent.
> >
> > For the parser, I'm referring to the function templateCharReplace() in
> > https://github.com/openbmc/entity-manager/blob/master/src/Utils.cpp#L154.
> > We found it unintuitive that it does not support parenthesis and does
> > not follow arithmetic order of operations. If we try to improve it to
> > support parenthesis and arithmetic order of operations, it will break
> > compatibility if we don't watch it carefully.
>
> Yes, it's not a real parser, but if you look at the commit for the
> problem it was fixing (massively duplicated config files for power
> supplies because of minor changes) then it starts to make more sense
> that what's there is better than what came before.  If it's important
> to you, then put together a patch to add a real parser?  Remember that
> the relevant config files are checked into that repo, so you can
> actually dump every single config statement and flush it through your
> parser to test that it gives the same result, and in the cases it
> doesn't, add parenthesis where required to get the same result.  I
> would really only expect the quad mode power supply files to even be
> effected, and I believe (based on how their expression is parsed) they
> won't be.

For the concern of compatibility, we worry that other companies are
also using these features downstream. FYI, we are heavily relying on
it right now, even though we find out it's not following arithmetic
order of operations.

- Alex Qiu


More information about the openbmc mailing list