Proposing changes to the OpenBMC tree (to make upstreaming easier)

Patrick Williams patrick at stwcx.xyz
Thu Apr 7 06:06:52 AEST 2022


On Mon, Apr 04, 2022 at 11:28:06AM -0700, Ed Tanous wrote:
> The OpenBMC development process as it stands is difficult for people
> new to the project to understand, which severely limits our ability to
> onboard new maintainers, developers, and groups which would otherwise
> contribute major features to upstream, but don't have the technical
> expertise to do so.  

This is, to me, a rather bold and surprising statement.

You're saying there are people out there who can work on an embedded
system like the BMC at layers of the stack where they might be modifying
everything from the kernel up to REST APIs to "contribute major
features" and yet they can't figure out how to manipulate multiple git
repositories?

I get that I've heard "Yocto has a big learning curve" over and over,
but you're not actually proposing moving away from Yocto.  I admit it is
quite likely that "Yocto has a big learning curve" is actually a
conflation of some of the points you're raising with Yocto itself, but
still, this is worded rather boldly.

> This initiative, much like others before it[1] is
> attempting to reduce the toil and OpenBMC-specific processes of
> passing changes amongst the community, and move things to being more
> like other projects that have largely solved this problem already.

I'm not sure what is OpenBMC-specific about this process though.

There are some large successful projects that work in a mono-repo model
and some that work in a micro-repo model.  Android and OpenStack both
use the micro-repo model with Gerrit, just like us, and have similar
"how do you put the dependencies together" problems, which they have
solved: Android with `repo` and OpenStack with `zuul`.  `repo` and Yocto
recipes are pretty similar mechanics to me (and I would like to see
something more `zuul`-like in our CI).

Many open source communities work in a micro-repo model; the challenge
is always about pulling dependencies together.  Newer languages like
Rust and JS make this easier, but again, there isn't much out there that
is fundamentally different from Yocto recipes.
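
As an illustration (crate name and URL are hypothetical), a Rust
project pinning a git dependency in its Cargo.toml is doing
mechanically the same job as a recipe's SRCREV:

```toml
# Hypothetical Cargo.toml fragment: pin a dependency to a specific git
# revision -- mechanically the same job a Yocto recipe's SRCREV does.
[dependencies]
some-lib = { git = "https://example.com/some-lib.git", rev = "1a2b3c4" }
```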

The mechanics of how we maintain our dependencies might be different,
because we're built on Yocto, but I would argue that this proposal is
itself _more_ OpenBMC-specific and not less.  I say this because we are
deviating from how every other Yocto project I know of works and how
most other "Linux Distribution" projects work (which is what we are).

> - The tree would be easily shareable amongst the various people
> working on OpenBMC, without having to rely on a single-source Gerrit
> instance.  

What do you mean by "rely on a single-source Gerrit instance"?

> Git is designed to be distributed, but if our recipe files
> point at other repositories, it largely defeats a lot of this
> capability.  Today, if you want to share a tree that has a change in
> it, you have to fork the main tree, then fork every single subproject
> you've made modifications to, then update the main tree to point to
> your forks.  This gets very onerous over time, especially for simple
> commits.  Having maintained several different companies forks
> personally, and spoken to many others having problems with the same,
> adding major features are difficult to test and rebase because of
> this.  Moving the code to a single tree makes a lot of the toil of
> tagging and modifying local trees a lot more manageable, as a series
> of well-documented git commands (generally git rebase[2]).  It also
> increases the likelihood that someone pulls down the fork to test it
> if it's highly likely that they can apply it to their own tree in a
> single command.

I'm almost certain that nobody would do a git-rebase of their fork even
if everything were in one tree.

Why do we want to improve outside testing of forks?  I don't understand
why that is an advantage.

Most of what you're describing here is making it easier for people to
live without getting their code upstreamed, which doesn't seem like a
good thing to the project as a whole, and seems to actively work against
the title of "to make upstreaming easier".

If you're maintaining a few fixes in a backports tree, maintain them as
Yocto-supported patch files.  If you're maintaining whole features
elsewhere, I have no interest in making this easier for you (if it is
worse in other ways for the project).
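
For reference, a sketch of what that looks like in a bbappend (recipe
name, paths, and patch name are all hypothetical):

```
# Hypothetical my-app_%.bbappend: carry a local fix as a Yocto patch
# instead of forking the sub-repo.
FILESEXTRAPATHS:prepend := "${THISDIR}/${PN}:"
SRC_URI += "file://0001-fix-something.patch"
```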

> - There would be a reduction in reviews.  Today, anytime a person
> wants to make a change that would involve any part of the tree,
> there's at least 2 code reviews, one for the commit, and one for the
> recipe bump.  Compared to a single tree, this at least doubles the
> number of reviews we need to process.

The typical bump commit is trivial to review, and nobody reviews them
except Andrew G or me anyhow.
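
For context, the entire content of a typical bump review is a change to
one line of a recipe, something like (recipe name and revision
hypothetical):

```
# phosphor-example_git.bb (hypothetical): the autobump only moves this.
SRCREV = "0123456789abcdef0123456789abcdef01234567"
```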

You could argue that there is actually some velocity advantage in
decoupling the feature support in the code repo from figuring out the
minor implications at the recipe level separately.  Requiring every
code-repo commit to pass on hardware before it can get +1 Verified is
going to greatly increase our CI requirements, plus you're going to get
lots more "I don't understand why this failed" reports because of a
flaky Yocto wget fetch or a Romulus QEMU failure or ...

We would also need to implement everything we have in code-repo-level
CI in the recipe-level CI.  I speak to this later as well, but I think
that while this decreases the number of reviews, it will decrease the
velocity of the reviews even more.  I don't think the bump commits are
what is decreasing our velocity here.

> For changes that want to make
> any change to a few subsystems, as is the case when developing a
> feature, they require 2 X <number of project changes> reviews, all of
> which need to be synchronized.  

I think there is a fundamental problem of how we develop features here.
"All of which need to be synchronized" implies that every new feature we
develop has hard co-reqs between repositories.  If that is the case,
something is broken in our architecture.  We shouldn't patch around it
by manipulating the code layout.

> There is a well documented problem
> where we have no official way to synchronize merging of changes to
> userspace applications within a bump without manual human
> intervention.  This would largely render that problem moot.

This is a solved problem in projects with a similar layout to ours, such
as OpenStack.  I've previously offered to add topic-based testing[1] to our
CI framework and, maybe I misread the attitudes, but it didn't seem like
it was interesting to the project.

I _did_ add some amount of dependency-based testing to the repositories
that are included in our base Docker image already.  If you change one
of those repositories, they get included in the Docker image used for
testing your code as well.  This has identified problems in the past
because of, say, a change in sdbusplus that broke phosphor-logging's
compile.

> - It would allow most developers to not need to understand Yocto at
> all to do their day to day work on existing applications.  No more
> "devtool modify", and related SRCREV bumps.  This will help most of
> the new developers on the project with a lower mental load, which will
> mean people are able to ramp up faster..

I think there is something more fundamental going on here, which I'll
speak about later in "## End-to-End Features".

> - It would give an opportunity for individuals and companies to "own"
> well-supported public forks (ie Redhat) of the codebase, which would
> increase participation in the project overall.  This already happens
> quite a bit, but in practice, the forks that do it squash history,
> making it nearly impossible to get their changes upstreamed from an
> outside entity.

I don't see how this improves the "forks that squash history" situation.
Rebases are hard to pull off with how most companies handle their
internal processes, so the best case here is merge commits.

> - It would centralize the bug databases.  Today, bugs filed against
> sub projects tend to not get answered.  Having all the bugs in
> openbmc/openbmc would help in the future to avoid duplicating bugs
> across projects.

We could do this without changing any code.  Just turn off the issue
tracking on all our code repos[2].

> - Would increase the likelihood that someone contributes a patch,
> especially a patch written by someone else.  If contributing a patch
> was just a matter of cherry-picking a tree of commits and submitting
> it to gerrit, it's a lot more likely that people would do it.

I'm not following why this can't be done today.  Aren't you just
cherry-picking commits at a different level?  

I think there is likely legal apprehension about trying to take code
from a fork that isn't your own and trying to contribute it upstream.
With the current CLA structure, the SOB in "fork/repo" doesn't have the
same meaning legally as our SOB+CLA, so I cannot comfortably add my own
SOB on code I picked up from "fork/repo".  No change to the repository
layout will fix this.

> - Greatly increases the ease with which stats are collected.
> Questions like: How many patches were submitted last year?  How many
> lines of code changed between commit A and commit B?  Where was this
> regression injected (ie git bisect)?  How much of our codebase is C++?
> How many users of the dbus Sensor.Value interface are there?  Are all
> easily answered in one liner git commands once this change is done.

I'm not saying some of these aren't positives, but most of them can
already be answered today with existing tools.  Either github or grok
search or gerrit queries can answer almost all of these except "how many
lines of code changed between A and B", if you're expecting to count all
the code in subordinate repos, but I don't see that as a particularly
interesting question.
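
To illustrate, the per-repo versions of these are one-liners already;
e.g., counting commits, demonstrated here on a throwaway repository
(the same command works on any existing clone today):

```shell
set -e
# Create a throwaway repo with two commits, then count them.
git init -q statsdemo
git -C statsdemo -c user.email=a@b.c -c user.name=a \
    commit -q --allow-empty -m "one"
git -C statsdemo -c user.email=a@b.c -c user.name=a \
    commit -q --allow-empty -m "two"
git -C statsdemo rev-list --count HEAD   # prints: 2
```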

> - New features no longer require single-point-of-contact core
> maintainer processes (ie, creating a repo for changes, setting up
> maintainer groups, ect) and can just be submitted as a series of
> patches to openbmc/openbmc.

If the single-point-of-contact is the issue, let's solve that.  I don't
think it is, though.  I think the bulk of the issue with new repos is
disagreement about whether something belongs in a new repo.  Submitting
as a series makes _that_ situation worse because it doesn't force the
discussion upfront; instead, someone ends up upset that they spent a
bunch of time working on a new daemon that is rejected.

> - Tree-wide changes (c++ standard, yocto updates, formatting, ect) are
> much easier to accomplish in a small number of patches, or a series of
> patches that is easy to pull and test as a unit.

Patches that are also much more difficult to revert if they break one
particular area...

Why would we want these pulled together and tested as a unit anyhow?  If
I update the formatting or the C++ standard used in repo A, that doesn't
affect repo B.

I've been involved in almost every difficult Yocto subtree update, and
the only case I can think of where we couldn't apply the changes to the
older Yocto version was the OpenSSL3 changes, which we had to #define
check around based on an OpenSSL version macro they export.  Even if all
the code were in one repo, would we have wanted to cram all that into a
single "Yocto update plus fix all the code" commit?  I suspect not.
Doing the #define was appropriate no matter how the code was laid out.

> - Inclusive guidelines: To make progress toward an unrelated but
> important goal at the same time, I'm recommending that the
> openbmc/master branch will be left as-is, and the newly-created sha1
> will be pushed to the branch openbmc/openbmc:main, to retain peoples
> links to previous commits on master, and retain the exact project
> history while at the same time moving the project to having more
> inclusive naming, as has been documented previously[3].  At some point
> in the future the master branch could be renamed and deprecated, but
> this is considered out of scope for this specific change.

This is a separate topic and should be tackled separately.  I guess it
is simpler if you're pushing all the code into one repo to only deal
with it there.  If this is something we want to emphasize now, I think
the Yocto bits are in place that we could just do it relatively quickly.
The only painful part would be all the existing commits in Gerrit that
are unmerged and targeting `refs/for/master`, but we'd have to tackle
that with this proposed move as well.

> - Each individual sub-project will be given a folder within
> openbmc/openbmc based on their current repository name.  While there
> is an opportunity to reorganize in more specific ways (ie, put all
> ipmi-oem handler repos in a folder) this proposal intentionally
> doesn't, under the proposition that once this change is made, any sort
> of folder rearranging will be much easier to accomplish, and to keep
> the scope limited.

At a minimum I'd like these all put into a subdirectory off the root.
It is bad enough with how many meta-layers we have, but we shouldn't
then add a hundred top-level subdirectories for the code.

> - Yocto recipes will be changed to point to their path equivalent, and
> inherit externalsrc bbclass[4].  This workflow is exactly the workflow
> devtool uses to point to local repositories during a "devtool modify",
> so it's unlikely we will have incremental build-consistency issues
> with this approach, as was a concern in the past.

Are you sure this works?  I thought externalsrc required the code to be
in an absolute directory and not a relative one?

    if externalsrc and not externalsrc.startswith("/"):
        bb.error("EXTERNALSRC must be an absolute path")
    if externalsrcbuild and not externalsrcbuild.startswith("/"):
        bb.error("EXTERNALSRC_BUILD must be an absolute path")
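
If externalsrc is the route taken, the recipes would presumably need to
anchor their paths to a build-time variable to satisfy that check; a
hypothetical sketch:

```
# Hypothetical bbappend sketch: EXTERNALSRC must be absolute, so an
# in-tree location would be anchored to a variable such as COREBASE.
inherit externalsrc
EXTERNALSRC = "${COREBASE}/../bmcweb"
```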

The original facebook/openbmc codebase kept all of the code in the repo
and we simply appended the directories to the SRC_URI.  It is somewhat
of a pain to maintain the SRC_URI lists, so maybe externalsrc is better
in that regard.  We also ran into lots of pseudo-abort failures, as if
Yocto didn't really support in-tree source in the latest code.  In order
to avoid the pseudo-abort failures I had to do a rather ugly hack in our
code[3].  I don't know how we would sanity-test your proposal to ensure
it doesn't have this issue.

> - Subprojects that are intended to be reused outside of OpenBMC (ex
> sdbusplus) will retain their previous commit, history, and trees, such
> that they are usable outside the project.  This is intended to make
> sure that the code that should be reusable by others remains so.

How do we identify all of these?  (I spoke about this in another part of
the chain, so we don't need to expand on it here.)

> - The above intentionally makes no changes to our subtree update
> process, which would remain the same process as is currently.  The
> openbmc-specific autobump job in Jenkins would be disabled considering
> it's no longer required in this approach.

Wouldn't it still be handy for the above-mentioned repos?

> Let me know what you think.

In general, I'm not a fan of mono-repo style.  Both the mono-repo and
micro-repo style have issues.  I think we need to have an adequate
discussion of what the issues are that would be _introduced_ by moving
to a mono-repo in our case as well.  I'm not currently convinced that
this proposal is optimizing in the way that is most beneficial to the
project.

## Reviews

You've mentioned that this will make reviews easier, and I think the
opposite is far more likely to be true.  OWNERS + Gerrit is not
sufficient in a mono-repo (and GitHub doesn't solve this either).

The biggest complaint, as I've heard it, has been review-cycle
velocity.  I myself have run into sub-repos where it takes weeks to get
a trivial change in because the maintainers just don't stay on top of
it.  This proposal will make the problem exponentially worse.

Our most proficient and active reviewers tend to be OWNERS higher up in
the "merge chain".  In 2021H2, I think the top 5 reviewers handled more
reviews than everyone else put together[4].  The only way I can stay
somewhat on top of what needs my attention is by having all my Gerrit
notifications go into a folder and deleting them as I've taken action.
With the current OWNERS I'm already having to delete almost half of
these just by skimming the title (ex. "meta-not-my-company: ..."
commits).  If you start throwing every single commit in Gerrit at me
because I matched in OWNERS, I'll have absolutely no idea what needs my
attention and what doesn't.  How are you planning on handling this, as
you are a top-level OWNER as well?

## Out-of-Yocto builds

The biggest impediment to my development tends to be the repositories
that are not using Meson and haven't properly supported Meson
subprojects, because it makes it almost impossible to make changes to
those without invoking Yocto, which is a much slower process.  I've
written about this in the past[5].  If all the code is in a single
repository, I fear there will be almost no effort put into supporting
out-of-Yocto builds because, at that point, why support them?  I wrote
about the time involved in that post, but you're taking activities that
take seconds with Meson subprojects and turning them into 5 minutes.
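
For reference, a Meson subproject wrap is just a small pin file, e.g. a
subprojects/sdbusplus.wrap along these lines (revision hypothetical):

```ini
# Hypothetical subprojects/sdbusplus.wrap: lets Meson fetch and build
# the dependency directly, no Yocto invocation needed.
[wrap-git]
url = https://github.com/openbmc/sdbusplus.git
revision = master
```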

If we are doing something that slows down the most active developers, we
need to make sure that the increase in contributions from other
developers is going to more than offset it.  Based on the kind of people
affected by the problems you're describing, I'm really not convinced it
will.

## End-to-End Features

You used the concept of an "end-to-end feature" a few times in this
proposal, but to talk about different things.  I'm going to specifically
talk about a feature that requires changes across multiple existing
repositories.

A bold statement: nobody here actually implements end-to-end features.

I've never witnessed a single person implement a new feature in the
kernel, add userspace support to interact with that kernel work, add
the Redfish APIs to interact with the userspace support they added,
implement the WebUI to control the Redfish APIs, and then write system
integration tests in phosphor-test-automation.  We draw somewhat
arbitrary boundaries already and call something "end-to-end" because it
happens to reside within those boundaries.  For a typical feature, for
many developers, that boundary seems to be "Redfish + some other
userspace app(s)".  Admittedly, this proposal does help (a little) with
_that_ particular arbitrary boundary, but it doesn't help with anything
outside of it.

There is a bigger problem that this is exposing though, in my mind at
least: many developers can't or don't work at the component level.

Let's consider a "simple" change that requires:
    - Adding a new DBus API.
    - Adding support in App-A for said DBus API.
    - Adding Redfish support for said DBus API in bmcweb.

You're suggesting that this is hard to accomplish today, end-to-end, and maybe
it is.  But, the fact that anyone is attempting to solve it end-to-end
means that the resulting product is pretty terrible from a future
development and maintenance stand-point:

   * Repositories aren't using Meson subproject wrap files, which would
     make developing against a change to phosphor-dbus-interfaces
     trivially easy.

   * Developers are not developing the changes to App-A with unit-tests.

   * Developers are not confirming that their changes to App-A are sound
     at a dbus-level, so the code is tightly coupled with however
     they've changed bmcweb to interact with it and often crumbles when
     another application interacts with it (ex. PLDM).

   * The changes to bmcweb have no unit testing and/or mocking of the DBus APIs.

   * There is not a single integration test added to
     phosphor-test-automation to make sure nobody breaks this feature in
     the future.

Combining all the code together throws away the necessity to treat the
software as components, which certainly doesn't improve any of the
above.

Your proposal seems to be optimizing for the "I need to hack at code
across a few repos and throw it all together into a BMC binary image so
I can test drive it" case, but I would suggest this case should rarely even
be done by most of our developers.  The fact that this is even a "regular
development case" to begin with is a problem.

Everyone should be able to add a new DBus API and at least 95% of the support in
App-A without ever touching a BMC.  You then have the other 5% that
maybe needs some confirmation on hardware (you mocked out how you
_think_ the hardware behaves already, right?).  Once that is done you
should be able to develop the whole bmcweb feature without touching
hardware, as that is _completely_ software-based (mock out all the dbus
interfaces for your new feature, please, and add a test case).  If all
of these pieces are working independently, then, and only then, you can
spend a little time throwing it all into a single image with something
like [6] and test-driving it; better yet, also add code to
phosphor-test-automation.

If there are pieces of what I just wrote above that are difficult today,
let's fix them, but combining all the code into a mono-repo is, to me,
just a band-aid over these problems.

## Tightly coupled software

I've worked on a large BMC-like product that had a mono-repo.  The
result was a tightly-coupled mess of code that was impossible to work
in, so there ended up being arbitrary "component boundaries" put in
place and nobody worked outside their "component boundary" silo.  We
already have a bit of this "silo" mentality here but at least we don't
have the tightly-coupled aspect of it.  We have an architecture that
allows the pieces to, mostly, move independently of each other.  How
would we ensure that a mono-repo doesn't devolve into tightly-coupled
code once the frictions that prevent it are removed?

I think a very likely outcome of this proposal is that we end up with
more "utility" libraries, which will increase the coupling and deepen
the library-dependency structure, and that is worse for performance.

## Reverting and large reviews

I already often see large commits even within a single repository,
which are difficult to review and difficult to revert.  It is currently
on the maintainers to push back on these and request smaller commits.
Having the code in a mono-repo seems to encourage cross-package changes
(and in fact this was listed as a selling point here), which makes it
more likely that a bug introduced by one small piece of the change
requires the whole change to be reverted.

This proposal is likely to increase the average size of a commit, since
commits become more likely to include cross-package changes.  That means
we also need more people to give feedback on an individual change, and
as a reviewer I have to sift through all the pieces that aren't even
relevant to me.  Both of these slow the review process down further, not
speed it up.

## "To make upstreaming easier"

You started with the topic of making upstreaming easier, and I'm really
not convinced that our repo structure is the major impediment to
upstreaming.  Most of the advantages you spoke to seemed to be about
_development_ and not _upstreaming_.  Which one are we solving?  Do we
really have data that indicates upstreaming is hard because we have
multiple repositories?  I don't think I've ever heard this.  I have
heard that development is hard[er].

---

Some of the issues I've raised could certainly be solved by more
strongly worded "guidelines" than we currently have in place (and by
somehow ensuring they are followed).  I am worried about the overall
code-smell that will come from a mono-repo (based on my past
experiences with them).

The biggest concern I have is with the [negative] impact on code
reviews, and I don't really have any way to solve it except by
completely changing our code review process at the same time.  Perhaps
something like the Linux maintainer-owned subtree model, where a
maintainer owns a fork for their part of the tree and we bless it at a
higher level; that doesn't sound appealing to me.  Maybe there is some
better Gerrit tooling than what we're currently getting from OWNERS, or
maybe some other review tool.

[1]: https://lore.kernel.org/openbmc/20191119003509.GA80304@patrickw3-mbp.dhcp.thefacebook.com/
[2]: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/disabling-issues
[3]: https://github.com/facebook/openbmc/commit/e418bfbcf382185fdffb768a012e762a4600ae63#diff-b0fea89439c1004ad5229cb7058b4740ee1c542b32cfb0d2165788f9020b5da0R1
[4]: https://github.com/williamspatrick/openbmc-tof-election-data/blob/995f0d73184db7c25446284261cc023af611e7c4/2021H2/data/report.json#L1
[5]: https://www.stwcx.xyz/blog/2021/04/18/meson-subprojects.html
[6]: https://github.com/williamspatrick/dotfiles/commit/df180ac2b74f2b7fcb6ae91302f0211bc49cb2e9
-- 
Patrick Williams