Pain points in Git's patch flow

Mon Apr 19 18:26:18 AEST 2021

On Mon, Apr 19 2021, Sebastian Schuberth wrote:

> On Sun, Apr 18, 2021 at 10:54 PM Ævar Arnfjörð Bjarmason
> <avarab at gmail.com> wrote:
>
>> And thank you for participating in the discussion. I think it's
>> especially valuable to get a viewpoint like yours, i.e. someone who (per
>> this E-Mail below) gave up in frustration with the current development
>> flow.
>
> To be fair, Git's contribution flow isn't the only reason why I chose
> to stop contributing. Another reason is the very lengthy and tedious
> discussions that too often spark from rather small changes.
>
> Also, I wouldn't say I "gave up in frustration". It was a mostly
> unemotional decision on which of the many OSS projects I contribute to
> my rare spare time is spent best.
>
>> I think it's important not to conflate tooling issues with social
>> issues. It's not that we e.g. couldn't whip up a quick script to
>
> I'm not sure I agree completely here, because some tools make it
> easier to overcome social issues than others.

Indeed, to clarify I'm not dismissing that. E.g. is an easier UX or
other "prodding" going to result in better collaboration? Maybe. I'm
just pointing out that it may not mostly/entirely be a tooling issue.

>> Rather it's that it's a volunteer project and people work on what
>> they're interested in.
>
> Exactly. That's why I believe tooling should allow people to subscribe
> to changes in code areas they're interested in, rather than a
> contributor having to know which subsystem maintainer to put in CC
> (e.g. for gitk changes). At least at the time when I contributed it
> was sometimes hard to move things forward if you didn't reach out to
> the right people.
>
>> So maybe having assigned reviewers would help move things along. But I
>> wonder if it wouldn't also lead down the rut of PRs/MRs languishing for
>> months, because the reviewers just want to spend their time in some
>> other way.
>
> Having default assignees for reviews / code owners / however you want
> to call it does not mean that only these people should review, or that
> something cannot be merged without their review. It just makes it more
> clear who's opinion would be the best to get, and who should execute a
> "word of command" if things do not move forward.
>
>> I.e. the design of many of these tools in this regard assumes you have a
>> workforce, not the cat-herding problem of volunteers working on whatever
>> strikes their fancy.
>
> E.g. GitHub makes the distinction between "reviewers" for a PR and
> "assignees" for a PR, and the former can be configured from a
> CODEOWNERS file[...]

CODEOWNERS and default assignment etc. is something that tends to
over-assign things, which is fine for a workforce you're paying, but
it's important to realize that the practice in the git project is to put
the onus on the submitter of the change to manually find who they should
CC.

Because it's not a problem you can really solve automatically without
the a trade-off of feeding potentially uninterested parties a bunch of
patches they're not interested in, which could be another thing that
makes them give up contributing.

So for example I've got a lot of code in git I consider that I "own" in
the sense that I'm responsible for creating it, have been involved in
past design discussions about it etc.

But very little of that cleanly maps to a file as the CODEOWNERS
workflow expects. E.g. there's parts of grep.c that I'd definitely like
to be CC'd on, but others not. Even a "git blame" of the specific lines
you're touching isn't always what you want.

Having used CODEOWNERS in a corporate setting I think it's most useful
for e.g. when you have a monorepo with different subdirectories that do
(at least mostly) map o different teams or peope, think drivers/usb/* or
arch/s390/*.

>> I think characterizing E-Mail as a "legacy" workflow isn't accurate. All
>
> I admit it was a deliberately provocative choice of words, well
> knowing it's not reflecting the current state, to underline how I'm
> feeling about the workflow. E-mail is great. Also plain text e-mail is
> great (I've configured all my client to only send plain text), but
> please, not for sending around code patches.
>
> If you send around code patches by mail instead of directly working on
> Git repos plus some UI, that feels to me like serializing a data class
> instance to JSON, printing the JSON string to paper, taking that sheet
> of paper to another PC with a scanner, using OCR to scan it into a
> JSON string, and then deserialize it again to a new data class
> instance, when you could have just a REST API to push the data from on
> PC to the other.

That's not inherent with the E-Mail workflow, e.g. Linus on the LKML
also pulls from remotes.

It does ensure that e.g. if someone submits patches and then deletes
their GitHub account the patches are still on the ML.

>> of these proposed alternatives involve moving away from something that's
>> a distributed system today (E-Mail infrastructure, local clients), to
>> what's essentially some website run by a centralized entity, in some
>> cases proprietary.
>
> That's a good point, I admit I haven't thought of that. Probably
> because I also don't care much. So *does* it really matter? What
> exactly concerns you about a "centralized entity"? Is it the technical
> aspect of a single point of failure, or the political / social aspect
> of being dependent on someone you do not want to get influenced by? I
> guess it's a bit of both.

To begin with if we'd have used the bugtracker solution from the
beginning we'd probably be talking about moving away from Bugzilla
now. I.e. using those things means your data becomes entangled with the
their opinionated data models.

> While these concerns could probably be addressed somewhat e.g. by
> multiple independently operated Gerrit servers that are kept in sync,
> I was curious and quickly search for more fitting "truly
> decentralized" solutions, and came across radicle [1]. Just FYI.

That's interesting, but I haven't looked into that tool. Browsing their
documentation earlier many of the links were 404s.

>> So really basic things that are comparatively trivial with E-Mail
>> (e.g. "I think the search sucks, try another client") run up against a
>> brick wall with those tools.
>
> Not necessarily. As many of these tools have (REST) APIs, also
> different API clients exist that you could try.

API use that usually (always?) requires an account/EULA with some entity
holding the data, and as a practical concern getting all the data is
usually some huge number of API requests.

>> And to e.g. as one good example to use (as is the common convention on
>> this list) git-range-diff to display a diff to the "last rebased
>> revision" would mean some long feature cycle in those tools, if they're
>> even interested in implementing such a thing at all.
>
> AFAIK Gerrit can already do that.

Sure, FWIW the point was that you needed Gerrit to implement that, and
to suggest what if they weren't interested. Would you need to maintain a
forked Gerrit?

Not to say that's a dealbreaker, just trying to bridge the understanding
of why some people prefer the E-Mail workflow.

Anyway, as before don't take any of the above as arguing, FWIW I
wouldn't mind using one of these websites overall if it helped
development velocity in the project.

Ultimately those things are up to Junio though, which these discussions
always come down to.

I just wanted to help bridge the gap between the distributed E-Mail v.s
centralized website flow.