[PATCH] Add a source_tree field to Project.

Dirk Wallenstein halsmit at t-online.de
Fri Apr 1 22:01:36 EST 2011


On Wed, Mar 30, 2011 at 06:25:12PM -0300, Guilherme Salgado wrote:
> On Wed, 2011-03-30 at 12:10 +0800, Jeremy Kerr wrote:
> > Hi Guilherme,
> > 
> > >  apps/patchwork/models.py                      |    1 +
> > >  lib/sql/migration/008-project-source-tree.sql |    3 +++
> > >  2 files changed, 4 insertions(+), 0 deletions(-)
> > >  create mode 100644 lib/sql/migration/008-project-source-tree.sql
> > 
> > Looks good. I'd like to wait until there is a user of this field before 
> > merging the change though - or are you using this for linaro-internal things?
> 
> I'm using it in a script I'm writing that fetches the git history of every
> project and scans it for patches that have been committed. It does the
> same job as the existing patchwork-update-commits script, but it is
> fully automated, and to make it easier to test I'm experimenting
> with python-dulwich to scan the git history.
> 
> I haven't submitted it for feedback yet because it's still very much a
> work in progress, but I will certainly do so. (I'm including what I
> have so far here, though.)
> 
> Oh, there's one extra DB schema change I'm considering... That will
> allow us to keep track of the last seen commit ref so that we can scan
> the git history incrementally.  I've added it as another field to
> Project, named last_seen_commit_ref.  How does that sound to you?
> 
> 
> -- 
> Guilherme Salgado <https://launchpad.net/~salgado>

> commit dd10a245c50f651b38e7074d49413e91b2d82e14
> Author: Guilherme Salgado <guilherme.salgado at linaro.org>
> Date:   Fri Mar 25 15:59:28 2011 -0300
> 
>     Adds a script which goes through all registered projects looking for patches that have been committed already
>     
>     It does that by checking out the project's source code from its VCS of choice
>     (currently only git is supported, though), scanning the commits there and
>     comparing them to the patches in Patchwork.
> 
> diff --git a/apps/patchwork/bin/update-committed-patches.py b/apps/patchwork/bin/update-committed-patches.py
> new file mode 100755
> index 0000000..a444783
> --- /dev/null
> +++ b/apps/patchwork/bin/update-committed-patches.py
> @@ -0,0 +1,35 @@
> +#!/usr/bin/python
> +
> +import _pythonpath
> +from patchwork.models import Patch, Project, State
> +from patchwork.utils import (
> +    ensure_source_checkout_for_project, get_hashes_for_commits)
> +
> +
> +#for project in Project.objects.all():
> +for project in Project.objects.filter(linkname='linux-kernel'):
> +    if project.source_tree is None:
> +        continue
> +
> +    repo = ensure_source_checkout_for_project(project)
> +    if repo is None:
> +        print ("Skipping %s as we couldn't get a source checkout" %
> +               project.name)
> +        continue
> +
> +    hashes = get_hashes_for_commits(repo,
> +                                    stop_at=project.last_seen_commit_ref)
> +    for commit_id, patch_hash in hashes:
> +        # There may be multiple patches with the same hash. That's usually
> +        # the case when a second version of a patch series is submitted
> +        # and some of the patches in the series are identical in both
> +        # series.
> +        for patch in Patch.objects.filter(project=project, hash=patch_hash):
> +            patch.state = State.objects.get(name='Accepted')
> +            patch.commit_ref = commit_id
> +            print patch, patch.state
> +    else:
> +        print "No new %s commits to parse" % project.name
> +
> +    project.last_seen_commit_ref = repo.head()
> +    project.save()
> diff --git a/apps/patchwork/tests/__init__.py b/apps/patchwork/tests/__init__.py
> index 68fe563..e79331b 100644
> --- a/apps/patchwork/tests/__init__.py
> +++ b/apps/patchwork/tests/__init__.py
> @@ -23,3 +23,4 @@ from patchwork.tests.bundles import *
>  from patchwork.tests.mboxviews import *
>  from patchwork.tests.updates import *
>  from patchwork.tests.filters import *
> +from patchwork.tests.test_utils import *
> diff --git a/apps/patchwork/tests/test_utils.py b/apps/patchwork/tests/test_utils.py
> new file mode 100644
> index 0000000..0c60e74
> --- /dev/null
> +++ b/apps/patchwork/tests/test_utils.py
> @@ -0,0 +1,74 @@
> +
> +import tempfile
> +from time import time
> +from unittest import TestCase
> +
> +from dulwich.objects import Blob, Commit, parse_timezone, Tree
> +from dulwich.repo import Repo
> +
> +from patchwork.utils import get_hashes_for_commits
> +
> +
> +class TestGetHashesForCommits(TestCase):
> +
> +    def test_one_commit(self):
> +        repo = self.create_git_repo()
> +        commit = self.add_file_and_commit(repo, 'foo', 'Content1')
> +        # Here there are no hashes because get_hashes_for_commits() skips the
> +        # first one as it's unlikely to be of any interest to us.
> +        self.assertEqual(
> +            [], list(get_hashes_for_commits(repo, stop_at=None)))
> +
> +    def test_two_commits(self):
> +        repo = self.create_git_repo()
> +        commit = self.add_file_and_commit(repo, 'foo', 'Content1')
> +        commit2 = self.add_file_and_commit(repo, 'bar', 'Content2', commit)
> +        self.assertEqual(
> +            [(commit2.id, '5c010402c5673981ee3e1712e6a037de3ff9cae4')],
> +            list(get_hashes_for_commits(repo, stop_at=None)))
> +
> +    def test_empty_patch(self):
> +        repo = self.create_git_repo()
> +        commit = self.add_file_and_commit(repo, 'foo', 'Content1')
> +        head = self.add_file_and_commit(repo, 'bar', '', commit)
> +        self.assertEqual(
> +            [], list(get_hashes_for_commits(repo, stop_at=None)))
> +
> +    def test_stop_at(self):
> +        repo = self.create_git_repo()
> +        commit = self.add_file_and_commit(repo, 'foo', 'Content1')
> +        commit2 = self.add_file_and_commit(repo, 'bar', 'Content2', commit)
> +        commit3 = self.add_file_and_commit(repo, 'baz', 'Content3', commit2)
> +        self.assertEqual(
> +            [(commit3.id, '11d22fa0986b3bb341baa76b8a6a757a46a2f916')],
> +            list(get_hashes_for_commits(repo, stop_at=commit2.id)))
> +
> +    def create_git_repo(self):
> +        tmpdir = tempfile.mkdtemp()
> +        repo = Repo.init(tmpdir)
> +        return repo
> +
> +    def add_file_and_commit(self, repo, filename, data, parent=None):
> +        blob = Blob.from_string(data)
> +        parents = []
> +        tree = Tree()
> +        if parent is not None:
> +            tree = repo[parent.tree]
> +            parents = [parent.id]
> +        tree.add(0100644, filename, blob.id)
> +        commit = Commit()
> +        commit.tree = tree.id
> +        author = 'You <you at example.com>'
> +        commit.author = commit.committer = author
> +        commit.commit_time = commit.author_time = int(time())
> +        tz = parse_timezone('-0200')[0]
> +        commit.commit_timezone = commit.author_timezone = tz
> +        commit.encoding = "UTF-8"
> +        commit.message = "A commit"
> +        commit.parents = parents
> +        object_store = repo.object_store
> +        object_store.add_object(blob)
> +        object_store.add_object(tree)
> +        object_store.add_object(commit)
> +        repo.refs['refs/heads/master'] = commit.id
> +        return commit
> diff --git a/apps/patchwork/utils.py b/apps/patchwork/utils.py
> index e41ffb6..353147e 100644
> --- a/apps/patchwork/utils.py
> +++ b/apps/patchwork/utils.py
> @@ -17,8 +17,15 @@
>  # along with Patchwork; if not, write to the Free Software
>  # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>  
> +import os
> +from StringIO import StringIO
>  
> -from patchwork.models import Bundle, Project, BundlePatch
> +from dulwich.client import get_transport_and_path
> +from dulwich.patch import write_tree_diff
> +from dulwich.repo import Repo
> +
> +from patchwork.parser import hash_patch, parse_patch
> +from patchwork.models import Bundle, BundlePatch
>  from django.shortcuts import get_object_or_404
>  
>  def get_patch_ids(d, prefix = 'patch_id'):
> @@ -137,3 +144,64 @@ def set_bundle(user, project, action, data, patches, context):
>      bundle.save()
>  
>      return []
> +
> +
> +def ensure_source_checkout_for_project(project):
> +    forest = '/home/salgado/src' # This is where we store the trees we checkout
> +    root = os.path.join(forest, project.linkname)
> +    if not os.path.exists(root):
> +        repo = Repo.init(root, mkdir=True)
> +    else:
> +        repo = Repo(root)
> +
> +    transport, path = get_transport_and_path(project.source_tree)
> +    refs = transport.fetch(path, repo)
> +    # XXX: Is this the appropriate thing to do?  will there always be a master
> +    # branch?
> +    repo.refs['refs/heads/master'] = refs['HEAD']
> +    return repo
> +
> +
> +def get_hashes_for_commits(repo, stop_at):
> +    # We don't care about the first commit, but if needed it should be
> +    # possible to diff it against an empty tree and yield its hash as well.
> +    commit = repo['HEAD']
> +
> +    while len(commit.parents) > 0:
> +        commit_id = commit.id
> +        if commit_id == stop_at:
> +            break
> +
> +        parent = repo[commit.parents[0]]

What about the other parents of a merge commit?  Following only
commit.parents[0] skips every commit that came in through a merged
branch.  I would say each merged branch has to be inspected back to the
merge-base of the merge's parents; maybe recursion works here.  I forgot
to mention GitPython in the other mail.  It has a wrapper that lets you
run git commands using Python syntax, which can make this kind of thing
easier.

https://github.com/gitpython-developers/GitPython
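
A minimal sketch of walking every parent rather than only the first one.
It is not tied to dulwich or GitPython: the history is modelled as a plain
dict mapping a commit id to its list of parent ids, and the names
walk_all_parents, parents_of and graph are made up for illustration.
Commits reachable from stop_at are treated as already processed, which is
roughly what "git rev-list HEAD ^stop_at" would list.

```python
from collections import deque


def walk_all_parents(parents_of, head, stop_at=None):
    """Yield each commit id reachable from head but not from stop_at.

    parents_of is a hypothetical mapping {commit_id: [parent_id, ...]}.
    """
    seen = set()

    # Mark stop_at and all of its ancestors as already handled, so that a
    # side branch which forked before stop_at does not drag us back
    # through old history.
    if stop_at is not None:
        pending = [stop_at]
        while pending:
            commit_id = pending.pop()
            if commit_id not in seen:
                seen.add(commit_id)
                pending.extend(parents_of.get(commit_id, []))

    # Breadth-first walk that follows *all* parents of every commit.
    queue = deque([head])
    while queue:
        commit_id = queue.popleft()
        if commit_id in seen:
            continue
        seen.add(commit_id)
        yield commit_id
        queue.extend(parents_of.get(commit_id, []))


# Toy history: a <- b <- c on the mainline, x branched from b,
# and m merges c and x.
graph = {
    'a': [],
    'b': ['a'],
    'c': ['b'],
    'x': ['b'],
    'm': ['c', 'x'],
}

new_commits = list(walk_all_parents(graph, 'm', stop_at='c'))
```

With stop_at='c' the walk still reaches 'x', which a first-parent-only
loop would never visit.  Marking every ancestor of stop_at costs a full
walk of the old history; a real implementation would probably want
something cheaper.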

> +        diff = StringIO()
> +        # In the case of merges, this won't have the same behavior as 'git
> +        # show', which seems to omit files not changed since any of the
> +        # parents (thanks to 'git diff-tree --cc'), but I think this is not a
> +        # big deal as patches in Patchwork would never be identical to the
> +        # diff of a merge anyway, would they?
> +        try:
> +            write_tree_diff(
> +                diff, repo.object_store, parent.tree, commit.tree)
> +        except KeyError, e:
> +            # XXX: This happens in qemu because there's a commit that is
> +            # actually on a submodule
> +            # (8b06c62ae48b67b320f7420dcd4854c5559e1532) and the old commit_id
> +            # used for the submodule
> +            # (06d0bdd9e2e20377b3180e4986b14c8549b393e4) is gone (possibly
> +            # because of a rebase?), so dulwich crashes with a KeyError.
> +            raise
> +
> +        commit = parent
> +        diff.seek(0)
> +        try:
> +            patch, _ = parse_patch(diff.read().decode('utf-8'))
> +        except UnicodeDecodeError:
> +            # TODO: Need to find out why this happens and see if we really
> +            # need to skip such commits.
> +            print "Skipping %s" % commit.id
> +            continue
> +
> +        # When commits just add files or change permissions the diff will be
> +        # empty and thus parse_patch() will return None.
> +        if patch is not None:
> +            yield commit_id, hash_patch(patch).hexdigest()

-- 
Cheers,
  Dirk

