[PATCH] Improve pull request URL matching regex

Konstantin Ryabitsev konstantin at linuxfoundation.org
Tue Nov 12 09:27:41 AEDT 2019


The existing regex missed several important cases, such as:

- tag/branch info wrapping to the next line, e.g.:

----
are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/
tags/v5.4-next-soc

----
(see example: https://patchwork.kernel.org/patch/11236893/)

- tag/branch info being wrapped to the next line with a backslash, e.g.:

----
are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/ \
  tags/v5.4-next-soc

----
(no example, but I've seen this before)

The proposed change handles both of these edge cases and collapses any
whitespace inside the returned URL to single spaces.

Signed-off-by: Konstantin Ryabitsev <konstantin at linuxfoundation.org>
---
 patchwork/parser.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
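
For illustration only, here is a rough standalone sketch of how the old and
new expressions behave on the two layouts described above. The two regexes
and the whitespace normalisation are copied from the diff below; everything
else (the names OLD_RE, NEW_RE and extract, the message bodies and the
<sha>/<subject> placeholders) is made up for this example and is not part
of patchwork.

```python
import re

FLAGS = re.DOTALL | re.MULTILINE | re.IGNORECASE
PREFIX = (r'^The following changes since commit.*'
          r'^are available in the git repository at:\n')

# Old and new URL patterns, copied from the diff below.
OLD_RE = re.compile(PREFIX + r'^\s*([\S]+://[^\n]+)$', FLAGS)
NEW_RE = re.compile(
    PREFIX + r'^\s*([\w+-]+(?:://|@)[\w/.@:~-]+[\s\\]*[\w/._-]*)\s*$',
    FLAGS)


def extract(regex, content):
    """Hypothetical helper: return the normalised match, or None."""
    match = regex.search(content)
    if match:
        # Same normalisation as the patched parse_pull_request().
        return re.sub(r'\s+', ' ', match.group(1)).strip()
    return None


URL = 'https://git.kernel.org/pub/scm/linux/kernel/git/matthias.bgg/linux.git/'
TAG = 'tags/v5.4-next-soc'

# Made-up pull request skeleton; <sha> and <subject> are placeholders.
HEADER = ('The following changes since commit <sha>:\n'
          '\n'
          '  <subject>\n'
          '\n'
          'are available in the Git repository at:\n'
          '\n')
FOOTER = '\n\nfor you to fetch changes up to <sha>:\n'

# Case 1: the tag wraps onto the next line.
wrapped = HEADER + '  ' + URL + '\n' + TAG + FOOTER
# Case 2: the tag is continued with a trailing backslash.
backslash = HEADER + '  ' + URL + ' \\\n  ' + TAG + FOOTER

print(repr(extract(OLD_RE, wrapped)))    # old regex: tag is lost
print(repr(extract(NEW_RE, wrapped)))    # new regex: URL and tag
print(repr(extract(NEW_RE, backslash)))  # new regex: URL, backslash, tag
```

In this sketch the old expression returns only the URL for the wrapped
case, while the new one returns the URL and the tag separated by a single
space. For the backslash case the continuation backslash itself appears to
stay in the returned string (URL, space, backslash, space, tag), since only
whitespace is collapsed.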

diff --git a/patchwork/parser.py b/patchwork/parser.py
index c794f09..d25c0df 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -939,11 +939,11 @@ def parse_patch(content):
 def parse_pull_request(content):
     git_re = re.compile(r'^The following changes since commit.*'
                         r'^are available in the git repository at:\n'
-                        r'^\s*([\S]+://[^\n]+)$',
+                        r'^\s*([\w+-]+(?:://|@)[\w/.@:~-]+[\s\\]*[\w/._-]*)\s*$',
                         re.DOTALL | re.MULTILINE | re.IGNORECASE)
     match = git_re.search(content)
     if match:
-        return match.group(1)
+        return re.sub(r'\s+', ' ', match.group(1)).strip()
     return None
 
 

base-commit: 239fbd2ca1bf140bc61fdee922944624b23c812c
-- 
2.23.0


