qualcomm · gidonstudinski-qc · Apr 13, 2026
diff --git a/REWRITE-README.md b/REWRITE-README.md
@@ -0,0 +1,140 @@
+# Git History Rewrite for Open-Sourcing
+
+This directory contains scripts to rewrite git history for open-sourcing the h2 repository.
+
+## What It Does
+
+The rewrite process performs the following transformations on **all commits** in the repository history:
+
+1. **Adds BSD-3-Clause-Clear copyright headers** to source files with recognized extensions (.c, .h, .cpp, .hpp, .cc, .S, .s, .py, .sh, .pl, .java, .js), stripping older C-style copyright blocks first
+2. **Normalizes email addresses** to @qti.qualcomm.com
+3. **Removes internal references** from commit messages (github.qualcomm.com, Q6Auto, JIRA)
+4. **Adds Signed-off-by lines** to all commit messages
+5. **Sets committer = author** for all commits
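
Items 4 and 5 are handled by commit-callback.py, whose source is not shown here. The sketch below is a hypothetical illustration only: `FakeCommit` stands in for git-filter-repo's commit object (whose `author_name`, `author_email`, `committer_name`, `committer_email` and `message` fields are bytes), and the trailer placement and the choice to copy only name and email (not dates) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class FakeCommit:
    """Hypothetical stand-in for git-filter-repo's Commit (all fields bytes)."""
    author_name: bytes
    author_email: bytes
    committer_name: bytes
    committer_email: bytes
    message: bytes


def fix_commit(commit):
    """Set committer = author and append a Signed-off-by trailer once."""
    # Assumption: only name and email are copied; dates are left alone
    commit.committer_name = commit.author_name
    commit.committer_email = commit.author_email

    signoff = b"Signed-off-by: %s <%s>" % (commit.author_name, commit.author_email)
    if signoff not in commit.message:
        body = commit.message.rstrip(b"\n")
        # Blank line before the trailer unless the message already ends
        # with a Signed-off-by block
        last_line = body.rsplit(b"\n", 1)[-1]
        sep = b"\n" if last_line.startswith(b"Signed-off-by:") else b"\n\n"
        commit.message = body + sep + signoff + b"\n"


c = FakeCommit(b"Jane Doe", b"jdoe@qti.qualcomm.com",
               b"Build Bot", b"bot@example.com", b"Fix timer init\n")
fix_commit(c)
print(c.message.decode())
```

Checking for the trailer before appending keeps the callback idempotent, so running the rewrite twice does not stack duplicate Signed-off-by lines.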
+
+## Files
+
+- **rewrite-history.sh** - Master script that orchestrates the entire rewrite
+- **add-copyright-file-callback.py** - Adds copyright headers to source files
+- **email-fixes.py** - Normalizes email addresses
+- **commit-callback.py** - Fixes author/committer names and adds Signed-off-by
+- **sanitize-commit-messages.py** - Removes internal references from commit messages
+- **git-filter-repo** - The git-filter-repo tool
+
+## Usage
+
+### For a Single Branch
+
+```bash
+# 1. Clone the repository (or checkout the branch you want to rewrite)
+git clone <repo-url> h2-rewrite
+cd h2-rewrite
+
+# 2. Copy all the rewrite scripts to the repository root
+cp /path/to/scripts/* .
+
+# 3. Run the rewrite script
+./rewrite-history.sh
+
+# Or, to skip the confirmation prompt:
+./rewrite-history.sh --force
+```
+
+### For Multiple Branches
+
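+git-filter-repo rewrites every ref it finds, so a single run on a fresh clone already covers all branches (Method 2 below). Method 1 instead produces a separate rewritten clone per branch: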
+
+```bash
+# Method 1: Rewrite each branch in a separate clone
+for branch in work develop feature-x; do
+    echo "Processing branch: $branch"
+    git clone <repo-url> "h2-$branch"
+    # Stop if the clone failed, rather than rewriting whatever repo we are in
+    cd "h2-$branch" || exit 1
+    git checkout "$branch"
+    cp /path/to/scripts/* .
+    ./rewrite-history.sh --force
+    cd ..
+done
+
+# Method 2: Rewrite all branches in one go (advanced)
+# This rewrites ALL branches at once since git-filter-repo processes all refs
+git clone <repo-url> h2-all-branches
+cd h2-all-branches
+cp /path/to/scripts/* .
+./rewrite-history.sh --force
+# All branches will be rewritten
+```
+
+## Important Notes
+
+### Before Running
+
+1. **Make a backup!** This operation rewrites git history and cannot be easily undone
+2. **Use a fresh clone** - Don't run this on your working repository
+3. **Ensure all required files are present** - The script will check for this
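
A pre-flight check of the kind rewrite-history.sh performs might look like the following. This is a hypothetical Python sketch, not the script's actual logic: `REQUIRED_FILES` is copied from the Files list in this README, and `missing_files` and `is_repo_root` are illustrative helper names.

```python
import os
import subprocess

# Taken from the README's Files list; the real script's list may differ
REQUIRED_FILES = [
    "rewrite-history.sh",
    "add-copyright-file-callback.py",
    "email-fixes.py",
    "commit-callback.py",
    "sanitize-commit-messages.py",
    "git-filter-repo",
]


def missing_files(directory, required=REQUIRED_FILES):
    """Return the required files that are not present in `directory`."""
    return [name for name in required
            if not os.path.isfile(os.path.join(directory, name))]


def is_repo_root(directory):
    """True if `directory` is the top level of a git work tree."""
    try:
        top = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=directory, capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git missing, directory missing, or not inside a repository
        return False
    return os.path.realpath(top) == os.path.realpath(directory)
```

Comparing against `git rev-parse --show-toplevel` rejects subdirectories of a repository as well as non-repositories, matching the "root of a git repository" requirement under Troubleshooting.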
+
+### After Running
+
+1. The `origin` remote will be removed (this is normal for git-filter-repo)
+2. You'll need to add a new remote and force-push:
+   ```bash
+   git remote add new-origin <new-repo-url>
+   git push new-origin --all --force
+   git push new-origin --tags --force
+   ```
+
+### Expected Results
+
+- **Commit count**: May be slightly less than original (4-5 commits typically lost due to phantom references)
+- **Copyright headers**: Present in all source files throughout entire history
+- **Internal references**: Completely removed from commit messages
+- **Email addresses**: All normalized to @qti.qualcomm.com
+
+## Validation
+
+The script automatically validates the rewrite and reports:
+- ✓ Copyright headers present
+- ✓ No internal references found
+- ✓ Number of unique committers
+
+You can also manually check:
+
+```bash
+# Check copyright in a file
+git show HEAD:path/to/file.c | head -10
+
+# Check for internal references (%B = full message, not just the subject)
+git log --all --format='%B' | grep -iF 'github.qualcomm.com'
+
+# List all committers
+git log --all --format='%cn <%ce>' | sort -u
+```
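
The manual checks can also be scripted. A minimal sketch, assuming the text passed in comes from `git log --all --format=%B` and `git log --all --format='%cn <%ce>'`; the helper names are hypothetical, and only the two literal markers named in this README are checked (JIRA references are left out because their exact form is not specified here). `has_header` mirrors the 500-byte SPDX test in add-copyright-file-callback.py.

```python
import re

# Literal internal markers named in this README (not the sanitizer's full list)
INTERNAL_PATTERNS = [
    re.compile(r"github\.qualcomm\.com", re.IGNORECASE),
    re.compile(r"\bQ6Auto\b", re.IGNORECASE),
]

HEADER_MARKER = b"SPDX-License-Identifier: BSD-3-Clause-Clear"


def find_internal_refs(log_text):
    """Return message lines that still contain an internal marker."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in INTERNAL_PATTERNS)]


def unique_committers(log_text):
    """Count distinct 'Name <email>' lines in committer log output."""
    return len({line.strip() for line in log_text.splitlines() if line.strip()})


def has_header(content, window=500):
    """True if the SPDX marker appears within the first `window` bytes."""
    return HEADER_MARKER in content[:window]
```

To run it against a repository, pass the `stdout` of `subprocess.run(["git", "log", "--all", "--format=%B"], capture_output=True, text=True)` to `find_internal_refs`, and the contents of `git show HEAD:path/to/file.c` (as bytes) to `has_header`.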
+
+## Troubleshooting
+
+### "Not in a git repository"
+Make sure you're in the root of a git repository.
+
+### "Required file not found"
+Ensure all script files are in the current directory.
+
+### "origin remote removed"
+This is expected. Add a new remote to push to the new repository.
+
+### Commit count decreased
+This is normal. A few commits (typically 4-5) are filtered out because they are phantom references to non-existent commits in merge messages.
+
+## Technical Details
+
+The rewrite uses `git-filter-repo` with multiple callbacks:
+
+1. **file-info-callback**: Modifies file contents to add copyright headers
+2. **email-callback**: Normalizes email addresses
+3. **commit-callback**: Fixes names and adds Signed-off-by
+4. **message-callback**: Sanitizes commit messages
+
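+git-filter-repo invokes these callbacks while streaming the entire history (the file-info callback for each file added or modified in a commit, the email and message callbacks for each author/committer email and each message, and the commit callback for each commit), so the transformations are applied consistently throughout.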
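
The callback scripts are written as bare function bodies (note the top-level `return` in add-copyright-file-callback.py). Below is a simplified model of how git-filter-repo turns such a body into a function, assuming it indents each body line by two spaces under a generated `def`; `make_callback` is a hypothetical name, not the tool's API. It also shows why add-copyright-file-callback.py spells its copyright text with an explicit `\n`: a triple-quoted literal spanning lines picks up the added indentation.

```python
def make_callback(arg_names, body):
    """Wrap a callback body in a generated function (simplified model)."""
    # Indent every body line by two spaces under a generated def, which is
    # what makes a top-level `return` in the body legal
    source = "def callback({}):\n".format(", ".join(arg_names))
    source += "".join("  " + line + "\n" for line in body.splitlines())
    namespace = {}
    exec(source, namespace)
    return namespace["callback"]


# A file-info style body with a toy transformation (uppercase the path)
file_info_body = (
    "if mode == b'120000':\n"
    "    return (filename, mode, blob_id)\n"
    "return (filename.upper(), mode, blob_id)\n"
)
file_info_cb = make_callback(["filename", "mode", "blob_id", "value"], file_info_body)

# A multi-line literal: its second line gets the two-space indent too
literal_body = 'TEXT = b"""line one\nline two"""\nreturn TEXT'
literal_cb = make_callback(["blob"], literal_body)

print(file_info_cb(b"src/timer.c", b"100644", b"abc", None))
print(literal_cb(None))
```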
+
+## Copyright
+
+All scripts include the BSD-3-Clause-Clear copyright header that will be added to source files.
diff --git a/add-copyright-blob-callback.py b/add-copyright-blob-callback.py
@@ -0,0 +1,77 @@
+#!/usr/bin/env python3
+"""
+Blob callback for git-filter-repo to add copyright headers to source files.
+This modifies file contents in git history to add BSD-3-Clause-Clear headers.
+"""
+
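+# Copyright text. Use an explicit newline: git-filter-repo indents every line
+# of the callback body, which would also indent the second line of a
+# multi-line string literal.
+COPYRIGHT_TEXT = b"Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.\nSPDX-License-Identifier: BSD-3-Clause-Clear"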
+
+# File extensions and their comment styles (as bytes)
+COMMENT_STYLES = {
+    b'.c': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.h': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.cpp': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.hpp': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.cc': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.S': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.s': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.py': (b'# ', b'\n# ', b'\n\n'),
+    b'.sh': (b'# ', b'\n# ', b'\n\n'),
+    b'.pl': (b'# ', b'\n# ', b'\n\n'),
+    b'.java': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.js': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+}
+
+def has_copyright(content):
+    """Check if file already has a copyright header."""
+    first_part = content[:500].lower()
+    return b'copyright' in first_part or b'spdx-license-identifier' in first_part
+
+def get_file_extension(filename):
+    """Get file extension as bytes."""
+    if b'.' not in filename:
+        return None
+    return b'.' + filename.rsplit(b'.', 1)[1]
+
+def add_copyright_to_blob(blob):
+    """Add copyright header to blob content."""
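+    # Get filename from blob. git-filter-repo Blob objects carry content only,
+    # not a path, so this is normally b'' and the blob is left unchanged;
+    # add-copyright-file-callback.py does the per-file work instead.
+    filename = getattr(blob, 'filename', b'')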
+
+    # Get file extension
+    ext = get_file_extension(filename)
+    if ext not in COMMENT_STYLES:
+        return  # Not a file type we handle
+
+    # Get original content
+    original_data = blob.data
+
+    # Check if already has copyright
+    if has_copyright(original_data):
+        return  # Already has copyright
+
+    # Get comment style
+    start, middle, end = COMMENT_STYLES[ext]
+
+    # Handle shebang for scripts
+    shebang = b""
+    content = original_data
+    if content.startswith(b'#!'):
+        lines = content.split(b'\n', 1)
+        shebang = lines[0] + b'\n'
+        content = lines[1] if len(lines) > 1 else b""
+
+    # Create copyright header
+    copyright_lines = COPYRIGHT_TEXT.split(b'\n')
+    header = start + middle.join(copyright_lines) + end
+
+    # Combine: shebang + copyright + original content
+    new_data = shebang + header + content
+
+    # Update blob data
+    blob.data = new_data
+
+# git-filter-repo wraps this file's contents in a generated callback function
+# and passes the current blob in as `blob`; apply the helper to it here.
+add_copyright_to_blob(blob)
diff --git a/add-copyright-file-callback.py b/add-copyright-file-callback.py
@@ -0,0 +1,101 @@
+#!/usr/bin/env python3
+"""
+File-info callback for git-filter-repo to add copyright headers to source files.
+This modifies file contents in git history to add BSD-3-Clause-Clear headers.
+"""
+
+import re
+
+# Copyright text (use explicit newline to avoid indentation issues)
+COPYRIGHT_TEXT = b"Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.\nSPDX-License-Identifier: BSD-3-Clause-Clear"
+
+# File extensions and their comment styles (as bytes)
+COMMENT_STYLES = {
+    b'.c': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.h': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.cpp': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.hpp': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.cc': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.S': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.s': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.py': (b'# ', b'\n# ', b'\n\n'),
+    b'.sh': (b'# ', b'\n# ', b'\n\n'),
+    b'.pl': (b'# ', b'\n# ', b'\n\n'),
+    b'.java': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+    b'.js': (b'/*\n * ', b'\n * ', b'\n */\n\n'),
+}
+
+def has_new_copyright(content):
+    """Check if file already has the NEW copyright header."""
+    first_part = content[:500]
+    return b'SPDX-License-Identifier: BSD-3-Clause-Clear' in first_part
+
+def remove_old_copyright(content):
+    """Remove old copyright headers from content."""
+    # `re` is already imported at the top of this callback body.
+    original = content
+
+    # Pattern 1: Old Qualcomm copyright blocks with ====== borders
+    # These typically start with /*====== and end with ======*/
+    def drop_if_copyright(match):
+        # Drop only bordered blocks that mention a copyright or Qualcomm, so
+        # ordinary /*=== Section ===*/ banners elsewhere in the file survive
+        block = match.group(0).lower()
+        if b'copyright' in block or b'qualcomm' in block:
+            return b''
+        return match.group(0)
+
+    # Match from /*====== to the closing ======*/
+    pattern1 = rb'/\*={5,}.*?={5,}\*/'
+    content = re.sub(pattern1, drop_if_copyright, content, flags=re.DOTALL)
+
+    # Pattern 2: Simple copyright lines like "Copyright (c) 2013 by Qualcomm..."
+    # Remove standalone copyright comments
+    pattern2 = rb'/\*\s*Copyright \(c\).*?\*/'
+    content = re.sub(pattern2, b'', content, flags=re.DOTALL)
+
+    # Collapse runs of blank lines only if something was removed, so files
+    # without an old header keep their original spacing
+    if content != original:
+        content = re.sub(rb'\n\n\n+', b'\n\n', content)
+
+    # Remove leading blank lines
+    content = content.lstrip(b'\n')
+
+    return content
+
+def get_file_extension(filename):
+    """Get file extension as bytes."""
+    if b'.' not in filename:
+        return None
+    return b'.' + filename.rsplit(b'.', 1)[1]
+
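+# Skip symbolic links (mode 120000) and submodule gitlinks (mode 160000);
+# neither refers to a regular file whose contents we should rewrite
+if mode in (b'120000', b'160000'):
+    return (filename, mode, blob_id)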
+
+# Get file extension
+ext = get_file_extension(filename)
+
+# Only process files with known extensions
+if ext in COMMENT_STYLES:
+    # Get original content
+    original_data = value.get_contents_by_identifier(blob_id)
+
+    # Check if already has the NEW copyright
+    if not has_new_copyright(original_data):
+        # Remove any old copyright headers first
+        content = remove_old_copyright(original_data)
+
+        # Get comment style
+        start, middle, end = COMMENT_STYLES[ext]
+
+        # Handle shebang for scripts
+        shebang = b""
+        if content.startswith(b'#!'):
+            lines = content.split(b'\n', 1)
+            shebang = lines[0] + b'\n'
+            content = lines[1] if len(lines) > 1 else b""
+
+        # Create copyright header
+        copyright_lines = COPYRIGHT_TEXT.split(b'\n')
+        header = start + middle.join(copyright_lines) + end
+
+        # Combine: shebang + copyright + original content
+        new_data = shebang + header + content
+
+        # Insert new blob and get new blob_id
+        blob_id = value.insert_file_with_contents(new_data)
+
+# Return the (possibly modified) file info
+return (filename, mode, blob_id)