[SECURITY] Strengthen GitHub URL validation and sanitization #12

@jonasyr

Description

The current GitHub URL validation only checks for a URL prefix (https://github.com/ or git@github.com:). It does not validate the repository path format or reject potentially malicious URLs.

Current Behavior

def is_github_repo(self):
    return (self.input_path.startswith("https://github.com/") or
            self.input_path.startswith("git@github.com:"))

Security Concerns

  • No validation of repository name format
  • Accepts any URL starting with https://github.com/
  • No protection against URL injection attacks
  • Missing support for authentication options
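To make the concerns above concrete, here is a small sketch that runs a standalone copy of the prefix check against a few malformed inputs. The first three inputs come from the test cases listed in this issue; the fourth is a hypothetical example added for illustration.

```python
def is_github_repo(input_path: str) -> bool:
    # Standalone copy of the prefix-only check quoted under "Current Behavior".
    return (input_path.startswith("https://github.com/") or
            input_path.startswith("git@github.com:"))


# None of these names a repository, yet each one passes the prefix check.
suspicious_inputs = [
    "https://github.com/",
    "https://github.com/user/",
    "https://github.com/../malicious",
    "https://github.com/user/repo; rm -rf ~",  # hypothetical injection-style input
]

accepted = [candidate for candidate in suspicious_inputs if is_github_repo(candidate)]
```

Every entry ends up in `accepted`, which is the gap the stricter validation is meant to close.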

Expected Behavior

  • Strict validation of GitHub repository URL format
  • Sanitization of repository names
  • Clear error messages for invalid URLs
  • Optional support for authentication tokens
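One possible shape for the behavior listed above, offered as a sketch rather than the project's actual code: a parser that returns the owner and repository name, or raises ValueError with a message saying what was wrong. The name parse_github_url and the message wording are assumptions; the character class is the one used in the patterns proposed in this issue. Authentication tokens are left out of this sketch.

```python
import re

# Character class taken from the patterns proposed in this issue.
_SEGMENT = r"[A-Za-z0-9._-]+"
# The lazy repo group keeps an optional ".git" suffix out of the captured name.
_HTTPS = re.compile(rf"https://github\.com/({_SEGMENT})/({_SEGMENT}?)(?:\.git)?/?")
_SSH = re.compile(rf"git@github\.com:({_SEGMENT})/({_SEGMENT}?)(?:\.git)?")


def parse_github_url(url: str) -> tuple[str, str]:
    """Return (owner, repo), or raise ValueError describing the problem.

    Hypothetical helper; not part of the existing codebase.
    """
    if not isinstance(url, str) or not url:
        raise ValueError("GitHub URL must be a non-empty string")
    # fullmatch anchors both ends, so trailing newlines or extra text are rejected.
    match = _HTTPS.fullmatch(url) or _SSH.fullmatch(url)
    if match is None:
        raise ValueError(
            f"Not a GitHub repository URL: {url!r}. Expected "
            "https://github.com/<owner>/<repo>[.git] or "
            "git@github.com:<owner>/<repo>[.git]"
        )
    owner, repo = match.groups()
    # The character class alone would accept "." and ".." as path segments.
    for label, value in (("owner", owner), ("repository", repo)):
        if value in (".", ".."):
            raise ValueError(f"Invalid {label} name {value!r} in {url!r}")
    return owner, repo
```

Returning the parsed pair instead of a bare boolean lets callers build the clone target from validated parts rather than from the raw input string.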

Implementation Requirements

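def _validate_github_url(self, url):
    """Return True if url is a well-formed GitHub repository URL."""
    import re

    # Support both HTTPS and SSH formats. The lazy repo group keeps an
    # optional ".git" suffix out of the captured repository name.
    https_pattern = r'https://github\.com/([a-zA-Z0-9._-]+)/([a-zA-Z0-9._-]+?)(?:\.git)?/?'
    ssh_pattern = r'git@github\.com:([a-zA-Z0-9._-]+)/([a-zA-Z0-9._-]+?)(?:\.git)?'

    # fullmatch anchors both ends; '$' with re.match would allow a trailing newline.
    match = re.fullmatch(https_pattern, url) or re.fullmatch(ssh_pattern, url)
    if match is None:
        return False
    # The character class alone accepts "." and ".." as path segments,
    # so https://github.com/../malicious must be rejected explicitly.
    return all(part not in ('.', '..') for part in match.groups())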

Files Affected

  • codebase_to_text/codebase_to_text.py (lines 586-588, 567-570)

Acceptance Criteria

  • Implement strict URL validation with regex patterns
  • Add support for .git suffix in URLs
  • Validate repository and username formats
  • Add comprehensive tests for valid/invalid URLs
  • Update documentation with supported URL formats
  • Add clear error messages for validation failures

Test Cases to Add

  • Valid: https://github.com/user/repo, https://github.com/user/repo.git
  • Invalid: https://github.com/, https://github.com/user/, https://github.com/../malicious
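The cases above could be written as a table-driven unittest along these lines. This is a sketch under assumptions: validate_github_url is a standalone stand-in for the method proposed in this issue, and the SSH and trailing-newline cases are extra examples that do not appear in the list above.

```python
import re
import unittest

_SEGMENT = r"[A-Za-z0-9._-]+"
_PATTERNS = (
    re.compile(rf"https://github\.com/({_SEGMENT})/({_SEGMENT}?)(?:\.git)?/?"),
    re.compile(rf"git@github\.com:({_SEGMENT})/({_SEGMENT}?)(?:\.git)?"),
)


def validate_github_url(url: str) -> bool:
    # Stand-in validator (assumption): strict full match, no "." or ".." segments.
    for pattern in _PATTERNS:
        match = pattern.fullmatch(url)
        if match is not None:
            return all(part not in (".", "..") for part in match.groups())
    return False


VALID_URLS = [
    "https://github.com/user/repo",
    "https://github.com/user/repo.git",
    "git@github.com:user/repo.git",  # extra case, not in the issue's list
]
INVALID_URLS = [
    "https://github.com/",
    "https://github.com/user/",
    "https://github.com/../malicious",
    "https://github.com/user/repo\n",  # extra case, not in the issue's list
]


class GitHubUrlValidationTests(unittest.TestCase):
    def test_valid_urls_are_accepted(self):
        for url in VALID_URLS:
            with self.subTest(url=url):
                self.assertTrue(validate_github_url(url))

    def test_invalid_urls_are_rejected(self):
        for url in INVALID_URLS:
            with self.subTest(url=url):
                self.assertFalse(validate_github_url(url))
```

Saved as a test module, this can be run with `python -m unittest`; subTest reports each failing URL separately instead of stopping at the first one.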
