Description
The current GitHub URL validation only checks for a known prefix (https://github.com/ or git@github.com:). It does not validate the repository format or reject potentially malicious URLs.
Current Behavior
    def is_github_repo(self):
        return (self.input_path.startswith("https://github.com/") or
                self.input_path.startswith("git@github.com:"))

Security Concerns
- No validation of repository name format
- Accepts any URL starting with https://github.com/
- No protection against URL injection attacks
- Missing support for authentication options
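To illustrate these concerns, here is a minimal sketch that reproduces the prefix-only check as a standalone function. The name `is_github_repo_prefix_only` and the sample inputs are hypothetical and not taken from the codebase. None of the inputs names a usable owner/repository pair, yet each one passes.

```python
def is_github_repo_prefix_only(input_path: str) -> bool:
    """Standalone copy of the prefix-only check (hypothetical name)."""
    return (input_path.startswith("https://github.com/") or
            input_path.startswith("git@github.com:"))


# Illustrative inputs: none of these is a well-formed owner/repo URL,
# but each is accepted because only the prefix is inspected.
suspicious_inputs = [
    "https://github.com/",
    "https://github.com/user/",
    "https://github.com/../malicious",
    "https://github.com/user/repo/../../other",
]

accepted = [s for s in suspicious_inputs if is_github_repo_prefix_only(s)]
print(f"{len(accepted)} of {len(suspicious_inputs)} suspicious inputs accepted")
```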
Expected Behavior
- Strict validation of GitHub repository URL format
- Sanitization of repository names
- Clear error messages for invalid URLs
- Optional support for authentication tokens
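One possible way to get clear error messages is to raise a dedicated exception whose text says which rule the URL broke. The sketch below assumes a hypothetical `InvalidGitHubURLError` class and a hypothetical `parse_github_url` helper. Neither exists in the project, and the allowed character set is the one proposed in this issue rather than GitHub's official naming rules. Token support is left out.

```python
import re
from typing import Tuple


class InvalidGitHubURLError(ValueError):
    """Hypothetical exception for rejected repository URLs."""


# Owner/repo segments use the character set proposed in this issue (an assumption).
_TAIL = r"([A-Za-z0-9._-]+)/([A-Za-z0-9._-]+?)(?:\.git)?"
_HTTPS = re.compile(r"https://github\.com/" + _TAIL + r"/?")
_SSH = re.compile(r"git@github\.com:" + _TAIL)


def parse_github_url(url: str) -> Tuple[str, str]:
    """Return (owner, repo), or raise InvalidGitHubURLError with a specific message."""
    if not url.startswith(("https://github.com/", "git@github.com:")):
        raise InvalidGitHubURLError(
            f"Not a GitHub URL: {url!r}. Expected "
            "https://github.com/<owner>/<repo> or git@github.com:<owner>/<repo>.git"
        )
    match = _HTTPS.fullmatch(url) or _SSH.fullmatch(url)
    if match is None:
        raise InvalidGitHubURLError(
            f"Malformed GitHub repository URL: {url!r}. Owner and repository "
            "may contain only letters, digits, '.', '_' and '-'."
        )
    owner, repo = match.groups()
    # The character set alone would let "." and ".." through as path segments.
    if owner in (".", "..") or repo in (".", ".."):
        raise InvalidGitHubURLError(f"Path traversal segment in GitHub URL: {url!r}")
    return owner, repo
```

A caller could catch `InvalidGitHubURLError` at the command-line boundary and print the message as-is, so the user sees which rule was violated instead of a failed clone.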
Implementation Requirements
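    def _validate_github_url(self, url):
        """Validate a GitHub repository URL (HTTPS or SSH format)."""
        import re
        # Support both HTTPS and SSH formats
        https_pattern = r'https://github\.com/([a-zA-Z0-9._-]+)/([a-zA-Z0-9._-]+?)(?:\.git)?/?'
        ssh_pattern = r'git@github\.com:([a-zA-Z0-9._-]+)/([a-zA-Z0-9._-]+?)(?:\.git)?'
        # fullmatch (unlike match with $) also rejects a trailing newline
        match = re.fullmatch(https_pattern, url) or re.fullmatch(ssh_pattern, url)
        if match is None:
            return False
        # The character class alone would accept "." and ".." as path segments
        return all(part not in ('.', '..') for part in match.groups())

Files Affected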
codebase_to_text/codebase_to_text.py (lines 586-588, 567-570)
Acceptance Criteria
- Implement strict URL validation with regex patterns
- Add support for .git suffix in URLs
- Validate repository and username formats
- Add comprehensive tests for valid/invalid URLs
- Update documentation with supported URL formats
- Add clear error messages for validation failures
Test Cases to Add
- Valid: https://github.com/user/repo, https://github.com/user/repo.git
- Invalid: https://github.com/, https://github.com/user/, https://github.com/../malicious
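The cases listed above could be driven from a table, roughly as sketched here. `is_valid_github_url` is a hypothetical stand-in validator written only so the example runs, and should be swapped for the project's real check. The SSH and trailing-newline cases are extra assumptions that are not part of the list above.

```python
import re

_SEG = r"[A-Za-z0-9._-]+"


def is_valid_github_url(url: str) -> bool:
    """Stand-in validator (hypothetical); replace with the project's own."""
    m = (re.fullmatch(rf"https://github\.com/({_SEG})/({_SEG}?)(?:\.git)?/?", url)
         or re.fullmatch(rf"git@github\.com:({_SEG})/({_SEG}?)(?:\.git)?", url))
    return m is not None and all(seg not in (".", "..") for seg in m.groups())


VALID = [
    "https://github.com/user/repo",
    "https://github.com/user/repo.git",
    "git@github.com:user/repo.git",       # assumption: SSH form is supported
]
INVALID = [
    "https://github.com/",
    "https://github.com/user/",
    "https://github.com/../malicious",
    "https://github.com/user/repo\n",     # assumption: trailing newline is rejected
]


def check_cases(validator):
    """Return the (url, expected) pairs that the validator gets wrong."""
    cases = [(u, True) for u in VALID] + [(u, False) for u in INVALID]
    return [(u, expected) for u, expected in cases if validator(u) is not expected]


failures = check_cases(is_valid_github_url)
print("all cases pass" if not failures else f"failures: {failures}")
```

Passing a different callable to `check_cases` makes it easy to confirm that the prefix-only check fails the invalid cases before the fix and that the new validator passes them afterwards.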