[SECURITY] Add path traversal protection to prevent malicious file access #10

@jonasyr

Description

The current file processing code does not validate file paths, which leaves it vulnerable to directory traversal attacks. A malicious repository could include entries with paths such as ../../../etc/passwd, causing the tool to read files outside the intended base directory.

Current Behavior

  • Files are processed without path validation
  • os.path.join() and os.path.relpath() are used without security checks
  • Symlinks and relative paths are not sanitized
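
To make the risk concrete, here is a minimal sketch of how os.path.join() behaves with untrusted input. The base directory is a hypothetical placeholder, and the results in the comments assume a POSIX system.

```python
import os

base_dir = "/srv/repos/project"  # hypothetical base directory

# ".." segments in the untrusted part walk back out of base_dir.
escaped = os.path.normpath(os.path.join(base_dir, "../../../etc/passwd"))
print(escaped)  # /etc/passwd on POSIX

# An absolute second argument makes join() discard base_dir entirely.
absolute = os.path.join(base_dir, "/etc/passwd")
print(absolute)  # /etc/passwd on POSIX
```

Neither call raises an error, so any protection has to come from an explicit check after the path is built.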

Expected Behavior

  • All file paths should be validated to ensure they stay within the base directory
  • Symlinks pointing outside the base directory should be rejected
  • Potentially unsafe paths should be logged clearly when they are encountered
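
One way these three expectations could fit together is sketched below. The function name `is_inside_base` and the logger name are assumptions made for illustration, not part of the existing code.

```python
import logging
import os

logger = logging.getLogger("path_safety")  # hypothetical logger name


def is_inside_base(path, base_path):
    """Return True if path, after resolving symlinks, lies within base_path."""
    real_path = os.path.realpath(path)
    real_base = os.path.realpath(base_path)
    try:
        inside = os.path.commonpath([real_path, real_base]) == real_base
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        inside = False
    if not inside:
        logger.warning(
            "Rejected unsafe path %r (resolves to %r, outside %r)",
            path, real_path, real_base,
        )
    return inside
```

Because os.path.realpath() follows symlinks before the comparison, a link stored inside the base directory that points elsewhere is judged by its target rather than by its own location.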

Files Affected

  • codebase_to_text/codebase_to_text.py (lines 366-399, 420-450)

Implementation Suggestions

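import os

def _validate_file_path(self, file_path, base_path):
    """Return True if file_path resolves to a location inside base_path."""
    try:
        # realpath() resolves symlinks as well as ".." segments; abspath()
        # alone would let a symlink inside base_path point outside it.
        real_file = os.path.realpath(file_path)
        real_base = os.path.realpath(base_path)
        return os.path.commonpath([real_file, real_base]) == real_base
    except (ValueError, OSError):
        # e.g. paths on different drives (Windows) or paths that cannot be resolved
        return False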
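
The os.path.commonpath() comparison in this suggestion is worth keeping in preference to a plain string prefix check. The sketch below illustrates why, using a hypothetical standalone helper `is_within` and made-up directory names. It uses purely lexical paths so that no files need to exist; results in the comments assume POSIX.

```python
import os


def is_within(file_path, base_path):
    """Lexical containment check based on whole path components."""
    abs_file = os.path.abspath(file_path)
    abs_base = os.path.abspath(base_path)
    try:
        return os.path.commonpath([abs_file, abs_base]) == abs_base
    except ValueError:
        return False


base = "/srv/repos/project"                     # hypothetical base directory
sibling = "/srv/repos/project-evil/secret.txt"  # shares a string prefix only

# A naive prefix test is fooled by the sibling directory name.
naive = os.path.abspath(sibling).startswith(os.path.abspath(base))
# commonpath() compares whole components, so the sibling is rejected.
strict = is_within(sibling, base)

print(naive, strict)  # True False on POSIX
```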

Acceptance Criteria

  • Add path validation function that prevents directory traversal
  • Integrate validation into _process_single_file method
  • Add verbose logging for rejected paths
  • Write tests for malicious path attempts
  • Document security considerations in README
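
As a starting point for the test criterion, the following unittest sketch covers the most common traversal attempts. `is_safe_path` is a hypothetical stand-in for whatever validator is eventually added, and the symlink test assumes a platform where os.symlink() is permitted.

```python
import os
import tempfile
import unittest


def is_safe_path(file_path, base_path):
    """Stand-in validator (hypothetical name): resolve symlinks, then compare."""
    real_file = os.path.realpath(file_path)
    real_base = os.path.realpath(base_path)
    try:
        return os.path.commonpath([real_file, real_base]) == real_base
    except ValueError:
        return False


class TraversalTests(unittest.TestCase):
    def setUp(self):
        # Fresh temporary tree per test: <tmp>/repo is the base directory.
        self.tmp = tempfile.TemporaryDirectory()
        self.base = os.path.join(self.tmp.name, "repo")
        os.mkdir(self.base)

    def tearDown(self):
        self.tmp.cleanup()

    def test_accepts_file_inside_base(self):
        path = os.path.join(self.base, "src", "main.py")
        self.assertTrue(is_safe_path(path, self.base))

    def test_rejects_dotdot_segments(self):
        path = os.path.join(self.base, "..", "..", "etc", "passwd")
        self.assertFalse(is_safe_path(path, self.base))

    def test_rejects_absolute_path(self):
        self.assertFalse(is_safe_path("/etc/passwd", self.base))

    def test_rejects_symlink_escaping_base(self):
        outside = os.path.join(self.tmp.name, "secret.txt")
        with open(outside, "w") as fh:
            fh.write("secret")
        link = os.path.join(self.base, "link.txt")
        os.symlink(outside, link)
        self.assertFalse(is_safe_path(link, self.base))
```

Saved as a test module, this can be run with `python -m unittest`.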

Definition of Done

  • Code passes security review
  • Tests demonstrate protection against common traversal attacks
  • No existing functionality is broken
  • Performance impact is minimal (<5% overhead)

Metadata

Labels

bug, security
