[REFACTOR] Break down CodebaseToText god class into focused components #13

@jonasyr

Description

The CodebaseToText class has grown to 600+ lines and handles multiple responsibilities (file traversal, pattern matching, content processing, output generation). This violates the Single Responsibility Principle and makes the code hard to maintain and test.

Current Issues

  • Single class with 25+ methods
  • Mixed concerns: file I/O, pattern matching, formatting, CLI handling
  • Difficult to unit test individual components
  • Hard to extend with new output formats or processing modes

Proposed Architecture

CodebaseToText (orchestrator)
├── FileTreeGenerator (directory traversal and tree generation)
├── PatternMatcher (exclusion pattern logic)
├── ContentProcessor (file reading and processing)
├── OutputFormatter (text/docx generation)
└── GitHubHandler (repository cloning and cleanup)
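A minimal sketch of how the orchestrator could compose these components, assuming constructor injection. Every method name here (`is_excluded`, `iter_files`, `read`, `to_text`, `get_text`) and the suffix-based exclusion rule are hypothetical placeholders, not the current API. GitHubHandler is left out to keep the example short.

```python
from pathlib import Path


class PatternMatcher:
    """Exclusion logic only (hypothetical interface)."""

    def __init__(self, excluded_suffixes=()):
        self.excluded_suffixes = tuple(excluded_suffixes)

    def is_excluded(self, path: Path) -> bool:
        return path.suffix in self.excluded_suffixes


class FileTreeGenerator:
    """Directory traversal only; asks the matcher what to skip."""

    def __init__(self, matcher: PatternMatcher):
        self.matcher = matcher

    def iter_files(self, root: Path):
        for path in sorted(root.rglob("*")):
            if path.is_file() and not self.matcher.is_excluded(path):
                yield path


class ContentProcessor:
    """File reading only."""

    def read(self, path: Path) -> str:
        return path.read_text(encoding="utf-8", errors="replace")


class OutputFormatter:
    """Output generation only (plain text in this sketch)."""

    def to_text(self, sections) -> str:
        return "\n\n".join(f"{name}\n{body}" for name, body in sections)


class CodebaseToText:
    """Orchestrator: wires the components together, holds no logic of its own."""

    def __init__(self, root, matcher=None, tree=None, processor=None, formatter=None):
        self.root = Path(root)
        self.matcher = matcher or PatternMatcher()
        self.tree = tree or FileTreeGenerator(self.matcher)
        self.processor = processor or ContentProcessor()
        self.formatter = formatter or OutputFormatter()

    def get_text(self) -> str:
        sections = [
            (path.relative_to(self.root).as_posix(), self.processor.read(path))
            for path in self.tree.iter_files(self.root)
        ]
        return self.formatter.to_text(sections)
```

Because each collaborator is passed in, a test can swap any one of them for a stub without touching the file system or the other components.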

Implementation Plan

  1. Extract PatternMatcher class first (lowest risk)
  2. Extract OutputFormatter for text/docx generation
  3. Extract FileTreeGenerator for directory operations
  4. Extract GitHubHandler for Git operations
  5. Refactor main class to use composition
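As a rough idea of what step 1 might produce, here is a standalone PatternMatcher built on the standard library's `fnmatch`. The pattern semantics (a trailing `/` marks a directory pattern, anything else matches the file name or the full relative path) are an assumption for illustration. The existing exclusion rules in `codebase_to_text.py` may differ and should be ported as they are.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


class PatternMatcher:
    """Exclusion pattern logic, independent of file I/O (hypothetical interface)."""

    def __init__(self, patterns):
        # Drop empty entries and surrounding whitespace.
        self.patterns = [p.strip() for p in patterns if p.strip()]

    def is_excluded(self, relative_path: str) -> bool:
        path = PurePosixPath(relative_path)
        for pattern in self.patterns:
            if pattern.endswith("/"):
                # Directory pattern: excludes anything located under a
                # directory whose name matches.
                dir_pattern = pattern[:-1]
                if any(fnmatchcase(part, dir_pattern) for part in path.parts[:-1]):
                    return True
            elif fnmatchcase(path.name, pattern) or fnmatchcase(path.as_posix(), pattern):
                # File pattern: match on the bare name or the whole relative path.
                return True
        return False
```

Usage, with made-up patterns:

```python
matcher = PatternMatcher(["*.pyc", "node_modules/"])
matcher.is_excluded("node_modules/lib/index.js")  # True
matcher.is_excluded("src/app.py")                 # False
```

Since the class takes strings and returns booleans, it can be tested without creating any files, which is part of why extracting it first is low risk.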

Files Affected

  • codebase_to_text/codebase_to_text.py (entire file)
  • New files to create:
    • pattern_matcher.py
    • output_formatter.py
    • file_tree_generator.py
    • github_handler.py

Acceptance Criteria

  • Each new class has a single, clear responsibility
  • All existing functionality is preserved
  • Code coverage remains ≥90%
  • API compatibility is maintained for public methods
  • Each component is individually testable
  • Documentation updated for new architecture
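Two of these criteria (individual testability and API compatibility) could be checked with tests along these lines. This is only a sketch: `format_output` stands in for whichever public methods the class exposes today, and `to_text` is an assumed component method.

```python
import unittest


class OutputFormatter:
    """Stand-in for the extracted component (hypothetical interface)."""

    def to_text(self, tree, files):
        sections = [tree] + [f"{name}\n{body}" for name, body in files.items()]
        return "\n\n".join(sections)


class CodebaseToText:
    def __init__(self, formatter=None):
        self._formatter = formatter or OutputFormatter()

    def format_output(self, tree, files):
        # Hypothetical legacy public method: the signature stays the same,
        # the body becomes a thin delegation to the component.
        return self._formatter.to_text(tree, files)


class OutputFormatterTest(unittest.TestCase):
    """The component is tested on its own, with plain in-memory inputs."""

    def test_formats_tree_then_files(self):
        text = OutputFormatter().to_text("root/\n  a.py", {"a.py": "x = 1"})
        self.assertEqual(text, "root/\n  a.py\n\na.py\nx = 1")


class ApiCompatibilityTest(unittest.TestCase):
    """The old entry point still works and forwards to the component."""

    def test_legacy_method_delegates_to_component(self):
        calls = []

        class RecordingFormatter:
            def to_text(self, tree, files):
                calls.append((tree, files))
                return "stub"

        result = CodebaseToText(RecordingFormatter()).format_output("t", {})
        self.assertEqual(result, "stub")
        self.assertEqual(calls, [("t", {})])
```

Tests of the second kind can be written against the current class before the refactor starts, so any accidental change to the public surface shows up immediately.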

Benefits

  • Easier to add new output formats (PDF, HTML, etc.)
  • Simpler unit testing of individual components
  • Better separation of concerns
  • Easier to modify pattern matching logic
  • Cleaner code organization
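To illustrate the first benefit, one possible shape for OutputFormatter is a small registry, so that a new format is added by registering a function rather than editing the main class. The `register`/`render` names and the HTML layout are assumptions for this sketch only.

```python
import html
from typing import Callable, Dict

# A format function takes the tree text and a {filename: content} mapping.
FormatFunc = Callable[[str, Dict[str, str]], str]


class OutputFormatter:
    """Dispatches to a registered format function (hypothetical interface)."""

    def __init__(self):
        self._formats: Dict[str, FormatFunc] = {"txt": self._to_text}

    def register(self, name: str, func: FormatFunc) -> None:
        self._formats[name] = func

    def render(self, name: str, tree: str, files: Dict[str, str]) -> str:
        try:
            func = self._formats[name]
        except KeyError:
            raise ValueError(f"unknown output format: {name!r}") from None
        return func(tree, files)

    @staticmethod
    def _to_text(tree: str, files: Dict[str, str]) -> str:
        sections = [tree] + [f"{name}\n{body}" for name, body in files.items()]
        return "\n\n".join(sections)


def to_html(tree: str, files: Dict[str, str]) -> str:
    """Example of a format added from outside the class."""
    blocks = [f"<pre>{html.escape(tree)}</pre>"]
    for name, body in files.items():
        blocks.append(f"<h2>{html.escape(name)}</h2><pre>{html.escape(body)}</pre>")
    return "\n".join(blocks)
```

Usage:

```python
formatter = OutputFormatter()
formatter.register("html", to_html)
page = formatter.render("html", "root/", {"a.py": "x < 1"})
```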
