Description
The CodebaseToText class has grown to 600+ lines and handles multiple responsibilities (file traversal, pattern matching, content processing, output generation). This violates the Single Responsibility Principle and makes the code hard to maintain and test.
Current Issues
- Single class with 25+ methods
- Mixed concerns: file I/O, pattern matching, formatting, CLI handling
- Difficult to unit test individual components
- Hard to extend with new output formats or processing modes
Proposed Architecture
CodebaseToText (orchestrator)
├── FileTreeGenerator (directory traversal and tree generation)
├── PatternMatcher (exclusion pattern logic)
├── ContentProcessor (file reading and processing)
├── OutputFormatter (text/docx generation)
└── GitHubHandler (repository cloning and cleanup)
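As a rough illustration of the composition shown in the tree, the sketch below wires three of the proposed components into the orchestrator. The class names come from this issue; every method name (`is_excluded`, `read`, `render`, `to_text`) and constructor signature is an assumption made for the example, not the existing API. FileTreeGenerator and GitHubHandler are left out to keep it short.

```python
from pathlib import Path


class PatternMatcher:
    """Exclusion logic only (hypothetical interface)."""

    def __init__(self, patterns: list[str]) -> None:
        self.patterns = patterns

    def is_excluded(self, path: Path) -> bool:
        # PurePath.match compares relative patterns from the right,
        # so "*.log" matches any file ending in .log.
        return any(path.match(p) for p in self.patterns)


class ContentProcessor:
    """File reading only (hypothetical interface)."""

    def read(self, path: Path) -> str:
        return path.read_text(encoding="utf-8", errors="replace")


class OutputFormatter:
    """Output rendering only (hypothetical interface)."""

    def render(self, files: dict[str, str]) -> str:
        return "\n\n".join(f"{name}\n{body}" for name, body in files.items())


class CodebaseToText:
    """Orchestrator: delegates every step to an injected component."""

    def __init__(
        self,
        root: Path,
        matcher: PatternMatcher,
        processor: ContentProcessor,
        formatter: OutputFormatter,
    ) -> None:
        self.root = root
        self.matcher = matcher
        self.processor = processor
        self.formatter = formatter

    def to_text(self) -> str:
        files: dict[str, str] = {}
        for path in sorted(self.root.rglob("*")):
            if path.is_file() and not self.matcher.is_excluded(path):
                rel = path.relative_to(self.root).as_posix()
                files[rel] = self.processor.read(path)
        return self.formatter.render(files)
```

Because each collaborator is passed in, a test can substitute a stub for any one of them without touching the file system or the other components.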
Implementation Plan
- Extract PatternMatcher class first (lowest risk)
- Extract OutputFormatter for text/docx generation
- Extract FileTreeGenerator for directory operations
- Extract GitHubHandler for Git operations
- Refactor main class to use composition
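The first step of the plan might look roughly like the sketch below: the matching logic moves into its own class, and the main class keeps a thin delegating method so existing callers are unaffected. The method names `is_excluded` and `_should_exclude`, the `exclude` parameter, and the matching rules (glob against the whole relative path or any single path component) are assumptions for illustration; the real exclusion semantics should be copied from the current implementation.

```python
import fnmatch
from pathlib import PurePosixPath


class PatternMatcher:
    """Exclusion pattern logic, extracted so it can be tested in isolation."""

    def __init__(self, patterns):
        # Treat "node_modules/" and "node_modules" the same way (assumed rule).
        self.patterns = [p.rstrip("/") for p in patterns]

    def is_excluded(self, rel_path: str) -> bool:
        parts = PurePosixPath(rel_path).parts
        for pattern in self.patterns:
            # Match against the whole relative path...
            if fnmatch.fnmatchcase(rel_path, pattern):
                return True
            # ...or against any single directory or file name in it.
            if any(fnmatch.fnmatchcase(part, pattern) for part in parts):
                return True
        return False


class CodebaseToText:
    """Only the part of the main class touched by this step."""

    def __init__(self, exclude=None):
        self._matcher = PatternMatcher(exclude or [])

    def _should_exclude(self, rel_path: str) -> bool:
        # Hypothetical legacy method name, kept as a delegate for compatibility.
        return self._matcher.is_excluded(rel_path)
```

With the logic isolated, a unit test needs nothing but strings:

```python
matcher = PatternMatcher(["*.pyc", "node_modules/"])
assert matcher.is_excluded("node_modules/lib/index.js")
assert not matcher.is_excluded("src/main.py")
```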
Files Affected
- codebase_to_text/codebase_to_text.py (entire file)
- New files to create:
  - pattern_matcher.py
  - output_formatter.py
  - file_tree_generator.py
  - github_handler.py
Acceptance Criteria
- Each new class has a single, clear responsibility
- All existing functionality is preserved
- Code coverage remains ≥90%
- API compatibility is maintained for public methods
- Each component is individually testable
- Documentation updated for new architecture
Benefits
- Easier to add new output formats (PDF, HTML, etc.)
- Simpler unit testing of individual components
- Better separation of concerns
- Easier to modify pattern matching logic
- Cleaner code organization
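To make the first benefit concrete, here is a minimal sketch of how a new output format could be added once formatting sits behind a small interface. The `Formatter` protocol, the `FORMATTERS` registry, the `render` helper, and the `"txt"`/`"html"` keys are all hypothetical names chosen for this example; docx output is omitted because it depends on a third-party library.

```python
import html
from typing import Protocol


class Formatter(Protocol):
    """Anything with a render() method can serve as an output format."""

    def render(self, files: dict[str, str]) -> str: ...


class TextFormatter:
    def render(self, files: dict[str, str]) -> str:
        return "\n\n".join(f"{name}\n{body}" for name, body in files.items())


class HtmlFormatter:
    """A new format: one class, with no changes to traversal or matching."""

    def render(self, files: dict[str, str]) -> str:
        sections = (
            f"<h2>{html.escape(name)}</h2>\n<pre>{html.escape(body)}</pre>"
            for name, body in files.items()
        )
        return "\n".join(sections)


# Registry keyed by output type; adding a format means adding one entry.
FORMATTERS: dict[str, Formatter] = {
    "txt": TextFormatter(),
    "html": HtmlFormatter(),
}


def render(files: dict[str, str], output_type: str) -> str:
    try:
        formatter = FORMATTERS[output_type]
    except KeyError:
        raise ValueError(f"unsupported output type: {output_type}") from None
    return formatter.render(files)
```

Under this design, supporting PDF would mean writing one more class and registering it, rather than editing the main class.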