This document describes the architecture, design decisions, and data flow of Git Summarizer.
Git Summarizer is designed with the following principles:
- Modularity: Each component has a single responsibility
- Extensibility: Easy to add new providers and analyzers
- Performance: Efficient processing of large repositories
- Usability: Beautiful terminal output with Rich
┌─────────────────────────────────────────────────────────────────────┐
│ USER │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ CLI Layer │ │
│ │ (cli.py + Typer)│ │
│ └────────┬────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GitAnalyzer │ │RiskAnalyzer │ │ Contributor │ │
│ │ │ │ │ │ Analyzer │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ AnalysisResult │ │
│ │ (Pydantic) │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ LLM Provider │ │
│ │ (llm.py) │ │
│ └────────┬────────┘ │
│ │ │
│ ┌─────────────┴─────────────┐ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ BaseProvider │ │ OpenAIProvider │ │
│ │ (Abstract) │◄────────│ │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ OUTPUT │
│ ┌─────────────┴─────────────┐ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Rich Terminal │ │ JSON Export │ │
│ │ Output │ │ │ │
│ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ gitsum package │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ cli.py │───▶│ analyzer.py │───▶│ risk.py │ │
│ │ │ │ │ │ │ │
│ │ • Commands │ │ • Load repo │ │ • Score files │ │
│ │ • Options │ │ • Process │ │ • Identify factors │ │
│ │ • Output │ │ • Analyze │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────────┐ │
│ │ │ contributors.py │ │
│ │ │ │ │
│ │ │ • Track ownership │ │
│ │ │ • Calculate hotspots│ │
│ │ │ • Bus factor │ │
│ │ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ ┌─────────────────────────────────────────┐ │
│ │ llm.py │───▶│ providers/ │ │
│ │ │ │ ┌──────────────┐ ┌──────────────────┐ │ │
│ │ • Orchestrate│ │ │ base.py │ │ openai_provider │ │ │
│ │ • Summarize │ │ │ (Abstract) │◄─│ │ │ │
│ └─────────────┘ │ └──────────────┘ └──────────────────┘ │ │
│ └─────────────────────────────────────────┘ │
│ │
│ ┌─────────────┐ ┌─────────────────────────────────────────┐ │
│ │ utils.py │ │ models/ │ │
│ │ │ │ ┌──────────────────────────────────┐ │ │
│ │ • Helpers │ │ │ history_model.py │ │ │
│ │ • Stats │ │ │ │ │ │
│ │ • Formatting│ │ │ • CommitInfo │ │ │
│ └─────────────┘ │ │ • HistorySummary │ │ │
│ │ │ • RiskFile │ │ │
│ │ │ • ContributorInfo │ │ │
│ │ │ • AnalysisResult │ │ │
│ │ └──────────────────────────────────┘ │ │
│ └─────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Responsibility: User interaction and output formatting
Key components:
- app: Typer application instance
- summarize(): Main command handler
- print_*(): Output formatting functions
- save_json_output(): JSON export

Dependencies:
- typer for CLI framework
- rich for terminal formatting
- GitAnalyzer, LLMSummarizer
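A minimal sketch of what save_json_output() could look like, assuming it receives the analysis as a plain dict (for a Pydantic v2 model, model_dump() produces one) plus a destination path. The signature and the _json_default helper are hypothetical, not the project's actual code.

```python
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path
from typing import Any


def _json_default(value: Any) -> Any:
    # json cannot encode datetimes natively, so emit ISO 8601 strings
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def save_json_output(analysis: dict[str, Any], output_path: str | Path) -> Path:
    """Hypothetical sketch: write the analysis dict to disk as indented JSON."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create e.g. reports/ if missing
    with path.open("w", encoding="utf-8") as fh:
        json.dump(analysis, fh, indent=2, default=_json_default)
    return path
```

Passing default= keeps the serializer strict: anything other than a datetime still raises TypeError instead of being silently stringified.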
Responsibility: Repository analysis and data extraction
class GitAnalyzer:
    def load(self) -> bool: ...
    def analyze(self) -> AnalysisResult: ...
    def _process_commits(self): ...
    def _analyze_history(self) -> HistorySummary: ...
    def _find_major_changes(self) -> list[MajorChange]: ...
    def _analyze_commit_patterns(self) -> list[CommitPattern]: ...
    def _detect_anomalies(self) -> list[AnomalyInfo]: ...

Dependencies:
- gitpython for Git access
- RiskAnalyzer, ContributorAnalyzer
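The commit-processing step can fold every commit into per-file statistics in a single pass, keeping only counters rather than diffs. A minimal sketch, assuming each commit has already been reduced to a small record holding a per-file map of (lines added, lines deleted); with GitPython, commit.stats.files is one source of such per-file insertion and deletion counts. CommitRecord, FileStats and aggregate_file_changes are hypothetical names; the fields mirror the File Changes box in the data transformation diagram (count, lines_added, authors, last_changed).

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CommitRecord:
    # Hypothetical normalized commit: author, timestamp, per-file (added, deleted)
    author: str
    date: datetime
    files: dict[str, tuple[int, int]]


@dataclass
class FileStats:
    count: int = 0
    lines_added: int = 0
    lines_deleted: int = 0
    authors: set[str] = field(default_factory=set)
    last_changed: datetime | None = None


def aggregate_file_changes(commits: Iterable[CommitRecord]) -> dict[str, FileStats]:
    """Fold commits into per-file stats in one pass; only counters are kept."""
    stats: dict[str, FileStats] = {}
    for commit in commits:
        for path, (added, deleted) in commit.files.items():
            entry = stats.setdefault(path, FileStats())
            entry.count += 1
            entry.lines_added += added
            entry.lines_deleted += deleted
            entry.authors.add(commit.author)
            # Commits may arrive newest-first, so keep the latest date seen
            if entry.last_changed is None or commit.date > entry.last_changed:
                entry.last_changed = commit.date
    return stats
```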
Responsibility: File risk scoring and factor identification
class RiskAnalyzer:
    # Relative importance of each factor; the weights sum to 1.0
    WEIGHTS = {
        "change_frequency": 0.30,
        "lines_changed": 0.20,
        "author_count": 0.15,
        "recency": 0.20,
        "complexity": 0.15,
    }

    def analyze(self) -> list[RiskFile]: ...
    def _calculate_recency_score(self) -> float: ...
    def _identify_risk_factors(self) -> list[str]: ...

Responsibility: Developer tracking and knowledge mapping
class ContributorAnalyzer:
    def analyze(self) -> list[ContributorInfo]: ...
    def get_file_ownership(self) -> dict: ...
    def get_knowledge_concentration(self) -> list[dict]: ...
    def get_contributor_timeline(self) -> list[dict]: ...
    def get_bus_factor(self, contributors, threshold) -> int: ...

Responsibility: Coordinate LLM-based summarization
class LLMSummarizer:
    def summarize(self, analysis: AnalysisResult) -> str: ...
    def is_available(self) -> bool: ...

Responsibility: Abstraction layer for different LLM providers
# base.py
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    @abstractmethod
    def generate_summary(self, sections, style) -> str: ...

    @abstractmethod
    def is_available(self) -> bool: ...

    def get_system_prompt(self, style) -> str: ...
    def format_sections_as_text(self, sections) -> str: ...
    def build_user_prompt(self, sections, style) -> str: ...

# openai_provider.py
class OpenAIProvider(BaseProvider):
    def generate_summary(self, sections, style) -> str: ...
    def is_available(self) -> bool: ...
    def _generate_fallback_summary(self, sections, style) -> str: ...

Input: Repository Path
│
▼
┌─────────────────┐
│ Load Repo │ GitPython reads .git
│ (GitAnalyzer) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Process Commits │ Extract CommitInfo for each commit
│ │ Track file changes
└────────┬────────┘
│
├─────────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Analyze History │ │ Detect Patterns │
│ • Timeline │ │ • Keywords │
│ • Statistics │ │ • Anomalies │
└────────┬────────┘ └────────┬────────┘
│ │
├───────────────────┘
│
├─────────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Risk Analysis │ │ Contributor │
│ • Score files │ │ Analysis │
│ • Find factors │ │ • Hotspots │
└────────┬────────┘ └────────┬────────┘
│ │
└───────┬───────────┘
▼
┌─────────────────┐
│ AnalysisResult │ Aggregated data model
└────────┬────────┘
│
▼
┌─────────────────┐
│ LLM Summary │ Optional AI enhancement
│ (if available) │
└────────┬────────┘
│
▼
┌───────────┐
│ Output │
│ • Rich UI │
│ • JSON │
└───────────┘
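The "Detect Patterns • Keywords" step yields CommitPattern entries with keyword, count and percentage. A minimal sketch, assuming a hypothetical fixed keyword list and case-insensitive whole-word matching on commit messages; the project's actual keyword set and matching rules are not specified in this document.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the real analyzer may use different terms
DEFAULT_KEYWORDS = ("fix", "feat", "refactor", "revert", "hotfix", "docs", "test")


@dataclass
class CommitPattern:
    keyword: str
    count: int
    percentage: float


def analyze_commit_patterns(
    messages: list[str], keywords: tuple[str, ...] = DEFAULT_KEYWORDS
) -> list[CommitPattern]:
    """Count commits whose message mentions each keyword (whole word, any case)."""
    total = len(messages)
    if total == 0:
        return []
    patterns = []
    for keyword in keywords:
        regex = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
        # Each commit counts at most once per keyword
        count = sum(1 for msg in messages if regex.search(msg))
        if count:
            patterns.append(CommitPattern(keyword, count, round(100 * count / total, 1)))
    # Most frequent keywords first
    return sorted(patterns, key=lambda p: p.count, reverse=True)
```

Whole-word matching keeps "hotfix" from also counting as "fix"; a prefix-based scheme (e.g. Conventional Commits types) would be an equally valid choice.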
Git Commits (raw)
│
▼
┌──────────────────┐
│ CommitInfo │ Normalized commit data
│ • sha │
│ • author │
│ • date │
│ • lines_added │
│ • impact_score │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ File Changes │ Aggregated per file
│ • count │
│ • lines_added │
│ • authors │
│ • last_changed │
└────────┬─────────┘
│
┌────┴────┐
▼ ▼
┌────────┐ ┌──────────────┐
│RiskFile│ │ContributorInfo│
└────────┘ └──────────────┘
│ │
└──────┬──────┘
▼
┌───────────────┐
│AnalysisResult │
│ .to_sections()│───▶ LLM Input
└───────────────┘
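The final arrow, from AnalysisResult.to_sections() to the LLM input, turns the aggregated model into named sections that a provider renders as prompt text. A minimal sketch, assuming to_sections() returns a dict mapping section titles to plain data (dicts, lists of dicts, scalars); the "## title" layout is a hypothetical stand-in for BaseProvider.format_sections_as_text(), not its actual output format.

```python
from typing import Any


def format_sections_as_text(sections: dict[str, Any]) -> str:
    """Render {title: data} as plain-text blocks an LLM can read."""
    blocks = []
    for title, data in sections.items():
        lines = [f"## {title}"]
        if isinstance(data, dict):
            lines += [f"- {key}: {value}" for key, value in data.items()]
        elif isinstance(data, list):
            for item in data:
                if isinstance(item, dict):
                    # One line per record, fields joined as key=value pairs
                    lines.append("- " + ", ".join(f"{k}={v}" for k, v in item.items()))
                else:
                    lines.append(f"- {item}")
        else:
            lines.append(str(data))
        blocks.append("\n".join(lines))
    # Blank line between sections
    return "\n\n".join(blocks)
```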
from __future__ import annotations

from abc import ABC, abstractmethod

# SummaryStyle is also defined in base.py


class BaseProvider(ABC):
    """Abstract interface for LLM providers."""

    def __init__(self, model: str | None = None, **kwargs):
        self.model = model
        self.config = kwargs  # extra provider-specific settings

    @abstractmethod
    def generate_summary(
        self,
        sections: dict,
        style: SummaryStyle,
    ) -> str:
        """Generate summary from analysis sections."""

    @abstractmethod
    def is_available(self) -> bool:
        """Check if provider is configured."""

# providers/anthropic_provider.py
import os

from gitsum.providers.base import BaseProvider, SummaryStyle


class AnthropicProvider(BaseProvider):
    DEFAULT_MODEL = "claude-3-opus"

    def __init__(self, model=None, api_key=None, **kwargs):
        # Forward extra kwargs so BaseProvider stores them in self.config
        super().__init__(model or self.DEFAULT_MODEL, **kwargs)
        self.api_key = api_key or os.getenv("ANTHROPIC_API_KEY")

    def generate_summary(self, sections, style):
        # Implementation: call the Anthropic API and return the summary text
        raise NotImplementedError

    def is_available(self):
        return bool(self.api_key)

# providers/__init__.py
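from gitsum.providers.anthropic_provider import AnthropicProvider
from gitsum.providers.base import BaseProvider
from gitsum.providers.openai_provider import OpenAIProvider

def get_provider(name: str, **kwargs) -> BaseProvider:
    providers = {
        "openai": OpenAIProvider,
        "anthropic": AnthropicProvider,  # Add new provider
    }
    return providers[name](**kwargs)

┌─────────────────────────────────────────────────────────────┐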
│ AnalysisResult │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ HistorySummary │ │
│ │ • total_commits • daily_avg_commits │ │
│ │ • first_commit_date • weekly_avg_commits │ │
│ │ • last_commit_date • longest_inactive_days │ │
│ │ • total_authors • total_lines_added/deleted │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ MajorChange │ │ RiskFile │ │ ContributorInfo │ │
│ │ • commit_sha │ │ • path │ │ • name │ │
│ │ • author │ │ • risk_score │ │ • commit_count │ │
│ │ • impact │ │ • factors │ │ • expertise │ │
│ └──────────────┘ └──────────────┘ └─────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │CommitPattern │ │ AnomalyInfo │ │
│ │ • keyword │ │ • type │ │
│ │ • count │ │ • severity │ │
│ │ • percentage │ │ • details │ │
│ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
All data models use Pydantic for:
- Type validation
- JSON serialization
- Default values
- Documentation
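ContributorAnalyzer.get_bus_factor(contributors, threshold) reduces ContributorInfo records to a single number. One common definition, assumed here rather than taken from the project, is the smallest number of top contributors who together account for at least threshold (a fraction such as 0.5) of all commits. The Contributor dataclass is a hypothetical stand-in for the Pydantic ContributorInfo model, carrying only the name and commit_count fields shown in the diagram above.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    # Hypothetical stand-in for ContributorInfo; only the fields used here
    name: str
    commit_count: int


def get_bus_factor(contributors: list[Contributor], threshold: float = 0.5) -> int:
    """Smallest number of top contributors covering `threshold` of all commits."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(c.commit_count for c in contributors)
    if total == 0:
        return 0
    covered = 0
    ranked = sorted(contributors, key=lambda c: c.commit_count, reverse=True)
    for n, contributor in enumerate(ranked, start=1):
        covered += contributor.commit_count
        if covered / total >= threshold:
            return n
    return len(ranked)
```

A bus factor of 1 at threshold 0.5 means a single person wrote at least half of the commits, which is the knowledge-concentration risk this metric is meant to flag.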
Decision: Use GitPython instead of subprocess git calls
Rationale:
- Type safety and structured data
- Cross-platform compatibility
- Better error handling
- Easier testing
Decision: Abstract LLM providers behind a common interface
Rationale:
- Easy to swap providers
- Fallback when API unavailable
- Testing without API calls
- Future extensibility
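The provider abstraction is what makes the fallback and API-free testing possible: the summarizer only talks to the BaseProvider interface, so a stub can stand in for a real provider. A minimal sketch, assuming summarize() falls back to a template summary when the provider is unavailable or raises. In the project the fallback lives in OpenAIProvider._generate_fallback_summary(); this sketch moves it into the summarizer, passes sections instead of an AnalysisResult, and uses a plain string for the style, all only to keep the example small. StaticProvider and the fallback wording are hypothetical.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    @abstractmethod
    def generate_summary(self, sections: dict, style: str) -> str: ...

    @abstractmethod
    def is_available(self) -> bool: ...


class StaticProvider(BaseProvider):
    """Hypothetical test double: returns canned text and makes no API calls."""

    def __init__(self, text: str, available: bool = True):
        self.text = text
        self.available = available

    def generate_summary(self, sections: dict, style: str) -> str:
        return self.text

    def is_available(self) -> bool:
        return self.available


class LLMSummarizer:
    def __init__(self, provider: BaseProvider):
        self.provider = provider

    def is_available(self) -> bool:
        return self.provider.is_available()

    def summarize(self, sections: dict, style: str = "brief") -> str:
        if self.is_available():
            try:
                return self.provider.generate_summary(sections, style)
            except Exception:
                pass  # degrade to the offline summary instead of failing the CLI
        return self._fallback(sections)

    @staticmethod
    def _fallback(sections: dict) -> str:
        # Deterministic summary built only from the analysis data
        return "; ".join(f"{name}: {len(data)} item(s)" for name, data in sections.items())
```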
Decision: Use Rich library for all terminal output
Rationale:
- Beautiful, consistent formatting
- Tables, panels, progress bars
- Color support
- Markdown rendering
Decision: Use Pydantic models for all data structures
Rationale:
- Type validation
- Easy JSON serialization
- Self-documenting
- IDE support
Decision: Use weighted factors for risk calculation
Rationale:
- Configurable importance
- Transparent scoring
- Easy to explain
- Adjustable thresholds
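The weighted-factor approach combines the RiskAnalyzer.WEIGHTS listed earlier into one score. A minimal sketch, assuming each factor has already been normalized to the 0..1 range and that recency decays exponentially with a hypothetical 30-day half-life; the normalization, the decay curve and the function names are assumptions, not the project's actual formulas. Accepting a weights mapping is one way the "configurable importance" could be exposed.

```python
from collections.abc import Mapping

# Weights copied from RiskAnalyzer.WEIGHTS above
WEIGHTS = {
    "change_frequency": 0.30,
    "lines_changed": 0.20,
    "author_count": 0.15,
    "recency": 0.20,
    "complexity": 0.15,
}


def recency_score(days_since_change: float, half_life_days: float = 30.0) -> float:
    """Assumed curve: 1.0 for a change today, halving every half_life_days."""
    return 0.5 ** (max(days_since_change, 0.0) / half_life_days)


def risk_score(factors: Mapping[str, float], weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted sum of factor scores, each clamped to 0..1; missing factors count as 0."""
    total_weight = sum(weights.values())
    weighted = sum(
        weight * min(max(factors.get(name, 0.0), 0.0), 1.0)
        for name, weight in weights.items()
    )
    # Dividing by total_weight keeps custom weights that don't sum to 1 on a 0..1 scale
    return weighted / total_weight
```

For example, a file with change_frequency 1.0, lines_changed 0.5, author_count 0.0, a change today (recency 1.0) and complexity 0.2 scores 0.30 + 0.10 + 0 + 0.20 + 0.03 = 0.63.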
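from rich.progress import Progress

# Commit limiting
analyzer = GitAnalyzer(path, limit_commits=1000)

# Progress tracking
with Progress() as progress:
    # assumes commits is a list, so its length is known up front
    task = progress.add_task("Processing commits", total=len(commits))
    for commit in commits:
        process(commit)
        progress.update(task, advance=1)

- Stream processing for commits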
- Lazy loading of file stats
- Aggregate data as we go
- Don't store full diffs
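Aggregating as we go means history statistics can be computed without holding every commit in memory. A minimal sketch for two HistorySummary fields, longest_inactive_days and daily_avg_commits, assuming commit dates arrive oldest first (git log order, and GitPython's iter_commits by default, is newest first, so a real caller would need to adapt). stream_history_stats is a hypothetical name.

```python
from collections.abc import Iterable
from datetime import datetime


def stream_history_stats(dates: Iterable[datetime]) -> dict[str, float]:
    """One pass over chronologically ordered commit dates, O(1) memory."""
    first = previous = None
    total = 0
    longest_gap_days = 0.0
    for date in dates:
        if previous is not None:
            gap = (date - previous).total_seconds() / 86400
            longest_gap_days = max(longest_gap_days, gap)
        if first is None:
            first = date
        previous = date
        total += 1
    if total == 0:
        return {"total_commits": 0, "daily_avg_commits": 0.0, "longest_inactive_days": 0.0}
    # Count both endpoints so a single-day history spans 1 day, not 0
    span_days = (previous.date() - first.date()).days + 1
    return {
        "total_commits": total,
        "daily_avg_commits": total / span_days,
        "longest_inactive_days": longest_gap_days,
    }
```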
- Parallel processing: Analyze multiple files concurrently
- Caching: Cache analysis results
- Incremental analysis: Only process new commits
- Sampling: Statistically sample for very large repos
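Caching and incremental analysis fit together: persist the aggregated counters plus the last processed commit SHA, and on the next run process only commits newer than it. A minimal sketch, assuming a JSON cache file and a caller that supplies commits newest first as (sha, author) pairs; the cache layout and both functions are hypothetical, and per-author commit counts stand in for whatever aggregates the real analyzer would cache.

```python
import json
from collections import Counter
from pathlib import Path


def load_cache(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"last_sha": None, "commits_by_author": {}}


def incremental_update(cache_path: Path, commits_newest_first: list[tuple[str, str]]) -> dict:
    """Process only commits newer than the cached SHA, then persist the result."""
    cache = load_cache(cache_path)
    counts = Counter(cache["commits_by_author"])
    new_commits = []
    seen_cached = False
    for sha, author in commits_newest_first:
        if sha == cache["last_sha"]:
            seen_cached = True
            break  # everything from here on was counted on a previous run
        new_commits.append((sha, author))
    if cache["last_sha"] is not None and not seen_cached:
        # Cached SHA no longer in history (e.g. rewritten branch): rebuild from scratch
        counts = Counter()
    for _, author in new_commits:
        counts[author] += 1
    if commits_newest_first:
        cache["last_sha"] = commits_newest_first[0][0]
    cache["commits_by_author"] = dict(counts)
    cache_path.write_text(json.dumps(cache), encoding="utf-8")
    return cache
```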
class SecurityAnalyzer:
    """Analyze security-related patterns."""

    def analyze(self, commits, file_changes):
        # Look for security keywords
        # Check for sensitive file patterns
        # Return SecurityReport
        pass

class HTMLReporter:
    """Generate HTML report."""

    def render(self, analysis: AnalysisResult) -> str:
        # Render Jinja template
        pass

class SlackNotifier:
    """Send summary to Slack."""

    def notify(self, analysis: AnalysisResult):
        # Post to Slack webhook
        pass

- Model validation
- Risk score calculation
- Contributor aggregation
- Utility functions
- Full pipeline with real repo
- CLI command execution
- JSON output validation
- Provider API calls
- Git repository operations
- Multi-repo analysis: Compare across repositories
- Time-series tracking: Track metrics over time
- CI/CD integration: GitHub Actions, GitLab CI
- Web dashboard: Visual reporting interface
- Custom rules: User-defined risk factors