Description
CommitGate currently scans each staged file individually using Gitleaks. This means the entire contents of a staged file are scanned, even when only a few lines were modified.
While this approach is simple and reliable, it can become inefficient for large files and does not fully align with CommitGate's "Least Privilege" and "Fast Feedback" design principles.
Goal
Investigate scanning only staged content while preserving accurate secret detection and finding locations.
Potential approaches include:
- Scanning git diff --cached via Gitleaks stdin mode
- Scanning the staged version of files directly from the Git index
- Hybrid approaches that preserve context for multi-line secrets
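As a starting point for the second approach, the sketch below reads the staged version of each file straight from the index instead of the working tree. It relies only on standard Git commands (`git diff --cached --name-only --diff-filter=ACM -z` and `git show :<path>`). The function names are hypothetical and not part of CommitGate, and how the returned bytes are handed to Gitleaks is deliberately left open.

```python
import subprocess
from typing import List


def parse_nul_separated(raw: bytes) -> List[str]:
    """Split the NUL-separated output of a `git ... -z` command into paths."""
    return [
        part.decode("utf-8", errors="surrogateescape")
        for part in raw.split(b"\0")
        if part
    ]


def staged_paths(repo_dir: str = ".") -> List[str]:
    """List paths that are added, copied, or modified in the index."""
    raw = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
    ).stdout
    return parse_nul_separated(raw)


def staged_blob(path: str, repo_dir: str = ".") -> bytes:
    """Return the staged (index) content of `path`, not the working-tree copy.

    `path` is expected to be relative to the repository root, which is what
    `git diff --name-only` prints by default.
    """
    return subprocess.run(
        ["git", "show", f":{path}"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
    ).stdout
```

Because the whole staged blob is read, line and column numbers stay relative to the real file. The trade-off is that unchanged lines are still scanned, so this mainly fixes the "scan what will actually be committed" concern rather than the performance one.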
Challenges
Accurate Location Reporting
The current implementation provides:
- File path
- Start line
- End line
- Start column
- End column
If scanning diffs via stdin, findings may be reported relative to the diff instead of the source file.
Need a strategy to map findings back to:
- Original file
- Real line numbers
- Real columns (if possible)
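One possible mapping strategy is to parse the unified diff once and record, for every added or context line in the diff, which file and which line of the staged file it corresponds to. The sketch below assumes findings are reported with 1-based line numbers relative to the diff text, which would need to be confirmed against actual Gitleaks output. `map_diff_lines` is a hypothetical helper name, and paths that Git quotes (for example, names containing tabs) are not handled.

```python
import re
from typing import Dict, Optional, Tuple

# Matches hunk headers such as "@@ -10,2 +10,3 @@ def f():" and captures the
# starting line number on the new (staged) side.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def map_diff_lines(diff_text: str) -> Dict[int, Tuple[str, int]]:
    """Map 1-based diff line numbers to (path, 1-based line in the new file)."""
    mapping: Dict[int, Tuple[str, int]] = {}
    path: Optional[str] = None
    new_line: Optional[int] = None  # None while we are in a file header

    for diff_line_no, line in enumerate(diff_text.splitlines(), start=1):
        if line.startswith("diff --git "):
            # A new file section starts; reset state until its first hunk.
            path, new_line = None, None
        elif new_line is None and line.startswith("+++ "):
            target = line[4:]
            if target == "/dev/null":
                path = None  # deleted file: nothing to map
            else:
                path = target[2:] if target.startswith("b/") else target
        elif (match := HUNK_RE.match(line)) is not None:
            new_line = int(match.group(1))
        elif new_line is not None and path is not None:
            if line.startswith("+") or line.startswith(" "):
                # Added and context lines both exist in the new file.
                mapping[diff_line_no] = (path, new_line)
                new_line += 1
            # Removed lines ("-") and "\ No newline at end of file" markers
            # do not exist in the new file, so the counter is not advanced.
    return mapping
```

Each content line in a unified diff carries a one-character prefix (`+`, `-`, or a space), so a column reported relative to the diff line would be shifted by one compared to the source file.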
Detection Accuracy
Scanning only changed lines may reduce context available to Gitleaks and impact detection of:
- Multi-line secrets
- Split credentials
- Context-dependent findings
Need to verify detection quality remains acceptable.
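One way a hybrid approach could keep some context is to widen each changed line range by a few surrounding lines and merge ranges that end up overlapping or touching, so multi-line secrets that straddle a change are still seen in one piece. The sketch below is a minimal illustration. `expand_ranges` and the default of 3 context lines are assumptions, not CommitGate behavior. A similar effect is available from Git itself through the `-U<n>` option of `git diff`.

```python
from typing import List, Tuple

LineRange = Tuple[int, int]  # 1-based, inclusive (start, end)


def expand_ranges(
    changed: List[LineRange], total_lines: int, context: int = 3
) -> List[LineRange]:
    """Pad each changed range with `context` lines and merge neighbours."""
    merged: List[LineRange] = []
    for start, end in sorted(changed):
        # Clamp the padded range to the bounds of the file.
        lo = max(1, start - context)
        hi = min(total_lines, end + context)
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            prev_lo, prev_hi = merged[-1]
            merged[-1] = (prev_lo, max(prev_hi, hi))
        else:
            merged.append((lo, hi))
    return merged


# Two nearby edits collapse into one window; the edit near the end of a
# 42-line file is clamped to the last line.
print(expand_ranges([(10, 10), (14, 15), (40, 41)], total_lines=42))
```

A larger context value trades speed for a better chance of catching split credentials, so it is a natural parameter to vary when benchmarking.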
Acceptance Criteria
- Benchmark current file-based scanning against staged-content scanning
- Preserve or improve scan performance
- Preserve accurate file and line number reporting
- Validate that common secret types are still detected correctly
- Document performance vs accuracy tradeoffs
Priority
Low
Notes
The current implementation intentionally prioritizes detection accuracy over performance.
This issue is an optimization and should be evaluated after MVP and hackathon deliverables are complete.