A powerful tool for scanning public GitHub repositories to identify exposed API tokens and secrets using regular expressions, entropy analysis, and GitHub's Code Search API.
Caution
This tool is developed strictly for educational purposes, authorized security research, and responsible disclosure.
- Security researchers performing authorized bug bounty work
- Organizations auditing their own public repositories for leaked credentials
- Developers verifying that their own API keys haven't been accidentally committed
- Educators demonstrating the risks of hardcoded secrets in public code
- ❌ DO NOT use discovered tokens or credentials for unauthorized access to any system
- ❌ DO NOT exploit, sell, share, or abuse any leaked credentials you find
- ❌ DO NOT use this tool to target individuals, organizations, or repositories without explicit authorization
- ❌ DO NOT use this tool in violation of GitHub's Terms of Service or Acceptable Use Policies
If you discover exposed credentials:
- Report them to the repository owner or organization immediately
- Revoke your own tokens if you find them exposed
- Follow responsible disclosure practices (e.g., via HackerOne or the vendor's security contact)
The authors, contributors, and maintainers of GitSentry accept NO responsibility for any misuse, damage, or legal consequences arising from the use of this tool. By using this software, you acknowledge that you are solely responsible for ensuring your actions comply with all applicable local, state, national, and international laws and regulations. Use at your own risk.
GitSentry helps security professionals and developers identify exposed API keys, tokens, and credentials in public GitHub repositories. It leverages GitHub's Code Search API to find potentially sensitive information using customizable regex patterns and Shannon entropy analysis.
| Feature | Description |
|---|---|
| 📋 200+ Token Patterns | Pre-configured regex for GitHub, AWS, Stripe, Groq, Cerebras, Gemini, and more |
| 🔍 Custom Regex | Search with your own patterns via UI or CLI |
| 🧮 Entropy Analysis | Shannon entropy scoring detects secrets that don't match known patterns |
| ⚡ Extended Search | Bypass GitHub's 1000-result limit with parallel filename-prefix partitioning |
| 🔄 Smart Rate Limiting | Reads X-RateLimit-* headers for proactive wait instead of blind retries |
| 🛑 Cancel Support | Stop any search mid-flight (extended searches stop between batches/pages) |
| 💾 Multi-Format Export | Download results as JSON (tokens-only or detailed) or CSV |
| 🖥️ CLI Mode | Run scans from the command line — no browser needed |
| 🔐 Token Rotation | Automatic rotation and pool management for multiple GitHub PATs |
| 📊 Risk Scoring | Each found token is scored: low / medium / high / critical |
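The entropy scoring in the table above can be illustrated with a minimal sketch. It is not taken from GitSentry's entropy.py: the function names and the threshold values in `risk_label` are assumptions chosen for illustration, and the real scoring may weigh other signals such as the character set or the matched pattern.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = -sum(p * log2(p)) over the relative frequency p of each character
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def risk_label(entropy: float) -> str:
    """Map an entropy score to a label. Thresholds are illustrative assumptions."""
    if entropy >= 4.5:
        return "critical"
    if entropy >= 4.0:
        return "high"
    if entropy >= 3.0:
        return "medium"
    return "low"
```

Random-looking strings such as API keys tend to score higher than ordinary identifiers or words, which is why an entropy check can flag candidates that no predefined regex covers.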
- Python 3.8+
- One or more GitHub Personal Access Tokens (PATs) with public_repo scope
# 1. Clone the repository
git clone https://github.com/Rkcr7/GitSentry.git
cd GitSentry
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure your tokens
cp .env.example .env
# Edit .env and add your GitHub token(s)

Create a .env file (or copy from .env.example):
# Single token (normal search only)
GITHUB_TOKEN=ghp_yourTokenHere
# Multiple tokens (enables extended/parallel search)
GITHUB_TOKENS=ghp_token1,ghp_token2,ghp_token3,ghp_token4,ghp_token5

How many tokens do you need?
| Mode | Min Tokens | Recommended | Parallel Workers |
|---|---|---|---|
| Normal search | 1 | 1 | 1 |
| Extended search | 2 | 5-6 | tokens - 1 |
Generate tokens at github.com/settings/tokens → Generate new token (classic) → select public_repo scope.
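To make the token table concrete, here is a rough sketch of how the two variables could be read and turned into a worker count. It is an assumption-laden illustration, not GitSentry's config.py: `load_tokens` and `parallel_workers` are hypothetical names, and preferring GITHUB_TOKENS over GITHUB_TOKEN when both are set is a guess.

```python
import os
from typing import List, Mapping


def load_tokens(env: Mapping[str, str] = os.environ) -> List[str]:
    """Collect tokens from GITHUB_TOKENS (comma-separated) or GITHUB_TOKEN."""
    # Assumption: the multi-token variable wins when both are defined.
    raw = env.get("GITHUB_TOKENS") or env.get("GITHUB_TOKEN") or ""
    tokens = [t.strip() for t in raw.split(",") if t.strip()]
    # Drop duplicates while preserving the original order
    return list(dict.fromkeys(tokens))


def parallel_workers(token_count: int, extended: bool) -> int:
    """Worker count from the table: 1 for normal search, tokens - 1 for extended."""
    if token_count < 1:
        raise ValueError("at least one token is required")
    if not extended:
        return 1
    if token_count < 2:
        raise ValueError("extended search needs at least 2 tokens")
    return token_count - 1
```

With five tokens configured, `parallel_workers(5, extended=True)` gives 4, matching the "tokens - 1" column. The sketch reads from the process environment only; loading the .env file itself is left to the application.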
streamlit run app.py

Open http://localhost:8501 in your browser.
Workflow:
- Select a Token Pattern from the sidebar (or use "Custom Pattern" with your own regex)
- Optionally add Additional Query qualifiers (e.g., language:python, org:example)
- Set Result Limit and toggle Extended Search if needed
- Click 🚀 Start Scanning
- Watch real-time progress — cancel anytime with 🛑 Cancel Search
- Export results as JSON or CSV via the download buttons
# Scan for GitHub tokens
python cli.py --pattern "GitHub Token" --limit 100
# Scan with custom regex
python cli.py --regex "ghp_[a-zA-Z0-9]{36}" --limit 200 --extended
# Export as CSV
python cli.py --pattern "Groq API Key" --output csv --limit 500
# List all available patterns
python cli.py --list-patterns
# Full help
python cli.py --help

CLI Options:
| Flag | Description |
|---|---|
| --pattern, -p | Name of a predefined token pattern |
| --regex, -r | Custom regex pattern |
| --query, -q | Additional GitHub search qualifiers |
| --limit, -l | Max results (default: 100) |
| --extended, -e | Extended search (multi-query, bypasses 1000 limit) |
| --cooldown, -c | Cooldown between batches in seconds (default: 40) |
| --output, -o | Output format: json, csv, both, stdout |
| --list-patterns | List all available patterns |
- Uses a single query to GitHub's Code Search API
- Limited to 1,000 results per GitHub API constraints
- Fast — requires only 1 token
- Splits your query into 28 sub-queries using filename prefixes (a-z, 0-9, ., _)
- Processes batches in parallel using your token pool
- Cooldown between batches respects rate limits
- Results are deduplicated across all sub-queries
- Can find thousands of results beyond the normal 1,000 limit
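The partition-and-deduplicate idea behind extended search can be sketched as follows. This is a simplified illustration rather than the code in search_query.py or result_processor.py: the `filename:` qualifier stands in for whatever qualifier the tool actually emits, the prefix list is supplied by the caller, and identifying a result by its `repo` and `path` fields is an assumption.

```python
from typing import Dict, Iterable, List


def build_sub_queries(base_query: str, prefixes: Iterable[str]) -> List[str]:
    """Create one narrower query per prefix so each stays under the result cap."""
    return [f"{base_query} filename:{prefix}" for prefix in prefixes]


def merge_results(batches: Iterable[Iterable[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Flatten per-query result batches, keeping the first copy of each file."""
    seen = set()
    merged = []
    for batch in batches:
        for item in batch:
            # Assumed identity of a hit: repository name plus file path
            key = (item["repo"], item["path"])
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Because each sub-query is subject to its own result cap, the combined set can exceed what a single query returns, and the merge step removes files that surface under more than one sub-query.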
Add patterns to token_patterns.json:
{
"My Service API Key": "myprefix_[a-zA-Z0-9]{32}",
"Another Pattern": "sk-[a-zA-Z0-9]{48}"
}

Tips for good patterns:
- Include the token prefix (e.g., ghp_, sk_live_, gsk_, csk-)
- Specify exact character sets and lengths
- Test your regex on regex101.com before scanning
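Before adding entries to token_patterns.json, it can help to confirm that each regex compiles. The helper below is a hypothetical sketch, not part of GitSentry, and it assumes the flat name-to-regex mapping shown in the example above (entries carrying extra metadata would need different handling).

```python
import json
import re
from typing import Dict


def validate_patterns(raw_json: str) -> Dict[str, str]:
    """Return {pattern name: error message} for every regex that fails to compile."""
    patterns = json.loads(raw_json)
    errors = {}
    for name, regex in patterns.items():
        try:
            re.compile(regex)
        except re.error as exc:
            errors[name] = str(exc)
    return errors


if __name__ == "__main__":
    sample = '{"My Service API Key": "myprefix_[a-zA-Z0-9]{32}", "Broken": "sk-[a-z"}'
    for name, message in validate_patterns(sample).items():
        print(f"{name}: {message}")
```

An empty result means every pattern compiled. It does not show that a pattern matches the intended tokens, so testing against known sample values is still worthwhile.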
GitSentry reads GitHub's rate limit headers (X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After) to proactively wait before hitting limits. If rate limits are still hit:
- Tokens are rotated automatically
- Exponential backoff with jitter is applied
- Extended search cooldowns are configurable (10-120 seconds)
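The header-driven wait and the fallback backoff described above might look roughly like this. It is a sketch under stated assumptions, not the logic in github_api.py: the function names, the `base` and `cap` defaults, and the choice of full jitter are illustrative. `X-RateLimit-Reset` is treated as a UTC epoch timestamp in seconds and `Retry-After` as a number of seconds.

```python
import random
import time
from typing import Mapping, Optional


def wait_from_headers(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to pause before the next request, derived from response headers."""
    # Header names are matched exactly here; HTTP client libraries usually
    # expose a case-insensitive mapping instead of a plain dict.
    now = time.time() if now is None else now
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is not None and reset is not None and int(remaining) == 0:
        # Quota exhausted: wait until the reset timestamp
        return max(0.0, float(reset) - now)
    return 0.0


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would sleep for `wait_from_headers(...)` after each response and fall back to `backoff_delay(attempt)` only when a request is still rejected, rotating to another token where one is available.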
# Run all unit tests
python -m pytest tests/ -v
# 52 tests covering:
# - entropy.py (Shannon entropy, charset classification, risk scoring)
# - result_processor.py (regex matching, deduplication, CSV export)
# - search_query.py (query generation, keyword extraction)

GitSentry/
├── app.py # Streamlit web UI
├── cli.py # Command-line interface
├── github_api.py # GitHub API client (framework-agnostic)
├── thread_safe_api.py # Thread-safe state bridge
├── config.py # Token rotation and pool management
├── search_query.py # Search query generation with GitHub qualifiers
├── token_patterns.py # Pattern loader with metadata support
├── token_patterns.json # 200+ regex patterns
├── result_processor.py # Result processing, dedup, CSV/JSON export
├── entropy.py # Shannon entropy analysis for secret detection
├── requirements.txt # Python dependencies
├── .env.example # Example environment configuration
├── conftest.py # Pytest configuration
└── tests/ # Unit test suite
    ├── test_entropy.py
    ├── test_result_processor.py
    └── test_search_query.py
This project is licensed under the MIT License — see the LICENSE file for details.
Contributions are welcome! Please ensure any additions align with the ethical guidelines stated above.
- GitHub REST API for code search capabilities
- Streamlit for the interactive web interface
- All contributors to the token pattern database