
Security tool that scans GitHub repositories for exposed API tokens and credentials using 200+ regex patterns. Features multi-threading, token rotation, and extended search capabilities to bypass API limits.


Rkcr7/GitSentry


GitSentry


A powerful tool for scanning public GitHub repositories to identify exposed API tokens and secrets using regular expressions, entropy analysis, and GitHub's Code Search API.


⚠️ Ethical Usage, Disclaimer & Legal Notice

Caution

This tool is developed strictly for educational purposes, authorized security research, and responsible disclosure.

Intended Use Cases

  • Security researchers performing authorized bug bounty work
  • Organizations auditing their own public repositories for leaked credentials
  • Developers verifying that their own API keys haven't been accidentally committed
  • Educators demonstrating the risks of hardcoded secrets in public code

Prohibited Use

  • DO NOT use discovered tokens or credentials for unauthorized access to any system
  • DO NOT exploit, sell, share, or abuse any leaked credentials you find
  • DO NOT use this tool to target individuals, organizations, or repositories without explicit authorization
  • DO NOT use this tool in violation of GitHub's Terms of Service or Acceptable Use Policies

Responsible Disclosure

If you discover exposed credentials:

  1. Report them to the repository owner or organization immediately
  2. Revoke your own tokens if you find them exposed
  3. Follow responsible disclosure practices (e.g., via HackerOne or the vendor's security contact)

Liability

The authors, contributors, and maintainers of GitSentry accept NO responsibility for any misuse, damage, or legal consequences arising from the use of this tool. By using this software, you acknowledge that you are solely responsible for ensuring your actions comply with all applicable local, state, national, and international laws and regulations. Use at your own risk.


🔍 Overview

GitSentry helps security professionals and developers identify exposed API keys, tokens, and credentials in public GitHub repositories. It leverages GitHub's Code Search API to find potentially sensitive information using customizable regex patterns and Shannon entropy analysis.
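Here is a minimal sketch of the entropy scoring mentioned above. It assumes the standard per-character Shannon entropy, H = Σ p·log₂(1/p), where p is each distinct character's frequency. `shannon_entropy` is a hypothetical name. GitSentry's own `entropy.py` also does charset classification and applies thresholds, and may differ from this.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character: H = sum(p * log2(1/p))."""
    if not s:
        return 0.0
    n = len(s)
    # p = count / n for each distinct character; log2(n / count) == log2(1 / p)
    return sum((count / n) * math.log2(n / count) for count in Counter(s).values())
```

A repeated string like "aaaaaaaa" scores 0.0 and "password" scores 2.75. A 24-character string with no repeated characters reaches log₂(24) ≈ 4.58. That gap is why random-looking tokens stand out even when no regex matches them. The cut-off values GitSentry uses are not documented here.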

Features

Feature | Description
--------|------------
📋 200+ Token Patterns | Pre-configured regex for GitHub, AWS, Stripe, Groq, Cerebras, Gemini, and more
🔍 Custom Regex | Search with your own patterns via the UI or CLI
🧮 Entropy Analysis | Shannon entropy scoring detects secrets that don't match known patterns
Extended Search | Bypass GitHub's 1,000-result limit with parallel filename-prefix partitioning
🔄 Smart Rate Limiting | Reads X-RateLimit-* headers to wait proactively instead of retrying blindly
🛑 Cancel Support | Stop any search mid-flight (extended searches stop between batches/pages)
💾 Multi-Format Export | Download results as JSON (tokens-only or detailed) or CSV
🖥️ CLI Mode | Run scans from the command line; no browser needed
🔐 Token Rotation | Automatic rotation and pool management for multiple GitHub PATs
📊 Risk Scoring | Each found token is scored as low, medium, high, or critical

🚀 Installation

Prerequisites

  • Python 3.8+
  • One or more GitHub Personal Access Tokens (PATs) with public_repo scope

Setup

# 1. Clone the repository
git clone https://github.com/Rkcr7/GitSentry.git
cd GitSentry

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure your tokens
cp .env.example .env
# Edit .env and add your GitHub token(s)

Token Configuration

Create a .env file (or copy from .env.example):

# Single token (normal search only)
GITHUB_TOKEN=ghp_yourTokenHere

# Multiple tokens (enables extended/parallel search)
GITHUB_TOKENS=ghp_token1,ghp_token2,ghp_token3,ghp_token4,ghp_token5

How many tokens do you need?

Mode | Min tokens | Recommended tokens | Parallel workers
-----|------------|--------------------|-----------------
Normal search | 1 | 1 | 1
Extended search | 2 | 5-6 | Number of tokens minus 1
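Here is a minimal sketch of how a token pool could be read from the variables above and mapped onto the table. The helper names `load_token_pool` and `plan_search` are hypothetical. Giving `GITHUB_TOKENS` precedence over `GITHUB_TOKEN` is an assumption, not documented GitSentry behavior. The sketch reads `os.environ` and assumes the `.env` file has already been loaded, for example by python-dotenv.

```python
import os
from typing import List, Mapping, Optional


def load_token_pool(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Collect PATs from GITHUB_TOKENS (comma-separated) or GITHUB_TOKEN."""
    env = os.environ if env is None else env
    # Assumption: the multi-token variable wins when both are set
    raw = env.get("GITHUB_TOKENS") or env.get("GITHUB_TOKEN") or ""
    # Split on commas, trim whitespace, drop empties, de-duplicate in order
    tokens = [t.strip() for t in raw.split(",") if t.strip()]
    return list(dict.fromkeys(tokens))


def plan_search(tokens: List[str], extended: bool) -> int:
    """Return the number of parallel workers implied by the token table."""
    if not tokens:
        raise ValueError("no GitHub token configured")
    if not extended:
        return 1
    if len(tokens) < 2:
        raise ValueError("extended search needs at least 2 tokens")
    return len(tokens) - 1
```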

Generate tokens at github.com/settings/tokens → Generate new token (classic) → select the public_repo scope.


💻 Usage

Web UI (Streamlit)

streamlit run app.py

Open http://localhost:8501 in your browser.

Workflow:

  1. Select a Token Pattern from the sidebar (or use "Custom Pattern" with your own regex)
  2. Optionally add Additional Query qualifiers (e.g., language:python, org:example)
  3. Set Result Limit and toggle Extended Search if needed
  4. Click 🚀 Start Scanning
  5. Watch real-time progress — cancel anytime with 🛑 Cancel Search
  6. Export results as JSON or CSV via the download buttons
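The three export formats in step 6 can be sketched with the standard library alone. The function `export_results` and the field names (`repository`, `path`, `token`, `risk`) are hypothetical. GitSentry's `result_processor.py` may use a different schema.

```python
import csv
import io
import json
from typing import Dict, List


def export_results(results: List[Dict[str, str]]) -> Dict[str, str]:
    """Serialize findings as tokens-only JSON, detailed JSON, and CSV text."""
    # Tokens-only: unique token strings, sorted for stable output
    tokens_only = json.dumps(sorted({r["token"] for r in results}), indent=2)
    # Detailed: every finding with all of its fields
    detailed = json.dumps(results, indent=2)
    buf = io.StringIO()
    if results:
        # Assumes every row shares the keys of the first row
        writer = csv.DictWriter(buf, fieldnames=list(results[0].keys()))
        writer.writeheader()
        writer.writerows(results)
    return {"tokens_json": tokens_only, "detailed_json": detailed, "csv": buf.getvalue()}
```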

CLI Mode

# Scan for GitHub tokens
python cli.py --pattern "GitHub Token" --limit 100

# Scan with custom regex
python cli.py --regex "ghp_[a-zA-Z0-9]{36}" --limit 200 --extended

# Export as CSV
python cli.py --pattern "Groq API Key" --output csv --limit 500

# List all available patterns
python cli.py --list-patterns

# Full help
python cli.py --help

CLI Options:

Flag | Description
-----|------------
--pattern, -p | Name of a predefined token pattern
--regex, -r | Custom regex pattern
--query, -q | Additional GitHub search qualifiers
--limit, -l | Max results (default: 100)
--extended, -e | Extended search (multi-query; bypasses the 1,000-result limit)
--cooldown, -c | Cooldown between batches in seconds (default: 40)
--output, -o | Output format: json, csv, both, or stdout
--list-patterns | List all available patterns
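If you wrap the CLI in your own scripts, you can mirror the options table with `argparse`. This is a sketch, not GitSentry's `cli.py`. Only the flag names and the documented defaults (`--limit 100`, `--cooldown 40`) come from the table. The integer types and the missing default for `--output` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented GitSentry flags."""
    p = argparse.ArgumentParser(description="GitSentry-style scan options (illustrative)")
    p.add_argument("--pattern", "-p", help="name of a predefined token pattern")
    p.add_argument("--regex", "-r", help="custom regex pattern")
    p.add_argument("--query", "-q", default="", help="additional GitHub search qualifiers")
    p.add_argument("--limit", "-l", type=int, default=100, help="max results")
    p.add_argument("--extended", "-e", action="store_true", help="multi-query extended search")
    p.add_argument("--cooldown", "-c", type=int, default=40, help="seconds between batches")
    # Default output format is not documented, so none is set here
    p.add_argument("--output", "-o", choices=["json", "csv", "both", "stdout"])
    p.add_argument("--list-patterns", action="store_true", help="list available patterns")
    return p
```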

📊 Understanding Search Modes

Standard Search

  • Uses a single query to GitHub's Code Search API
  • Limited to 1,000 results per GitHub API constraints
  • Fast — requires only 1 token

Extended Search

  • Splits your query into 38 sub-queries, one per filename prefix (a-z, 0-9, ., _: 26 + 10 + 2 prefixes)
  • Processes batches in parallel using your token pool
  • Cooldown between batches respects rate limits
  • Results are deduplicated across all sub-queries
  • Can find thousands of results beyond the normal 1,000 limit
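Counting the prefixes listed above gives 26 + 10 + 2 = 38 sub-queries. The sketch below shows the partition, batching, and deduplication steps under stated assumptions. The `filename:<prefix>` qualifier string is a hypothetical placeholder, since GitSentry's actual syntax lives in `search_query.py`. Deduplicating on `(repository, path)` is also an assumed key. No API calls are made.

```python
import string
from typing import Dict, Iterable, List

# Prefix set as listed: a-z, 0-9, "." and "_" (26 + 10 + 2 = 38)
PREFIXES = list(string.ascii_lowercase) + list(string.digits) + [".", "_"]


def build_subqueries(base_query: str) -> List[str]:
    """One sub-query per prefix; the qualifier format is a hypothetical placeholder."""
    return [f"{base_query} filename:{prefix}" for prefix in PREFIXES]


def batches(items: List[str], size: int) -> List[List[str]]:
    """Group sub-queries so each batch can run in parallel, one per worker."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def merge_results(pages: Iterable[Iterable[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Deduplicate hits from all sub-queries, keeping the first occurrence."""
    seen = set()
    merged = []
    for page in pages:
        for hit in page:
            key = (hit["repository"], hit["path"])  # assumed dedup key
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged
```

For example, with 6 tokens the table gives 5 workers, so the 38 sub-queries form 8 batches (7 of 5, then 1 of 3). Assume one sub-query per worker per batch and a cooldown only between batches. Then the default 40-second cooldown adds about 7 × 40 = 280 seconds on top of the request time.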

🔧 Configuration

Custom Token Patterns

Add patterns to token_patterns.json:

{
    "My Service API Key": "myprefix_[a-zA-Z0-9]{32}",
    "Another Pattern": "sk-[a-zA-Z0-9]{48}"
}

Tips for good patterns:

  • Include the token prefix (e.g., ghp_, sk_live_, gsk_, csk-)
  • Specify exact character sets and lengths
  • Test your regex on regex101.com before scanning
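Before a long scan, you can also check locally that every entry compiles and matches what you expect. This sketch assumes the simple `"name": "regex"` form shown above. Metadata-bearing entries handled by `token_patterns.py` are not covered. `load_patterns` and `find_matches` are hypothetical helpers.

```python
from __future__ import annotations

import json
import re


def load_patterns(text: str) -> dict[str, re.Pattern[str]]:
    """Parse simple name -> regex JSON and compile each entry, failing loudly."""
    compiled: dict[str, re.Pattern[str]] = {}
    for name, regex in json.loads(text).items():
        try:
            compiled[name] = re.compile(regex)
        except re.error as exc:
            raise ValueError(f"pattern {name!r} does not compile: {exc}") from exc
    return compiled


def find_matches(patterns: dict[str, re.Pattern[str]], text: str) -> list[tuple[str, str]]:
    """Return (pattern name, matched text) for every hit of every pattern."""
    return [(name, m.group(0)) for name, rx in patterns.items() for m in rx.finditer(text)]


if __name__ == "__main__":
    pats = load_patterns('{"My Service API Key": "myprefix_[a-zA-Z0-9]{32}"}')
    print(find_matches(pats, "API_KEY = 'myprefix_" + "x" * 32 + "'"))
```

Remember that backslashes must be doubled inside JSON strings: `"\\b"` in the file becomes the regex `\b`. An unanchored `[a-zA-Z0-9]{32}` also matches the first 32 characters of a longer string. Appending `\\b` in the JSON enforces the exact length.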

Rate Limit Handling

GitSentry reads GitHub's rate limit headers (X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After) to proactively wait before hitting limits. If rate limits are still hit:

  • Tokens are rotated automatically
  • Exponential backoff with jitter is applied
  • Extended search cooldowns are configurable (10-120 seconds)
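The header logic described above can be sketched as two small functions. This is an illustration, not GitSentry's code. The `base` and `cap` values are assumptions. The backoff uses the "full jitter" variant, a uniform draw between zero and the exponential ceiling. `requests` exposes `response.headers` as a case-insensitive mapping. With a plain dict, the keys must match exactly.

```python
import random
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Derive a proactive wait from GitHub's rate-limit response headers."""
    now = time.time() if now is None else now
    if "Retry-After" in headers:
        # GitHub sends a delay in seconds; the HTTP-date form is not handled here
        return float(headers["Retry-After"])
    if headers.get("X-RateLimit-Remaining") == "0" and "X-RateLimit-Reset" in headers:
        # Quota exhausted: X-RateLimit-Reset is a UTC epoch timestamp in seconds
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    return 0.0


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 120.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```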

🧪 Testing

# Run all unit tests
python -m pytest tests/ -v

# 52 tests covering:
#   - entropy.py (Shannon entropy, charset classification, risk scoring)
#   - result_processor.py (regex matching, deduplication, CSV export)
#   - search_query.py (query generation, keyword extraction)

📁 Project Structure

GitSentry/
├── app.py                 # Streamlit web UI
├── cli.py                 # Command-line interface
├── github_api.py          # GitHub API client (framework-agnostic)
├── thread_safe_api.py     # Thread-safe state bridge
├── config.py              # Token rotation and pool management
├── search_query.py        # Search query generation with GitHub qualifiers
├── token_patterns.py      # Pattern loader with metadata support
├── token_patterns.json    # 200+ regex patterns
├── result_processor.py    # Result processing, dedup, CSV/JSON export
├── entropy.py             # Shannon entropy analysis for secret detection
├── requirements.txt       # Python dependencies
├── .env.example           # Example environment configuration
├── conftest.py            # Pytest configuration
└── tests/                 # Unit test suite
    ├── test_entropy.py
    ├── test_result_processor.py
    └── test_search_query.py

📝 License

This project is licensed under the MIT License — see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please ensure any additions align with the ethical guidelines stated above.

👏 Acknowledgements

  • GitHub REST API for code search capabilities
  • Streamlit for the interactive web interface
  • All contributors to the token pattern database
