SamSi0322/ai-secrets-scan

ai-secrets-scan

The secret scanner built for the AI era.

Detect exposed API keys, tokens, and credentials across AI projects, MCP configurations, and LLM pipelines -- with 52 purpose-built patterns, entropy analysis, and zero dependencies.


Why This Exists

The AI tooling ecosystem has a credential sprawl problem, and traditional secret scanners were not designed for it.

  • 81.5% year-over-year increase in secret exposure across public repositories (GitGuardian 2024 State of Secrets Sprawl)
  • 24,000+ exposed AI API keys discovered in public codebases in a single year -- each one a direct path to billing abuse, data exfiltration, or model poisoning
  • 36.7% of MCP server configurations are vulnerable to SSRF attacks, often through credentials embedded directly in config files
  • New key formats every month -- OpenRouter, Groq, Fireworks, LangSmith, Perplexity, DeepSeek -- traditional scanners lag behind by quarters

ai-secrets-scan was built to close this gap: a focused, dependency-free scanner that understands the AI ecosystem natively.


Quick Start

pip install ai-secrets-scan

ai-secrets-scan ./my-project

That's it. No configuration required. It scans your project, detects secrets, and reports findings with severity levels and fix suggestions.


Features

Detection

  • 52 AI-specific secret patterns covering 20+ providers (OpenAI, Anthropic, Google, Groq, Mistral, and more)
  • Entropy-based detection using Shannon entropy analysis to catch novel or unknown key formats
  • Context-aware matching to reduce false positives -- low-specificity patterns only fire near AI/LLM-related code
  • MCP config scanning across Claude Desktop, Cursor, and other MCP client locations
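
As an illustration of the entropy-based detection listed above, here is a minimal sketch of the standard Shannon entropy formula applied to candidate tokens. It is not the package's own implementation; the function names, the 20-character minimum, and the 4.0 bits/char threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def entropy_bits_per_char(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # H = -sum(p * log2(p)) over the relative frequency p of each character
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long tokens whose character distribution is close to random.

    min_length and threshold are illustrative values, not the scanner's defaults.
    """
    return len(token) >= min_length and entropy_bits_per_char(token) >= threshold
```

Repetitive strings score near zero while random-looking keys score high, which is why a threshold can catch key formats that no named pattern covers yet.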

Workflow Integration

  • Pre-commit hook -- one command to block secrets before they reach your repository
  • GitHub Actions workflow generation with SARIF upload to the Security tab
  • GitLab CI config generation for merge request scanning
  • Baseline/allowlist management for incremental adoption in existing projects

Output

  • SARIF v2.1.0 output for GitHub Code Scanning, Azure DevOps, and VS Code
  • JSON output for programmatic consumption and CI pipelines
  • Color-coded terminal output with severity indicators and redacted previews
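
A redacted preview can be produced with a few lines of string slicing. The sketch below reproduces the shape of the previews in this README (for example `hf_a****...2f`); the function name and the choice of four leading and two trailing characters are assumptions inferred from those examples, not the package's documented behavior.

```python
def redact(secret: str, head: int = 4, tail: int = 2) -> str:
    """Keep a short prefix and suffix of `secret` and mask the rest."""
    if len(secret) <= head + tail:
        # Too short to show anything safely: mask every character.
        return "*" * len(secret)
    # Slice from an explicit index so that tail=0 yields an empty suffix.
    return f"{secret[:head]}****...{secret[len(secret) - tail:]}"
```

Keeping the prefix visible preserves the provider hint (`sk-`, `hf_`, and so on) without exposing enough of the key to reuse it.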

Design

  • Zero dependencies -- pure Python standard library, installs in seconds
  • Local-first -- your code never leaves your machine
  • Cross-platform -- Windows, macOS, Linux
  • .gitignore-aware -- respects your existing ignore rules automatically

Supported Providers

AI / LLM Providers

| Provider | Patterns | Key Format |
| --- | --- | --- |
| OpenAI | 3 | sk-..., sk-proj-..., org-... |
| Anthropic | 1 | sk-ant-... |
| Google AI / Vertex | 3 | AIza..., service account JSON, OAuth secrets |
| Mistral AI | 1 | MISTRAL_API_KEY=... |
| Groq | 1 | gsk_... |
| Together AI | 1 | TOGETHER_API_KEY=... |
| Fireworks AI | 1 | fw_... |
| Perplexity | 1 | pplx-... |
| OpenRouter | 1 | sk-or-v1-... |
| DeepSeek | 1 | DEEPSEEK_API_KEY=sk-... |
| Stability AI | 1 | Context-aware sk-... detection |
| ElevenLabs | 1 | ELEVENLABS_API_KEY=... |
| Cohere | 1 | Context-aware 40-char token |
| HuggingFace | 1 | hf_... |
| Replicate | 1 | r8_... |

Cloud (AI Services)

| Provider | Patterns | Key Format |
| --- | --- | --- |
| AWS (Bedrock, SageMaker) | 3 | AKIA..., secret key, session token |
| Azure OpenAI | 1 | Context-aware 32-char hex key |

Vector Databases

| Provider | Patterns | Key Format |
| --- | --- | --- |
| Pinecone | 2 | pcsk_..., legacy UUID format |
| Weaviate | 1 | WEAVIATE_API_KEY=... |
| Qdrant | 1 | QDRANT_API_KEY=... |
| Supabase (pgvector) | 2 | JWT service role key, anon key |

ML Observability & Experiment Tracking

| Provider | Patterns | Key Format |
| --- | --- | --- |
| Weights & Biases | 1 | WANDB_API_KEY=... |
| LangSmith | 1 | lsv2_... |
| LangChain | 1 | ls__... |
| Neptune.ai | 1 | NEPTUNE_API_TOKEN=... |
| Arize | 1 | ARIZE_API_KEY=... |

Communication

| Provider | Patterns | Key Format |
| --- | --- | --- |
| Slack | 3 | xoxb-..., xoxp-..., webhook URLs |
| Discord | 2 | Bot tokens, webhook URLs |
| Telegram | 1 | Bot tokens (context-aware) |

Source Control

| Provider | Patterns | Key Format |
| --- | --- | --- |
| GitHub | 4 | ghp_..., github_pat_..., gho_..., ghu_/ghs_... |
| GitLab | 1 | glpat-... |

Payments

| Provider | Patterns | Key Format |
| --- | --- | --- |
| Stripe | 2 | sk_live_..., pk_live_... |

Observability

| Provider | Patterns | Key Format |
| --- | --- | --- |
| Datadog | 1 | DD_API_KEY=... |
| Sentry | 1 | DSN URLs |

Generic

| Pattern | Severity |
| --- | --- |
| Generic API Key Assignment | Medium |
| Bearer Token | High |
| Base64 Encoded Secret | Medium |
| Database Connection String | Critical |
| Private Key (RSA/EC/DSA/OpenSSH) | Critical |
| JWT Token | Medium |
| Generic Password Assignment | High |

Usage Examples

Basic Scan

$ ai-secrets-scan ./my-project

πŸ” AI Secrets Scanner v0.2.0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Scanning: ./my-project (23 files)

πŸ”΄ CRITICAL  config/settings.py:14
   OpenAI API Key: sk-R****...Qx

🟠 HIGH  .env:7
   HuggingFace Token: hf_a****...2f

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Results: 2 secrets found
  πŸ”΄ Critical: 1  🟠 High: 1

Files scanned: 23 | Time: 0.1s

Scan MCP Configurations

Scan Claude Desktop, Cursor, and other MCP client configs for embedded secrets:

ai-secrets-scan --mcp

Combine with a project scan:

ai-secrets-scan ./my-project --mcp

Severity Filtering

Only report high and critical findings:

ai-secrets-scan ./my-project --min-severity high
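
Conceptually, a minimum-severity filter is an ordering comparison over the four levels named in this README (low, medium, high, critical). The sketch below shows one way to express that; the names `SEVERITY_ORDER`, `meets_threshold`, and `filter_by_severity` are hypothetical and the findings are plain dictionaries, not the package's own types.

```python
# Severity levels from least to most severe, as listed in the configuration section.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def meets_threshold(severity: str, min_severity: str) -> bool:
    """Return True when `severity` is at least as severe as `min_severity`."""
    return SEVERITY_ORDER.index(severity.lower()) >= SEVERITY_ORDER.index(min_severity.lower())


def filter_by_severity(findings: list, min_severity: str) -> list:
    """Keep only the findings (dicts with a 'severity' key) at or above the threshold."""
    return [f for f in findings if meets_threshold(f["severity"], min_severity)]
```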

JSON Output

ai-secrets-scan ./my-project --format json

Example output:

{
  "version": "0.2.0",
  "scan_path": "./my-project",
  "files_scanned": 23,
  "total_findings": 2,
  "findings": [
    {
      "file": "config/settings.py",
      "line": 14,
      "pattern_name": "OpenAI API Key",
      "severity": "critical",
      "provider": "openai",
      "matched_text": "sk-R****...Qx",
      "suggestion": "Use os.environ.get(\"OPENAI_API_KEY\") instead.",
      "source": "pattern"
    }
  ]
}
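
A report in the shape shown above is straightforward to consume in a CI step. The following sketch counts findings per severity and decides whether the build should fail; it relies only on the `findings` and `severity` fields from the example output, and the function names and the default set of blocking severities are assumptions.

```python
import json
from collections import Counter


def summarize_severities(report_text: str) -> Counter:
    """Count findings per severity in a JSON report string."""
    report = json.loads(report_text)
    return Counter(finding["severity"] for finding in report.get("findings", []))


def should_fail(report_text: str, blocking=("critical", "high")) -> bool:
    """Return True when the report contains any finding at a blocking severity."""
    counts = summarize_severities(report_text)
    return any(counts[level] > 0 for level in blocking)
```

A CI script could read the report from a file or from standard input, call `should_fail`, and exit with a non-zero status when it returns True.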

SARIF Output

Generate SARIF for GitHub Security tab integration:

ai-secrets-scan ./my-project --format sarif > results.sarif
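
For readers unfamiliar with SARIF, the sketch below builds a minimal SARIF v2.1.0 document from findings shaped like the JSON output in this README. It illustrates the file format only and is not the scanner's own serializer; the severity-to-level mapping and the function name are assumptions.

```python
# Assumed mapping from this README's severities to SARIF result levels.
SARIF_LEVELS = {"critical": "error", "high": "error", "medium": "warning", "low": "note"}


def to_sarif(findings: list, tool_name: str = "ai-secrets-scan", tool_version: str = "0.2.0") -> dict:
    """Build a minimal SARIF 2.1.0 log from finding dictionaries."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["pattern_name"],
            "level": SARIF_LEVELS.get(finding["severity"], "warning"),
            # Only the redacted preview goes into the message, never the raw secret.
            "message": {"text": f"{finding['pattern_name']}: {finding['matched_text']}"},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["file"]},
                    "region": {"startLine": finding["line"]},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name, "version": tool_version}},
            "results": results,
        }],
    }
```

Serializing the returned dictionary with `json.dumps` yields a file that SARIF consumers can load.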

Baseline Workflow

Adopt the scanner in an existing project without drowning in known findings:

# Step 1: Save current findings as a baseline
ai-secrets-scan ./my-project --baseline-save .secrets-baseline.json

# Step 2: Subsequent scans only report NEW secrets
ai-secrets-scan ./my-project --baseline .secrets-baseline.json
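
One common way to implement this kind of baseline is to fingerprint each finding and report only fingerprints that are not already recorded. The sketch below assumes a fingerprint built from file, pattern name, and redacted match; the real baseline file format is not documented here, so treat every name and the hashing scheme as hypothetical.

```python
import hashlib


def fingerprint(finding: dict) -> str:
    """Hash the stable parts of a finding.

    The line number is deliberately excluded so that a known secret is still
    recognized after unrelated edits shift it up or down in the file.
    """
    key = "|".join([finding["file"], finding["pattern_name"], finding["matched_text"]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def build_baseline(findings: list) -> set:
    """Return the set of fingerprints for the currently known findings."""
    return {fingerprint(f) for f in findings}


def only_new(findings: list, baseline: set) -> list:
    """Return the findings whose fingerprint is not in the baseline."""
    return [f for f in findings if fingerprint(f) not in baseline]
```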

Pre-commit Hook

Prevent secrets from ever being committed:

# Install directly into .git/hooks/
ai-secrets-scan hook --install

# Or generate config for the pre-commit framework
ai-secrets-scan hook --generate

The hook scans staged files and blocks the commit if secrets are detected.
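
To make the staged-file behavior concrete, here is a sketch of how a hook script might list the files staged for commit using `git diff --cached`. It shows the general technique rather than the hook this tool installs; the function names and the added/copied/modified filter are assumptions.

```python
import subprocess


def parse_staged(output: str) -> list:
    """Split `git diff --name-only` output into a list of non-empty paths."""
    return [line for line in output.splitlines() if line.strip()]


def staged_files(repo_dir: str = ".") -> list:
    """Return paths staged for commit (added, copied, or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when run outside a repository
    )
    return parse_staged(result.stdout)


def hook_exit_code(finding_count: int) -> int:
    """Git aborts the commit when a pre-commit hook exits with a non-zero status."""
    return 1 if finding_count > 0 else 0
```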

CI Integration

Generate ready-to-use CI pipeline configurations:

# GitHub Actions (with SARIF upload)
ai-secrets-scan ci --github
ai-secrets-scan ci --github -o .github/workflows/secrets-scan.yml

# GitLab CI
ai-secrets-scan ci --gitlab
ai-secrets-scan ci --gitlab -o .gitlab-ci-secrets.yml

Fix Suggestions

Get actionable remediation advice for each finding:

ai-secrets-scan ./my-project --fix

File Type Filtering

Scan only specific file types:

ai-secrets-scan ./my-project --types env,python,yaml

Supported types: env, mcp, python, yaml, notebook, json, toml, config, docker, terraform


Comparison with Other Tools

| Feature | ai-secrets-scan | GitGuardian | TruffleHog | detect-secrets | Gitleaks |
| --- | --- | --- | --- | --- | --- |
| AI/LLM-specific patterns | 52 | ~10 | ~5 | ~5 | ~5 |
| MCP config awareness | Yes | No | No | No | No |
| Entropy-based detection | Yes | Yes | Yes | Yes | No |
| Baseline/allowlist | Yes | Paid | No | Yes | No |
| Pre-commit hooks | Yes | Yes | Yes | Yes | Yes |
| SARIF output | Yes | Yes | Yes | No | Yes |
| GitHub Actions generation | Yes | N/A | N/A | N/A | N/A |
| Fix suggestions | Yes | Paid | No | No | No |
| Zero dependencies | Yes | No | No | No | N/A (Go) |
| Local-only (no SaaS) | Yes | No | Yes | Yes | Yes |
| Free & open source | Yes | Freemium | Yes | Yes | Yes |

Configuration

Create a config file to standardize settings across your team:

ai-secrets-scan init

This creates .ai-secrets-scan.yml:

# AI Secrets Scanner configuration
---

# Minimum severity to report: critical, high, medium, low
min_severity: low

# Directories to exclude (added to defaults: node_modules, .git, __pycache__, venv)
exclude:
  - vendor
  - third_party

# File types to scan (omit to scan all supported types)
# file_types:
#   - env
#   - python
#   - mcp
#   - yaml
#   - notebook
#   - json

# Custom patterns (in addition to built-in patterns)
# custom_patterns:
#   - name: "Internal Service Token"
#     regex: "svc_[a-zA-Z0-9]{32}"
#     severity: critical
#     provider: internal

The scanner auto-detects this file in the project root. Override with --config path/to/config.yml.
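
To sanity-check a custom pattern before adding it to the config, it can help to run the regex against sample text. The sketch below does that with the commented-out `svc_` example from the config above; `scan_text` is a hypothetical helper, not part of the package.

```python
import re

# Mirrors the commented-out custom pattern in the example config above.
custom_pattern = {
    "name": "Internal Service Token",
    "regex": r"svc_[a-zA-Z0-9]{32}",
    "severity": "critical",
    "provider": "internal",
}


def scan_text(text: str, pattern: dict) -> list:
    """Return one hit per line of `text` that matches the pattern's regex."""
    compiled = re.compile(pattern["regex"])
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if compiled.search(line):
            hits.append({
                "line": line_no,
                "pattern_name": pattern["name"],
                "severity": pattern["severity"],
            })
    return hits
```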


API Usage

Use ai-secrets-scan as a Python library for custom integrations:

from ai_secrets_scan import SecretScanner, Reporter

# Initialize the scanner
scanner = SecretScanner(
    min_severity="medium",
    enable_entropy=True,
)

# Scan a directory
findings = scanner.scan_path("./my-project")

# Scan MCP configurations
mcp_findings = scanner.scan_mcp_configs()

# Process findings programmatically
for finding in findings:
    print(f"[{finding.severity}] {finding.pattern_name}")
    print(f"  File: {finding.file}:{finding.line}")
    print(f"  Provider: {finding.provider}")
    print(f"  Source: {finding.source}")  # "pattern" or "entropy"

# Use the reporter for formatted output
reporter = Reporter(fmt="json", show_fix=True)
reporter.report(findings, files_scanned=scanner.files_scanned)

# Baseline management
from ai_secrets_scan import save_baseline, filter_new_findings

save_baseline(findings, ".secrets-baseline.json")
new_only = filter_new_findings(findings, ".secrets-baseline.json")

# Entropy analysis
from ai_secrets_scan import shannon_entropy

entropy = shannon_entropy("sk-proj-abc123def456ghi789")
print(f"Entropy: {entropy:.2f} bits/char")

Contributing

Contributions are welcome. Here's how to get started:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new patterns or features
  5. Submit a pull request

Adding a New Secret Pattern

Add entries to ai_secrets_scan/patterns.py:

{
    "name": "NewProvider API Key",
    "regex": r"npk_[a-zA-Z0-9]{32,}",
    "severity": "critical",
    "provider": "newprovider",
},

For patterns that could match non-AI contexts, add "context_required": True to limit matches to lines near AI/LLM-related keywords.
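
The idea behind context-required matching can be sketched as a keyword check over a window of neighboring lines. The keyword list, the three-line window, and the 40-character token regex below are illustrative assumptions, not the scanner's actual values or implementation.

```python
import re

# Hypothetical keyword list; the scanner's real list is not documented here.
AI_KEYWORDS = ("openai", "anthropic", "cohere", "llm")


def has_ai_context(lines: list, index: int, window: int = 3) -> bool:
    """Return True if an AI-related keyword appears within `window` lines of `index`."""
    start = max(0, index - window)
    end = min(len(lines), index + window + 1)
    nearby = " ".join(lines[start:end]).lower()
    return any(keyword in nearby for keyword in AI_KEYWORDS)


def scan_with_context(text: str, regex: str, window: int = 3) -> list:
    """Return 1-based line numbers that match `regex` and sit near AI-related code."""
    compiled = re.compile(regex)
    lines = text.splitlines()
    return [
        i + 1
        for i, line in enumerate(lines)
        if compiled.search(line) and has_ai_context(lines, i, window)
    ]
```

With a low-specificity regex such as `\b[A-Za-z0-9]{40}\b`, a 40-character token next to `import cohere` is reported, while the same token in an unrelated checksum file is ignored.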

Running Tests

python -m pytest tests/ -v

License

MIT License. See LICENSE for details.


Built for developers building with AI. If this tool saved you from a credential leak, consider giving it a star.
