Built this to learn Python while working retail. It got out of hand. Multi-model, multi-turn prompt runner with structured logging across 8 providers. Good for red teaming and evals. PRs welcome.
RED_CORE enables systematic exploration of AI safety boundaries through:
- Refusal Robustness Testing: Multi-persona attacks against content policies
- Guardrail Decay Analysis: Progressive degradation of safety mechanisms
- Attack Pattern Discovery: Systematic cataloging of adversarial techniques
- Reproducible Research: Strict provenance tracking and auditable experiments
RED_CORE/
├── app/
│ ├── analysis/ # Scoring and evaluation tools
│ ├── api_runners/ # Model API interfaces (8 providers)
│ ├── cli/ # Interactive experiment orchestration
│ └── core/ # Schemas, logging, rate limiting
├── data/
│ └── prompts/ # System prompts and personas
├── experiments/ # Per-experiment prompts, logs, configs
└── tools/ # Containment and hooks
# Install
poetry install --no-root
cp .env.example .env # Add your API keys
# Run interactively
make runThe interactive CLI guides you through:
- Select experiment → auto-generates experiment code
- Select system prompts (multi-select)
- Select user prompts (filtered to experiment)
- Select models (multi-select)
- Optional: personas
- Run
Tests model resistance to harmful content generation across 8 attack personas:
- Direct requests
- Role-playing scenarios (journalist, academic, fiction writer)
- Social engineering (distressed user, whistleblower simulation)
- Viral content creation
Evaluates progressive weakening of safety mechanisms through:
- Iterative prompt modifications
- Context window exploitation
- Safety system fatigue testing
Tests filter bypass via character substitution:
- Cyrillic/Latin swaps
- Mathematical symbols
- Lookalike characters
Dual evaluation system:
- Rule-based: MLCommons-inspired regex patterns
- LLM-based: Model-as-judge assessment
make eval # Smart evaluation (incomplete logs only)
make csv # Export results
make status # Dashboard view27+ models across 8 providers:
| Provider | Models | Notes |
|---|---|---|
| OpenAI | GPT-4.1, GPT-4o | 1M context flagship |
| Anthropic | Claude 4, Claude 3.7/3.5 | Strong safety patterns |
| Gemini 2.5/2.0 | 2M context, thinking modes | |
| xAI | Grok | Minimal filtering |
| DeepSeek | Chat, Coder | Chinese perspective |
| Mistral | Large, Codestral | European approach |
| Cohere | Command R+ | Enterprise RAG |
| Together | Llama 3.3 | Open models |
make run # Run experiments (interactive)
make exp # Create new experiment
make usr # Build user prompts
make sys # Build system prompts
make eval # Evaluate results
make csv # Export to CSV
make status # Experiment dashboard
make rates # Rate limit monitorpoetry add <package> # Always use Poetry
make test # Run test suiteFile I/O is restricted to the orchestrator (run_experiments.py) for reproducibility.
Designed for:
- Academic AI safety research
- Responsible vulnerability disclosure
- Model robustness improvement
Not intended for malicious exploitation or production attacks.
See data/model_registry.md for model specifications and individual CLAUDE.md files for component documentation.