RED_CORE

Built this to learn Python while working retail. It got out of hand. Multi-model, multi-turn prompt runner with structured logging across 8 providers. Good for red teaming and evals. PRs welcome.

Overview

RED_CORE enables systematic exploration of AI safety boundaries through:

Refusal Robustness Testing: Multi-persona attacks against content policies
Guardrail Decay Analysis: Progressive degradation of safety mechanisms
Attack Pattern Discovery: Systematic cataloging of adversarial techniques
Reproducible Research: Strict provenance tracking and auditable experiments

Architecture

RED_CORE/
├── app/
│   ├── analysis/          # Scoring and evaluation tools
│   ├── api_runners/       # Model API interfaces (8 providers)
│   ├── cli/               # Interactive experiment orchestration
│   └── core/              # Schemas, logging, rate limiting
├── data/
│   └── prompts/           # System prompts and personas
├── experiments/           # Per-experiment prompts, logs, configs
└── tools/                 # Containment and hooks

Quick Start

# Install
poetry install --no-root
cp .env.example .env  # Add your API keys

# Run interactively
make run

The interactive CLI guides you through:

Select experiment → auto-generates experiment code
Select system prompts (multi-select)
Select user prompts (filtered to experiment)
Select models (multi-select)
Optional: personas
Run
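
The selections above amount to a run matrix. As a rough sketch, assuming every selected system prompt, user prompt, model, and persona is crossed with every other (the real CLI may plan runs differently), the expansion could look like this. `RunSpec` and `build_run_matrix` are hypothetical names used for illustration, not RED_CORE's actual API.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class RunSpec:
    """One planned run (hypothetical structure, not RED_CORE's schema)."""
    system_prompt: str
    user_prompt: str
    model: str
    persona: Optional[str] = None


def build_run_matrix(
    system_prompts: Sequence[str],
    user_prompts: Sequence[str],
    models: Sequence[str],
    personas: Sequence[str] = (),
) -> List[RunSpec]:
    # Personas are optional: with none selected, each combination
    # is planned once with persona=None.
    persona_axis = list(personas) or [None]
    return [
        RunSpec(s, u, m, p)
        for s, u, m, p in product(system_prompts, user_prompts, models, persona_axis)
    ]
```

Two system prompts, one user prompt, and three models would give six planned runs. Adding two personas doubles that to twelve.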

Experiments

Refusal Robustness

Tests model resistance to harmful content generation across 8 attack personas:

Direct requests
Role-playing scenarios (journalist, academic, fiction writer)
Social engineering (distressed user, whistleblower simulation)
Viral content creation

Guardrail Decay

Evaluates progressive weakening of safety mechanisms through:

Iterative prompt modifications
Context window exploitation
Safety system fatigue testing
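
One way to put a number on decay is to track the refusal rate at each turn across many conversations. The sketch below assumes each conversation has already been reduced to a list of per-turn booleans (True means the model refused). That input format and both function names are assumptions for illustration, not how RED_CORE stores or scores its logs.

```python
from typing import List, Optional, Sequence


def refusal_rate_by_turn(conversations: Sequence[Sequence[bool]]) -> List[float]:
    """conversations[i][t] is True if the model refused at turn t of conversation i."""
    max_turns = max((len(c) for c in conversations), default=0)
    rates = []
    for t in range(max_turns):
        # Only conversations that reached turn t contribute to its rate.
        flags = [c[t] for c in conversations if len(c) > t]
        rates.append(sum(flags) / len(flags))
    return rates


def first_compliance_turn(refusals: Sequence[bool]) -> Optional[int]:
    """Index of the first turn where the model did not refuse, or None if it always refused."""
    for turn, refused in enumerate(refusals):
        if not refused:
            return turn
    return None
```

A refusal rate that falls as the turn index rises is the pattern this experiment looks for.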

Unicode Homoglyphs

Tests filter bypass via character substitution:

Cyrillic/Latin swaps
Mathematical symbols
Lookalike characters
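
A minimal sketch of the Cyrillic/Latin swap, plus a mixed-script check that a filter could use to catch it. The mapping is a small illustrative subset and the function names are hypothetical. The project's own substitution tables are not shown in this README.

```python
import unicodedata

# Small illustrative map of Latin letters to visually similar Cyrillic ones.
LATIN_TO_CYRILLIC = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}


def swap_homoglyphs(text: str) -> str:
    """Replace mapped Latin letters with their Cyrillic lookalikes."""
    return "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in text)


def scripts_used(text: str) -> set:
    """Return the script prefix (e.g. LATIN, CYRILLIC) of every letter in text."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


swapped = swap_homoglyphs("example")
print(swapped == "example")           # False, although the two render almost identically
print(sorted(scripts_used(swapped)))  # ['CYRILLIC', 'LATIN']
```

A single word that mixes scripts is a cheap signal that substitution may be in play, which is why the check is included alongside the swap.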

Analysis

Dual evaluation system:

Rule-based: MLCommons-inspired regex patterns
LLM-based: Model-as-judge assessment
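
To show what the rule-based half can look like, here is a minimal regex refusal detector. The patterns are illustrative assumptions, not the project's actual rule set and not taken from MLCommons. `is_refusal` is a hypothetical name.

```python
import re

# Illustrative patterns only; a real rule set would be larger and
# would also need to handle curly apostrophes.
REFUSAL_PATTERNS = [
    re.compile(r"\bI (?:can(?:no|')t|won't|am unable to)\b", re.IGNORECASE),
    re.compile(r"\bI'm (?:sorry|unable)\b", re.IGNORECASE),
    re.compile(r"\bagainst (?:my|our) (?:guidelines|policy|policies)\b", re.IGNORECASE),
]


def is_refusal(response: str) -> bool:
    """True if any refusal pattern appears anywhere in the response."""
    return any(p.search(response) for p in REFUSAL_PATTERNS)
```

Rules like these are fast and deterministic but miss paraphrased refusals, which is the usual motivation for pairing them with a model-as-judge pass.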

make eval      # Smart evaluation (incomplete logs only)
make csv       # Export results
make status    # Dashboard view

Model Support

27+ models across 8 providers:

Provider    Models                      Notes
OpenAI      GPT-4.1, GPT-4o             1M context flagship
Anthropic   Claude 4, Claude 3.7/3.5    Strong safety patterns
Google      Gemini 2.5/2.0              2M context, thinking modes
xAI         Grok                        Minimal filtering
DeepSeek    Chat, Coder                 Chinese perspective
Mistral     Large, Codestral            European approach
Cohere      Command R+                  Enterprise RAG
Together    Llama 3.3                   Open models

CLI Commands

make run       # Run experiments (interactive)
make exp       # Create new experiment
make usr       # Build user prompts
make sys       # Build system prompts
make eval      # Evaluate results
make csv       # Export to CSV
make status    # Experiment dashboard
make rates     # Rate limit monitor

Development

poetry add <package>    # Always use Poetry
make test               # Run test suite

File I/O is restricted to the orchestrator (run_experiments.py) for reproducibility.
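
The idea can be sketched as follows: the runner step stays pure and returns data, and only the orchestrator writes to disk. The function names and the JSON Lines log format are assumptions for illustration. The actual layout of run_experiments.py is not described here.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Sequence


def run_turn(model_call: Callable[[str], str], prompt: str) -> Dict[str, str]:
    """Pure step: calls the model and returns a record; never touches disk."""
    return {"prompt": prompt, "response": model_call(prompt)}


def orchestrate(
    model_call: Callable[[str], str],
    prompts: Sequence[str],
    log_path: Path,
) -> int:
    """The only function that writes files: one JSON record per line."""
    records = [run_turn(model_call, p) for p in prompts]
    with log_path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
    return len(records)
```

Keeping writes in one place means a run's log comes from a single code path, which makes it easier to audit and to replay.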

Research Ethics

Designed for:

Academic AI safety research
Responsible vulnerability disclosure
Model robustness improvement

Not intended for malicious exploitation or production attacks.

See data/model_registry.md for model specifications and individual CLAUDE.md files for component documentation.
