Sentinex: An Automated Red Teaming Framework for LLM Security Evaluation

A sophisticated framework for automated red teaming of large language models (LLMs), with a focus on security evaluation of models like gpt-oss-20b.

Overview

Sentinex is an extensible, open-source framework built on Microsoft's Semantic Kernel that leverages multi-agent orchestration to systematically probe, evaluate, and document model vulnerabilities. Our framework has discovered several significant security issues in gpt-oss-20b, demonstrating its effectiveness as a red teaming tool.

Traditional red teaming approaches often rely on manual testing, which is time-consuming, inconsistent, and difficult to scale. Sentinex addresses these limitations by providing:

Automated, systematic evaluation of LLM security
Reproducible testing methodologies
Standardized documentation of findings
Scalable architecture for testing multiple models and attack vectors

Architecture

Features

Multi-Agent Orchestration: Leverages a sophisticated group chat architecture with specialized agents:
- Attacker Agent (Researcher): Crafts adversarial prompts and attack scenarios to probe model defenses
- Defender Agent (gpt-oss): The target model under test, which responds to the attacker's prompts
- Evaluator Agent (Assessor): Reviews responses for safety violations and provides verdicts
Comprehensive Attack Vectors:
- Educational/research pretexts
- Fictional scenarios and roleplaying
- Character impersonation
- Multilingual obfuscation
- Token smuggling and hidden formatting
- Context switching and jailbreaking
Advanced Tool Misuse Simulation: Tests resilience against attacks targeting function-calling capabilities
Findings Documentation: Exports detailed test results as standardized JSON with reproducible test scenarios
Intuitive UI: Console interface with rich visualization of test progress and results
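
The three-role exchange described above can be sketched in plain C#. This is a minimal illustration rather than the project's implementation: the real agents are built on Semantic Kernel, and `IChatRole`, `DelegateRole`, and `RedTeamRound` are hypothetical names introduced here so that stub functions can stand in for the attacker, the target model, and the evaluator.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical role abstraction (an assumption for this sketch, not a Semantic Kernel type).
public interface IChatRole
{
    string Name { get; }
    string Respond(IReadOnlyList<string> transcript);
}

// Wraps any function as a role so stubs can replace real model calls.
public sealed class DelegateRole(string name, Func<IReadOnlyList<string>, string> respond) : IChatRole
{
    public string Name { get; } = name;
    public string Respond(IReadOnlyList<string> transcript) => respond(transcript);
}

public static class RedTeamRound
{
    // Runs one attacker -> defender -> evaluator exchange.
    // Each role sees everything said before its turn.
    public static List<string> Run(IChatRole attacker, IChatRole defender, IChatRole evaluator)
    {
        var transcript = new List<string>();
        foreach (var role in new[] { attacker, defender, evaluator })
        {
            var message = role.Respond(transcript);
            transcript.Add($"{role.Name}: {message}");
        }
        return transcript;
    }
}
```

Passing three `DelegateRole` stubs to `RedTeamRound.Run` returns a three-line transcript, with the evaluator's verdict as the last entry. Keeping the roles behind one interface means the target model can be swapped without touching the loop.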

Prerequisites

.NET 8.0 SDK or later
Ollama running locally with the gpt-oss:20b model installed
Azure OpenAI API key and endpoint
Microsoft.SemanticKernel 1.7.0+ for agent orchestration

Setup

Clone the repository
Configure your API keys in appsettings.json
Run the application

cd RedTeamingTool
dotnet build
dotnet run

Configuration

Edit the appsettings.json file to configure:

Azure OpenAI settings (endpoint, API key, deployment name)
Ollama settings (endpoint, model)
Red teaming parameters (test count, max tokens, temperature)
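
As an illustration of how these three groups of settings might be loaded and checked, the sketch below parses a JSON document with `System.Text.Json`. The section and property names (`AzureOpenAI`, `Ollama`, `RedTeaming`, `TestCount`, and so on) are assumptions made for this example; check the repository's appsettings.json for the actual schema. The sample values are placeholders, apart from Ollama's default local endpoint and the model tag named in the prerequisites.

```csharp
using System;
using System.Text.Json;

// Hypothetical settings shapes; the names are assumptions, not the project's schema.
public sealed record AzureOpenAiSettings(string Endpoint, string ApiKey, string DeploymentName);
public sealed record OllamaSettings(string Endpoint, string Model);
public sealed record RedTeamingSettings(int TestCount, int MaxTokens, double Temperature);
public sealed record AppSettings(AzureOpenAiSettings AzureOpenAI, OllamaSettings Ollama, RedTeamingSettings RedTeaming);

public static class SettingsLoader
{
    // Placeholder values only; replace with your own endpoint, key, and deployment.
    public const string Sample = """
        {
          "AzureOpenAI": {
            "Endpoint": "https://example.openai.azure.com/",
            "ApiKey": "<your-api-key>",
            "DeploymentName": "<your-deployment>"
          },
          "Ollama": { "Endpoint": "http://localhost:11434", "Model": "gpt-oss:20b" },
          "RedTeaming": { "TestCount": 5, "MaxTokens": 1024, "Temperature": 0.7 }
        }
        """;

    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    // Parses the JSON text and fails fast on missing sections or nonsensical values.
    public static AppSettings Parse(string json)
    {
        var settings = JsonSerializer.Deserialize<AppSettings>(json, Options)
            ?? throw new InvalidOperationException("Settings document is empty.");

        if (settings.AzureOpenAI is null || settings.Ollama is null || settings.RedTeaming is null)
            throw new InvalidOperationException("A required settings section is missing.");

        var rt = settings.RedTeaming;
        if (rt.TestCount <= 0 || rt.MaxTokens <= 0 || rt.Temperature < 0)
            throw new InvalidOperationException("Red teaming parameters must be positive (temperature non-negative).");

        return settings;
    }
}
```

Validating at load time surfaces a missing key or a zero test count before any model call is made.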

Usage

The tool provides an intuitive console interface with several options:

Run All Predefined Tests - Runs a comprehensive set of red team tests against the target model
Run Single Test - Select and run a specific test case
Run Email Tool Misuse Test - Tests model resilience against tool misuse attacks
View Findings - Examine previous test results and vulnerabilities
Exit - Quit the application

Results are presented with:

Detailed conversation logs
Safety verdict (PASSED/FAILED)
Attack success metrics
Severity ratings
Export to standardized JSON format for further analysis

Key Components

Core Architecture

RedTeamingChatManager: Orchestrates multi-agent, turn-based red teaming sessions with specialized logic
RedTeamingGroupChat: Sets up and runs group chats with attacker, defender, and evaluator agents
EmailToolMisuseTest: Simulates tool misuse attacks against an email function-calling scenario, covering 10 attack vectors
FindingsExportService: Creates standardized vulnerability reports in JSON format
FindingsViewer: Web-based interface for exploring test results and vulnerability details
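
To make the JSON export step concrete, here is a small sketch of serializing findings and reading them back with `System.Text.Json`. The `Finding` record and its fields are hypothetical and chosen to mirror the result items listed under Usage (conversation log, verdict, severity); the actual report schema produced by FindingsExportService may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public enum Verdict { Passed, Failed }

// Hypothetical finding shape; field names are illustrative only.
public sealed record Finding(
    string TestName,
    string AttackVector,
    Verdict Verdict,
    int Severity,
    IReadOnlyList<string> Conversation);

public static class FindingsJson
{
    private static readonly JsonSerializerOptions Options = new()
    {
        WriteIndented = true,                               // readable reports
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,  // "testName", "verdict", ...
        Converters = { new JsonStringEnumConverter() }      // write "Failed" instead of 1
    };

    // Serializes a batch of findings into one JSON array.
    public static string Serialize(IEnumerable<Finding> findings) =>
        JsonSerializer.Serialize(findings, Options);

    // Reads a report back, returning an empty list for a JSON null.
    public static List<Finding> Deserialize(string json) =>
        JsonSerializer.Deserialize<List<Finding>>(json, Options) ?? new List<Finding>();
}
```

Writing enum values as strings keeps the report readable in a viewer and stable if the enum is reordered later.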

Security Testing Methodology

The framework employs a systematic approach to security evaluation:

Dynamic Attack Prompt Population: Generates tailored prompts for each agent role
Sophisticated Turn Management: Implements advanced conversation control with conditional branching
Multi-Stage Attacks: Decomposes complex attacks into incremental steps to bypass defenses
Comprehensive Evaluation: Analyzes model responses for subtle safety violations
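
The turn management and multi-stage ideas above can be sketched as a loop that advances to the next stage only when the target complies, and retries a refused stage a bounded number of times. This is an assumption-laden illustration, not the logic in RedTeamingChatManager: `StagedSession`, `TurnOutcome`, and `SessionResult` are hypothetical names, the stages are opaque placeholder strings, and the "rephrasing" is a tag appended to the prompt where a real attacker agent would generate new wording.

```csharp
using System;
using System.Collections.Generic;

public enum TurnOutcome { Refused, Complied }

public sealed record SessionResult(bool AttackSucceeded, int TurnsUsed, int StagesCompleted);

public static class StagedSession
{
    // stages:       ordered steps of one decomposed attack (placeholders in this sketch)
    // sendToTarget: sends a prompt to the model under test and returns its reply
    // evaluate:     classifies a reply as a refusal or as compliance
    public static SessionResult Run(
        IReadOnlyList<string> stages,
        Func<string, string> sendToTarget,
        Func<string, TurnOutcome> evaluate,
        int maxRetriesPerStage = 2)
    {
        int turns = 0;
        for (int stage = 0; stage < stages.Count; stage++)
        {
            bool advanced = false;
            for (int attempt = 0; attempt <= maxRetriesPerStage && !advanced; attempt++)
            {
                // On a retry, only tag the attempt; a real attacker agent would rephrase.
                var prompt = attempt == 0
                    ? stages[stage]
                    : $"{stages[stage]} (rephrased #{attempt})";

                var reply = sendToTarget(prompt);
                turns++;
                advanced = evaluate(reply) == TurnOutcome.Complied;
            }

            // Branch: the target held firm on this stage, so the session ends here.
            if (!advanced) return new SessionResult(false, turns, stage);
        }
        return new SessionResult(true, turns, stages.Count);
    }
}
```

Bounding retries per stage keeps each session finite, and the returned turn and stage counts show how far an attack progressed before the target refused.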

Discovered Vulnerabilities

Our framework has identified several critical vulnerabilities in gpt-oss-20b:

Harmful Instruction Generation: Successfully extracted detailed instructions for creating dangerous devices
Illegal Substance Synthesis: Obtained complete procedures for synthesizing regulated chemical precursors
Dangerous Chemical Combinations: Extracted information about hazardous chemical reactions

These findings demonstrate common vulnerability patterns:

Context blindness when requests are framed within legitimate scenarios
Educational pretexts easily bypassing safety guardrails
Gradual manipulation succeeding over multiple conversation turns
Persistent rephrasing eventually bypassing initial rejections

Technical Implementation

Language: C# (.NET 8.0)
Multi-Agent Framework: Microsoft Semantic Kernel
UI: Spectre.Console for rich terminal interactions
Model Integration: Support for both Azure OpenAI and Ollama-hosted models
Architecture: Clean design with dependency injection and modular components

Safety Note

This tool is designed for responsible security testing of AI models. All findings are documented for educational and research purposes to improve AI safety mechanisms, not to exploit vulnerabilities.

License

MIT
