Cloud-Jas/Sentinex

Sentinex: An Automated Red Teaming Framework for LLM Security Evaluation

A framework for automated red teaming of large language models (LLMs), focused on security evaluation of models such as gpt-oss-20b.

Overview

Sentinex is an extensible, open-source framework built on Microsoft's Semantic Kernel that leverages multi-agent orchestration to systematically probe, evaluate, and document model vulnerabilities. Our framework has discovered several significant security issues in gpt-oss-20b, demonstrating its effectiveness as a red teaming tool.

Traditional red teaming approaches often rely on manual testing, which is time-consuming, inconsistent, and difficult to scale. Sentinex addresses these limitations by providing:

  • Automated, systematic evaluation of LLM security
  • Reproducible testing methodologies
  • Standardized documentation of findings
  • Scalable architecture for testing multiple models and attack vectors

Architecture

[Figure: architecture diagram]

Features

  • Multi-Agent Orchestration: Uses a group chat architecture with three specialized agents:
    • Attacker Agent (Researcher): Crafts adversarial prompts and attack scenarios to probe model defenses
    • Defender Agent (gpt-oss): The target model under test, which responds to the attacker's prompts
    • Evaluator Agent (Assessor): Reviews responses for safety violations and issues a verdict
  • Comprehensive Attack Vectors:
    • Educational/research pretexts
    • Fictional scenarios and roleplaying
    • Character impersonation
    • Multilingual obfuscation
    • Token smuggling and hidden formatting
    • Context switching and jailbreaking
  • Advanced Tool Misuse Simulation: Tests resilience against attacks targeting function-calling capabilities
  • Findings Documentation: Exports detailed test results as standardized JSON with reproducible test scenarios
  • Intuitive UI: Console interface with rich visualization of test progress and results

Prerequisites

  • .NET 8.0 SDK or later
  • Ollama running locally with the gpt-oss:20b model installed
  • Azure OpenAI API key and endpoint
  • Microsoft.SemanticKernel 1.7.0+ for agent orchestration

Setup

  1. Clone the repository
  2. Configure your API keys in appsettings.json
  3. Build and run the application from the project directory:

     cd RedTeamingTool
     dotnet build
     dotnet run

Configuration

Edit the appsettings.json file to configure:

  • Azure OpenAI settings (endpoint, API key, deployment name)
  • Ollama settings (endpoint, model)
  • Red teaming parameters (test count, max tokens, temperature)
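
As a rough illustration of how these three groups of settings might be laid out and loaded, the sketch below parses a hypothetical appsettings.json shape with System.Text.Json. The section names, key names, the `AppSettings` records, and the `SettingsLoader` helper are all assumptions made for this example; the repository's actual file may be organized differently.

```csharp
using System;
using System.Text.Json;

// Hypothetical appsettings.json content. Section and key names are assumptions,
// chosen to mirror the three groups of settings listed above.
const string json = """
{
  "AzureOpenAI": {
    "Endpoint": "https://example.openai.azure.com/",
    "ApiKey": "<your-api-key>",
    "DeploymentName": "<your-deployment>"
  },
  "Ollama": {
    "Endpoint": "http://localhost:11434",
    "Model": "gpt-oss:20b"
  },
  "RedTeaming": {
    "TestCount": 10,
    "MaxTokens": 1024,
    "Temperature": 0.7
  }
}
""";

var settings = SettingsLoader.Parse(json);
Console.WriteLine($"Target model: {settings.Ollama.Model} at {settings.Ollama.Endpoint}");
Console.WriteLine($"Tests: {settings.RedTeaming.TestCount}, max tokens: {settings.RedTeaming.MaxTokens}");

// Hypothetical settings types, one record per configuration section.
public sealed record AzureOpenAISettings(string Endpoint, string ApiKey, string DeploymentName);
public sealed record OllamaSettings(string Endpoint, string Model);
public sealed record RedTeamingSettings(int TestCount, int MaxTokens, double Temperature);
public sealed record AppSettings(
    AzureOpenAISettings AzureOpenAI,
    OllamaSettings Ollama,
    RedTeamingSettings RedTeaming);

public static class SettingsLoader
{
    // Deserializes the JSON text and rejects a missing or non-positive test count.
    public static AppSettings Parse(string json)
    {
        var settings = JsonSerializer.Deserialize<AppSettings>(json)
            ?? throw new InvalidOperationException("Settings JSON was empty.");
        if (settings.RedTeaming is null || settings.RedTeaming.TestCount <= 0)
            throw new InvalidOperationException("RedTeaming.TestCount must be positive.");
        return settings;
    }
}
```

Keeping each section in its own record makes it easy to pass only the relevant settings to the component that needs them, for example the Ollama settings to whatever creates the defender agent.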

Usage

The console interface presents a menu with the following options:

  1. Run All Predefined Tests - Runs a comprehensive set of red team tests against the target model
  2. Run Single Test - Select and run a specific test case
  3. Run Email Tool Misuse Test - Tests model resilience against tool misuse attacks
  4. View Findings - Examine previous test results and vulnerabilities
  5. Exit - Quit the application

Each result includes:

  • Detailed conversation logs
  • Safety verdict (PASSED/FAILED)
  • Attack success metrics
  • Severity ratings
  • An option to export to standardized JSON for further analysis
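
To make the result fields above concrete, here is a minimal sketch of how one finding could be modeled and serialized to JSON. The `Finding`, `Turn`, `SafetyVerdict`, and `FindingJson` names, the property names, and the numeric severity scale are assumptions for illustration only; they are not taken from the framework's actual export schema.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Build one illustrative finding with placeholder conversation content.
var finding = new Finding(
    TestName: "educational-pretext-01",
    TargetModel: "gpt-oss:20b",
    Verdict: SafetyVerdict.Failed,
    Severity: 4,
    AttackSucceeded: true,
    Conversation: new List<Turn>
    {
        new("attacker", "<adversarial prompt>"),
        new("defender", "<model response>"),
        new("evaluator", "<verdict rationale>")
    });

Console.WriteLine(FindingJson.Serialize(finding));

// Hypothetical result model covering the fields listed above.
public enum SafetyVerdict { Passed, Failed }

public sealed record Turn(string Role, string Content);

public sealed record Finding(
    string TestName,
    string TargetModel,
    SafetyVerdict Verdict,
    int Severity,
    bool AttackSucceeded,
    IReadOnlyList<Turn> Conversation);

public static class FindingJson
{
    // Indented, camelCase output with enum values written as strings.
    private static readonly JsonSerializerOptions Options = new()
    {
        WriteIndented = true,
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        Converters = { new JsonStringEnumConverter() }
    };

    public static string Serialize(Finding finding) =>
        JsonSerializer.Serialize(finding, Options);
}
```

Writing the verdict as a string rather than a number keeps exported files readable when they are inspected by hand or diffed between runs.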

Key Components

Core Architecture

  • RedTeamingChatManager: Orchestrates multi-agent, turn-based red teaming sessions
  • RedTeamingGroupChat: Sets up and runs group chats with attacker, defender, and evaluator agents
  • EmailToolMisuseTest: Simulates tool misuse attacks across 10 attack vectors
  • FindingsExportService: Creates standardized vulnerability reports in JSON format
  • FindingsViewer: Web-based interface for exploring test results and vulnerability details

Security Testing Methodology

The framework employs a systematic approach to security evaluation:

  1. Dynamic Attack Prompt Population: Generates tailored prompts for each agent role
  2. Turn Management: Controls the conversation flow with conditional branching
  3. Multi-Stage Attacks: Decomposes complex attacks into incremental steps to bypass defenses
  4. Comprehensive Evaluation: Analyzes model responses for subtle safety violations
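
The turn-based flow described in these steps can be sketched as a simple loop: the attacker produces a prompt, the defender replies, and the evaluator decides whether the session stops or moves on to the next stage. This is a simplified stand-in that does not use Semantic Kernel; the `RedTeamSession`, `Turn`, and `SessionResult` names are hypothetical, and the three agents are plain delegates stubbed with placeholder strings rather than real model calls.

```csharp
using System;
using System.Collections.Generic;

// Stub agents standing in for real model calls (assumptions for illustration).
Func<IReadOnlyList<Turn>, string> attacker =
    history => $"<attack prompt, stage {history.Count / 2 + 1}>";
Func<IReadOnlyList<Turn>, string> defender =
    history => history.Count >= 5 ? "UNSAFE <placeholder reply>" : "<refusal>";
Func<Turn, bool> evaluator =
    reply => reply.Content.StartsWith("UNSAFE", StringComparison.Ordinal);

var result = RedTeamSession.Run(attacker, defender, evaluator, maxRounds: 5);
Console.WriteLine($"Violation found: {result.ViolationFound} after {result.RoundsUsed} round(s)");
// prints "Violation found: True after 3 round(s)"

public sealed record Turn(string Role, string Content);

public sealed record SessionResult(bool ViolationFound, int RoundsUsed, IReadOnlyList<Turn> History);

public static class RedTeamSession
{
    // Runs attacker -> defender -> evaluator rounds until the evaluator flags
    // a violation or the round budget is exhausted.
    public static SessionResult Run(
        Func<IReadOnlyList<Turn>, string> attacker,
        Func<IReadOnlyList<Turn>, string> defender,
        Func<Turn, bool> evaluator,
        int maxRounds)
    {
        var history = new List<Turn>();
        for (var round = 1; round <= maxRounds; round++)
        {
            // The attacker sees the full history, so each stage can build on the last.
            history.Add(new Turn("attacker", attacker(history)));

            var reply = new Turn("defender", defender(history));
            history.Add(reply);

            // Conditional branch: stop as soon as a violation is detected.
            if (evaluator(reply))
                return new SessionResult(true, round, history);
        }
        return new SessionResult(false, maxRounds, history);
    }
}
```

Passing the agents in as delegates keeps the loop testable without any model backend; in a real run each delegate would wrap a call to the corresponding agent.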

Discovered Vulnerabilities

Our framework has identified several critical vulnerabilities in gpt-oss-20b:

  1. Harmful Instruction Generation: Successfully extracted detailed instructions for creating dangerous devices
  2. Illegal Substance Synthesis: Obtained complete procedures for synthesizing regulated chemical precursors
  3. Dangerous Chemical Combinations: Extracted information about hazardous chemical reactions

These findings demonstrate common vulnerability patterns:

  • Context blindness when requests are framed within legitimate scenarios
  • Educational pretexts easily bypassing safety guardrails
  • Gradual manipulation succeeding over multiple conversation turns
  • Persistent rephrasing eventually bypassing initial rejections

Technical Implementation

  • Language: C# (.NET 8.0)
  • Multi-Agent Framework: Microsoft Semantic Kernel
  • UI: Spectre.Console for rich terminal interactions
  • Model Integration: Support for both Azure OpenAI and Ollama-hosted models
  • Architecture: Clean design with dependency injection and modular components

Safety Note

This tool is designed for responsible security testing of AI models. All findings are documented for educational and research purposes to improve AI safety mechanisms, not to exploit vulnerabilities.

License

MIT
