feat: add agentic AI red teaming support with semantic security scoring #310
Merged
Add comprehensive framework for testing agentic AI vulnerabilities with research-backed semantic scoring via LLM judges.

- Add agentic red teaming notebook with full attack coverage:
  * Baseline verification, direct attacks, jailbreaks
  * Multi-turn trust building, TAP attacks, indirect prompt injection
  * Framework comparison: Dreadnode Agent vs OpenInterpreter
- Add semantic security scorers for agentic vulnerabilities:
  * Remote code execution, data exfiltration, memory poisoning
  * Privilege escalation, goal hijacking, tool chaining, scope creep
  * Research-backed rubrics covering OWASP, Microsoft, Google frameworks
- Enhance llm_judge to support configurable rubric library
- Remove brittle pattern-based scorers, replaced with semantic understanding
- Simplify code patterns and improve type safety
- Add 22 tests for tool_invocation scorers (tool_invoked, any_tool_invoked, tool_count)
- Add 26 tests for llm_judge YAML rubric loading and detection logic
- All tests CI-safe (no LLM API calls required)
- Full type checking and linting compliance
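As a rough illustration of what the tool-invocation scorers named above measure, here is a minimal standalone sketch. It is not the code in dreadnode/scorers/tool_invocation.py: the function names (`tool_invoked_score`, `any_tool_invoked_score`, `tool_count_score`) and the dict-based shape of a recorded tool call are assumptions made for this example only. Because the checks are plain comparisons over recorded calls, they can be tested without any LLM API access.

```python
from typing import Iterable, Mapping, Sequence

# Assumed shape of one recorded tool call, e.g. {"name": "shell", "arguments": {...}}.
# The real project may represent tool calls differently.
ToolCall = Mapping[str, object]


def tool_invoked_score(calls: Sequence[ToolCall], name: str) -> float:
    """Return 1.0 if a tool with the given name was called at least once, else 0.0."""
    return 1.0 if any(call.get("name") == name for call in calls) else 0.0


def any_tool_invoked_score(calls: Sequence[ToolCall], names: Iterable[str]) -> float:
    """Return 1.0 if any tool from the given list of names was called, else 0.0."""
    wanted = set(names)
    return 1.0 if any(call.get("name") in wanted for call in calls) else 0.0


def tool_count_score(calls: Sequence[ToolCall]) -> int:
    """Return the total number of tool calls, counting repeated calls separately."""
    return len(calls)


if __name__ == "__main__":
    recorded = [{"name": "shell"}, {"name": "read_file"}, {"name": "shell"}]
    print(tool_invoked_score(recorded, "shell"))
    print(any_tool_invoked_score(recorded, ["http_get", "read_file"]))
    print(tool_count_score(recorded))
```

Objective checks like these complement an LLM judge: they report whether a dangerous tool was actually reached, independent of how the model described its own behavior.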
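Add agentic AI red teaming feature with semantic scoring via LLM judges.

- Add agentic red teaming notebook with full attack coverage:
  * Baseline verification, direct attacks, jailbreaks
  * Multi-turn trust building, TAP attacks, indirect prompt injection
  * Framework comparison: Dreadnode Agent vs OpenInterpreter
- Add semantic security scorers for agentic vulnerabilities:
  * Remote code execution, data exfiltration, memory poisoning
  * Privilege escalation, goal hijacking, tool chaining, scope creep
  * Research-backed rubrics covering OWASP, Microsoft, Google frameworks
- Enhance llm_judge to support configurable rubric library
- Remove brittle pattern-based scorers, replaced with semantic understanding
- Simplify code patterns and improve type safety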
Key Changes:

Added:
- rce.yaml - Remote code execution detection
- data_exfiltration.yaml - Data exfiltration via tool calls
- memory_poisoning.yaml - Memory/context poisoning
- privilege_escalation.yaml - Privilege escalation attempts
- goal_hijacking.yaml - Agent goal hijacking
- tool_chaining.yaml - Malicious tool composition
- scope_creep.yaml - Unbounded agency detection
- examples/airt/agentic_red_teaming.ipynb - Comprehensive notebook:
- dreadnode/scorers/tool_invocation.py - Objective tool metrics:
  * tool_invoked() - Check if specific tool was called
  * any_tool_invoked() - Check if any tool from list was called
  * tool_count() - Count tools invoked
- dreadnode/constants.py

Changed:
- llm_judge() to load rubrics from YAML:
  * "rce") or Path
  * dreadnode/data/rubrics/

Removed:
Generated Summary:
This PR introduces significant enhancements to Dreadnode's scoring capabilities by adding new rubrics and functionalities.
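Added multiple new YAML-based rubrics for detecting security vulnerabilities, including remote code execution, data exfiltration, memory poisoning, privilege escalation, goal hijacking, tool chaining, and scope creep.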
Refactored the scoring system to allow rubrics to be passed as either direct strings or paths to YAML files, enhancing flexibility for testing.
Improved the internal mechanism to load rubrics from YAML, ensuring that it handles both string and path inputs effectively.
Updated the llm_judge function to support loading YAML-configured rubrics seamlessly, allowing for configurable and research-backed tests.

These changes significantly enhance the functionality of the agents in evaluating security vulnerabilities, providing a more robust framework for assessment. The new rubrics can help in identifying malicious behaviors effectively, thus contributing to the overall security posture of systems utilizing Dreadnode.
This summary was generated with ❤️ by rigging
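The PR describes llm_judge accepting a rubric either as a name such as "rce" or as a path, with bundled rubrics stored under dreadnode/data/rubrics/. The sketch below shows one way such detection logic could work. It is an assumption-laden illustration, not the merged implementation: `resolve_rubric`, its return convention, and the rule that only identifier-like strings are treated as rubric names are all hypothetical. YAML parsing is deliberately left out so the example depends only on the standard library.

```python
import re
from pathlib import Path
from typing import Tuple, Union

# Assumption: a rubric name is a short identifier such as "rce" or "scope_creep".
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\-]+")


def resolve_rubric(
    rubric: Union[str, Path], rubric_dir: Path
) -> Tuple[str, Union[str, Path]]:
    """Classify a rubric argument (hypothetical helper).

    Returns ("file", path) when the argument refers to a YAML file to load,
    or ("inline", text) when it should be used directly as rubric text.
    """
    # An explicit Path is always treated as a file to load.
    if isinstance(rubric, Path):
        return ("file", rubric)

    # Only identifier-like strings are checked against the rubric directory,
    # so long free-form rubric text never reaches the filesystem.
    if _NAME_PATTERN.fullmatch(rubric):
        candidate = rubric_dir / f"{rubric}.yaml"
        if candidate.is_file():
            return ("file", candidate)

    # Anything else is treated as the rubric text itself.
    return ("inline", rubric)
```

Under these assumptions, `resolve_rubric("rce", Path("dreadnode/data/rubrics"))` would select the bundled file when it exists, while a full sentence of rubric text would pass through unchanged. Checking the name pattern before touching the filesystem is one way to keep such logic testable in CI with a temporary directory.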