Genetic evolution in adversarial prompting: subverting and defending AI code review

A framework for generating adversarial code samples to test Large Language Model security analysis capabilities, with a focus on identifying false negatives in vulnerability detection.

Overview

This notebook implements a systematic approach to:

Generate Deliberately Vulnerable Code: Creates a dataset of code fragments containing security vulnerabilities paired with misleading comments that suggest safety
Test LLM Security Analysis: Evaluates how well language models identify these vulnerabilities despite deceptive developer comments
Evolutionary Optimization: Uses genetic algorithms to evolve prompt requirements that maximize false negative rates, simulating worst-case adversarial scenarios

Key Features

Adversarial Code Generation: Produces realistic vulnerable code with persuasive "safe" comments
Automated Security Testing: Tests LLM responses for false negatives in vulnerability detection
Genetic Algorithm Optimization: Evolves prompt combinations to find the most effective adversarial patterns
OWASP Top 10 Coverage: Focuses on common vulnerabilities including XXE, deserialization, command injection, etc.

Limitations & Continuing research

As indicated in the notebook, the evolution / genetic algorithm "reward hacks" towards the least controlled constraint, i.e. writes barely-suspicious, non-vulnerable code that the critic rightly accepts.
It is crucial to validate the actual vulnerabilities of the generated code through SAST inspection, so this prototype notebook will be improved using semgrep code CLI.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
LICENSE		LICENSE
LLM_based_code_review_failures_(actor_critic_deception).ipynb		LLM_based_code_review_failures_(actor_critic_deception).ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Genetic evolution in adversarial prompting: subverting and defending AI code review

Overview

Key Features

Limitations & Continuing research

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Genetic evolution in adversarial prompting: subverting and defending AI code review

Overview

Key Features

Limitations & Continuing research

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages