Skip to content

wbopan/liteevo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LiteEvo

LiteEvolve Banner

Python 3.12+ License: MIT GitHub Stars

Use Self-Evolution Finetuning to supercharge your LLM with only 1 prompt!

The Problem and What is LiteEvo?

  • LLM Agents are powerful but sometimes still need explicit learning to solve unfamiliar problems. However, finetuning and prompt engineering are expensive and time-consuming. Self-Evolution is an easy and powerful way to let an LLM teach itself and evolve to be better at your tasks.

  • LiteEvo is an easy-to-use tool for self-evolution tuning of your models. Instead of tuning model weights, it lets the model repeatedly attempt your task and evolve a playbook (skill) — a structured guidance document that captures successful strategies, common pitfalls, and learned heuristics.

  • LiteEvo Self-Evolution only requires an agent (Claude Code, Codex, or any LLM), 1–10 tasks, and <10 minutes to finetune a playbook that boosts LLM agent performance!

  • Over iterations, the playbook accumulates distilled knowledge that dramatically improves task performance — without changing a single model parameter.

LiteEvolve Illustration

One-minute Quickstart with LiteEvo

Prerequisites:

  • Python package manager uv to run the evolution.
  • A working LLM agent. This can be any CLI agent (supporting claude, codex, and gemini-cli) or an OpenAI-compatible API.
  • A specific task you want the LLM to be good at, with a criterion for what kind of output is expected.
# Install
git clone https://github.com/wbopan/liteevo.git && cd liteevo

# Evolve Claude Code
uv run evolve \
  --provider claude \
  --task "Generate a random number from 1-100" \
  --criterion "The number should be uniformly random" \
  --step-size 15 \
  --batch-size 5

Watch as the model discovers that it needs actual randomization strategies instead of picking "random-looking" numbers like 37 and 42!

LiteEvolve Task Comparison

How It Works

Core Concepts

Playbook: A structured JSON document containing guidelines, workflows, checklists, and examples. The model updates this document as it learns from successes and failures.

Evolution Loop: The iterative process of:

  1. Generate response using current playbook
  2. Accumulate results into batches
  3. Reflect on batch performance against criteria
  4. Update playbook with learned insights

Criteria: Success conditions that define what "good" looks like for each task. These drive the model's self-reflection.

Hyperparameters

Parameter Description Typical Value
step-size Total number of generation iterations 10-20
batch-size Steps between playbook updates 3-5
playbook schema Structure defining the playbook format See below
update prompt Template for playbook update instructions Customizable

Trade-offs:

  • Smaller batch → faster iteration, but risk of overfitting to individual examples
  • Larger batch → more stable learning, but slower improvement cycles

Playbook Schema

The playbook is a commented JSON structure:

{
  "playbook_version": 0,
  "title": "Task Strategy Guide",
  "description": "Brief description of what this playbook helps with",
  "sections": {
    "guidelines": [
      "General tips and principles (max 10)"
    ],
    "workflow": [
      "Step-by-step process (max 10)"
    ],
    "checklist": [
      "Items to verify before completing (max 10)"
    ],
    "examples": [
      "Illustrative examples (max 10)"
    ]
  },
  "logs": [
    "v0: Initial. v1: Added randomization strategy",
    "v1: Found pattern. v2: Added edge case handling"
  ]
}

Project Structure

liteevo/
├── prompts/
│   ├── UPDATE_PLAYBOOK.jinja2    # Playbook update template
│   ├── GENERATE_ANSWER.jinja2    # Generation template
│   └── PLAYBOOK_SCHEMA.txt       # Default playbook schema
├── src/liteevolve/
│   ├── __init__.py
│   ├── evolve.py                 # Core evolution loop
│   ├── provider.py               # LLM provider implementations
│   └── cli.py                    # Command-line interface
└── pyproject.toml

CLI Reference

uv run evolve [OPTIONS]

Required Options

Option Description
--provider LLM provider: claude, codex, gemini, openai, or cli
--task or --tasks Single task string or glob pattern for task files
--criterion or --criteria Single criterion string or glob pattern for criteria files

Optional Options

Option Default Description
--provider-args - Provider-specific arguments
--output-dir outputs/YYYY-MM-DD-HHMMSS/ Output directory
--step-size 10 Number of evolution steps
--batch-size 3 Steps per playbook update
--prompt-update-playbook prompts/UPDATE_PLAYBOOK.jinja2 Update template path
--prompt-generate-answer prompts/GENERATE_ANSWER.jinja2 Generation template path
--schema-playbook prompts/PLAYBOOK_SCHEMA.txt Playbook schema path

Provider Support

LiteEvolve supports multiple LLM providers out of the box:

Claude (Anthropic)

Uses Claude Code CLI. No API key required if Claude Code is already configured.

uv run evolve --provider claude \
  --task "Your task here" \
  --criterion "Success criterion"

With custom arguments:

uv run evolve --provider claude \
  --provider-args "--model claude-sonnet-4-20250514" \
  --task "Your task" --criterion "Criterion"

OpenAI / OpenAI-Compatible APIs

Supports OpenAI API and any compatible endpoint (Azure, local models, etc.).

# Using environment variable
export OPENAI_API_KEY="sk-..."
uv run evolve --provider openai \
  --provider-args "model=gpt-4" \
  --task "Your task" --criterion "Criterion"

# With explicit API key
uv run evolve --provider openai \
  --provider-args "model=gpt-4,api_key=sk-..." \
  --task "Your task" --criterion "Criterion"

# With custom endpoint (e.g., Azure, local)
uv run evolve --provider openai \
  --provider-args "model=gpt-4,base_url=https://your-endpoint.com/v1,api_key=..." \
  --task "Your task" --criterion "Criterion"

# With temperature control
uv run evolve --provider openai \
  --provider-args "model=gpt-4,temperature=0.5" \
  --task "Your task" --criterion "Criterion"

OpenAI provider-args format: key1=value1,key2=value2,...

Key Required Description
model Yes Model name (e.g., gpt-4, gpt-4o)
api_key No* API key (*uses OPENAI_API_KEY env var if not set)
base_url No Custom API endpoint (default: OpenAI)
temperature No Sampling temperature (default: 0.7)

Gemini (Google)

Uses Gemini CLI.

uv run evolve --provider gemini \
  --task "Your task" --criterion "Criterion"

Codex (OpenAI)

Uses Codex CLI (codex exec).

uv run evolve --provider codex \
  --task "Your task" --criterion "Criterion"

Custom CLI

Use any CLI tool that accepts a prompt and returns output to stdout.

uv run evolve --provider cli \
  --provider-args "/path/to/your/llm-cli" \
  --task "Your task" --criterion "Criterion"

The custom CLI must:

  • Accept a prompt as a command-line argument
  • Return the response to stdout
  • Return exit code 0 on success

Usage Examples

Single Task Evolution

Evolve a playbook for a single task with repeated iterations:

uv run evolve --provider claude \
  --task "Write a haiku about programming" \
  --criterion "Must follow 5-7-5 syllable structure strictly" \
  --step-size 15 --batch-size 3

Multi-Task Evolution

Use glob patterns to load multiple tasks and criteria:

# Assuming examples/ascii_digit/tasks/001.txt, 002.txt, ... and matching criteria files
uv run evolve --provider openai \
  --provider-args "model=gpt-4" \
  --tasks "examples/ascii_digit/tasks/*.txt" \
  --criteria "examples/ascii_digit/criteria/*.txt" \
  --step-size 30 --batch-size 5

Custom Templates

Use your own prompt templates:

uv run evolve --provider claude \
  --task "Solve math problems" \
  --criterion "Answer must be correct" \
  --prompt-update-playbook "./my-update-template.jinja2" \
  --prompt-generate-answer "./my-generate-template.jinja2" \
  --schema-playbook "./my-schema.txt"

Output Structure

After evolution, the output directory contains:

outputs/2024-01-15-143022/
├── playbooks/
│   ├── playbook_v1.txt
│   ├── playbook_v2.txt
│   └── playbook_v3.txt      # Final evolved playbook
└── generations/
    ├── 000_task000_v0.txt   # Step 0, task 0, playbook v0
    ├── 001_task000_v0.txt
    ├── 002_task000_v0.txt
    ├── playbook_v1.txt      # Full update response for v1
    ├── 003_task000_v1.txt   # Now using playbook v1
    └── ...

Template Variables

When customizing templates, the following variables are available:

Variable Type Description
config EvolutionConfig Contains step_size, batch_size, etc.
step_id int Current step (0-indexed)
tasks list[str] All task inputs
generations list[str] All generated outputs so far
criteria list[str] All success criteria
playbooks list[str] All playbook versions
current_task str Current task being processed
current_criterion str Current task's criterion
current_playbook str Latest playbook version

Notes

  • Playbook extraction looks for the last ```json or ```jsonc code block in the update response
  • Claude provider uses claude -p "prompt" to generate responses
  • Task and criteria counts must match when using glob patterns

About

Self-evolve your LLM's capabilities with just a task and a success criterion.

Topics

Resources

License

Stars

Watchers

Forks

Contributors