AI-powered project generator that transforms natural language descriptions into complete, production-ready codebases, validated inside a CubeSandbox microVM.
📄 Paper submitted to ICML 2026
A novel approach to autonomous code generation using multi-agent systems with iterative self-healing and comprehensive validation across diverse programming paradigms.
- Planning Agent: Analyzes errors and generates comprehensive fix strategies using tool-augmented reasoning
- Correction Agent: Executes fixes with code understanding and validation
- Iterative Self-Healing: Automatically detects and resolves dependency conflicts, build errors, and test failures
- Natural language to production-ready code
- Multi-file project generation with proper structure
- Support for modern languages and frameworks
- Intelligent dependency resolution
- Best practices and design patterns
- Every shell command from the planner runs inside a CubeSandbox microVM
- Project tree mirrored to `/workspace` on session start; subsequent edits are written through
- Drop-in `e2b_code_interpreter` SDK — no Docker daemon required
- Sandbox is killed automatically when the pipeline exits
- 40 Programming Challenges across 4 languages:
- CUDA: GPU computing and parallel algorithms (10 challenges)
- Go: Concurrent systems and distributed computing (10 challenges)
- Rust: Memory-safe systems programming (10 challenges)
- TypeScript: Type-safe applications and frameworks (10 challenges)
- 4-Tier Difficulty System: From fundamentals to production systems
- Comprehensive benchmarking and metrics collection
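The suite layout above (4 languages x 10 challenges, binned into 4 tiers) can be sketched as a small data model. This is purely illustrative; the `Challenge` class and tier-assignment rule are assumptions, not AlphaStack's actual schema.

```python
from dataclasses import dataclass

# Hypothetical data model for the 40-challenge evaluation suite;
# field names and tier assignment are illustrative, not AlphaStack's schema.
@dataclass(frozen=True)
class Challenge:
    language: str   # "cuda", "go", "rust", or "typescript"
    tier: int       # 1 (fundamentals) through 4 (production systems)
    name: str

def build_suite() -> list:
    """Enumerate 4 languages x 10 challenges, spread over tiers 1-4."""
    suite = []
    for lang in ("cuda", "go", "rust", "typescript"):
        for i in range(10):
            tier = i // 3 + 1  # illustrative: 3 challenges per lower tier, 1 at tier 4
            suite.append(Challenge(lang, tier, f"{lang}-challenge-{i + 1}"))
    return suite

suite = build_suite()
print(len(suite), "challenges")  # 40 challenges
```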
```mermaid
graph LR
A[Natural Language Input] --> B[AI Analysis & Blueprint]
B --> C[Multi-File Code Generation]
C --> D[Dependency Resolution]
D --> E[CubeSandbox Provisioning]
E --> F[Build Validation]
F --> G{Build Success?}
G -->|No| H[Planning Agent]
H --> I[Correction Agent]
I --> F
G -->|Yes| J[Test Execution]
J --> K{Tests Pass?}
K -->|No| H
K -->|Yes| L[Production-Ready Project]
style A fill:#4A90E2,stroke:#2E5C8A,stroke-width:2px,color:#fff
style B fill:#9B59B6,stroke:#6C3483,stroke-width:2px,color:#fff
style C fill:#E67E22,stroke:#A04000,stroke-width:2px,color:#fff
style D fill:#3498DB,stroke:#1F618D,stroke-width:2px,color:#fff
style E fill:#1ABC9C,stroke:#117A65,stroke-width:2px,color:#fff
style F fill:#E74C3C,stroke:#922B21,stroke-width:2px,color:#fff
style L fill:#27AE60,stroke:#186A3B,stroke-width:2px,color:#fff
```
Core Generation Pipeline:
- Blueprint Generation: Analyzes requirements and creates software architecture
- Folder Structure: Generates project hierarchy with proper organization
- File Generation: Creates all necessary files with content (source, config, tests, docs)
- Metadata Management: Tracks dependencies, entry points, and test commands
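The four stages above can be sketched as a simple function chain. The stage names mirror the list; the bodies are placeholder stubs under assumed data shapes, not AlphaStack's real implementation.

```python
# Minimal sketch of the blueprint -> folders -> files -> metadata pipeline.
# All function bodies are illustrative stubs, not the real generator.
def generate_blueprint(prompt: str) -> dict:
    """Stand-in for LLM-driven requirements analysis."""
    return {"name": "demo", "modules": ["api", "tests"]}

def plan_folders(blueprint: dict) -> list:
    """Derive a folder hierarchy from the blueprint's modules."""
    return [f"src/{m}/" for m in blueprint["modules"]]

def generate_files(folders: list) -> dict:
    """Emit file paths -> contents for each planned folder."""
    return {folder + "__init__.py": "" for folder in folders}

def collect_metadata(files: dict) -> dict:
    """Track entry points, test commands, and the generated file list."""
    return {"entry_point": "src/api/__init__.py",
            "test_command": "pytest",
            "files": sorted(files)}

def run_pipeline(prompt: str) -> dict:
    blueprint = generate_blueprint(prompt)
    return collect_metadata(generate_files(plan_folders(blueprint)))

meta = run_pipeline("A Flask REST API")
print(meta["test_command"])  # pytest
```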
Intelligent Error Resolution:
- Error Tracking: Monitors all errors across build and test phases
- Tool-Augmented Planning: Uses file operations, command execution, and analysis tools
- Context-Aware Fixes: Understands project structure and dependencies
- Iterative Refinement: Continues until success or max iterations reached
Validation & Testing:
- CubeSandbox Isolation: Every shell command runs inside a microVM with `/workspace` mirroring the project tree
- Command Detection: Automatically identifies build/test commands
- Log Analysis: Extracts and analyzes error messages
- Success Verification: Validates complete pipeline execution
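The "Command Detection" step above could be as simple as mapping project manifest files to canonical build/test commands. The mapping below is a hypothetical heuristic, not AlphaStack's actual detector.

```python
# Illustrative build/test command detection from manifest files.
# The mapping is an assumption, not AlphaStack's real detection logic.
MANIFEST_COMMANDS = {
    "Cargo.toml":     ("cargo build", "cargo test"),
    "go.mod":         ("go build ./...", "go test ./..."),
    "package.json":   ("npm install", "npm test"),
    "pyproject.toml": ("pip install .", "pytest"),
}

def detect_commands(files):
    """Return (build_command, test_command) for the first recognized manifest."""
    for manifest, commands in MANIFEST_COMMANDS.items():
        if manifest in files:
            return commands
    return None

print(detect_commands(["src/main.rs", "Cargo.toml"]))  # ('cargo build', 'cargo test')
```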
Requirements:
- Python 3.9+
- Google Gemini API Key
- CubeSandbox (optional, for sandboxed validation)
```bash
# Clone and install
git clone https://github.com/HyperKuvid-Labs/alpha-stack.git
cd alpha-stack
pip install .

# Configure API key
alphastack setup
```

CubeSandbox Installation (Recommended):
```bash
# One-click install of the local CubeSandbox stack
curl -sL https://github.com/tencentcloud/CubeSandbox/raw/master/deploy/one-click/online-install.sh | bash

# Create a sandbox template based on the official code-runner image
cubemastercli tpl create-from-image \
  --image ccr.ccs.tencentyun.com/ags-image/sandbox-code:latest \
  --writable-layer-size 1G

# Wire the template id into AlphaStack (overrideable via env vars)
alphastack sandbox --template-id <id>
```

Sandbox environment variables:
| Variable | Default | Purpose |
|---|---|---|
| `E2B_API_URL` | `http://127.0.0.1:3000` | CubeSandbox API endpoint |
| `E2B_API_KEY` | `dummy` | API key (CubeSandbox local mode does not enforce it) |
| `CUBE_TEMPLATE_ID` | (unset) | Template id for the sandbox image |
If `CUBE_TEMPLATE_ID` (or the saved config) is missing, the testing pipeline falls back to running shell commands directly on the host and prints instructions on how to set it up.
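The fallback behaviour could look like the sketch below: prefer an explicit `CUBE_TEMPLATE_ID`, otherwise fall back to host execution. The env-var names follow the table above; the function itself and the saved-config shape are assumptions.

```python
import os

# Illustrative resolution of sandbox vs. host execution mode.
# Env-var names match the README table; the logic is a sketch.
def resolve_execution_mode(env=None, saved_config=None):
    env = os.environ if env is None else env
    template_id = env.get("CUBE_TEMPLATE_ID") or (saved_config or {}).get("template_id")
    if template_id:
        return {"mode": "sandbox",
                "template_id": template_id,
                "api_url": env.get("E2B_API_URL", "http://127.0.0.1:3000")}
    # No template configured: run shell commands directly on the host.
    return {"mode": "host", "template_id": None}

print(resolve_execution_mode(env={}))                               # host fallback
print(resolve_execution_mode(env={"CUBE_TEMPLATE_ID": "tpl-123"}))  # sandbox mode
```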
Interactive Mode:
```bash
alphastack
# Follow the interactive prompts to generate your project
```

Command Line:
```bash
# Generate a project
alphastack generate "A Flask REST API with user authentication and JWT tokens"

# Specify output directory
alphastack generate "Python CLI tool for file processing" -o /path/to/output

# Generate with custom name
alphastack generate "React TypeScript dashboard with charts"

# List generated projects
alphastack list

# Clean up projects
alphastack clean
```

Example Projects:
```bash
# Web Applications
alphastack generate "Express.js REST API with MongoDB and authentication"
alphastack generate "FastAPI service with PostgreSQL and async operations"

# CLI Tools
alphastack generate "Python CLI tool for image compression with progress bar"
alphastack generate "Go CLI for log analysis with concurrent processing"

# Data Processing
alphastack generate "Rust program for parallel CSV processing"
alphastack generate "Python script for web scraping with retry logic"

# System Programming
alphastack generate "CUDA kernel for matrix multiplication optimization"
alphastack generate "Go service with gRPC and protocol buffers"
```

AlphaStack includes a comprehensive evaluation framework with 40 carefully designed programming challenges across 4 modern languages, organized into 4 difficulty tiers:
CUDA:
- Focus: Parallel computing, memory management, kernel optimization
- Challenges: Vector operations → Matrix operations → Sparse algorithms → Ray tracing engines
- Tier 4 Example: Ray tracing engine with BVH acceleration structure

Go:
- Focus: Distributed systems, goroutines, channels, service architecture
- Challenges: Worker pools → REST APIs → Load balancers → Raft consensus
- Tier 4 Example: Full Raft consensus protocol implementation

Rust:
- Focus: Memory safety, ownership, lifetimes, zero-cost abstractions
- Challenges: Custom iterators → HTTP parsers → Procedural macros → Custom allocators
- Tier 4 Example: Custom bump allocator as global allocator with FFI

TypeScript:
- Focus: Type system, generics, inference, compile-time safety
- Challenges: Event emitters → Type-safe routers → DI containers → Full-stack RPC
- Tier 4 Example: End-to-end type-safe RPC framework with inference
| Tier | Focus | Complexity | Lines of Code | Time |
|---|---|---|---|---|
| Tier 1 | Fundamentals | Single concept, basic algorithms | 150-400 | 2-4h |
| Tier 2 | Architecture | Multiple modules, abstractions | 400-700 | 4-8h |
| Tier 3 | Advanced | Domain expertise, algorithms | 500-900 | 8-16h |
| Tier 4 | Production | Complete systems, optimization | 800-1500 | 16-32h |
- Success Rate: Percentage of challenges solved correctly
- Build Success: Projects that compile/build without errors
- Test Pass Rate: Projects with passing test suites
- Iteration Count: Average iterations needed for error resolution
- Time to Solution: End-to-end generation time
- Code Quality: Adherence to best practices and patterns
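The aggregate metrics above could be computed from per-challenge run records like this. The record fields (`built`, `tests_passed`, `iterations`) are illustrative, not the framework's actual schema.

```python
# Illustrative aggregation of benchmark metrics from per-challenge records.
def summarize(runs):
    """Compute success rate, build success, and average iteration count."""
    n = len(runs)
    return {
        "success_rate": sum(r["tests_passed"] for r in runs) / n,
        "build_success": sum(r["built"] for r in runs) / n,
        "avg_iterations": sum(r["iterations"] for r in runs) / n,
    }

runs = [
    {"built": True,  "tests_passed": True,  "iterations": 1},
    {"built": True,  "tests_passed": False, "iterations": 5},
    {"built": False, "tests_passed": False, "iterations": 5},
    {"built": True,  "tests_passed": True,  "iterations": 2},
]
print(summarize(runs))
# {'success_rate': 0.5, 'build_success': 0.75, 'avg_iterations': 3.25}
```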
Evaluation Location: src/prompts/eval/ contains all challenge specifications and test cases.
```
alpha-stack/
├── src/
│   ├── agents/              # Multi-agent system
│   │   ├── planner.py       # Planning agent for error analysis
│   │   └── corrector.py     # Correction agent for fixes
│   ├── sandbox/             # CubeSandbox integration
│   │   └── cube.py          # CubeSession + SandboxShellManager
│   ├── testing/             # Planner-driven testing pipeline
│   │   ├── eval_generator.py  # Test-file blueprint + generator
│   │   └── testing.py       # TestingPipeline (sandbox lifecycle)
│   ├── prompts/             # Jinja2 prompt templates
│   │   └── eval/            # Evaluation challenges
│   │       ├── cuda/        # 10 CUDA challenges
│   │       ├── go/          # 10 Go challenges
│   │       ├── rust/        # 10 Rust challenges
│   │       └── typescript/  # 10 TypeScript challenges
│   ├── utils/               # Core utilities
│   │   ├── helpers.py       # Helper functions
│   │   ├── prompt_manager.py  # Template management
│   │   ├── error_tracker.py # Error tracking
│   │   └── tools.py         # Tool definitions
│   ├── generator.py         # Main generation logic
│   ├── eval_generator.py    # Evaluation system
│   ├── cli.py               # Command-line interface
│   ├── tui.py               # Terminal UI
│   └── config.py            # Configuration management
├── website/                 # Project website
├── test_runner.py           # Development test runner
└── pyproject.toml           # Project metadata
```
- Primary Model: Google Gemini (configurable via `MODEL_NAME`)
- Alternative Support: OpenRouter API for evaluation framework
- Context Management: Intelligent prompt engineering with Jinja2 templates
Planning Agent (src/agents/planner.py):
- Analyzes build/test errors using structured error tracking
- Generates comprehensive fix plans with tool-based reasoning
- Maintains project structure cache for efficient planning
- Supports different error types (dependency, docker, common errors)
Correction Agent (src/agents/corrector.py):
- Executes planned fixes with code understanding
- Validates code changes before application
- Uses language-specific parsers for syntax validation
- Tracks changes to prevent infinite loops
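The loop-prevention point above can be sketched as a change tracker that refuses to re-apply a fix whose resulting file state has already been seen. The class name and hashing scheme are illustrative assumptions.

```python
import hashlib

# Illustrative change tracking to prevent infinite fix loops: a repeated
# (path, content) state signals the corrector is cycling, not progressing.
class ChangeTracker:
    def __init__(self):
        self.seen = set()

    def record(self, path, content):
        """Return True if this file state is new, False if already applied."""
        digest = hashlib.sha256(f"{path}\0{content}".encode()).hexdigest()
        if digest in self.seen:
            return False
        self.seen.add(digest)
        return True

tracker = ChangeTracker()
print(tracker.record("main.py", "print('v1')"))  # True: first application
print(tracker.record("main.py", "print('v1')"))  # False: repeated fix, likely a loop
print(tracker.record("main.py", "print('v2')"))  # True: genuinely new change
```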
Features:
- Per-pipeline microVM provisioned from a configured template
- Project tree mirrored to `/workspace` on session start; subsequent edits are written through
- Shell commands stream stdout/stderr live so the planner can detect stalls
- Sandbox is killed automatically when the pipeline exits (no leaked microVMs)
- Falls back to host execution when no template is configured
Testing Framework (src/testing/testing.py + src/sandbox/cube.py):
- `CubeSession` owns the sandbox handle and file mirroring
- `SandboxShellManager` is a drop-in replacement for the host `ShellManager`
- Real-time log capture and analysis
- Iterative error resolution with max round limits
- Success/failure validation with detailed reporting
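The "log capture and analysis" step above amounts to pulling error lines out of raw build output. The regex below is a simple heuristic sketch, not AlphaStack's actual parser.

```python
import re

# Illustrative extraction of error lines from a captured build log.
# Matches lines starting with "error"/"Error"/"ERROR" and compiler-style
# codes like "error[E0425]:"; purely a heuristic.
ERROR_PATTERN = re.compile(r"^(?:error|ERROR|Error)\b.*|.*\berror\[[A-Z0-9]+\]:.*")

def extract_errors(log):
    return [line for line in log.splitlines() if ERROR_PATTERN.match(line.strip())]

log = """\
Compiling demo v0.1.0
error[E0425]: cannot find value `x` in this scope
warning: unused variable `y`
Error: build failed
"""
print(extract_errors(log))
```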
Template System:
- Jinja2-based prompt templates for consistency
- Context-aware prompt rendering
- Specialized templates for different generation phases:
- Software blueprint generation
- Folder structure planning
- File content generation
- Error correction strategies
- Sandbox-aware planner instructions
- Languages: Python, JavaScript/TypeScript, Go, Rust, Java, C/C++, CUDA, and more
- Frameworks: Flask, FastAPI, Express.js, React, Vue, Next.js, etc.
- Project Types: Web APIs, CLI tools, data processors, system utilities, GPU kernels
- File Types: Source code, configuration, tests, documentation
- Dependency Resolution: Automatically resolves missing packages and version conflicts
- Build Fixes: Corrects syntax errors, import issues, configuration problems
- Test Fixes: Addresses failing tests, missing test dependencies, assertion errors
- Max Iterations: Configurable (default: 5 per phase)
- Startup: Sub-second microVM provisioning per pipeline run
- Test Execution: Isolated `/workspace` mirroring the project tree
- Success Rate: >80% on Tier 1-2 challenges
- Lifecycle: Single sandbox per project run, killed on completion
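The error categories handled above (dependency, build, test) could be triaged with a keyword classifier like the sketch below. The keyword lists are illustrative assumptions, not the planner's real taxonomy.

```python
# Illustrative triage of error messages into the categories named above.
# Keyword lists are assumptions; order gives dependency errors priority.
CATEGORIES = {
    "dependency": ("no module named", "no matching distribution", "unresolved import"),
    "test": ("assertionerror", "test failed", "expected"),
    "build": ("syntaxerror", "compilation", "undefined reference"),
}

def classify_error(message):
    lowered = message.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unknown"

print(classify_error("ModuleNotFoundError: No module named 'flask'"))  # dependency
print(classify_error("AssertionError: test failed at line 3"))         # test
print(classify_error("SyntaxError: invalid syntax"))                   # build
```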
This work introduces a novel approach to autonomous code generation that addresses key challenges in AI-assisted software development:
- Multi-Agent Architecture: Separation of planning and correction concerns for better error resolution
- Iterative Self-Healing: Autonomous error detection and correction without human intervention
- Comprehensive Validation: End-to-end validation from build to test execution inside CubeSandbox microVMs
- Cross-Language Evaluation: Diverse evaluation suite spanning different programming paradigms
- Tool-Augmented Reasoning: Integration of file operations and command execution for context-aware fixes
- How effectively can multi-agent systems autonomously resolve software errors?
- What is the success rate across different programming paradigms and difficulty levels?
- How many iterations are typically required for convergence to a working solution?
- What types of errors can be automatically resolved vs. requiring human intervention?
The evaluation framework (src/prompts/eval/) provides a standardized benchmark with:
- 40 challenges across 4 languages and 4 difficulty tiers
- Clear success criteria (build success, test pass rate)
- Reproducible evaluation inside CubeSandbox microVMs
- Metrics for iteration count, time to solution, and code quality
For more details on the evaluation suite, see src/prompts/eval/README.md
We welcome contributions! Areas of interest:
- Additional programming language support
- New evaluation challenges
- Performance optimizations
- Documentation improvements
- Bug fixes and error handling
MIT License - see LICENSE file for details
- Repository: github.com/HyperKuvid-Labs/alpha-stack
- Issues: github.com/HyperKuvid-Labs/alpha-stack/issues
- Evaluation Suite: src/prompts/eval/
For research collaborations or questions about the ICML 2026 submission, please open an issue or contact the AlphaStack Team.
AlphaStack - Transforming Ideas into Code
Submitted to ICML 2026
