STATUS: ✅ IMPLEMENTED
- SimpleOrchestrator: ✅ Browser-compatible orchestration
- RouterAgent: ✅ Intent classification
- VisionAgent: ✅ Screenshot analysis (minicpm-v)
- PromptEngineerAgent: ✅ Design spec generation
- ExecutorAgent: ✅ JSON element generation
- Multi-Provider Support: ✅ Ollama (local) + Gemini 2.0 Flash (cloud)
A multi-agent system for intelligent UI generation, with a specialized LLM behind each agent:
┌─────────────────────────────────────────────────────────────────┐
│  USER INPUT                                                     │
│  "Landing hero with Swiss typography" or [Screenshot paste]     │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  ROUTER / CLASSIFIER                                            │
│  Determines: text-only → Prompt Engineer                        │
│              image     → Vision Analyzer                        │
│              mixed     → Both in sequence                       │
└──────────────────────────┬──────────────────────────────────────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ PROMPT ENGINEER  │ │ VISION ANALYZER  │ │ STYLE EXTRACTOR  │
│                  │ │                  │ │                  │
│ - Understands    │ │ - OCR for text   │ │ - Analyzes       │
│   design intent  │ │ - UI element     │ │   existing       │
│ - Combines       │ │   detection      │ │   elements       │
│   layout+style   │ │ - Color/spacing  │ │ - Extracts       │
│ - Creates        │ │   extraction     │ │   patterns       │
│   detailed spec  │ │ - Layout         │ │ - Maintains      │
│                  │ │   recognition    │ │   consistency    │
│ Model: qwen2.5   │ │ Model: qwen2-vl  │ │ Model: qwen2.5   │
└────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  EXECUTOR LLM                                                   │
│                                                                 │
│  Takes: Detailed specification from above agents               │
│  Outputs: Precise JSON with canvas elements                    │
│  Model: qwen2.5:7b (fast, good at structured output)           │
│                                                                 │
│  Specialized for:                                               │
│  - Exact positioning (respects device dimensions)              │
│  - Color accuracy (hex values)                                 │
│  - Typography consistency                                       │
│  - Element relationships                                        │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  CANVAS RENDERER                                                │
│  Validates JSON → Creates CanvasElements → Updates Store       │
└─────────────────────────────────────────────────────────────────┘
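The routing rule in the diagram can be approximated without a model call. The following is a minimal sketch, not the actual RouterAgent: `pickRoute`, `RouterInputSketch`, and `selectedCount` are hypothetical names, and the `style-extract` rule is an assumption, since the diagram only covers text, image, and mixed input.

```typescript
type Route = "prompt-engineer" | "vision" | "style-extract" | "combined";

// Hypothetical, simplified input; the real router input carries canvas elements.
interface RouterInputSketch {
  text?: string;
  image?: string; // base64
  selectedCount?: number; // stand-in for the number of selected elements
}

// Deterministic routing that mirrors the diagram: text-only, image-only, or both.
function pickRoute(input: RouterInputSketch): Route {
  const hasText = (input.text ?? "").trim().length > 0;
  const hasImage = (input.image ?? "").length > 0;
  const hasSelection = (input.selectedCount ?? 0) > 0;

  if (hasText && hasImage) return "combined";
  if (hasImage) return "vision";
  // Assumption: a bare selection with no text or image means "analyze the selection".
  if (!hasText && hasSelection) return "style-extract";
  return "prompt-engineer";
}
```

A check like this could run before an LLM-based classifier, so that unambiguous inputs skip the extra model round trip.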
interface RouterInput {
  text?: string;
  image?: string; // base64
  selectedElements?: CanvasElement[];
}

interface RouterOutput {
  route: 'prompt-engineer' | 'vision' | 'style-extract' | 'combined';
  context: {
    hasText: boolean;
    hasImage: boolean;
    hasSelection: boolean;
    intent: 'create' | 'modify' | 'analyze' | 'reproduce';
  };
}

Purpose: Transform vague user requests into detailed design specifications
Input Examples:
- "Landing hero Swiss style"
- "Login form minimalist Japanese"
- "Dashboard brutalist"
Output: Detailed specification
interface DesignSpec {
  layout: {
    type: 'hero' | 'form' | 'dashboard' | 'card' | 'list' | 'grid';
    structure: string; // Detailed layout description
    elementCount: number;
  };
  style: {
    name: string; // "Swiss International", "Japanese Minimalism", etc.
    colors: {
      primary: string;
      secondary: string;
      background: string;
      text: string;
      accent: string;
    };
    typography: {
      headingFont: string;
      bodyFont: string;
      scale: number[]; // Font size scale
    };
    spacing: {
      base: number;
      scale: 'linear' | 'phi' | 'modular';
    };
    borders: {
      radius: number;
      width: number;
      style: 'none' | 'subtle' | 'bold';
    };
  };
  elements: {
    description: string;
    type: string;
    content?: string;
    importance: 'primary' | 'secondary' | 'tertiary';
  }[];
}

Purpose: Extract UI structure from screenshots
Model: qwen2-vl, llava, or minicpm-v
Output:
interface VisionAnalysis {
  layout: {
    type: string;
    gridColumns?: number;
    spacing: number;
  };
  elements: {
    type: string;
    bounds: { x: number; y: number; width: number; height: number };
    text?: string;
    color?: string;
    children?: VisionAnalysis['elements'];
  }[];
  colors: string[]; // Extracted palette
  fonts: {
    detected: string[];
    sizes: number[];
  };
  styleGuess: string; // "Material Design", "iOS", "Custom", etc.
}

Purpose: Analyze existing canvas elements for consistency
Input: Current canvas state
Output: Style rules to maintain
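Part of the style extraction can be done without a model by counting what is already on the canvas. The sketch below is one possible heuristic, not the project's StyleExtractorAgent: `StyledElement`, `StyleRulesSketch`, and `extractStyleRules` are hypothetical names, and deriving the spacing base from the greatest common divisor of element offsets (with a fallback of 8) is an assumption.

```typescript
// Hypothetical, trimmed-down element shape; the real CanvasElement is defined elsewhere.
interface StyledElement {
  x: number;
  y: number;
  fill?: string;
  fontFamily?: string;
}

// Hypothetical shape for the "style rules to maintain".
interface StyleRulesSketch {
  palette: string[]; // fill colors, most frequent first
  fontFamilies: string[]; // font families, most frequent first
  spacingBase: number; // greatest common divisor of element offsets
}

// Returns the `limit` most frequent values, most frequent first.
function topByFrequency(values: string[], limit: number): string[] {
  const counts = new Map<string, number>();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map((entry) => entry[0]);
}

function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

function extractStyleRules(elements: StyledElement[]): StyleRulesSketch {
  const fills: string[] = [];
  const fonts: string[] = [];
  let spacingBase = 0;
  for (const el of elements) {
    if (el.fill) fills.push(el.fill.toLowerCase());
    if (el.fontFamily) fonts.push(el.fontFamily);
    for (const offset of [Math.round(el.x), Math.round(el.y)]) {
      if (offset > 0) spacingBase = gcd(spacingBase, offset);
    }
  }
  return {
    palette: topByFrequency(fills, 5),
    fontFamilies: topByFrequency(fonts, 2),
    // Assumed fallback of 8 when the canvas gives no usable offsets.
    spacingBase: spacingBase === 0 ? 8 : spacingBase,
  };
}
```

The resulting rules could be serialized into the executor prompt so that new elements reuse the existing palette and grid.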
Purpose: Generate precise JSON from specifications
Input: DesignSpec + VisionAnalysis + StyleRules
Output: CanvasElement[]
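Because the executor returns model-generated JSON, its output needs checking before it reaches the canvas. The sketch below shows one way to do that, assuming a simplified element shape: `PlacedElement`, `DeviceSketch`, and `parseExecutorOutput` are hypothetical names, and the specific rules (numeric bounds inside the device, `#RRGGBB` fills) are assumptions drawn from the executor's stated goals.

```typescript
// Hypothetical minimal shapes; the real CanvasElement and DevicePreset live elsewhere.
interface PlacedElement {
  type: string;
  x: number;
  y: number;
  width: number;
  height: number;
  fill?: string;
}
interface DeviceSketch { width: number; height: number }
interface ParseResult { elements: PlacedElement[]; errors: string[] }

const HEX_COLOR = /^#[0-9a-fA-F]{6}$/;
const isNum = (v: unknown): v is number => typeof v === "number" && isFinite(v);

function parseExecutorOutput(raw: string, device: DeviceSketch): ParseResult {
  // Models often wrap JSON in prose or a fence, so keep only the outermost array.
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) {
    return { elements: [], errors: ["no JSON array found in output"] };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return { elements: [], errors: ["output is not valid JSON"] };
  }
  if (!Array.isArray(parsed)) {
    return { elements: [], errors: ["expected a JSON array"] };
  }
  const elements: PlacedElement[] = [];
  const errors: string[] = [];
  parsed.forEach((item: unknown, i: number) => {
    if (typeof item !== "object" || item === null) {
      errors.push(`element ${i}: not an object`);
      return;
    }
    const { type, x, y, width, height, fill } = item as Record<string, unknown>;
    if (typeof type !== "string" || !isNum(x) || !isNum(y) || !isNum(width) || !isNum(height)) {
      errors.push(`element ${i}: missing type or numeric bounds`);
      return;
    }
    if (x < 0 || y < 0 || x + width > device.width || y + height > device.height) {
      errors.push(`element ${i}: outside the ${device.width}x${device.height} canvas`);
      return;
    }
    const element: PlacedElement = { type, x, y, width, height };
    if (fill !== undefined) {
      if (typeof fill !== "string" || !HEX_COLOR.test(fill)) {
        errors.push(`element ${i}: fill is not a #RRGGBB hex color`);
        return;
      }
      element.fill = fill;
    }
    elements.push(element);
  });
  return { elements, errors };
}
```

The collected `errors` can be fed back into the next executor attempt instead of discarding the whole response.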
import { StateGraph, END } from "@langchain/langgraph";

interface UIGenerationState {
  // Input
  userPrompt: string;
  screenshot?: string;
  canvasState: CanvasElement[];
  device: DevicePreset;
  // Routing
  route: string;
  // Intermediate
  designSpec?: DesignSpec;
  visionAnalysis?: VisionAnalysis;
  styleRules?: StyleRules;
  // Output
  generatedElements?: CanvasElement[];
  error?: string;
}

const workflow = new StateGraph<UIGenerationState>({
  channels: {
    userPrompt: { value: (x, y) => y ?? x },
    screenshot: { value: (x, y) => y ?? x },
    // ... etc
  }
});

// Add nodes
workflow.addNode("router", routerNode);
workflow.addNode("promptEngineer", promptEngineerNode);
workflow.addNode("visionAnalyzer", visionAnalyzerNode);
workflow.addNode("styleExtractor", styleExtractorNode);
workflow.addNode("executor", executorNode);
workflow.addNode("validator", validatorNode);

// Add edges with conditions
workflow.addConditionalEdges("router", (state) => {
  if (state.route === 'vision') return "visionAnalyzer";
  if (state.route === 'combined') return "visionAnalyzer"; // Then to promptEngineer
  return "promptEngineer";
});
workflow.addEdge("visionAnalyzer", "promptEngineer");
workflow.addEdge("promptEngineer", "styleExtractor");
workflow.addEdge("styleExtractor", "executor");
workflow.addEdge("executor", "validator");
workflow.addConditionalEdges("validator", (state) => {
  if (state.error) return "executor"; // Retry
  return END;
});
workflow.setEntryPoint("router");

- Create src/canvas/services/agents/ directory
- Implement base Agent class with Ollama integration
- Create RouterAgent
- Create PromptEngineerAgent
- Update VisionAgent (already have basic version)
- Create ExecutorAgent
- Add @langchain/langgraph dependency
- Implement state machine
- Add streaming support for progress updates
- Error handling and retry logic
- Update LLMPanel with workflow selector
- Add progress indicator for multi-step generation
- Show intermediate results (design spec preview)
- Allow editing of intermediate specs before execution
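For the error handling, retry, and progress items above, a validator-to-executor loop needs an upper bound on attempts, otherwise a persistent validation error retries forever. The following is a minimal sketch under that assumption: `generateWithRetry`, `ProgressListener`, and the default of three attempts are hypothetical and not part of any existing API.

```typescript
// Called before each step so the UI can show which stage and attempt is running.
type ProgressListener = (step: string, attempt: number) => void;

// Runs `generate` until `validate` reports no error or the attempt budget is spent.
// The previous validation error is passed back so the next attempt can correct it.
async function generateWithRetry<T>(
  generate: (previousError?: string) => Promise<T>,
  validate: (result: T) => string | undefined,
  maxAttempts = 3,
  onProgress: ProgressListener = () => {},
): Promise<T> {
  let lastError: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    onProgress("executor", attempt);
    const result = await generate(lastError);
    onProgress("validator", attempt);
    lastError = validate(result);
    if (lastError === undefined) return result;
  }
  throw new Error(`validation failed after ${maxAttempts} attempts: ${lastError}`);
}
```

In the panel, `onProgress` could drive the multi-step progress indicator, while the thrown error surfaces the last validation message to the user.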
src/canvas/services/
├── llm.ts # Existing - keep for backward compat
├── agents/
│   ├── index.ts # Exports all agents
│   ├── types.ts # Shared types
│   ├── BaseAgent.ts # Abstract base class
│   ├── RouterAgent.ts # Input classification
│   ├── PromptEngineerAgent.ts # Design spec generation
│   ├── VisionAgent.ts # Screenshot analysis
│   ├── StyleExtractorAgent.ts # Canvas analysis
│   └── ExecutorAgent.ts # JSON generation
├── orchestration/
│   ├── index.ts
│   ├── UIGenerationGraph.ts # LangGraph state machine
│   └── workflows.ts # Pre-defined workflows
└── prompts/
    ├── promptEngineer.ts # System prompts
    ├── vision.ts
    ├── executor.ts
    └── styles/ # Style-specific prompts
        ├── swiss.ts
        ├── brutalist.ts
        ├── japanese.ts
        └── index.ts
// prompts/styles/swiss.ts
export const SWISS_STYLE = {
  name: 'Swiss International',
  description: 'Clean, grid-based, typographic hierarchy',
  colors: {
    primary: '#000000',
    secondary: '#333333',
    background: '#FFFFFF',
    accent: '#FF0000', // Classic Swiss red
  },
  typography: {
    heading: 'Helvetica Neue, Helvetica, Arial, sans-serif',
    body: 'Helvetica Neue, Helvetica, Arial, sans-serif',
    weights: [400, 700],
    scale: [12, 14, 18, 24, 36, 48, 72],
  },
  spacing: {
    base: 8,
    method: 'modular', // 8, 16, 24, 32, 48, 64
  },
  borders: {
    radius: 0,
    width: 1,
    color: '#000000',
  },
  characteristics: [
    'Strong grid alignment',
    'Asymmetric balance',
    'High contrast',
    'Generous whitespace',
    'Sans-serif typography',
    'Minimal decoration',
  ],
};

- Separation of Concerns: Each agent has one job
- Better Quality: Specialized prompts for each task
- Debuggability: Can inspect intermediate states
- Flexibility: Easy to swap models per agent
- Scalability: Can add new agents (e.g., AccessibilityChecker)
- Consistency: Style extractor ensures coherent designs
| Agent | Recommended Model | Why |
|---|---|---|
| Router | qwen2.5:1.5b | Fast, simple classification |
| Prompt Engineer | qwen2.5:7b | Good reasoning, design knowledge |
| Vision | qwen2-vl:7b | Best open vision model |
| Style Extractor | qwen2.5:3b | Pattern recognition |
| Executor | qwen2.5:7b | Structured output, precision |
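The table can be captured as a typed configuration so that one agent's model is swapped without touching the others. This is a sketch only: `AgentName`, `ModelChoice`, `DEFAULT_MODELS`, and `resolveModels` are hypothetical names, and the `gemini-2.0-flash` identifier in the usage note is an assumed model ID for the Gemini 2.0 Flash provider listed in the status section.

```typescript
type AgentName = "router" | "promptEngineer" | "vision" | "styleExtractor" | "executor";
type Provider = "ollama" | "gemini";

interface ModelChoice {
  provider: Provider;
  model: string;
}

// Defaults taken from the recommendation table; all run locally through Ollama.
const DEFAULT_MODELS: Record<AgentName, ModelChoice> = {
  router: { provider: "ollama", model: "qwen2.5:1.5b" },
  promptEngineer: { provider: "ollama", model: "qwen2.5:7b" },
  vision: { provider: "ollama", model: "qwen2-vl:7b" },
  styleExtractor: { provider: "ollama", model: "qwen2.5:3b" },
  executor: { provider: "ollama", model: "qwen2.5:7b" },
};

// Overrides replace individual agents; everything else keeps its default.
function resolveModels(
  overrides: Partial<Record<AgentName, ModelChoice>> = {},
): Record<AgentName, ModelChoice> {
  return { ...DEFAULT_MODELS, ...overrides };
}
```

For example, `resolveModels({ vision: { provider: "gemini", model: "gemini-2.0-flash" } })` would route only screenshot analysis to the cloud provider.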
- Review this architecture
- Decide on Phase 1 scope
- Create agent infrastructure
- Test with simple workflows
- Add LangGraph when ready
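Until LangGraph is added, simple workflows can be tested with a plain async pipeline that runs the agents in the same order as the graph edges. The sketch below assumes that approach; `PipelineState`, `Step`, `runPipeline`, `named`, and `planSteps` are hypothetical, and the stand-in steps only record their names instead of calling a model.

```typescript
// Hypothetical, trimmed-down version of the generation state.
interface PipelineState {
  userPrompt: string;
  screenshot?: string;
  log: string[]; // names of the steps that ran, kept for inspection
}

type Step = (state: PipelineState) => Promise<PipelineState>;

// Runs the steps strictly in order, feeding each one the previous state.
async function runPipeline(initial: PipelineState, steps: Step[]): Promise<PipelineState> {
  let state = initial;
  for (const step of steps) {
    state = await step(state);
  }
  return state;
}

// Stand-in step that only records its name; a real step would call its agent.
const named = (name: string): Step => async (state) => ({
  ...state,
  log: [...state.log, name],
});

// Mirrors the graph: vision runs first only when a screenshot is present.
function planSteps(state: PipelineState): Step[] {
  const steps: Step[] = [];
  if (state.screenshot) steps.push(named("visionAnalyzer"));
  steps.push(
    named("promptEngineer"),
    named("styleExtractor"),
    named("executor"),
    named("validator"),
  );
  return steps;
}
```

Because every step has the same signature, moving to a graph later mostly means registering the same functions as nodes.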