Multi-LLM Orchestration Architecture

STATUS: ✅ IMPLEMENTED

  • SimpleOrchestrator: ✅ Browser-compatible orchestration
  • RouterAgent: ✅ Intent classification
  • VisionAgent: ✅ Screenshot analysis (minicpm-v)
  • PromptEngineerAgent: ✅ Design spec generation
  • ExecutorAgent: ✅ JSON element generation
  • Multi-Provider Support: ✅ Ollama (local) + Gemini 2.0 Flash (cloud)

Overview

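A multi-agent system for intelligent UI generation with specialized LLMs. A router dispatches each request to one or more of three specialist agents, and their output feeds a single executor: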

┌─────────────────────────────────────────────────────────────────┐
│                        USER INPUT                                │
│  "Landing hero with Swiss typography" or [Screenshot paste]      │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ROUTER / CLASSIFIER                           │
│  Determines: text-only → Prompt Engineer                         │
│              image → Vision Analyzer                             │
│              mixed → Both in sequence                            │
└──────────────────────────┬──────────────────────────────────────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ PROMPT ENGINEER  │ │ VISION ANALYZER  │ │ STYLE EXTRACTOR  │
│                  │ │                  │ │                  │
│ - Understands    │ │ - OCR for text   │ │ - Analyzes       │
│   design intent  │ │ - UI element     │ │   existing       │
│ - Combines       │ │   detection      │ │   elements       │
│   layout+style   │ │ - Color/spacing  │ │ - Extracts       │
│ - Creates        │ │   extraction     │ │   patterns       │
│   detailed spec  │ │ - Layout         │ │ - Maintains      │
│                  │ │   recognition    │ │   consistency    │
│ Model: qwen2.5   │ │ Model: qwen2-vl  │ │ Model: qwen2.5   │
└────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      EXECUTOR LLM                                │
│                                                                  │
│  Takes: Detailed specification from above agents                 │
│  Outputs: Precise JSON with canvas elements                      │
│  Model: qwen2.5:7b (fast, good at structured output)            │
│                                                                  │
│  Specialized for:                                                │
│  - Exact positioning (respects device dimensions)                │
│  - Color accuracy (hex values)                                   │
│  - Typography consistency                                        │
│  - Element relationships                                         │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                    CANVAS RENDERER                               │
│  Validates JSON → Creates CanvasElements → Updates Store         │
└─────────────────────────────────────────────────────────────────┘

Agent Specifications

1. Router/Classifier (Simple, fast)

interface RouterInput {
  text?: string;
  image?: string; // base64
  selectedElements?: CanvasElement[];
}

interface RouterOutput {
  route: 'prompt-engineer' | 'vision' | 'style-extract' | 'combined';
  context: {
    hasText: boolean;
    hasImage: boolean;
    hasSelection: boolean;
    intent: 'create' | 'modify' | 'analyze' | 'reproduce';
  };
}
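The routing rules in the diagram can be sketched without any model call. The following is a minimal, hypothetical rule-based classifier: `classifyInput`, the reduced `CanvasElement` shape, and the intent heuristics are assumptions for illustration, not the actual RouterAgent implementation.

```typescript
// Minimal stand-in for the project's CanvasElement type (assumed shape).
interface CanvasElement { id: string; type: string }

interface RouterInput {
  text?: string;
  image?: string; // base64
  selectedElements?: CanvasElement[];
}

type Route = 'prompt-engineer' | 'vision' | 'style-extract' | 'combined';
type Intent = 'create' | 'modify' | 'analyze' | 'reproduce';

interface RouterOutput {
  route: Route;
  context: { hasText: boolean; hasImage: boolean; hasSelection: boolean; intent: Intent };
}

// Hypothetical classifier: inspects only which inputs are present.
function classifyInput(input: RouterInput): RouterOutput {
  const hasText = (input.text ?? '').trim().length > 0;
  const hasImage = (input.image ?? '').length > 0;
  const hasSelection = (input.selectedElements ?? []).length > 0;

  // Route: mirrors the text-only / image / mixed split in the diagram.
  let route: Route;
  if (hasText && hasImage) route = 'combined';
  else if (hasImage) route = 'vision';
  else if (hasSelection && !hasText) route = 'style-extract';
  else route = 'prompt-engineer';

  // Intent: assumed heuristics, not taken from the original design.
  let intent: Intent;
  if (hasImage && !hasText) intent = 'reproduce';
  else if (hasSelection && hasText) intent = 'modify';
  else if (hasSelection) intent = 'analyze';
  else intent = 'create';

  return { route, context: { hasText, hasImage, hasSelection, intent } };
}
```

A model-backed router could replace the intent heuristics while keeping the same `RouterOutput` contract.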

2. Prompt Engineer Agent

Purpose: Transform vague user requests into detailed design specifications

Input Examples:

  • "Landing hero Swiss style"
  • "Login form minimalist Japanese"
  • "Dashboard brutalist"

Output: Detailed specification

interface DesignSpec {
  layout: {
    type: 'hero' | 'form' | 'dashboard' | 'card' | 'list' | 'grid';
    structure: string; // Detailed layout description
    elementCount: number;
  };
  style: {
    name: string; // "Swiss International", "Japanese Minimalism", etc.
    colors: {
      primary: string;
      secondary: string;
      background: string;
      text: string;
      accent: string;
    };
    typography: {
      headingFont: string;
      bodyFont: string;
      scale: number[]; // Font size scale
    };
    spacing: {
      base: number;
      scale: 'linear' | 'phi' | 'modular';
    };
    borders: {
      radius: number;
      width: number;
      style: 'none' | 'subtle' | 'bold';
    };
  };
  elements: {
    description: string;
    type: string;
    content?: string;
    importance: 'primary' | 'secondary' | 'tertiary';
  }[];
}
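As an illustration, here is what a spec for "Landing hero Swiss style" might look like, together with a small sanity check that could run before the spec reaches the executor. The concrete values, the `checkDesignSpec` helper, and the rule that `elementCount` should equal `elements.length` are assumptions made for this sketch.

```typescript
// Condensed copy of the DesignSpec interface above, so the sketch is self-contained.
interface DesignSpec {
  layout: { type: 'hero' | 'form' | 'dashboard' | 'card' | 'list' | 'grid'; structure: string; elementCount: number };
  style: {
    name: string;
    colors: { primary: string; secondary: string; background: string; text: string; accent: string };
    typography: { headingFont: string; bodyFont: string; scale: number[] };
    spacing: { base: number; scale: 'linear' | 'phi' | 'modular' };
    borders: { radius: number; width: number; style: 'none' | 'subtle' | 'bold' };
  };
  elements: { description: string; type: string; content?: string; importance: 'primary' | 'secondary' | 'tertiary' }[];
}

// Illustrative values only; a real spec would come from the Prompt Engineer model.
const exampleSpec: DesignSpec = {
  layout: { type: 'hero', structure: 'Left-aligned headline over a 12-column grid', elementCount: 3 },
  style: {
    name: 'Swiss International',
    colors: { primary: '#000000', secondary: '#333333', background: '#FFFFFF', text: '#000000', accent: '#FF0000' },
    typography: { headingFont: 'Helvetica Neue', bodyFont: 'Helvetica Neue', scale: [14, 18, 24, 48, 72] },
    spacing: { base: 8, scale: 'modular' },
    borders: { radius: 0, width: 1, style: 'none' },
  },
  elements: [
    { description: 'Main headline', type: 'text', content: 'Design with clarity', importance: 'primary' },
    { description: 'Supporting copy', type: 'text', importance: 'secondary' },
    { description: 'Call-to-action button', type: 'button', content: 'Get started', importance: 'primary' },
  ],
};

// Hypothetical sanity check: returns a list of human-readable problems.
function checkDesignSpec(spec: DesignSpec): string[] {
  const problems: string[] = [];
  const hex = /^#[0-9a-fA-F]{6}$/;
  for (const [role, value] of Object.entries(spec.style.colors)) {
    if (!hex.test(value)) problems.push(`colors.${role} is not a 6-digit hex value`);
  }
  // Assumption: elementCount is meant to match the number of listed elements.
  if (spec.layout.elementCount !== spec.elements.length) {
    problems.push('layout.elementCount does not match elements.length');
  }
  return problems;
}
```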

3. Vision Analyzer Agent

Purpose: Extract UI structure from screenshots

Model: qwen2-vl, llava, or minicpm-v

Output:

interface VisionAnalysis {
  layout: {
    type: string;
    gridColumns?: number;
    spacing: number;
  };
  elements: {
    type: string;
    bounds: { x: number; y: number; width: number; height: number };
    text?: string;
    color?: string;
    children?: VisionAnalysis['elements'];
  }[];
  colors: string[]; // Extracted palette
  fonts: {
    detected: string[];
    sizes: number[];
  };
  styleGuess: string; // "Material Design", "iOS", "Custom", etc.
}
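Because `elements` is recursive through `children`, downstream agents may prefer a flat list. The sketch below shows one way to flatten the tree in pre-order while recording nesting depth; `flattenElements` and `FlatVisionElement` are hypothetical names, and whether child `bounds` are absolute or parent-relative is not specified here, so the bounds are passed through unchanged.

```typescript
interface Bounds { x: number; y: number; width: number; height: number }

// Same shape as an entry of VisionAnalysis['elements'] above.
interface VisionElement {
  type: string;
  bounds: Bounds;
  text?: string;
  color?: string;
  children?: VisionElement[];
}

// Hypothetical flat form: children removed, nesting depth added.
interface FlatVisionElement extends Omit<VisionElement, 'children'> {
  depth: number;
}

// Pre-order traversal: each parent is emitted before its children.
function flattenElements(elements: VisionElement[], depth = 0): FlatVisionElement[] {
  const out: FlatVisionElement[] = [];
  for (const el of elements) {
    const { children, ...rest } = el;
    out.push({ ...rest, depth });
    if (children) out.push(...flattenElements(children, depth + 1));
  }
  return out;
}
```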

4. Style Extractor Agent

Purpose: Analyze existing canvas elements for consistency

Input: Current canvas state

Output: Style rules to maintain
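One possible reading of "style rules" is a frequency-ranked summary of what the canvas already uses. The sketch below takes that approach; the `StyleRules` fields, the reduced `CanvasElement` shape, and the `extractStyleRules` helper are all assumptions, since neither type is defined in this document.

```typescript
// Assumed minimal element shape; the real CanvasElement likely has more fields.
interface CanvasElement {
  id: string;
  type: string;
  fill?: string;
  fontSize?: number;
  fontFamily?: string;
  borderRadius?: number;
}

// Hypothetical output shape for the Style Extractor.
interface StyleRules {
  palette: string[];           // most-used colors first
  fontSizes: number[];         // distinct sizes, ascending
  fontFamilies: string[];      // most-used families first
  borderRadius: number | null; // most common radius, or null if none seen
}

// Returns the distinct values, most frequent first.
function byFrequency<T>(values: T[]): T[] {
  const counts = new Map<T, number>();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

function extractStyleRules(elements: CanvasElement[]): StyleRules {
  const fills: string[] = [];
  const sizes: number[] = [];
  const families: string[] = [];
  const radii: number[] = [];
  for (const e of elements) {
    if (e.fill !== undefined) fills.push(e.fill.toUpperCase()); // normalize hex case
    if (e.fontSize !== undefined) sizes.push(e.fontSize);
    if (e.fontFamily !== undefined) families.push(e.fontFamily);
    if (e.borderRadius !== undefined) radii.push(e.borderRadius);
  }
  return {
    palette: byFrequency(fills),
    fontSizes: byFrequency(sizes).sort((a, b) => a - b),
    fontFamilies: byFrequency(families),
    borderRadius: byFrequency(radii)[0] ?? null,
  };
}
```

This version is purely deterministic; an LLM-backed extractor could consume the same summary as context.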

5. Executor Agent

Purpose: Generate precise JSON from specifications

Input: DesignSpec + VisionAnalysis + StyleRules

Output: CanvasElement[]
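Model output is plain text, so the executor's JSON needs parsing and checking before it reaches the canvas. A minimal sketch follows, assuming a reduced `CanvasElement` with position and size fields and a hypothetical `DeviceSize` type; it extracts the first JSON array from the raw text, validates required fields, and clamps each element to the device dimensions.

```typescript
// Assumed minimal shapes; the project's real types may differ.
interface CanvasElement { id: string; type: string; x: number; y: number; width: number; height: number; fill?: string }
interface DeviceSize { width: number; height: number }

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Hypothetical helper: turns raw model text into validated canvas elements.
function parseExecutorOutput(raw: string, device: DeviceSize): CanvasElement[] {
  // Models often wrap JSON in prose or code fences; keep only the outermost array.
  const start = raw.indexOf('[');
  const end = raw.lastIndexOf(']');
  if (start === -1 || end <= start) throw new Error('No JSON array found in executor output');

  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (!Array.isArray(parsed)) throw new Error('Executor output is not an array');

  const result: CanvasElement[] = [];
  for (let i = 0; i < parsed.length; i++) {
    const item: unknown = parsed[i];
    if (typeof item !== 'object' || item === null) throw new Error(`Element ${i} is not an object`);
    const { id, type, x, y, width, height, fill } = item as Record<string, unknown>;
    if (typeof type !== 'string' || typeof x !== 'number' || typeof y !== 'number' ||
        typeof width !== 'number' || typeof height !== 'number') {
      throw new Error(`Element ${i} is missing type, x, y, width, or height`);
    }
    // Clamp size first, then position, so the element stays fully on screen.
    const w = clamp(width, 0, device.width);
    const h = clamp(height, 0, device.height);
    const el: CanvasElement = {
      id: typeof id === 'string' ? id : `el-${i}`, // fall back to a generated id
      type,
      x: clamp(x, 0, device.width - w),
      y: clamp(y, 0, device.height - h),
      width: w,
      height: h,
    };
    if (typeof fill === 'string') el.fill = fill;
    result.push(el);
  }
  return result;
}
```

Throwing on malformed output gives a retry step a concrete error message to feed back to the model.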

LangGraph State Machine

import { StateGraph, END } from "@langchain/langgraph";

interface UIGenerationState {
  // Input
  userPrompt: string;
  screenshot?: string;
  canvasState: CanvasElement[];
  device: DevicePreset;

  // Routing
  route: string;

  // Intermediate
  designSpec?: DesignSpec;
  visionAnalysis?: VisionAnalysis;
  styleRules?: StyleRules;

  // Output
  generatedElements?: CanvasElement[];
  error?: string;
}

const workflow = new StateGraph<UIGenerationState>({
  channels: {
    userPrompt: { value: (x, y) => y ?? x },
    screenshot: { value: (x, y) => y ?? x },
    // ... etc
  }
});

// Add nodes
workflow.addNode("router", routerNode);
workflow.addNode("promptEngineer", promptEngineerNode);
workflow.addNode("visionAnalyzer", visionAnalyzerNode);
workflow.addNode("styleExtractor", styleExtractorNode);
workflow.addNode("executor", executorNode);
workflow.addNode("validator", validatorNode);

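// Add edges with conditions.
// Image-only and mixed input start at the vision analyzer;
// every other route starts at the prompt engineer.
workflow.addConditionalEdges("router", (state) => {
  if (state.route === 'vision') return "visionAnalyzer";
  if (state.route === 'combined') return "visionAnalyzer"; // Then to promptEngineer
  return "promptEngineer";
});

// Fixed pipeline after routing
workflow.addEdge("visionAnalyzer", "promptEngineer");
workflow.addEdge("promptEngineer", "styleExtractor");
workflow.addEdge("styleExtractor", "executor");
workflow.addEdge("executor", "validator");

// Retry loop: a validation error sends the state back to the executor.
// As written there is no retry cap, so a persistent error loops indefinitely.
workflow.addConditionalEdges("validator", (state) => {
  if (state.error) return "executor"; // Retry
  return END;
});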

workflow.setEntryPoint("router");
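The validator edge above returns to the executor whenever `state.error` is set, so a bounded retry count is worth considering. The following dependency-free sketch shows the idea; `attempts`, `MAX_EXECUTOR_ATTEMPTS`, `afterValidator`, and `runWithRetry` are hypothetical additions, and `END_NODE` is a local stand-in for LangGraph's `END` so the example runs without the library.

```typescript
// Local stand-in for LangGraph's END sentinel (keeps this sketch dependency-free).
const END_NODE = 'END';

// Assumed extension of the graph state: an attempt counter next to `error`.
interface RetryState {
  error?: string | undefined;
  attempts: number;
}

const MAX_EXECUTOR_ATTEMPTS = 3; // assumed cap, tune as needed

// Condition function in the style of the validator edge: retry only while under the cap.
function afterValidator(state: RetryState): 'executor' | typeof END_NODE {
  if (state.error !== undefined && state.attempts < MAX_EXECUTOR_ATTEMPTS) return 'executor';
  return END_NODE;
}

// Tiny driver that mimics the executor -> validator loop.
function runWithRetry(
  execute: (attempt: number) => string,
  validate: (output: string) => string | undefined,
): { output: string; state: RetryState } {
  let state: RetryState = { attempts: 0 };
  let output = '';
  let next: 'executor' | typeof END_NODE = 'executor';
  while (next === 'executor') {
    output = execute(state.attempts);
    state = { attempts: state.attempts + 1, error: validate(output) };
    next = afterValidator(state);
  }
  return { output, state };
}
```

When the cap is reached the final state still carries `error`, so the caller can surface the failure instead of looping.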

Implementation Plan

Phase 1: Agent Infrastructure (Current Sprint)

  • Create src/canvas/services/agents/ directory
  • Implement base Agent class with Ollama integration
  • Create RouterAgent
  • Create PromptEngineerAgent
  • Update VisionAgent (already have basic version)
  • Create ExecutorAgent
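A base agent class with Ollama integration might look like the sketch below. It targets Ollama's `POST /api/generate` endpoint in non-streaming mode with `format: "json"`; the class layout, the injected `HttpPost` transport, and the example `RouterSketchAgent` are assumptions for illustration rather than the project's actual `BaseAgent.ts`.

```typescript
// Request body for Ollama's POST /api/generate endpoint (non-streaming).
interface OllamaGenerateBody {
  model: string;
  prompt: string;
  system?: string;
  stream: false;
  format?: 'json';
}

// Injected transport (hypothetical): posts a JSON string and resolves with the response text.
type HttpPost = (url: string, body: string) => Promise<string>;

abstract class BaseAgent<In, Out> {
  protected readonly model: string;
  private readonly post: HttpPost;
  private readonly baseUrl: string;

  constructor(model: string, post: HttpPost, baseUrl: string = 'http://localhost:11434') {
    this.model = model;
    this.post = post;
    this.baseUrl = baseUrl;
  }

  protected abstract systemPrompt(): string;
  protected abstract buildPrompt(input: In): string;
  protected abstract parse(responseText: string): Out;

  // Builds the request without sending it, which keeps it easy to inspect and test.
  buildRequest(input: In): { url: string; body: OllamaGenerateBody } {
    return {
      url: `${this.baseUrl}/api/generate`,
      body: {
        model: this.model,
        system: this.systemPrompt(),
        prompt: this.buildPrompt(input),
        stream: false,
        format: 'json',
      },
    };
  }

  async run(input: In): Promise<Out> {
    const { url, body } = this.buildRequest(input);
    const raw = await this.post(url, JSON.stringify(body));
    // Ollama returns the generated text in the `response` field.
    const envelope = JSON.parse(raw) as { response?: unknown };
    if (typeof envelope.response !== 'string') throw new Error('Unexpected Ollama response shape');
    return this.parse(envelope.response);
  }
}

// Example subclass (illustrative only).
class RouterSketchAgent extends BaseAgent<string, { route: string }> {
  protected systemPrompt(): string {
    return 'Classify the request. Reply as JSON: {"route": "..."}';
  }
  protected buildPrompt(input: string): string {
    return input;
  }
  protected parse(responseText: string): { route: string } {
    const value = JSON.parse(responseText) as { route?: unknown };
    if (typeof value.route !== 'string') throw new Error('Missing route in model reply');
    return { route: value.route };
  }
}
```

Injecting the transport keeps the class browser-compatible (pass a thin `fetch` wrapper) and lets tests supply a fake without a running Ollama server.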

Phase 2: LangGraph Integration (Next Sprint)

  • Add @langchain/langgraph dependency
  • Implement state machine
  • Add streaming support for progress updates
  • Error handling and retry logic

Phase 3: UI Integration

  • Update LLMPanel with workflow selector
  • Add progress indicator for multi-step generation
  • Show intermediate results (design spec preview)
  • Allow editing of intermediate specs before execution

File Structure

src/canvas/services/
├── llm.ts                    # Existing - keep for backward compat
├── agents/
│   ├── index.ts              # Exports all agents
│   ├── types.ts              # Shared types
│   ├── BaseAgent.ts          # Abstract base class
│   ├── RouterAgent.ts        # Input classification
│   ├── PromptEngineerAgent.ts # Design spec generation
│   ├── VisionAgent.ts        # Screenshot analysis
│   ├── StyleExtractorAgent.ts # Canvas analysis
│   └── ExecutorAgent.ts      # JSON generation
├── orchestration/
│   ├── index.ts
│   ├── UIGenerationGraph.ts  # LangGraph state machine
│   └── workflows.ts          # Pre-defined workflows
└── prompts/
    ├── promptEngineer.ts     # System prompts
    ├── vision.ts
    ├── executor.ts
    └── styles/               # Style-specific prompts
        ├── swiss.ts
        ├── brutalist.ts
        ├── japanese.ts
        └── index.ts

Style Prompt Templates

// prompts/styles/swiss.ts
export const SWISS_STYLE = {
  name: 'Swiss International',
  description: 'Clean, grid-based, typographic hierarchy',
  colors: {
    primary: '#000000',
    secondary: '#333333',
    background: '#FFFFFF',
    accent: '#FF0000', // Classic Swiss red
  },
  typography: {
    heading: 'Helvetica Neue, Helvetica, Arial, sans-serif',
    body: 'Helvetica Neue, Helvetica, Arial, sans-serif',
    weights: [400, 700],
    scale: [12, 14, 18, 24, 36, 48, 72],
  },
  spacing: {
    base: 8,
    method: 'modular', // 8, 16, 24, 32, 48, 64
  },
  borders: {
    radius: 0,
    width: 1,
    color: '#000000',
  },
  characteristics: [
    'Strong grid alignment',
    'Asymmetric balance',
    'High contrast',
    'Generous whitespace',
    'Sans-serif typography',
    'Minimal decoration',
  ],
};
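A template like this still has to be turned into prompt text for the model. One way to do that is sketched below; `StyleTemplate`, `renderStylePrompt`, and the output wording are hypothetical, and `swissExample` is a trimmed copy of the template above so the block runs on its own.

```typescript
// Shape inferred from the SWISS_STYLE object above.
interface StyleTemplate {
  name: string;
  description: string;
  colors: Record<string, string>;
  typography: { heading: string; body: string; weights: number[]; scale: number[] };
  spacing: { base: number; method: string };
  borders: { radius: number; width: number; color: string };
  characteristics: string[];
}

// Hypothetical renderer: produces a plain-text style brief for a system prompt.
function renderStylePrompt(style: StyleTemplate): string {
  const colorLines = Object.entries(style.colors).map(([role, hex]) => `- ${role}: ${hex}`);
  const traitLines = style.characteristics.map((trait) => `- ${trait}`);
  return [
    `Style: ${style.name} (${style.description})`,
    'Colors:',
    ...colorLines,
    `Heading font: ${style.typography.heading}`,
    `Body font: ${style.typography.body}`,
    `Font sizes (px): ${style.typography.scale.join(', ')}`,
    `Spacing: base ${style.spacing.base}px, ${style.spacing.method} scale`,
    `Borders: ${style.borders.width}px ${style.borders.color}, radius ${style.borders.radius}px`,
    'Characteristics:',
    ...traitLines,
  ].join('\n');
}

// Trimmed copy of the Swiss template, for demonstration only.
const swissExample: StyleTemplate = {
  name: 'Swiss International',
  description: 'Clean, grid-based, typographic hierarchy',
  colors: { primary: '#000000', background: '#FFFFFF', accent: '#FF0000' },
  typography: { heading: 'Helvetica Neue', body: 'Helvetica Neue', weights: [400, 700], scale: [12, 14, 18, 24] },
  spacing: { base: 8, method: 'modular' },
  borders: { radius: 0, width: 1, color: '#000000' },
  characteristics: ['Strong grid alignment', 'High contrast'],
};
```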

Benefits of This Architecture

  1. Separation of Concerns: Each agent has one job
  2. Better Quality: Specialized prompts for each task
  3. Debuggability: Can inspect intermediate states
  4. Flexibility: Easy to swap models per agent
  5. Scalability: Can add new agents (e.g., AccessibilityChecker)
  6. Consistency: Style extractor ensures coherent designs

Model Recommendations

| Agent | Recommended Model | Why |
| --- | --- | --- |
| Router | qwen2.5:1.5b | Fast, simple classification |
| Prompt Engineer | qwen2.5:7b | Good reasoning, design knowledge |
| Vision | qwen2-vl:7b | Best open vision model |
| Style Extractor | qwen2.5:3b | Pattern recognition |
| Executor | qwen2.5:7b | Structured output, precision |
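Since swapping models per agent is one of the stated benefits, the recommendations can live in a small config map with per-agent overrides. This is a minimal sketch; `AgentName`, `DEFAULT_MODELS`, and `resolveModel` are hypothetical names, and the defaults simply restate the recommendations listed above.

```typescript
type AgentName = 'router' | 'promptEngineer' | 'vision' | 'styleExtractor' | 'executor';

// Defaults taken from the recommendations in this section.
const DEFAULT_MODELS: Record<AgentName, string> = {
  router: 'qwen2.5:1.5b',
  promptEngineer: 'qwen2.5:7b',
  vision: 'qwen2-vl:7b',
  styleExtractor: 'qwen2.5:3b',
  executor: 'qwen2.5:7b',
};

// Returns the override for an agent if one is given, otherwise the default.
function resolveModel(agent: AgentName, overrides: Partial<Record<AgentName, string>> = {}): string {
  return overrides[agent] ?? DEFAULT_MODELS[agent];
}
```

For example, `resolveModel('vision', { vision: 'minicpm-v' })` selects the model named in the status list while leaving the other agents on their defaults.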

Next Steps

  1. Review this architecture
  2. Decide on Phase 1 scope
  3. Create agent infrastructure
  4. Test with simple workflows
  5. Add LangGraph when ready