Technical Specification: Multi-Engine Browser OCR

Version: 1.1 Last Updated: 2026-01-15 Status: Ready for Implementation

1. Architectural Overview

The system uses a Strategy Pattern managed by a Factory Service. This decouples the User Interface from the specific implementation details of different OCR libraries.

Core Design Principles

Lazy Loading: Libraries and models are only downloaded when the specific engine is selected.
Resource Isolation: Each engine runs in a dedicated Web Worker to keep the UI responsive.
Memory Hygiene: A strict lifecycle ensures that when an engine is swapped, the previous engine's memory (WASM heap or GPU tensors) is explicitly cleared.

Implementation Scope

MVP (Phase 1):

Single engine: Tesseract.js v5.x
Desktop browsers only (Chrome 90+, Firefox 88+, Safari 15+)
ImageData input, plain string output
IndexedDB model caching

Post-MVP (Phase 2):

Second engine: Transformers.js with TrOCR model
Validates Strategy Pattern with multi-engine architecture
Adds WebGPU acceleration capability

2. The Implementation Stack

Layer	Technology	Purpose
Orchestrator	TypeScript / JavaScript	Manages engine switching and state.
Processing	WebAssembly (WASM)	Executes C++/Rust OCR logic at native speed.
Acceleration	WebGPU / ONNX Runtime	Uses the user's GPU for transformer-based models.
Storage	IndexedDB / OPFS	Caches heavy model files (10MB+) for instant reloads.

3. Class Structure & Logic

A. The Engine Interface

This "Contract" ensures that regardless of the library used, the application interacts with them in the exact same way.

/** Standard interface for all OCR drivers */
interface IOCREngine {
  id: string; // Unique identifier (e.g., "tesseract", "transformers")
  isLoading: boolean; // Current loading state for UI feedback
  load(): Promise<void>; // Setup: download assets & init worker
  process(data: ImageData): Promise<string>; // Execution: the actual OCR task
  destroy(): Promise<void>; // Cleanup: kill workers & clear RAM
}

Data Contracts (MVP):

Input: ImageData objects only. The UI layer handles file upload → canvas conversion.

Output: Plain string containing extracted text. Post-MVP can extend to structured format:

interface OCRResult {
  text: string;
  confidence?: number; // 0-100 quality score
  language?: string; // ISO 639-1 code (e.g., 'en')
}

B. The Factory Manager

The Manager handles the "One at a Time" constraint. It ensures that Engine A is fully disposed of before Engine B starts.

MVP Implementation (single engine):

class OCRManager {
  private activeEngine: IOCREngine | null = null;

  async setEngine(engineType: 'tesseract') {
    // 1. Cleanup existing engine to prevent memory leaks
    if (this.activeEngine) {
      console.log(`Disposing ${this.activeEngine.id}...`);
      await this.activeEngine.destroy();
      this.activeEngine = null;
    }

    // 2. Dynamic Import (Lazy Loading)
    const { TesseractEngine } = await import('./engines/tesseract.engine.js');

    // 3. Initialize new engine
    this.activeEngine = new TesseractEngine();
    await this.activeEngine.load();
  }

  async run(image: ImageData): Promise<string> {
    if (!this.activeEngine) throw new Error('Engine not initialized.');
    return await this.activeEngine.process(image);
  }

  getLoadingState(): boolean {
    return this.activeEngine?.isLoading ?? false;
  }
}

Post-MVP Implementation (multi-engine):

class OCRManager {
  private activeEngine: IOCREngine | null = null;

  async setEngine(engineType: 'tesseract' | 'transformers') {
    // 1. Cleanup existing engine to prevent memory leaks
    if (this.activeEngine) {
      console.log(`Disposing ${this.activeEngine.id}...`);
      await this.activeEngine.destroy();
      this.activeEngine = null;
    }

    // 2. Dynamic Import (Lazy Loading)
    let EngineClass;
    switch (engineType) {
      case 'tesseract':
        const tesseract = await import('./engines/tesseract.engine.js');
        EngineClass = tesseract.TesseractEngine;
        break;
      case 'transformers':
        const transformers = await import('./engines/transformers.engine.js');
        EngineClass = transformers.TransformersEngine;
        break;
    }

    // 3. Initialize new engine
    this.activeEngine = new EngineClass();
    await this.activeEngine.load();
  }

  async run(image: ImageData): Promise<string> {
    if (!this.activeEngine) throw new Error('Select an engine first.');
    return await this.activeEngine.process(image);
  }

  getLoadingState(): boolean {
    return this.activeEngine?.isLoading ?? false;
  }
}

4. Operational Workflow

MVP Workflow (single engine):

Initialization: On page load, feature detection checks for WASM, Web Workers, and IndexedDB support.
Engine Setup: Manager initializes Tesseract.js engine via setEngine('tesseract').
Model Loading: Tesseract.js downloads English traineddata (~4MB) from CDN.
Caching: Browser stores model in IndexedDB. Subsequent loads are instant (0ms network time).
Image Processing:
- User uploads image file
- UI converts to canvas, then extracts ImageData
- Manager sends ImageData to Tesseract.js worker (zero-copy transfer)
- Worker processes and returns plain text string
Display: UI shows extracted text to user.
Cleanup: On page unload, worker terminates automatically.

Post-MVP Workflow (multi-engine):

Selection: User selects engine from dropdown: "Fast (Tesseract)" or "High Accuracy (Transformers.js)".
Engine Switch: If switching engines:
- Manager calls destroy() on current engine
- Worker terminates, reclaiming 50MB–200MB of RAM
- Manager dynamically imports new engine class
Bootstrap: New engine loads its models (WebGPU/WASM binaries).
Caching: IndexedDB caches models per-engine.
Processing: Same as MVP workflow.
Cleanup: When switching engines, proper cleanup prevents memory leaks.

5. Critical Warnings & Best Practices

Important

WASM Memory Limits: Browsers usually limit WASM memory to 2GB or 4GB. If you don't call .destroy() when switching engines, users on mobile devices or low-RAM laptops will experience "Out of Memory" crashes after 3–4 switches.

Pre-processing is King: Most browser OCR engines perform significantly better if you convert the image to grayscale and increase contrast using a simple Canvas filter before passing it to the engine.
Progress Indicators: Because models can be large (20MB+), always implement a progress bar for the load() phase so the user knows the app is downloading data, not frozen.
WebGPU Fallback: If using modern transformer models, always check if (!navigator.gpu) and fallback to a WASM-only engine if the user's hardware is older.

6. Concrete Engine Implementations

A. Tesseract.js Engine (MVP)

Installation:

npm install tesseract.js

Implementation:

// engines/tesseract.engine.ts
import Tesseract, { Worker } from 'tesseract.js';

export class TesseractEngine implements IOCREngine {
  public readonly id = 'tesseract';
  public isLoading = false;
  private worker: Worker | null = null;

  async load(): Promise<void> {
    this.isLoading = true;
    try {
      // Create worker with automatic caching
      this.worker = await Tesseract.createWorker('eng', 1, {
        cacheMethod: 'refresh', // Use IndexedDB cache
        cachePath: '.', // Default cache location
      });
      console.log('Tesseract.js engine loaded');
    } finally {
      this.isLoading = false;
    }
  }

  async process(data: ImageData): Promise<string> {
    if (!this.worker) {
      throw new Error('Tesseract engine not loaded. Call load() first.');
    }

    const result = await this.worker.recognize(data);
    return result.data.text;
  }

  async destroy(): Promise<void> {
    if (this.worker) {
      await this.worker.terminate();
      this.worker = null;
      console.log('Tesseract.js engine destroyed');
    }
  }
}

Characteristics:

Bundle size: ~2MB core + ~4MB English model
Performance: 2-5 seconds for typical document
Browser support: All browsers with WASM (Chrome 90+, Firefox 88+, Safari 15+)
Memory usage: ~50-100MB during processing

B. Transformers.js Engine (Post-MVP)

Installation:

npm install @huggingface/transformers

Implementation:

// engines/transformers.engine.ts
import { pipeline, ImageToTextPipeline } from '@huggingface/transformers';

export class TransformersEngine implements IOCREngine {
  public readonly id = 'transformers';
  public isLoading = false;
  private ocr: ImageToTextPipeline | null = null;

  async load(): Promise<void> {
    this.isLoading = true;
    try {
      // Load TrOCR model (quantized for browser)
      // Model automatically cached in IndexedDB
      this.ocr = await pipeline('image-to-text', 'Xenova/trocr-small-printed', {
        quantized: true, // Use quantized model for smaller size
      });
      console.log('Transformers.js engine loaded');
    } finally {
      this.isLoading = false;
    }
  }

  async process(data: ImageData): Promise<string> {
    if (!this.ocr) {
      throw new Error('Transformers engine not loaded. Call load() first.');
    }

    // Convert ImageData to format expected by Transformers.js
    const result = await this.ocr(data);
    return result[0].generated_text;
  }

  async destroy(): Promise<void> {
    // Transformers.js handles cleanup internally
    this.ocr = null;
    console.log('Transformers.js engine destroyed');
  }
}

Characteristics:

Bundle size: Variable (model-dependent, typically 50-150MB quantized)
Performance: 1-3 seconds with WebGPU, 5-10 seconds CPU fallback
Browser support: WebGPU recommended (Chrome 113+, limited Safari/Firefox)
Memory usage: ~200-500MB during processing
Accuracy: Higher than Tesseract on printed text

WebGPU Detection:

// Check before loading Transformers engine
if (!navigator.gpu) {
  console.warn('WebGPU not available. Transformers.js will use CPU (slower).');
  // Consider showing user a warning or defaulting to Tesseract.js
}

C. EasyOCR.js Engine (Post-MVP)

Installation:

npm install @qduc/easyocr-core @qduc/easyocr-web onnxruntime-web

Implementation:

// engines/easyocr-engine.ts
import { loadDetectorModel, loadRecognizerModel, recognize } from '@qduc/easyocr-web';

export class EasyOCREngine implements IOCREngine {
  public id = 'easyocr';
  // ... implementation details ...
}

Characteristics:

Bundle size: ~110MB for detection + recognition models
Performance: 2-5 seconds (WASM/WebGL)
Browser support: Extensive (WASM + ONNX Runtime)
Memory usage: ~300-600MB during processing
Accuracy: Very high for multilingual text

7. Feature Detection & Browser Support

Required Features Check:

// utils/feature-detection.ts
export function checkBrowserSupport(): { supported: boolean; missing: string[] } {
  const missing: string[] = [];

  if (typeof WebAssembly === 'undefined') {
    missing.push('WebAssembly');
  }

  if (typeof Worker === 'undefined') {
    missing.push('Web Workers');
  }

  if (typeof indexedDB === 'undefined') {
    missing.push('IndexedDB');
  }

  return {
    supported: missing.length === 0,
    missing,
  };
}

// In app initialization:
const { supported, missing } = checkBrowserSupport();
if (!supported) {
  throw new Error(
    `Browser not supported. Missing: ${missing.join(', ')}. ` +
      `Please use Chrome 90+, Firefox 88+, or Safari 15+.`
  );
}

Minimum Browser Versions:

Browser	Version	WASM	Workers	IndexedDB	WebGPU
Chrome/Edge	90+	✅	✅	✅	113+
Firefox	88+	✅	✅	✅	141+ (limited)
Safari	15+	✅	✅	✅	26+ (limited)
Mobile	Post-MVP	-	-	-	-

8. Implementation Checklist

MVP (Tesseract.js only):

Post-MVP (Add Transformers.js and EasyOCR.js):

Install dependencies: @huggingface/transformers, @qduc/easyocr-web
Implement TransformersEngine class
Implement EasyOCREngine class
Add WebGPU feature detection
Update OCRManager to support multiple engines
Add engine selection UI (dropdown)
Test engine switching (memory cleanup)
Performance comparison (Tesseract vs Transformers vs EasyOCR)
Document WebGPU browser support limitations

9. Translation System (Bergamot Integration)

The translation system uses the Bergamot engine, which is a WASM-based port of the Marian NMT framework, as used in Firefox Translations.

Architecture

Translator Service: Wrapped in ITextTranslator interface.
Worker-based: Runs in a separate worker to avoid blocking the main thread.
Memory Isolation: Requires SharedArrayBuffer, which mandates crossOriginIsolated headers (handled by coi-serviceworker.js).
Model Management: Models are downloaded from Mozilla's storage and cached in IndexedDB.

Components

src/translation/bergamot-translator.ts: Core translation logic.
src/translation-controller.ts: Manages the translation UI, language selection, and result display.
src/utils/image-writeback.ts: Renders translated text back onto the image, matching original positions.
src/utils/paragraph-grouping.ts: Groups individual OCR items into semantic paragraphs for better translation quality.

Write-Back Rendering Profiles

Write-back rendering supports three quality presets:

fast: Uses fill-based erase without inpainting for speed.
balanced: Attempts OpenCV inpainting when available, then falls back to fill-based erase.
high-quality: Uses inpainting fallback logic plus halo text stroke for readability on noisy backgrounds.

Language Registry

The language registry is generated via scripts/generate-bergamot-registry.mjs, which fetches the latest model list from Mozilla and prepares it for the browser.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Technical Specification: Multi-Engine Browser OCR

1. Architectural Overview

Core Design Principles

Implementation Scope

2. The Implementation Stack

3. Class Structure & Logic

A. The Engine Interface

B. The Factory Manager

4. Operational Workflow

5. Critical Warnings & Best Practices

6. Concrete Engine Implementations

A. Tesseract.js Engine (MVP)

B. Transformers.js Engine (Post-MVP)

C. EasyOCR.js Engine (Post-MVP)

7. Feature Detection & Browser Support

8. Implementation Checklist

9. Translation System (Bergamot Integration)

Architecture

Components

Write-Back Rendering Profiles

Language Registry

FilesExpand file tree

SPECIFICATION.md

Latest commit

History

SPECIFICATION.md

File metadata and controls

Technical Specification: Multi-Engine Browser OCR

1. Architectural Overview

Core Design Principles

Implementation Scope

2. The Implementation Stack

3. Class Structure & Logic

A. The Engine Interface

B. The Factory Manager

4. Operational Workflow

5. Critical Warnings & Best Practices

6. Concrete Engine Implementations

A. Tesseract.js Engine (MVP)

B. Transformers.js Engine (Post-MVP)

C. EasyOCR.js Engine (Post-MVP)

7. Feature Detection & Browser Support

8. Implementation Checklist

9. Translation System (Bergamot Integration)

Architecture

Components

Write-Back Rendering Profiles

Language Registry