Backend-first image embedding library. The frontend captures images and sends them to the backend, where all preprocessing and inference happens.
Vision-Core converts images into embedding vectors — lists of numbers that capture visual features. Two similar images produce similar embeddings, enabling:
- Similarity Search — "Find products that look like this photo"
- Classification — Compare embeddings against reference categories
- Clustering — Group images by visual similarity
- Face Recognition — Compare face embeddings to verify identity
- Duplicate Detection — Find near-identical images in large datasets
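For instance, similarity search over stored embeddings reduces to a vector comparison. A minimal sketch (the `catalog` shape and `findMostSimilar` helper are illustrative; embeddings are assumed to be L2-normalized, see `l2Normalize` below):

```ts
// Cosine similarity of two L2-normalized embeddings is just their dot product.
function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Brute-force similarity search: return the catalog entry closest to the query.
function findMostSimilar(
  query: Float32Array,
  catalog: { id: string; embedding: Float32Array }[],
): { id: string; score: number } | null {
  let best: { id: string; score: number } | null = null;
  for (const { id, embedding } of catalog) {
    const score = dot(query, embedding);
    if (best === null || score > best.score) best = { id, score };
  }
  return best;
}
```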
```
Frontend (web/mobile)                 Backend (Node.js)
─────────────────────                 ──────────────────────────────
Capture image ──── send bytes ──→     Decode (e.g. sharp)
                                            │
                                            ▼
                                      VisionCore.embed(image)
                                        ├─ Resize (bilinear interpolation)
                                        ├─ Normalize (mean/std per channel)
                                        └─ Engine inference (ONNX Runtime)
                                            │
                                            ▼
                                      EmbeddingResult
                                        ├─ embedding: Float32Array
                                        ├─ dimensions: number
                                        └─ modelId: string
```
All image processing runs on the backend. The frontend only needs to capture and send the image.
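A browser-side sketch of that contract (the `/api/embed` route and JSON response shape are assumptions about your own API, not part of Vision-Core):

```ts
// Browser sketch: ship the captured file's raw bytes to the backend as-is.
async function requestEmbedding(file: File): Promise<void> {
  const response = await fetch('/api/embed', {
    method: 'POST',
    headers: { 'Content-Type': file.type || 'application/octet-stream' },
    body: file, // raw JPEG/PNG bytes; no preprocessing on the frontend
  });
  if (!response.ok) throw new Error(`Embedding request failed: ${response.status}`);
  const { dimensions } = await response.json(); // shape depends on your API
  console.log(`Backend returned a ${dimensions}-dimensional embedding`);
}
```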
| Package | Description |
|---|---|
| `@vision-core/types` | Shared type definitions (`ImageInput`, `EmbeddingEngine`, `ModelConfig`, etc.) |
| `@vision-core/core` | `VisionCore` class — preprocessing + inference orchestration |
| `@vision-core/engine-onnx` | ONNX Runtime embedding engine implementation |
```ts
import { createVisionCore } from '@vision-core/core';
import { createOnnxEngine } from '@vision-core/engine-onnx';
import type { ImageInput, ModelConfig } from '@vision-core/types';
import * as ort from 'onnxruntime-node';
import sharp from 'sharp';
import fs from 'fs/promises';

// 1. Configure the model
const config: ModelConfig = {
  modelSource: './models/mobilenet.onnx',
  modelLoader: async (source) => {
    const buf = await fs.readFile(source);
    // Slice to this file's bytes: a Node Buffer can share a pooled ArrayBuffer
    return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
  },
  inputTensorName: 'input',
  outputTensorName: 'output',
  inputWidth: 224,
  inputHeight: 224,
  channels: 3,
  channelOrder: 'CHW',
  normalization: {
    mean: [0.485, 0.456, 0.406],
    std: [0.229, 0.224, 0.225],
  },
};

// 2. Create engine and vision core
const engine = createOnnxEngine(ort);
const vc = createVisionCore(engine);
await vc.initialize(config);

// 3. Receive image from frontend, decode to RGBA pixels
const imageBuffer = await receiveImageFromFrontend(); // however your API receives it
const { data, info } = await sharp(imageBuffer)
  .ensureAlpha()
  .raw()
  .toBuffer({ resolveWithObject: true });

const image: ImageInput = {
  data: new Uint8Array(data.buffer, data.byteOffset, data.length),
  width: info.width,
  height: info.height,
};

// 4. Get embedding
const result = await vc.embed(image);
console.log(result.embedding); // Float32Array of the embedding vector
console.log(result.dimensions); // e.g. 512

// 5. Clean up when done
await vc.dispose();
```

The frontend (web browser, React Native app, etc.) captures an image and sends the raw bytes (JPEG, PNG, etc.) to your backend API.
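The quick start's `receiveImageFromFrontend()` is a placeholder for this step. One way the bytes might arrive, sketched with Express (the framework, route, and size limit are assumptions, not Vision-Core requirements):

```ts
import express from 'express';

const app = express();

// Accept raw image bytes posted by the frontend (Content-Type: image/*).
app.post('/api/embed', express.raw({ type: 'image/*', limit: '10mb' }), async (req, res) => {
  const imageBuffer: Buffer = req.body; // raw JPEG/PNG bytes, ready for sharp
  // ...decode with sharp and call vc.embed(image) as in the quick start above...
  res.json({ ok: true });
});

app.listen(3000);
```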
Use a library like sharp (Node.js) to decode the image into raw RGBA pixel data:
```ts
const { data, info } = await sharp(jpegBuffer)
  .ensureAlpha()
  .raw()
  .toBuffer({ resolveWithObject: true });

const image: ImageInput = {
  // View over the decoded bytes (respects the Buffer's offset into its pool)
  data: new Uint8Array(data.buffer, data.byteOffset, data.length),
  width: info.width,
  height: info.height,
};
```

When you call `vc.embed(image)`, VisionCore:
- Resizes the image to the model's expected dimensions (e.g. 224x224) using bilinear interpolation
- Normalizes pixel values: converts 0-255 to 0-1, then applies per-channel mean/std normalization
- Arranges data in the model's expected channel order (CHW or HWC)
- Runs inference through the ONNX engine to produce the embedding vector
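For intuition, the normalize step above boils down to simple per-channel arithmetic. A sketch of the idea (not Vision-Core's internal code):

```ts
// For each channel c: value = (raw / 255 - mean[c]) / std[c]
function normalizePixel(
  rgb: [number, number, number], // raw 0-255 values
  mean: [number, number, number],
  std: [number, number, number],
): [number, number, number] {
  return rgb.map((v, c) => (v / 255 - mean[c]) / std[c]) as [number, number, number];
}

// With the ImageNet stats from the quick start, mid-gray (128, 128, 128)
// maps to roughly (0.07, 0.21, 0.43).
```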
The returned `EmbeddingResult` contains:

- `embedding` — `Float32Array` of the raw embedding vector
- `dimensions` — length of the embedding
- `modelId` — identifier of the model that produced it
Use `l2Normalize` from `@vision-core/types` before comparing embeddings:

```ts
import { l2Normalize } from '@vision-core/types';

const normalized = l2Normalize(result.embedding);
```
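For example, to compare two embeddings (assuming `resultA` and `resultB` came from `vc.embed`, and that `l2Normalize` returns a `Float32Array`):

```ts
import { l2Normalize } from '@vision-core/types';

const a = l2Normalize(resultA.embedding);
const b = l2Normalize(resultB.embedding);

// For unit-length vectors, cosine similarity is just the dot product.
let similarity = 0;
for (let i = 0; i < a.length; i++) similarity += a[i] * b[i];

// Near 1 → visually similar; near 0 → unrelated.
console.log(similarity);
```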
| Field | Type | Description |
|---|---|---|
| `modelSource` | `string` | Path or URL to the ONNX model file |
| `modelLoader` | `(source: string) => Promise<ArrayBuffer>` | Function to load the model binary |
| `inputTensorName` | `string` | Name of the model's input tensor (check with Netron) |
| `outputTensorName` | `string` | Name of the model's output tensor |
| `inputWidth` | `number` | Expected input width in pixels (e.g. 224) |
| `inputHeight` | `number` | Expected input height in pixels (e.g. 224) |
| `channels` | `3` | Always 3 (RGB) |
| `channelOrder` | `'CHW' \| 'HWC'` | Channel layout — most models use CHW |
| `normalization.mean` | `[number, number, number]` | Per-channel mean for normalization |
| `normalization.std` | `[number, number, number]` | Per-channel std for normalization |
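Since `modelSource` is an opaque string passed to your `modelLoader`, models can also be fetched remotely. A sketch using Node 18+'s global `fetch` (the URL is illustrative):

```ts
// Load the model binary over HTTP instead of from disk.
const modelLoader = async (source: string): Promise<ArrayBuffer> => {
  const res = await fetch(source);
  if (!res.ok) throw new Error(`Failed to fetch model: ${res.status}`);
  return res.arrayBuffer();
};

// Then configure: { modelSource: 'https://example.com/mobilenet.onnx', modelLoader, ... }
```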
**MobileNetV3 / EfficientNet (ImageNet)**

```ts
{ mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225] }
// inputWidth: 224, inputHeight: 224, channelOrder: 'CHW'
```

**CLIP (OpenAI)**

```ts
{ mean: [0.48145466, 0.4578275, 0.40821073], std: [0.26862954, 0.26130258, 0.27577711] }
// inputWidth: 224, inputHeight: 224, channelOrder: 'CHW'
```

**No normalization (raw 0-1)**

```ts
{ mean: [0, 0, 0], std: [1, 1, 1] }
```

| Error | When |
|---|---|
| `EngineNotInitializedError` | `embed()` called before `initialize()` |
| `InferenceError` | ONNX engine fails during inference |
| `InvalidInputError` | Tensor validation fails (wrong shape/size) |
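A sketch of handling these at the API boundary (assuming the error classes are exported from `@vision-core/types`; check your package exports):

```ts
import { InvalidInputError, InferenceError } from '@vision-core/types';

try {
  const result = await vc.embed(image);
  console.log(result.dimensions);
} catch (err) {
  if (err instanceof InvalidInputError) {
    // Bad upload: reject the request instead of crashing the process.
    console.warn('Invalid image input:', err.message);
  } else if (err instanceof InferenceError) {
    console.error('ONNX inference failed:', err.message);
  } else {
    throw err; // not a Vision-Core error
  }
}
```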
```bash
# Install dependencies
yarn install

# Build all packages
yarn build

# Run all tests
yarn test

# Type check
yarn lint
```