src/providers/embedding/local.ts: 52 changes (50 additions, 2 deletions)
@@ -1,4 +1,5 @@
import type { EmbeddingProvider } from "../../types.js";
import { getEnvVar } from "../../config.js";

type Pipeline = (
task: string,
@@ -10,11 +11,57 @@ type Pipeline = (
) => Promise<{ tolist: () => number[][] }>
>;

/** Embedding dimensions of known models (models not listed here must set OPENAI_EMBEDDING_DIMENSIONS) */
const KNOWN_DIMS: Record<string, number> = {
// MiniLM series (English)
"Xenova/all-MiniLM-L6-v2": 384,
// BGE Chinese series
"Xenova/bge-large-zh-v1.5": 1024,
"Xenova/bge-base-zh-v1.5": 768,
"Xenova/bge-small-zh-v1.5": 512,
// BGE multilingual series
"Xenova/bge-m3": 1024,
// Multilingual MiniLM
"Xenova/paraphrase-multilingual-MiniLM-L12-v2": 384,
// E5 multilingual series
"Xenova/multilingual-e5-large": 1024,
"Xenova/multilingual-e5-base": 768,
"Xenova/multilingual-e5-small": 384,
};

const DEFAULT_MODEL = "Xenova/all-MiniLM-L6-v2";
const DEFAULT_DIMS = 384;

function resolveDimensions(
modelName: string,
override: string | undefined,
): number {
if (override !== undefined && override.trim().length > 0) {
const parsed = parseInt(override, 10);
if (!Number.isFinite(parsed) || parsed <= 0) {
throw new Error(
`OPENAI_EMBEDDING_DIMENSIONS must be a positive integer, got: ${override}`,
);
}
return parsed;
}
return KNOWN_DIMS[modelName] ?? DEFAULT_DIMS;
}
Comment on lines +35 to +49

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current behavior and contract evidence in code/docs.
rg -n -C3 'function resolveDimensions|parseInt\(|KNOWN_DIMS\[modelName\] \?\? DEFAULT_DIMS' src/providers/embedding/local.ts
rg -n -C2 'OPENAI_EMBEDDING_DIMENSIONS.*Required when the model is not in the known-models table' README.md

Repository: rohitg00/agentmemory

Length of output: 1006


🏁 Script executed:

# Check KNOWN_DIMS definition and what models are included
rg -n -B5 -A10 'const KNOWN_DIMS|KNOWN_DIMS\s*=' src/providers/embedding/local.ts

Length of output: 605


🏁 Script executed:

# Check if provider.dimensions is used elsewhere and could cause breakage
rg -n 'provider\.dimensions|\.dimensions' src/ --type ts

Length of output: 1440


🏁 Script executed:

# Search for any TODO, FIXME, or issue comments about embedding dimensions
rg -n 'TODO|FIXME|BUG.*dimension|dimension.*error|unknown.*model' src/providers/embedding/local.ts

Length of output: 46


🏁 Script executed:

# Verify parseInt behavior with edge cases
node <<'JS'
console.log("parseInt('123abc', 10):", parseInt('123abc', 10));
console.log("parseInt('12.34', 10):", parseInt('12.34', 10));
console.log("parseInt('  456  ', 10):", parseInt('  456  ', 10));
console.log("Number.isFinite(parseInt('123abc', 10)):", Number.isFinite(parseInt('123abc', 10)));
console.log("Number('123abc'):", Number('123abc'));
console.log("Number('456'):", Number('456'));
JS

Length of output: 234


Dimension resolution accepts malformed overrides and silently defaults unknown models to 384 dimensions, breaking the vector search contract.

The current implementation has two issues:

  1. parseInt accepts partial numeric strings (e.g., "123abc" → 123) and decimals (e.g., "12.34" → 12), which bypasses validation because Number.isFinite() returns true for the truncated integer that parseInt produces.

  2. Line 48 silently defaults to 384 for unknown models, contradicting the README, which documents that OPENAI_EMBEDDING_DIMENSIONS is "Required when the model is not in the known-models table". Whenever an unlisted model's actual output size differs from 384, this causes runtime failures in src/functions/search.ts and src/functions/migrate-vector-index.ts, which check that embedding.length === provider.dimensions.
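Point 1 can be reproduced in isolation. The TypeScript sketch below is not code from the PR: `lenientParse` mirrors the current parseInt-plus-Number.isFinite check, `strictParse` is a hypothetical whole-string alternative, and `outcome` is an illustrative helper that reports a parser's result or "rejected".

```typescript
// Mirrors the check currently in resolveDimensions: parseInt stops at the
// first non-digit character, so trailing text and decimals still parse.
function lenientParse(raw: string): number {
  const parsed = parseInt(raw, 10);
  if (!Number.isFinite(parsed) || parsed <= 0) {
    throw new Error(`not a positive integer: ${raw}`);
  }
  return parsed;
}

// Hypothetical whole-string check: after trimming, the input must consist
// only of digits with no leading zero.
function strictParse(raw: string): number {
  const trimmed = raw.trim();
  if (!/^[1-9]\d*$/.test(trimmed)) {
    throw new Error(`not a positive integer: ${raw}`);
  }
  return Number(trimmed);
}

// Runs a parser and reports its result as a string, or "rejected" if it throws.
function outcome(parse: (raw: string) => number, raw: string): string {
  try {
    return String(parse(raw));
  } catch {
    return "rejected";
  }
}

for (const input of ["384", "123abc", "12.34", "  456  "]) {
  console.log(
    JSON.stringify(input),
    "lenient:", outcome(lenientParse, input),
    "strict:", outcome(strictParse, input),
  );
}
```

On the malformed inputs, `lenientParse` returns 123 and 12 while `strictParse` rejects both. The two agree on "384" and on the padded "  456  ", because parseInt skips leading whitespace and `strictParse` trims first.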

Use strict regex validation for the override and throw an error for unknown models instead of falling back to a default dimension value.
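A minimal sketch of the suggested fix, assuming the `KNOWN_DIMS` table and the `OPENAI_EMBEDDING_DIMENSIONS` variable from the diff. The table is trimmed to two entries, the `POSITIVE_INT` pattern, the `Number.isSafeInteger` guard, and the unknown-model error message are illustrative choices rather than code from the repository, and "example/custom-model" is a placeholder model name.

```typescript
// Trimmed copy of the PR's lookup table, for illustration only.
const KNOWN_DIMS: Record<string, number> = {
  "Xenova/all-MiniLM-L6-v2": 384,
  "Xenova/bge-m3": 1024,
};

// Whole-string match: digits only, no sign, no decimal point, no leading zero.
const POSITIVE_INT = /^[1-9]\d*$/;

function resolveDimensions(
  modelName: string,
  override: string | undefined,
): number {
  if (override !== undefined && override.trim().length > 0) {
    const trimmed = override.trim();
    const value = Number(trimmed);
    // The safe-integer check also rejects digit strings too long to represent exactly.
    if (!POSITIVE_INT.test(trimmed) || !Number.isSafeInteger(value)) {
      throw new Error(
        `OPENAI_EMBEDDING_DIMENSIONS must be a positive integer, got: ${override}`,
      );
    }
    return value;
  }
  // Own-property lookup, so inherited keys such as "toString" never match.
  const known = Object.prototype.hasOwnProperty.call(KNOWN_DIMS, modelName)
    ? KNOWN_DIMS[modelName]
    : undefined;
  if (known !== undefined) {
    return known;
  }
  // No silent default: an unlisted model must declare its dimension explicitly.
  throw new Error(
    `Unknown embedding model "${modelName}": set OPENAI_EMBEDDING_DIMENSIONS to its output dimension`,
  );
}

console.log(resolveDimensions("Xenova/bge-m3", undefined));
console.log(resolveDimensions("example/custom-model", "768"));
```

Throwing for unlisted models matches the README contract the comment quotes. Because the constructor calls resolveDimensions, a wrong or missing dimension surfaces when the provider is created rather than later in the length checks in search or migration.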

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/providers/embedding/local.ts` around lines 35-49: the resolveDimensions
function has two issues that need fixing. First, replace the parseInt and
Number.isFinite validation of the override parameter with strict regex
validation that requires the entire override string to be a positive integer,
rejecting partial matches and decimals (for example "123abc" or "12.34").
Second, replace the fallback at the end, where KNOWN_DIMS[modelName] ??
DEFAULT_DIMS returns a default dimension for unknown models. Instead,
explicitly check whether the model exists in KNOWN_DIMS and throw an error if
it does not, since unknown models require the OPENAI_EMBEDDING_DIMENSIONS
override to be set.


export class LocalEmbeddingProvider implements EmbeddingProvider {
readonly name = "local";
-readonly dimensions = 384;
+readonly dimensions: number;
private modelName: string;
private extractor: Awaited<ReturnType<Pipeline>> | null = null;

constructor() {
this.modelName = getEnvVar("EMBEDDING_MODEL") || DEFAULT_MODEL;
this.dimensions = resolveDimensions(
this.modelName,
getEnvVar("OPENAI_EMBEDDING_DIMENSIONS"),
);
}

async embed(text: string): Promise<Float32Array> {
const [result] = await this.embedBatch([text]);
return result;
@@ -45,7 +92,8 @@ export class LocalEmbeddingProvider implements EmbeddingProvider {

this.extractor = await transformers.pipeline(
"feature-extraction",
-"Xenova/all-MiniLM-L6-v2",
+this.modelName,
{ local_files_only: true, quantized: false },
);
return this.extractor;
}