feat(plugin-openai): add rate limiter, improve embeddings, and update…#24

Open
odilitime wants to merge 1 commit into 1.x from odi-dev

Conversation

odilitime (Member) commented Feb 14, 2026

… types

Add rate-limiter utility with per-endpoint backoff and retry logic. Improve embedding handler with better error handling and fallback support. Refactor init, audio, image, object, and text model handlers for @elizaos/core API compatibility. Expand types and add banner. Update README with comprehensive documentation.


Note

Medium Risk
Touches all OpenAI model call paths and introduces new retry/throttling and 429 classification logic, which could impact latency and error handling behavior under load. Startup behavior and TypeScript typing are also changed, but mostly in a controlled, additive way.

Overview
Adds a new process-wide rate limiter (withRateLimit/acquireRateLimit) with per-category sliding-window throttling, exponential backoff using Retry-After, one-shot tier logging, and fail-fast billing/quota 429 detection via QuotaExceededError.
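
As a rough sketch of the call-path shape this implies — the names withRateLimit, throwIfRateLimited, and QuotaExceededError come from this PR, but the category string, import path, and signatures here are assumptions, not the PR's verbatim code:

```typescript
import { withRateLimit, throwIfRateLimited } from "./utils/rate-limiter";

// Hypothetical handler: waits for a sliding-window slot, runs the call,
// and retries transient 429s with exponential backoff honoring Retry-After.
async function embedText(apiKey: string, text: string): Promise<number[]> {
  return withRateLimit("embeddings", async () => {
    const res = await fetch("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
    });
    if (!res.ok) {
      // Per the summary, handlers classify 429s here: billing/quota errors
      // surface as QuotaExceededError (fail fast, no retry), transient rate
      // limits become a typed retryable error. Exact signature assumed.
      await throwIfRateLimited(res);
      throw new Error(`OpenAI request failed: ${res.status}`);
    }
    const json = (await res.json()) as { data: { embedding: number[] }[] };
    return json.data[0].embedding;
  });
}
```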

Updates embeddings/images/audio/object/text handlers to use the limiter and to await throwIfRateLimited() for typed 429 handling; embeddings also gain synthetic fast-paths (null/empty/"test") to avoid startup probe API calls and return fallback vectors on dimension mismatch.
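
The fast-paths can be pictured like this — the marker values (0.1 for probes, 0.3 for empty text) are taken from the review comments below; the helper name syntheticEmbedding appears in this PR, but its exact signature is assumed:

```typescript
// Zero-filled vector with a marker in slot 0, so synthetic vectors are
// distinguishable from real embeddings when debugging.
function syntheticEmbedding(dim: number, marker: number): number[] {
  const v = new Array<number>(dim).fill(0);
  v[0] = marker;
  return v;
}

function fastPathFor(text: string | null, dim: number): number[] | null {
  if (text === null) return syntheticEmbedding(dim, 0.1);      // null-params probe
  if (text.trim() === "") return syntheticEmbedding(dim, 0.3); // empty input
  if (text === "test") return syntheticEmbedding(dim, 0.1);    // dimension probe
  return null; // real input — fall through to the API call
}
```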

Changes plugin init to be non-blocking and purely synchronous (removes eager GET /models validation), adds a startup config banner (src/banner.ts), introduces local core-compat streaming shims (StreamingTextParams/TextStreamResult), DRYs model registration with a Plugin["models"] type assertion, switches @elizaos/core dependency to workspace:*, and expands docs with a new CHANGELOG.md and README sections for banner/rate limiting/compatibility.
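
A sketch of what the streaming shims could look like, inferred from the review threads below (params.stream, onStreamChunk, result.textStream); the real definitions live in src/types/index.ts and may differ:

```typescript
import type { GenerateTextParams } from "@elizaos/core";

// Forward-compat params: older cores lack these fields, so they are optional.
export interface StreamingTextParams extends GenerateTextParams {
  stream?: boolean;
  onStreamChunk?: (chunk: string) => void;
}

// Mirrors the streaming result shape newer cores expect.
export interface TextStreamResult {
  textStream: AsyncIterable<string>;
}
```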

Written by Cursor Bugbot for commit 7542c67.

Summary by CodeRabbit

  • New Features

    • Startup banner displaying API key status, base URL, and models
    • Automatic account tier detection on first API call
    • Built-in rate limiting with automatic retries and exponential backoff
    • Billing quota error detection with direct dashboard link
    • Optimized embeddings to avoid unnecessary API calls
  • Bug Fixes

    • Faster startup by removing unnecessary API validation
  • Documentation

    • Expanded README with rate limiting details, billing handling, configuration options, examples, and architecture overview

Greptile Overview

Greptile Summary

This PR adds a comprehensive rate limiting system to prevent 429 errors from OpenAI's API and improves model handlers with better error handling and caching.

Major Changes:

  • New sliding-window rate limiter with per-category tracking (embeddings, chat, images, audio) and exponential backoff with automatic retry logic
  • Removed eager API validation during initialization to prevent race conditions with embedding dimension probes
  • Enhanced embedding handler with synthetic vectors for test probes, avoiding unnecessary API calls during startup
  • Added persistent caching for image descriptions and transcriptions with content-based hashing
  • Refactored all model handlers to use rate limiting wrapper with quota detection
  • Added forward-compatibility type shims for streaming support across @elizaos/core versions
  • New startup banner displaying configuration status

Technical Highlights:

  • Rate limiter uses module-level singleton pattern to preserve state across agent reinitializations
  • Image description caching includes ETag/Last-Modified headers for content-aware cache keys
  • Transcription caching uses optimized blob sampling (start/middle/end slices) to minimize memory usage — see the sketch after this list
  • Quota exhaustion (billing) errors are distinguished from transient rate limit errors to fail fast
  • Comprehensive documentation with detailed rationale comments throughout
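
The blob-sampling idea from the highlights above can be sketched as follows — the slice size, hash algorithm, and function name are assumptions, not the PR's actual code:

```typescript
import { createHash } from "node:crypto";

// Hash the size plus three small slices instead of the whole audio blob,
// trading a tiny collision risk for far less memory per cache lookup.
async function transcriptionCacheKey(blob: Blob, sample = 4096): Promise<string> {
  const size = blob.size;
  const mid = Math.max(0, Math.floor(size / 2) - sample / 2);
  const slices = await Promise.all([
    blob.slice(0, Math.min(sample, size)).arrayBuffer(),         // start
    blob.slice(mid, Math.min(mid + sample, size)).arrayBuffer(), // middle
    blob.slice(Math.max(0, size - sample), size).arrayBuffer(),  // end
  ]);
  const hash = createHash("sha256").update(String(size));
  for (const s of slices) hash.update(Buffer.from(s));
  return hash.digest("hex");
}
```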

Confidence Score: 4/5

  • Safe to merge with one minor readability nit in the rate-limiter retry-after header parsing
  • Comprehensive implementation with excellent documentation and clear architectural decisions. The retry-after parsing is correct as written (NaN is falsy, so invalid header values already fall back to undefined), though explicit parentheses would make the intent clearer. All other code is well-structured with proper error handling, caching strategies, and type safety
  • Pay attention to src/utils/rate-limiter.ts:338 — adding parentheses around the parseFloat expression documents the intended precedence

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/utils/rate-limiter.ts | New rate limiter with sliding window, exponential backoff, and per-category tracking. Well-documented implementation with quota detection. |
| src/models/embedding.ts | Enhanced with rate limiting and improved error handling. Synthetic embeddings for test probes avoid unnecessary API calls. |
| src/models/image.ts | Added rate limiting, caching with content-aware keys, and size validation. Comprehensive error handling for image operations. |
| src/models/text.ts | Refactored to use the rate limiter with different strategies for streaming vs non-streaming calls. Clean implementation. |
| src/models/audio.ts | Added transcription caching with content-based hashing and rate limiting. Efficient blob sampling for cache keys. |
| src/init.ts | Removed eager API validation to avoid startup race conditions. Synchronous config validation only. |

Flowchart

```mermaid
flowchart TD
    Start[API Call Request] --> CheckCategory{Determine Category<br/>chat/embeddings/images/audio}
    CheckCategory --> AcquireSlot[Acquire Rate Limit Slot]
    
    AcquireSlot --> CheckBackoff{Active Backoff?}
    CheckBackoff -->|Yes| WaitBackoff[Wait for Backoff]
    CheckBackoff -->|No| CheckWindow{Within RPM Limit?}
    
    WaitBackoff --> CheckWindow
    CheckWindow -->|No| WaitWindow[Wait for Window Slot]
    CheckWindow -->|Yes| RecordRequest[Record Request Timestamp]
    
    WaitWindow --> RecordRequest
    RecordRequest --> MakeCall[Execute API Call]
    
    MakeCall --> CheckResponse{Response Status}
    CheckResponse -->|200 OK| Success[Return Result]
    CheckResponse -->|429 Quota| QuotaError[Throw QuotaExceededError<br/>No Retry]
    CheckResponse -->|429 Rate Limit| RateError{Retries Left?}
    CheckResponse -->|Other Error| OtherError[Throw Error]
    
    RateError -->|Yes| RecordBackoff[Record Backoff Deadline<br/>Exponential + Jitter]
    RateError -->|No| MaxRetries[Throw Max Retries Error]
    
    RecordBackoff --> Sleep[Sleep with Backoff]
    Sleep --> AcquireSlot
    
    Success --> End[Complete]
    QuotaError --> End
    OtherError --> End
    MaxRetries --> End
```

Last reviewed commit: 7542c67

… types

Add rate-limiter utility with per-endpoint backoff and retry logic.
Improve embedding handler with better error handling and fallback support.
Refactor init, audio, image, object, and text model handlers for
@elizaos/core API compatibility. Expand types and add banner. Update README
with comprehensive documentation.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings February 14, 2026 22:30
coderabbitai bot (Contributor) commented Feb 14, 2026

Walkthrough

This PR introduces a comprehensive rate-limiting system with per-category sliding-window RPM tracking, billing-error detection with fail-fast behavior, and startup diagnostics (a configuration banner and tier detection). It also makes initialization non-blocking to reduce startup latency, adds synthetic embedding fast-paths, introduces forward/backward compatibility type shims across core versions, and refactors model registration to replace wrapper lambdas with direct function references.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Rate Limiting & Billing Error Detection — src/utils/rate-limiter.ts, src/utils/index.ts | New rate-limiter module with RateLimitError/QuotaExceededError classes, per-category sliding-window RPM tracking, exponential backoff with Retry-After support, withRateLimit/acquireRateLimit wrappers, throwIfRateLimited for 429 classification, and environment-based configuration (OPENAI_RATE_LIMIT_RPM, OPENAI_RATE_LIMIT_MAX_RETRIES). |
| Startup & Diagnostics — src/banner.ts, src/init.ts | New banner module displaying masked API key, base URL, models, and billing dashboard link. Removed eager /models validation fetch; initialization is now synchronous with a printOpenAiBanner call and debug logging. Added logTierOnce for rate-limit header parsing. |
| Model Handler Integration — src/models/embedding.ts, src/models/audio.ts, src/models/image.ts, src/models/object.ts, src/models/text.ts | All handlers now wrapped with withRateLimit or acquireRateLimit; added throwIfRateLimited and logTierOnce calls. Embedding handlers include synthetic fast-paths (null params, empty text, "test" keyword) returning zero-filled vectors with markers to avoid API calls. Text handlers now use local StreamingTextParams/TextStreamResult shims instead of core types. |
| Type System & Compatibility — src/types/index.ts | Added StreamingTextParams and TextStreamResult forward/backward compatibility shims extending/mirroring GenerateTextParams and streaming result structures to support older core versions. |
| Plugin Registration & Refactor — src/index.ts | Model mapping refactored from per-model async wrapper lambdas to direct function references (e.g., [ModelType.TEXT_EMBEDDING]: handleTextEmbedding) with a type-assertion workaround (sketch below). Updated streaming test flow to use a StreamingTextParams variable to bypass excess-property checks. |
| Dependency & Documentation — package.json, CHANGELOG.md, README.md | Updated @elizaos/core from "^1.7.0" to "workspace:*". Added extensive CHANGELOG entries for all new features. README updated with sections on startup banner, tier detection, rate limiting, configuration options, examples, and architecture overview. |
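
The registration refactor in the table above can be sketched like this — the handler and import names are taken from elsewhere in this thread, and the exact assertion shape is an assumption, not the PR's verbatim code:

```typescript
import { ModelType, type Plugin } from "@elizaos/core";
import { handleTextEmbedding } from "./models/embedding";

// Before: a wrapper lambda per model
//   [ModelType.TEXT_EMBEDDING]: async (runtime, params) =>
//     handleTextEmbedding(runtime, params),
//
// After: direct function references; one assertion papers over
// parameter-map typing differences between core versions.
export const models = {
  [ModelType.TEXT_EMBEDDING]: handleTextEmbedding,
} as Plugin["models"];
```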

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client Request
    participant Handler as Model Handler
    participant Limiter as Rate Limiter
    participant API as OpenAI API
    participant Tier as Tier Detection

    Client->>Handler: Call handleTextEmbedding(params)
    Handler->>Limiter: withRateLimit("embeddings", fn)
    Limiter->>Limiter: Check sliding window<br/>& backoff deadline
    Limiter->>Handler: Proceed to API call
    Handler->>API: POST /embeddings
    API-->>Handler: 200 OK + headers
    Handler->>Tier: logTierOnce(response)
    Tier-->>Tier: Parse x-ratelimit-limit-*<br/>Log tier once
    Handler-->>Client: Return embedding vector
    
    note over Limiter: On 429: throwIfRateLimited<br/>classifies as quota (fail-fast)<br/>or rate-limit (retry with backoff)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A rabbit's delight in this pull so grand,
With rate limits sliding across the land,
Embeddings fast via synthetic cheer,
No startup delays, just banners so clear!
Billing errors caught with a billing-wise leap,
Threading through handlers, the changes run deep. 🎉

🚥 Pre-merge checks — ✅ 4 passed
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped — CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(plugin-openai): add rate limiter, improve embeddings, and update types' is partially related to the changeset. While it captures some key changes (rate limiter and embeddings improvements), it misses significant aspects like the startup banner, tier detection, billing error handling, synthetic embedding fast-paths, and non-blocking plugin initialization that represent substantial portions of the work. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 86.36%, which is sufficient. The required threshold is 80.00%. |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into 1.x. |



greptile-apps bot left a comment

14 files reviewed, 1 comment


}

const retryAfter = response.headers.get("retry-after");
const retryMs = retryAfter ? parseFloat(retryAfter) * 1000 || undefined : undefined;

parseFloat on retry-after returns NaN when the header value isn't a number (e.g., an HTTP-date). NaN * 1000 is still NaN, but NaN is falsy in JS, so the || undefined fallback does trigger; the parentheses suggested below don't change evaluation (|| binds tighter than the conditional) and only make the intended precedence explicit.

Suggested change
- const retryMs = retryAfter ? parseFloat(retryAfter) * 1000 || undefined : undefined;
+ const retryMs = retryAfter ? (parseFloat(retryAfter) * 1000 || undefined) : undefined;

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/models/image.ts (1)

269-271: ⚠️ Potential issue | 🟡 Minor

Consider future-proofing the model version check with a regex pattern.

The current hardcoded checks for gpt-5, o1, and o3 will miss future o-series models (e.g., o4, o5). Per OpenAI's API documentation, all o-series reasoning models and GPT-5 models require max_completion_tokens instead of the deprecated max_tokens. Update the check to use a pattern that matches all future versions:

-  const useMaxCompletionTokens = modelName.startsWith("gpt-5") ||
-    modelName.startsWith("o1") ||
-    modelName.startsWith("o3");
+  // Models requiring max_completion_tokens instead of max_tokens (o-series and gpt-5)
+  const useMaxCompletionTokens = /^(gpt-5|o[0-9]+)/.test(modelName);
src/index.ts (1)

332-365: ⚠️ Potential issue | 🔴 Critical

Streaming test is broken: stream: true is missing, and onStreamChunk callback is never invoked.

streamParams at line 343 sets prompt and onStreamChunk but does not set stream: true. In generateTextByModelType (src/models/text.ts line 66), the check if (params.stream) gates the streaming path. Since stream is undefined, the non-streaming generateText path is taken, which never invokes onStreamChunk. Additionally, onStreamChunk is not forwarded to generateParams (lines 44–54), so it would not be passed to the underlying Vercel AI SDK functions even if the streaming path were entered.

The assertion at line 356–358 (chunks.length === 0 → throw) will always fail because the chunks array remains empty.

Proposed fix
              const streamParams: StreamingTextParams = {
                prompt: "Count from 1 to 5.",
+               stream: true,
                onStreamChunk: (chunk: string) => {
                  chunks.push(chunk);
                },
              };

With stream: true added, result will be a TextStreamResult (not a string), requiring the downstream code to consume result.textStream as an AsyncIterable<string> rather than using the onStreamChunk callback pattern. The current assertions casting to string and calling .substring() will fail.

🧹 Nitpick comments (7)
src/banner.ts (2)

84-88: Long values will overflow the table row and break alignment.

padEnd doesn't truncate — if ${value} ${status} exceeds COL2_WIDTH - 1 (41 chars), the row will be wider than the table borders. A custom base URL like https://my-company-proxy.internal.example.com/v1 would break the layout.

Consider truncating or ellipsizing the value to fit:

Proposed fix
   function row(label: string, value: string, status: string): string {
-    const rightContent = `${value} ${status}`.padEnd(COL2_WIDTH - 1);
+    const maxValueLen = COL2_WIDTH - 1 - status.length - 1; // 1 for space before status
+    const truncatedValue = value.length > maxValueLen
+      ? value.slice(0, maxValueLen - 1) + "…"
+      : value;
+    const rightContent = `${truncatedValue} ${status}`.padEnd(COL2_WIDTH - 1);
     const leftContent = label.padEnd(COL1_WIDTH - 1);
     return `| ${leftContent}| ${rightContent}|`;
   }

93-104: Consider using logger instead of console.log for consistency with the rest of the codebase.

Direct console.log bypasses any structured logging, log-level filtering, or log routing the host application may have configured. If the logger adds unwanted prefixes that break table formatting, this is a reasonable trade-off — but worth a brief inline comment explaining why console.log is used intentionally.

src/models/embedding.ts (1)

141-149: Type assertion on response JSON lacks full runtime validation.

The as { data: [...] } cast at line 141 trusts the response shape. The check at line 146 only validates data.data[0].embedding exists but doesn't verify it's actually a number[]. If the API returns an unexpected shape (e.g., embedding is a string), this would propagate garbage downstream silently.

This is acceptable risk given OpenAI's stable API, but worth noting.

src/models/image.ts (1)

326-333: Logging full response headers in error context could leak sensitive data in some proxy configurations.

Object.fromEntries(response.headers.entries()) logs all response headers. While OpenAI response headers are typically safe, custom proxies or intermediaries might inject sensitive headers. Consider filtering to known-useful headers.

README.md (1)

75-87: Add language specifiers to fenced code blocks flagged by markdownlint.

Lines 75, 93, 118, and 269 have fenced code blocks without a language specifier (markdownlint MD040). Use text for plaintext output blocks:

Fix
- ```
+ ```text
  +----------------------------------------------------------------------+
- ```
+ ```text
  [OpenAI] Account tier detected: 10000 RPM, 2000000 TPM
- ```
+ ```text
  [OpenAI] Quota exceeded -- your API key has no remaining credits.
- ```
+ ```text
  src/

Also applies to: 93-95, 118-121, 269-289

src/utils/rate-limiter.ts (2)

154-188: Minor race condition in acquire() under concurrent calls for the same category.

When multiple async calls to acquire(category) interleave (e.g., embeddings burst), each call creates its own recent array from filter() at line 170, checks capacity, pushes a timestamp, and writes back via this.windows.set(). The last writer wins, so earlier timestamps can be lost — allowing slightly more requests than the RPM limit.

For a client-side rate limiter this is acceptable (the real enforcement is server-side), but if you want tighter accuracy, consider mutating the stored array in-place rather than replacing it:

Optional tightening
    const maxRPM = this.getRPM(category);
    const cutoff = Date.now() - WINDOW_MS;
-   const timestamps = this.windows.get(category) ?? [];
-   const recent = timestamps.filter((t) => t > cutoff);
+   let recent = this.windows.get(category);
+   if (!recent) {
+     recent = [];
+     this.windows.set(category, recent);
+   }
+   // Prune in-place
+   let writeIdx = 0;
+   for (let i = 0; i < recent.length; i++) {
+     if (recent[i] > cutoff) recent[writeIdx++] = recent[i];
+   }
+   recent.length = writeIdx;

    if (recent.length >= maxRPM) {
      const oldestInWindow = recent[0];
      const waitMs = oldestInWindow + WINDOW_MS - Date.now() + 50;
      if (waitMs > 0) {
        logger.warn(
          `[OpenAI:RateLimit] ${category}: ${recent.length}/${maxRPM} RPM, throttling ${waitMs}ms`,
        );
        await sleep(waitMs);
      }
    }

    recent.push(Date.now());
-   this.windows.set(category, recent);

357-384: String-matching fallback in extractRateLimitInfo could produce false positives on unrelated errors containing "429".

An error message like "Server returned 4290 items" would match msg.includes("429"). This is unlikely with OpenAI errors but could trip on custom proxy error messages. A word-boundary check would be more precise:

Optional tightening
    if (
-     msg.includes("429") ||
+     /\b429\b/.test(msg) ||
      msg.includes("rate limit") ||
      msg.includes("too many requests") ||
      isBilling
    ) {

"dependencies": {
"@ai-sdk/openai": "^2.0.32",
"@elizaos/core": "^1.7.0",
"@elizaos/core": "workspace:*",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for workspace configuration at repo root
echo "=== Checking for workspace config files ==="
ls -la | grep -E "(pnpm-workspace|lerna|yarn|package-lock)" || echo "No standard workspace config files at root"

echo -e "\n=== Checking root package.json ==="
if [ -f "package.json" ]; then
  head -50 package.json
fi

echo -e "\n=== Looking for pnpm-workspace.yaml ==="
find . -maxdepth 2 -name "pnpm-workspace.yaml" -o -name "pnpm-workspace.yml" | head -5

echo -e "\n=== Checking if @elizaos/core exists as local package ==="
find . -maxdepth 4 -type f -name "package.json" -exec grep -l '"name".*@elizaos/core' {} \;

echo -e "\n=== Checking for CI/publish scripts ==="
find . -maxdepth 2 -type f \( -name "*.yml" -o -name "*.yaml" \) | grep -E "(github|gitlab|ci|publish|release)" | head -10

Repository: elizaos-plugins/plugin-openai

Length of output: 1771


🏁 Script executed:

#!/bin/bash
echo "=== Checking for GitHub workflows ==="
find . -path "./.git" -prune -o -type f -name "*.yml" -o -name "*.yaml" | grep -E "(workflow|ci|publish)" | head -20

echo -e "\n=== Checking .github directory ==="
ls -la .github/ 2>/dev/null || echo "No .github directory"
find .github -type f -name "*.yml" -o -name "*.yaml" 2>/dev/null | head -10

echo -e "\n=== Checking for any root-level monorepo indicators ==="
ls -la | grep -E "(workspace|lerna|monorepo|pnpm)"

echo -e "\n=== Checking if @elizaos/core is on npm ==="
curl -s "https://registry.npmjs.org/@elizaos/core" 2>/dev/null | head -100

echo -e "\n=== Checking package-lock.json or pnpm-lock.yaml ==="
ls -la | grep -E "lock|pnpm"

echo -e "\n=== Recent git log for package.json changes ==="
git log --oneline -10 -- package.json 2>/dev/null | head -5

Repository: elizaos-plugins/plugin-openai

Length of output: 50387


🏁 Script executed:

cat .github/workflows/npm-deploy.yml

Repository: elizaos-plugins/plugin-openai

Length of output: 3697


🏁 Script executed:

#!/bin/bash
echo "=== Checking bunfig or any bun configuration ==="
ls -la bunfig* 2>/dev/null || echo "No bunfig found"

echo -e "\n=== Checking if there's a parent directory monorepo structure ==="
cd .. && ls -la | head -20

echo -e "\n=== Checking .npmrc or similar ==="
cd - > /dev/null
find . -maxdepth 2 -name ".npmrc" -o -name ".yarnrc" | head -5

echo -e "\n=== Searching for workspace references in comments/docs ==="
grep -r "workspace:\*" . --include="*.md" --include="*.txt" 2>/dev/null | head -5

Repository: elizaos-plugins/plugin-openai

Length of output: 880


workspace:* is incompatible with this standalone repository and will break installations for consumers.

The package uses the workspace:* protocol for @elizaos/core, which is a pnpm-specific monorepo feature. This repository has no pnpm-workspace.yaml or workspace configuration, and the publish workflow uses bun publish (which delegates to npm), not pnpm publish. When published, npm will not rewrite workspace:* to a concrete version—consumers will receive an unresolvable dependency that fails npm install.

Change to a concrete version range for @elizaos/core (e.g., ^1.7.2 or ^0.1.9) to match the latest published version.
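
Concretely, that is a one-line change in package.json — the target range below is the reviewer's example, not a verified version:

```diff
   "dependencies": {
     "@ai-sdk/openai": "^2.0.32",
-    "@elizaos/core": "workspace:*",
+    "@elizaos/core": "^1.7.2",
```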

🤖 Prompt for AI Agents
In `@package.json` at line 47, The package.json currently lists the dependency
"@elizaos/core": "workspace:*", which is a pnpm workspace-only spec and will
break installs for consumers—replace the "workspace:*" value with a concrete
semver range (e.g. "^1.7.2" or "^0.1.9") for the "@elizaos/core" dependency
entry in package.json, update the lockfile by reinstalling (npm install or bun
install) and commit the changed package.json and lockfile; ensure any CI/publish
workflows reference the updated version rather than a workspace protocol.

Comment on lines +101 to +106
  if (text === "test") {
    logger.debug(
      `[OpenAI] Returning synthetic embedding for dimension probe (dim=${embeddingDimension})`,
    );
    return syntheticEmbedding(embeddingDimension, 0.1);
  }

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's look at the embedding.ts file to understand the context
head -150 src/models/embedding.ts | tail -60

Repository: elizaos-plugins/plugin-openai

Length of output: 2359


🏁 Script executed:

#!/bin/bash
# Search for ensureEmbeddingDimension to see how it calls the embedding function
rg -n 'ensureEmbeddingDimension' --type=ts -C 5

Repository: elizaos-plugins/plugin-openai

Length of output: 1739


🏁 Script executed:

#!/bin/bash
# Look for the actual embedding function signature and its parameters
rg -n 'function.*embedding|const.*embedding.*=' src/models/embedding.ts -A 3

Repository: elizaos-plugins/plugin-openai

Length of output: 656


🏁 Script executed:

#!/bin/bash
# Search for any legitimate uses of embedding the string "test"
rg -n '"test"' src/models/embedding.ts -C 2

Repository: elizaos-plugins/plugin-openai

Length of output: 582


🏁 Script executed:

#!/bin/bash
# Find the syntheticEmbedding function definition
rg -n 'syntheticEmbedding' src/models/embedding.ts -B 2 -A 10

Repository: elizaos-plugins/plugin-openai

Length of output: 2779


🏁 Script executed:

#!/bin/bash
# Get the full function signature of the embedding function
rg -n 'export.*function|export.*const.*=' src/models/embedding.ts -A 8 | head -40

Repository: elizaos-plugins/plugin-openai

Length of output: 357


🏁 Script executed:

#!/bin/bash
# Search the entire codebase for embedText or similar to see how this function is called
rg -n 'embedText|embedding.*\(' --type=ts | grep -v node_modules | head -20

Repository: elizaos-plugins/plugin-openai

Length of output: 167


🏁 Script executed:

#!/bin/bash
# Search for how handleTextEmbedding is called
rg -n 'handleTextEmbedding|TEXT_EMBEDDING' --type=ts | head -30

Repository: elizaos-plugins/plugin-openai

Length of output: 908


🏁 Script executed:

#!/bin/bash
# Look for test files that might legitimately embed "test" string
find . -path ./node_modules -prune -o -name '*.test.ts' -o -name '*.spec.ts' | head -20

Repository: elizaos-plugins/plugin-openai

Length of output: 99


🏁 Script executed:

#!/bin/bash
# Search for "test" in test files to see if it's used as embedding text
rg -n '"test"' --type=ts -l | grep -i test | head -10

Repository: elizaos-plugins/plugin-openai

Length of output: 55


🏁 Script executed:

#!/bin/bash
# Check the interface/type definition for TextEmbeddingParams
rg -n 'TextEmbeddingParams' --type=ts -A 5

Repository: elizaos-plugins/plugin-openai

Length of output: 1495


🏁 Script executed:

#!/bin/bash
# Check the test file to see if it legitimately embeds "test"
cat -n __tests__/plugin.test.ts

Repository: elizaos-plugins/plugin-openai

Length of output: 8611


🏁 Script executed:

#!/bin/bash
# Also search for any core imports or references that might show how ensureEmbeddingDimension works
rg -n 'ensureEmbedding|dimension.*probe' --type=ts

Repository: elizaos-plugins/plugin-openai

Length of output: 593


Short-circuiting on text === "test" will incorrectly handle any production embedding request for the literal string "test", returning a synthetic zero-vector instead of a real embedding.

While the code comments explain this is intentional for the core runtime's startup ensureEmbeddingDimension() probe, the check is too broad. Any user code that happens to embed "test" will silently get a synthetic vector (zero-filled with marker 0.1), corrupting similarity search results.

Since the code already extracts text from both string and object params (lines 78–81), tighten the short-circuit to match the documented object shape:

-  if (text === "test") {
+  // Only short-circuit the exact probe shape the core sends at startup: { text: 'test' }
+  if (text === "test" && typeof params === "object" && params !== null) {
     logger.debug(
       `[OpenAI] Returning synthetic embedding for dimension probe (dim=${embeddingDimension})`,
     );
     return syntheticEmbedding(embeddingDimension, 0.1);
   }

Even with this narrowing, the risk remains if someone legitimately passes { text: "test" }. Ideally, the core's probe should use a dedicated marker property or sentinel value rather than matching user-facing text content.

🤖 Prompt for AI Agents
In `@src/models/embedding.ts` around lines 101 - 106, The current short-circuit
"if (text === 'test')" in the embedding creation path returns a synthetic
embedding for any literal "test" input; change this to only trigger for the
probe sentinel by verifying the original param is the special probe object shape
(not just the string). Concretely, update the condition around where text is
extracted (the code that handles string vs object params) to check that the
incoming arg is an object and contains the probe marker (e.g., a dedicated
boolean/string sentinel property used by ensureEmbeddingDimension), and only
then call syntheticEmbedding(embeddingDimension, 0.1); otherwise continue to
call the real embedding path. Reference symbols: syntheticEmbedding,
embeddingDimension, and ensureEmbeddingDimension probe.

Comment on lines +153 to +164
  if (
    !Array.isArray(embedding) ||
    embedding.length !== embeddingDimension
  ) {
    // Dimension mismatch — likely OPENAI_EMBEDDING_DIMENSIONS doesn't match
    // the model's actual output. Log it but return a usable vector so the
    // agent doesn't crash; the user should fix their config.
    logger.error(
      `Embedding length ${embedding?.length ?? 0} does not match configured dimension ${embeddingDimension}`,
    );
    return syntheticEmbedding(embeddingDimension, 0.4);
  }

⚠️ Potential issue | 🟠 Major

Dimension mismatch returns a synthetic embedding instead of throwing — this silently degrades search quality.

When the API returns a vector with a different length than the configured embeddingDimension, the code logs an error but returns a synthetic 0.4-marker vector. This means the embedding store receives a meaningless vector for real user content, which silently corrupts similarity search results. The user would have no way to know their queries are returning garbage matches.

Consider throwing here (or at least making this a configurable behavior), since a dimension mismatch is always a misconfiguration that should be fixed immediately — not papered over.

🤖 Prompt for AI Agents
In `@src/models/embedding.ts` around lines 153 - 164, The current check in the
embedding validation (variables/functions: embedding, embeddingDimension,
logger.error, syntheticEmbedding) logs a mismatch and returns a synthetic vector
which silently corrupts the embedding store; change this to fail fast by
throwing a clear, descriptive error (including actual length and expected
embeddingDimension) instead of returning synthetic data, or make this behavior
configurable (e.g., add an option/flag like throwOnDimensionMismatch to the
embedding creation/validation path) so callers can opt into strict validation;
update callers/tests to handle the thrown error or pass the new option as
appropriate.

Copilot AI left a comment

Pull request overview

Adds a process-level OpenAI rate limiting utility and refactors model handlers/types/docs to improve resiliency (429 handling, tier logging, startup behavior) and compatibility across @elizaos/core versions.

Changes:

  • Introduces a shared rate limiter with per-endpoint RPM windows, 429 retry/backoff, quota detection, and one-shot tier logging.
  • Refactors text/object/image/audio/embedding handlers to use rate limiting and improved error handling (plus synthetic embedding fast-paths).
  • Adds a startup configuration banner, expands local compatibility types, and updates README/CHANGELOG.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/utils/rate-limiter.ts New rate limiter wrapper, 429/quota detection, per-category windows/backoff, tier logging
src/utils/index.ts Re-exports the rate limiter utilities
src/types/index.ts Adds shared types + local shims for streaming compatibility with older core versions
src/models/text.ts Uses limiter for chat calls; adds streaming initiation throttling
src/models/object.ts Wraps object generation in rate-limited retry wrapper
src/models/image.ts Wraps image generation/description with rate limiting + 429 parsing + tier logging
src/models/embedding.ts Adds synthetic embedding fast-paths and wraps API calls with rate limiting
src/models/audio.ts Wraps TTS/transcription with rate limiting + 429 parsing + tier logging
src/init.ts Removes eager API validation; prints banner and logs baseURL
src/index.ts Simplifies model registration; adds streaming test and type shims usage
src/banner.ts Adds startup configuration banner output
package.json Switches @elizaos/core dependency to workspace:*
README.md Documents banner, limiter behavior/config, compatibility notes, model table, architecture
CHANGELOG.md Adds unreleased changelog describing new functionality


Comment on lines 77 to 85
   let text: string;
   if (typeof params === "string") {
     text = params;
   } else if (typeof params === "object" && params.text) {
     text = params.text;
   } else {
-    const errorMsg = "Invalid input format for embedding";
-    logger.warn(errorMsg);
-    const fallbackVector = Array(embeddingDimension).fill(0);
-    fallbackVector[0] = 0.2;
-    return fallbackVector;
+    logger.warn("Invalid input format for embedding");
+    return syntheticEmbedding(embeddingDimension, 0.2);
   }
Copilot AI commented Feb 14, 2026

params.text is checked for truthiness, so { text: "" } is treated as “invalid input format” (marker 0.2) instead of the intended “empty text” path (marker 0.3). Check for the presence of the text property (e.g., 'text' in params) and then handle empty-string via the existing !text.trim() branch.
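
A sketch of that fix, reusing the names from the hunk above (the params shape and surrounding context are assumed):

```typescript
let text: string;
if (typeof params === "string") {
  text = params;
} else if (typeof params === "object" && params !== null && "text" in params) {
  // Presence check instead of truthiness: { text: "" } now reaches the
  // existing !text.trim() empty-text branch (marker 0.3) rather than the
  // invalid-format fallback (marker 0.2).
  text = (params as { text: string }).text;
} else {
  logger.warn("Invalid input format for embedding");
  return syntheticEmbedding(embeddingDimension, 0.2);
}
```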

Comment on lines +343 to +348
       const streamParams: StreamingTextParams = {
         prompt: "Count from 1 to 5.",
         onStreamChunk: (chunk: string) => {
           chunks.push(chunk);
         },
-      });
-      if (!result || result.length === 0) {
+      };
Copilot AI commented Feb 14, 2026

The streaming test sets onStreamChunk but never sets stream: true. In this PR’s text handler, streaming only activates when params.stream is truthy, so this test will run the non-streaming path and chunks will remain empty. Either set stream: true here (and adjust assertions/types to match the streaming return shape) or update the handler to treat onStreamChunk as a streaming request.

| `TEXT_LARGE` | gpt-4o | Complex tasks requiring deeper reasoning |
| `TEXT_EMBEDDING` | text-embedding-3-small | Text embedding (1536 dimensions by default) |
| `IMAGE` | gpt-image-1 | DALL-E image generation |
| `IMAGE_DESCRIPTION` | gpt-4o-mini | Image analysis with title/description extraction |
Copilot AI commented Feb 14, 2026

README says the default IMAGE_DESCRIPTION model is gpt-4o-mini, but the code default is gpt-5-nano (getImageDescriptionModel(..., "gpt-5-nano")). Please update the table to match the actual default (or change the code default if the README is intended).

Suggested change
-| `IMAGE_DESCRIPTION` | gpt-4o-mini | Image analysis with title/description extraction |
+| `IMAGE_DESCRIPTION` | gpt-5-nano | Image analysis with title/description extraction |

Comment on lines 46 to 48
     "@ai-sdk/openai": "^2.0.32",
-    "@elizaos/core": "^1.7.0",
+    "@elizaos/core": "workspace:*",
     "ai": "^5.0.47",
Copilot AI commented Feb 14, 2026

Changing @elizaos/core dependency to "workspace:*" will break consumers installing this plugin from npm (the workspace protocol generally won’t resolve outside a monorepo). If the intent is to support both local workspace dev and published installs, consider restoring a semver range here and/or moving @elizaos/core to peerDependencies with an appropriate supported range.

Comment on lines +166 to +170
// Phase 2: Sliding window — prune old timestamps, check capacity
const maxRPM = this.getRPM(category);
const cutoff = Date.now() - WINDOW_MS;
const timestamps = this.windows.get(category) ?? [];
const recent = timestamps.filter((t) => t > cutoff);
Copilot AI commented Feb 14, 2026

RateLimiter.acquire() is not concurrency-safe: concurrent callers can read the same timestamps, build independent recent arrays via filter(), then set() back into the map—overwriting each other and undercounting requests. This can cause the limiter to allow bursts well above maxRPM under load. Consider serializing acquires per category (e.g., a per-category mutex/queue) and re-checking the window after any sleep before recording the timestamp.
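
One way to serialize acquires per category, as the comment suggests — a promise-chain mutex sketch with assumed names, not the PR's code:

```typescript
const categoryLocks = new Map<string, Promise<void>>();

// Callers for the same category run strictly one at a time; different
// categories proceed independently.
async function withCategoryLock<T>(
  category: string,
  fn: () => Promise<T>,
): Promise<T> {
  const prev = categoryLocks.get(category) ?? Promise.resolve();
  let release!: () => void;
  categoryLocks.set(category, new Promise<void>((resolve) => (release = resolve)));
  await prev; // wait for the previous acquire in this category to finish
  try {
    return await fn();
  } finally {
    release();
  }
}
```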

Comment on lines +246 to +249
const maxRetries =
options?.maxRetries ??
(parseInt(process.env.OPENAI_RATE_LIMIT_MAX_RETRIES ?? "", 10) ||
DEFAULT_MAX_RETRIES);
Copilot AI commented Feb 14, 2026

OPENAI_RATE_LIMIT_MAX_RETRIES set to 0 won’t be respected because parseInt(...) || DEFAULT_MAX_RETRIES treats 0 as falsy. If you want to allow disabling retries, parse the env var and check Number.isFinite / >= 0 explicitly instead of using || for fallback.

Suggested change
-    const maxRetries =
-      options?.maxRetries ??
-      (parseInt(process.env.OPENAI_RATE_LIMIT_MAX_RETRIES ?? "", 10) ||
-        DEFAULT_MAX_RETRIES);
+    const envMaxRetriesRaw = process.env.OPENAI_RATE_LIMIT_MAX_RETRIES;
+    let envMaxRetries: number | undefined;
+    if (envMaxRetriesRaw !== undefined) {
+      const parsed = Number(envMaxRetriesRaw);
+      if (Number.isFinite(parsed) && parsed >= 0) {
+        envMaxRetries = parsed;
+      }
+    }
+    const maxRetries =
+      options?.maxRetries ?? envMaxRetries ?? DEFAULT_MAX_RETRIES;

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

const COL2_WIDTH = WIDTH - COL1_WIDTH - 3; // 3 = "| " + " |"

const line = `+${"-".repeat(WIDTH - 2)}+`;
const title = "| OpenAI Plugin" + " ".repeat(WIDTH - 2 - "| OpenAI Plugin".length) + "|";

Banner title and billing rows off-by-one width

Low Severity

The title row and billing row are each 71 characters wide instead of the intended 72, causing misalignment with the border/divider lines. The formula WIDTH - 2 - "| OpenAI Plugin".length subtracts 2 for both | delimiters, but the prefix string "| OpenAI Plugin" already includes the leading |. The subtraction should be WIDTH - 1 - prefix.length. The same error occurs in the billingRow calculation.
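
The corrected formula, with assumed constants:

```typescript
const WIDTH = 72;
const prefix = "| OpenAI Plugin";
// Subtract 1 (for the closing "|") plus the prefix length, which already
// includes the opening "|": 15 + 56 + 1 = 72 characters.
const title = prefix + " ".repeat(WIDTH - 1 - prefix.length) + "|";
```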

Additional Locations (1)


const maxRetries =
options?.maxRetries ??
(parseInt(process.env.OPENAI_RATE_LIMIT_MAX_RETRIES ?? "", 10) ||
DEFAULT_MAX_RETRIES);

Env var max retries zero treated as default

Low Severity

Setting OPENAI_RATE_LIMIT_MAX_RETRIES=0 (to disable retries entirely) is silently ignored and falls back to the default of 5 retries. The || operator on line 248 treats the parsed 0 as falsy, so 0 || DEFAULT_MAX_RETRIES evaluates to 5. Using ?? instead of || would correctly preserve 0 as a valid value, consistent with how options?.maxRetries is handled on the line above.

