Conversation
… types

Add rate-limiter utility with per-endpoint backoff and retry logic. Improve embedding handler with better error handling and fallback support. Refactor init, audio, image, object, and text model handlers for @elizaos/core API compatibility. Expand types and add banner. Update README with comprehensive documentation.

Co-authored-by: Cursor <cursoragent@cursor.com>
Walkthrough

This PR introduces a comprehensive rate-limiting system with per-category sliding-window RPM tracking, billing-error detection and fail-fast behavior, startup diagnostics (a configuration banner and tier detection), non-blocking initialization to reduce startup latency, synthetic embedding fast paths, forward/backward-compatibility type shims for core versions, and a DRY model-registration refactor replacing wrapper lambdas with direct function references.
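To make the sliding-window idea concrete, here is a minimal sketch of per-category RPM tracking; the names `windows` and `WINDOW_MS` mirror the PR's rate limiter, but the body is illustrative, not the actual implementation:

```ts
// Minimal sliding-window RPM check (illustrative sketch, not the PR's exact code).
const WINDOW_MS = 60_000;
const windows = new Map<string, number[]>(); // category -> request timestamps

function tryAcquire(category: string, maxRPM: number): boolean {
  const cutoff = Date.now() - WINDOW_MS;
  // Keep only timestamps inside the last 60s window.
  const recent = (windows.get(category) ?? []).filter((t) => t > cutoff);
  if (recent.length >= maxRPM) {
    windows.set(category, recent);
    return false; // caller should wait until the oldest timestamp ages out
  }
  recent.push(Date.now());
  windows.set(category, recent);
  return true;
}
```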
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client Request
    participant Handler as Model Handler
    participant Limiter as Rate Limiter
    participant API as OpenAI API
    participant Tier as Tier Detection
    Client->>Handler: Call handleTextEmbedding(params)
    Handler->>Limiter: withRateLimit("embeddings", fn)
    Limiter->>Limiter: Check sliding window<br/>& backoff deadline
    Limiter->>Handler: Proceed to API call
    Handler->>API: POST /embeddings
    API-->>Handler: 200 OK + headers
    Handler->>Tier: logTierOnce(response)
    Tier-->>Tier: Parse x-ratelimit-limit-*<br/>Log tier once
    Handler-->>Client: Return embedding vector
    note over Limiter: On 429: throwIfRateLimited<br/>classifies as quota (fail-fast)<br/>or rate-limit (retry with backoff)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
```ts
const retryAfter = response.headers.get("retry-after");
const retryMs = retryAfter ? parseFloat(retryAfter) * 1000 || undefined : undefined;
```
parseFloat on retry-after returns NaN when the header isn't a plain number (e.g., the HTTP-date form), and NaN * 1000 is still NaN. Since NaN is falsy in JS, the || undefined fallback does catch it, but the precedence is easy to misread; parenthesizing the fallback makes the intent explicit:
```diff
-const retryMs = retryAfter ? parseFloat(retryAfter) * 1000 || undefined : undefined;
+const retryMs = retryAfter ? (parseFloat(retryAfter) * 1000 || undefined) : undefined;
```
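For completeness, a defensive parser that also accepts the HTTP-date form of retry-after might look like this (a sketch; `parseRetryAfterMs` is not part of the PR):

```ts
// Sketch: handle both delta-seconds ("120") and HTTP-date forms of retry-after.
function parseRetryAfterMs(header: string | null): number | undefined {
  if (!header) return undefined;
  const seconds = Number(header);
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  const dateMs = Date.parse(header); // HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  if (!Number.isNaN(dateMs)) {
    const delta = dateMs - Date.now();
    return delta > 0 ? delta : 0;
  }
  return undefined; // unparseable -> caller falls back to default backoff
}
```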
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/models/image.ts (1)
269-271: ⚠️ Potential issue | 🟡 Minor — Consider future-proofing the model version check with a regex pattern.

The current hardcoded checks for gpt-5, o1, and o3 will miss future o-series models (e.g., o4, o5). Per OpenAI's API documentation, all o-series reasoning models and GPT-5 models require max_completion_tokens instead of the deprecated max_tokens. Update the check to use a pattern that matches all future versions:

```diff
-  const useMaxCompletionTokens = modelName.startsWith("gpt-5") ||
-    modelName.startsWith("o1") ||
-    modelName.startsWith("o3");
+  // Models requiring max_completion_tokens instead of max_tokens (o-series and gpt-5)
+  const useMaxCompletionTokens = /^(gpt-5|o[0-9]+)/.test(modelName);
```

src/index.ts (1)
332-365: ⚠️ Potential issue | 🔴 Critical — Streaming test is broken: stream: true is missing, and the onStreamChunk callback is never invoked.

streamParams at line 343 sets prompt and onStreamChunk but does not set stream: true. In generateTextByModelType (src/models/text.ts line 66), the check if (params.stream) gates the streaming path. Since stream is undefined, the non-streaming generateText path is taken, which never invokes onStreamChunk. Additionally, onStreamChunk is not forwarded to generateParams (lines 44–54), so it would not be passed to the underlying Vercel AI SDK functions even if the streaming path were entered.

The assertion at lines 356–358 (chunks.length === 0 → throw) will always fail because the chunks array remains empty.

Proposed fix

```diff
 const streamParams: StreamingTextParams = {
   prompt: "Count from 1 to 5.",
+  stream: true,
   onStreamChunk: (chunk: string) => {
     chunks.push(chunk);
   },
 };
```

With stream: true added, result will be a TextStreamResult (not a string), requiring the downstream code to consume result.textStream as an AsyncIterable<string> rather than using the onStreamChunk callback pattern. The current assertions casting to string and calling .substring() will fail.
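If the test switches to stream: true, consuming the stream might look like the following sketch, assuming TextStreamResult exposes textStream as an AsyncIterable<string> as described above (the interface here is a simplified stand-in, not the PR's actual type):

```ts
// Sketch: collecting chunks from the streaming result shape described above.
interface TextStreamResult {
  textStream: AsyncIterable<string>;
}

async function collectStream(result: TextStreamResult): Promise<string> {
  const chunks: string[] = [];
  for await (const chunk of result.textStream) {
    chunks.push(chunk);
  }
  if (chunks.length === 0) {
    throw new Error("Streaming returned no chunks");
  }
  return chunks.join("");
}
```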
🤖 Fix all issues with AI agents
In `@package.json`:
- Line 47: The package.json currently lists the dependency "@elizaos/core":
"workspace:*", which is a pnpm workspace-only spec and will break installs for
consumers—replace the "workspace:*" value with a concrete semver range (e.g.
"^1.7.2" or "^0.1.9") for the "@elizaos/core" dependency entry in package.json,
update the lockfile by reinstalling (npm install or bun install) and commit the
changed package.json and lockfile; ensure any CI/publish workflows reference the
updated version rather than a workspace protocol.
In `@src/models/embedding.ts`:
- Around line 153-164: The current check in the embedding validation
(variables/functions: embedding, embeddingDimension, logger.error,
syntheticEmbedding) logs a mismatch and returns a synthetic vector which
silently corrupts the embedding store; change this to fail fast by throwing a
clear, descriptive error (including actual length and expected
embeddingDimension) instead of returning synthetic data, or make this behavior
configurable (e.g., add an option/flag like throwOnDimensionMismatch to the
embedding creation/validation path) so callers can opt into strict validation;
update callers/tests to handle the thrown error or pass the new option as
appropriate.
- Around line 101-106: The current short-circuit "if (text === 'test')" in the
embedding creation path returns a synthetic embedding for any literal "test"
input; change this to only trigger for the probe sentinel by verifying the
original param is the special probe object shape (not just the string).
Concretely, update the condition around where text is extracted (the code that
handles string vs object params) to check that the incoming arg is an object and
contains the probe marker (e.g., a dedicated boolean/string sentinel property
used by ensureEmbeddingDimension), and only then call
syntheticEmbedding(embeddingDimension, 0.1); otherwise continue to call the real
embedding path. Reference symbols: syntheticEmbedding, embeddingDimension, and
ensureEmbeddingDimension probe.
🧹 Nitpick comments (7)
src/banner.ts (2)
84-88: Long values will overflow the table row and break alignment.
padEnd doesn't truncate — if `${value} ${status}` exceeds COL2_WIDTH - 1 (41 chars), the row will be wider than the table borders. A custom base URL like https://my-company-proxy.internal.example.com/v1 would break the layout.

Consider truncating or ellipsizing the value to fit:
Proposed fix
```diff
 function row(label: string, value: string, status: string): string {
-  const rightContent = `${value} ${status}`.padEnd(COL2_WIDTH - 1);
+  const maxValueLen = COL2_WIDTH - 1 - status.length - 1; // 1 for space before status
+  const truncatedValue = value.length > maxValueLen
+    ? value.slice(0, maxValueLen - 1) + "…"
+    : value;
+  const rightContent = `${truncatedValue} ${status}`.padEnd(COL2_WIDTH - 1);
   const leftContent = label.padEnd(COL1_WIDTH - 1);
   return `| ${leftContent}| ${rightContent}|`;
 }
```

93-104: Consider using logger instead of console.log for consistency with the rest of the codebase.

Direct console.log bypasses any structured logging, log-level filtering, or log routing the host application may have configured. If the logger adds unwanted prefixes that break table formatting, this is a reasonable trade-off — but worth a brief inline comment explaining why console.log is used intentionally.

src/models/embedding.ts (1)

141-149: Type assertion on response JSON lacks full runtime validation.

The as { data: [...] } cast at line 141 trusts the response shape. The check at line 146 only validates that data.data[0].embedding exists but doesn't verify it's actually a number[]. If the API returns an unexpected shape (e.g., embedding is a string), this would propagate garbage downstream silently.

This is acceptable risk given OpenAI's stable API, but worth noting.
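If stricter validation is ever wanted, a narrow runtime guard is cheap. A sketch (not in the PR; `extractEmbedding` is a hypothetical helper):

```ts
// Sketch: minimal runtime guard before trusting the cast.
function extractEmbedding(body: unknown): number[] {
  const data = (body as { data?: Array<{ embedding?: unknown }> }).data;
  const embedding = data?.[0]?.embedding;
  if (!Array.isArray(embedding) || !embedding.every((v) => typeof v === "number")) {
    throw new Error("Unexpected embeddings response shape");
  }
  return embedding; // now safely number[]
}
```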
src/models/image.ts (1)
326-333: Logging full response headers in error context could leak sensitive data in some proxy configurations.
Object.fromEntries(response.headers.entries()) logs all response headers. While OpenAI response headers are typically safe, custom proxies or intermediaries might inject sensitive headers. Consider filtering to known-useful headers.

README.md (1)
75-87: Add language specifiers to fenced code blocks flagged by markdownlint.

Lines 75, 93, 118, and 269 have fenced code blocks without a language specifier (markdownlint MD040). Use `text` for plaintext output blocks.

Fix: add the `text` specifier to the opening fence of each block:

- the banner block (`+----------------------------------------------------------------------+`)
- `[OpenAI] Account tier detected: 10000 RPM, 2000000 TPM`
- `[OpenAI] Quota exceeded -- your API key has no remaining credits.`
- the `src/` directory-layout block

Also applies to: 93-95, 118-121, 269-289
src/utils/rate-limiter.ts (2)
154-188: Minor race condition in acquire() under concurrent calls for the same category.

When multiple async calls to acquire(category) interleave (e.g., an embeddings burst), each call creates its own recent array from filter() at line 170, checks capacity, pushes a timestamp, and writes back via this.windows.set(). The last writer wins, so earlier timestamps can be lost — allowing slightly more requests than the RPM limit.

For a client-side rate limiter this is acceptable (the real enforcement is server-side), but if you want tighter accuracy, consider mutating the stored array in place rather than replacing it:
Optional tightening
```diff
   const maxRPM = this.getRPM(category);
   const cutoff = Date.now() - WINDOW_MS;
-  const timestamps = this.windows.get(category) ?? [];
-  const recent = timestamps.filter((t) => t > cutoff);
+  let recent = this.windows.get(category);
+  if (!recent) {
+    recent = [];
+    this.windows.set(category, recent);
+  }
+  // Prune in-place
+  let writeIdx = 0;
+  for (let i = 0; i < recent.length; i++) {
+    if (recent[i] > cutoff) recent[writeIdx++] = recent[i];
+  }
+  recent.length = writeIdx;
   if (recent.length >= maxRPM) {
     const oldestInWindow = recent[0];
     const waitMs = oldestInWindow + WINDOW_MS - Date.now() + 50;
     if (waitMs > 0) {
       logger.warn(
         `[OpenAI:RateLimit] ${category}: ${recent.length}/${maxRPM} RPM, throttling ${waitMs}ms`,
       );
       await sleep(waitMs);
     }
   }
   recent.push(Date.now());
-  this.windows.set(category, recent);
```
357-384: String-matching fallback in extractRateLimitInfo could produce false positives on unrelated errors containing "429".

An error message like "Server returned 4290 items" would match msg.includes("429"). This is unlikely with OpenAI errors but could trip on custom proxy error messages. A word-boundary check would be more precise:

Optional tightening
```diff
 if (
-  msg.includes("429") ||
+  /\b429\b/.test(msg) ||
   msg.includes("rate limit") ||
   msg.includes("too many requests") ||
   isBilling
 ) {
```
| "dependencies": { | ||
| "@ai-sdk/openai": "^2.0.32", | ||
| "@elizaos/core": "^1.7.0", | ||
| "@elizaos/core": "workspace:*", |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for workspace configuration at repo root
echo "=== Checking for workspace config files ==="
ls -la | grep -E "(pnpm-workspace|lerna|yarn|package-lock)" || echo "No standard workspace config files at root"
echo -e "\n=== Checking root package.json ==="
if [ -f "package.json" ]; then
head -50 package.json
fi
echo -e "\n=== Looking for pnpm-workspace.yaml ==="
find . -maxdepth 2 -name "pnpm-workspace.yaml" -o -name "pnpm-workspace.yml" | head -5
echo -e "\n=== Checking if `@elizaos/core` exists as local package ==="
find . -maxdepth 4 -type f -name "package.json" -exec grep -l '"name".*@elizaos/core' {} \;
echo -e "\n=== Checking for CI/publish scripts ==="
find . -maxdepth 2 -type f \( -name "*.yml" -o -name "*.yaml" \) | grep -E "(github|gitlab|ci|publish|release)" | head -10

Repository: elizaos-plugins/plugin-openai
Length of output: 1771
🏁 Script executed:
#!/bin/bash
echo "=== Checking for GitHub workflows ==="
find . -path "./.git" -prune -o -type f -name "*.yml" -o -name "*.yaml" | grep -E "(workflow|ci|publish)" | head -20
echo -e "\n=== Checking .github directory ==="
ls -la .github/ 2>/dev/null || echo "No .github directory"
find .github -type f -name "*.yml" -o -name "*.yaml" 2>/dev/null | head -10
echo -e "\n=== Checking for any root-level monorepo indicators ==="
ls -la | grep -E "(workspace|lerna|monorepo|pnpm)"
echo -e "\n=== Checking if `@elizaos/core` is on npm ==="
curl -s "https://registry.npmjs.org/@elizaos/core" 2>/dev/null | head -100
echo -e "\n=== Checking package-lock.json or pnpm-lock.yaml ==="
ls -la | grep -E "lock|pnpm"
echo -e "\n=== Recent git log for package.json changes ==="
git log --oneline -10 -- package.json 2>/dev/null | head -5

Repository: elizaos-plugins/plugin-openai
Length of output: 50387
🏁 Script executed:
cat .github/workflows/npm-deploy.yml

Repository: elizaos-plugins/plugin-openai
Length of output: 3697
🏁 Script executed:
#!/bin/bash
echo "=== Checking bunfig or any bun configuration ==="
ls -la bunfig* 2>/dev/null || echo "No bunfig found"
echo -e "\n=== Checking if there's a parent directory monorepo structure ==="
cd .. && ls -la | head -20
echo -e "\n=== Checking .npmrc or similar ==="
cd - > /dev/null
find . -maxdepth 2 -name ".npmrc" -o -name ".yarnrc" | head -5
echo -e "\n=== Searching for workspace references in comments/docs ==="
grep -r "workspace:\*" . --include="*.md" --include="*.txt" 2>/dev/null | head -5Repository: elizaos-plugins/plugin-openai
Length of output: 880
workspace:* is incompatible with this standalone repository and will break installations for consumers.
The package uses the workspace:* protocol for @elizaos/core, which is a pnpm-specific monorepo feature. This repository has no pnpm-workspace.yaml or workspace configuration, and the publish workflow uses bun publish (which delegates to npm), not pnpm publish. When published, npm will not rewrite workspace:* to a concrete version—consumers will receive an unresolvable dependency that fails npm install.
Change to a concrete version range for @elizaos/core (e.g., ^1.7.2 or ^0.1.9) to match the latest published version.
| if (text === "test") { | ||
| logger.debug( | ||
| `[OpenAI] Returning synthetic embedding for dimension probe (dim=${embeddingDimension})`, | ||
| ); | ||
| return syntheticEmbedding(embeddingDimension, 0.1); | ||
| } |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's look at the embedding.ts file to understand the context
head -150 src/models/embedding.ts | tail -60

Repository: elizaos-plugins/plugin-openai
Length of output: 2359
🏁 Script executed:
#!/bin/bash
# Search for ensureEmbeddingDimension to see how it calls the embedding function
rg -n 'ensureEmbeddingDimension' --type=ts -C 5

Repository: elizaos-plugins/plugin-openai
Length of output: 1739
🏁 Script executed:
#!/bin/bash
# Look for the actual embedding function signature and its parameters
rg -n 'function.*embedding|const.*embedding.*=' src/models/embedding.ts -A 3

Repository: elizaos-plugins/plugin-openai
Length of output: 656
🏁 Script executed:
#!/bin/bash
# Search for any legitimate uses of embedding the string "test"
rg -n '"test"' src/models/embedding.ts -C 2Repository: elizaos-plugins/plugin-openai
Length of output: 582
🏁 Script executed:
#!/bin/bash
# Find the syntheticEmbedding function definition
rg -n 'syntheticEmbedding' src/models/embedding.ts -B 2 -A 10

Repository: elizaos-plugins/plugin-openai
Length of output: 2779
🏁 Script executed:
#!/bin/bash
# Get the full function signature of the embedding function
rg -n 'export.*function|export.*const.*=' src/models/embedding.ts -A 8 | head -40

Repository: elizaos-plugins/plugin-openai
Length of output: 357
🏁 Script executed:
#!/bin/bash
# Search the entire codebase for embedText or similar to see how this function is called
rg -n 'embedText|embedding.*(' --type=ts | grep -v node_modules | head -20

Repository: elizaos-plugins/plugin-openai
Length of output: 167
🏁 Script executed:
#!/bin/bash
# Search for how handleTextEmbedding is called
rg -n 'handleTextEmbedding|TEXT_EMBEDDING' --type=ts | head -30

Repository: elizaos-plugins/plugin-openai
Length of output: 908
🏁 Script executed:
#!/bin/bash
# Look for test files that might legitimately embed "test" string
find . -path ./node_modules -prune -o -name '*.test.ts' -o -name '*.spec.ts' | head -20

Repository: elizaos-plugins/plugin-openai
Length of output: 99
🏁 Script executed:
#!/bin/bash
# Search for "test" in test files to see if it's used as embedding text
rg -n '"test"' --type=ts -l | grep -i test | head -10Repository: elizaos-plugins/plugin-openai
Length of output: 55
🏁 Script executed:
#!/bin/bash
# Check the interface/type definition for TextEmbeddingParams
rg -n 'TextEmbeddingParams' --type=ts -A 5

Repository: elizaos-plugins/plugin-openai
Length of output: 1495
🏁 Script executed:
#!/bin/bash
# Check the test file to see if it legitimately embeds "test"
cat -n __tests__/plugin.test.ts

Repository: elizaos-plugins/plugin-openai
Length of output: 8611
🏁 Script executed:
#!/bin/bash
# Also search for any core imports or references that might show how ensureEmbeddingDimension works
rg -n 'ensureEmbedding|dimension.*probe' --type=ts

Repository: elizaos-plugins/plugin-openai
Length of output: 593
Short-circuiting on text === "test" will incorrectly handle any production embedding request for the literal string "test", returning a synthetic zero-vector instead of a real embedding.
While the code comments explain this is intentional for the core runtime's startup ensureEmbeddingDimension() probe, the check is too broad. Any user code that happens to embed "test" will silently get a synthetic vector (zero-filled with marker 0.1), corrupting similarity search results.
Since the code already extracts text from both string and object params (lines 78–81), tighten the short-circuit to match the documented object shape:
- if (text === "test") {
+ // Only short-circuit the exact probe shape the core sends at startup: { text: 'test' }
+ if (text === "test" && typeof params === "object" && params !== null) {
logger.debug(
`[OpenAI] Returning synthetic embedding for dimension probe (dim=${embeddingDimension})`,
);
return syntheticEmbedding(embeddingDimension, 0.1);
}Even with this narrowing, the risk remains if someone legitimately passes { text: "test" }. Ideally, the core's probe should use a dedicated marker property or sentinel value rather than matching user-facing text content.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
// Only short-circuit the exact probe shape the core sends at startup: { text: 'test' }
if (text === "test" && typeof params === "object" && params !== null) {
  logger.debug(
    `[OpenAI] Returning synthetic embedding for dimension probe (dim=${embeddingDimension})`,
  );
  return syntheticEmbedding(embeddingDimension, 0.1);
}
```
```ts
if (
  !Array.isArray(embedding) ||
  embedding.length !== embeddingDimension
) {
  // Dimension mismatch — likely OPENAI_EMBEDDING_DIMENSIONS doesn't match
  // the model's actual output. Log it but return a usable vector so the
  // agent doesn't crash; the user should fix their config.
  logger.error(
    `Embedding length ${embedding?.length ?? 0} does not match configured dimension ${embeddingDimension}`,
  );
  return syntheticEmbedding(embeddingDimension, 0.4);
}
```
Dimension mismatch returns a synthetic embedding instead of throwing — this silently degrades search quality.
When the API returns a vector with a different length than the configured embeddingDimension, the code logs an error but returns a synthetic 0.4-marker vector. This means the embedding store receives a meaningless vector for real user content, which silently corrupts similarity search results. The user would have no way to know their queries are returning garbage matches.
Consider throwing here (or at least making this a configurable behavior), since a dimension mismatch is always a misconfiguration that should be fixed immediately — not papered over.
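A configurable strict mode could look roughly like this (a sketch; the `throwOnDimensionMismatch` option name is hypothetical):

```ts
// Sketch: opt-in strict validation; throwOnDimensionMismatch is a hypothetical option.
function validateEmbedding(
  embedding: unknown,
  expectedDim: number,
  opts: { throwOnDimensionMismatch?: boolean } = {},
): number[] {
  if (Array.isArray(embedding) && embedding.length === expectedDim) {
    return embedding as number[];
  }
  const actual = Array.isArray(embedding) ? embedding.length : 0;
  const msg = `Embedding length ${actual} does not match configured dimension ${expectedDim}`;
  if (opts.throwOnDimensionMismatch) {
    throw new Error(msg); // fail fast so misconfiguration is caught immediately
  }
  console.error(msg);
  return new Array<number>(expectedDim).fill(0); // degraded fallback, as the current code does
}
```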
Pull request overview
Adds a process-level OpenAI rate limiting utility and refactors model handlers/types/docs to improve resiliency (429 handling, tier logging, startup behavior) and compatibility across @elizaos/core versions.
Changes:
- Introduces a shared rate limiter with per-endpoint RPM windows, 429 retry/backoff, quota detection, and one-shot tier logging.
- Refactors text/object/image/audio/embedding handlers to use rate limiting and improved error handling (plus synthetic embedding fast-paths).
- Adds a startup configuration banner, expands local compatibility types, and updates README/CHANGELOG.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| src/utils/rate-limiter.ts | New rate limiter wrapper, 429/quota detection, per-category windows/backoff, tier logging |
| src/utils/index.ts | Re-exports the rate limiter utilities |
| src/types/index.ts | Adds shared types + local shims for streaming compatibility with older core versions |
| src/models/text.ts | Uses limiter for chat calls; adds streaming initiation throttling |
| src/models/object.ts | Wraps object generation in rate-limited retry wrapper |
| src/models/image.ts | Wraps image generation/description with rate limiting + 429 parsing + tier logging |
| src/models/embedding.ts | Adds synthetic embedding fast-paths and wraps API calls with rate limiting |
| src/models/audio.ts | Wraps TTS/transcription with rate limiting + 429 parsing + tier logging |
| src/init.ts | Removes eager API validation; prints banner and logs baseURL |
| src/index.ts | Simplifies model registration; adds streaming test and type shims usage |
| src/banner.ts | Adds startup configuration banner output |
| package.json | Switches @elizaos/core dependency to workspace:* |
| README.md | Documents banner, limiter behavior/config, compatibility notes, model table, architecture |
| CHANGELOG.md | Adds unreleased changelog describing new functionality |
```diff
 let text: string;
 if (typeof params === "string") {
   text = params;
 } else if (typeof params === "object" && params.text) {
   text = params.text;
 } else {
-  const errorMsg = "Invalid input format for embedding";
-  logger.warn(errorMsg);
-  const fallbackVector = Array(embeddingDimension).fill(0);
-  fallbackVector[0] = 0.2;
-  return fallbackVector;
+  logger.warn("Invalid input format for embedding");
+  return syntheticEmbedding(embeddingDimension, 0.2);
 }
```
params.text is checked for truthiness, so { text: "" } is treated as “invalid input format” (marker 0.2) instead of the intended “empty text” path (marker 0.3). Check for the presence of the text property (e.g., 'text' in params) and then handle empty-string via the existing !text.trim() branch.
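The suggested presence check would look roughly like this (a simplified, self-contained sketch; `extractEmbeddingText` is a hypothetical helper, not the handler's actual shape):

```ts
// Sketch of the suggested 'text' in params presence check.
type EmbeddingInput = string | { text?: string } | null | undefined;

function extractEmbeddingText(params: EmbeddingInput): string | null {
  if (typeof params === "string") return params;
  if (typeof params === "object" && params !== null && "text" in params) {
    // "" is returned as-is so the caller's !text.trim() branch (marker 0.3) handles it
    return params.text ?? "";
  }
  return null; // invalid input format -> caller returns syntheticEmbedding(dim, 0.2)
}
```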
```diff
 const streamParams: StreamingTextParams = {
   prompt: "Count from 1 to 5.",
   onStreamChunk: (chunk: string) => {
     chunks.push(chunk);
   },
-});
-if (!result || result.length === 0) {
+};
```
The streaming test sets onStreamChunk but never sets stream: true. In this PR’s text handler, streaming only activates when params.stream is truthy, so this test will run the non-streaming path and chunks will remain empty. Either set stream: true here (and adjust assertions/types to match the streaming return shape) or update the handler to treat onStreamChunk as a streaming request.
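The second option — treating the presence of onStreamChunk as a streaming request — would be a small change in the handler. A sketch (the interface here is a simplified stand-in for the PR's StreamingTextParams):

```ts
// Sketch: infer streaming intent from either signal, not params.stream alone.
interface StreamingTextParams {
  prompt: string;
  stream?: boolean;
  onStreamChunk?: (chunk: string) => void;
}

function wantsStream(params: StreamingTextParams): boolean {
  return params.stream === true || typeof params.onStreamChunk === "function";
}
```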
| `TEXT_LARGE` | gpt-4o | Complex tasks requiring deeper reasoning |
| `TEXT_EMBEDDING` | text-embedding-3-small | Text embedding (1536 dimensions by default) |
| `IMAGE` | gpt-image-1 | DALL-E image generation |
| `IMAGE_DESCRIPTION` | gpt-4o-mini | Image analysis with title/description extraction |
README says the default IMAGE_DESCRIPTION model is gpt-4o-mini, but the code default is gpt-5-nano (getImageDescriptionModel(..., "gpt-5-nano")). Please update the table to match the actual default (or change the code default if the README is intended).
```diff
-| `IMAGE_DESCRIPTION` | gpt-4o-mini | Image analysis with title/description extraction |
+| `IMAGE_DESCRIPTION` | gpt-5-nano | Image analysis with title/description extraction |
```
| "@ai-sdk/openai": "^2.0.32", | ||
| "@elizaos/core": "^1.7.0", | ||
| "@elizaos/core": "workspace:*", | ||
| "ai": "^5.0.47", |
Changing @elizaos/core dependency to "workspace:*" will break consumers installing this plugin from npm (the workspace protocol generally won’t resolve outside a monorepo). If the intent is to support both local workspace dev and published installs, consider restoring a semver range here and/or moving @elizaos/core to peerDependencies with an appropriate supported range.
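The peer-dependency option could look like this in package.json (illustrative ranges only, not verified against the repo; JSON matches the file's own format):

```json
{
  "peerDependencies": {
    "@elizaos/core": "^1.7.0"
  },
  "devDependencies": {
    "@elizaos/core": "^1.7.0"
  }
}
```

Consumers then provide their own core version, while local development still installs a compatible copy for building and testing.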
```ts
// Phase 2: Sliding window — prune old timestamps, check capacity
const maxRPM = this.getRPM(category);
const cutoff = Date.now() - WINDOW_MS;
const timestamps = this.windows.get(category) ?? [];
const recent = timestamps.filter((t) => t > cutoff);
```
RateLimiter.acquire() is not concurrency-safe: concurrent callers can read the same timestamps, build independent recent arrays via filter(), then set() back into the map—overwriting each other and undercounting requests. This can cause the limiter to allow bursts well above maxRPM under load. Consider serializing acquires per category (e.g., a per-category mutex/queue) and re-checking the window after any sleep before recording the timestamp.
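A promise-chain mutex per category is one lightweight way to serialize acquires, as suggested. A sketch (not the PR's code; `withCategoryLock` is a hypothetical helper):

```ts
// Sketch: serialize acquire() per category with a promise chain.
const locks = new Map<string, Promise<void>>();

async function withCategoryLock<T>(category: string, fn: () => Promise<T>): Promise<T> {
  const prev = locks.get(category) ?? Promise.resolve();
  let release!: () => void;
  // Queue ourselves as the next holder before waiting on the previous one.
  locks.set(category, new Promise<void>((r) => (release = r)));
  await prev; // wait for the previous holder to finish
  try {
    return await fn(); // read-check-write the window while holding the lock
  } finally {
    release();
  }
}
```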
```ts
const maxRetries =
  options?.maxRetries ??
  (parseInt(process.env.OPENAI_RATE_LIMIT_MAX_RETRIES ?? "", 10) ||
    DEFAULT_MAX_RETRIES);
```
OPENAI_RATE_LIMIT_MAX_RETRIES set to 0 won’t be respected because parseInt(...) || DEFAULT_MAX_RETRIES treats 0 as falsy. If you want to allow disabling retries, parse the env var and check Number.isFinite / >= 0 explicitly instead of using || for fallback.
```diff
-const maxRetries =
-  options?.maxRetries ??
-  (parseInt(process.env.OPENAI_RATE_LIMIT_MAX_RETRIES ?? "", 10) ||
-    DEFAULT_MAX_RETRIES);
+const envMaxRetriesRaw = process.env.OPENAI_RATE_LIMIT_MAX_RETRIES;
+let envMaxRetries: number | undefined;
+if (envMaxRetriesRaw !== undefined) {
+  const parsed = Number(envMaxRetriesRaw);
+  if (Number.isFinite(parsed) && parsed >= 0) {
+    envMaxRetries = parsed;
+  }
+}
+const maxRetries =
+  options?.maxRetries ??
+  envMaxRetries ??
+  DEFAULT_MAX_RETRIES;
```
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
```ts
const COL2_WIDTH = WIDTH - COL1_WIDTH - 3; // 3 = "| " + " |"

const line = `+${"-".repeat(WIDTH - 2)}+`;
const title = "| OpenAI Plugin" + " ".repeat(WIDTH - 2 - "| OpenAI Plugin".length) + "|";
```
Banner title and billing rows off-by-one width
Low Severity
The title row and billing row are each 71 characters wide instead of the intended 72, causing misalignment with the border/divider lines. The formula WIDTH - 2 - "| OpenAI Plugin".length subtracts 2 for both | delimiters, but the prefix string "| OpenAI Plugin" already includes the leading |. The subtraction should be WIDTH - 1 - prefix.length. The same error occurs in the billingRow calculation.
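The corrected math, runnable in isolation, produces the intended 72-character row:

```ts
// The prefix already contains the leading "|", so subtract 1 (for the trailing "|"), not 2.
const WIDTH = 72;
const prefix = "| OpenAI Plugin";
const title = prefix + " ".repeat(WIDTH - 1 - prefix.length) + "|";
console.log(title.length); // 72 — matches the border lines
```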
```ts
const maxRetries =
  options?.maxRetries ??
  (parseInt(process.env.OPENAI_RATE_LIMIT_MAX_RETRIES ?? "", 10) ||
    DEFAULT_MAX_RETRIES);
```
Env var max retries zero treated as default
Low Severity
Setting OPENAI_RATE_LIMIT_MAX_RETRIES=0 (to disable retries entirely) is silently ignored and falls back to the default of 5 retries. The || operator on line 248 treats the parsed 0 as falsy, so 0 || DEFAULT_MAX_RETRIES evaluates to 5. Using ?? instead of || would correctly preserve 0 as a valid value, consistent with how options?.maxRetries is handled on the line above.
Note
Medium Risk
Touches all OpenAI model call paths and introduces new retry/throttling and 429 classification logic, which could impact latency and error handling behavior under load. Startup behavior and TypeScript typing are also changed, but mostly in a controlled, additive way.
Overview
Adds a new process-wide rate limiter (withRateLimit / acquireRateLimit) with per-category sliding-window throttling, exponential backoff using Retry-After, one-shot tier logging, and fail-fast billing/quota 429 detection via QuotaExceededError.

Updates embeddings/images/audio/object/text handlers to use the limiter and to await throwIfRateLimited() for typed 429 handling; embeddings also gain synthetic fast paths (null/empty/"test") to avoid startup probe API calls and return fallback vectors on dimension mismatch.

Changes plugin init to be non-blocking and purely synchronous (removes eager GET /models validation), adds a startup config banner (src/banner.ts), introduces local core-compat streaming shims (StreamingTextParams / TextStreamResult), DRYs model registration with a Plugin["models"] type assertion, switches the @elizaos/core dependency to workspace:*, and expands docs with a new CHANGELOG.md and README sections for banner/rate limiting/compatibility.

Written by Cursor Bugbot for commit 7542c67.
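Based on the names in this overview, the call pattern the handlers adopt looks roughly like the following. This is a shape sketch only: the declares stand in for the PR's actual exports, and the signatures are assumptions inferred from the summary, not the plugin's verified API.

```ts
// Shape sketch; signatures are assumptions based on names in this PR.
declare function withRateLimit<T>(category: string, fn: () => Promise<T>): Promise<T>;
declare function throwIfRateLimited(res: Response): Promise<void>;
declare function logTierOnce(res: Response): void;

async function embed(baseURL: string, apiKey: string, model: string, input: string): Promise<number[]> {
  return withRateLimit("embeddings", async () => {
    const response = await fetch(`${baseURL}/embeddings`, {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ model, input }),
    });
    await throwIfRateLimited(response); // 429 -> QuotaExceededError (fail fast) or retry with backoff
    logTierOnce(response);              // one-shot tier log from x-ratelimit-* headers
    const json = (await response.json()) as { data: Array<{ embedding: number[] }> };
    return json.data[0].embedding;
  });
}
```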
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Greptile Overview
Greptile Summary
This PR adds a comprehensive rate limiting system to prevent 429 errors from OpenAI's API and improves model handlers with better error handling and caching.
Major Changes:
- Compatibility type shims across @elizaos/core versions

Technical Highlights:
Confidence Score: 4/5
… retry-after header parsing that could pass NaN to backoff logic. All other code is well-structured with proper error handling, caching strategies, and type safety.

src/utils/rate-limiter.ts:338 — the retry-after header parsing needs the parentheses fix to handle invalid header values correctly.

Important Files Changed
Flowchart
```mermaid
flowchart TD
    Start[API Call Request] --> CheckCategory{Determine Category<br/>chat/embeddings/images/audio}
    CheckCategory --> AcquireSlot[Acquire Rate Limit Slot]
    AcquireSlot --> CheckBackoff{Active Backoff?}
    CheckBackoff -->|Yes| WaitBackoff[Wait for Backoff]
    CheckBackoff -->|No| CheckWindow{Within RPM Limit?}
    WaitBackoff --> CheckWindow
    CheckWindow -->|No| WaitWindow[Wait for Window Slot]
    CheckWindow -->|Yes| RecordRequest[Record Request Timestamp]
    WaitWindow --> RecordRequest
    RecordRequest --> MakeCall[Execute API Call]
    MakeCall --> CheckResponse{Response Status}
    CheckResponse -->|200 OK| Success[Return Result]
    CheckResponse -->|429 Quota| QuotaError[Throw QuotaExceededError<br/>No Retry]
    CheckResponse -->|429 Rate Limit| RateError{Retries Left?}
    CheckResponse -->|Other Error| OtherError[Throw Error]
    RateError -->|Yes| RecordBackoff[Record Backoff Deadline<br/>Exponential + Jitter]
    RateError -->|No| MaxRetries[Throw Max Retries Error]
    RecordBackoff --> Sleep[Sleep with Backoff]
    Sleep --> AcquireSlot
    Success --> End[Complete]
    QuotaError --> End
    OtherError --> End
    MaxRetries --> End
```

Last reviewed commit: 7542c67