feat(nodes): add context window optimizer node for token budget management #527

nihalnihalani wants to merge 9 commits into rocketride-org:develop.
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Adds a Context Optimizer node: the global lifecycle loads config and conditionally instantiates a ContextOptimizer; the instance wrapper intercepts and deep-copies questions, runs token-budgeted optimization (system prompt, query, documents, history) when available, and forwards optimized questions. Package metadata, a dependency, and tests are added.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Startup as "Startup"
    participant IGlobal as "IGlobal"
    participant Optimizer as "ContextOptimizer"
    participant IInstance as "IInstance"
    participant Pipeline as "Pipeline"
    Startup->>IGlobal: beginGlobal()
    alt CONFIG mode
        IGlobal->>IGlobal: keep optimizer = None
    else NORMAL mode
        IGlobal->>IGlobal: import depends, load node config
        IGlobal->>Optimizer: instantiate ContextOptimizer(config)
        Optimizer->>Optimizer: prepare tokenizer & model limits
    end
    Pipeline->>IInstance: writeQuestions(question)
    alt Optimizer available
        IInstance->>IInstance: deep-copy & extract system/query/docs/history
        IInstance->>Optimizer: optimize(system_prompt, model, documents, history)
        Optimizer->>Optimizer: count tokens -> allocate budgets
        Optimizer->>Optimizer: truncate system/query, rank/select docs, summarize history
        Optimizer-->>IInstance: optimized components + metadata
        IInstance->>IInstance: rebuild Question, log metadata
    else No optimizer
        IInstance->>IInstance: emit warning & pass-through question
    end
    IInstance->>Pipeline: writeQuestions(optimized_question)
    Startup->>IGlobal: endGlobal()
    IGlobal->>IGlobal: clear optimizer & config
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/context_optimizer/IInstance.py`:
- Around line 69-71: Replace the explicit for-loop that builds the history list
in IInstance.py with a single list comprehension: instead of initializing
history = [] and looping over question.history to append {'role': h.role,
'content': h.content}, assign history to a comprehension that iterates over
question.history or [] and constructs the same dicts; update the code around the
variable named history and the use of question.history accordingly.
In `@nodes/src/nodes/context_optimizer/optimizer.py`:
- Around line 299-302: The code builds the final message list via concatenation
into an intermediate variable `result`; replace that with list unpacking to
simplify and improve clarity by returning `[first_msg, summary_placeholder,
*recent]` directly instead of `result = [first_msg, summary_placeholder] +
recent` and `return result`—apply this change in the function where
`summary_placeholder` is set (the block using `first_msg`,
`summary_placeholder`, and `recent`).
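As a sketch (the function wrapper and argument names here follow the prompt above, not the real signature):

```python
def rebuild_messages(first_msg, summary_placeholder, recent):
    # List unpacking replaces the intermediate
    # `result = [first_msg, summary_placeholder] + recent`.
    return [first_msg, summary_placeholder, *recent]
```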
In `@nodes/src/nodes/context_optimizer/services.json`:
- Around line 111-118: The "auto" profile currently claims to detect the model
from the pipeline but actually uses the configured default model_name ("gpt-5");
update the "auto" profile definition by either changing the "title" to
accurately reflect current behavior (e.g., "Auto - uses configured default model
(gpt-5)") or add a descriptive note in the profile (e.g., a "description" field)
stating that pipeline model detection is a planned future enhancement; modify
the "auto" object (keys: "title", "model_name") in the JSON to reflect this
clarification.
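As a sketch, the clarified profile might look like this; the `description` key is hypothetical and must match whatever keys the node's services.json schema actually supports:

```json
"auto": {
  "title": "Auto - uses configured default model (gpt-5)",
  "model_name": "gpt-5",
  "description": "Pipeline model detection is a planned future enhancement."
}
```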
In `@test/nodes/test_context_optimizer.py`:
- Around line 348-350: Replace the two separate endswith checks with a single
tuple form: update the assertion that currently does "assert
result.endswith('.') or result.endswith('sentence')" to use a single endswith
call with a tuple (e.g., result.endswith(('.', 'sentence'))) so the static
analyzer warning is resolved; this change should be applied where the test calls
optimizer.truncate_to_budget and asserts on the result variable.
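A minimal illustration of the tuple form:

```python
def ends_ok(result: str) -> bool:
    # str.endswith accepts a tuple of suffixes, replacing the
    # or-chain the static analyzer flagged.
    return result.endswith(('.', 'sentence'))
```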
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 57cc9afb-270e-4b0d-8a41-9510a6dda2e5
📒 Files selected for processing (7)
- nodes/src/nodes/context_optimizer/IGlobal.py
- nodes/src/nodes/context_optimizer/IInstance.py
- nodes/src/nodes/context_optimizer/__init__.py
- nodes/src/nodes/context_optimizer/optimizer.py
- nodes/src/nodes/context_optimizer/requirements.txt
- nodes/src/nodes/context_optimizer/services.json
- test/nodes/test_context_optimizer.py
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/context_optimizer/IInstance.py`:
- Around line 97-101: Move the from ai.common.schema import QuestionHistory out
of the method and place it at module top-level so the QuestionHistory symbol is
imported once (improves readability and avoids repeated import overhead); update
the file-level imports and remove the inline import inside the block that
updates question.history (the code section using result['history'] and
constructing QuestionHistory(role=..., content=...)); if a circular-import
prevents a top-level import, use a typing.TYPE_CHECKING guard or keep the local
import but add a short comment explaining the circular dependency.
In `@nodes/src/nodes/context_optimizer/optimizer.py`:
- Around line 409-414: The history token calculation is inconsistent: in
optimizer.optimize (original_history) you add a flat "+ 4" for role overhead
while summarize_history._msg_tokens uses count_tokens(role) + 4; extract a
shared helper (e.g., msg_token_count(message: dict) or _msg_tokens(role,
content)) that returns count_tokens(role) + count_tokens(content) + 4 and use it
both in summarize_history._msg_tokens and in the optimize
original_history/doc/history counting so both places compute identical
per-message tokens and remove the hardcoded "+ 4" discrepancy.
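A sketch of the shared helper; `count_tokens` below is a naive whitespace-splitting stand-in for the node's tiktoken-based counter:

```python
def count_tokens(text):
    # Stand-in token counter (the real code uses tiktoken).
    return len(text.split())

def msg_token_count(message):
    # Shared helper: role tokens + content tokens + the fixed 4-token
    # per-message overhead, so optimize() and summarize_history()
    # compute identical per-message costs.
    return count_tokens(message.get('role', '')) + count_tokens(message.get('content', '')) + 4
```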
- Around line 209-223: The current logic splits text with _SENTENCE_RE and then
joins selected pieces with ' ', which collapses original inter-sentence
whitespace; change the approach to preserve original separators by using
_SENTENCE_RE with a capturing group (or iterate re.finditer over matches) so you
keep each sentence plus its following separator, measure tokens with
count_tokens(sentence_piece, encoding), append the exact sentence_piece to
result_parts, and return ''.join(result_parts) (instead of ' '.join) when
returning the truncated text; update the code paths that reference
sentences/result_parts and keep count_tokens, encoding and max_tokens usage
unchanged.
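A runnable sketch of separator-preserving truncation; the regex and token counter here are simplified stand-ins for the node's actual `_SENTENCE_RE` and tiktoken-based `count_tokens`:

```python
import re

# Hypothetical sentence pattern: text up to a terminator, the terminator,
# and any following whitespace, kept together as one piece.
_SENTENCE_PIECE_RE = re.compile(r'[^.!?]*[.!?]+\s*|[^.!?]+$')

def count_tokens(text):
    # Stand-in token counter (the real code uses tiktoken).
    return len(text.split())

def truncate_preserving_whitespace(text, max_tokens):
    result_parts = []
    used = 0
    for m in _SENTENCE_PIECE_RE.finditer(text):
        piece = m.group(0)
        cost = count_tokens(piece)
        if used + cost <= max_tokens:
            result_parts.append(piece)  # exact piece, separator included
            used += cost
        else:
            break
    # ''.join keeps the original inter-sentence whitespace intact.
    return ''.join(result_parts)
```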
- Around line 47-66: The MODEL_LIMITS class variable is mutable and triggers
Ruff RUF012; annotate it as a ClassVar to mark it read-only by importing
ClassVar from typing and changing the declaration to use ClassVar[Dict[str,
int]] for MODEL_LIMITS, or if you prefer to keep the current annotation add a
trailing "# noqa: RUF012" to the MODEL_LIMITS line to suppress the warning;
locate the MODEL_LIMITS symbol in optimizer.py and apply one of these two fixes.
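The `ClassVar` fix looks like this minimal sketch (the limit values shown are illustrative):

```python
from typing import ClassVar, Dict

class ContextOptimizerSketch:
    # ClassVar marks the mutable dict as a class-level constant,
    # which satisfies Ruff RUF012.
    MODEL_LIMITS: ClassVar[Dict[str, int]] = {
        'gpt-4o': 128_000,
        'gpt-3.5-turbo': 16_385,
    }
```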
- Around line 282-291: The loop building recent currently uses recent.insert(0,
msg) which causes O(n²) behavior; instead append messages to recent and then
reverse once at the end to preserve original order. Update the block that
iterates over reversed(messages[1:]) (uses _msg_tokens and budget_for_recent) to
call recent.append(msg) when within budget and after the loop call
recent.reverse() so the logic and variables (recent, recent_cost, _msg_tokens,
budget_for_recent, messages) remain the same but with O(n) performance.
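A standalone sketch of the append-then-reverse pattern; the names follow the prompt, and `msg_tokens` is passed in as a stand-in for the real `_msg_tokens`:

```python
def select_recent(messages, budget_for_recent, msg_tokens):
    # Append then reverse once: O(n) instead of O(n^2) front-insertions.
    recent = []
    recent_cost = 0
    for msg in reversed(messages[1:]):
        cost = msg_tokens(msg)
        if recent_cost + cost <= budget_for_recent:
            recent.append(msg)
            recent_cost += cost
        else:
            break
    recent.reverse()  # restore original chronological order
    return recent
```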
In `@nodes/src/nodes/context_optimizer/services.json`:
- Around line 88-101: Extend the "test" cases in services.json to cover token
truncation, document selection under budget, and history compression: add
separate entries in the "cases" array that (1) provide a very long question text
to assert truncation behavior (extend the existing "questions" object and set an
"expect" that checks truncated length or presence of a truncation marker), (2)
supply multiple documents plus a budget constraint to assert the chosen
documents (create an input field representing documents and an "expect"
asserting selected document IDs or count), and (3) include a long conversation
history to verify compression (add a "history" input and an "expect" that
compressed history length or summary is within expected bounds); reference the
existing test structure keys ("test", "cases", "questions", "expect") so new
cases match the same schema and validation style.
In `@test/nodes/test_context_optimizer.py`:
- Around line 764-792: Test currently only checks that 'question' was passed to
optimizer.optimize; tighten it by asserting the actual extracted values: fetch
call_args = mock_opt.optimize.call_args and read the question from
call_args.kwargs.get('question', call_args.args[0] if call_args.args else None)
and assert it equals the expected question text (from the Question instance or
q.get_text()/repr), and similarly assert that 'components' (or the expected
components key) is present and is a list containing the components extracted by
inst.writeQuestions; update test_optimizer_called_with_components to perform
these specific checks using the optimizer.optimize call_args instead of the
loose presence check.
- Around line 234-235: The test test_none_returns_zero calls count_tokens with
None but count_tokens is typed as text: str; update the count_tokens signature
to accept Optional[str] (import typing.Optional) and update its docstring to
state None is allowed, keeping the existing None-check (if not text: return 0)
so the test remains valid; alternatively change the test to pass an empty string
''—pick one consistent approach and update typing and imports accordingly in the
function definition for count_tokens.
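The Optional-accepting variant might look like this sketch, with whitespace splitting standing in for the real tokenizer:

```python
from typing import Optional

def count_tokens(text: Optional[str]) -> int:
    """Count tokens in text; None and '' both return 0."""
    if not text:
        return 0
    # Stand-in for the tiktoken encoding step in the real implementation.
    return len(text.split())
```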
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3dcd3061-e08b-4c22-8e64-17cc00d04a83
📒 Files selected for processing (4)
- nodes/src/nodes/context_optimizer/IInstance.py
- nodes/src/nodes/context_optimizer/optimizer.py
- nodes/src/nodes/context_optimizer/services.json
- test/nodes/test_context_optimizer.py
```python
# Split into sentences
sentences = self._SENTENCE_RE.split(text)
result_parts: List[str] = []
used_tokens = 0

for sentence in sentences:
    sentence_tokens = self.count_tokens(sentence, encoding)
    if used_tokens + sentence_tokens <= max_tokens:
        result_parts.append(sentence)
        used_tokens += sentence_tokens
    else:
        break

if result_parts:
    return ' '.join(result_parts)
```
🧹 Nitpick | 🔵 Trivial
Sentence joining normalizes whitespace.
The _SENTENCE_RE.split(text) preserves sentence content but the ' '.join(result_parts) normalizes all inter-sentence whitespace to single spaces. This may alter formatting if the original had double spaces, newlines, or other whitespace. This is likely acceptable for LLM context but worth documenting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/context_optimizer/optimizer.py` around lines 209 - 223, The
current logic splits text with _SENTENCE_RE and then joins selected pieces with
' ', which collapses original inter-sentence whitespace; change the approach to
preserve original separators by using _SENTENCE_RE with a capturing group (or
iterate re.finditer over matches) so you keep each sentence plus its following
separator, measure tokens with count_tokens(sentence_piece, encoding), append
the exact sentence_piece to result_parts, and return ''.join(result_parts)
(instead of ' '.join) when returning the truncated text; update the code paths
that reference sentences/result_parts and keep count_tokens, encoding and
max_tokens usage unchanged.
@coderabbitai Fair observation. The ' '.join(result_parts) does normalize inter-sentence whitespace to single spaces. As you noted, this is likely acceptable for LLM context where whitespace formatting is typically not semantically significant, but I'll add a brief comment documenting this behavior for future maintainers.
Leaving this behavior as-is. The sentence-aware truncation intentionally normalizes inter-sentence whitespace when it rejoins preserved sentences, and the current implementation now documents that directly above the join call. For this node’s LLM-context use case that tradeoff is acceptable, and the focused optimizer suite still passes on the current head (81 passed).
@nihalnihalani, understood — keeping the whitespace normalization with an in-code comment documenting the intentional behavior is a reasonable resolution. The LLM-context use case makes the tradeoff sound, and the added comment gives future maintainers the necessary context. Thanks for clarifying.
✏️ Learnings added
Learnt from: nihalnihalani
Repo: rocketride-org/rocketride-server PR: 527
File: nodes/src/nodes/context_optimizer/optimizer.py:249-264
Timestamp: 2026-04-09T10:23:39.742Z
Learning: In `nodes/src/nodes/context_optimizer/optimizer.py`, the `truncate_to_budget` method intentionally uses `' '.join(result_parts)` after splitting on sentence boundaries (`_SENTENCE_RE`), which normalizes all inter-sentence whitespace to single spaces. This behavior is intentional and documented in the code — for LLM context, whitespace formatting is not semantically significant. Do not flag this whitespace normalization as a bug or suggest preserving original separators in future reviews.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: ryan-t-christensen
Repo: rocketride-org/rocketride-server PR: 640
File: nodes/src/nodes/tool_github/IInstance.py:1032-1036
Timestamp: 2026-04-09T07:00:32.335Z
Learning: In `nodes/src/nodes/tool_github/IInstance.py`, `_require_str` always calls `.strip()` on the returned value, including when used for `file_create`/`file_edit` `content` arguments. This means a trailing `\n` in plain-text file content will be stripped before base64 encoding. This behavior is intentional and accepted — agents are not expected to supply files where trailing whitespace is semantically significant. Do not flag this as a bug in future reviews.
Learnt from: asclearuc
Repo: rocketride-org/rocketride-server PR: 417
File: nodes/src/nodes/llm_vision_ollama/__init__.py:28-32
Timestamp: 2026-03-24T21:28:07.433Z
Learning: For Python pipeline node files under `nodes/**/*.py`, follow the repo’s string/docstring quoting conventions: use single quotes for regular string literals, and require PEP 257-style docstrings to use triple double quotes (`"""..."""`). Do not recommend changing docstring quotes from `"""..."""` to `'''...'''`.
Learnt from: KaushikSiva
Repo: rocketride-org/rocketride-server PR: 386
File: nodes/src/nodes/search_exa/IInstance.py:31-32
Timestamp: 2026-03-25T16:11:50.206Z
Learning: In nodes/**/*.py, treat `self.instance.writeAnswers(...)` as accepting a single `Answer` object (not `List[Answer]`). In reviews, do not flag calls where a single `Answer` is passed to `writeAnswers` as type mismatches. If the abstract base class (e.g., `filters.py`) still declares `List[Answer]`, consider updating that type annotation to match the actual runtime/override usage, since concrete implementations (e.g., `response/IInstance.py`, `extract_data/IInstance.py`, `dictionary/IInstance.py`, `db_instance_base.py`) and all call sites pass a single `Answer`.
Learnt from: asclearuc
Repo: rocketride-org/rocketride-server PR: 456
File: nodes/src/nodes/db_neo4j/IInstance.py:98-100
Timestamp: 2026-03-27T23:31:51.042Z
Learning: In this project’s Python lifecycle stubs under `nodes/` (e.g., empty lifecycle methods like `endInstance`), it may be intentional to include an explicit `pass` statement even if the method already contains a docstring. When reviewing, do not flag this as redundant (Ruff `PIE790`) for these empty stub lifecycle methods when the `pass` is used to clearly mark the body as intentionally empty.
Learnt from: chinesepowered
Repo: rocketride-org/rocketride-server PR: 518
File: nodes/src/nodes/llm_openai_api/openai_client.py:25-32
Timestamp: 2026-03-30T21:57:11.296Z
Learning: In Python LLM node implementations under `nodes/**/*.py`, it’s an established pattern that `ChatBase.__init__` may not store all configuration on the instance. Subclasses may intentionally call `Config.getNodeConfig(provider, connConfig)` again inside their own `__init__` to read provider-specific fields (e.g., `apikey`, `base_url`). During code review, do not flag this as a redundant duplicate call when it’s used to obtain provider-specific configuration for the subclass.
Learnt from: chinesepowered
Repo: rocketride-org/rocketride-server PR: 518
File: nodes/src/nodes/llm_openai_api/IGlobal.py:11-81
Timestamp: 2026-03-30T22:18:48.072Z
Learning: In this repository’s Python LLM node implementation files under `nodes/**/*.py`, follow the established `validateConfig` style: use the existing “monolithic probe + error-formatting” approach for provider-first exceptions to keep behavior consistent. When linting flags require suppression, prefer the smallest/most targeted Ruff `# noqa` usage (e.g., `PLR0912`/`PLR0915`) and add specific justification comments for any targeted `BLE001`/similar codes rather than refactoring into multiple helper functions solely to satisfy linting. Only refactor the exception/validation pattern if you can update it consistently across all similar nodes.
Learnt from: dsapandora
Repo: rocketride-org/rocketride-server PR: 583
File: nodes/src/nodes/audio_tts/IGlobal.py:236-264
Timestamp: 2026-04-02T13:09:51.916Z
Learning: In RocketRide, filter node code (e.g., implementations under `nodes/**/*.py`) should assume `IGlobal.synthesize()` and filter node methods are executed sequentially per pipeline instance: one document is processed through the filter at a time. Concurrent access to the same `IGlobal` instance is not an expected runtime scenario. Therefore, do not raise review findings about missing synchronization/locks (e.g., missing mutex/atomic protections) for shared `IGlobal` state in filter node implementations unless there is explicit evidence of concurrent access beyond the framework’s pipeline execution model.
Learnt from: Rod-Christensen
Repo: rocketride-org/rocketride-server PR: 599
File: nodes/src/nodes/db_neo4j/IInstance.py:158-163
Timestamp: 2026-04-02T20:29:16.277Z
Learning: When reviewing Python code in this repository, do not treat Ruff rule RET505 ("Unnecessary elif after return statement") as a review finding. Maintainers consider this nitpick unnecessary, and it is acceptable to keep elif/else branching even immediately after a return.
Learnt from: Rod-Christensen
Repo: rocketride-org/rocketride-server PR: 599
File: nodes/src/nodes/tool_firecrawl/IInstance.py:103-103
Timestamp: 2026-04-03T04:00:08.250Z
Learning: When reviewing Python files under `nodes/**/*.py`, do not flag Ruff Q000 (single-quote style) violations for double-quoted string literals if the content contains a single quote/apostrophe (e.g., `"Map a website's structure..."`). In such cases, double quotes are intentional to avoid backslash escapes and are considered correct under Ruff’s Q000 behavior.
Learnt from: nihalnihalani
Repo: rocketride-org/rocketride-server PR: 525
File: nodes/src/nodes/router_llm/IInstance.py:0-0
Timestamp: 2026-04-03T05:20:16.219Z
Learning: When reviewing Python code under the `nodes/` package, treat the `IInstanceBase.open()` lifecycle method’s parameter named `obj: Entry` as the correct engine Python-bridge interface convention. Do not flag `obj` as a non-standard parameter name. This convention exists specifically because the earlier parameter name `object` (which shadowed the Python builtin) was replaced with `obj`.
Learnt from: Rod-Christensen
Repo: rocketride-org/rocketride-server PR: 599
File: nodes/src/nodes/memory_internal/IInstance.py:33-35
Timestamp: 2026-04-03T18:31:22.345Z
Learning: In RocketRide pipeline engine nodes (nodes/**/*.py), rely on the lifecycle invariant: `beginInstance()` is guaranteed to complete successfully before `open()` runs, and if `beginInstance()` throws, the engine aborts and `open()` is never called. Therefore, during code review, do not flag instance fields that are initialized in `beginInstance()` (e.g., `self._store`) as potentially `None`/uninitialized when they are referenced in `open()`. This is safe because the framework lifecycle enforces it.
Learnt from: Rod-Christensen
Repo: rocketride-org/rocketride-server PR: 599
File: nodes/src/nodes/llm_base/IInstance.py:36-50
Timestamp: 2026-04-03T18:31:20.227Z
Learning: In Python pipeline node files under `nodes/**/*.py` (including `nodes/src/nodes/llm_base/IInstance.py`), do not flag missing PEP 257 docstrings for trivial one-liner methods decorated with `invoke_function` (or similar decorator-based invocation). Skip the docstring requirement when the method’s name and return value/behavior make the contract self-evident (e.g., `getContextLength`, `getOutputLength`, `getTokenCounter`, `ask`), because the decorator plus the method name already communicate intent clearly.
Learnt from: Rod-Christensen
Repo: rocketride-org/rocketride-server PR: 599
File: nodes/src/nodes/tool_firecrawl/IInstance.py:67-69
Timestamp: 2026-04-03T18:31:34.956Z
Learning: For Python tools implemented with rocketlib’s `tool_function` decorator, the framework validates the declared `input_schema` at `tool.invoke` before entering the tool method body. Therefore, do not require (or flag) redundant runtime type checks inside the tool (e.g., `isinstance(url, str)` when the schema declares `url` as a string). Instead, only add lightweight value guards for schema-compatible edge cases (e.g., `if not url:` to handle the empty-string case) and avoid duplicating schema-based type validation logic.
Learnt from: charliegillet
Repo: rocketride-org/rocketride-server PR: 509
File: nodes/src/nodes/tool_exa_search/exa_driver.py:146-198
Timestamp: 2026-04-08T07:11:59.835Z
Learning: In RocketRide tool driver implementations under nodes/**/*.py that use ToolsBase, ensure _tool_validate normalizes inputs exactly the same way as _tool_invoke (via a shared _normalize_tool_input helper). This must allow inputs consistently when ToolsBase.handle_invoke triggers validation directly—specifically, JSON strings, Pydantic-like models, and wrapped payloads like {"input": ...} should be accepted and normalized prior to validation.
```json
"test": {
  "profiles": ["auto"],
  "cases": [
    {
      "questions": {
        "questions": [{"text": "What is the capital of France?"}]
      },
      "expect": {
        "questions": {
          "notEmpty": true
        }
      }
    }
  ]
}
```
🧹 Nitpick | 🔵 Trivial
Consider adding more comprehensive test cases.
The current test only verifies that questions are non-empty. Consider adding test cases that verify:
- Token truncation behavior with a long question
- Document selection when over budget
- History compression
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/context_optimizer/services.json` around lines 88 - 101,
Extend the "test" cases in services.json to cover token truncation, document
selection under budget, and history compression: add separate entries in the
"cases" array that (1) provide a very long question text to assert truncation
behavior (extend the existing "questions" object and set an "expect" that checks
truncated length or presence of a truncation marker), (2) supply multiple
documents plus a budget constraint to assert the chosen documents (create an
input field representing documents and an "expect" asserting selected document
IDs or count), and (3) include a long conversation history to verify compression
(add a "history" input and an "expect" that compressed history length or summary
is within expected bounds); reference the existing test structure keys ("test",
"cases", "questions", "expect") so new cases match the same schema and
validation style.
@coderabbitai Agreed — the current test only verifies that questions are non-empty, which is minimal coverage. Adding test cases for token truncation with a long question, document selection when over budget, and history compression would make the service definition tests much more meaningful. I'll add these in the next commit.
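A sketch of what an added truncation case might look like; the `maxTokens` expectation key is hypothetical and would need to match the node test framework's actual expect vocabulary:

```json
"cases": [
  {
    "questions": {
      "questions": [{"text": "What is the capital of France?"}]
    },
    "expect": { "questions": { "notEmpty": true } }
  },
  {
    "questions": {
      "questions": [{"text": "<several thousand tokens of repeated text>"}]
    },
    "expect": { "questions": { "maxTokens": 4096 } }
  }
]
```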
asclearuc left a comment:
Thank you for this contribution — the token counting, budget allocation, and sentence-aware truncation are well implemented, and reusing tiktoken is the right choice. The overall design is solid.

A few issues need to be resolved before merge. Inline comments have been left on specific lines.
```diff
@@ -0,0 +1 @@
+tiktoken>=0.7.0,<1.0.0
```
Could it be used without a version restriction? Is `uv pip compile` used internally to calculate the version number? It is not critical, since the version range is already wide.
Fixed — removed the upper bound. Now tiktoken>=0.7.0 without the <1.0.0 cap.
```python
import re
from typing import Any, Dict, List, Optional

import tiktoken
```
`import tiktoken` at module level will fail if tiktoken is not yet installed when the module is first imported. Move the import inside the methods that use it, or at minimum inside `__init__`, to ensure `depends()` in IGlobal has run first.
Fixed — moved import tiktoken from module level into the _get_encoding() method for lazy loading. This ensures depends() in IGlobal has run first before tiktoken is imported.
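The lazy-import pattern described here can be sketched as follows; the class and method names mirror the discussion, but this is not the node's actual code:

```python
class ContextOptimizerSketch:
    # Lazy import: tiktoken is only imported when an encoding is first
    # requested, after IGlobal's depends() has installed the requirement.
    def __init__(self):
        self._encoding = None

    def _get_encoding(self):
        if self._encoding is None:
            import tiktoken  # deferred so module import never fails
            self._encoding = tiktoken.get_encoding('cl100k_base')
        return self._encoding
```

Constructing the object never touches tiktoken, so the module can be imported before the dependency is installed.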
```python
from depends import depends  # type: ignore

# Load the requirements (tiktoken)
requirements = os.path.dirname(os.path.realpath(__file__)) + '/requirements.txt'
```
Fixed — now using os.path.join(os.path.dirname(os.path.realpath(__file__)), 'requirements.txt') instead of string concatenation.
```python
opt_history_tokens = sum(self.count_tokens(m.get('content', '')) + self.count_tokens(m.get('role', '')) + 4 for m in opt_history)
if opt_history_tokens < original_history:
    components_truncated.append('history')
```
Budget is allocated as fixed percentages of the total limit with no redistribution of unused space. If the system prompt uses 100 tokens out of a 12,800-token allocation, the surplus 12,700 tokens are discarded — they do not flow to other components. This means history can be truncated even when the global token budget has plenty of room.

Example: system_prompt=100 tokens, history=40,000 tokens needed, total_limit=128K

- system_prompt budget: 12,800 (only 100 used, 12,700 wasted)
- history budget: 32,000 → truncated despite 40,900 tokens available globally

Consider a two-pass approach: first check whether all components fit within total_limit without per-component caps; only apply percentage budgets when the total exceeds the limit. This would make truncation the exception rather than the default.
Fixed — implemented two-pass approach. Pass 1 checks if all components fit within total_limit without per-component caps (no wasted budget). Pass 2 applies percentage-based budgets only when total exceeds the limit, making truncation the exception rather than the default.
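A sketch of the two-pass idea, under stated assumptions: `components` maps names to text, `percentages` maps names to budget fractions, word counting stands in for tokenization, and naive word dropping replaces the real sentence-aware truncation:

```python
def truncate(text, budget, count_tokens):
    # Naive word-level truncation as a stand-in for truncate_to_budget.
    words = text.split()
    while words and count_tokens(' '.join(words)) > budget:
        words.pop()
    return ' '.join(words)

def allocate(components, total_limit, percentages, count_tokens):
    # Pass 1: if everything already fits, skip per-component caps entirely.
    sizes = {name: count_tokens(text) for name, text in components.items()}
    if sum(sizes.values()) <= total_limit:
        return dict(components)  # no truncation needed
    # Pass 2: apply percentage budgets only when over the global limit.
    return {
        name: truncate(text, int(total_limit * percentages[name]), count_tokens)
        for name, text in components.items()
    }
```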
```python
scored.append((overlap, tokens, doc))

# Sort by overlap descending, then by token cost ascending (prefer smaller docs on tie)
scored.sort(key=lambda x: (-x[0], x[1]))
```
This re-ranks documents by keyword overlap, silently discarding the ordering established by the upstream vector DB node (which orders by embedding similarity score). If the vector DB already returned the best documents first, re-ranking by a weaker keyword heuristic may produce a worse selection.

Consider preserving the original document order and applying only greedy budget selection (without re-sorting), or using doc.score as the primary sort key with keyword overlap as a secondary tiebreaker only when scores are absent.
Fixed — now preserves vector DB ordering when documents carry a score field. Keyword overlap is only used as a lightweight fallback when no scores are present, with original index as secondary tiebreaker to avoid completely discarding upstream ordering.
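A sketch of score-first ordering with a keyword-overlap fallback; the dict keys `score` and `text` are assumptions about the document shape, not the node's real schema:

```python
def rank_documents(docs, query_terms):
    # Trust upstream vector-DB ordering when every document carries a score.
    if docs and all(d.get('score') is not None for d in docs):
        return sorted(docs, key=lambda d: -d['score'])
    # Fallback: keyword overlap, with the original index as a stable
    # tiebreaker so upstream ordering is not discarded entirely.
    def overlap(d):
        return len(query_terms & set(d.get('text', '').lower().split()))
    indexed = sorted(enumerate(docs), key=lambda p: (-overlap(p[1]), p[0]))
    return [d for _, d in indexed]
```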
@asclearuc Thank you for the thorough and deeply technical review! Your feedback is excellent and touches on real correctness issues. I'm working on addressing all 8 points. Will push these fixes shortly. Thanks again for the insightful review!
♻️ Duplicate comments (3)
nodes/src/nodes/context_optimizer/IInstance.py (1)
98-102: 🧹 Nitpick | 🔵 TrivialMove
QuestionHistoryimport to module level.The
QuestionHistoryimport at line 99 is still inside the method. As you acknowledged in a previous review, since there's no circular import concern, moving it to the top level alongside the existingQuestionimport improves readability.♻️ Move import to top of file
from rocketlib import IInstanceBase, Entry, debug, warning from .IGlobal import IGlobal -from ai.common.schema import Question +from ai.common.schema import Question, QuestionHistory ... # Update history - from ai.common.schema import QuestionHistory - if result['history']:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/context_optimizer/IInstance.py` around lines 98 - 102, The import of QuestionHistory is currently inside a method in IInstance.py; move it to the module level alongside the existing Question import at the top of the file so the module-level imports read both Question and QuestionHistory, then remove the local "from ai.common.schema import QuestionHistory" inside the method where result['history'] is used (the code that sets question.history = [QuestionHistory(...)] should continue to reference QuestionHistory unchanged).nodes/src/nodes/context_optimizer/optimizer.py (1)
318-327: 🧹 Nitpick | 🔵 Trivial: Use `append` + `reverse` to avoid O(n²) list insertions.

Using `recent.insert(0, msg)` inside a loop causes O(n²) behavior due to list shifting. You agreed to refactor this in a previous review.

♻️ Suggested optimization

```diff
 # Greedily add recent messages from the end
 recent: List[Dict[str, str]] = []
 recent_cost = 0
 for msg in reversed(messages[1:]):
     cost = _msg_tokens(msg)
     if recent_cost + cost <= budget_for_recent:
-        recent.insert(0, msg)
+        recent.append(msg)
         recent_cost += cost
     else:
         break
+recent.reverse()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/context_optimizer/optimizer.py` around lines 318-327: The loop building recent currently uses recent.insert(0, msg) causing O(n²) behavior; change it to append messages in iteration order (e.g., append to recent when iterating reversed(messages[1:])), then call recent.reverse() after the loop so the final order is preserved; update references in this block (variables recent, recent_cost, and the loop over reversed(messages[1:])) to use append + reverse to avoid repeated front-insertions.

test/nodes/test_context_optimizer.py (1)

789-793: 🧹 Nitpick | 🔵 Trivial: Tighten assertions on optimizer call arguments.

The assertion at line 793 is still loose—it only checks that `'question'` exists somewhere in the call. You agreed to verify the actual extracted values in a previous review.

♻️ More specific assertion

```diff
 mock_opt.optimize.assert_called_once()
 call_kwargs = mock_opt.optimize.call_args
-assert 'question' in call_kwargs.kwargs or len(call_kwargs.args) > 0
+# Verify extracted components
+assert call_kwargs.kwargs.get('question') == 'What is AI?'
+assert call_kwargs.kwargs.get('system_prompt') == 'You are helpful.'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/nodes/test_context_optimizer.py` around lines 789 - 793, The optimizer call assertion is too loose—replace the generic check with a precise comparison to the questions you passed in: after inst.writeQuestions(q) and mock_opt.optimize.assert_called_once(), grab call_kwargs = mock_opt.optimize.call_args and assert that the question was passed exactly (either assert call_kwargs.kwargs.get('question') == q or assert call_kwargs.args and call_kwargs.args[0] == q), and if writeQuestions can pass a list/tuple ensure you compare to the correct element/structure (e.g., q[0] or q) so the test verifies the actual extracted values rather than just the presence of the key.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@nodes/src/nodes/context_optimizer/IInstance.py`:
- Around line 98-102: The import of QuestionHistory is currently inside a method
in IInstance.py; move it to the module level alongside the existing Question
import at the top of the file so the module-level imports read both Question and
QuestionHistory, then remove the local "from ai.common.schema import
QuestionHistory" inside the method where result['history'] is used (the code
that sets question.history = [QuestionHistory(...)] should continue to reference
QuestionHistory unchanged).
In `@nodes/src/nodes/context_optimizer/optimizer.py`:
- Around line 318-327: The loop building recent currently uses recent.insert(0,
msg) causing O(n²) behavior; change it to append messages in iteration order
(e.g., append to recent when iterating reversed(messages[1:])), then call
recent.reverse() after the loop so the final order is preserved; update
references in this block (variables recent, recent_cost, and the loop over
reversed(messages[1:])) to use append + reverse to avoid repeated
front-insertions.
In `@test/nodes/test_context_optimizer.py`:
- Around line 789-793: The optimizer call assertion is too loose—replace the
generic check with a precise comparison to the questions you passed in: after
inst.writeQuestions(q) and mock_opt.optimize.assert_called_once(), grab
call_kwargs = mock_opt.optimize.call_args and assert that the question was
passed exactly (either assert call_kwargs.kwargs.get('question') == q or assert
call_kwargs.args and call_kwargs.args[0] == q), and if writeQuestions can pass a
list/tuple ensure you compare to the correct element/structure (e.g., q[0] or q)
so the test verifies the actual extracted values rather than just the presence
of the key.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7bbc2114-9f39-4749-abe1-660f0d9e06c9
📒 Files selected for processing (4)
- nodes/src/nodes/context_optimizer/IGlobal.py
- nodes/src/nodes/context_optimizer/IInstance.py
- nodes/src/nodes/context_optimizer/optimizer.py
- test/nodes/test_context_optimizer.py
All review feedback addressed in latest commits

@asclearuc review items — all fixed:
CodeRabbit nitpicks — also addressed:
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/context_optimizer/IInstance.py`:
- Around line 86-92: The current block updates question.questions[0].text but
silently discards result['question'] when question.questions is empty; add a
guard that checks if not question.questions and result['question'] and then emit
a debug log (including the optimized text and context) to surface this edge
case, and optionally append a new Question object with text=result['question']
(or preserve embedding info if available) to question.questions so the optimized
content isn’t lost; reference the question.questions list and result['question']
in your changes and use the module's existing logger to record the event.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b77db817-bf83-4e4f-8c16-177629dd3afd
📒 Files selected for processing (4)
- nodes/src/nodes/context_optimizer/IInstance.py
- nodes/src/nodes/context_optimizer/optimizer.py
- nodes/src/nodes/context_optimizer/requirements.txt
- test/nodes/test_context_optimizer.py
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/context_optimizer/IInstance.py`:
- Around line 63-68: The current conversion in IInstance.py discards fields like
score by only appending {'content': content, '_original': doc}; change the
transformation so you preserve all dumped fields from doc_dict (including
'score') and add/override 'content' and '_original' instead of replacing the
entire dict; update the loop that builds docs (used by rank_documents()) to
merge doc_dict with {'content': content, '_original': doc} so score-first
ordering works correctly.
- Around line 84-103: The code treats empty optimizer outputs as "no update" by
using truthiness checks (e.g., if result['question'], if result['history'], if
question.questions) so empty strings/lists are ignored; change those checks to
explicit presence checks and assign/clear fields even when the optimizer returns
empty values: use "if 'question' in result:" to update
question.questions[0].text = result['question'] (and truncate extras) or set
question.questions = [] when result['question'] is an empty string and no
embeddings should remain; use "if 'documents' in result:" to set
question.documents = [d['_original'] for d in result['documents'] if '_original'
in d] (allowing empty list to clear documents); and use "if 'history' in
result:" to set question.history = [QuestionHistory(... ) for m in
result['history']] (allowing empty history to clear it); also replace the role
assignment to use result.get('system_prompt', '') or assign explicitly when
'system_prompt' in result so an empty string overwrites the previous role.
In `@test/nodes/test_context_optimizer.py`:
- Around line 32-140: The test's _install_stubs() and global _STUBS_INSTALLED
permanently mutate sys.modules (adding 'rocketlib', 'ai.common.schema', etc.) at
import time which leaks fake modules into other tests; change this to apply the
stubs only within a context/fixture (use unittest.mock.patch.dict(sys.modules,
{...}) or a pytest fixture that patches sys.modules) and import the modules
under test inside that scoped patch so the originals are restored after the
test; update where _install_stubs() is called (and the duplicate stubbing at the
other occurrence around the 177-178 area) to use the scoped patch/fixture
instead of setting sys.modules globally.
- Around line 147-175: The mock tiktoken setup is currently inside the except
block so it only runs if find_spec raises—move the mock creation to run whenever
_TIKTOKEN_AVAILABLE is False: keep the try/except that sets _TIKTOKEN_AVAILABLE
(using importlib.util.find_spec('tiktoken')), but remove the mock from the
except and add an if not _TIKTOKEN_AVAILABLE: block that defines tiktoken_mod,
_MockEncoding, _get_encoding and sets sys.modules['tiktoken']; reference
_TIKTOKEN_AVAILABLE, tiktoken_mod, _MockEncoding and _get_encoding so the
optimizer import that follows will see the mocked module when tiktoken is
absent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 688affea-3221-4a2e-90c6-8528165e9c45
📒 Files selected for processing (2)
- nodes/src/nodes/context_optimizer/IInstance.py
- test/nodes/test_context_optimizer.py
@nihalnihalani, please address CodeRabbit’s concerns as well.
Please address all CodeRabbit comments, not only for this PR but also for others.
All CodeRabbit review comments have been addressed across prior commits — verified with 81/81 tests passing, ruff lint clean, and ruff format clean. I've gone through this PR (and the others) to make sure no CodeRabbit feedback was left unaddressed. Will continue to check back after pushes for any new CodeRabbit comments that may appear, as you suggested. Thanks for the process guidance — applying this as a standard practice going forward.
All CodeRabbit review comments on this PR have been addressed in commit 270498d: Fixes applied:
All 81 tests pass, ruff lint and format checks are clean. I will continue to monitor the PR for any new CodeRabbit comments that may appear after this push.
Add a new pipeline node that manages LLM context window budgets for context engineering. The optimizer counts tokens using tiktoken, allocates budgets across system prompt, query, documents, and conversation history using configurable priority-based percentages, and truncates/summarizes content to fit within model-specific token limits.

Key capabilities:
- Token counting with tiktoken (handles unicode/emoji)
- Priority-based budget allocation (system > query > docs > history)
- Sentence-boundary-aware truncation
- Conversation history summarization (keeps first + last N, summarizes middle)
- Keyword-overlap document ranking with greedy budget selection
- Built-in limits for GPT-5, Claude, and Gemini model families
- Auto and manual configuration profiles

Includes 68 unit tests covering token counting, budget allocation, truncation, history summarization, document ranking, full pipeline, model limits, edge cases, and IGlobal/IInstance lifecycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
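The priority-based budget allocation this commit describes can be sketched roughly as follows. The percentage values and function name here are illustrative assumptions, not the node's actual configuration defaults:

```python
# Illustrative split; the real node reads these percentages from its config
BUDGET_PCT = {"system": 10, "query": 15, "documents": 50, "history": 25}

def allocate_budgets(max_context_tokens: int) -> dict:
    """Split the total context window into per-component token budgets.

    Integer division means the allocations can sum to slightly less than
    the total, which errs on the safe side of the model limit.
    """
    return {name: (max_context_tokens * pct) // 100
            for name, pct in BUDGET_PCT.items()}

print(allocate_budgets(8000))
# {'system': 800, 'query': 1200, 'documents': 4000, 'history': 2000}
```

A downstream truncation step would then shrink each component to its allocated budget, in priority order.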
- Replace for-loop history building with list comprehension in IInstance.py
- Use list unpacking [first_msg, summary_placeholder, *recent] in optimizer.py
- Update "auto" profile title to reflect configured default, not auto-detection
- Use tuple form for endswith checks in test_context_optimizer.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
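The `[first_msg, summary_placeholder, *recent]` unpacking above belongs to the keep-first-plus-recent history strategy. A minimal sketch, assuming a caller-supplied token counter and a hypothetical placeholder format (neither is taken from the actual node):

```python
def summarize_history(messages: list, budget: int, msg_tokens) -> list:
    """Keep the first message plus as many recent messages as fit the budget;
    a placeholder message stands in for the summarized middle."""
    if len(messages) <= 2:
        return list(messages)
    first_msg = messages[0]
    budget_for_recent = budget - msg_tokens(first_msg)
    recent, recent_cost = [], 0
    for msg in reversed(messages[1:]):      # walk backwards from the newest
        cost = msg_tokens(msg)
        if recent_cost + cost <= budget_for_recent:
            recent.append(msg)              # append now, reverse once later
            recent_cost += cost
        else:
            break
    recent.reverse()                        # restore chronological order
    dropped = len(messages) - 1 - len(recent)
    if dropped == 0:
        return [first_msg, *recent]
    summary_placeholder = {"role": "system",
                           "content": f"[{dropped} earlier messages summarized]"}
    return [first_msg, summary_placeholder, *recent]
```

The append-then-reverse loop is the O(n) form the review asked for, instead of repeated `insert(0, ...)`.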
- Move tiktoken import from module level to lazy import inside methods to prevent ImportError before depends() installs the package
- Use os.path.join instead of string concatenation for path construction
- Add warning() when optimizer is not initialized and question passes through
- Validate non-numeric config values with try/except, clamp percentages to [0, 100] and max_context_tokens to >= 0
- Warn when budget percentages sum to less than 90%
- Implement two-pass optimization: pass 1 returns content unchanged when everything fits within the total limit, pass 2 applies per-component budgets only when the total exceeds the limit
- Preserve vector DB document ordering by using doc.score as primary sort key when available; keyword overlap used only as tiebreaker when scores are absent
- Add tests for input validation, two-pass optimization, and score-preserving document ranking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
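The two-pass behavior in this commit can be sketched as follows (a hypothetical shape; whitespace splitting stands in for real tiktoken counting, and the function names are illustrative):

```python
def word_truncate(text: str, budget: int) -> str:
    """Stand-in truncation: keep only the first `budget` words."""
    return " ".join(text.split()[:budget])

def optimize(components: dict, max_total: int, budgets: dict,
             truncate=word_truncate) -> dict:
    """Pass 1: if the combined content already fits, return it unchanged.
    Pass 2: otherwise apply per-component budgets."""
    total = sum(len(text.split()) for text in components.values())
    if total <= max_total:
        return dict(components)                      # pass 1: no-op
    return {name: truncate(text, budgets.get(name, 0))
            for name, text in components.items()}    # pass 2
```

The point of the first pass is that content which already fits is never mutated, so lossless inputs stay byte-identical through the node.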
- Relax tiktoken version restriction (remove upper bound) per reviewer question
- Move QuestionHistory import to module level for readability
- Add ClassVar annotation to MODEL_LIMITS to suppress RUF012
- Document whitespace normalization in sentence-boundary truncation
- Use append+reverse instead of O(n^2) insert(0) in history summarization
- Extract shared _message_tokens helper for consistent token overhead calculation
- Accept Optional[str] in count_tokens for explicit None handling
- Strengthen test assertion to verify actual extracted question/system_prompt values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pty questions list

When question.questions is empty but the optimizer produces a non-empty result, the optimized question text was silently discarded. Add a debug log to surface this edge case for troubleshooting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ptimizer

- Preserve all document fields (including score) during optimizer conversion so rank_documents() score-first ordering works correctly
- Treat empty optimizer outputs as valid truncation results instead of skipping updates (empty string/list now clears the field)
- Fix tiktoken mock: move setup outside except block since find_spec returns None without raising when the package is absent
- Scope test stubs via patch.dict + teardown_module so fake modules do not leak into subsequent test files in the pytest session

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
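The score-first ranking with keyword fallback described in this commit could look roughly like the sketch below. The `content` and `score` field names follow the commit notes; the exact tiebreaking rule is an assumption:

```python
def rank_documents(docs: list, query: str) -> list:
    """Order documents by vector-DB score when present; otherwise fall back
    to keyword overlap with the query, with original index as tiebreaker."""
    query_words = set(query.lower().split())

    def overlap(doc: dict) -> int:
        # Number of query words appearing in the document content
        return len(query_words & set(doc.get("content", "").lower().split()))

    indexed = list(enumerate(docs))
    if any("score" in d for d in docs):
        # Preserve upstream relevance ordering: higher score first
        indexed.sort(key=lambda p: (-p[1].get("score", 0.0), p[0]))
    else:
        # Lightweight fallback when no scores are present
        indexed.sort(key=lambda p: (-overlap(p[1]), p[0]))
    return [doc for _, doc in indexed]
```

Sorting `(score, original_index)` tuples keeps the fallback from completely discarding upstream ordering, which matches the concern raised earlier in this thread.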
270498d to 82f225a
logger = logging.getLogger(__name__)
Please replace stdlib logging with rocketlib's warning() / debug() throughout this file — the same functions already imported in IInstance.py.

```python
# remove
import logging
logger = logging.getLogger(__name__)

# replace all logger.warning(...) calls with
from rocketlib import debug, warning
```

This keeps log output consistent with the rest of the node pipeline.
Already fixed in current head via d629106. optimizer.py now uses rocketlib.warning() instead of stdlib logging on those warning paths, which keeps this node aligned with the rest of the pipeline logging style. I also reran python3 -m pytest -q test/nodes/test_context_optimizer.py and python3 -m ruff check nodes/src/nodes/context_optimizer/optimizer.py nodes/src/nodes/context_optimizer/IInstance.py nodes/src/nodes/context_optimizer/IGlobal.py test/nodes/test_context_optimizer.py; both are clean on the current branch.
"protocol": "context_optimizer://",
//
// Required:
// Class type of the node - what it does
Please mark this node as experimental in capabilities:

```json
"capabilities": ["experimental"],
```

Rationale: the node silently drops documents and truncates content, the MODEL_LIMITS map doesn't cover most real model IDs (so the 128K fallback fires silently), and tiktoken's cl100k_base encoding is OpenAI-specific and will produce inaccurate token counts for Claude/Gemini models. The experimental flag signals to users that this node requires careful validation before use in production pipelines — see agent_rocketride/services.json for precedent.
Already fixed in current head via d629106. services.json now marks the node as "experimental" in capabilities, so the service metadata reflects that this node still needs careful validation before production use.
…etride-org#527)

Clarify "auto" profile title in services.json to say "Use default model limits" since pipeline detection is not implemented — the implementation uses the configured model_name default to look up limits.

All other review feedback items (module-level import, _message_tokens helper, Optional[str] signature, list comprehension, ClassVar annotation, endswith tuple form) were addressed in prior commits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review Feedback Addressed

Fixes
Ready for re-review.
No description provided.
🚀 Merge Request

All review feedback addressed:
Ready for re-review and merge. @asclearuc
@asclearuc — All review feedback addressed and pushed (7 fixes, 63/64 tests passing — 1 pre-existing unrelated failure). Ready for re-review. Thanks! 🙏 |
#Hack-with-bay-2
Contribution Type
Feature — Context window optimizer for LLM token budget management
Problem / Use Case
RocketRide pipelines have no way to manage what fits in an LLM's context window. Users must manually estimate token counts and prune content. When pipelines chain documents + history + system prompts, context overflows cause silent truncation or API errors.
"Context engineering" is the #1 AI trend in 2026 — optimizing what information goes into LLM context.
Proposed Solution
A `context_optimizer` node with:
Why This Feature Fits This Codebase
The node processes Question objects from upstream nodes and optimizes their content before forwarding to LLM nodes. It uses `Config.getNodeConfig()` and the standard IGlobal/IInstance pattern. Token counting via tiktoken complements the existing token-aware features in `rocketlib/types.py` (IInvokeLLM.getTokenCounter).
Validation
68 tests passing. ruff check + format clean.
How This Could Be Extended
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests
Chores