feat(nodes): add Exa semantic web search node for real-time data enrichment #509
charliegillet wants to merge 8 commits into rocketride-org:develop
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings, or use the review-management commands to resume it.
📝 Walkthrough

Adds a new Exa Search tool node (`tool_exa_search`).
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller as Tool Caller
    participant Instance as IInstance
    participant Retry as _request_with_retry
    participant ExaAPI as Exa API
    Caller->>Instance: exa_search(args with query)
    Instance->>Instance: normalize input, validate query
    Instance->>Instance: apply IGlobal defaults (numResults, searchType, includeText, etc.)
    Instance->>Retry: prepare POST payload to /search
    Retry->>ExaAPI: send request
    ExaAPI-->>Retry: 429 / 5xx / timeout
    Retry->>Retry: exponential backoff retries
    Retry->>ExaAPI: retry request
    ExaAPI-->>Retry: 200 OK with results
    Retry-->>Instance: return response
    Instance->>Instance: parse & normalize results
    Instance-->>Caller: return structured {success, query, results, num_results}
```
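The retry loop in the diagram can be illustrated with a minimal, self-contained sketch. This is not the actual `_request_with_retry` from `exa_driver.py`; the function name, `send` callable, attempt count, and delay values here are all hypothetical stand-ins for the pattern the diagram describes.

```python
import random
import time

# Status codes the diagram treats as retryable (429 plus the 5xx family)
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def request_with_retry(send, max_attempts=4, base_delay=0.5):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a zero-argument callable returning an object with a status_code
    attribute; timeouts are expected to surface as TimeoutError.
    """
    for attempt in range(max_attempts):
        try:
            response = send()
        except TimeoutError:
            response = None  # treat a timeout like a retryable failure
        if response is not None and response.status_code not in RETRYABLE_STATUS:
            return response
        if attempt < max_attempts - 1:
            # exponential backoff with a little jitter: ~0.5s, 1s, 2s, ...
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError(f'request failed after {max_attempts} attempts')
```

A real implementation would likely also honor a `Retry-After` header when the API returns 429; the sketch only shows the backoff skeleton.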
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/tool_exa_search/exa_driver.py`:
- Around line 325-327: The warning currently logs the raw malformed payload via
input_obj!r; change the log to avoid leaking user data by removing the raw
object and only emitting bounded metadata (e.g., the input type and size) —
update the handler around the input type check that calls warning(...)
(referencing input_obj and the warning call in the exa_search logic) to log
something like the type name and a safe length/count instead of the full repr,
then return the empty dict as before.
- Around line 147-170: The _tool_validate method currently expects
already-normalized input but ToolsBase.handle_invoke routes validate calls
directly to _tool_validate; modify _tool_validate to call _normalize_tool_input
on its input_obj (same normalization used in _tool_invoke) before performing
validation so JSON strings, Pydantic models, and wrapped {'input': ...} payloads
are accepted consistently; update references in _tool_validate to use the
normalized input and keep _tool_invoke's normalization unchanged (preserve
_normalize_tool_input, _tool_validate, and _tool_invoke symbols).
In `@nodes/src/nodes/tool_exa_search/IGlobal.py`:
- Around line 50-55: The code currently only reads cfg.get('apikey') causing the
documented EXA_API_KEY env fallback to be ignored; update both the apikey
retrieval sites (in beginGlobal() and validateConfig()) to fall back to
os.environ.get('EXA_API_KEY') when cfg.get('apikey') is empty: compute apikey =
str((cfg.get('apikey') or os.environ.get('EXA_API_KEY') or '')).strip() (or
equivalent), preserve the existing exception raise('tool_exa_search: apikey is
required') if still empty, and ensure you import os if necessary; references:
Config.getNodeConfig, beginGlobal(), validateConfig(), and the local variable
apikey.
In `@nodes/src/nodes/tool_exa_search/services.json`:
- Around line 68-72: Update the "tool_exa_search.includeText" description to
reflect that the driver requests inline contents.text on the search call rather
than using a separate get_contents flow; locate the
"tool_exa_search.includeText" entry in services.json and replace the misleading
sentence about "uses get_contents" with wording that says results include inline
contents.text (or similar) so the UI copy matches the implementation.
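The env-var fallback the IGlobal.py comment asks for can be sketched in isolation. This is a hypothetical helper, not the project's code: `cfg` stands in for the object returned by `Config.getNodeConfig`, and the precedence (config value first, then `EXA_API_KEY`) follows the computation the comment proposes.

```python
import os

def resolve_api_key(cfg: dict) -> str:
    """Prefer the node config's apikey, falling back to the EXA_API_KEY env var."""
    apikey = str(cfg.get('apikey') or os.environ.get('EXA_API_KEY') or '').strip()
    if not apikey:
        # mirrors the existing error path when no key is available anywhere
        raise ValueError('tool_exa_search: apikey is required')
    return apikey
```

Note one subtlety of the proposed expression: a whitespace-only config value is truthy, so it wins the `or` chain and only fails after `.strip()`, rather than falling through to the env var.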
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8021ca67-25da-40c7-93a9-bff83dd31661
📒 Files selected for processing (6)
- nodes/src/nodes/tool_exa_search/IGlobal.py
- nodes/src/nodes/tool_exa_search/IInstance.py
- nodes/src/nodes/tool_exa_search/__init__.py
- nodes/src/nodes/tool_exa_search/exa_driver.py
- nodes/src/nodes/tool_exa_search/requirements.txt
- nodes/src/nodes/tool_exa_search/services.json
| "tool_exa_search.includeText": { | ||
| "type": "boolean", | ||
| "title": "Include Text Content", | ||
| "description": "Include full text content in results (uses get_contents)", | ||
| "default": true, |
Update the includeText description to match the implementation.
The driver requests inline contents.text on the search call and never uses a separate get_contents flow, so this UI copy is misleading.
🛠️ Suggested fix

```diff
  "tool_exa_search.includeText": {
    "type": "boolean",
    "title": "Include Text Content",
-   "description": "Include full text content in results (uses get_contents)",
+   "description": "Include full text content in results",
    "default": true,
    "enum": [
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```json
"tool_exa_search.includeText": {
  "type": "boolean",
  "title": "Include Text Content",
  "description": "Include full text content in results",
  "default": true,
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/tool_exa_search/services.json` around lines 68 - 72, Update
the "tool_exa_search.includeText" description to reflect that the driver
requests inline contents.text on the search call rather than using a separate
get_contents flow; locate the "tool_exa_search.includeText" entry in
services.json and replace the misleading sentence about "uses get_contents" with
wording that says results include inline contents.text (or similar) so the UI
copy matches the implementation.
Fixed. Updated the description to "Include full text content in results" — removed the stale "uses get_contents" reference since the driver uses inline contents.text on the search call.
Rate Limit Exceeded
@charliegillet has exceeded the limit for the number of chat messages per hour. Please wait 2 minutes and 10 seconds before sending another message.
Actionable comments posted: 2
♻️ Duplicate comments (3)
nodes/src/nodes/tool_exa_search/exa_driver.py (2)
146-169: ⚠️ Potential issue | 🟠 Major: Normalize `tool.validate` inputs too.

`ToolsBase.handle_invoke()` sends raw `param['input']` straight here. Right now `tool.validate` rejects JSON strings, Pydantic models, and wrapped `{'input': {...}}` payloads that `tool.invoke` accepts on line 168, so the two entrypoints disagree on the same tool contract.

🛠️ Proposed fix

```diff
     def _tool_validate(self, *, tool_name: str, input_obj: Any) -> None:  # noqa: ANN401
+        input_obj = _normalize_tool_input(input_obj)
+
         bare = self._bare_name(tool_name)
         if bare != 'exa_search':
             raise ValueError(f'Unknown tool {tool_name!r} (expected exa_search)')
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/tool_exa_search/exa_driver.py` around lines 146 - 169, The validator currently rejects inputs that _tool_invoke accepts because it doesn't normalize inputs first; update _tool_validate to call _normalize_tool_input(input_obj) at the start (or ensure ToolsBase.handle_invoke normalizes before calling _tool_validate) so JSON strings, Pydantic models, and wrapped {'input': {...}} payloads are normalized before validation; reference _tool_validate, _tool_invoke, and _normalize_tool_input when making the change.
330-332: ⚠️ Potential issue | 🟠 Major: Do not log raw malformed tool input.

This warning still includes `input_obj!r`. Search payloads can contain user queries or other sensitive context, so a bad request becomes a log-retention leak.

🛠️ Proposed fix

```diff
 if not isinstance(input_obj, dict):
-    warning(f'exa_search: unexpected input type {type(input_obj).__name__}: {input_obj!r}')
+    size = len(input_obj) if hasattr(input_obj, '__len__') else 'n/a'
+    warning(f'exa_search: unexpected input type {type(input_obj).__name__} (size={size})')
     return {}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/tool_exa_search/exa_driver.py` around lines 330 - 332, the warning in exa_driver.py leaks raw payloads via input_obj!r; change the warning in the input-type check (the block that calls warning when not isinstance(input_obj, dict)) to stop including the raw input object and instead log only safe metadata (e.g., the type name and a redacted or length indicator like "<REDACTED_PAYLOAD>" or f'payload_length={len(...) if available}') so sensitive search queries are never written to logs; update the call site that currently references input_obj, leaving the check and warning invocation intact but replacing the formatted raw object with a non-sensitive placeholder.

nodes/src/nodes/tool_exa_search/services.json (1)
70-74: ⚠️ Potential issue | 🟡 Minor: Update the `includeText` help text.

`nodes/src/nodes/tool_exa_search/exa_driver.py` lines 209-213 request inline `contents.text` on the search call; there is no separate `get_contents` flow here. The current description is misleading.

🛠️ Proposed fix

```diff
 "tool_exa_search.includeText": {
   "type": "boolean",
   "title": "Include Text Content",
-  "description": "Include full text content in results (uses get_contents)",
+  "description": "Include full text content inline in search results",
   "default": true,
   "enum": [
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/tool_exa_search/services.json` around lines 70 - 74, Update the "tool_exa_search.includeText" description to accurately reflect that the search flow requests inline contents.text rather than using a separate get_contents call; replace the misleading "uses get_contents" text with something like "Include full text content inline in search results (requests contents.text on the search call)". Ensure the change is made on the services.json entry for tool_exa_search.includeText and cross-check the corresponding behavior in exa_driver.py where the search call requests contents.text to keep wording consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/tool_exa_search/services.json`:
- Around line 41-47: The JSON schema property "tool_exa_search.numResults"
currently uses "type": "number" which allows fractions but the code in
IGlobal.py casts this value with int(...); update the schema to use "type":
"integer" (keep "minimum": 1, "maximum": 50 and "default": 10) so the UI
validation rejects fractional values and matches the runtime behavior of the
cast in IGlobal.py.
---
Duplicate comments:
In `@nodes/src/nodes/tool_exa_search/exa_driver.py`:
- Around line 146-169: The validator currently rejects inputs that _tool_invoke
accepts because it doesn't normalize inputs first; update _tool_validate to call
_normalize_tool_input(input_obj) at the start (or ensure ToolsBase.handle_invoke
normalizes before calling _tool_validate) so JSON strings, Pydantic models, and
wrapped {'input': {...}} payloads are normalized before validation; reference
_tool_validate, _tool_invoke, and _normalize_tool_input when making the change.
- Around line 330-332: The warning in exa_driver.py leaks raw payloads via
input_obj!r; change the warning in the input-type check (the block that calls
warning when not isinstance(input_obj, dict)) to stop including the raw input
object and instead log only safe metadata (e.g., the type name and a redacted or
length indicator like "<REDACTED_PAYLOAD>" or f'payload_length={len(...) if
available}') so sensitive search queries are never written to logs; update the
call site that currently references input_obj, leaving the check and warning
invocation intact but replacing the formatted raw object with a non-sensitive
placeholder.
In `@nodes/src/nodes/tool_exa_search/services.json`:
- Around line 70-74: Update the "tool_exa_search.includeText" description to
accurately reflect that the search flow requests inline contents.text rather
than using a separate get_contents call; replace the misleading "uses
get_contents" text with something like "Include full text content inline in
search results (requests contents.text on the search call)". Ensure the change
is made on the services.json entry for tool_exa_search.includeText and
cross-check the corresponding behavior in exa_driver.py where the search call
requests contents.text to keep wording consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 46fd30ad-e718-4a6b-8e85-b8e68a7f790c
⛔ Files ignored due to path filters (1)
packages/shared-ui/src/assets/nodes/exa.svg is excluded by !**/*.svg
📒 Files selected for processing (2)
- nodes/src/nodes/tool_exa_search/exa_driver.py
- nodes/src/nodes/tool_exa_search/services.json
```python
query = input_obj.get('query')
if not query or not isinstance(query, str) or not query.strip():
    raise ValueError('query is required and must be a non-empty string')

search_type = input_obj.get('type')
if search_type is not None and search_type not in VALID_SEARCH_TYPES:
    raise ValueError(f'type must be one of {sorted(VALID_SEARCH_TYPES)}; got {search_type!r}')

num_results = input_obj.get('num_results')
if num_results is not None:
    if not isinstance(num_results, int) or num_results < 1 or num_results > 50:
        raise ValueError('num_results must be an integer between 1 and 50')
```
Validate the rest of the advertised tool schema.
INPUT_SCHEMA exposes use_autoprompt, include_text, the domain filters, and the published-date bounds, but _tool_validate() never checks their shapes. Bad values then either get silently dropped in _invoke_search() or forwarded upstream, so tool.validate can return success for requests that will not execute as requested.
🛠️ Proposed fix

```diff
 num_results = input_obj.get('num_results')
 if num_results is not None:
     if not isinstance(num_results, int) or num_results < 1 or num_results > 50:
         raise ValueError('num_results must be an integer between 1 and 50')
+
+for field in ('use_autoprompt', 'include_text'):
+    value = input_obj.get(field)
+    if value is not None and not isinstance(value, bool):
+        raise ValueError(f'{field} must be a boolean')
+
+for field in ('include_domains', 'exclude_domains'):
+    value = input_obj.get(field)
+    if value is not None and (
+        not isinstance(value, list)
+        or any(not isinstance(item, str) or not item.strip() for item in value)
+    ):
+        raise ValueError(f'{field} must be a list of non-empty strings')
+
+for field in ('start_published_date', 'end_published_date'):
+    value = input_obj.get(field)
+    if value is not None and (not isinstance(value, str) or not value.strip()):
+        raise ValueError(f'{field} must be a non-empty string')
```

🧰 Tools
🪛 Ruff (0.15.7)
[warning] 156-156: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 160-160: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 163-164: Use a single if statement instead of nested if statements
(SIM102)
[warning] 165-165: Avoid specifying long messages outside the exception class
(TRY003)
Fixed. _tool_validate() now validates all advertised schema fields: use_autoprompt (boolean), include_text (boolean), include_domains (array of strings), exclude_domains (array of strings), start_published_date (string), and end_published_date (string).
| "tool_exa_search.numResults": { | ||
| "type": "number", | ||
| "title": "Number of Results", | ||
| "description": "Maximum number of search results to return (1-50)", | ||
| "default": 10, | ||
| "minimum": 1, | ||
| "maximum": 50 |
Make numResults integral in the schema.
In nodes/src/nodes/tool_exa_search/IGlobal.py Lines 54-55, this setting is immediately cast with int(...), so values like 3.9 would validate here and then be truncated to 3 at runtime. The UI schema should reject fractional values up front so config validation matches execution.
🛠️ Proposed fix
"tool_exa_search.numResults": {
- "type": "number",
+ "type": "integer",
"title": "Number of Results",
"description": "Maximum number of search results to return (1-50)",
"default": 10,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```json
"tool_exa_search.numResults": {
  "type": "integer",
  "title": "Number of Results",
  "description": "Maximum number of search results to return (1-50)",
  "default": 10,
  "minimum": 1,
  "maximum": 50
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/tool_exa_search/services.json` around lines 41 - 47, The JSON
schema property "tool_exa_search.numResults" currently uses "type": "number"
which allows fractions but the code in IGlobal.py casts this value with
int(...); update the schema to use "type": "integer" (keep "minimum": 1,
"maximum": 50 and "default": 10) so the UI validation rejects fractional values
and matches the runtime behavior of the cast in IGlobal.py.
Fixed. Changed numResults type from "number" to "integer" in services.json to match the int() cast at runtime.
asclearuc left a comment
Thank you for this contribution. Please note that PR #386 (search_exa) covers Exa integration and has already been approved — it is waiting to be merged.
Once #386 is merged, please rebase this PR on top of it and add what is currently missing from #386:
- Retry logic for 429, 5xx, and timeouts (as implemented in `_request_with_retry` here)
- Domain filtering (`includeDomains`, `excludeDomains`)
- Date filtering (`startPublishedDate`, `endPublishedDate`)
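The domain and date filters listed above map onto the POST `/search` body roughly as follows. This is a sketch: the helper name is hypothetical, and the camelCase field names (`includeDomains`, `excludeDomains`, `startPublishedDate`, `endPublishedDate`) are taken from this thread and should be verified against the Exa API reference.

```python
def build_search_payload(query, include_domains=None, exclude_domains=None,
                         start_published_date=None, end_published_date=None):
    """Assemble a /search request body, omitting optional filters left unset."""
    payload = {'query': query}
    optional = {
        'includeDomains': include_domains,
        'excludeDomains': exclude_domains,
        'startPublishedDate': start_published_date,
        'endPublishedDate': end_published_date,
    }
    # drop None/empty values so the API only sees filters the caller set
    payload.update({k: v for k, v in optional.items() if v})
    return payload
```

Keeping unset filters out of the payload entirely (rather than sending nulls) avoids accidentally overriding API-side defaults.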
Acknowledged — will rebase on top of #386 once it's merged and add the missing retry logic, domain filtering, and date filtering on top. Thanks for the review!
…chment

Adds a new tool_exa_search node that integrates with the Exa API (exa.ai) to provide semantic web search capabilities for pipelines. Agents can invoke this tool to search the web in real time and retrieve structured results including titles, URLs, text content, relevance scores, and published dates. Closes rocketride-org#429

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The constant was defined but never referenced anywhere in the codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add "minimum": 1 and "maximum": 50 constraints to match the Python clamping logic in exa_driver.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Catch requests.RequestException and re-raise as RuntimeError with a sanitized message that omits headers. Also wrap the Timeout re-raise to avoid exposing request details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 5fa0b00 to 9809906 (Compare)
@asclearuc PR #386 has now been merged and I've rebased this PR on top of it. This PR adds the retry logic, domain filtering, and date filtering on top of #386's base implementation.
Senior Review: feat(nodes) — Exa semantic web search node

What works well
Blockers (must fix before merge)
Should fix
Nice-to-have
Solid node implementation — fix the CI failures and the security logging issue, and this is close to mergeable.
🚀 Merge Request

Good tool node pattern following existing conventions. Before merge (blockers):

CI must be green first.
- Normalize input in _tool_validate() for JSON strings, Pydantic models, and wrapped payloads (not just in invoke)
- Remove raw input from warning log to prevent leaking sensitive data
- Add EXA_API_KEY env var fallback in IGlobal beginGlobal/validateConfig
- Validate all advertised schema fields (use_autoprompt, include_text, domain filters, date bounds) in _tool_validate
- Change numResults type from "number" to "integer" in services.json
- Fix includeText description (remove stale "uses get_contents" reference)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Thanks for the heads-up about PR #386! I've addressed all the code quality feedback in the meantime so this PR is ready to rebase cleanly once #386 merges. The retry logic, domain filtering, and date filtering that this PR adds on top of #386's base implementation will be preserved during the rebase.

No description provided.

Latest fixes
Actionable comments posted: 3
♻️ Duplicate comments (1)
nodes/src/nodes/tool_exa_search/exa_driver.py (1)
178-194: ⚠️ Potential issue | 🟡 Minor: Reject blank domain/date filter values.

`tool.validate` still accepts `['']`, `[' ']`, or `''` here. Those either get forwarded upstream or dropped in `_invoke_search()`, so validation still does not match execution for these filters.

💡 Suggested change

```diff
 include_domains = input_obj.get('include_domains')
 if include_domains is not None:
-    if not isinstance(include_domains, list) or not all(isinstance(d, str) for d in include_domains):
-        raise ValueError('include_domains must be an array of strings')
+    if not isinstance(include_domains, list) or not all(isinstance(d, str) and d.strip() for d in include_domains):
+        raise ValueError('include_domains must be an array of non-empty strings')

 exclude_domains = input_obj.get('exclude_domains')
 if exclude_domains is not None:
-    if not isinstance(exclude_domains, list) or not all(isinstance(d, str) for d in exclude_domains):
-        raise ValueError('exclude_domains must be an array of strings')
+    if not isinstance(exclude_domains, list) or not all(isinstance(d, str) and d.strip() for d in exclude_domains):
+        raise ValueError('exclude_domains must be an array of non-empty strings')

 start_published_date = input_obj.get('start_published_date')
-if start_published_date is not None and not isinstance(start_published_date, str):
-    raise ValueError('start_published_date must be a string in ISO 8601 format')
+if start_published_date is not None and (not isinstance(start_published_date, str) or not start_published_date.strip()):
+    raise ValueError('start_published_date must be a non-empty string in ISO 8601 format')

 end_published_date = input_obj.get('end_published_date')
-if end_published_date is not None and not isinstance(end_published_date, str):
-    raise ValueError('end_published_date must be a string in ISO 8601 format')
+if end_published_date is not None and (not isinstance(end_published_date, str) or not end_published_date.strip()):
+    raise ValueError('end_published_date must be a non-empty string in ISO 8601 format')
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/tool_exa_search/exa_driver.py` around lines 178 - 194, the current validation in include_domains, exclude_domains, start_published_date, and end_published_date (in exa_driver.py) allows blank/whitespace-only strings or lists like [''] which are invalid at runtime; update the checks in the validation block to reject empty strings and strings that are only whitespace and to reject lists that contain any such blank entries (i.e., ensure include_domains and exclude_domains are lists of non-empty trimmed strings, and start_published_date/end_published_date are non-empty trimmed strings in addition to type checks) so the behavior matches what _invoke_search() expects and tool.validate no longer permits blank values.
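The stricter "non-empty, non-blank strings" rule discussed above can be exercised in isolation. A condensed, hypothetical version of just the list check (not the project's actual validator):

```python
def validate_domain_list(name, value):
    """Reject anything that is not a list of non-empty, non-blank strings."""
    if value is None:
        return  # optional field left unset
    if not isinstance(value, list) or not all(isinstance(d, str) and d.strip() for d in value):
        raise ValueError(f'{name} must be an array of non-empty strings')
```

Note that a bare string like `'exa.ai'` is also rejected: it would iterate character by character under `all(...)`, but the `isinstance(value, list)` guard catches it first.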
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/tool_exa_search/exa_driver.py`:
- Around line 165-168: The validation for num_results currently allows booleans
because bool is a subclass of int; update the check around the num_results
variable in exa_driver.py (the block that raises ValueError) to explicitly
reject bools — e.g., ensure the type is an int but not a bool (use either
type(num_results) is int or add "and not isinstance(num_results, bool)") before
enforcing the 1..50 range so True/False no longer pass validation.
- Around line 363-367: The current unwrapping only handles when
input_obj['input'] is a dict; update the shared normalization used by
ToolsBase/_tool_validate to recursively unwrap {'input': ...} layers and also
accept JSON strings and Pydantic-like models: while 'input' in input_obj,
extract inner = input_obj['input']; if inner is a str attempt json.loads(inner)
(fall back to the string if it fails); if inner has .dict() or is a model
instance convert to a dict via inner.dict() or vars(inner); merge extras as
existing code does and repeat until the resulting input_obj is a plain dict or
primitive. Ensure this logic is used by _tool_validate and any helper that
currently contains the shown unwrapping snippet.
In `@nodes/src/nodes/tool_exa_search/IInstance.py`:
- Around line 41-42: Add a PEP 257 single-line class docstring to the public
node entry class IInstance (which subclasses IInstanceBase and exposes IGlobal)
so it matches sibling node classes; open the class definition for IInstance and
insert a short single-quoted docstring (one sentence) immediately below the
class line, following project style (single quotes, Python 3.10+), and ensure
ruff/formatting remains clean.
---
Duplicate comments:
In `@nodes/src/nodes/tool_exa_search/exa_driver.py`:
- Around line 178-194: The current validation in include_domains,
exclude_domains, start_published_date, and end_published_date (in exa_driver.py)
allows blank/whitespace-only strings or lists like [''] which are invalid at
runtime; update the checks in the validation block to reject empty strings and
strings that are only whitespace and to reject lists that contain any such blank
entries (i.e., ensure include_domains and exclude_domains are lists of non-empty
trimmed strings, and start_published_date/end_published_date are non-empty
trimmed strings in addition to type checks) so the behavior matches what
_invoke_search() expects and tool.validate no longer permits blank values.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 822aa41e-cb3f-44a3-bd91-0d2a6231305a
📒 Files selected for processing (6)
- nodes/src/nodes/tool_exa_search/IGlobal.py
- nodes/src/nodes/tool_exa_search/IInstance.py
- nodes/src/nodes/tool_exa_search/__init__.py
- nodes/src/nodes/tool_exa_search/exa_driver.py
- nodes/src/nodes/tool_exa_search/requirements.txt
- nodes/src/nodes/tool_exa_search/services.json
```python
num_results = input_obj.get('num_results')
if num_results is not None:
    if not isinstance(num_results, int) or num_results < 1 or num_results > 50:
        raise ValueError('num_results must be an integer between 1 and 50')
```
Reject booleans for num_results.
In Python, bool is a subclass of int, so {"num_results": true} currently passes validation and gets treated as 1. That violates the declared schema and hides malformed tool calls.
💡 Suggested change

```diff
 num_results = input_obj.get('num_results')
 if num_results is not None:
-    if not isinstance(num_results, int) or num_results < 1 or num_results > 50:
+    if isinstance(num_results, bool) or not isinstance(num_results, int) or num_results < 1 or num_results > 50:
         raise ValueError('num_results must be an integer between 1 and 50')
```
🪛 Ruff (0.15.9)
[warning] 166-167: Use a single if statement instead of nested if statements
(SIM102)
[warning] 168-168: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/tool_exa_search/exa_driver.py` around lines 165 - 168, The
validation for num_results currently allows booleans because bool is a subclass
of int; update the check around the num_results variable in exa_driver.py (the
block that raises ValueError) to explicitly reject bools — e.g., ensure the type
is an int but not a bool (use either type(num_results) is int or add "and not
isinstance(num_results, bool)") before enforcing the 1..50 range so True/False
no longer pass validation.
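The subtlety flagged here is easy to verify: `bool` is a subclass of `int` in Python, so `isinstance(True, int)` is `True` and a naive range check lets booleans through. A small standalone predicate (hypothetical name, mirroring the suggested guard) shows the fix:

```python
def check_num_results(num_results):
    """Return True only for genuine ints in 1..50; booleans are rejected."""
    return (
        not isinstance(num_results, bool)  # must come before the int check
        and isinstance(num_results, int)
        and 1 <= num_results <= 50
    )

# the root cause: bool is a subclass of int
assert isinstance(True, int)
```

An equivalent alternative is `type(num_results) is int`, which rejects `bool` (and any other `int` subclass) in a single test.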
```python
# Unwrap ``{"input": {...}}`` wrappers that some framework paths leave behind
if 'input' in input_obj and isinstance(input_obj['input'], dict):
    inner = input_obj['input']
    extras = {k: v for k, v in input_obj.items() if k != 'input'}
    input_obj = {**inner, **extras}
```
Normalize wrapped input payloads recursively.
This only unwraps {'input': ...} when the inner value is already a dict. Payloads like {'input': '{"query":"..."}'}, {'input': model}, or nested wrappers still fail even though the helper is supposed to accept wrapped JSON/model inputs too.
💡 Suggested change

```diff
-if 'input' in input_obj and isinstance(input_obj['input'], dict):
-    inner = input_obj['input']
+if 'input' in input_obj:
+    inner = _normalize_tool_input(input_obj['input'])
     extras = {k: v for k, v in input_obj.items() if k != 'input'}
     input_obj = {**inner, **extras}
```

Based on learnings, in RocketRide tool driver implementations under nodes/**/*.py that use ToolsBase, _tool_validate must accept JSON strings, Pydantic-like models, and wrapped payloads like {'input': ...} via the shared normalization helper.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/tool_exa_search/exa_driver.py` around lines 363 - 367, The
current unwrapping only handles when input_obj['input'] is a dict; update the
shared normalization used by ToolsBase/_tool_validate to recursively unwrap
{'input': ...} layers and also accept JSON strings and Pydantic-like models:
while 'input' in input_obj, extract inner = input_obj['input']; if inner is a
str attempt json.loads(inner) (fall back to the string if it fails); if inner
has .dict() or is a model instance convert to a dict via inner.dict() or
vars(inner); merge extras as existing code does and repeat until the resulting
input_obj is a plain dict or primitive. Ensure this logic is used by
_tool_validate and any helper that currently contains the shown unwrapping
snippet.
|
Responding to @nihalnihalani's CI concern: CI is now green on all 3 platforms (Ubuntu, Windows, macOS). The earlier failures were addressed in commits 9809906 (sanitized retry exceptions to prevent API key leakage) and 0360394 (review feedback including validation ordering fix). Ruff lint and format also pass clean. |
|
Thanks for the contribution. Needs to be rebased on #599's pattern before this can merge. PR #599 (da19f6b) retired the |
|
Changes requested — Hey @charliegillet — thanks so much for putting this together! Integrating Exa as an agent tool is genuinely a great idea, and I can see you've done your homework on the

That said, I do need to flag a few things before we can merge — some of them are pretty significant, so let me walk through them:

🔄 The driver pattern has been superseded

This is the big one. The

What I'd suggest: delete

🧪 No test configuration in
|
…review feedback

- Delete exa_driver.py and move search logic into IInstance.py using @tool_function decorator, matching the tool_firecrawl pattern
- Add output_schema and summary to the @tool_function decorator
- Fix bare Exception in IGlobal.py: use rocketlib.error() + ValueError
- Store config values directly on IGlobal instead of the driver object
- Add test config with default profile to services.json
|
Thanks for the thorough review, @dsapandora! All 6 points have been addressed in 25a481c: 1. Migrate from driver pattern to
|
Actionable comments posted: 2
♻️ Duplicate comments (1)
nodes/src/nodes/tool_exa_search/IInstance.py (1)
53-54: ⚠️ Potential issue | 🟡 Minor — Add the missing `IInstance` class docstring.

The module docstring is present, but the public node entry class still has no PEP 257 class docstring.

Suggested fix

 class IInstance(IInstanceBase):
+    """Node instance that exposes the Exa search tool."""
+
     IGlobal: IGlobal

As per coding guidelines, nodes/**/*.py: Python pipeline nodes: use single quotes, ruff for linting/formatting, PEP 257 docstrings, target Python 3.10+.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/tool_exa_search/IInstance.py` around lines 53 - 54, Add a PEP 257 single-line class docstring to the public node entry class IInstance (which inherits IInstanceBase and references IGlobal) to describe its purpose; place the docstring immediately under the class definition using single quotes (per project style) and keep it concise (e.g., one-line summary of the class role) so ruff formatting/linting passes for Python 3.10+.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/tool_exa_search/IGlobal.py`:
- Line 63: The current assignment in IGlobal.py treats an explicit 0 as a
missing value because it uses cfg.get('numResults') or 10; change the logic so
it only defaults to 10 when cfg.get('numResults') is None (not falsy). In other
words, read the raw value from cfg.get('numResults'), check for None, then
convert to int and clamp with max(1, min(50, ...)) and assign to
self.num_results; reference the existing self.num_results assignment and
cfg.get('numResults') in your change.
In `@nodes/src/nodes/tool_exa_search/IInstance.py`:
- Around line 115-117: The input normalization currently performed inside
exa_search (via _normalize_tool_input) runs after the `@tool_function`
input_schema validation, so JSON-string or wrapped forms like {'input':
'{"query":"..."}'} are rejected before unwrapping; move or duplicate the
normalization into the pre-validation dispatch layer used by tool.invoke (or
into the `@tool_function` wrapper) so that raw args are normalized (unwrapping
nested {'input': ...}, parsing JSON strings, converting model objects to dicts)
before input_schema is applied; apply the same change for the other tool methods
referenced around the 246-275 region to ensure all tool invocations accept the
supported payload forms prior to schema validation.
---
Duplicate comments:
In `@nodes/src/nodes/tool_exa_search/IInstance.py`:
- Around line 53-54: Add a PEP 257 single-line class docstring to the public
node entry class IInstance (which inherits IInstanceBase and references IGlobal)
to describe its purpose; place the docstring immediately under the class
definition using single quotes (per project style) and keep it concise (e.g.,
one-line summary of the class role) so ruff formatting/linting passes for Python
3.10+.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fc6ba3b7-77af-4693-9256-06dd0da343d5
📒 Files selected for processing (3)
nodes/src/nodes/tool_exa_search/IGlobal.py
nodes/src/nodes/tool_exa_search/IInstance.py
nodes/src/nodes/tool_exa_search/services.json
def exa_search(self, args):
    """Search the web using Exa semantic search."""
    args = _normalize_tool_input(args)
Normalize wrapped inputs before the @tool_function schema gate.
This helper runs only after exa_search() has already been dispatched, but in this repo the decorator’s input_schema is validated at tool.invoke before the method body executes. That means the JSON-string / model / {'input': ...} forms this PR is trying to support can still be rejected before _normalize_tool_input() runs. And even when they do reach this helper, input is only unwrapped when it is already a dict, so wrappers like {'input': '{"query":"..."}'} still fall through to query is required.
Suggested helper fix for the nested-wrapper half of the problem
-    if 'input' in input_obj and isinstance(input_obj['input'], dict):
-        inner = input_obj['input']
+    if 'input' in input_obj:
+        inner = _normalize_tool_input(input_obj['input'])
         extras = {k: v for k, v in input_obj.items() if k != 'input'}
         input_obj = {**inner, **extras}

You'll still need the same normalization in the pre-validation dispatch path if those non-object payload forms must remain supported. Based on learnings, input_schema declared on a tool_function decorator is validated by the framework at the tool.invoke dispatch layer before the tool method body is called.
Also applies to: 246-275
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/tool_exa_search/IInstance.py` around lines 115 - 117, The
input normalization currently performed inside exa_search (via
_normalize_tool_input) runs after the `@tool_function` input_schema validation, so
JSON-string or wrapped forms like {'input': '{"query":"..."}'} are rejected
before unwrapping; move or duplicate the normalization into the pre-validation
dispatch layer used by tool.invoke (or into the `@tool_function` wrapper) so that
raw args are normalized (unwrapping nested {'input': ...}, parsing JSON strings,
converting model objects to dicts) before input_schema is applied; apply the
same change for the other tool methods referenced around the 246-275 region to
ensure all tool invocations accept the supported payload forms prior to schema
validation.
`cfg.get('numResults') or 10` silently turned an explicit `0` into the
default of 10 before clamping. A direct config bypassing the UI schema
could therefore widen an out-of-range low value instead of clamping it
to 1. Distinguish "not set" from "set to 0" so the clamp takes effect.
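The None-vs-falsy distinction from this commit can be sketched as follows. The `numResults` key and the 1..50 clamp come from the commit message; the helper name is hypothetical:

```python
def resolve_num_results(cfg, default=10):
    """Default only when the key is absent or None; clamp explicit values into 1..50."""
    raw = cfg.get('numResults')
    if raw is None:
        return default
    # An explicit 0 is clamped up to 1 instead of silently becoming the default.
    return max(1, min(50, int(raw)))
```

Compare this to `cfg.get('numResults') or 10`, which maps an explicit 0 to 10 and bypasses the clamp entirely.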
`bool` is a subclass of `int` in Python, so `{'num_results': true}` was
silently accepted and clamped to 1 instead of being treated as malformed.
Explicitly reject non-int (including bool) and fall back to the configured
default. Also adds a PEP 257 class docstring to match sibling tool nodes.
Actionable comments posted: 2
♻️ Duplicate comments (1)
nodes/src/nodes/tool_exa_search/IInstance.py (1)
278-281: ⚠️ Potential issue | 🟠 Major — Recursively normalize wrapped `input` payloads.

This still only unwraps `input` when it is already a dict, so payloads like `{'input': '{"query":"..."}'}` or `{'input': model}` still fall through to `query is required`. Based on learnings, `_normalize_tool_input` should accept JSON strings, Pydantic-like models, and wrapped payloads like `{'input': ...}` consistently.

🛠 Suggested fix

-    if 'input' in input_obj and isinstance(input_obj['input'], dict):
-        inner = input_obj['input']
+    if 'input' in input_obj:
+        inner = _normalize_tool_input(input_obj['input'])
         extras = {k: v for k, v in input_obj.items() if k != 'input'}
         input_obj = {**inner, **extras}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/tool_exa_search/IInstance.py` around lines 278 - 281, update the _normalize_tool_input logic to recursively unwrap wrapped payloads and accept JSON strings and Pydantic-like model objects: while 'input' in input_obj, extract inner = input_obj['input']; if inner is a str, attempt json.loads(inner) (fall back to the original string on failure); if inner has a .dict() or .to_dict() method call it to get a dict, or use vars(inner) as fallback; if inner becomes a dict merge it with any extras ({k:v for k,v in input_obj.items() if k!='input'}) to form the new input_obj and continue the loop; ensure the function returns the final normalized dict (or original value) and handles exceptions gracefully (log/raise as appropriate).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/tool_exa_search/IInstance.py`:
- Around line 233-234: Wrap the resp.json() call in a try/except and validate
the parsed payload is a mapping before returning: inside the block that
currently calls resp.raise_for_status() and return resp.json(), catch
JSONDecodeError (or general Exception) from resp.json(), and if parsing fails or
the result is not a dict/mapping (i.e., not suitable for response.get(...) in
exa_search()), return a structured failure payload such as {"success": False,
"error": "<parse error or unexpected type>", "raw": <resp.text>} so callers like
exa_search() never receive a non-object and won't crash; update the code around
resp.raise_for_status(), resp.json() in IInstance.py accordingly.
- Line 115: The `@tool_function` decorator in IInstance.py is using an unsupported
keyword summary= which causes a TypeError; remove the summary= parameter from
the `@tool_function` decorator (use description= and input_schema/output_schema
only) so the decorated function (the tool function decorated with
`@tool_function`) registers correctly; ensure any human-facing text formerly in
summary is moved into the description argument of the same decorator or into the
function docstring.
---
Duplicate comments:
In `@nodes/src/nodes/tool_exa_search/IInstance.py`:
- Around line 278-281: update the _normalize_tool_input logic to recursively
unwrap wrapped payloads and accept JSON strings and Pydantic-like model objects:
while 'input' in input_obj, extract inner = input_obj['input']; if inner is a
str, attempt json.loads(inner) (fall back to the original string on failure); if
inner has a .dict() or .to_dict() method call it to get a dict, or use
vars(inner) as fallback; if inner becomes a dict merge it with any extras ({k:v
for k,v in input_obj.items() if k!='input'}) to form the new input_obj and
continue the loop; ensure the function returns the final normalized dict (or
original value) and handles exceptions gracefully (log/raise as appropriate).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b8e0d999-4f87-4d5f-acc6-fa040de088b0
📒 Files selected for processing (2)
nodes/src/nodes/tool_exa_search/IGlobal.py
nodes/src/nodes/tool_exa_search/IInstance.py
},
},
description='Search the web using Exa semantic search. Provide a natural language query to find relevant web pages. Returns structured results with title, URL, text content, relevance score, and published date.',
summary='Searches the web using Exa semantic search API',
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C3 '^def tool_function\(' packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py
rg -n -C3 '\btool_function\b' packages/server/engine-lib/rocketlib-python/lib/rocketlib/__init__.py
rg -n -C2 'summary=' nodes/src/nodes/tool_exa_search/IInstance.py

Repository: rocketride-org/rocketride-server
Length of output: 972
🏁 Script executed:
# Get the complete tool_function signature
cat -n packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py | sed -n '65,80p'

Repository: rocketride-org/rocketride-server
Length of output: 766
🏁 Script executed:
# Also search for other uses of summary= with tool_function in the repository
rg -n 'summary\s*=' nodes/src/nodes/ -A1 -B1

Repository: rocketride-org/rocketride-server
Length of output: 1444
Remove the unsupported summary= parameter from the @tool_function decorator.
The tool_function signature supports only input_schema, description, and output_schema. Passing summary= raises TypeError for an unexpected keyword argument, preventing the node from registering.
Fix
- summary='Searches the web using Exa semantic search API',

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- summary='Searches the web using Exa semantic search API',
+ description='Searches the web using Exa semantic search API',
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/tool_exa_search/IInstance.py` at line 115, The `@tool_function`
decorator in IInstance.py is using an unsupported keyword summary= which causes
a TypeError; remove the summary= parameter from the `@tool_function` decorator
(use description= and input_schema/output_schema only) so the decorated function
(the tool function decorated with `@tool_function`) registers correctly; ensure
any human-facing text formerly in summary is moved into the description argument
of the same decorator or into the function docstring.
resp.raise_for_status()
return resp.json()
Validate the success payload before returning it.
resp.json() can either raise or return a non-object body. Both cases currently bypass the structured success: False path and can crash exa_search() on response.get(...).
🛠 Suggested fix
resp.raise_for_status()
- return resp.json()
+ try:
+ data = resp.json()
+ except ValueError:
+ raise RuntimeError('Exa search: invalid JSON response from API') from None
+ if not isinstance(data, dict):
+ raise RuntimeError('Exa search: unexpected response shape from API')
+ return data🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/tool_exa_search/IInstance.py` around lines 233 - 234, Wrap
the resp.json() call in a try/except and validate the parsed payload is a
mapping before returning: inside the block that currently calls
resp.raise_for_status() and return resp.json(), catch JSONDecodeError (or
general Exception) from resp.json(), and if parsing fails or the result is not a
dict/mapping (i.e., not suitable for response.get(...) in exa_search()), return
a structured failure payload such as {"success": False, "error": "<parse error
or unexpected type>", "raw": <resp.text>} so callers like exa_search() never
receive a non-object and won't crash; update the code around
resp.raise_for_status(), resp.json() in IInstance.py accordingly.
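The structured-failure approach this prompt describes can be sketched over a raw body string. The real code would call `resp.json()` on a requests response; this standalone version is illustrative only, and the payload keys mirror the prompt's suggestion:

```python
import json


def parse_search_response(text):
    """Parse an API body into a mapping, returning a structured failure instead of raising."""
    try:
        data = json.loads(text)
    except ValueError as exc:
        return {'success': False, 'error': f'invalid JSON response: {exc}', 'raw': text}
    # A valid JSON array or scalar would still crash callers that use response.get(...).
    if not isinstance(data, dict):
        return {'success': False, 'error': 'unexpected response shape (not a JSON object)', 'raw': text}
    return data
```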
|
Thanks for the thorough review! Status on each blocker as of 9ead2ec:
|
|
Thanks — fully addressed in 25a481c. |
|
Really appreciate the detailed walkthrough — every item you flagged is addressed, mostly by the
Also picked up two follow-on items from this pass:
Let me know if you want anything else tightened before merge. |
Summary
`tool_exa_search` pipeline node that integrates with the Exa API to provide semantic web search as an agent tool
Feature
Why this feature fits this codebase
RocketRide's tool node system follows a consistent pattern where each tool lives under `nodes/src/nodes/tool_*` with `IGlobal.py` for shared state, `IInstance.py` for per-invocation logic, a driver class extending `ai.common.tools.ToolsBase`, and a `services.json` for UI/config registration. The existing `tool_firecrawl` and `tool_http_request` nodes demonstrate this exact pattern. The new `tool_exa_search` node plugs into this architecture: `IGlobal.beginGlobal()` reads config via `Config.getNodeConfig()` and creates an `ExaSearchDriver`, `IInstance.invoke()` delegates to `driver.handle_invoke()`, and the driver's `_tool_query()`/`_tool_validate()`/`_tool_invoke()` hooks implement the `ToolsBase` interface so the engine can discover and call the tool. The `services.json` registers the node with `classType: ["tool"]`, `capabilities: ["invoke"]`, and `register: "filter"` — matching the conventions of every other tool node. This gives agents real-time web search without any new framework plumbing.

What changed
- `nodes/src/nodes/tool_exa_search/services.json` — Node definition with config fields (API key, numResults, useAutoprompt, searchType, includeText), UI shape, preconfig profile, and `exa.svg` icon reference
- `nodes/src/nodes/tool_exa_search/__init__.py` — Module entry point exporting `IGlobal` and `IInstance`
- `nodes/src/nodes/tool_exa_search/IGlobal.py` — Global state: reads config, validates API key, creates `ExaSearchDriver` instance
- `nodes/src/nodes/tool_exa_search/IInstance.py` — Instance: delegates `invoke()` to the driver's `handle_invoke()`
- `nodes/src/nodes/tool_exa_search/exa_driver.py` — `ToolsBase` implementation with `_tool_query()` (tool descriptor + input schema), `_tool_validate()` (input validation), `_tool_invoke()` (Exa API POST with retry logic for 429/5xx/timeouts, result parsing into structured dicts), and `_normalize_tool_input()` helper for Pydantic/JSON/wrapper unwrapping
- `nodes/src/nodes/tool_exa_search/requirements.txt` — Declares `requests` dependency
- `packages/shared-ui/src/assets/nodes/exa.svg` — SVG icon for the node in the pipeline builder UI

Validation
- `validateConfig` passes (no warning toast)
- `exa_search` with a query like "latest advances in LLM reasoning" — confirm structured results with title, url, score, and text fields are returned
- `include_domains: ["arxiv.org"]` and verify only arxiv results appear
- `start_published_date: "2025-01-01"` and verify no older results
- `ruff check` and `ruff format --check` on the new files — should pass cleanly

How this could be extended
The `ExaSearchDriver` pattern can be reused for other search APIs (e.g., Tavily, Serper, Brave Search) by swapping the API endpoint, payload format, and result parsing in a new driver class while keeping the same `IGlobal`/`IInstance` scaffold. The `_request_with_retry` helper is generic and could be extracted into a shared utility for any tool node that calls external HTTP APIs.

Closes #429
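A generic retry helper of the kind described above could be sketched as follows. This assumes a callable transport and an injectable sleep for testability; the PR's actual `_request_with_retry` signature and status handling may differ:

```python
import time

RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def request_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status, backing off exponentially.

    send() should return (status_code, body); timeouts can be modeled by
    raising TimeoutError, which is also retried.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            status, body = send()
        except TimeoutError as exc:
            last_error = exc
        else:
            if status not in RETRYABLE_STATUS:
                return status, body
            last_error = RuntimeError(f'retryable status {status}')
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError(f'request failed after {max_attempts} attempts: {last_error}')
```

Injecting `sleep` keeps unit tests fast and makes the helper reusable by any tool node that calls an external HTTP API.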
#Hack-with-bay-2
Summary by CodeRabbit