feat(nodes): add Exa semantic web search node for real-time data enrichment#509

Open
charliegillet wants to merge 8 commits into rocketride-org:develop from charliegillet:feature/exa-search-node
@charliegillet
Contributor

@charliegillet charliegillet commented Mar 30, 2026

Summary

  • Add a new tool_exa_search pipeline node that integrates with the Exa API to provide semantic web search as an agent tool
  • Agents can invoke it with natural language queries to retrieve real-time web results including titles, URLs, full text content, relevance scores, and published dates
  • Supports configurable search parameters: search type (auto/neural/keyword), autoprompt optimization, domain filtering, date range filtering, and result count
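A minimal sketch of how those parameters might be merged with the node defaults and turned into an Exa /search request body. The camelCase request keys (numResults, includeDomains, excludeDomains, startPublishedDate, endPublishedDate, contents.text) follow the names used later in this thread; useAutoprompt, the snake_case tool-argument names, and the helper name build_search_payload are assumptions for illustration, not the driver's actual code.

```python
from typing import Any

# Hypothetical mapping from snake_case tool arguments to camelCase /search fields.
FIELD_MAP = {
    "num_results": "numResults",
    "use_autoprompt": "useAutoprompt",
    "include_domains": "includeDomains",
    "exclude_domains": "excludeDomains",
    "start_published_date": "startPublishedDate",
    "end_published_date": "endPublishedDate",
}


def build_search_payload(args: dict[str, Any], defaults: dict[str, Any]) -> dict[str, Any]:
    """Overlay per-call args on node defaults and build a /search request body.

    Assumes args has already been validated (query present and non-empty).
    """
    # Per-call values win over node defaults; an explicit None means "use the default".
    merged = {**defaults, **{k: v for k, v in args.items() if v is not None}}
    payload: dict[str, Any] = {"query": merged["query"].strip()}
    if merged.get("type"):
        payload["type"] = merged["type"]  # 'auto' | 'neural' | 'keyword'
    for arg_name, api_name in FIELD_MAP.items():
        value = merged.get(arg_name)
        if value is not None and value != []:  # omit unset or empty filters
            payload[api_name] = value
    if merged.get("include_text"):
        # Ask for page text inline on the same call instead of a second request.
        payload["contents"] = {"text": True}
    return payload


node_defaults = {"num_results": 10, "type": "auto", "use_autoprompt": True, "include_text": True}
print(build_search_payload(
    {"query": "latest advances in LLM reasoning", "include_domains": ["arxiv.org"],
     "start_published_date": "2025-01-01"},
    node_defaults,
))
```

Keeping the mapping in one table means adding a new Exa filter is a one-line change, and empty filter lists are dropped rather than sent upstream.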

Type

Feature

Why this feature fits this codebase

RocketRide's tool node system follows a consistent pattern where each tool lives under nodes/src/nodes/tool_* with IGlobal.py for shared state, IInstance.py for per-invocation logic, a driver class extending ai.common.tools.ToolsBase, and a services.json for UI/config registration. The existing tool_firecrawl and tool_http_request nodes demonstrate this exact pattern. The new tool_exa_search node plugs into this architecture: IGlobal.beginGlobal() reads config via Config.getNodeConfig() and creates an ExaSearchDriver, IInstance.invoke() delegates to driver.handle_invoke(), and the driver's _tool_query() / _tool_validate() / _tool_invoke() hooks implement the ToolsBase interface so the engine can discover and call the tool. The services.json registers the node with classType: ["tool"], capabilities: ["invoke"], and register: "filter" — matching the conventions of every other tool node. This gives agents real-time web search without any new framework plumbing.
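For reference, the input schema that a _tool_query()-style hook advertises could look like the sketch below. The field names and bounds come from this PR (query, type, num_results from 1 to 50, the two boolean flags, and the domain and published-date filters); the descriptor keys name/description/input_schema and the function name tool_descriptor are hypothetical, since ToolsBase's real descriptor format is not shown here.

```python
from typing import Any

# Hypothetical JSON-Schema-style input schema for the exa_search tool.
INPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "Natural-language search query"},
        "type": {"type": "string", "enum": ["auto", "neural", "keyword"]},
        "num_results": {"type": "integer", "minimum": 1, "maximum": 50},
        "use_autoprompt": {"type": "boolean"},
        "include_text": {"type": "boolean"},
        "include_domains": {"type": "array", "items": {"type": "string"}},
        "exclude_domains": {"type": "array", "items": {"type": "string"}},
        "start_published_date": {"type": "string", "description": "e.g. 2025-01-01"},
        "end_published_date": {"type": "string", "description": "e.g. 2025-12-31"},
    },
    "required": ["query"],
}


def tool_descriptor() -> dict[str, Any]:
    """Descriptor a _tool_query()-style hook could return (key names are assumptions)."""
    return {
        "name": "exa_search",
        "description": "Semantic web search via the Exa API; returns titles, URLs, scores and text.",
        "input_schema": INPUT_SCHEMA,
    }
```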

What changed

  • nodes/src/nodes/tool_exa_search/services.json — Node definition with config fields (API key, numResults, useAutoprompt, searchType, includeText), UI shape, preconfig profile, and exa.svg icon reference
  • nodes/src/nodes/tool_exa_search/__init__.py — Module entry point exporting IGlobal and IInstance
  • nodes/src/nodes/tool_exa_search/IGlobal.py — Global state: reads config, validates API key, creates ExaSearchDriver instance
  • nodes/src/nodes/tool_exa_search/IInstance.py — Instance: delegates invoke() to the driver's handle_invoke()
  • nodes/src/nodes/tool_exa_search/exa_driver.py — ToolsBase implementation with _tool_query() (tool descriptor + input schema), _tool_validate() (input validation), _tool_invoke() (Exa API POST with retry logic for 429/5xx/timeouts, result parsing into structured dicts), and a _normalize_tool_input() helper for Pydantic/JSON/wrapper unwrapping
  • nodes/src/nodes/tool_exa_search/requirements.txt — Declares requests dependency
  • packages/shared-ui/src/assets/nodes/exa.svg — SVG icon for the node in the pipeline builder UI
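A sketch of the kind of unwrapping _normalize_tool_input() is described as doing. It relies on duck typing (anything with a callable model_dump() is treated as a Pydantic v2 model) so it runs without Pydantic installed; the exact rules, such as only peeling single-key {'input': ...} wrappers, are assumptions rather than the PR's implementation.

```python
import json
from typing import Any


def normalize_tool_input(raw: Any) -> Any:
    """Best-effort unwrapping of agent tool input into a plain dict.

    Handles Pydantic-style models (anything with a callable model_dump()),
    JSON strings/bytes, and single-key {'input': ...} wrappers, recursively.
    Anything else is returned unchanged so validation can reject it.
    """
    model_dump = getattr(raw, "model_dump", None)
    if callable(model_dump):
        raw = model_dump()
    if isinstance(raw, (str, bytes)):
        try:
            raw = json.loads(raw)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return raw
    if isinstance(raw, dict) and set(raw) == {"input"}:
        # Wrapper payload: unwrap and normalize whatever is inside.
        return normalize_tool_input(raw["input"])
    return raw
```

Because the function recurses on the wrapper's contents, a wrapper holding a JSON string ({'input': '{"query": ...}'}) is handled by the same code path as a bare JSON string.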

Validation

  • Load the pipeline builder and confirm the "Exa Search" node appears in the tool category
  • Configure the node with a valid Exa API key and verify validateConfig passes (no warning toast)
  • Wire an agent node to the Exa tool and invoke exa_search with a query like "latest advances in LLM reasoning" — confirm structured results with title, url, score, and text fields are returned
  • Test domain filters: invoke with include_domains: ["arxiv.org"] and verify only arxiv results appear
  • Test date filters: set start_published_date: "2025-01-01" and verify no older results
  • Test error handling: remove API key and confirm a clear error; trigger rate limiting and verify retry with exponential backoff (2s, 4s, 8s)
  • Run ruff check and ruff format --check on the new files — should pass cleanly
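The backoff checked in the last validation steps can be isolated into two pure functions so it is testable without tripping Exa's real rate limiter. This sketch assumes three retries with a 2-second base, which yields the quoted 2s, 4s, 8s sequence; treating every 5xx status as retryable is an assumption about the driver, and timeouts (which have no status code) would be handled separately.

```python
RETRYABLE_STATUS = frozenset({429, *range(500, 600)})


def is_retryable_status(status: int) -> bool:
    """429 (rate limited) and any 5xx are treated as transient."""
    return status in RETRYABLE_STATUS


def backoff_delays(max_retries: int = 3, base_seconds: float = 2.0) -> list[float]:
    """Seconds to wait before retry 1..max_retries: base, 2*base, 4*base, ..."""
    return [base_seconds * 2**attempt for attempt in range(max_retries)]


print(backoff_delays())  # [2.0, 4.0, 8.0]
```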

How this could be extended

The ExaSearchDriver pattern can be reused for other search APIs (e.g., Tavily, Serper, Brave Search) by swapping the API endpoint, payload format, and result parsing in a new driver class while keeping the same IGlobal/IInstance scaffold. The _request_with_retry helper is generic and could be extracted into a shared utility for any tool node that calls external HTTP APIs.
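If _request_with_retry were extracted as suggested, one option is to decouple it from requests entirely: the caller passes a zero-argument send callable and raises a marker exception for transient failures. The names request_with_retry and TransientError are hypothetical, and the loop below is a sketch of exponential backoff, not the PR's helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (429, 5xx, timeouts)."""


def request_with_retry(
    send: Callable[[], T],
    *,
    max_retries: int = 3,
    base_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on TransientError wait base_seconds * 2**n and try again.

    Makes the first attempt plus up to max_retries retries, then re-raises
    the last TransientError if every attempt failed.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except TransientError:
            if attempt == max_retries:
                raise
            sleep(base_seconds * 2**attempt)
    raise AssertionError("unreachable")
```

In a requests-based driver, send would wrap the POST to /search and raise TransientError on requests.Timeout or a 429/5xx status; the injectable sleep keeps unit tests fast.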

Closes #429

#Hack-with-bay-2

Summary by CodeRabbit

  • New Features
    • Exa Search tool: semantic web search with configurable API key, result count (1–50), search type, autoprompt, and optional full-text inclusion
    • Returns structured results: title, URL, score, publication date, author, and optional text
    • Built-in retries for transient errors and a test configuration/profile for tool setup and validation

@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a new Exa Search tool node (tool_exa_search) with global configuration (API key, defaults), an IInstance tool implementation that performs POST /search with retry and response normalization, package exports, a runtime dependency, and service metadata including UI configuration and test profile.

Changes

  • Core Implementation (nodes/src/nodes/tool_exa_search/IGlobal.py, nodes/src/nodes/tool_exa_search/IInstance.py): Adds IGlobal to hold shared state (apikey, num_results, use_autoprompt, search_type, include_text) with lifecycle methods; implements the IInstance.exa_search tool with input normalization, validation, parameter resolution from globals, request retry logic (429/5xx/timeouts with backoff), response parsing and normalized result output.
  • Package Exports (nodes/src/nodes/tool_exa_search/__init__.py): Introduces package-level exports for IGlobal and IInstance via __all__.
  • Dependencies & Service Metadata (nodes/src/nodes/tool_exa_search/requirements.txt, nodes/src/nodes/tool_exa_search/services.json): Adds the requests dependency and a service definition registering tool_exa_search:// with configuration (preconfig/test), UI field descriptions/validators, and a Pipe UI shape for the tool.

Sequence Diagram

sequenceDiagram
    participant Caller as Tool Caller
    participant Instance as IInstance
    participant Retry as _request_with_retry
    participant ExaAPI as Exa API

    Caller->>Instance: exa_search(args with query)
    Instance->>Instance: normalize input, validate query
    Instance->>Instance: apply IGlobal defaults (numResults, searchType, includeText, etc.)
    Instance->>Retry: prepare POST payload to /search
    Retry->>ExaAPI: send request
    ExaAPI-->>Retry: 429 / 5xx / timeout
    Retry->>Retry: exponential backoff retries
    Retry->>ExaAPI: retry request
    ExaAPI-->>Retry: 200 OK with results
    Retry-->>Instance: return response
    Instance->>Instance: parse & normalize results
    Instance-->>Caller: return structured {success, query, results, num_results}
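The final "parse & normalize results" step could look like the sketch below. It assumes each Exa result item carries title, url, score, publishedDate, author and (when requested) text, matching the fields listed in the summary above; the output key published_date and the function name parse_search_response are hypothetical.

```python
from typing import Any


def parse_search_response(query: str, body: dict[str, Any], include_text: bool) -> dict[str, Any]:
    """Flatten an Exa /search JSON body into {success, query, results, num_results}."""
    results = []
    for item in body.get("results") or []:
        entry = {
            "title": item.get("title") or "",
            "url": item.get("url") or "",
            "score": item.get("score"),
            "published_date": item.get("publishedDate"),
            "author": item.get("author"),
        }
        if include_text:
            entry["text"] = item.get("text") or ""  # missing text becomes ""
        results.append(entry)
    return {"success": True, "query": query, "results": results, "num_results": len(results)}
```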

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇
I sniff the web with eager hops,
Retries and configs in my little hops,
Titles, URLs, and dates I bring,
A rabbit's search that makes data sing,
Hop on—Exa finds the things!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title accurately summarizes the main change: adding an Exa semantic web search node for real-time data enrichment, matching the PR's primary objective.
  • Linked Issues check (✅ Passed): The PR implements all coding requirements from #429: semantic web search via the @tool_function pattern, structured results (title, URL, text, score, date), configurable parameters (searchType, autoprompt, includeText, numResults, filters), and node system integration.
  • Out of Scope Changes check (✅ Passed): All changes are directly related to implementing the Exa search node: node implementation files (IGlobal.py, IInstance.py), package configuration (__init__.py), dependencies (requirements.txt), service definition (services.json), and the referenced icon. No unrelated modifications detected.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/tool_exa_search/exa_driver.py`:
- Around line 325-327: The warning currently logs the raw malformed payload via
input_obj!r; change the log to avoid leaking user data by removing the raw
object and only emitting bounded metadata (e.g., the input type and size) —
update the handler around the input type check that calls warning(...)
(referencing input_obj and the warning call in the exa_search logic) to log
something like the type name and a safe length/count instead of the full repr,
then return the empty dict as before.
- Around line 147-170: The _tool_validate method currently expects
already-normalized input but ToolsBase.handle_invoke routes validate calls
directly to _tool_validate; modify _tool_validate to call _normalize_tool_input
on its input_obj (same normalization used in _tool_invoke) before performing
validation so JSON strings, Pydantic models, and wrapped {'input': ...} payloads
are accepted consistently; update references in _tool_validate to use the
normalized input and keep _tool_invoke's normalization unchanged (preserve
_normalize_tool_input, _tool_validate, and _tool_invoke symbols).

In `@nodes/src/nodes/tool_exa_search/IGlobal.py`:
- Around line 50-55: The code currently only reads cfg.get('apikey') causing the
documented EXA_API_KEY env fallback to be ignored; update both the apikey
retrieval sites (in beginGlobal() and validateConfig()) to fall back to
os.environ.get('EXA_API_KEY') when cfg.get('apikey') is empty: compute apikey =
str((cfg.get('apikey') or os.environ.get('EXA_API_KEY') or '')).strip() (or
equivalent), preserve the existing exception raise('tool_exa_search: apikey is
required') if still empty, and ensure you import os if necessary; references:
Config.getNodeConfig, beginGlobal(), validateConfig(), and the local variable
apikey.

In `@nodes/src/nodes/tool_exa_search/services.json`:
- Around line 68-72: Update the "tool_exa_search.includeText" description to
reflect that the driver requests inline contents.text on the search call rather
than using a separate get_contents flow; locate the
"tool_exa_search.includeText" entry in services.json and replace the misleading
sentence about "uses get_contents" with wording that says results include inline
contents.text (or similar) so the UI copy matches the implementation.
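A minimal sketch of the EXA_API_KEY fallback requested for IGlobal.py. The helper name resolve_api_key is hypothetical, and raising ValueError is an assumption about the exception type the node should use. Whitespace-only config values fall through to the environment variable rather than being accepted as a key.

```python
import os
from collections.abc import Mapping
from typing import Any


def resolve_api_key(cfg: Mapping[str, Any], env: Mapping[str, str] = os.environ) -> str:
    """Return the configured apikey, falling back to the EXA_API_KEY env var."""
    apikey = str(cfg.get("apikey") or "").strip() or str(env.get("EXA_API_KEY") or "").strip()
    if not apikey:
        raise ValueError("tool_exa_search: apikey is required")
    return apikey
```

Calling the same helper from both beginGlobal() and validateConfig() would keep the two code paths from drifting apart again.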
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8021ca67-25da-40c7-93a9-bff83dd31661

📥 Commits

Reviewing files that changed from the base of the PR and between b6b2dc1 and b68456a.

📒 Files selected for processing (6)
  • nodes/src/nodes/tool_exa_search/IGlobal.py
  • nodes/src/nodes/tool_exa_search/IInstance.py
  • nodes/src/nodes/tool_exa_search/__init__.py
  • nodes/src/nodes/tool_exa_search/exa_driver.py
  • nodes/src/nodes/tool_exa_search/requirements.txt
  • nodes/src/nodes/tool_exa_search/services.json

Comment on lines +68 to +72
"tool_exa_search.includeText": {
"type": "boolean",
"title": "Include Text Content",
"description": "Include full text content in results (uses get_contents)",
"default": true,
Contributor

@coderabbitai coderabbitai bot Mar 30, 2026

⚠️ Potential issue | 🟡 Minor

Update the includeText description to match the implementation.

The driver requests inline contents.text on the search call and never uses a separate get_contents flow, so this UI copy is misleading.

🛠️ Suggested fix
 		"tool_exa_search.includeText": {
 			"type": "boolean",
 			"title": "Include Text Content",
-			"description": "Include full text content in results (uses get_contents)",
+			"description": "Include full text content in results",
 			"default": true,
 			"enum": [

Contributor Author

Fixed. Updated the description to "Include full text content in results" — removed the stale "uses get_contents" reference since the driver uses inline contents.text on the search call.

@github-actions github-actions bot added the module:ui Chat UI and Dropper UI label Mar 30, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
nodes/src/nodes/tool_exa_search/exa_driver.py (2)

146-169: ⚠️ Potential issue | 🟠 Major

Normalize tool.validate inputs too.

ToolsBase.handle_invoke() sends raw param['input'] straight here. Right now tool.validate rejects JSON strings, Pydantic models, and wrapped {'input': {...}} payloads that tool.invoke accepts on Line 168, so the two entrypoints disagree on the same tool contract.

🛠️ Proposed fix
     def _tool_validate(self, *, tool_name: str, input_obj: Any) -> None:  # noqa: ANN401
+        input_obj = _normalize_tool_input(input_obj)
+
         bare = self._bare_name(tool_name)
         if bare != 'exa_search':
             raise ValueError(f'Unknown tool {tool_name!r} (expected exa_search)')

330-332: ⚠️ Potential issue | 🟠 Major

Do not log raw malformed tool input.

This warning still includes input_obj!r. Search payloads can contain user queries or other sensitive context, so a bad request becomes a log-retention leak.

🛠️ Proposed fix
     if not isinstance(input_obj, dict):
-        warning(f'exa_search: unexpected input type {type(input_obj).__name__}: {input_obj!r}')
+        size = len(input_obj) if hasattr(input_obj, '__len__') else 'n/a'
+        warning(f'exa_search: unexpected input type {type(input_obj).__name__} (size={size})')
         return {}
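A variant of the proposed fix, shown with the standard logging module instead of the node's own warning() helper (an assumption made so the example is self-contained). It guards len() with try/except rather than hasattr(), uses lazy %-formatting, and never formats the payload itself. describe_input and coerce_dict_or_empty are hypothetical names.

```python
import logging
from typing import Any

logger = logging.getLogger("tool_exa_search")


def describe_input(input_obj: Any) -> str:
    """Bounded, content-free summary of a payload: its type and, if available, its length."""
    try:
        size = str(len(input_obj))
    except TypeError:
        size = "n/a"
    return f"type={type(input_obj).__name__} size={size}"


def coerce_dict_or_empty(input_obj: Any) -> dict:
    """Return input_obj if it is a dict; otherwise log safe metadata and return {}."""
    if not isinstance(input_obj, dict):
        logger.warning("exa_search: unexpected input %s", describe_input(input_obj))
        return {}
    return input_obj
```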
nodes/src/nodes/tool_exa_search/services.json (1)

70-74: ⚠️ Potential issue | 🟡 Minor

Update the includeText help text.

nodes/src/nodes/tool_exa_search/exa_driver.py Lines 209-213 request inline contents.text on the search call; there is no separate get_contents flow here. The current description is misleading.

🛠️ Proposed fix
 		"tool_exa_search.includeText": {
 			"type": "boolean",
 			"title": "Include Text Content",
-			"description": "Include full text content in results (uses get_contents)",
+			"description": "Include full text content inline in search results",
 			"default": true,
 			"enum": [
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 46fd30ad-e718-4a6b-8e85-b8e68a7f790c

📥 Commits

Reviewing files that changed from the base of the PR and between b68456a and 5fa0b00.

⛔ Files ignored due to path filters (1)
  • packages/shared-ui/src/assets/nodes/exa.svg is excluded by !**/*.svg
📒 Files selected for processing (2)
  • nodes/src/nodes/tool_exa_search/exa_driver.py
  • nodes/src/nodes/tool_exa_search/services.json

Comment on lines +154 to +165
        query = input_obj.get('query')
        if not query or not isinstance(query, str) or not query.strip():
            raise ValueError('query is required and must be a non-empty string')

        search_type = input_obj.get('type')
        if search_type is not None and search_type not in VALID_SEARCH_TYPES:
            raise ValueError(f'type must be one of {sorted(VALID_SEARCH_TYPES)}; got {search_type!r}')

        num_results = input_obj.get('num_results')
        if num_results is not None:
            if not isinstance(num_results, int) or num_results < 1 or num_results > 50:
                raise ValueError('num_results must be an integer between 1 and 50')
Contributor

@coderabbitai coderabbitai bot Mar 30, 2026

⚠️ Potential issue | 🟠 Major

Validate the rest of the advertised tool schema.

INPUT_SCHEMA exposes use_autoprompt, include_text, the domain filters, and the published-date bounds, but _tool_validate() never checks their shapes. Bad values then either get silently dropped in _invoke_search() or forwarded upstream, so tool.validate can return success for requests that will not execute as requested.

🛠️ Proposed fix
         num_results = input_obj.get('num_results')
         if num_results is not None:
             if not isinstance(num_results, int) or num_results < 1 or num_results > 50:
                 raise ValueError('num_results must be an integer between 1 and 50')
+
+        for field in ('use_autoprompt', 'include_text'):
+            value = input_obj.get(field)
+            if value is not None and not isinstance(value, bool):
+                raise ValueError(f'{field} must be a boolean')
+
+        for field in ('include_domains', 'exclude_domains'):
+            value = input_obj.get(field)
+            if value is not None and (
+                not isinstance(value, list)
+                or any(not isinstance(item, str) or not item.strip() for item in value)
+            ):
+                raise ValueError(f'{field} must be a list of non-empty strings')
+
+        for field in ('start_published_date', 'end_published_date'):
+            value = input_obj.get(field)
+            if value is not None and (not isinstance(value, str) or not value.strip()):
+                raise ValueError(f'{field} must be a non-empty string')
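Put together, a standalone validator covering every advertised field might look like this sketch. It flattens the nested num_results check into one condition (the SIM102 hint below) and additionally rejects booleans, because isinstance(True, int) is true in Python. The function name validate_exa_input is hypothetical; VALID_SEARCH_TYPES mirrors the constant referenced in the excerpt.

```python
from typing import Any

VALID_SEARCH_TYPES = frozenset({"auto", "neural", "keyword"})


def validate_exa_input(input_obj: dict[str, Any]) -> None:
    """Raise ValueError if any advertised exa_search field has the wrong shape."""
    query = input_obj.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")

    search_type = input_obj.get("type")
    if search_type is not None and (
        not isinstance(search_type, str) or search_type not in VALID_SEARCH_TYPES
    ):
        raise ValueError(f"type must be one of {sorted(VALID_SEARCH_TYPES)}")

    num_results = input_obj.get("num_results")
    # One flat condition (no nested if); bool is excluded because it subclasses int.
    if num_results is not None and (
        isinstance(num_results, bool)
        or not isinstance(num_results, int)
        or not 1 <= num_results <= 50
    ):
        raise ValueError("num_results must be an integer between 1 and 50")

    for field in ("use_autoprompt", "include_text"):
        value = input_obj.get(field)
        if value is not None and not isinstance(value, bool):
            raise ValueError(f"{field} must be a boolean")

    for field in ("include_domains", "exclude_domains"):
        value = input_obj.get(field)
        if value is not None and (
            not isinstance(value, list)
            or any(not isinstance(item, str) or not item.strip() for item in value)
        ):
            raise ValueError(f"{field} must be a list of non-empty strings")

    for field in ("start_published_date", "end_published_date"):
        value = input_obj.get(field)
        if value is not None and (not isinstance(value, str) or not value.strip()):
            raise ValueError(f"{field} must be a non-empty string")
```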
🧰 Tools
🪛 Ruff (0.15.7)

[warning] 156-156: Avoid specifying long messages outside the exception class (TRY003)
[warning] 160-160: Avoid specifying long messages outside the exception class (TRY003)
[warning] 163-164: Use a single if statement instead of nested if statements (SIM102)
[warning] 165-165: Avoid specifying long messages outside the exception class (TRY003)

Contributor Author

Fixed. _tool_validate() now validates all advertised schema fields: use_autoprompt (boolean), include_text (boolean), include_domains (array of strings), exclude_domains (array of strings), start_published_date (string), and end_published_date (string).

Comment on lines +41 to +47
"tool_exa_search.numResults": {
"type": "number",
"title": "Number of Results",
"description": "Maximum number of search results to return (1-50)",
"default": 10,
"minimum": 1,
"maximum": 50
Contributor

@coderabbitai coderabbitai bot Mar 30, 2026

⚠️ Potential issue | 🟡 Minor

Make numResults integral in the schema.

In nodes/src/nodes/tool_exa_search/IGlobal.py Lines 54-55, this setting is immediately cast with int(...), so values like 3.9 would validate here and then be truncated to 3 at runtime. The UI schema should reject fractional values up front so config validation matches execution.

🛠️ Proposed fix
 		"tool_exa_search.numResults": {
-			"type": "number",
+			"type": "integer",
 			"title": "Number of Results",
 			"description": "Maximum number of search results to return (1-50)",
 			"default": 10,

Contributor Author

Fixed. Changed numResults type from "number" to "integer" in services.json to match the int() cast at runtime.
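With the schema now declaring an integer, the UI should reject 3.9 before it ever reaches int(). If a defensive runtime check is still wanted (for example for pipeline files edited by hand), a sketch that rejects fractions instead of truncating them could look like this; the numResults config key, the default of 10 and the acceptance of whole-number floats such as 10.0 are assumptions, and read_num_results is a hypothetical name.

```python
from typing import Any


def read_num_results(cfg: dict[str, Any], default: int = 10) -> int:
    """Return numResults as an int in [1, 50] without silently truncating fractions."""
    value = cfg.get("numResults", default)
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # e.g. 10.0 arriving from a generic JSON number
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"numResults must be an integer, got {value!r}")
    if not 1 <= value <= 50:
        raise ValueError("numResults must be between 1 and 50")
    return value
```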

Collaborator

@asclearuc asclearuc left a comment

Thank you for this contribution. Please note that PR #386 (search_exa) covers Exa integration and has already been approved — it is waiting to be merged.

Once #386 is merged, please rebase this PR on top of it and add what is currently missing from #386:

  • Retry logic for 429, 5xx, and timeouts (as implemented in _request_with_retry here)
  • Domain filtering (includeDomains, excludeDomains)
  • Date filtering (startPublishedDate, endPublishedDate)

@charliegillet
Contributor Author

Acknowledged — will rebase on top of #386 once it's merged and add the missing retry logic, domain filtering, and date filtering on top. Thanks for the review!

charliegillet and others added 4 commits April 6, 2026 11:21
…chment

Adds a new tool_exa_search node that integrates with the Exa API (exa.ai)
to provide semantic web search capabilities for pipelines. Agents can invoke
this tool to search the web in real time and retrieve structured results
including titles, URLs, text content, relevance scores, and published dates.

Closes rocketride-org#429

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The constant was defined but never referenced anywhere in the codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add "minimum": 1 and "maximum": 50 constraints to match the Python
clamping logic in exa_driver.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Catch requests.RequestException and re-raise as RuntimeError with a
sanitized message that omits headers. Also wrap the Timeout re-raise
to avoid exposing request details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charliegillet charliegillet force-pushed the feature/exa-search-node branch from 5fa0b00 to 9809906 Compare April 6, 2026 18:21
@charliegillet
Contributor Author

@asclearuc PR #386 has now been merged and I've rebased this PR on top of it. This PR adds a tool_exa_search node (distinct from search_exa in #386) which includes the features you mentioned: retry logic for 429/5xx/timeouts, domain filtering (includeDomains/excludeDomains), and date filtering (startPublishedDate/endPublishedDate). Ready for re-review.

@github-actions github-actions bot removed the module:ui Chat UI and Dropper UI label Apr 6, 2026
@nihalnihalani
Contributor

Senior Review: feat(nodes) — Exa semantic web search node

What works well

  • Follows the established tool node pattern (IGlobal/IInstance/ToolsBase) consistently — this will be easy for other contributors to understand and maintain.
  • Comprehensive PR description explaining how the node fits into the existing architecture.
  • Good retry logic with exponential backoff for rate limiting and transient errors.
  • Clean _normalize_tool_input() helper for handling Pydantic/JSON/wrapper input formats.
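The retry behavior praised above can be sketched roughly as follows. This is a minimal, dependency-free sketch, not the PR's `_request_with_retry`. The `send` callable, the retryable status set, the delay constants, and the use of the built-in `TimeoutError` are all assumptions; real code using `requests` would catch `requests.Timeout` instead. On the final timeout it raises a generic `RuntimeError ... from None`, in the spirit of the sanitized re-raise described in the earlier commit message, so the original exception is not printed in the traceback.

```python
import random
import time
from typing import Callable, Protocol


class Response(Protocol):
    status_code: int


# Rate limiting plus common transient 5xx codes (assumed set).
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


def request_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-retryable response or attempts run out."""
    if max_attempts < 1:
        raise ValueError('max_attempts must be at least 1')
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            resp = send()
        except TimeoutError:
            if last_attempt:
                # Generic message; 'from None' suppresses the chained original
                # exception so request details do not surface in tracebacks.
                raise RuntimeError('Exa request timed out') from None
        else:
            # Success, non-retryable error, or out of attempts: hand back as-is.
            if resp.status_code not in RETRYABLE_STATUS or last_attempt:
                return resp
        # Exponential backoff (0.5s, 1s, 2s, ...) capped at max_delay, plus up to 10% jitter.
        delay = min(max_delay, base_delay * 2**attempt)
        sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError('unreachable when max_attempts >= 1')
```

Injecting `sleep` keeps the backoff testable without real waits.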

Blockers (must fix before merge)

  1. CI is failing on all 3 platforms (Ubuntu, Windows, macOS). The Build jobs and CI OK gate are all FAILURE. This must be green before merge. Check the build logs to identify the root cause — it may be related to changes in the broader CI pipeline rather than this PR specifically, but it needs to pass regardless.

  2. SECURITY: warning log leaks raw user payload. If exa_driver.py uses something like logger.warning(f"... {input_obj!r}") to log validation failures, this could expose sensitive user data (queries, API keys passed in payloads) in log output. Sanitize or truncate logged input to avoid leaking PII or credentials.
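One way to address the logging blocker is to log a shape summary of the payload instead of its repr. A minimal sketch, assuming the standard `logging` module; the logger name and the helper names `describe_input` / `log_invalid_input` are hypothetical:

```python
import logging

logger = logging.getLogger('tool_exa_search')  # logger name is an assumption


def describe_input(input_obj: object) -> str:
    """Summarize a tool payload for logs without echoing any of its values."""
    if isinstance(input_obj, dict):
        # Only (up to 20) key names are logged; values such as queries or tokens never are.
        keys = ', '.join(sorted(str(k) for k in input_obj)[:20])
        return f'dict with keys [{keys}]'
    return f'{type(input_obj).__name__} payload'


def log_invalid_input(input_obj: object, reason: str) -> None:
    """Emit a validation warning that is safe to ship to shared logs."""
    logger.warning('Rejected Exa tool input (%s): %s', reason, describe_input(input_obj))
```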

Should fix

  1. Validation ordering. Normalize input (via _normalize_tool_input()) before validating it, not after. If validation runs on raw input, it may reject valid payloads that just need unwrapping.

  2. No tests included. Given the 601-line addition, unit tests are expected for:

    • Input normalization edge cases
    • Retry behavior (mock 429/5xx responses)
    • Domain and date filter construction
    • Error handling when API key is missing

Nice-to-have

  1. The _request_with_retry helper is generic enough to be reused by other tool nodes (e.g., tool_firecrawl, tool_http_request). Consider extracting it to a shared utility in a follow-up PR.

  2. The SVG icon is a nice touch for the pipeline builder UI.

Solid node implementation — fix the CI failures and the security logging issue, and this is close to mergeable.

@nihalnihalani
Contributor

🚀 Merge Request

Good tool node pattern following existing conventions.

Before merge (blockers):

  • Fix CI — all 3 platform builds are failing
  • Fix security: warning log leaks raw user payload via input_obj!r
  • Fix validation ordering — normalize input before validating
  • Add tests

CI must be green first.

- Normalize input in _tool_validate() for JSON strings, Pydantic models,
  and wrapped payloads (not just in invoke)
- Remove raw input from warning log to prevent leaking sensitive data
- Add EXA_API_KEY env var fallback in IGlobal beginGlobal/validateConfig
- Validate all advertised schema fields (use_autoprompt, include_text,
  domain filters, date bounds) in _tool_validate
- Change numResults type from "number" to "integer" in services.json
- Fix includeText description (remove stale "uses get_contents" reference)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@charliegillet
Contributor Author

Thanks for the heads-up about PR #386! I've addressed all the code quality feedback in the meantime so this PR is ready to rebase cleanly once #386 merges. The retry logic, domain filtering, and date filtering that this PR adds on top of #386's base implementation will be preserved during the rebase.

@github-actions

github-actions bot commented Apr 8, 2026

No description provided.

@charliegillet
Contributor Author

Latest fixes

  • Added _normalize_tool_input() to _tool_validate() so both validate and invoke paths handle JSON strings, Pydantic models, and wrapped payloads consistently
  • Removed raw input from warning logs (security)
  • Added EXA_API_KEY env var fallback in both beginGlobal() and validateConfig()
  • Changed numResults type from "number" to "integer" in services.json
  • Fixed includeText description (removed incorrect "uses get_contents" reference)
  • Added validation for all advertised schema fields (use_autoprompt, include_text, domain filters, date bounds)
  • Ready to rebase on #386 (feat: #429 Add Exa search node and working sample pipeline [FRONTIER TOWER HACKATHON]) once it merges

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
nodes/src/nodes/tool_exa_search/exa_driver.py (1)

178-194: ⚠️ Potential issue | 🟡 Minor

Reject blank domain/date filter values.

tool.validate still accepts [''], [' '], or '' here. Those either get forwarded upstream or dropped in _invoke_search(), so validation still does not match execution for these filters.

💡 Suggested change
         include_domains = input_obj.get('include_domains')
         if include_domains is not None:
-            if not isinstance(include_domains, list) or not all(isinstance(d, str) for d in include_domains):
-                raise ValueError('include_domains must be an array of strings')
+            if not isinstance(include_domains, list) or not all(isinstance(d, str) and d.strip() for d in include_domains):
+                raise ValueError('include_domains must be an array of non-empty strings')
 
         exclude_domains = input_obj.get('exclude_domains')
         if exclude_domains is not None:
-            if not isinstance(exclude_domains, list) or not all(isinstance(d, str) for d in exclude_domains):
-                raise ValueError('exclude_domains must be an array of strings')
+            if not isinstance(exclude_domains, list) or not all(isinstance(d, str) and d.strip() for d in exclude_domains):
+                raise ValueError('exclude_domains must be an array of non-empty strings')
 
         start_published_date = input_obj.get('start_published_date')
-        if start_published_date is not None and not isinstance(start_published_date, str):
-            raise ValueError('start_published_date must be a string in ISO 8601 format')
+        if start_published_date is not None and (not isinstance(start_published_date, str) or not start_published_date.strip()):
+            raise ValueError('start_published_date must be a non-empty string in ISO 8601 format')
 
         end_published_date = input_obj.get('end_published_date')
-        if end_published_date is not None and not isinstance(end_published_date, str):
-            raise ValueError('end_published_date must be a string in ISO 8601 format')
+        if end_published_date is not None and (not isinstance(end_published_date, str) or not end_published_date.strip()):
+            raise ValueError('end_published_date must be a non-empty string in ISO 8601 format')
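The four checks in the suggested change share two shapes, so they could be factored into small helpers. A sketch under that assumption (helper names are hypothetical). Like the suggestion, it only rejects blank values and does not parse ISO 8601, and an empty list `[]` still passes because `all()` over an empty iterable is `True`:

```python
from typing import Any


def require_non_blank_str_list(name: str, value: Any) -> None:
    """Raise ValueError unless value is None or a list of non-blank strings."""
    if value is None:
        return
    if not isinstance(value, list) or not all(isinstance(v, str) and v.strip() for v in value):
        raise ValueError(f'{name} must be an array of non-empty strings')


def require_non_blank_str(name: str, value: Any) -> None:
    """Raise ValueError unless value is None or a non-blank string."""
    if value is not None and (not isinstance(value, str) or not value.strip()):
        raise ValueError(f'{name} must be a non-empty string in ISO 8601 format')


def validate_filters(input_obj: dict[str, Any]) -> None:
    """Apply the domain and date filter checks from the suggested change."""
    for key in ('include_domains', 'exclude_domains'):
        require_non_blank_str_list(key, input_obj.get(key))
    for key in ('start_published_date', 'end_published_date'):
        require_non_blank_str(key, input_obj.get(key))
```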
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/tool_exa_search/exa_driver.py` around lines 178 - 194, The
current validation in include_domains, exclude_domains, start_published_date,
and end_published_date (in exa_driver.py) allows blank/whitespace-only strings
or lists like [''] which are invalid at runtime; update the checks in the
validation block to reject empty strings and strings that are only whitespace
and to reject lists that contain any such blank entries (i.e., ensure
include_domains and exclude_domains are lists of non-empty trimmed strings, and
start_published_date/end_published_date are non-empty trimmed strings in
addition to type checks) so the behavior matches what _invoke_search() expects
and tool.validate no longer permits blank values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/tool_exa_search/exa_driver.py`:
- Around line 165-168: The validation for num_results currently allows booleans
because bool is a subclass of int; update the check around the num_results
variable in exa_driver.py (the block that raises ValueError) to explicitly
reject bools — e.g., ensure the type is an int but not a bool (use either
type(num_results) is int or add "and not isinstance(num_results, bool)") before
enforcing the 1..50 range so True/False no longer pass validation.
- Around line 363-367: The current unwrapping only handles when
input_obj['input'] is a dict; update the shared normalization used by
ToolsBase/_tool_validate to recursively unwrap {'input': ...} layers and also
accept JSON strings and Pydantic-like models: while 'input' in input_obj,
extract inner = input_obj['input']; if inner is a str attempt json.loads(inner)
(fall back to the string if it fails); if inner has .dict() or is a model
instance convert to a dict via inner.dict() or vars(inner); merge extras as
existing code does and repeat until the resulting input_obj is a plain dict or
primitive. Ensure this logic is used by _tool_validate and any helper that
currently contains the shown unwrapping snippet.

In `@nodes/src/nodes/tool_exa_search/IInstance.py`:
- Around line 41-42: Add a PEP 257 single-line class docstring to the public
node entry class IInstance (which subclasses IInstanceBase and exposes IGlobal)
so it matches sibling node classes; open the class definition for IInstance and
insert a short single-quoted docstring (one sentence) immediately below the
class line, following project style (single quotes, Python 3.10+), and ensure
ruff/formatting remains clean.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 822aa41e-cb3f-44a3-bd91-0d2a6231305a

📥 Commits

Reviewing files that changed from the base of the PR and between 5fa0b00 and 0360394.

📒 Files selected for processing (6)
  • nodes/src/nodes/tool_exa_search/IGlobal.py
  • nodes/src/nodes/tool_exa_search/IInstance.py
  • nodes/src/nodes/tool_exa_search/__init__.py
  • nodes/src/nodes/tool_exa_search/exa_driver.py
  • nodes/src/nodes/tool_exa_search/requirements.txt
  • nodes/src/nodes/tool_exa_search/services.json

Comment on lines +165 to +168
    num_results = input_obj.get('num_results')
    if num_results is not None:
        if not isinstance(num_results, int) or num_results < 1 or num_results > 50:
            raise ValueError('num_results must be an integer between 1 and 50')
Contributor


⚠️ Potential issue | 🟡 Minor

Reject booleans for num_results.

In Python, bool is a subclass of int, so {"num_results": true} currently passes validation and gets treated as 1. That violates the declared schema and hides malformed tool calls.

💡 Suggested change
         num_results = input_obj.get('num_results')
         if num_results is not None:
-            if not isinstance(num_results, int) or num_results < 1 or num_results > 50:
+            if isinstance(num_results, bool) or not isinstance(num_results, int) or num_results < 1 or num_results > 50:
                 raise ValueError('num_results must be an integer between 1 and 50')
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 166-167: Use a single if statement instead of nested if statements

(SIM102)


[warning] 168-168: Avoid specifying long messages outside the exception class

(TRY003)
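A sketch that folds the bool fix into a single flat condition, which would also clear the SIM102 warning. The function name is hypothetical; satisfying TRY003 would additionally require a dedicated exception class, which is not shown:

```python
from typing import Any


def validate_num_results(input_obj: dict[str, Any]) -> None:
    """Reject num_results unless it is a real int in 1..50 (bools excluded)."""
    num_results = input_obj.get('num_results')
    # One flat condition (addresses SIM102). The bool check comes first because
    # isinstance(True, int) is True in Python.
    if num_results is not None and (
        isinstance(num_results, bool) or not isinstance(num_results, int) or not 1 <= num_results <= 50
    ):
        raise ValueError('num_results must be an integer between 1 and 50')
```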

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/tool_exa_search/exa_driver.py` around lines 165 - 168, The
validation for num_results currently allows booleans because bool is a subclass
of int; update the check around the num_results variable in exa_driver.py (the
block that raises ValueError) to explicitly reject bools — e.g., ensure the type
is an int but not a bool (use either type(num_results) is int or add "and not
isinstance(num_results, bool)") before enforcing the 1..50 range so True/False
no longer pass validation.

Comment on lines +363 to +367
    # Unwrap ``{"input": {...}}`` wrappers that some framework paths leave behind
    if 'input' in input_obj and isinstance(input_obj['input'], dict):
        inner = input_obj['input']
        extras = {k: v for k, v in input_obj.items() if k != 'input'}
        input_obj = {**inner, **extras}
Contributor


⚠️ Potential issue | 🟠 Major

Normalize wrapped input payloads recursively.

This only unwraps {'input': ...} when the inner value is already a dict. Payloads like {'input': '{"query":"..."}'}, {'input': model}, or nested wrappers still fail even though the helper is supposed to accept wrapped JSON/model inputs too.

💡 Suggested change
-    if 'input' in input_obj and isinstance(input_obj['input'], dict):
-        inner = input_obj['input']
+    if 'input' in input_obj:
+        inner = _normalize_tool_input(input_obj['input'])
         extras = {k: v for k, v in input_obj.items() if k != 'input'}
         input_obj = {**inner, **extras}

Based on learnings, in RocketRide tool driver implementations under nodes/**/*.py that use ToolsBase, _tool_validate must accept JSON strings, Pydantic-like models, and wrapped payloads like {'input': ...} via the shared normalization helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/tool_exa_search/exa_driver.py` around lines 363 - 367, The
current unwrapping only handles when input_obj['input'] is a dict; update the
shared normalization used by ToolsBase/_tool_validate to recursively unwrap
{'input': ...} layers and also accept JSON strings and Pydantic-like models:
while 'input' in input_obj, extract inner = input_obj['input']; if inner is a
str attempt json.loads(inner) (fall back to the string if it fails); if inner
has .dict() or is a model instance convert to a dict via inner.dict() or
vars(inner); merge extras as existing code does and repeat until the resulting
input_obj is a plain dict or primitive. Ensure this logic is used by
_tool_validate and any helper that currently contains the shown unwrapping
snippet.
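A minimal sketch of the recursive normalization described above, assuming Pydantic v2 models expose `model_dump()` and v1 models expose `.dict()`. The name `normalize_tool_input` is illustrative rather than the PR's helper, and the depth limit is an arbitrary safeguard. As a later review comment notes, this only helps forms that reach the method body before schema validation rejects them.

```python
import json
from typing import Any


def normalize_tool_input(value: Any, _depth: int = 0) -> Any:
    """Unwrap JSON strings, model-like objects and nested {'input': ...} wrappers."""
    if _depth > 10:  # arbitrary guard against pathological nesting
        return value
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return value  # not JSON: leave the string as-is
    elif hasattr(value, 'model_dump'):  # Pydantic v2-style model
        value = value.model_dump()
    elif callable(getattr(value, 'dict', None)):  # Pydantic v1-style model
        value = value.dict()
    if isinstance(value, dict) and 'input' in value:
        inner = normalize_tool_input(value['input'], _depth + 1)
        if isinstance(inner, dict):
            extras = {k: v for k, v in value.items() if k != 'input'}
            # Outer keys win, matching the existing {**inner, **extras} merge.
            value = {**inner, **extras}
    return value
```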

@charliegillet
Contributor Author

Responding to @nihalnihalani's CI concern:

CI is now green on all 3 platforms (Ubuntu, Windows, macOS). The earlier failures were addressed in commits 9809906 (sanitized retry exceptions to prevent API key leakage) and 0360394 (review feedback including validation ordering fix). Ruff lint and format also pass clean.

@asclearuc
Collaborator

Thanks for the contribution.

Needs to be rebased on #599's pattern before this can merge.

PR #599 (da19f6b) retired the DriverClass(ToolsBase) pattern for all tool nodes and replaced it with @tool_function decorators directly on IInstance. See tool_firecrawl/IInstance.py or tool_http_request/IInstance.py for the current reference.

@dsapandora
Collaborator

Changes requested — Hey @charliegillet — thanks so much for putting this together! Integrating Exa as an agent tool is genuinely a great idea, and I can see you've done your homework on the tool_* node conventions. The file structure is spot-on: IGlobal.py, IInstance.py, __init__.py, driver, requirements.txt, and services.json all in the right places. The Config.getNodeConfig() + OPEN_MODE.CONFIG guard in beginGlobal() is exactly right, and the EXA_API_KEY env var fallback is a nice touch that's consistent with how other credential-bearing nodes work. Solid foundation here.

That said, I do need to flag a few things before we can merge — some of them are pretty significant, so let me walk through them:


🔄 The driver pattern has been superseded

This is the big one. The exa_driver.py approach — extending ToolsBase with _tool_query / _tool_validate / _tool_invoke, and then having IInstance.invoke() delegate to driver.handle_invoke() — was the old way of doing things. PR #599 replaced this entire pattern with @tool_function decorators directly on IInstanceBase, and the codebase is actively migrating away from the driver boilerplate.

What I'd suggest: delete exa_driver.py entirely and move the search logic into a @tool_function-decorated method directly on IInstance. It'll actually be cleaner and shorter. Happy to point you at a recent example if that would help!


🧪 No test configuration in services.json

Every node added since the early days includes a test key in services.json with at least one profile, an outputs list, and a mock-compatible test case. Without it, this node won't participate in builder nodes:test or the fulltest framework. Even a minimal mock case would unblock this — worth adding before merge.


📋 Missing output_schema and summary on the tool descriptor

The ToolDescriptor TypedDict has output_schema and summary as expected fields for complete tool catalogs. The driver currently omits both. Small thing to add, but it keeps the tool catalog consistent and makes life easier for anyone building agent pipelines on top of this.


🖼️ The icon file isn't in the diff

services.json references exa.svg but I don't see it in the changed files. This will cause a broken icon in the VS Code extension node list and quick-add panel. There's already an exa.svg that landed with PR #386 — worth checking if you can reuse that one, or include a new one here.


⚠️ Bare Exception for missing API key

In IGlobal.py (or wherever the key validation lives), the raise Exception() for a missing API key should use the rocketlib error() logger and raise a typed/descriptive error — consistent with how tool_http_request and tool_python handle misconfiguration. Makes debugging much friendlier for end users.


🔁 Heads up on overlap with PR #386

Just so you're aware — PR #386 (search_exa) landed a pipeline-level Exa node targeting the same underlying API. Your PR is architecturally different (agent tool vs. pipeline source node), which is a valid and complementary use case! But it's worth calling that out explicitly in the description so reviewers and users understand when to reach for each one.


I know that's a fair bit of feedback, but honestly the bones of this are good — the structure is right, the intent is right, and the Exa integration itself is valuable. The main lift is the driver → @tool_function refactor, and once that's done the rest of the items are relatively quick wins.

Let me know if you want to pair on the @tool_function migration — happy to walk through it together. You're close! 🙌

…review feedback

- Delete exa_driver.py and move search logic into IInstance.py using
  @tool_function decorator, matching the tool_firecrawl pattern
- Add output_schema and summary to the @tool_function decorator
- Fix bare Exception in IGlobal.py: use rocketlib.error() + ValueError
- Store config values directly on IGlobal instead of the driver object
- Add test config with default profile to services.json
@charliegillet
Contributor Author

Thanks for the thorough review, @dsapandora! All 6 points have been addressed in 25a481c:

1. Migrate from driver pattern to @tool_function

Done. Deleted exa_driver.py entirely. The search logic (_invoke_search, _request_with_retry, _normalize_tool_input) now lives in IInstance.py with the exa_search method decorated with @tool_function, matching the tool_firecrawl pattern. IGlobal stores config values directly instead of a driver object.

2. Add test config to services.json

Added a "test" key with a default profile including apikey: "test-key", numResults: 5, useAutoprompt: false, searchType: "auto", includeText: true, and outputs: ["results"].

3. Add output_schema and summary to @tool_function

Added both. The output_schema describes success, query, num_results, results, and error fields. The summary is 'Searches the web using Exa semantic search API'.

4. Fix bare Exception in IGlobal.py

Replaced with rocketlib.error() for logging followed by raise ValueError(...).

5. Icon file — exa.svg

Confirmed: exa.svg already exists at packages/shared-ui/src/assets/nodes/exa.svg from PR #386. No changes needed.

6. Overlap with PR #386

This tool_exa_search node is an agent tool (used by LLM agents via @tool_function), complementary to the pipeline-level search_exa source node in PR #386. They serve different use cases — agent-driven search vs. pipeline-driven search.
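For reference, an output_schema covering the fields listed in point 3 might look like the dict below. This is a hypothetical JSON Schema sketch: the per-result key names (`title`, `url`, `text`, `score`, `published_date`) and the `required` list are assumptions, not taken from the PR.

```python
# Hypothetical JSON Schema for the exa_search tool output; field types are assumptions.
EXA_SEARCH_OUTPUT_SCHEMA = {
    'type': 'object',
    'properties': {
        'success': {'type': 'boolean'},
        'query': {'type': 'string'},
        'num_results': {'type': 'integer'},
        'results': {
            'type': 'array',
            'items': {
                'type': 'object',
                'properties': {
                    'title': {'type': 'string'},
                    'url': {'type': 'string'},
                    'text': {'type': 'string'},
                    'score': {'type': 'number'},
                    # Exa may omit a date for some pages, so null is allowed here.
                    'published_date': {'type': ['string', 'null']},
                },
            },
        },
        'error': {'type': 'string'},
    },
    'required': ['success'],
}
```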

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
nodes/src/nodes/tool_exa_search/IInstance.py (1)

53-54: ⚠️ Potential issue | 🟡 Minor

Add the missing IInstance class docstring.

The module docstring is present, but the public node entry class still has no PEP 257 class docstring.

Suggested fix
 class IInstance(IInstanceBase):
+    """Node instance that exposes the Exa search tool."""
+
     IGlobal: IGlobal

As per coding guidelines, nodes/**/*.py: Python pipeline nodes: use single quotes, ruff for linting/formatting, PEP 257 docstrings, target Python 3.10+.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/tool_exa_search/IInstance.py` around lines 53 - 54, Add a PEP
257 single-line class docstring to the public node entry class IInstance (which
inherits IInstanceBase and references IGlobal) to describe its purpose; place
the docstring immediately under the class definition using single quotes (per
project style) and keep it concise (e.g., one-line summary of the class role) so
ruff formatting/linting passes for Python 3.10+.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/tool_exa_search/IGlobal.py`:
- Line 63: The current assignment in IGlobal.py treats an explicit 0 as a
missing value because it uses cfg.get('numResults') or 10; change the logic so
it only defaults to 10 when cfg.get('numResults') is None (not falsy). In other
words, read the raw value from cfg.get('numResults'), check for None, then
convert to int and clamp with max(1, min(50, ...)) and assign to
self.num_results; reference the existing self.num_results assignment and
cfg.get('numResults') in your change.

In `@nodes/src/nodes/tool_exa_search/IInstance.py`:
- Around line 115-117: The input normalization currently performed inside
exa_search (via _normalize_tool_input) runs after the `@tool_function`
input_schema validation, so JSON-string or wrapped forms like {'input':
'{"query":"..."}'} are rejected before unwrapping; move or duplicate the
normalization into the pre-validation dispatch layer used by tool.invoke (or
into the `@tool_function` wrapper) so that raw args are normalized (unwrapping
nested {'input': ...}, parsing JSON strings, converting model objects to dicts)
before input_schema is applied; apply the same change for the other tool methods
referenced around the 246-275 region to ensure all tool invocations accept the
supported payload forms prior to schema validation.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fc6ba3b7-77af-4693-9256-06dd0da343d5

📥 Commits

Reviewing files that changed from the base of the PR and between 0360394 and 25a481c.

📒 Files selected for processing (3)
  • nodes/src/nodes/tool_exa_search/IGlobal.py
  • nodes/src/nodes/tool_exa_search/IInstance.py
  • nodes/src/nodes/tool_exa_search/services.json

Comment on lines +115 to +117
    def exa_search(self, args):
        """Search the web using Exa semantic search."""
        args = _normalize_tool_input(args)
Contributor


⚠️ Potential issue | 🟠 Major

Normalize wrapped inputs before the @tool_function schema gate.

This helper runs only after exa_search() has already been dispatched, but in this repo the decorator’s input_schema is validated at tool.invoke before the method body executes. That means the JSON-string / model / {'input': ...} forms this PR is trying to support can still be rejected before _normalize_tool_input() runs. And even when they do reach this helper, input is only unwrapped when it is already a dict, so wrappers like {'input': '{"query":"..."}'} still fall through to query is required.

Suggested helper fix for the nested-wrapper half of the problem
-    if 'input' in input_obj and isinstance(input_obj['input'], dict):
-        inner = input_obj['input']
+    if 'input' in input_obj:
+        inner = _normalize_tool_input(input_obj['input'])
         extras = {k: v for k, v in input_obj.items() if k != 'input'}
         input_obj = {**inner, **extras}

You'll still need the same normalization in the pre-validation dispatch path if those non-object payload forms must remain supported. Based on learnings, input_schema declared on a tool_function decorator is validated by the framework at the tool.invoke dispatch layer before the tool method body is called.

Also applies to: 246-275

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/tool_exa_search/IInstance.py` around lines 115 - 117, The
input normalization currently performed inside exa_search (via
_normalize_tool_input) runs after the `@tool_function` input_schema validation, so
JSON-string or wrapped forms like {'input': '{"query":"..."}'} are rejected
before unwrapping; move or duplicate the normalization into the pre-validation
dispatch layer used by tool.invoke (or into the `@tool_function` wrapper) so that
raw args are normalized (unwrapping nested {'input': ...}, parsing JSON strings,
converting model objects to dicts) before input_schema is applied; apply the
same change for the other tool methods referenced around the 246-275 region to
ensure all tool invocations accept the supported payload forms prior to schema
validation.

`cfg.get('numResults') or 10` silently turned an explicit `0` into the
default of 10 before clamping. A direct config bypassing the UI schema
could therefore widen an out-of-range low value instead of clamping it
to 1. Distinguish "not set" from "set to 0" so the clamp takes effect.
`bool` is a subclass of `int` in Python, so `{'num_results': true}` was
silently accepted and clamped to 1 instead of being treated as malformed.
Explicitly reject non-int (including bool) and fall back to the configured
default. Also adds a PEP 257 class docstring to match sibling tool nodes.
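The `numResults` fix in the first commit message amounts to testing for `None` rather than for falsiness. A minimal sketch; the function name and the coercion via `int()` are assumptions:

```python
from typing import Any


def resolve_num_results(cfg: dict[str, Any], default: int = 10) -> int:
    """Read numResults from config, defaulting only when it is absent (None)."""
    raw = cfg.get('numResults')
    value = default if raw is None else int(raw)
    # Clamp into the 1..50 range the node advertises.
    return max(1, min(50, value))
```

With `cfg.get('numResults') or 10`, an explicit `0` becomes `10`; here it is clamped to `1`.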
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
nodes/src/nodes/tool_exa_search/IInstance.py (1)

278-281: ⚠️ Potential issue | 🟠 Major

Recursively normalize wrapped input payloads.

This still only unwraps input when it is already a dict, so payloads like {'input': '{"query":"..."}'} or {'input': model} still fall through to query is required.

🛠 Suggested fix
-    if 'input' in input_obj and isinstance(input_obj['input'], dict):
-        inner = input_obj['input']
+    if 'input' in input_obj:
+        inner = _normalize_tool_input(input_obj['input'])
         extras = {k: v for k, v in input_obj.items() if k != 'input'}
         input_obj = {**inner, **extras}
Based on learnings, `_normalize_tool_input` should accept JSON strings, Pydantic-like models, and wrapped payloads like `{'input': ...}` consistently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/tool_exa_search/IInstance.py` around lines 278 - 281, update
the _normalize_tool_input logic to recursively unwrap wrapped payloads and
accept JSON strings and Pydantic-like model objects: while 'input' in input_obj,
extract inner = input_obj['input']; if inner is a str, attempt json.loads(inner)
(fall back to the original string on failure); if inner has a .dict() or
.to_dict() method call it to get a dict, or use vars(inner) as fallback; if
inner becomes a dict merge it with any extras ({k:v for k,v in input_obj.items()
if k!='input'}) to form the new input_obj and continue the loop; ensure the
function returns the final normalized dict (or original value) and handles
exceptions gracefully (log/raise as appropriate).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/tool_exa_search/IInstance.py`:
- Around line 233-234: Wrap the resp.json() call in a try/except and validate
the parsed payload is a mapping before returning: inside the block that
currently calls resp.raise_for_status() and return resp.json(), catch
JSONDecodeError (or general Exception) from resp.json(), and if parsing fails or
the result is not a dict/mapping (i.e., not suitable for response.get(...) in
exa_search()), return a structured failure payload such as {"success": False,
"error": "<parse error or unexpected type>", "raw": <resp.text>} so callers like
exa_search() never receive a non-object and won't crash; update the code around
resp.raise_for_status(), resp.json() in IInstance.py accordingly.
- Line 115: The `@tool_function` decorator in IInstance.py is using an unsupported
keyword summary= which causes a TypeError; remove the summary= parameter from
the `@tool_function` decorator (use description= and input_schema/output_schema
only) so the decorated function (the tool function decorated with
`@tool_function`) registers correctly; ensure any human-facing text formerly in
summary is moved into the description argument of the same decorator or into the
function docstring.
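A sketch of the `resp.json()` guard requested above. It assumes a `requests`-style response exposing `.json()` and `.text`; catching `ValueError` covers both `json.JSONDecodeError` and `requests.exceptions.JSONDecodeError`, which subclasses it. The helper name and the 500-character truncation of `raw` are arbitrary choices for illustration.

```python
from typing import Any, Protocol


class JsonResponse(Protocol):
    text: str

    def json(self) -> Any: ...


def parse_json_object(resp: JsonResponse) -> dict[str, Any]:
    """Return the decoded JSON body if it is an object, else a structured failure."""
    try:
        payload = resp.json()
    except ValueError as exc:
        return {'success': False, 'error': f'invalid JSON from Exa: {exc}', 'raw': resp.text[:500]}
    if not isinstance(payload, dict):
        # e.g. a top-level list or string would break response.get(...) in exa_search()
        return {'success': False, 'error': f'unexpected JSON type: {type(payload).__name__}', 'raw': resp.text[:500]}
    return payload
```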

---

Duplicate comments:
In `@nodes/src/nodes/tool_exa_search/IInstance.py`:
- Around line 278-281: _update the _normalize_tool_input logic to recursively
unwrap wrapped payloads and accept JSON strings and Pydantic-like model objects:
while 'input' in input_obj, extract inner = input_obj['input']; if inner is a
str, attempt json.loads(inner) (fall back to the original string on failure); if
inner has a .dict() or .to_dict() method call it to get a dict, or use
vars(inner) as fallback; if inner becomes a dict merge it with any extras ({k:v
for k,v in input_obj.items() if k!='input'}) to form the new input_obj and
continue the loop; ensure the function returns the final normalized dict (or
original value) and handles exceptions gracefully (log/raise as appropriate).
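A minimal sketch of the recursive unwrap this comment describes, in plain Python. `normalize_tool_input` and `_to_mapping` are hypothetical names (the node's real helper is `_normalize_tool_input`). Two choices are assumptions rather than anything the review specifies: inner keys override wrapper-level extras when they collide, and Pydantic v2's `model_dump()` is tried before the `.dict()` / `.to_dict()` / `vars()` fallbacks the comment lists.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("tool_exa_search.sketch")


def _to_mapping(value: Any) -> Any:
    """Best-effort conversion of a single (possibly wrapped) value into a dict."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except ValueError:
            return value  # not JSON: keep the original string
    # Pydantic v2 exposes model_dump(); v1 and many other models expose dict()/to_dict().
    for method in ("model_dump", "dict", "to_dict"):
        fn = getattr(value, method, None)
        if callable(fn):
            try:
                return fn()
            except Exception:
                # Log only the type name, never the value, to avoid leaking payloads.
                logger.warning("normalize: %s.%s() failed", type(value).__name__, method)
                return value
    # Plain objects: fall back to their attribute dict (copied so callers can't mutate it).
    if hasattr(value, "__dict__") and not isinstance(value, type):
        return dict(vars(value))
    return value


def normalize_tool_input(input_obj: Any) -> Any:
    """Unwrap nested {'input': ...} payloads until a plain dict (or a non-mapping) remains."""
    input_obj = _to_mapping(input_obj)
    while isinstance(input_obj, dict) and "input" in input_obj:
        extras = {k: v for k, v in input_obj.items() if k != "input"}
        inner = _to_mapping(input_obj["input"])
        if not isinstance(inner, dict):
            return inner  # cannot merge a non-mapping; let the caller reject it
        # Keys from the inner payload take precedence over wrapper-level extras.
        input_obj = {**extras, **inner}
    return input_obj
```

A non-mapping result (for example a plain string that is not JSON) is returned unchanged so the tool method can reject it with its usual error payload.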
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b8e0d999-4f87-4d5f-acc6-fa040de088b0

📥 Commits

Reviewing files that changed from the base of the PR and between 25a481c and 9ead2ec.

📒 Files selected for processing (2)
  • nodes/src/nodes/tool_exa_search/IGlobal.py
  • nodes/src/nodes/tool_exa_search/IInstance.py

},
},
description='Search the web using Exa semantic search. Provide a natural language query to find relevant web pages. Returns structured results with title, URL, text content, relevance score, and published date.',
summary='Searches the web using Exa semantic search API',
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n -C3 '^def tool_function\(' packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py
rg -n -C3 '\btool_function\b' packages/server/engine-lib/rocketlib-python/lib/rocketlib/__init__.py
rg -n -C2 'summary=' nodes/src/nodes/tool_exa_search/IInstance.py

Repository: rocketride-org/rocketride-server

Length of output: 972


🏁 Script executed:

# Get the complete tool_function signature
cat -n packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py | sed -n '65,80p'

Repository: rocketride-org/rocketride-server

Length of output: 766


🏁 Script executed:

# Also search for other uses of summary= with tool_function in the repository
rg -n 'summary\s*=' nodes/src/nodes/ -A1 -B1

Repository: rocketride-org/rocketride-server

Length of output: 1444


Remove the unsupported summary= parameter from the @tool_function decorator.

The tool_function signature supports only input_schema, description, and output_schema. Passing summary= raises TypeError for an unexpected keyword argument, preventing the node from registering.

Fix
-        summary='Searches the web using Exa semantic search API',
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
summary='Searches the web using Exa semantic search API',
description='Searches the web using Exa semantic search API',
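The failure is easy to reproduce with a stand-in decorator. This sketch assumes, as the analysis above reports, that `tool_function` accepts only `input_schema`, `description`, and `output_schema`; the stub itself is hypothetical and is not rocketlib's code. It also checks the committable suggestion: because the decorator call already passes `description=`, renaming `summary=` to a second `description=` would not compile, since Python rejects a repeated keyword argument at compile time.

```python
from typing import Any, Callable, Optional


def tool_function(
    input_schema: Optional[dict] = None,
    description: Optional[str] = None,
    output_schema: Optional[dict] = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical stand-in that accepts only the three keywords reported above."""

    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        # Attach the metadata so a registry could discover it later.
        fn.tool_meta = {
            "input_schema": input_schema,
            "description": description,
            "output_schema": output_schema,
        }
        return fn

    return decorate


def try_decorator(**kwargs: Any) -> str:
    """Return 'ok' if the decorator factory accepts kwargs, else the TypeError text."""
    try:
        tool_function(**kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


def compiles(source: str) -> bool:
    """True if the Python compiler accepts the expression in source."""
    try:
        compile(source, "<suggestion>", "eval")
    except SyntaxError:
        return False
    return True
```

So deleting the `summary=` line, as in the Fix diff, is the change that compiles; if the shorter wording matters, it can be folded into the existing `description=` string or the docstring.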
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/tool_exa_search/IInstance.py` at line 115: the `@tool_function`
decorator in IInstance.py is using an unsupported keyword summary= which causes
a TypeError; remove the summary= parameter from the `@tool_function` decorator
(use description= and input_schema/output_schema only) so the decorated function
(the tool function decorated with `@tool_function`) registers correctly; ensure
any human-facing text formerly in summary is moved into the description argument
of the same decorator or into the function docstring.

Comment on lines +233 to +234
resp.raise_for_status()
return resp.json()
Contributor


⚠️ Potential issue | 🟠 Major

Validate the success payload before returning it.

resp.json() can either raise or return a non-object body. Both cases currently bypass the structured success: False path and can crash exa_search() on response.get(...).

🛠 Suggested fix
             resp.raise_for_status()
-            return resp.json()
+            try:
+                data = resp.json()
+            except ValueError:
+                raise RuntimeError('Exa search: invalid JSON response from API') from None
+            if not isinstance(data, dict):
+                raise RuntimeError('Exa search: unexpected response shape from API')
+            return data
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/tool_exa_search/IInstance.py` around lines 233-234: wrap
the resp.json() call in a try/except and validate the parsed payload is a
mapping before returning: inside the block that currently calls
resp.raise_for_status() and return resp.json(), catch JSONDecodeError (or
general Exception) from resp.json(), and if parsing fails or the result is not a
dict/mapping (i.e., not suitable for response.get(...) in exa_search()), return
a structured failure payload such as {"success": False, "error": "<parse error
or unexpected type>", "raw": <resp.text>} so callers like exa_search() never
receive a non-object and won't crash; update the code around
resp.raise_for_status(), resp.json() in IInstance.py accordingly.
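The prompt asks for a structured failure payload, while the suggested diff raises `RuntimeError`; either one prevents the crash. A minimal sketch of the payload variant, assuming `resp` behaves like a `requests.Response` (a `.json()` method and a `.text` attribute). `parse_search_payload` and the 500-character cap on `raw` are hypothetical; the cap is there only so a large HTML error page is not copied into the result verbatim.

```python
import json
from typing import Any, Dict

MAX_RAW_CHARS = 500  # hypothetical cap so a large HTML error page is not echoed in full


def parse_search_payload(resp: Any) -> Dict[str, Any]:
    """Return the decoded JSON object, or a structured failure payload.

    `resp` is anything with a `.json()` method and a `.text` attribute,
    such as a requests.Response after raise_for_status() has passed.
    """
    try:
        data = resp.json()
    except ValueError as exc:
        # json.JSONDecodeError and requests' JSONDecodeError both subclass ValueError.
        return {
            "success": False,
            "error": f"invalid JSON response from API ({type(exc).__name__})",
            "raw": resp.text[:MAX_RAW_CHARS],
        }
    if not isinstance(data, dict):
        return {
            "success": False,
            "error": f"unexpected response type from API: {type(data).__name__}",
            "raw": resp.text[:MAX_RAW_CHARS],
        }
    return data
```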

@charliegillet
Contributor Author

Thanks for the thorough review! Status on each blocker as of 9ead2ec:

  1. CI failures — all 3 platform builds, CI OK, and CodeRabbit are now passing on the latest head. The original failures were unrelated to this PR and cleared on rebase.
  2. Warning log leaking raw payload — addressed in 9809906 (sanitize retry exceptions) and the migration in 25a481c. The @tool_function path no longer logs input_obj!r; the _normalize_tool_input helper only logs the type name (warning(f'exa_search: unexpected input type {type(input_obj).__name__}')), never the value. Retry-path exceptions are also sanitized to type(exc).__name__ + HTTP status so no API key or URL details can leak.
  3. Validation ordering — with the migration to @tool_function, the decorator's input_schema validation is applied at the framework dispatch layer before the method body. For shapes the schema can't express (nested {'input': ...} wrappers, JSON strings, Pydantic models) we still run _normalize_tool_input inside the method as a second layer, mirroring tool_firecrawl's pattern.
  4. Tests — this matches existing convention: none of the other tool_* nodes (tool_firecrawl, tool_http_request, tool_python, etc.) ship unit tests in this repo; they rely on services.json mock test profiles plus the shared fulltest framework. I added a test block in services.json for the latter. Happy to open a follow-up PR if we want unit coverage on _normalize_tool_input and retry logic across all tool nodes as one change.
  5. Extracting _request_with_retry — agreed, good follow-up candidate. I'll file an issue so it lands alongside a wider tool-node HTTP util refactor rather than as a one-off here.
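Items 2 and 5 describe sanitized retry logging and a `_request_with_retry` helper. The sketch below shows one way to combine them. `request_with_retry` and `describe_exc` are hypothetical names, and the sketch assumes HTTP errors expose `.response.status_code` the way `requests.HTTPError` does. Only the exception type and status are ever formatted, never the exception message.

```python
import logging
import time
from typing import Any, Callable, Optional

logger = logging.getLogger("tool_exa_search.sketch")


def describe_exc(exc: BaseException) -> str:
    """Summarize an exception by type name and HTTP status only.

    The message is deliberately dropped: for HTTP errors it often contains
    the request URL, headers, or other details that should not reach logs.
    """
    response = getattr(exc, "response", None)
    status = getattr(response, "status_code", None)
    name = type(exc).__name__
    return f"{name} (HTTP {status})" if status is not None else name


def request_with_retry(
    send: Callable[[], Any], attempts: int = 3, backoff: float = 0.5
) -> Any:
    """Call send() up to `attempts` times with exponential backoff between tries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except Exception as exc:
            last = exc
            logger.warning(
                "exa_search: attempt %d/%d failed: %s", attempt, attempts, describe_exc(exc)
            )
            if attempt < attempts:
                time.sleep(backoff * (2 ** (attempt - 1)))
    # `from None` drops the original exception chain, which could carry the raw message.
    raise RuntimeError(
        f"exa_search: request failed after {attempts} attempts: {describe_exc(last)}"
    ) from None
```

As written it retries on any exception; a production helper would more likely retry only on transient failures such as timeouts, HTTP 429, and 5xx responses.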

@charliegillet
Contributor Author

Thanks — fully addressed in 25a481c. exa_driver.py is gone; exa_search is now a @tool_function-decorated method directly on IInstance, following tool_firecrawl/IInstance.py as the reference. The _normalize_tool_input helper is kept in parity with the firecrawl version for input handling (None / dict / Pydantic model / JSON string / {'input': ...} wrapper).

@charliegillet
Contributor Author

Really appreciate the detailed walkthrough — every item you flagged is addressed, mostly by the @tool_function migration in 25a481c. Point-by-point:

  • Driver pattern superseded — done. exa_driver.py is deleted; exa_search now lives directly on IInstance as a @tool_function, matching tool_firecrawl/IInstance.py. The file is significantly shorter as you predicted.
  • No test configuration in services.json — a test block with a default profile and outputs: ["results"] is present in services.json (lines 81–93). This lets the node participate in builder nodes:test / fulltest the same way the other tool_* nodes do.
  • Missing output_schema and summary — both are now present on the @tool_function decorator (IInstance.py lines 102–113). output_schema declares success, query, num_results, results, error; summary is "Searches the web using Exa semantic search API".
  • Icon file: exa.svg already exists at packages/shared-ui/src/assets/nodes/exa.svg (landed with PR #386, "feat: #429 Add Exa search node and working sample pipeline [FRONTIER TOWER HACKATHON]"), which is what the shared-ui asset resolver loads. No new asset is needed; services.json references it by name, just as search_exa does.
  • Bare Exception for missing API key — fixed in IGlobal.py (lines 59–60): now calls error('tool_exa_search: apikey is required ...') and raises ValueError('tool_exa_search: apikey is required'), consistent with how tool_http_request handles misconfiguration. EXA_API_KEY env var is honored in both beginGlobal and validateConfig.
  • Overlap with PR #386 ("feat: #429 Add Exa search node and working sample pipeline [FRONTIER TOWER HACKATHON]", search_exa): good call. I will add a paragraph to the PR description making the distinction explicit: search_exa is a pipeline source node for bulk ingest, while tool_exa_search is an agent tool invoked at runtime by agent nodes. The two are complementary rather than duplicative.
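The API-key bullet can be sketched as follows. This is a hypothetical helper, not the code in IGlobal.py: the config key `apikey` is inferred from the error message, giving node config precedence over `EXA_API_KEY` is an assumption, and Python's `logging` stands in for the node's `error(...)` call.

```python
import logging
import os
from typing import Any, Mapping, Optional

logger = logging.getLogger("tool_exa_search.sketch")


def resolve_api_key(cfg: Mapping[str, Any], env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Exa API key from node config, falling back to EXA_API_KEY.

    Raises ValueError (not a bare Exception) when neither source has a non-blank key.
    """
    env = os.environ if env is None else env
    for candidate in (cfg.get("apikey"), env.get("EXA_API_KEY")):
        # Blank or whitespace-only values fall through to the next source.
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    logger.error("tool_exa_search: apikey is required (set it in the node config or EXA_API_KEY)")
    raise ValueError("tool_exa_search: apikey is required")
```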

Also picked up two follow-on items from this pass:

  • cfg.get('numResults') or 10 was treating an explicit 0 as "use the default" before clamping (392b3ad — now distinguishes None from 0 so the clamp takes effect).
  • num_results now explicitly rejects booleans (9ead2ec) — bool is an int subclass in Python so {'num_results': true} was silently becoming 1.
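Both follow-on fixes come down to two Python facts: `x or default` treats every falsy value, including 0, as missing, and `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`. A minimal sketch of the corrected resolution follows. `clamp_num_results` is a hypothetical name, and the bounds 1 and 100 are placeholders because the thread does not state the real clamp range.

```python
from typing import Any

DEFAULT_NUM_RESULTS = 10  # the default quoted in the thread (`... or 10`)
MIN_NUM_RESULTS = 1       # hypothetical lower bound
MAX_NUM_RESULTS = 100     # hypothetical upper bound


def clamp_num_results(value: Any, default: int = DEFAULT_NUM_RESULTS) -> int:
    """Resolve num_results: None means 'use the default'; everything else is validated, then clamped."""
    if value is None:
        value = default  # only None is "unset"; an explicit 0 still reaches the clamp
    # bool subclasses int, so it must be rejected before the int check.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"num_results must be an integer, got {type(value).__name__}")
    return max(MIN_NUM_RESULTS, min(MAX_NUM_RESULTS, value))
```

This version is deliberately strict: numeric strings and floats are rejected rather than coerced.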

Let me know if you want anything else tightened before merge.


Labels

module:nodes Python pipeline nodes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Adding a Exa search node (https://exa.ai/) to ingest realtime data or enrich the historical data with real time search.

4 participants