feat(search): add semantic search for AI-powered tool discovery #321
Shashikant86 wants to merge 6 commits into main.
Pull request overview
This PR adds semantic search capabilities to the Node SDK for AI-powered tool discovery, mirroring behavior from the Python SDK by introducing a semantic search client, new StackOneToolSet discovery methods, and a semantic-powered tool_search utility tool with local BM25+TF‑IDF fallback behavior.
Changes:
- Added `SemanticSearchClient` (+ error/type helpers) to call `/actions/search`, including action-name normalization.
- Added `StackOneToolSet.searchTools()` and `StackOneToolSet.searchActionNames()` to discover tools/actions via semantic search (with connector/account scoping and deduplication).
- Extended `Tools.utilityTools()` to optionally provide a semantic-backed `tool_search` implementation, plus added examples/mocks/tests for end-to-end coverage.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/toolsets.ts | Adds semantic search discovery APIs (searchTools, searchActionNames) and semantic client wiring. |
| src/tool.ts | Adds Tools.getConnectors() and semantic-backed tool_search option via utilityTools(). |
| src/semantic-search.ts | Introduces the semantic search HTTP client, response parsing, and action name normalization. |
| src/semantic-search.test.ts | Adds unit/integration tests covering the semantic client + toolset integration. |
| src/index.ts | Exposes new semantic-search exports and option types from the package entrypoint. |
| mocks/mcp-server.ts | Adds Calendly-themed mock tools for semantic-search examples/tests. |
| mocks/handlers.mcp.ts | Registers the new Calendly mock account/toolset in MCP handlers. |
| examples/semantic-search.ts | Adds a runnable semantic search example showcasing multiple usage patterns. |
| examples/semantic-search.test.ts | Adds an E2E test validating the semantic-search example flows. |
```ts
// allTools may not be defined if fetchTools failed before semantic search
if (!allTools!) {
  throw error;
}
```
`if (!allTools!)` is a confusing mix of boolean negation and a non-null assertion. Since `allTools` is intentionally possibly unset when `fetchTools()` throws, prefer declaring it as `let allTools: Tools | undefined` (or initializing it to `undefined`) and checking `if (!allTools) { ... }` for clarity.
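A minimal sketch of the suggested pattern. `Tools`, `searchWithFallback`, and the three function parameters are hypothetical stand-ins, not the SDK's real signatures; only the control flow is the point. Declaring the variable as possibly `undefined` lets the compiler narrow it after the guard, with no non-null assertion.

```ts
// Hypothetical stand-in for the SDK's Tools collection.
type Tools = { names: string[] };

async function searchWithFallback(
  fetchTools: () => Promise<Tools>,
  semanticSearch: (tools: Tools) => Promise<string[]>,
  localSearch: (tools: Tools) => string[],
): Promise<string[]> {
  // Explicitly possibly unset: fetchTools() may throw before assignment.
  let allTools: Tools | undefined;
  try {
    allTools = await fetchTools();
    return await semanticSearch(allTools);
  } catch (error) {
    // Nothing to fall back on, so surface the original error.
    if (!allTools) {
      throw error;
    }
    // Narrowed to Tools here: semantic search failed after the fetch succeeded.
    return localSearch(allTools);
  }
}
```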
3 issues found across 9 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/semantic-search.ts">
<violation number="1" location="src/semantic-search.ts:113">
P1: Rule violated: **Flag Security Vulnerabilities**
Enforce HTTPS for the semantic search API baseUrl. The current code accepts an arbitrary baseUrl and uses it directly in fetch, which permits insecure HTTP requests; this violates the security rule requiring TLS for all network calls.</violation>
<violation number="2" location="src/semantic-search.ts:162">
P2: Clear the timeout even when `fetch` throws; currently the timer is only cleared on the success path, so failed requests leave a pending timeout.</violation>
</file>
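A minimal sketch of how both `src/semantic-search.ts` findings could be addressed together. `assertHttpsBaseUrl` and `postSearch` are hypothetical names, and the request body shape and Basic auth scheme (API key as username, empty password) are assumptions rather than the SDK's confirmed behavior. It relies on the global `fetch`, `AbortController`, and `btoa` available in Node 18+.

```ts
// Reject non-TLS base URLs up front so credentials never travel over plain HTTP.
function assertHttpsBaseUrl(baseUrl: string): URL {
  const url = new URL(baseUrl); // throws TypeError on malformed input
  if (url.protocol !== 'https:') {
    throw new Error(`Semantic search baseUrl must use HTTPS, got "${url.protocol}"`);
  }
  return url;
}

// Hypothetical request helper: POST a query with a timeout that is always cleared.
async function postSearch(
  baseUrl: string,
  apiKey: string,
  body: unknown,
  timeoutMs = 30_000,
): Promise<unknown> {
  const url = new URL('/actions/search', assertHttpsBaseUrl(baseUrl));
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Assumed scheme: API key as the Basic auth username, empty password.
        Authorization: `Basic ${btoa(`${apiKey}:`)}`,
      },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    if (!res.ok) {
      throw new Error(`Semantic search failed with HTTP ${res.status}`);
    }
    return await res.json();
  } finally {
    // Runs on success, HTTP error, network error, and abort alike.
    clearTimeout(timer);
  }
}
```

Putting `clearTimeout` in `finally` covers every exit path, including the abort raised when the timer itself fires. Note that `new URL('/actions/search', base)` replaces any path already present on the base URL.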
<file name="src/tool.ts">
<violation number="1" location="src/tool.ts:951">
P2: `semanticToolSearch` drops the `parameters` field from `tool_search` results, unlike the existing local `tool_search` and `ToolSearchResult` contract. This breaks feature parity and removes the schema callers need to invoke `tool_execute`. Consider rehydrating parameters from the tool list (or returning them from the semantic API) so the result shape matches the current `tool_search` output.</violation>
</file>
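A sketch of the rehydration approach suggested for `src/tool.ts`, assuming the semantic API returns only names and scores while each local tool carries its `parameters` schema. `SemanticHit`, `LocalTool`, `SearchResultWithSchema`, and `rehydrateResults` are hypothetical; the real `ToolSearchResult` contract lives in the SDK.

```ts
// Hypothetical shapes for illustration only.
interface SemanticHit { name: string; score: number }
interface LocalTool { name: string; description: string; parameters: Record<string, unknown> }
interface SearchResultWithSchema {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  score: number;
}

// Join semantic hits back onto the local tool list so each result carries its schema.
function rehydrateResults(hits: SemanticHit[], tools: LocalTool[]): SearchResultWithSchema[] {
  const byName = new Map(tools.map((t) => [t.name, t] as const));
  const results: SearchResultWithSchema[] = [];
  for (const hit of hits) {
    const tool = byName.get(hit.name);
    // Drop hits with no local tool: they could not be passed to tool_execute anyway.
    if (!tool) continue;
    results.push({
      name: tool.name,
      description: tool.description,
      parameters: tool.parameters,
      score: hit.score,
    });
  }
  return results;
}
```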
glebedel left a comment
Overall, the code looks overly complex for what we're trying to implement here. Worth taking another look to see if you can make it simpler and, per my comments, remove logic or extract it into separate methods.
```ts
/**
 * Regex to match versioned API action names and extract the MCP-format name.
 * Example: "calendly_1.0.0_calendly_create_scheduling_link_global" -> "calendly_create_scheduling_link"
 */
const VERSIONED_ACTION_RE = /^[a-z][a-z0-9]*_\d+(?:\.\d+)+_(.+)_global$/;
```
Where's this `_global` coming from?
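To make the `_global` question concrete, a sketch of how the quoted regex could drive normalization. `normalizeActionName` is the name used later in this diff, but this body is an assumption, in particular returning the input unchanged when the pattern does not match.

```ts
// Regex quoted from the diff above.
const VERSIONED_ACTION_RE = /^[a-z][a-z0-9]*_\d+(?:\.\d+)+_(.+)_global$/;

// Sketch: return the MCP-format name for versioned API names, else the input unchanged.
function normalizeActionName(actionName: string): string {
  const match = VERSIONED_ACTION_RE.exec(actionName);
  return match?.[1] ?? actionName;
}

console.log(normalizeActionName('calendly_1.0.0_calendly_create_scheduling_link_global'));
// → calendly_create_scheduling_link
console.log(normalizeActionName('calendly_list_events'));
// → calendly_list_events
```

A versioned name without the `_global` suffix, such as `calendly_1.0.0_calendly_list_events`, does not match and is returned unchanged, so the suffix is load-bearing for this pattern.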
```ts
function semanticToolSearch(semanticClient: SemanticSearchClient): BaseTool {
  const name = 'tool_search' as const;
  const description =
    'Searches for relevant tools based on a natural language query using semantic vector search. Call this first to discover available tools before executing them.' as const;
```
Do you think there's an opportunity to improve this description so the tool isn't called when StackOne tools are unlikely to be needed (e.g. because users' agents will have a mix of bash tools and other MCPs, and some prompts may not require or relate to StackOne's tools)?
This is more of a research task on whether we can optimize that description (in a subsequent PR) to avoid unnecessary round trips to our semantic search backend for prompts that don't benefit from it.
```ts
// Step 2: Query semantic search API
// topK is intentionally omitted here (matching Python SDK) to let the backend
// return its default set; client-side filtering + per-connector fallback handle sizing.
const response = await this.semanticClient.search(query, { connector });
```
In most cases a connector should never be given here, so we should at least check whether it's set.
In fact, I think we should remove this connector parameter entirely: it makes the whole logic more complex than it needs to be and probably adds little to no value.
Instead, we can add a separate method that searches by connector, which this method can call for each connector associated with the account IDs.
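A sketch of the split proposed here, with a hypothetical `SearchClient` interface standing in for `SemanticSearchClient` (the real signature may differ). `searchByConnector` issues exactly one scoped request and passes `topK` straight through, so the caller needs no cross-request `topK` bookkeeping.

```ts
// Hypothetical minimal client surface; not the SDK's confirmed API.
interface SearchResult { actionName: string; connectorKey: string; similarityScore: number }
interface SearchClient {
  search(
    query: string,
    opts: { connector?: string; topK?: number },
  ): Promise<{ results: SearchResult[] }>;
}

// One request scoped to one connector; topK applies to this request only.
async function searchByConnector(
  client: SearchClient,
  query: string,
  connector: string,
  topK?: number,
): Promise<SearchResult[]> {
  const { results } = await client.search(query, { connector, topK });
  return results;
}

// No connector parameter: fan out once per connector and concatenate the results.
async function searchTools(
  client: SearchClient,
  query: string,
  connectors: Iterable<string>,
  topK?: number,
): Promise<SearchResult[]> {
  const perConnector = await Promise.all(
    Array.from(connectors, (c) => searchByConnector(client, query, c, topK)),
  );
  return perConnector.flat();
}
```

With `Promise.all`, one failing connector rejects the whole search; `Promise.allSettled` would be the alternative if partial results are acceptable.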
```ts
    availableConnectors.has(r.connectorKey.toLowerCase()) && r.similarityScore >= minScore,
);

// Step 3b: If not enough results, make per-connector calls for missing connectors
```
This description seems off. It shouldn't be about having enough results at all: if account IDs are given, we should always make a request for the unique, deduplicated set of connectors associated with those account IDs.
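Following that comment, the connector set could be derived directly from the account IDs, with one request per unique connector regardless of result counts. A sketch, where `LinkedAccount` and `connectorsForAccounts` are hypothetical names; lowercasing mirrors the `r.connectorKey.toLowerCase()` comparison in the diff.

```ts
// Hypothetical account shape: each linked account belongs to one connector.
interface LinkedAccount { id: string; connectorKey: string }

// Deduplicated, case-normalized connector keys for the requested account IDs.
function connectorsForAccounts(accounts: LinkedAccount[], accountIds: string[]): Set<string> {
  const wanted = new Set(accountIds);
  const connectors = new Set<string>();
  for (const account of accounts) {
    if (wanted.has(account.id)) {
      connectors.add(account.connectorKey.toLowerCase());
    }
  }
  return connectors;
}
```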
```ts
);

// Step 3b: If not enough results, make per-connector calls for missing connectors
if (!connector && (topK == null || filteredResults.length < topK)) {
```
`topK` should be passed to each connector request; we don't need to keep track of `topK` across connector search requests.
```ts
for (const r of extra.results) {
  if (
    r.similarityScore >= minScore &&
    !filteredResults.some((fr) => fr.actionName === r.actionName)
```
Sounds like we should use a set for the filtered results rather than an array, to avoid this unnecessary looping.
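A sketch of the set-based merge suggested here, with hypothetical `ScoredAction` and `mergeResults` names. Tracking seen action names in a `Set` makes each duplicate check constant-time instead of a `.some()` scan over the kept results, and adding to the set as results are accepted also deduplicates within `extra` itself.

```ts
// Hypothetical result shape for illustration.
interface ScoredAction { actionName: string; similarityScore: number }

// Merge extra results into the kept list, skipping duplicates and low scores.
function mergeResults(kept: ScoredAction[], extra: ScoredAction[], minScore: number): ScoredAction[] {
  const seen = new Set(kept.map((r) => r.actionName));
  const merged = [...kept];
  for (const r of extra) {
    if (r.similarityScore >= minScore && !seen.has(r.actionName)) {
      seen.add(r.actionName);
      merged.push(r);
    }
  }
  return merged;
}
```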
```ts
const actionNames = new Set(finalResults.map((r) => normalizeActionName(r.actionName)));
const matchedTools = allTools.toArray().filter((t) => actionNames.has(t.name));

// Sort matched tools by semantic search score order
```
Do we need to sort it? We've already filtered out scores lower than `minScore` anyway.
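Context for the sorting question: filtering `allTools.toArray()` keeps the tool list's own order, so the `minScore` filter alone does not put the best match first. If rank order matters to callers, one alternative to a separate sort is to walk the results in score order and look tools up by name. A sketch, assuming the result names are already in descending score order (not confirmed here); `toolsInRankOrder` is a hypothetical helper.

```ts
interface NamedTool { name: string }

// Walk ranked names in order and pick matching tools; no explicit sort needed.
// The seen set drops repeats, e.g. several API versions normalizing to one name.
function toolsInRankOrder<T extends NamedTool>(rankedNames: string[], tools: T[]): T[] {
  const byName = new Map(tools.map((t) => [t.name, t] as const));
  const seen = new Set<string>();
  const ordered: T[] = [];
  for (const name of rankedNames) {
    const tool = byName.get(name);
    if (tool && !seen.has(name)) {
      seen.add(name);
      ordered.push(tool);
    }
  }
  return ordered;
}
```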
```ts
const utility = await allTools.utilityTools();
const searchTool = utility.getTool('tool_search');

if (searchTool) {
```
Looks like there's a lot of logic here. Shouldn't this be a simple call to our existing BM25/lexical search strategy?
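One way to keep that path small, sketched with a hypothetical `SearchFn` signature standing in for both the semantic client and the existing BM25/TF-IDF search: wrap the existing strategy rather than re-implement it, so the fallback is a single delegated call.

```ts
// Hypothetical strategy signature shared by semantic and lexical search.
type SearchFn = (query: string, limit: number) => Promise<string[]>;

// Try semantic search; on any failure, delegate to the existing local search as-is.
function withLexicalFallback(semantic: SearchFn, lexical: SearchFn): SearchFn {
  return async (query, limit) => {
    try {
      return await semantic(query, limit);
    } catch {
      return lexical(query, limit);
    }
  };
}
```

Catching every error also masks bugs in the semantic path; a real implementation might narrow the `catch` to network or API errors.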
@glebedel Yes, this code is outdated. I need to open another PR once the Python one (StackOneHQ/stackone-ai-python#149) is agreed and accepted; I should have closed this PR earlier.
I will open the new PR based on the feedback, in sync with the Python SDK, and from another account.
Summary
- `SemanticSearchClient`, `searchTools()`, `searchActionNames()`, and semantic utility tools for AI-powered tool discovery via the `/actions/search` API
Problem
StackOne manages 10,000+ actions across connectors, some with 2,000+ actions each. Keyword matching fails when users search "onboard new hire" but the action is named `hris_create_employee`. The SDK needs semantic search capabilities alongside the existing keyword search.
Implementation
- `SemanticSearchClient`: HTTP client for the `/actions/search` API using native `fetch`, Basic auth, and `AbortController` timeouts
- `StackOneToolSet` methods: `searchTools()` and `searchActionNames()` with connector filtering, per-connector fallback queries, deduplication across API versions, and automatic fallback to local BM25+TF-IDF
- `utilityTools({ semanticClient })` upgrades `tool_search` from local keyword matching to cloud-based semantic vectors for agent-loop patterns
- Versioned API action names (e.g. `calendly_1.0.0_calendly_list_events_global`) are automatically normalized to MCP format (`calendly_list_events`)
- `utilityTools()` accepts both the legacy `number` argument and a new options object
- Uses `fetch` and the existing Orama + TF-IDF search for fallback
Test plan
- `pnpm vitest src/semantic-search.test.ts`: 36 semantic search tests pass
- `pnpm vitest examples/semantic-search.test.ts`: 6 example tests pass
- `pnpm test`: all 520 tests pass (no regressions)
- `pnpm build`: no TypeScript errors
Summary by cubic
Adds semantic search to the Node SDK for natural language tool discovery, with automatic fallback to local BM25+TF‑IDF when the semantic API is unavailable. Adds runnable examples and tests showing discovery-to-execution with the Vercel AI SDK.
New Features
Migration
Written for commit 4a25ccc. Summary will update on new commits.