
feat(search): add semantic search for AI-powered tool discovery #321

Closed
Shashikant86 wants to merge 6 commits into main from semantic_search


@Shashikant86 Shashikant86 commented Feb 12, 2026

Summary

  • Port semantic search from the Python SDK to the Node SDK with full feature parity
  • Add SemanticSearchClient, searchTools(), searchActionNames(), and semantic utility tools for AI-powered tool discovery via the /actions/search API
  • Automatic fallback to local BM25+TF-IDF when the semantic API is unavailable

Problem

StackOne manages 10,000+ actions across its connectors, some of which have 2,000+ actions each. Keyword matching fails when a user searches for "onboard new hire" but the action is named hris_create_employee. The SDK needs semantic search capabilities alongside the existing keyword search.
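To make the mismatch concrete, here is a naive token-overlap check. It is a stand-in for keyword matching in general, not the SDK's BM25+TF-IDF scorer, and `tokenize`/`keywordOverlap` are hypothetical helpers. The query and the action name share no terms at all:

```typescript
// Split a query or action name into lowercase word tokens.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Count tokens shared by the query and the action name.
function keywordOverlap(query: string, actionName: string): number {
  const actionTokens = tokenize(actionName);
  let shared = 0;
  for (const token of tokenize(query)) {
    if (actionTokens.has(token)) shared += 1;
  }
  return shared;
}

console.log(keywordOverlap('onboard new hire', 'hris_create_employee')); // 0
console.log(keywordOverlap('create employee', 'hris_create_employee')); // 2
```

Any purely lexical scorer starts from this kind of term overlap, which is why a semantically equivalent query with different vocabulary scores nothing.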

Implementation

  • SemanticSearchClient: HTTP client for /actions/search API using native fetch, Basic auth, and AbortController timeouts
  • StackOneToolSet methods: searchTools() and searchActionNames() with connector filtering, per-connector fallback queries, deduplication across API versions, and automatic fallback to local BM25+TF-IDF
  • Semantic utility tools: utilityTools({ semanticClient }) upgrades tool_search from local keyword matching to cloud-based semantic vector search for agent-loop patterns
  • Connector awareness: Results filtered to only connectors available in the user's linked accounts
  • Action name normalization: Versioned API names (e.g., calendly_1.0.0_calendly_list_events_global) automatically normalized to MCP format (calendly_list_events)
  • Backward compatible: utilityTools() accepts both legacy number arg and new options object
  • No new dependencies: Uses native fetch, plus the existing Orama + TF-IDF implementation for fallback
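The client described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's SemanticSearchClient: the class name, the request shape (POST with a JSON body of the query and options), and using the API key as the Basic auth username with an empty password are all assumptions. It also rejects non-HTTPS base URLs and clears the timer in `finally`, the two client-side points cubic raised in review.

```typescript
// Hypothetical options; the real SemanticSearchClient may differ.
interface SearchOptions {
  connector?: string;
  topK?: number;
}

class SketchSemanticClient {
  private readonly apiKey: string;
  private readonly baseUrl: string;
  private readonly timeoutMs: number;

  constructor(apiKey: string, baseUrl: string, timeoutMs = 10_000) {
    // Refuse plain-HTTP endpoints so credentials never travel unencrypted.
    if (new URL(baseUrl).protocol !== 'https:') {
      throw new Error(`baseUrl must use https: ${baseUrl}`);
    }
    this.apiKey = apiKey;
    this.baseUrl = baseUrl;
    this.timeoutMs = timeoutMs;
  }

  // Assumed scheme: API key as the Basic auth username, empty password.
  authHeader(): string {
    return `Basic ${Buffer.from(`${this.apiKey}:`).toString('base64')}`;
  }

  async search(query: string, options: SearchOptions = {}): Promise<unknown> {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), this.timeoutMs);
    try {
      const res = await fetch(new URL('/actions/search', this.baseUrl), {
        method: 'POST',
        headers: { Authorization: this.authHeader(), 'Content-Type': 'application/json' },
        body: JSON.stringify({ query, ...options }),
        signal: controller.signal,
      });
      if (!res.ok) throw new Error(`Semantic search failed: HTTP ${res.status}`);
      return await res.json();
    } finally {
      // Runs on success, HTTP error, network error, and abort alike.
      clearTimeout(timer);
    }
  }
}
```

Putting `clearTimeout` in `finally` rather than after a successful `await` guarantees no timer is left pending when `fetch` rejects.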

Test plan

  • pnpm vitest src/semantic-search.test.ts — 36 semantic search tests pass
  • pnpm vitest examples/semantic-search.test.ts — 6 example tests pass
  • pnpm test — all 520 tests pass (no regressions)
  • pnpm build — no TypeScript errors

Summary by cubic

Adds semantic search to the Node SDK for natural language tool discovery, with automatic fallback to local BM25+TF‑IDF when the semantic API is unavailable. Adds runnable examples and tests showing discovery-to-execution with the Vercel AI SDK.

  • New Features

    • SemanticSearchClient for the /actions/search API (native fetch, Basic auth, timeouts) and SemanticSearchError.
    • StackOneToolSet.searchTools(query, { topK, connector, minScore, accountIds, fallbackToLocal }) returns matched Tools from linked accounts; respects backend result size unless topK is set; fills gaps with per-connector queries; results deduped across API versions.
    • StackOneToolSet.searchActionNames(query, { topK, connector, minScore, accountIds }) returns normalized action names with scores, connector, and description without fetching tools; respects backend result size unless topK is set.
    • Tools.utilityTools({ semanticClient }) adds a semantic tool_search with optional connector filter; legacy number arg for hybrid search still supported.
    • Tools.getConnectors() helper; normalizeActionName and semantic types/options exported from src/index.ts.
    • New examples: semantic-search.ts (runnable) and AI SDK integration updated to use searchTools(). E2E tests added for both flows.
  • Migration

    • Use searchTools() for NL queries; optionally pass accountIds to scope to linked connectors.
    • Use searchActionNames() for a fast preview before fetching tools.
    • topK is optional; omit to use backend defaults.
    • For agent loops, pass semanticClient to utilityTools() to enable semantic tool_search (searches full catalog; use connector to scope).
    • Set STACKONE_API_KEY (or apiKey in config); fallbackToLocal defaults to true. No breaking changes.
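A minimal sketch of the action-name normalization and cross-version deduplication described above, reusing the VERSIONED_ACTION_RE pattern quoted later in the review. It is not the SDK's exported normalizeActionName; the `SearchHit` shape and the keep-the-highest-score rule for duplicates are assumptions for illustration.

```typescript
// Pattern quoted in the review: "<connector>_<version>_<mcp name>_global".
const VERSIONED_ACTION_RE = /^[a-z][a-z0-9]*_\d+(?:\.\d+)+_(.+)_global$/;

function normalizeActionName(actionName: string): string {
  const match = VERSIONED_ACTION_RE.exec(actionName);
  // Names already in MCP format pass through unchanged.
  return match?.[1] ?? actionName;
}

// Hypothetical result shape for illustration.
interface SearchHit {
  actionName: string;
  similarityScore: number;
}

// Collapse hits that normalize to the same MCP name, keeping the best score.
function dedupeByNormalizedName(hits: SearchHit[]): Map<string, number> {
  const best = new Map<string, number>();
  for (const hit of hits) {
    const name = normalizeActionName(hit.actionName);
    const prev = best.get(name);
    if (prev === undefined || hit.similarityScore > prev) {
      best.set(name, hit.similarityScore);
    }
  }
  return best;
}

const deduped = dedupeByNormalizedName([
  { actionName: 'calendly_1.0.0_calendly_list_events_global', similarityScore: 0.82 },
  { actionName: 'calendly_2.0.0_calendly_list_events_global', similarityScore: 0.79 },
  { actionName: 'hris_create_employee', similarityScore: 0.75 },
]);
console.log(Object.fromEntries(deduped)); // { calendly_list_events: 0.82, hris_create_employee: 0.75 }
```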

Written for commit 4a25ccc. Summary will update on new commits.

Copilot AI review requested due to automatic review settings February 12, 2026 12:27
@Shashikant86 Shashikant86 requested a review from a team as a code owner February 12, 2026 12:27
pkg-pr-new bot commented Feb 12, 2026

npm i https://pkg.pr.new/StackOneHQ/stackone-ai-node/@stackone/ai@321

commit: 4a25ccc


Copilot AI left a comment


Pull request overview

This PR adds semantic search capabilities to the Node SDK for AI-powered tool discovery, mirroring behavior from the Python SDK by introducing a semantic search client, new StackOneToolSet discovery methods, and a semantic-powered tool_search utility tool with local BM25+TF‑IDF fallback behavior.

Changes:

  • Added SemanticSearchClient (+ error/type helpers) to call /actions/search, including action-name normalization.
  • Added StackOneToolSet.searchTools() and StackOneToolSet.searchActionNames() to discover tools/actions via semantic search (with connector/account scoping and deduplication).
  • Extended Tools.utilityTools() to optionally provide a semantic-backed tool_search implementation, plus added examples/mocks/tests for end-to-end coverage.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Summary per file:

  • src/toolsets.ts: Adds semantic search discovery APIs (searchTools, searchActionNames) and semantic client wiring.
  • src/tool.ts: Adds Tools.getConnectors() and the semantic-backed tool_search option via utilityTools().
  • src/semantic-search.ts: Introduces the semantic search HTTP client, response parsing, and action name normalization.
  • src/semantic-search.test.ts: Adds unit/integration tests covering the semantic client and toolset integration.
  • src/index.ts: Exposes new semantic-search exports and option types from the package entrypoint.
  • mocks/mcp-server.ts: Adds Calendly-themed mock tools for semantic-search examples/tests.
  • mocks/handlers.mcp.ts: Registers the new Calendly mock account/toolset in MCP handlers.
  • examples/semantic-search.ts: Adds a runnable semantic search example showcasing multiple usage patterns.
  • examples/semantic-search.test.ts: Adds an E2E test validating the semantic-search example flows.


Comment on lines +434 to +437
// allTools may not be defined if fetchTools failed before semantic search
if (!allTools!) {
throw error;
}
Copilot AI Feb 12, 2026


if (!allTools!) is a confusing mix of boolean negation and non-null assertion. Since allTools is intentionally possibly-unset when fetchTools() throws, prefer declaring it as let allTools: Tools | undefined (or initializing to undefined) and checking if (!allTools) { ... } for clarity.
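A sketch of the pattern Copilot suggests. `Tools`, `fetchTools`, `semanticSearch`, and `localSearch` are hypothetical stand-ins for the PR's types and calls; the point is that declaring `Tools | undefined` explicitly lets the compiler check the guard without a non-null assertion.

```typescript
// Hypothetical stand-in for the SDK's Tools collection.
type Tools = { names: string[] };

async function discover(
  fetchTools: () => Promise<Tools>,
  semanticSearch: (tools: Tools) => Promise<string[]>,
  localSearch: (tools: Tools) => string[],
): Promise<string[]> {
  // Declared as possibly undefined: fetchTools() may throw before assignment.
  let allTools: Tools | undefined;
  try {
    allTools = await fetchTools();
    return await semanticSearch(allTools);
  } catch (error) {
    // Without a tool list there is nothing to fall back to, so rethrow.
    if (!allTools) {
      throw error;
    }
    // Narrowed to Tools here; no `!` needed.
    return localSearch(allTools);
  }
}
```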


@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 9 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/semantic-search.ts">

<violation number="1" location="src/semantic-search.ts:113">
P1: Rule violated: **Flag Security Vulnerabilities**

Enforce HTTPS for the semantic search API baseUrl. The current code accepts an arbitrary baseUrl and uses it directly in fetch, which permits insecure HTTP requests; this violates the security rule requiring TLS for all network calls.</violation>

<violation number="2" location="src/semantic-search.ts:162">
P2: Clear the timeout even when `fetch` throws; currently the timer is only cleared on the success path, so failed requests leave a pending timeout.</violation>
</file>

<file name="src/tool.ts">

<violation number="1" location="src/tool.ts:951">
P2: `semanticToolSearch` drops the `parameters` field from `tool_search` results, unlike the existing local `tool_search` and `ToolSearchResult` contract. This breaks feature parity and removes the schema callers need to invoke `tool_execute`. Consider rehydrating parameters from the tool list (or returning them from the semantic API) so the result shape matches the current `tool_search` output.</violation>
</file>
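One way to act on the tool.ts finding, sketched under assumptions: the semantic API returns action names and scores but no parameter schemas, so each schema is looked up from the locally fetched tool list by (already normalized) name. `ToolInfo`, `SemanticHit`, `SearchResultSketch`, and `toToolSearchResults` are hypothetical; `SearchResultSketch` is a simplified stand-in, not the SDK's ToolSearchResult.

```typescript
// Hypothetical shapes for illustration only.
interface ToolInfo {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON schema needed by tool_execute
}

interface SemanticHit {
  name: string; // assumed already normalized to MCP format
  score: number;
}

interface SearchResultSketch extends ToolInfo {
  score: number;
}

// Attach each hit's parameter schema from the local tool list,
// skipping hits for tools the caller cannot execute.
function toToolSearchResults(hits: SemanticHit[], tools: ToolInfo[]): SearchResultSketch[] {
  const byName = new Map(tools.map((t) => [t.name, t] as const));
  const results: SearchResultSketch[] = [];
  for (const hit of hits) {
    const tool = byName.get(hit.name);
    if (tool) results.push({ ...tool, score: hit.score });
  }
  return results;
}
```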



@glebedel glebedel left a comment


Overall, the code looks overly complex for what we're trying to implement here. Worth taking another look to see whether you can make it simpler and, per my comments, remove logic or extract it into separate methods.

/**
 * Regex to match versioned API action names and extract the MCP-format name.
 * Example: "calendly_1.0.0_calendly_create_scheduling_link_global" -> "calendly_create_scheduling_link"
 */
const VERSIONED_ACTION_RE = /^[a-z][a-z0-9]*_\d+(?:\.\d+)+_(.+)_global$/;

Where's this _global suffix coming from?

function semanticToolSearch(semanticClient: SemanticSearchClient): BaseTool {
const name = 'tool_search' as const;
const description =
'Searches for relevant tools based on a natural language query using semantic vector search. Call this first to discover available tools before executing them.' as const;

Do you think there's an opportunity to improve this prompt so it isn't called when StackOne tools are unlikely to be needed (e.g. because users' agents will have a mix of bash tools and other MCPs, and some prompts may not require or relate to StackOne's tools)?

This is more of a research task on whether we can optimize that description (in a subsequent PR) to avoid unnecessary round trips to our semantic search backend for prompts that don't benefit from it.

// Step 2: Query semantic search API
// topK is intentionally omitted here (matching Python SDK) to let the backend
// return its default set; client-side filtering + per-connector fallback handle sizing.
const response = await this.semanticClient.search(query, { connector });

In most cases a connector should never be given here, so we should check whether it's set.

In fact, I think we should remove this connector parameter entirely: it makes the whole logic more complex than it needs to be and probably adds little to no value.

Instead, we could have a separate method to search by connector, which this method calls for each connector associated with the account IDs.

availableConnectors.has(r.connectorKey.toLowerCase()) && r.similarityScore >= minScore,
);

// Step 3b: If not enough results, make per-connector calls for missing connectors

This description seems off: it shouldn't be about having enough results at all. If account IDs are given, we should always make a request for each connector in the unique, deduplicated set of connectors associated with those account IDs.

);

// Step 3b: If not enough results, make per-connector calls for missing connectors
if (!connector && (topK == null || filteredResults.length < topK)) {

topK should be passed to each connector request; we don't need to keep track of topK across connector search requests.
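The reviewer's suggestions in this thread (a dedicated per-connector search, always querying the deduplicated set of connectors behind the given account IDs, and passing topK to each request) could fit together roughly as below. `searchByConnector`, `searchAccounts`, and the injected `SearchFn` are hypothetical; this is a sketch of the proposed shape, not code from the PR.

```typescript
// Hypothetical hit shape returned by one connector-scoped request.
interface Hit {
  actionName: string;
  connectorKey: string;
  similarityScore: number;
}

type SearchFn = (query: string, opts: { connector: string; topK?: number }) => Promise<Hit[]>;

// One request scoped to a single connector; topK applies per request.
async function searchByConnector(
  search: SearchFn,
  query: string,
  connector: string,
  topK?: number,
): Promise<Hit[]> {
  return search(query, { connector, topK });
}

// Query every distinct connector behind the caller's accounts, in parallel.
async function searchAccounts(
  search: SearchFn,
  query: string,
  accountConnectors: string[],
  topK?: number,
): Promise<Hit[]> {
  const connectors = [...new Set(accountConnectors.map((c) => c.toLowerCase()))];
  const perConnector = await Promise.all(
    connectors.map((c) => searchByConnector(search, query, c, topK)),
  );
  return perConnector.flat();
}
```

Because topK is applied per request, no running count across connectors is needed.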

for (const r of extra.results) {
if (
r.similarityScore >= minScore &&
!filteredResults.some((fr) => fr.actionName === r.actionName)

Sounds like we should use a Set for the filtered results rather than an array, to avoid this unnecessary looping.
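A sketch of this suggestion: track seen action names in a `Set`, so each membership check is a constant-time lookup instead of the `filteredResults.some(...)` scan in the excerpt above. The `Hit` type and `mergeHits` name are hypothetical.

```typescript
interface Hit {
  actionName: string;
  similarityScore: number;
}

// Append extra hits that clear minScore and are not already present.
function mergeHits(filtered: Hit[], extra: Hit[], minScore: number): Hit[] {
  const seen = new Set(filtered.map((r) => r.actionName));
  const merged = [...filtered];
  for (const r of extra) {
    if (r.similarityScore >= minScore && !seen.has(r.actionName)) {
      // Record the name so duplicates within `extra` are skipped too.
      seen.add(r.actionName);
      merged.push(r);
    }
  }
  return merged;
}
```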

const actionNames = new Set(finalResults.map((r) => normalizeActionName(r.actionName)));
const matchedTools = allTools.toArray().filter((t) => actionNames.has(t.name));

// Sort matched tools by semantic search score order

Do we need to sort it? We've already filtered out scores lower than minScore anyway.
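On the sorting question: filtering by minScore removes low-scoring results but does not reorder anything, and the excerpt above filters allTools.toArray(), so matched tools come back in catalog order rather than score order. The small sketch below uses hypothetical data to show the difference; whether that order matters depends on how callers consume the list.

```typescript
const catalog = ['hris_list_employees', 'hris_create_employee', 'calendly_list_events'];
// Semantic results, best first (hypothetical names and order).
const ranked = ['hris_create_employee', 'hris_list_employees'];

const wanted = new Set(ranked);
// Filtering keeps the catalog's order...
const filtered = catalog.filter((name) => wanted.has(name));
// ...so restoring relevance order needs an explicit sort by rank.
const rank = new Map(ranked.map((name, i) => [name, i] as const));
const sorted = [...filtered].sort((a, b) => (rank.get(a) ?? 0) - (rank.get(b) ?? 0));

console.log(filtered); // [ 'hris_list_employees', 'hris_create_employee' ]
console.log(sorted); // [ 'hris_create_employee', 'hris_list_employees' ]
```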

const utility = await allTools.utilityTools();
const searchTool = utility.getTool('tool_search');

if (searchTool) {

Looks like there's a lot of logic here. Shouldn't this be a simple call to our existing BM25/lexical search strategy?
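The simpler shape the reviewer describes might look like this: try semantic search, and on failure delegate straight to the existing lexical search instead of reimplementing it inline. `semanticSearch` and `lexicalSearch` are hypothetical injected functions standing in for the semantic client and the SDK's BM25+TF-IDF tool_search; the `fallbackToLocal` flag mirrors the option named in the PR summary.

```typescript
type Search = (query: string, limit?: number) => Promise<string[]>;

// Prefer semantic results; fall back to the lexical strategy when the
// semantic API is unavailable and fallback is enabled.
async function searchWithFallback(
  query: string,
  semanticSearch: Search,
  lexicalSearch: Search,
  fallbackToLocal = true,
  limit?: number,
): Promise<string[]> {
  try {
    return await semanticSearch(query, limit);
  } catch (error) {
    if (!fallbackToLocal) throw error;
    return lexicalSearch(query, limit);
  }
}
```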

@Shashikant86
Author

@glebedel Yes, this code is outdated. I need to create another PR once the Python one (StackOneHQ/stackone-ai-python#149) is agreed and accepted. I should have closed this PR earlier.

@Shashikant86
Author

I will open a new PR, based on this feedback and synced with the Python SDK, from another account.


Labels

None yet

Projects

None yet


3 participants