## Summary

| Method            | Hit@5        | MRR        | Avg Latency | Hits    |
| ----------------- | ------------ | ---------- | ----------- | ------- |
| Local BM25+TF-IDF | 66.0%        | 0.538      | 1.2ms       | 62/94   |
| Semantic Search   | 76.6%        | 0.634      | 279.6ms     | 72/94   |
| **Improvement**   | **+10.6 pp** | **+0.096** |             | **+10** |
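
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. It assumes Hit@5 counts a task as a hit when any expected action appears in the top 5 results, and that MRR averages the reciprocal rank of the first expected action, scoring 0 when none appears in the top 5. The benchmark script may define either metric differently. The function names and toy data are illustrative and not taken from `tests/benchmark_search.py`.

```python
from typing import Iterable, List, Optional, Set, Tuple


def first_hit_rank(ranked: List[str], relevant: Set[str], k: int = 5) -> Optional[int]:
    """Return the 1-based rank of the first relevant action in the top k, or None."""
    for rank, action in enumerate(ranked[:k], start=1):
        if action in relevant:
            return rank
    return None


def hit_rate_and_mrr(
    runs: Iterable[Tuple[List[str], Set[str]]], k: int = 5
) -> Tuple[float, float]:
    """Compute Hit@k and MRR (cut off at k) over (ranked results, relevant set) pairs."""
    ranks = [first_hit_rank(ranked, relevant, k) for ranked, relevant in runs]
    hits = [r for r in ranks if r is not None]
    hit_rate = len(hits) / len(ranks)
    # Misses contribute a reciprocal rank of 0 (assumption, see lead-in).
    mrr = sum(1.0 / r for r in hits) / len(ranks)
    return hit_rate, mrr


# Toy data only: three made-up tasks, not the benchmark's 94.
toy_runs = [
    (["slack_send_message", "teamtailor_delete_team"], {"slack_send_message"}),  # rank 1
    (["jira_check_bulk_permissions", "todoist_list_tasks"], {"todoist_list_tasks"}),  # rank 2
    (["humaans_get_me"], {"workday_list_workers"}),  # miss
]
hit_rate, mrr = hit_rate_and_mrr(toy_runs)
print(f"Hit@5 = {hit_rate:.3f}, MRR = {mrr:.3f}")

# The reported Hit@5 percentages follow directly from the hit counts.
assert round(100 * 62 / 94, 1) == 66.0
assert round(100 * 72 / 94, 1) == 76.6
```

The gap between the two Hit@5 values is 10.6 percentage points (76.6 minus 66.0), which corresponds to the 10 additional hits out of 94 tasks.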

## Detailed Breakdown

### Semantic Wins (17 tasks)

Tasks where semantic search finds the correct result but local BM25 fails.
These demonstrate semantic search's ability to understand **intent and synonyms**.

| Query                             | Local Top Result                | Semantic Top Result              |
| --------------------------------- | ------------------------------- | -------------------------------- |
| "fire someone"                    | workable_get_job_recruiters     | factorial_terminate_employee     |
| "ping the team"                   | teamtailor_delete_team          | slack_send_message               |
| "file a new bug"                  | github_create_or_update_file    | jira_update_issue                |
| "ping my colleague"               | salesforce_get_my_events        | microsoftoutlook_reply_message   |
| "fetch staff information"         | pinpoint_get_application        | workday_list_workers             |
| "show me everyone in the company" | humaans_get_me                  | lattice_talent_list_users        |
| "turn down a job seeker"          | pinpoint_get_job_seeker         | jobadder_reject_requisition      |
| "check application status"        | dropbox_check_remove_member     | jobadder_list_application_status |
| "check my to-do list"             | jira_check_bulk_permissions     | todoist_list_tasks               |
| "start a group chat"              | microsoftteams_update_chat      | discord_create_group_dm          |
| "move candidate forward"          | workable_move_candidate         | greenhouse_move_application      |
| "approve PTO"                     | ashby_approve_offer             | planday_approve_absence_request  |
| "update staff record"             | bamboohr_update_hour_record     | cezannehr_update_employee        |
| "pull the org chart"              | github_create_issue_comment     | lattice_list_review_cycles       |
| "assign training to employee"     | easyllama_assign_training       | hibob_create_training_record     |
| "file a bug report"               | smartrecruiters_get_report_file | github_create_issue_comment      |
| "track customer interaction"      | qlik_create_interaction         | peoplefluent_track_launch        |
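
Several of these misses come down to vocabulary: a keyword scorer can only rank an action highly if the query shares tokens with it. The sketch below is a minimal Okapi BM25 scorer over action names only. It is an illustration under assumptions, not the benchmark's local scorer, whose BM25+TF-IDF combination and indexed text (names, descriptions, or both) are not shown here. The tokenizer and the `k1` and `b` defaults are likewise illustrative.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on any run of characters that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no lexical overlap, no contribution
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Three action names taken from the table above, used as a toy corpus.
actions = [
    "factorial_terminate_employee",
    "ashby_approve_offer",
    "planday_approve_absence_request",
]

# "fire" and "someone" appear in none of the names, so every score is zero.
fire_scores = bm25_scores("fire someone", actions)

# "approve" matches two names; BM25 length normalization favors the shorter one.
pto_scores = bm25_scores("approve PTO", actions)
print(fire_scores, pto_scores)
```

On this toy corpus, "fire someone" gives the scorer no signal at all, while "approve PTO" matches only on "approve" and cannot tell an offer approval from an absence approval.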

### Local Wins (7 tasks)

Tasks where BM25 keyword matching outperforms semantic search.

| Query                               | Local Top Result                       | Semantic Top Result           |
| ----------------------------------- | -------------------------------------- | ----------------------------- |
| "see who applied for the role"      | greenhouse_list_applied_candidate_tags | ashby_add_hiring_team_member  |
| "advance someone to the next round" | greenhouse_move_application            | factorial_invite_employee     |
| "see open positions"                | teamtailor_list_jobs                   | hibob_create_position_opening |
| "close a deal"                      | zohocrm_get_deal                       | shopify_close_order           |
| "check course completion"           | saba_delete_recurring_completion       | saba_get_course               |
| "update deal and notify team"       | zohocrm_get_deal                       | microsoftteams_update_team    |
| "look up customer"                  | linear_update_customer_need            | shopify_search_customers      |

### Both Miss (15 tasks)

Hard queries that neither method handles well. Many involve abbreviations or cross-domain concepts, or have overly strict expected matches.

| Query                       | Category  | Why Hard                                                       |
| --------------------------- | --------- | -------------------------------------------------------------- |
| "onboard a new team member" | hr        | "team member" maps to team tools, not HR                       |
| "OOO"                       | hr        | Abbreviation - neither understands                             |
| "DM someone"                | messaging | Both find discord_create_dm but expected pattern too strict    |
| "customer onboarding"       | crm       | Cross-domain concept                                           |
| "close quarter books"       | crm       | Domain-specific financial term                                 |
| "PTO request"               | hr        | Both find PTO tools but expected pattern mismatch              |
| "kill the ticket"           | project   | Both find delete_ticket but expected pattern mismatch          |
| "who works in engineering"  | hr        | Requires department filtering, not just listing                |
| "add a new prospect"        | crm       | Both find prospect tools but connector mismatch                |
| "see all shared files"      | documents | "shared" narrows scope too much                                |
| "see available trainings"   | lms       | Both find training tools but pattern mismatch                  |
| "track learning progress"   | lms       | Abstract concept mapping                                       |
| "create team workspace"     | messaging | Cross-domain: workspace vs channel                             |
| "log customer call"         | crm       | Connector-specific (Salesforce) term                           |
| "add new lead"              | crm       | Connector-specific (HubSpot) but returns wrong HubSpot actions |
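
The three outcome groups can be reconciled with the hit counts in the summary table. The check below assumes each of the 94 tasks falls into exactly one of four outcomes at the Hit@5 level (both hit, semantic only, local only, both miss). The semantic-only count of 17 is taken from the number of rows in the semantic wins table. The variable names are illustrative.

```python
TOTAL_TASKS = 94
SEMANTIC_WINS = 17  # semantic hit, local miss
LOCAL_WINS = 7      # local hit, semantic miss
BOTH_MISS = 15      # neither method hit

# Whatever remains must be tasks that both methods got right.
both_hit = TOTAL_TASKS - SEMANTIC_WINS - LOCAL_WINS - BOTH_MISS

local_hits = both_hit + LOCAL_WINS
semantic_hits = both_hit + SEMANTIC_WINS

print(f"both hit: {both_hit}, local hits: {local_hits}, semantic hits: {semantic_hits}")

# These match the 62/94 and 72/94 figures in the summary table.
assert (local_hits, semantic_hits) == (62, 72)
```

Under that assumption, 55 tasks are solved by both methods, and the net gain of 10 hits is the 17 semantic wins minus the 7 local wins.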

## How to Run

@@ -116,16 +116,16 @@ STACKONE_API_KEY=xxx uv run python tests/benchmark_search.py

94 tasks across 8 categories:

| Category           | Tasks | Description                                  |
| ------------------ | ----- | -------------------------------------------- |
| HR/HRIS            | 19    | Employee management, time off, org structure |
| Recruiting/ATS     | 12    | Candidates, applications, interviews         |
| CRM                | 12    | Contacts, deals, accounts                    |
| Project Management | 8     | Tasks, issues, projects                      |
| Messaging          | 5     | Messages, channels, conversations            |
| Documents          | 5     | Files, folders, drives                       |
| Marketing          | 5     | Campaigns, lists, automation                 |
| LMS                | 5     | Courses, assignments, completions            |

Plus per-connector tests (Slack, Jira, Greenhouse, Salesforce, HubSpot) and edge cases (abbreviations, slang, complex queries).

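
As a quick sanity check on the task counts, the per-category numbers in the table can be summed. The split below is an inference from the numbers rather than something stated here: it assumes the 94-task total includes the per-connector tests and edge cases, in which case those would account for whatever the eight categories do not cover.

```python
# Task counts copied from the category table.
category_tasks = {
    "HR/HRIS": 19,
    "Recruiting/ATS": 12,
    "CRM": 12,
    "Project Management": 8,
    "Messaging": 5,
    "Documents": 5,
    "Marketing": 5,
    "LMS": 5,
}

TOTAL_TASKS = 94
categorized = sum(category_tasks.values())
# Assumption: the remainder is the per-connector and edge-case tasks.
remaining = TOTAL_TASKS - categorized

print(f"categorized: {categorized}, remaining: {remaining}")
```

The eight categories sum to 71 tasks, leaving 23 of the 94 for the per-connector and edge-case groups under this reading.
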
@@ -138,6 +138,7 @@ Plus per-connector tests (Slack, Jira, Greenhouse, Salesforce, HubSpot) and edge
### Corpus

Both local and semantic search operate on the same action catalog:

- 5,144 unique actions
- 200+ connectors (BambooHR, Greenhouse, Salesforce, Slack, Jira, etc.)
- 7 verticals (HRIS, ATS, CRM, Documents, IAM, LMS, Marketing)