Commit be6db2a

Fix the lint in the benchmark file
1 parent 981f912 commit be6db2a

1 file changed

Lines changed: 61 additions & 60 deletions

tests/BENCHMARK_RESULTS.md

@@ -9,11 +9,11 @@

 ## Summary

-| Method | Hit@5 | MRR | Avg Latency | Hits |
-|--------|-------|-----|-------------|------|
-| Local BM25+TF-IDF | 66.0% | 0.538 | 1.2ms | 62/94 |
-| Semantic Search | 76.6% | 0.634 | 279.6ms | 72/94 |
-| **Improvement** | **+10.6%** | **+0.096** | | **+10** |
+| Method            | Hit@5      | MRR        | Avg Latency | Hits    |
+| ----------------- | ---------- | ---------- | ----------- | ------- |
+| Local BM25+TF-IDF | 66.0%      | 0.538      | 1.2ms       | 62/94   |
+| Semantic Search   | 76.6%      | 0.634      | 279.6ms     | 72/94   |
+| **Improvement**   | **+10.6%** | **+0.096** |             | **+10** |
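
Note on the metrics in the Summary table above: the file does not show how Hit@5 and MRR are computed, so the following is only a minimal sketch using the standard definitions. The function names and the example ranks are hypothetical, and applying MRR without a rank cutoff is an assumption; the benchmark script may do it differently.

```python
from typing import Optional, Sequence


def hit_at_k(ranks: Sequence[Optional[int]], k: int = 5) -> float:
    """Fraction of tasks whose first correct result appears in the top k."""
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all tasks; a task with no correct result adds 0."""
    return sum(1.0 / rank for rank in ranks if rank is not None) / len(ranks)


# Hypothetical ranks for five tasks (1-based; None means no correct result).
example_ranks = [1, 3, None, 2, 6]
print(hit_at_k(example_ranks))
print(mean_reciprocal_rank(example_ranks))

# The Hit@5 column follows from the Hits column: 62/94 and 72/94.
local_hit_rate = round(100 * 62 / 94, 1)
semantic_hit_rate = round(100 * 72 / 94, 1)
print(local_hit_rate, semantic_hit_rate)
```

Under these definitions the Hits column alone reproduces the reported 66.0% and 76.6%, while MRR also depends on where in the top results the correct action lands.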

 ## Detailed Breakdown

@@ -22,61 +22,61 @@
 Tasks where semantic search finds the correct result but local BM25 fails.
 These demonstrate semantic search's ability to understand **intent and synonyms**.

-| Query | Local Top Result | Semantic Top Result |
-|-------|-----------------|-------------------|
-| "fire someone" | workable_get_job_recruiters | factorial_terminate_employee |
-| "ping the team" | teamtailor_delete_team | slack_send_message |
-| "file a new bug" | github_create_or_update_file | jira_update_issue |
-| "ping my colleague" | salesforce_get_my_events | microsoftoutlook_reply_message |
-| "fetch staff information" | pinpoint_get_application | workday_list_workers |
-| "show me everyone in the company" | humaans_get_me | lattice_talent_list_users |
-| "turn down a job seeker" | pinpoint_get_job_seeker | jobadder_reject_requisition |
-| "check application status" | dropbox_check_remove_member | jobadder_list_application_status |
-| "check my to-do list" | jira_check_bulk_permissions | todoist_list_tasks |
-| "start a group chat" | microsoftteams_update_chat | discord_create_group_dm |
-| "move candidate forward" | workable_move_candidate | greenhouse_move_application |
-| "approve PTO" | ashby_approve_offer | planday_approve_absence_request |
-| "update staff record" | bamboohr_update_hour_record | cezannehr_update_employee |
-| "pull the org chart" | github_create_issue_comment | lattice_list_review_cycles |
-| "assign training to employee" | easyllama_assign_training | hibob_create_training_record |
-| "file a bug report" | smartrecruiters_get_report_file | github_create_issue_comment |
-| "track customer interaction" | qlik_create_interaction | peoplefluent_track_launch |
+| Query                             | Local Top Result                | Semantic Top Result              |
+| --------------------------------- | ------------------------------- | -------------------------------- |
+| "fire someone"                    | workable_get_job_recruiters     | factorial_terminate_employee     |
+| "ping the team"                   | teamtailor_delete_team          | slack_send_message               |
+| "file a new bug"                  | github_create_or_update_file    | jira_update_issue                |
+| "ping my colleague"               | salesforce_get_my_events        | microsoftoutlook_reply_message   |
+| "fetch staff information"         | pinpoint_get_application        | workday_list_workers             |
+| "show me everyone in the company" | humaans_get_me                  | lattice_talent_list_users        |
+| "turn down a job seeker"          | pinpoint_get_job_seeker         | jobadder_reject_requisition      |
+| "check application status"        | dropbox_check_remove_member     | jobadder_list_application_status |
+| "check my to-do list"             | jira_check_bulk_permissions     | todoist_list_tasks               |
+| "start a group chat"              | microsoftteams_update_chat      | discord_create_group_dm          |
+| "move candidate forward"          | workable_move_candidate         | greenhouse_move_application      |
+| "approve PTO"                     | ashby_approve_offer             | planday_approve_absence_request  |
+| "update staff record"             | bamboohr_update_hour_record     | cezannehr_update_employee        |
+| "pull the org chart"              | github_create_issue_comment     | lattice_list_review_cycles       |
+| "assign training to employee"     | easyllama_assign_training       | hibob_create_training_record     |
+| "file a bug report"               | smartrecruiters_get_report_file | github_create_issue_comment      |
+| "track customer interaction"      | qlik_create_interaction         | peoplefluent_track_launch        |

 ### Local Wins (7 tasks)

 Tasks where BM25 keyword matching outperforms semantic search.

-| Query | Local Top Result | Semantic Top Result |
-|-------|-----------------|-------------------|
-| "see who applied for the role" | greenhouse_list_applied_candidate_tags | ashby_add_hiring_team_member |
-| "advance someone to the next round" | greenhouse_move_application | factorial_invite_employee |
-| "see open positions" | teamtailor_list_jobs | hibob_create_position_opening |
-| "close a deal" | zohocrm_get_deal | shopify_close_order |
-| "check course completion" | saba_delete_recurring_completion | saba_get_course |
-| "update deal and notify team" | zohocrm_get_deal | microsoftteams_update_team |
-| "look up customer" | linear_update_customer_need | shopify_search_customers |
+| Query                               | Local Top Result                       | Semantic Top Result           |
+| ----------------------------------- | -------------------------------------- | ----------------------------- |
+| "see who applied for the role"      | greenhouse_list_applied_candidate_tags | ashby_add_hiring_team_member  |
+| "advance someone to the next round" | greenhouse_move_application            | factorial_invite_employee     |
+| "see open positions"                | teamtailor_list_jobs                   | hibob_create_position_opening |
+| "close a deal"                      | zohocrm_get_deal                       | shopify_close_order           |
+| "check course completion"           | saba_delete_recurring_completion       | saba_get_course               |
+| "update deal and notify team"       | zohocrm_get_deal                       | microsoftteams_update_team    |
+| "look up customer"                  | linear_update_customer_need            | shopify_search_customers      |

 ### Both Miss (15 tasks)

 Hard queries that neither method handles well. Many are abbreviations, cross-domain concepts, or have overly strict expected matches.

-| Query | Category | Why Hard |
-|-------|----------|----------|
-| "onboard a new team member" | hr | "team member" maps to team tools, not HR |
-| "OOO" | hr | Abbreviation - neither understands |
-| "DM someone" | messaging | Both find discord_create_dm but expected pattern too strict |
-| "customer onboarding" | crm | Cross-domain concept |
-| "close quarter books" | crm | Domain-specific financial term |
-| "PTO request" | hr | Both find PTO tools but expected pattern mismatch |
-| "kill the ticket" | project | Both find delete_ticket but expected pattern mismatch |
-| "who works in engineering" | hr | Requires department filtering, not just listing |
-| "add a new prospect" | crm | Both find prospect tools but connector mismatch |
-| "see all shared files" | documents | "shared" narrows scope too much |
-| "see available trainings" | lms | Both find training tools but pattern mismatch |
-| "track learning progress" | lms | Abstract concept mapping |
-| "create team workspace" | messaging | Cross-domain: workspace vs channel |
-| "log customer call" | crm | Connector-specific (Salesforce) term |
-| "add new lead" | crm | Connector-specific (HubSpot) but returns wrong HubSpot actions |
+| Query                       | Category  | Why Hard                                                       |
+| --------------------------- | --------- | -------------------------------------------------------------- |
+| "onboard a new team member" | hr        | "team member" maps to team tools, not HR                       |
+| "OOO"                       | hr        | Abbreviation - neither understands                             |
+| "DM someone"                | messaging | Both find discord_create_dm but expected pattern too strict    |
+| "customer onboarding"       | crm       | Cross-domain concept                                           |
+| "close quarter books"       | crm       | Domain-specific financial term                                 |
+| "PTO request"               | hr        | Both find PTO tools but expected pattern mismatch              |
+| "kill the ticket"           | project   | Both find delete_ticket but expected pattern mismatch          |
+| "who works in engineering"  | hr        | Requires department filtering, not just listing                |
+| "add a new prospect"        | crm       | Both find prospect tools but connector mismatch                |
+| "see all shared files"      | documents | "shared" narrows scope too much                                |
+| "see available trainings"   | lms       | Both find training tools but pattern mismatch                  |
+| "track learning progress"   | lms       | Abstract concept mapping                                       |
+| "create team workspace"     | messaging | Cross-domain: workspace vs channel                             |
+| "log customer call"         | crm       | Connector-specific (Salesforce) term                           |
+| "add new lead"              | crm       | Connector-specific (HubSpot) but returns wrong HubSpot actions |
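
Note on the breakdown tables above: they appear to split tasks by which method hit, and the counts can be cross-checked against the Summary figures. This is a hedged sketch, not the benchmark's own code. The `classify` and `tally` helpers and the "both hit" bucket are hypothetical, and it assumes a "win" means one method had a Hit@5 while the other did not.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify(local_hit: bool, semantic_hit: bool) -> str:
    """Name the outcome bucket for one task (bucket names are illustrative)."""
    if local_hit and semantic_hit:
        return "both_hit"
    if semantic_hit:
        return "semantic_win"
    if local_hit:
        return "local_win"
    return "both_miss"


def tally(results: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count tasks per bucket from (local_hit, semantic_hit) pairs."""
    return Counter(classify(local, semantic) for local, semantic in results)


# Counts read off the tables: 17 semantic-win rows, 7 local wins, 15 both miss.
semantic_wins, local_wins, both_miss, total = 17, 7, 15, 94
both_hit = total - semantic_wins - local_wins - both_miss

# Each method's hits should equal the shared hits plus its own wins.
local_hits = both_hit + local_wins
semantic_hits = both_hit + semantic_wins
print(both_hit, local_hits, semantic_hits)
```

With those counts the remaining tasks are hit by both methods, and the per-method totals line up with the 62/94 and 72/94 reported in the Summary.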

 ## How to Run

@@ -116,16 +116,16 @@ STACKONE_API_KEY=xxx uv run python tests/benchmark_search.py

 94 tasks across 8 categories:

-| Category | Tasks | Description |
-|----------|-------|-------------|
-| HR/HRIS | 19 | Employee management, time off, org structure |
-| Recruiting/ATS | 12 | Candidates, applications, interviews |
-| CRM | 12 | Contacts, deals, accounts |
-| Project Management | 8 | Tasks, issues, projects |
-| Messaging | 5 | Messages, channels, conversations |
-| Documents | 5 | Files, folders, drives |
-| Marketing | 5 | Campaigns, lists, automation |
-| LMS | 5 | Courses, assignments, completions |
+| Category           | Tasks | Description                                  |
+| ------------------ | ----- | -------------------------------------------- |
+| HR/HRIS            | 19    | Employee management, time off, org structure |
+| Recruiting/ATS     | 12    | Candidates, applications, interviews         |
+| CRM                | 12    | Contacts, deals, accounts                    |
+| Project Management | 8     | Tasks, issues, projects                      |
+| Messaging          | 5     | Messages, channels, conversations            |
+| Documents          | 5     | Files, folders, drives                       |
+| Marketing          | 5     | Campaigns, lists, automation                 |
+| LMS                | 5     | Courses, assignments, completions            |

 Plus per-connector tests (Slack, Jira, Greenhouse, Salesforce, HubSpot) and edge cases (abbreviations, slang, complex queries).

@@ -138,6 +138,7 @@ Plus per-connector tests (Slack, Jira, Greenhouse, Salesforce, HubSpot) and edge
 ### Corpus

 Both local and semantic search operate on the same action catalog:
+
 - 5,144 unique actions
 - 200+ connectors (BambooHR, Greenhouse, Salesforce, Slack, Jira, etc.)
 - 7 verticals (HRIS, ATS, CRM, Documents, IAM, LMS, Marketing)
