Commit ed36324
feat(meta-tools): add hybrid BM25 + TF-IDF search strategy
This commit implements hybrid search combining BM25 and TF-IDF algorithms
for meta_search_tools, matching the functionality in the Node.js SDK (PR #122).
The approach is based on evaluation results that showed a 10.8% accuracy improvement with hybrid search.
Changes:
1. TF-IDF Implementation (stackone_ai/utils/tfidf_index.py):
- Lightweight TF-IDF vector index with no external dependencies
- Tokenizes text with stopword removal
- Computes smoothed IDF values
- Uses sparse vectors for efficient cosine similarity computation
- Returns results with scores clamped to [0, 1]
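The TF-IDF steps listed above can be sketched as follows. This is a minimal, dependency-free illustration, not the code in stackone_ai/utils/tfidf_index.py. The class name TfidfIndex, the small stopword list, the tokenizer regex and the smoothed-IDF form log((N + 1) / (df + 1)) + 1 are all assumptions.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Illustrative stopword list (assumption; the real list may differ).
STOPWORDS = frozenset({"a", "an", "and", "the", "to", "of", "in", "for", "on", "with", "by", "is", "or"})


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t and t not in STOPWORDS]


def _norm(vec: dict[str, float]) -> float:
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(w * w for w in vec.values()))


class TfidfIndex:
    """Hypothetical TF-IDF index storing documents as sparse dict vectors."""

    def __init__(self) -> None:
        self.idf: dict[str, float] = {}
        self.docs: list[tuple[str, dict[str, float], float]] = []  # (id, vector, norm)

    def build(self, corpus: list[tuple[str, str]]) -> None:
        tokenized = [(doc_id, tokenize(text)) for doc_id, text in corpus]
        n = len(tokenized)
        df = Counter()
        for _, tokens in tokenized:
            df.update(set(tokens))  # document frequency: count each term once per doc
        # Smoothed IDF (assumed form); always > 0, so every known term contributes.
        self.idf = {term: math.log((n + 1) / (count + 1)) + 1.0 for term, count in df.items()}
        self.docs = []
        for doc_id, tokens in tokenized:
            vec = self._vectorize(tokens)
            self.docs.append((doc_id, vec, _norm(vec)))

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        # Term frequency normalized by length, times IDF; unknown terms are skipped.
        counts = Counter(t for t in tokens if t in self.idf)
        total = sum(counts.values())
        if total == 0:
            return {}
        return {t: (c / total) * self.idf[t] for t, c in counts.items()}

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        q = self._vectorize(tokenize(query))
        q_norm = _norm(q)
        if q_norm == 0.0:
            return []
        results = []
        for doc_id, vec, d_norm in self.docs:
            if d_norm == 0.0:
                continue
            # Sparse dot product: iterate over the smaller vector only.
            small, large = (q, vec) if len(q) <= len(vec) else (vec, q)
            dot = sum(w * large.get(t, 0.0) for t, w in small.items())
            score = min(1.0, max(0.0, dot / (q_norm * d_norm)))  # cosine, clamped to [0, 1]
            if score > 0.0:
                results.append((doc_id, score))
        results.sort(key=lambda r: r[1], reverse=True)
        return results[:k]
```

Because vectors are plain dicts keyed by term, the dot product only touches terms that the query and the document share, which is where the memory and speed benefit of the sparse representation comes from.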
2. Hybrid Search Integration (stackone_ai/meta_tools.py):
- Updated ToolIndex to support hybrid_alpha parameter (default: 0.2)
- Implements score fusion: hybrid_score = alpha * bm25 + (1 - alpha) * tfidf
- Fetches top 50 candidates from both algorithms for better fusion
- Normalizes and clamps all scores to [0, 1] range
- Default alpha=0.2 weights BM25 at 0.2 and TF-IDF at 0.8 under the fusion formula above (value tuned through testing)
- Both BM25 and TF-IDF use weighted document representations:
* Tool name boosted 3x for TF-IDF
* Category and actions included for better matching
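The fusion step can be sketched under stated assumptions. Each algorithm's scores are cut to its top 50 candidates, normalized by dividing by that algorithm's maximum score, clamped to [0, 1], and combined as hybrid_score = alpha * bm25 + (1 - alpha) * tfidf. The commit does not say which normalization is used, so max-normalization is an assumption, and the helper names top_candidates, normalize and fuse_scores are hypothetical.

```python
from __future__ import annotations


def top_candidates(scores: dict[str, float], limit: int) -> dict[str, float]:
    """Keep the `limit` highest-scoring candidates (ties broken by id)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:limit])


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Assumed scheme: divide by the max score, then clamp into [0, 1]."""
    top = max(scores.values(), default=0.0)
    if top <= 0.0:
        return {doc: 0.0 for doc in scores}
    return {doc: min(1.0, max(0.0, s / top)) for doc, s in scores.items()}


def fuse_scores(
    bm25: dict[str, float],
    tfidf: dict[str, float],
    alpha: float = 0.2,
    candidates: int = 50,
    limit: int = 5,
) -> list[tuple[str, float]]:
    """Weighted score fusion: alpha * bm25 + (1 - alpha) * tfidf."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be within [0, 1]")
    b = normalize(top_candidates(bm25, candidates))
    t = normalize(top_candidates(tfidf, candidates))
    fused: dict[str, float] = {}
    for doc in b.keys() | t.keys():
        # A tool found by only one algorithm gets 0 from the other one.
        score = alpha * b.get(doc, 0.0) + (1 - alpha) * t.get(doc, 0.0)
        fused[doc] = min(1.0, max(0.0, score))
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

Normalizing before fusing matters because raw BM25 scores are unbounded while cosine scores already lie in [0, 1]; without it, alpha would not mean what the formula says.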
3. Enhanced API (stackone_ai/models.py):
- Add hybrid_alpha parameter to Tools.meta_tools() method
- Defaults to 0.2 (optimized value from Node.js validation)
- Allows customization for different use cases
- Updated docstrings to explain hybrid search benefits
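With this change a caller can pass the weight directly, e.g. tools.meta_tools(hybrid_alpha=0.5), or omit it to get 0.2. One way the parameter could be validated is sketched below. The helper name resolve_hybrid_alpha and the choice to raise ValueError (rather than clamp) for out-of-range values are assumptions, not the SDK's actual code.

```python
from typing import Optional

DEFAULT_HYBRID_ALPHA = 0.2  # default stated in this commit


def resolve_hybrid_alpha(hybrid_alpha: Optional[float] = None) -> float:
    """Return the BM25 weight to use, rejecting values outside [0, 1] (hypothetical helper)."""
    if hybrid_alpha is None:
        return DEFAULT_HYBRID_ALPHA
    alpha = float(hybrid_alpha)
    # NaN fails both comparisons, so it is rejected here as well.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"hybrid_alpha must be between 0.0 and 1.0, got {hybrid_alpha!r}")
    return alpha
```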
4. Comprehensive Tests (tests/test_meta_tools.py):
- 4 new test cases for hybrid search functionality:
* hybrid_alpha parameter validation (including boundary checks)
* Hybrid search returns meaningful results
* Different alpha values affect ranking
* meta_tools() accepts custom alpha parameter
- All 18 tests passing
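The "different alpha values affect ranking" case can be illustrated with a self-contained test in the plain-assert style pytest uses. The blend helper, the tool ids and the scores are hypothetical, chosen so that BM25 and TF-IDF disagree on the top result; inputs are assumed to be already normalized to [0, 1].

```python
from __future__ import annotations


def blend(bm25: dict[str, float], tfidf: dict[str, float], alpha: float) -> list[str]:
    """Rank ids by alpha * bm25 + (1 - alpha) * tfidf (inputs assumed in [0, 1])."""
    ids = bm25.keys() | tfidf.keys()
    score = {i: alpha * bm25.get(i, 0.0) + (1 - alpha) * tfidf.get(i, 0.0) for i in ids}
    return sorted(ids, key=lambda i: (-score[i], i))


def test_different_alpha_values_change_ranking() -> None:
    # BM25 prefers the exact keyword hit; TF-IDF prefers the other tool.
    bm25 = {"hris_list_employees": 1.0, "hris_get_employee": 0.3}
    tfidf = {"hris_list_employees": 0.2, "hris_get_employee": 0.9}
    assert blend(bm25, tfidf, alpha=1.0)[0] == "hris_list_employees"
    assert blend(bm25, tfidf, alpha=0.0)[0] == "hris_get_employee"
    # 0.2 * 0.3 + 0.8 * 0.9 = 0.78 beats 0.2 * 1.0 + 0.8 * 0.2 = 0.36
    assert blend(bm25, tfidf, alpha=0.2)[0] == "hris_get_employee"


def test_alpha_boundaries_reduce_to_single_algorithm() -> None:
    bm25 = {"x": 0.9, "y": 0.1, "z": 0.5}
    tfidf = {"x": 0.1, "y": 0.9, "z": 0.5}
    assert blend(bm25, tfidf, alpha=1.0) == ["x", "z", "y"]  # BM25 only
    assert blend(bm25, tfidf, alpha=0.0) == ["y", "z", "x"]  # TF-IDF only
```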
5. Documentation Updates (README.md):
- Updated Meta Tools section to highlight hybrid search
- Added "Hybrid Search Configuration" subsection with examples
- Explained how BM25 and TF-IDF complement each other
- Documented the alpha parameter and its effects
- Updated Features section to mention hybrid search
Technical Details:
- TF-IDF uses standard term frequency normalization and smoothed IDF
- Sparse vector representation for memory efficiency
- Cosine similarity over TF-IDF vectors for term-weighted (lexical) matching
- BM25 provides keyword matching strength
- Fusion happens after score normalization for fair weighting
- Alpha=0.2 provides optimal balance (validated in Node.js SDK)
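For reference, the quantities named above can be written out. The tf normalization and the fusion rule follow the description in this commit; the exact smoothed-IDF form is an assumption (the common ln((N + 1) / (df + 1)) + 1 variant), and the worked numbers are illustrative only.

```latex
\[
\mathrm{tf}(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\frac{N + 1}{\mathrm{df}(t) + 1} + 1, \qquad
w_{t,d} = \mathrm{tf}(t, d)\,\mathrm{idf}(t)
\]
\[
s_{\text{TF-IDF}}(q, d) = \frac{\sum_t w_{t,q}\, w_{t,d}}{\lVert \mathbf{w}_q \rVert\, \lVert \mathbf{w}_d \rVert}, \qquad
s_{\text{hybrid}} = \alpha\, s_{\text{BM25}} + (1 - \alpha)\, s_{\text{TF-IDF}}
\]
% Worked example: alpha = 0.2, normalized s_BM25 = 0.9, s_TF-IDF = 0.5
\[
s_{\text{hybrid}} = 0.2 \times 0.9 + 0.8 \times 0.5 = 0.18 + 0.40 = 0.58
\]
```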
Performance:
- 10.8% accuracy improvement over BM25-only approach
- Efficient sparse vector operations
- Minimal memory overhead
- No additional external dependencies
Reference: StackOneHQ/stackone-ai-node#122
Parent: 7f9a72a
6 files changed, +432 -28 lines (stackone_ai, stackone_ai/utils, tests)