14 Jan 01:21

HCYT

Immutable

v4.0.0a9

9ccc7b2

4.0.0a9 Latest

Changelog

All notable changes to this project will be documented in this file.

[4.0.0a9] - 2026-01-14

🚀 Major Rewrite (The Rust Era)

Rust Core: The entire core logic has been rewritten in Rust (flashtext-rs), improving performance and providing memory safety.
Performance: Throughput increased by 3x-4x compared to v3.0 (Python). Match latency is now near-constant regardless of keyword count.
Drop-in Compatible: 100% API compatibility with the original FlashText and v3.x series.

Added

True Unicode Boundaries: Fixed the long-standing issue where non-ASCII characters (e.g., é, ß, adjacent CJK characters) were incorrectly treated as delimiters. The Rust unicode-segmentation crate now handles word boundaries correctly for all languages.
Universal Wheels: Pre-compiled binary wheels for macOS (Intel/Silicon), Windows (x64), Linux (x86_64/aarch64), and Musl Linux (Alpine). No Rust compiler needed for users.
JSON File Loading: Native support for loading keywords from JSON files for faster startup.
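
To make the boundary fix described above concrete, here is a minimal pure-Python sketch. It is not the library's Rust implementation; the helper names are hypothetical. It contrasts an ASCII-only word-character set, under which é or ß acts as a delimiter, with a Unicode-aware check based on `str.isalnum()`.

```python
import string

# ASCII-only set of "word" characters; anything else acts as a delimiter.
ASCII_WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def is_word_char_ascii(ch: str) -> bool:
    """ASCII-only check: 'é' and 'ß' are wrongly seen as delimiters."""
    return ch in ASCII_WORD_CHARS


def is_word_char_unicode(ch: str) -> bool:
    """Unicode-aware check: any letter or digit counts as a word character."""
    return ch.isalnum() or ch == "_"


def has_match_at_boundary(text: str, keyword: str, is_word_char) -> bool:
    """Return True if keyword occurs in text with a non-word character
    (or the edge of the text) on both sides."""
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left_ok = start == 0 or not is_word_char(text[start - 1])
        right_ok = end == len(text) or not is_word_char(text[end])
        if left_ok and right_ok:
            return True
        start = text.find(keyword, start + 1)
    return False
```

With the ASCII-only check, the keyword "caf" is reported inside "café" because é is taken for a delimiter; the Unicode-aware check rejects it. CJK text, which has no spaces between words, needs segmentation rules that this sketch does not cover.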

Changed

Packaging: Migrated build system to maturin + pyo3.
Minimum Python: Now requires Python >= 3.8.

13 Jan 13:04

HCYT

Immutable

3.1.1

935cc01

3.1.1

[3.1.1] - 2026-01-13

Refactoring (Architecture 3.0)

Modularization: Split monolithic keyword.py into distinct responsibilities:
- flashtext/keyword.py: High-level API and facade.
- flashtext/trie_dict.py: Data structure operations (pure functions).
- flashtext/utils.py: Algorithms (Levenshtein) and helper utilities.
Utils: Extracted extract_sentences and levensthein to utils.py to reduce class weight.
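
For readers unfamiliar with the Levenshtein helper mentioned above, the following is a minimal sketch of the classic dynamic-programming edit distance. It is an illustration only: the names `edit_distance` and `within_cost` are hypothetical, and the library's own `levensthein` utility may work differently (for example, by walking the trie) and have a different signature.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # At the start of row i, previous[j] is the distance between
    # a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def within_cost(word: str, keyword: str, max_cost: int) -> bool:
    """True if word is at most max_cost edits away from keyword."""
    return edit_distance(word, keyword) <= max_cost
```

Because Python strings are sequences of code points, the same function works on CJK text: a single differing character counts as one substitution.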

Performance

Loop Optimization: Optimized extract_keywords hot loop by caching member variables and reducing object creation overhead.
Benchmark: Performance restored to ~0.27s (Case-Sensitive) / 0.29s (Case-Insensitive) on the standard corpus.
Reverted: "Internationalized Word Boundaries" (Issue #4) was reverted due to a 3.5x performance regression. The issue has been reopened for a future optimized implementation.

Added

Mixed Case Support: Added ability to mix case-sensitive and case-insensitive keywords in the same processor.
- Implemented via Multi-Edge Trie (Space-for-Time), removing runtime lower() calls.
Fuzzy Matching Support: Added max_cost parameter to support Levenshtein distance matching (including CJK support).
Keyword Count API: New len(keyword_processor) support to get total unique terms.
Replacement Metadata: replace_keywords now supports span_info=True to return detailed replacement records.
Sentence Extraction: New extract_sentences() API to find sentences containing keywords.
Clean Name Mapping: add_keyword now accepts a list of clean names (Issue #11).
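
The space-for-time idea behind Mixed Case Support can be sketched as follows. This is one possible reading of the technique, not the project's actual data structure; the class `MixedCaseTrie` and its methods are hypothetical. Case-insensitive keywords store an edge for both the lower- and upper-case form of each character, so lookups never call `lower()` on the input.

```python
class MixedCaseTrie:
    """Toy keyword store mixing case-sensitive and case-insensitive entries."""

    _END = object()  # sentinel key marking the end of a keyword

    def __init__(self) -> None:
        self._exact: dict = {}   # edges for case-sensitive keywords
        self._folded: dict = {}  # edges for case-insensitive keywords

    def add(self, keyword: str, case_sensitive: bool = True) -> None:
        node = self._exact if case_sensitive else self._folded
        for ch in keyword:
            if case_sensitive:
                node = node.setdefault(ch, {})
                continue
            # Store one shared child under every single-character case
            # variant: extra edges in memory, no lower() at lookup time.
            variants = {v for v in (ch, ch.lower(), ch.upper()) if len(v) == 1}
            child = next((node[v] for v in variants if v in node), None)
            if child is None:
                child = {}
            for v in variants:
                node[v] = child
            node = child
        node[self._END] = keyword

    def __contains__(self, word: str) -> bool:
        return any(self._walk(root, word) for root in (self._exact, self._folded))

    def _walk(self, node: dict, word: str) -> bool:
        # Follow one edge per input character, exactly as written.
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return self._END in node
```

After `add("Python")` and `add("rust", case_sensitive=False)`, the membership test accepts "Python", "RUST", and "RuSt" but rejects "python". Characters whose case mapping expands to several characters (such as ß) are only stored in their single-character forms in this toy version.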

Fixed

CJK Support: Fixed adjacent keyword extraction for Chinese/Japanese/Korean text (Issue #1).
Unicode Spans: Fixed inaccurate span positions when handling Unicode characters that change length during case folding (Issue #2).
Edge Cases:
- Fixed behavior when removing characters from non_word_boundaries (Issue #10).
- Fixed replace_keywords with empty boundary sets (Issue #3).
Platform: Verified Linux aarch64 support (Pure Python).
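
The Unicode span issue above arises because lower-casing can change a string's length. The sketch below shows the drift with the Turkish İ (U+0130), whose lower-case form is two code points in Python, and one way to map match offsets back to the original text. It is illustrative only; `lower_with_offsets` and `find_span` are hypothetical helpers, not part of the library's API.

```python
def lower_with_offsets(text: str):
    """Lower-case text character by character and record, for each output
    character, the index of the original character it came from."""
    lowered_chars = []
    origin = []
    for index, ch in enumerate(text):
        for low in ch.lower():
            lowered_chars.append(low)
            origin.append(index)
    return "".join(lowered_chars), origin


def find_span(text: str, keyword: str):
    """Case-insensitive search that returns a (start, end) span measured
    in the ORIGINAL text, or None if the keyword is absent."""
    lowered, origin = lower_with_offsets(text)
    # Lower the keyword the same way so both sides stay consistent.
    needle = "".join(ch.lower() for ch in keyword)
    pos = lowered.find(needle)
    if pos == -1 or not needle:
        return None
    start = origin[pos]
    end = origin[pos + len(needle) - 1] + 1
    return start, end
```

A naive `text.lower().find(...)` on "İstanbul and Ankara" reports an offset one position too far to the right, because the lowered string is one character longer; `find_span` returns a span that slices the original text correctly.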

Documentation

Added CONTRIBUTING.md with strict performance guidelines.
Added benchmark.py for standardized performance testing.
Updated README.md with new features and benchmark results.

13 Jan 10:36

HCYT

Immutable

3.0.0

11d6ea7

v3.0.0 - Internationalization Fixes

🎉 flashtext-i18n v3.0.0

This is the first release of the i18n-focused fork of flashtext, with fixes for internationalization (CJK, Unicode) issues.

✨ Bug Fixes

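#1 CJK + Numbers: Chinese keywords followed by digits are now extracted correctly
- "地中海贫血2" → ["地中海贫血"] ✅
#2 Unicode Span: Unicode case conversion no longer causes incorrect span positions
- Special characters such as the Turkish İ are now handled correctly
#3 Adjacent Keywords: replace_keywords now works correctly for adjacent keywords
- Rewritten on top of extract_keywords, which is simpler and more reliable
#10 Custom Boundaries: Characters removed from non_word_boundaries can now be matched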

📦 Installation

pip install flashtext-i18n

🔄 Migration from flashtext

# Before
from flashtext import KeywordProcessor
# After (drop-in replacement)
from flashtext import KeywordProcessor  # Same import, different package

⚠️ Breaking Changes
None - API 100% compatible with flashtext 2.x

🙏 Credits
Original flashtext by Vikash Singh
Fork maintained by termdock & Huang Chung Yi

Releases: termdock/flashtext-i18n