2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
<!--
Thank you for contributing to keyvi!

Before submission, please ensure that you have read and agree to our
contributor guidelines: https://github.com/KeyviDev/keyvi/blob/master/CONTRIBUTING.md.

Please delete these lines.
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -28,7 +28,9 @@ repos:
- id: check-toml
- id: check-yaml
- id: end-of-file-fixer
exclude: '^(.*\.svg)$'
- id: trailing-whitespace
exclude: '^(.*\.svg)$'
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: "v0.14.0"
hooks:
1 change: 0 additions & 1 deletion LICENSE
@@ -199,4 +199,3 @@
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

4 changes: 2 additions & 2 deletions doc/RELEASE_PROCESS.md
@@ -10,15 +10,15 @@ Create a release branch called `release-X.Y.Z`
- Commit to `release-X.Y.Z` and push it to https://github.com/KeyviDev/keyvi/
- Wait for CI to build all targets

### Create tag
- Draft a new release tagged vX.Y.Z with `release-X.Y.Z` as the target branch
- Add the release notes in the description with references to PRs
- Publish release

## On the `master` branch

### Update the `python/setup.py` file
- Update to the next release version
```
VERSION_MAJOR = X
VERSION_MINOR = Y
79 changes: 39 additions & 40 deletions doc/algorithm/Construction-Basics.md
@@ -1,6 +1,6 @@
## Introduction

“An automaton (plural: automata) is a self-operating machine. The word is sometimes used to describe a robot, more
specifically an autonomous robot. Used colloquially, it refers to a mindless follower.” (Wikipedia)

### Minimal Acyclic Finite State Automata
@@ -19,7 +19,7 @@ Minimizing yields the FSA:
keyvi uses so-called "incremental construction"; the alternative is non-incremental algorithms. If you are curious, there are
some instructional classes available on YouTube.

keyvi is only about text/string automata. There are other use cases for finite state techniques, e.g. modeling
real control flows.

### Incremental Construction by Watson/Daciuk
@@ -34,19 +34,19 @@ real control flows.
        replace_or_register(LastState)
        add_suffix(LastState, CurrentSuffix)
    replace_or_register(q0)

    func replace_or_register(State):
        Child = last_child(State)
        if has_children(Child):
            replace_or_register(Child)
        if Register.find(Child):
            last_child = Register[Child]
        else:
            Register.add(Child)

![Construction example Daciuk/Watson](/doc/images/daciuk_watson.png)
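The pseudocode above can be made concrete. The following is a minimal, runnable Python sketch of the Daciuk/Watson incremental construction (all names are illustrative; keyvi's actual implementation is C++):

```python
class State:
    """One automaton state: a 'final' flag plus outgoing transitions."""
    def __init__(self):
        self.final = False
        self.children = {}  # label -> State

    def signature(self):
        # Two states are equal iff they have the same right language;
        # comparing (final, sorted transitions) bottom-up is enough.
        return (self.final,
                tuple(sorted((c, id(s)) for c, s in self.children.items())))


def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def build_dawg(sorted_words):
    """Daciuk/Watson incremental construction from sorted input."""
    register = {}  # signature -> canonical State
    root = State()

    def replace_or_register(state, suffix):
        if not suffix:
            return
        child = state.children[suffix[0]]
        replace_or_register(child, suffix[1:])         # minimize bottom-up
        sig = child.signature()
        if sig in register:
            state.children[suffix[0]] = register[sig]  # replace
        else:
            register[sig] = child                      # register

    previous = ""
    for word in sorted_words:
        p = common_prefix_len(previous, word)
        node = root
        for c in word[:p]:                       # same path as previous[:p]
            node = node.children[c]
        replace_or_register(node, previous[p:])  # previous word's tail is done
        for c in word[p:]:                       # add_suffix
            nxt = State()
            node.children[c] = nxt
            node = nxt
        node.final = True
        previous = word
    replace_or_register(root, previous)
    return root


def accepts(root, word):
    node = root
    for c in word:
        node = node.children.get(c)
        if node is None:
            return False
    return node.final
```

For `["abc", "abd", "bbc", "bbd"]` this yields a 4-state automaton: the `c`/`d` suffix states and the two identical branch states are shared.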

### Incremental Construction in Keyvi

#### Algorithm

Expand All @@ -55,17 +55,17 @@ real control flows.
    while (another word):
        new_word = next word in lexicographic order
        common_prefix = common_prefix(current_word, new_word)
        feed_stack(current_word, length(common_prefix), length(current_word))
        consume_stack(length(common_prefix))
        current_word = new_word
    feed_stack(current_word, 0, length(current_word))
    consume_stack(0)

    func feed_stack(word, begin, end):
        for (i = begin; i < end; ++i):
            unpacked_state_stack.insert(i, word[i])
        unpacked_state_stack.insert(end, "final")

    func consume_stack(end):
        while (highest_stack > end):
            stack_entry = unpacked_state_stack.pop(highest_stack)
@@ -75,9 +75,9 @@ real control flows.

Code:
General entry point: [generator](/keyvi/src/cpp/dictionary/fsa/generator.h)

Unpacked_State_Stack: [unpacked_state_stack](/keyvi/src/cpp/dictionary/fsa/internal/unpacked_state_stack.h)
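To see what the feed/consume loop does, here is a toy Python simulation (not keyvi's code) that records the depth at which each unpacked state is consumed, i.e. persisted. Deeper suffix states are always consumed before shallower ones, and the root state (depth 0) comes last:

```python
def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def consume_order(sorted_words):
    """Simulate the feed/consume loop; return the depths at which
    unpacked states are consumed, in consumption order."""
    consumed = []  # depths, in persist order
    stack = []     # stack[i] ~ unpacked state at depth i

    def feed_stack(word, begin, end):
        while len(stack) < end:
            stack.append(None)
        for i in range(begin, end):
            stack[i] = (i, word[i])  # depth + outgoing transition label

    def consume_stack(end):
        while len(stack) > end:
            consumed.append(stack.pop()[0])

    current = ""
    for new_word in sorted_words:
        p = common_prefix_len(current, new_word)
        feed_stack(current, p, len(current))
        consume_stack(p)
        current = new_word
    feed_stack(current, 0, len(current))
    consume_stack(0)
    return consumed
```

For the input `["aabc", "aabde", "abde"]` the consumption order is `[3, 4, 3, 2, 1, 3, 2, 1, 0]` — each run of pops walks from the deepest state back toward the root, and the root is written at the very end.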

#### Illustration

Building a tiny automaton containing just 4 strings:
@@ -88,47 +88,47 @@ Building a tiny automata containing just 4 strings:
abe

## Step1

![Step1](/doc/images/construction_step1.png)

## Step2

![Step2](/doc/images/construction_step2.png)

## Step3

![Step3](/doc/images/construction_step3.png)

## Step4

![Step4](/doc/images/construction_step4.png)

## Step5

![Step5](/doc/images/construction_step5.png)

## Step6

![Step6](/doc/images/construction_step6.png)

## Step7

![Step7](/doc/images/construction_step7.png)
#### Summary

The FSA is built from "right to left", the root state is written last.
@@ -138,4 +138,3 @@ The FSA is built from "right to left", the root state is written last.
- use sorted data characteristic: compare only the last two words
- no temporary state creation as with replace_or_register, which can be problematic depending on the underlying data structure (e.g. Sparse Array)
- no recursion (as in replace_or_register)

6 changes: 3 additions & 3 deletions doc/algorithm/Extensibility.md
@@ -1,20 +1,20 @@
## Extensibility

The keyvi compiler is implemented in C++11 and uses templates to allow customization, like having a different
persistence layer, different minimization, etc.

The most useful customization are different value types:

### Value types

Keys are always strings, values can be of any type, even nested types. Built-in types at time of writing are no-value,
integer, strings and json.

Value types have to implement a ["duck-type"](http://en.wikipedia.org/wiki/Duck_typing) interface.

Code: [IValue_Store](/keyvi/src/cpp/dictionary/fsa/internal/ivalue_store.h)

In a nutshell, writing a new value store entails: serialization of the value, the interface to the compiler and the
deserialization for the lookup.

Note: The compiler interface expects an ID for each unique value, the ID is used for minimization.
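As an illustration of that contract, here is a hypothetical Python analogue (the real interface is the C++ `IValueStore`; the method names below are invented for this sketch): each value is serialized once, deduplicated, and assigned a stable ID that the compiler can use for minimization.

```python
import json


class JsonValueStore:
    """Hypothetical sketch of a value store: serialize on write,
    deduplicate, hand out one ID per unique value."""

    def __init__(self):
        self._values = []  # id -> serialized bytes
        self._ids = {}     # serialized bytes -> id

    def add_value(self, value):
        """Serialize `value`; return its unique, deduplicated ID."""
        blob = json.dumps(value, sort_keys=True).encode()
        if blob not in self._ids:
            self._ids[blob] = len(self._values)
            self._values.append(blob)
        return self._ids[blob]

    def get_value(self, value_id):
        """Deserialize a stored value for the lookup side."""
        return json.loads(self._values[value_id].decode())
```

Note how `sort_keys=True` canonicalizes JSON objects, so two logically equal values serialize identically and share one ID — exactly what minimization needs.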
20 changes: 10 additions & 10 deletions doc/algorithm/Minimization.md
@@ -5,10 +5,10 @@ new state.

### keyvi Minimization

Minimization is implemented using a hash table. Each state that is written is inserted into the hash table. Before
persisting a new state, we try to find an equal state in the hashtable.

Code:

Entry point of minimization: [sparse_array_builder](/keyvi/src/cpp/dictionary/fsa/internal/sparse_array_builder.h)
Minimization Hashtable: [minimization_hash](/keyvi/src/cpp/dictionary/fsa/internal/minimization_hash.h)
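The core idea fits in a few lines of Python. This is a simplification: the real hashtable stores compact 12-byte fingerprints rather than full state keys, and the function name here is invented.

```python
def write_state(register, storage, transitions, final):
    """Persist a state unless an equal one was written before; return
    its offset. `register` plays the role of the minimization hashtable."""
    key = (final, tuple(sorted(transitions.items())))
    if key in register:
        return register[key]  # equal state found: reuse its offset
    offset = len(storage)
    storage.append(key)       # stand-in for serializing the state
    register[key] = offset
    return offset
```

Two states with the same transitions and final flag collapse to one persisted offset; a state that differs in any transition gets a fresh one.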
@@ -17,7 +17,7 @@ The Hashtable in keyvi has a very small footprint of 12 bytes per entry.

## Getting best Compression ratios

Minimization/Compression is dependent on the data. FSA's are mainly used in computational linguistics; one of the reasons is that
FSA's make use of the high ambiguity in languages.

![Compression](/doc/images/compression.png)
@@ -27,28 +27,28 @@ FSA's make use of high ambiguity in languages.
Therefore having "natural language keys" yields compression, both at prefix as well as suffix side. "Binary keys", e.g.
fingerprints, are pretty bad in terms of compression.

Note that prefix compression is basically the same as in a trie.

### Suffix and Value Compression

In contrast to a trie the FSA compresses suffixes as well. But note: the value is part of the suffix, as it is attached
to the "final state". Therefore sparse values yield the best results, while totally unique values result in
no suffix compression.

### Improving Compression Rate / Reducing Size

Normalize keys to gain better prefix compression.

Think about your values and reduce the value space if possible. For example: if you store integer values, think about their
range. Normalizing the integers reduces the number of unique values and therefore improves the compression ratio.
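For example, quantizing integer values into buckets collapses many distinct values into a few (the `normalize_score` helper below is hypothetical, not part of keyvi):

```python
def normalize_score(score, bucket_size=10):
    """Quantize a value so that near-identical values become equal,
    letting the value store and suffix minimization share them."""
    return (score // bucket_size) * bucket_size


raw = [101, 103, 108, 994, 997]  # 5 unique values
# After normalization only 2 unique values remain: 100 and 990.
```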

Take minimization into account: permutations and repetitions are a strength of the algorithm, e.g. storing tons of almost
identical keys pointing to the same value. In other data structures this can cause huge memory usage, FSA's are good in
minimizing that.

### Check Questions

1. You want to write a Date Extractor which can extract all dates of the format "YYYY-MM-DD". Estimate the size
requirement.

2. Now assign a counter (incremented each time) to each key. What happens? What about your size estimate?
9 changes: 4 additions & 5 deletions doc/algorithm/Persistence-Basics.md
@@ -1,23 +1,23 @@
## Persistence Introduction

The default persistence is implemented as sparse array (sparse table).

Code: [sparse_array_persistence](/keyvi/src/cpp/dictionary/fsa/internal/sparse_array_persistence.h)

### Sparse Array in a nutshell

The underlying data structure consists of 2 simple arrays of the same length (not size): a byte array and a
pointer array (e.g. uint32_t).

![SparseArraySingleState](/doc/images/sparse_array_single_state.png)

A lookup starts at a given offset; it succeeds if the numeric value (e.g. ASCII value) is found in the bucket
defined by the sum of the offset and the numeric value.

![SparseArrayPointer](/doc/images/sparse_array_pointer.png)

This check is required to allow interleaving of state vectors. To save space vectors of states are interleaved:

![SparseArrayInterleaved](/doc/images/sparse_array_mixed.png)

Even a brute-force method that interleaves state vectors yields a very good compression rate.
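A minimal Python sketch of such a lookup (illustrative only; keyvi stores the label and pointer arrays far more compactly):

```python
def follow(labels, pointers, state_offset, char):
    """Follow one transition in the interleaved sparse array. The label
    check proves the bucket belongs to this state and not to an
    interleaved neighbour."""
    bucket = state_offset + ord(char)
    if bucket < len(labels) and labels[bucket] == char:
        return pointers[bucket]  # offset of the target state
    return None                  # no such transition


# A state starting at offset 0 with transitions 'a' -> 100 and 'c' -> 200:
labels = [None] * 128
pointers = [0] * 128
labels[ord("a")], pointers[ord("a")] = "a", 100
labels[ord("c")], pointers[ord("c")] = "c", 200
```

A lookup for `'b'` from offset 0 lands in a bucket whose label is not `'b'`, so it correctly fails even if another state's data occupies that bucket.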
@@ -30,4 +30,3 @@ The algorithm tries to find space in the existing sparse array:
![SparseArrayPacking](/doc/images/sparse_array_packing.png)

Code: [sparse_array_building](/keyvi/src/cpp/dictionary/fsa/internal/sparse_array_builder.h)

20 changes: 10 additions & 10 deletions doc/algorithm/Scaling.md
@@ -5,7 +5,7 @@ This page describes a number of performance and scaling tricks to make it possib
### Sorting

The construction algorithm requires sorted input. To be able to create a dictionary out of millions of keys we apply
external memory sorting. Fortunately "sorting" huge lists is not a problem these days. keyvi uses
[TPIE](http://madalgo.au.dk/tpie/) for external merge sort.
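The idea in miniature, as a toy Python sketch (TPIE keeps the sorted runs on disk and streams them, which this sketch does not):

```python
import heapq
import itertools


def external_sort(keys, run_size=3):
    """Toy external merge sort: sort bounded-size runs independently
    (written to temp files in a real implementation), then k-way merge."""
    it = iter(keys)
    runs = []
    while True:
        run = list(itertools.islice(it, run_size))
        if not run:
            break
        runs.append(sorted(run))  # one sorted 'run'
    return list(heapq.merge(*runs))
```

Because `heapq.merge` consumes the runs lazily, only one element per run needs to be in memory at a time — which is exactly why external sorting scales to inputs far larger than RAM.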

Note: Map-Reduce also sorts data using external memory sort, so using Map-Reduce with 1 Reducer would also give you
@@ -18,27 +18,27 @@ Code: [dictionary_compiler](/keyvi/src/cpp/dictionary/dictionary_compiler.h)

### Minimization

For each state the compiler stores a fingerprint of the state in the hashtable. Although a fingerprint takes only
12 bytes, the hashtable would not fit into main memory if you have lots of keys.

Therefore keyvi uses several hashtables organized by an LRU (Least Recently Used) cache:

The 1st hashtable is filled with a limited number of entries; once full, a new hashtable is created. If the number of
hashtables reaches the limit, the last hashtable is thrown away.

To keep "good hashes": each entry found by a successful lookup in a lower hashtable is moved to the top hashtable. Therefore
states which often minimize stay in memory, while states which do not minimize are thrown away over time.

Code: [LRU Cache](/keyvi/src/cpp/dictionary/fsa/internal/lru_generation_cache.h)
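A hypothetical sketch of that scheme (generation sizes and the promotion policy are simplified relative to keyvi's `lru_generation_cache.h`):

```python
class LRUGenerationCache:
    """Several bounded hashtables ('generations'): the oldest one is
    dropped when too many exist, and a hit in an old generation
    promotes the entry into the newest one."""

    def __init__(self, max_generations=3, generation_size=1000):
        self.max_generations = max_generations
        self.generation_size = generation_size
        self.generations = [{}]  # newest generation is last

    def insert(self, key, value):
        if len(self.generations[-1]) >= self.generation_size:
            self.generations.append({})  # start a fresh generation
            if len(self.generations) > self.max_generations:
                self.generations.pop(0)  # drop the oldest one
        self.generations[-1][key] = value

    def find(self, key):
        for generation in reversed(self.generations):
            if key in generation:
                value = generation.pop(key)
                self.insert(key, value)  # promote: keep 'good hashes'
                return value
        return None
```

Entries that are looked up often keep getting promoted and survive; entries that never produce a hit age out with their generation — bounding memory while keeping the fingerprints that actually help minimization.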

### Compilation/Index Performance

Apart from low-level optimizations like avoiding object copies, pooling, short string optimization, good hash function etc.,
keyvi uses some optimization on the algorithm side.

#### Minimization Stop

As described in Construction, the FSA is built from 'right to left'; minimization only works this way. Once a minimization
fails it is impossible to minimize the parent state. Therefore we stop minimization of the preceding states after the first failure.
Note: we still store the fingerprints in the hashtable for later minimizations.

@@ -48,9 +48,9 @@ Note: The amount of memory is configurable in the compiler. Increasing the limit

#### Packing

Sparse Array Construction is one of the most demanding parts. To speed up compilation we make use of bit vectors,
sliding windows and the [De Bruijn](http://en.wikipedia.org/wiki/De_Bruijn_sequence) sequence to quickly find spots to pack
the data, or - if available - intrinsic compiler/CPU functions.

Code: [BitVector](/keyvi/src/cpp/dictionary/fsa/internal/bit_vector.h)
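One classic use of a De Bruijn sequence is the bit-scan below: finding the index of the lowest set bit of a word — e.g. the first marked slot in a bit-vector word — with one multiplication and a table lookup instead of a loop. This Python sketch shows the 32-bit variant (the intrinsic equivalent would be something like `__builtin_ctz`); it is an illustration of the trick, not keyvi's BitVector code:

```python
DEBRUIJN32 = 0x077CB531  # 32-bit De Bruijn sequence: every 5-bit window is unique
INDEX32 = [0] * 32
for i in range(32):
    # Precompute which table slot each power of two maps to.
    INDEX32[((DEBRUIJN32 << i) & 0xFFFFFFFF) >> 27] = i


def lowest_set_bit(v):
    """Index of the least-significant set bit of a non-zero 32-bit word.
    (v & -v) isolates that bit; multiplying by the De Bruijn constant
    shifts a unique 5-bit pattern into the top bits."""
    return INDEX32[(((v & -v) * DEBRUIJN32) & 0xFFFFFFFF) >> 27]
```

Because every 5-bit window of the constant is distinct, each of the 32 possible isolated bits produces a different top-5-bit pattern, so the table lookup is collision-free.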

9 changes: 4 additions & 5 deletions doc/usage/Building keyvi dictionaries with python.md
@@ -7,15 +7,15 @@ The compiler is also available from python keyvi:

# repeat for every key
compiler.Add("foo")

# finally compile
compiler.Compile()
compiler.WriteToFile("/tmp/test.kv")

Other available compilers in `keyvi.compiler`:

type | details
----------------- | ---------------------------------------------------------------------------------------------
integer | CompletionDictionaryCompiler
key-only | KeyOnlyDictionaryCompiler
string | StringDictionaryCompiler
@@ -24,8 +24,7 @@ json | JsonDictionaryCompiler
For dictionaries with values, Add takes the value as second parameter:

compiler.Add("foo", 42)

To ensure that you do not run out of disk space while compiling, set $TMPDIR to a disk with enough free space.

export TMPDIR=/mnt/tmp