Efficient bloom filter like data structure, based on the Rank Select Quotient Filter (RSQF).
This is a small and flexible general-purpose AMQ-Filter. It not only supports approximate membership testing like a bloom filter but also deletions, merging, resizing and serde serialization.
- High performance
- Supports removals
- Extremely compact, more so than comparable filters
- Can be created with a initial small capacity and grow as needed
- Fast bulk construction from sorted fingerprints via
Builder - (De)Serializable with serde
- Portable Rust implementation
- Only verifiable usages of unsafe
This data structure is a succinct hash table that can store fingerprints in a very compact way. Fingerprints are similar to a hash values, but are possibly truncated. The reason for false positives is that multiple items can map to the same fingerprint. For more information see the quotient filter Wikipedia page that describes a similar but less optimized version of the data structure. The actual implementation is based on the Rank Select Quotient Filter (RSQF).
The public API also exposes a fingerprint API, which can be used to succinctly store u64 hash values.
let mut f = qfilter::Filter::new(1000000, 0.01).unwrap();
for i in 0..1000 {
f.insert(i).unwrap();
}
for i in 0..1000 {
assert!(f.contains(i));
}Methods accepting T: Hash are provided for convenience using foldhash-portable, which offers high performance and stability across platforms. Note that a fixed seed is used (no DoS resistance) and #[derive(Hash)] output is not guaranteed stable across Rust compiler versions.
Filter supports a custom BuildHasher via its S type parameter (similar to HashMap). Use Filter::new_with_hasher() and related constructors. The hasher is not serialized; on deserialization it is reconstructed via S::default(). If that doesn't produce the correct hasher, use Filter::with_hasher() to restore it. Filters being merged must use the same hasher.
For a given capacity and error probability the RSQF may require significantly less space than the equivalent bloom filter or other AMQ-Filters.
| Bits per item | Error probability when full | Bits per item (Cont.) | Error (cont.) |
|---|---|---|---|
| 3.125 | 0.362 | 19.125 | 6.87e-06 |
| 4.125 | 0.201 | 20.125 | 3.43e-06 |
| 5.125 | 0.106 | 21.125 | 1.72e-06 |
| 6.125 | 0.0547 | 22.125 | 8.58e-07 |
| 7.125 | 0.0277 | 23.125 | 4.29e-07 |
| 8.125 | 0.014 | 24.125 | 2.15e-07 |
| 9.125 | 0.00701 | 25.125 | 1.07e-07 |
| 10.125 | 0.00351 | 26.125 | 5.36e-08 |
| 11.125 | 0.00176 | 27.125 | 2.68e-08 |
| 12.125 | 0.000879 | 28.125 | 1.34e-08 |
| 13.125 | 0.000439 | 29.125 | 6.71e-09 |
| 14.125 | 0.00022 | 30.125 | 3.35e-09 |
| 15.125 | 0.00011 | 31.125 | 1.68e-09 |
| 16.125 | 5.49e-05 | 32.125 | 8.38e-10 |
| 17.125 | 2.75e-05 | .. | .. |
| 18.125 | 1.37e-05 | .. | .. |
Version 0.3 is not serialization compatible with previous versions.
Version 0.2 changed public APIs (e.g. fallible constructors) which required a major version bump.
Serialization is bidirectionally compatible between versions 0.1 and 0.2.
- Fingerprint attached values
- Counting with fingerprint attached values, not fingerprint duplication
- More advanced growth strategies (InfiniFilter).
The implementation assumes popcnt and BMI2 (pdep, tzcnt) instructions are available
when compiling for x86_64 targets. These instructions are available on CPUs from 2015 or later. If they are not available, the Filter constructor will panic.
The legacy_x86_64_support feature enables support for older x86_64 CPUs by using
portable fallbacks and a ~10% performance penalty.
This project is licensed under the MIT license.