diff --git a/README.md b/README.md index 94b4349..45afe79 100644 --- a/README.md +++ b/README.md @@ -2,37 +2,43 @@ # HexTree -hextree provides tree structures that represent geographic regions -with [H3 cell]s. +HexTree provides tree structures for efficiently representing geographic +regions using [H3 cell]s. It takes advantage of H3's hierarchical structure +to answer spatial queries quickly and can automatically compact large regions. The primary structures are: - [**HexTreeMap**]: an H3 cell-to-value map. -- [**HexTreeSet**]: an H3 cell set for hit-testing. +- [**HexTreeSet**]: an H3 cell set for spatial containment testing. You can think of `HexTreeMap` vs. `HexTreeSet` as [`HashMap`] vs. [`HashSet`]. ## How is this different from `HashMap`? -The key feature of a hextree is that its keys (H3 cells) are -hierarchical. For instance, if you previously inserted an entry for a -low-res cell, but later query for a higher-res child cell, the tree -returns the value for the lower res cell. Additionally, with -[compaction], trees can automatically coalesce adjacent high-res cells -into their parent cell. For very large regions, the compaction process -_can_ continue to lowest resolution cells (res-0), possibly removing -millions of redundant cells from the tree. For example, a set of -4,795,661 res-7 cells representing North America coalesces [into a -42,383 element `HexTreeSet`][us915]. - -A hextree's internal structure exactly matches the semantics of an [H3 -cell]. The root of the tree has 122 resolution-0 nodes, followed by 15 -levels of 7-ary nodes. The level of an occupied node, or leaf node, is -the same as its corresponding H3 cell resolution. +HexTree leverages H3's hierarchical cell structure in two key ways: + +**Hierarchical Queries**: A query for a cell succeeds even if only one +of its ancestors was inserted. For instance, if you insert a low-res +cell and later query for one of its higher-res descendants, the tree +returns that low-res cell together with its value.
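As a rough, self-contained illustration of the hierarchical lookup described above, the Rust sketch below treats H3 indices as raw `u64`s. It is not hextree's implementation or API. It assumes the standard H3 bit layout (mode in bits 59-62, resolution in bits 52-55, base cell in bits 45-51, and one 3-bit digit per resolution 1 through 15, with unused digits set to 7), and the helpers `make_cell`, `to_parent`, and `get_hierarchical` are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

// Assumed H3 index layout (illustration only, not hextree's internals):
// mode in bits 59..=62, resolution in bits 52..=55, base cell in bits
// 45..=51, and a 3-bit digit for each resolution 1..=15 (unused = 7).
const RES_OFFSET: u64 = 52;
const RES_MASK: u64 = 0xF_u64 << RES_OFFSET;
const DIGIT_MASK: u64 = 0b111;

/// Bit offset of the digit for resolution `r` (1..=15).
fn digit_offset(r: u8) -> u64 {
    (15 - r as u64) * 3
}

/// Resolution stored in an index.
fn res(idx: u64) -> u8 {
    ((idx & RES_MASK) >> RES_OFFSET) as u8
}

/// Hypothetical helper: builds a mode-1 (cell) index from a base cell
/// and one digit per resolution; the resolution is `digits.len()`.
fn make_cell(base: u8, digits: &[u8]) -> u64 {
    let r = digits.len() as u64;
    // Start with every digit unused (all 45 digit bits set to 1).
    let mut idx = (1u64 << 59)
        | (r << RES_OFFSET)
        | ((base as u64) << 45)
        | ((1u64 << 45) - 1);
    for (i, &d) in digits.iter().enumerate() {
        let off = digit_offset(i as u8 + 1);
        idx = (idx & !(DIGIT_MASK << off)) | ((d as u64) << off);
    }
    idx
}

/// Ancestor of `idx` at `parent_res`, or `None` if `parent_res` is
/// finer than `idx`'s own resolution.
fn to_parent(idx: u64, parent_res: u8) -> Option<u64> {
    let r = res(idx);
    if parent_res > r {
        return None;
    }
    let mut out = (idx & !RES_MASK) | ((parent_res as u64) << RES_OFFSET);
    // Digits finer than the parent's resolution become unused (7).
    for d in (parent_res + 1)..=r {
        out |= DIGIT_MASK << digit_offset(d);
    }
    Some(out)
}

/// Hierarchical lookup: try the cell itself, then each coarser ancestor,
/// and return the first stored cell together with its value.
fn get_hierarchical<V>(map: &HashMap<u64, V>, idx: u64) -> Option<(u64, &V)> {
    (0..=res(idx)).rev().find_map(move |r| {
        let ancestor = to_parent(idx, r)?;
        map.get(&ancestor).map(|v| (ancestor, v))
    })
}
```

For example, `make_cell(20, &[0, 6, 0, 4, 0, 5, 0, 0, 3])` rebuilds the res-9 cell `0x8928308280fffff`. If only its res-5 ancestor is stored in the map, `get_hierarchical` still answers a query for the res-9 cell with that ancestor and its value. A tree like the one described in this README would walk down from the base-cell node instead of probing ancestors in a hash map, but the observable result is the same.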
+ +**Automatic Compaction**: With [compaction], the tree can automatically +coalesce the 7 children of a cell into that parent cell, dramatically +reducing memory usage. For very large regions, compaction can continue +recursively to the lowest resolution cells (res-0), possibly removing +millions of redundant cells. For example, 4,795,661 res-7 cells +representing North America compact [into just 42,383 elements][us915]. + +The internal structure mirrors H3's hierarchy: the root holds the 122 +resolution-0 base cells, and below them are up to 15 levels of 7-ary +nodes (one level per resolution 1 through 15, matching the up to 7 +children of each H3 cell). The depth of a leaf node equals its H3 cell +resolution. ## Features * **`serde`**: support for serialization via [serde]. +* **`disktree`**: on-disk memory-mapped storage for large trees (enables `serde`, `byteorder`, and `memmap`). ## License diff --git a/src/cell.rs b/src/cell.rs index e6dde73..1e262db 100755 --- a/src/cell.rs +++ b/src/cell.rs @@ -1,6 +1,6 @@ -//! This has two different types representing H3 indices is slightly +//! This module has two different types representing H3 indices in slightly //! different ways, [Index] & [Cell]. Index is lower level and allows -//! you create invalid H3 indices. Cell is higher level and enforces +//! you to create invalid H3 indices. Cell is higher level and enforces //! invariants. use crate::{Error, Result}; @@ -8,7 +8,7 @@ use std::{convert::TryFrom, fmt}; /// A low-level type for H3 [index manipulation]. /// -/// Node that all setters take consume `self` and return a new +/// Note that all setters consume `self` and return a new /// `Index`. /// /// [index manipulation]: https://observablehq.com/@nrabinowitz/h3-index-bit-layout?collection=@nrabinowitz/h3 @@ -112,7 +112,7 @@ impl Index { } } - /// Consumes `self` and returns a new Index with it's resolution + /// Consumes `self` and returns a new Index with its resolution /// `res` digit set to `digit`.
/// /// This function does not check `res` nor `digit` for validity @@ -129,7 +129,10 @@ impl Index { } } -/// [HexTreeMap][crate::HexTreeMap]'s key type. +/// A validated H3 cell index. +/// +/// This is the key type for [HexTreeMap][crate::HexTreeMap]. A `Cell` +/// is guaranteed to be a valid H3 cell (mode 1 index). #[derive(Clone, Copy, Eq, Hash, PartialEq)] #[cfg_attr( feature = "serde", @@ -153,7 +156,7 @@ impl Cell { if // reserved must be 0 !idx.reserved() && - // we only care about mode 1 (cell) indicies + // we only care about mode 1 (cell) indices idx.mode() == 1 && // there are only 122 base cells idx.base() < 122 @@ -172,8 +175,9 @@ impl Cell { /// Returns this cell's parent at the specified resolution. /// - /// Returns Some if `res` is less-than or equal-to this cell's - /// resolution, otherwise returns None. + /// Returns `Some` if `res` is less than or equal to this cell's + /// resolution. Returns `None` if `res` is greater than this cell's + /// resolution, since a parent cannot be finer than its child. #[inline] pub const fn to_parent(&self, res: u8) -> Option { match self.res() { @@ -203,12 +207,12 @@ impl Cell { Index(self.0).res() } - /// Returns true if `self` is related to `other`. + /// Returns `true` if this cell is related to `other`. /// - /// "Related" can be any of the following: - /// - `self` == `other` - /// - `self` is a parent cell of `other` - /// - `other` is a parent cell of `self` + /// Two cells are related if any of the following holds: + /// - `self` and `other` are the same cell, or + /// - `self` is an ancestor (parent, grandparent, etc.) of `other`, or + /// - `other` is an ancestor of `self` #[inline] pub fn is_related_to(&self, other: &Self) -> bool { let common_res = std::cmp::min(self.res(), other.res()); @@ -238,7 +242,7 @@ impl TryFrom for Cell { } } -/// A type for building up Cells in an iterative matter when +/// A type for building up Cells in an iterative manner when /// tree-walking.
pub(crate) struct CellStack(Option); @@ -282,7 +286,7 @@ impl CellStack { } } - /// If self currency contains a cell, this replaces the digit at + /// If self currently contains a cell, this replaces the digit at /// its current res and returns what was there. If self is empty, /// nothing is replaced and None is returned. pub fn swap(&mut self, digit: u8) -> Option { diff --git a/src/compaction.rs b/src/compaction.rs index 29d9e8e..28114d2 100644 --- a/src/compaction.rs +++ b/src/compaction.rs @@ -1,21 +1,27 @@ -//! User pluggable compaction. +//! User-pluggable compaction strategies. +//! +//! Compaction allows the tree to automatically coalesce child cells into +//! their parent whenever the tree's [Compactor] decides to, reducing +//! memory usage and shortening lookups. use crate::Cell; -/// A user provided compactor. +/// A user-provided compactor. /// -/// The compactor trait allows you customize compaction behavior after +/// The compactor trait allows you to customize compaction behavior after /// calling `insert` on a tree. pub trait Compactor { /// Called after every insert into a non-leaf node. /// - /// Given an intermediate (not-leaf) node's cell and up to 7 + /// Given an intermediate (non-leaf) node's cell and up to 7 /// children, you can choose to leave the node alone by returning - /// `None`, or turn it into a leaf-node by return `Some(value)`. + /// `None`, or turn it into a leaf node by returning `Some(value)`. fn compact(&mut self, cell: Cell, children: [Option<&V>; 7]) -> Option; } -/// Does not perform any compaction. +/// A compactor that performs no compaction. +/// +/// This is the default compactor and leaves all inserted cells as-is. #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)] #[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))] pub struct NullCompactor; @@ -26,7 +32,11 @@ impl Compactor for NullCompactor { } } -/// Compacts when all children are complete.
+/// A compactor that coalesces a node when all 7 of its children are complete. +/// +/// It is implemented only for `()` values, which makes it the natural fit +/// for `HexTreeSet`. Once all 7 children of a node are complete (leaf) +/// nodes, they are replaced with a single parent cell. #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)] #[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))] pub struct SetCompactor; @@ -41,7 +51,11 @@ impl Compactor<()> for SetCompactor { } } -/// Compacts when all children are complete and have the same value. +/// A compactor that coalesces a node when all 7 children are complete and equal. +/// +/// The 7 children are replaced with a single parent cell holding their +/// shared value. This is useful for maps where large contiguous regions +/// share the same value. #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)] #[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))] pub struct EqCompactor; diff --git a/src/disktree/mod.rs b/src/disktree/mod.rs index dd61ed4..7555156 100644 --- a/src/disktree/mod.rs +++ b/src/disktree/mod.rs @@ -1,4 +1,8 @@ -//! An on-disk hextree. +//! On-disk memory-mapped storage for HexTree. +//! +//! `DiskTreeMap` provides a serialized, memory-mapped representation of a `HexTreeMap`, +//! allowing you to store and query very large trees without loading them entirely +//! into memory. #[cfg(not(target_pointer_width = "64"))] compile_warning!("disktree may silently fail on non-64bit systems"); @@ -36,7 +40,7 @@ mod tests { } // Construct map with a compactor that automatically combines - // cells with the same save value. + // cells with the same value. let mut monaco = HexTreeMap::with_compactor(EqCompactor); // Now extend the map with cells and a region value. @@ -190,7 +194,7 @@ mod tests { } - // Construct map with a compactor that automatically combines - // cells with the same save value. + // Construct map with the default (null) compactor, which + // does not combine cells.
let mut monaco = HexTreeMap::new(); // Now extend the map with cells and a region value. @@ -204,7 +208,7 @@ mod tests { .unwrap(); let monaco_disktree = DiskTreeMap::open(path).unwrap(); - // Create the iterator with the user-defined deserialzer. + // Create the iterator with the user-defined deserializer. let disktree_iter = monaco_disktree.iter().unwrap(); let start = std::time::Instant::now(); let mut disktree_collection = Vec::new(); @@ -294,7 +298,7 @@ mod tests { assert_eq!( leaf_vec.len(), 1, - "Iterator must have extactly one element for a leaf" + "Iterator must have exactly one element for a leaf" ); assert_eq!(hextree_leaf, leaf_vec[0].0); } diff --git a/src/disktree/tree.rs b/src/disktree/tree.rs index b4049b0..43818af 100755 --- a/src/disktree/tree.rs +++ b/src/disktree/tree.rs @@ -16,7 +16,10 @@ use std::{ pub(crate) const HDR_MAGIC: &[u8] = b"hextree\0"; pub(crate) const HDR_SZ: usize = HDR_MAGIC.len() + 1; -/// An on-disk hextree map. +/// A memory-mapped, on-disk HexTreeMap. +/// +/// This structure provides read-only access to a HexTreeMap that has +/// been serialized to disk. pub struct DiskTreeMap(pub(crate) Box + Send + Sync + 'static>); impl DiskTreeMap { diff --git a/src/hex_tree_map.rs b/src/hex_tree_map.rs index 5c0af28..85dd9d7 100644 --- a/src/hex_tree_map.rs +++ b/src/hex_tree_map.rs @@ -42,7 +42,7 @@ use std::{cmp::PartialEq, iter::FromIterator}; /// } /// /// // Construct map with a compactor that automatically combines -/// // cells with the same save value. +/// // cells with the same value. /// let mut monaco = HexTreeMap::with_compactor(EqCompactor); /// /// // Now extend the map with cells and a region value. @@ -66,7 +66,7 @@ use std::{cmp::PartialEq, iter::FromIterator}; pub struct HexTreeMap { /// All h3 0 base cell indices in the tree pub(crate) nodes: Box<[Option>>]>, - /// User-provided compator. Defaults to the null compactor. + /// User-provided compactor. Defaults to the null compactor. 
compactor: C, } @@ -121,7 +121,7 @@ impl HexTreeMap { /// `self`. /// /// This method is useful if you want to use one compaction - /// strategy for creating an initial, then another one for updates + /// strategy for creating an initial tree, then another one for updates /// later. pub fn replace_compactor(self, new_compactor: NewC) -> HexTreeMap { HexTreeMap { @@ -130,12 +130,11 @@ impl HexTreeMap { } } - /// Returns the number of H3 cells in the set. + /// Returns the number of H3 cells in the map. /// - /// This method only considers complete, or leaf, cells in the - /// set. Due to automatic compaction, this number may be - /// significantly smaller than the number of source cells used to - /// create the set. + /// This method only counts leaf cells (complete entries) in the + /// map. Due to automatic compaction, this number may be + /// significantly smaller than the number of cells originally inserted. pub fn len(&self) -> usize { self.nodes.iter().flatten().map(|node| node.len()).sum() } @@ -145,17 +144,15 @@ impl HexTreeMap { self.len() == 0 } - /// Returns `true` if the set fully contains `cell`. + /// Returns `true` if the map fully contains `cell`. /// - /// This method will return `true` if any of the following are - /// true: + /// This method returns `true` if any of the following are true: /// - /// 1. There was an earlier [insert][Self::insert] call with - /// precisely this target cell. - /// 2. Several previously inserted cells coalesced into - /// precisely this target cell. - /// 3. The set contains a complete (leaf) parent of this target - /// cell due to 1 or 2. + /// 1. This exact cell was previously [inserted][Self::insert]. + /// 2. Previously inserted descendants of this cell were compacted + /// into this cell. + /// 3. The map contains a complete (leaf) ancestor of this cell + /// due to 1 or 2, so this cell is covered by that ancestor.
pub fn contains(&self, cell: Cell) -> bool { let base_cell = cell.base(); match self.nodes[base_cell as usize].as_ref() { @@ -167,11 +164,11 @@ impl HexTreeMap { } } - /// Returns a reference to the value corresponding to the given - /// target cell or one of its parents. + /// Returns a reference to the value for the given cell or its nearest parent. /// - /// Note that this method also returns a Cell, which may be a - /// parent of the target cell provided. + /// Returns `Some((cell, value))` where `cell` is either the queried cell + /// or a parent cell that contains it. Returns `None` if no matching cell + /// or parent is found. #[inline] pub fn get(&self, cell: Cell) -> Option<(Cell, &V)> { match self.get_raw(cell) { @@ -192,11 +189,11 @@ impl HexTreeMap { } } - /// Returns a mutable reference to the value corresponding to the - /// given target cell or one of its parents. + /// Returns a mutable reference to the value for the given cell or its nearest parent. /// - /// Note that this method also returns a Cell, which may be a - /// parent of the target cell provided. + /// Returns `Some((cell, value))` where `cell` is either the queried cell + /// or a parent cell that contains it. Returns `None` if no matching cell + /// or parent is found. #[inline] pub fn get_mut(&mut self, cell: Cell) -> Option<(Cell, &mut V)> { match self.get_raw_mut(cell) { @@ -239,7 +239,7 @@ impl HexTreeMap { crate::iteration::IterMut::new(&mut self.nodes, CellStack::new()) } - /// An iterator visiting the specified cell or its children + /// An iterator visiting the specified cell or its descendants, with
pub fn descendants(&self, cell: Cell) -> impl Iterator { let base_cell = cell.base(); diff --git a/src/hex_tree_set.rs b/src/hex_tree_set.rs index f4eeaa3..1e74010 100644 --- a/src/hex_tree_set.rs +++ b/src/hex_tree_set.rs @@ -2,7 +2,7 @@ use crate::{compaction::SetCompactor, Cell, HexTreeMap}; use std::iter::FromIterator; /// A HexTreeSet is a structure for representing geographical regions -/// and efficiently testing performing hit-tests on that region. Or, +/// and efficiently performing hit-tests on that region. Or, /// in other words: I have a region defined; does it contain this /// point on earth? /// diff --git a/src/lib.rs b/src/lib.rs index fe2c9ea..face5c6 100755 --- a/src/lib.rs +++ b/src/lib.rs @@ -15,5 +15,5 @@ mod node; pub use crate::{cell::Cell, hex_tree_map::HexTreeMap, hex_tree_set::HexTreeSet}; pub use error::{Error, Result}; -#[cfg(feature = "disktree")] -pub use memmap; +#[cfg(feature = "serde")] +pub use serde;