
draft: RFC for Horizontal scalability for ahnlich#276

Draft
jimezesinachi wants to merge 7 commits into main from rfc/replication

Conversation

@jimezesinachi (Collaborator) commented Nov 17, 2025

Signed-off-by: Jim Ezesinachi <ezesinachijim@gmail.com>
@jimezesinachi changed the title to "draft: RFC for Horizontal scalability for ahnlich" (issue: https://github.com/deven96/ahnlich/issues/271) on Nov 17, 2025
github-actions bot commented Nov 17, 2025

Test Results

0 tests (-264)    0 ✅ (-264)    0s ⏱️ (-11m 8s)
0 suites (-35)    0 💤 (±0)
0 files (-4)      0 ❌ (±0)

Results for commit 7f393b2. ± Comparison against base commit 7fdb808.

♻️ This comment has been updated with latest results.


github-actions bot commented Nov 17, 2025

Benchmark Results

group                                                        main                                   pr
-----                                                        ----                                   --
predicate_query_with_index/size_100                          1.03      3.1±0.01µs        ? ?/sec    1.00      3.0±0.00µs        ? ?/sec
predicate_query_with_index/size_1000                         1.03     33.8±0.08µs        ? ?/sec    1.00     32.8±0.02µs        ? ?/sec
predicate_query_with_index/size_10000                        1.02    393.1±0.16µs        ? ?/sec    1.00    386.0±0.55µs        ? ?/sec
predicate_query_with_index/size_100000                       1.01      4.9±0.00ms        ? ?/sec    1.00      4.8±0.02ms        ? ?/sec
predicate_query_without_index/size_100                       1.00      7.2±0.00µs        ? ?/sec    1.00      7.2±0.03µs        ? ?/sec
predicate_query_without_index/size_1000                      1.00     99.3±0.09µs        ? ?/sec    1.01    100.3±0.26µs        ? ?/sec
predicate_query_without_index/size_10000                     1.07    836.3±3.19µs        ? ?/sec    1.00    779.2±3.22µs        ? ?/sec
predicate_query_without_index/size_100000                    1.11     13.6±0.10ms        ? ?/sec    1.00     12.3±0.06ms        ? ?/sec
store_batch_insertion_without_predicates/size_100            1.02    229.6±1.39µs        ? ?/sec    1.00    226.2±1.71µs        ? ?/sec
store_batch_insertion_without_predicates/size_1000           1.02   1234.6±8.07µs        ? ?/sec    1.00   1213.0±9.82µs        ? ?/sec
store_batch_insertion_without_predicates/size_10000          1.00     13.0±0.08ms        ? ?/sec    1.01     13.1±0.05ms        ? ?/sec
store_batch_insertion_without_predicates/size_100000         1.00    129.3±0.43ms        ? ?/sec    1.01    131.2±0.27ms        ? ?/sec
store_retrieval_no_condition/size_100                        1.03    112.7±0.57µs        ? ?/sec    1.00    109.5±1.04µs        ? ?/sec
store_retrieval_no_condition/size_1000                       1.01    779.5±9.55µs        ? ?/sec    1.00   773.5±12.64µs        ? ?/sec
store_retrieval_no_condition/size_10000                      1.01      7.1±0.04ms        ? ?/sec    1.00      7.0±0.02ms        ? ?/sec
store_retrieval_no_condition/size_100000                     1.00     78.3±0.22ms        ? ?/sec    1.00     78.3±0.17ms        ? ?/sec
store_retrieval_non_linear_kdtree/size_100                   1.00    193.5±0.42µs        ? ?/sec    1.00    193.2±0.29µs        ? ?/sec
store_retrieval_non_linear_kdtree/size_1000                  1.00   1147.0±2.80µs        ? ?/sec    1.00   1142.8±1.75µs        ? ?/sec
store_retrieval_non_linear_kdtree/size_10000                 1.00     11.3±0.01ms        ? ?/sec    1.02     11.6±0.15ms        ? ?/sec
store_retrieval_non_linear_kdtree/size_100000                1.00    134.9±0.12ms        ? ?/sec    1.00    135.6±0.07ms        ? ?/sec
store_sequential_insertion_without_predicates/size_100       1.00    273.7±0.54µs        ? ?/sec    1.00    272.3±0.20µs        ? ?/sec
store_sequential_insertion_without_predicates/size_1000      1.00      2.7±0.01ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
store_sequential_insertion_without_predicates/size_10000     1.00     26.9±0.18ms        ? ?/sec    1.00     26.9±0.12ms        ? ?/sec
store_sequential_insertion_without_predicates/size_100000    1.00    268.5±0.30ms        ? ?/sec    1.01    270.5±0.91ms        ? ?/sec

@jimezesinachi jimezesinachi marked this pull request as draft November 18, 2025 07:59
@deven96 (Owner) left a comment:

So far so good I think this is huge progress already

Particularly liking the test setup to ensure quorum... I don't think those got exercised to the point where a snapshot was triggered though, i.e. we've had so many writes that we trigger an async snapshot savepoint.

Both DB and AI CLIs now include cluster flags:

- `--cluster-enabled`
- `--raft-node-id`
@deven96 (Owner):

Curious why this is surfaced... Aren't these supposed to be implicitly assigned? i.e. when you join a cluster you get a node ID?

@jimezesinachi (Collaborator, Author) commented Feb 24, 2026:

In theory, yes. But we would have to set up some kind of Raft config persistence that keeps track of which ID we gave to an ahnlich node in a cluster, so that the node can rejoin the cluster if it ever gets disconnected, and I was thinking maybe that's out of scope for ahnlich.

- `--admin-addr`
- `--raft-storage` (`memory` or `rocksdb`)
- `--raft-data-dir` (required when `--raft-storage rocksdb`)
- `--raft-snapshot-logs`
@deven96 (Owner):

Could we perhaps get short descriptions for these CLI arguments? I personally would like to know snapshot logs vs. storage.

@jimezesinachi (Collaborator, Author):

Makes sense. I think some of the names can be better; after I add the descriptions, you can look and see if we should change any of them.


### Read semantics

In cluster mode, read RPCs call `ensure_linearizable()`. Reads are leader-gated and followers can return `UNAVAILABLE`/forwarding-related errors.
@deven96 (Owner):

Ah interesting... so even reads come through the leader?

@jimezesinachi (Collaborator, Author):

At the moment, yes, as a way to ensure consistency and avoid "stale" reads, but what do you think?
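The leader-gated read semantics discussed above can be sketched as follows. This is a minimal illustration, not ahnlich's actual code: `Node`, `Role`, and `linearizable_read` are hypothetical stand-ins for the RPC handler that calls `ensure_linearizable()`, showing only the observable behavior (leader serves the read, follower returns an `Unavailable`-style error with a leader hint for forwarding).

```rust
#[derive(Debug, PartialEq)]
enum Role {
    Leader,
    Follower,
}

#[derive(Debug, PartialEq)]
enum ReadError {
    /// Mirrors the UNAVAILABLE / forwarding-related errors a follower returns.
    Unavailable { leader_hint: Option<u64> },
}

struct Node {
    role: Role,
    known_leader: Option<u64>,
    value: String,
}

impl Node {
    /// Leader-gated read: only the leader serves reads (roughly what
    /// `ensure_linearizable()` enforces via a read index in Raft);
    /// followers reject so the client can retry against the leader.
    fn linearizable_read(&self) -> Result<&str, ReadError> {
        match self.role {
            Role::Leader => Ok(&self.value),
            Role::Follower => Err(ReadError::Unavailable {
                leader_hint: self.known_leader,
            }),
        }
    }
}

fn main() {
    let leader = Node { role: Role::Leader, known_leader: Some(1), value: "v1".into() };
    let follower = Node { role: Role::Follower, known_leader: Some(1), value: "v1".into() };

    assert_eq!(leader.linearizable_read(), Ok("v1"));
    assert_eq!(
        follower.linearizable_read(),
        Err(ReadError::Unavailable { leader_hint: Some(1) })
    );
}
```

The trade-off being debated in the thread is exactly this gate: routing all reads through the leader gives linearizability at the cost of read scalability on followers.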

.expect("no workspace root");
let proto_dir = workspace_root.join("protos");

let cluster_admin = proto_dir.join("services/cluster_admin.proto");
@deven96 (Owner):

Ensured that these don't get copied with the public services when we export SDKs for other languages?

@jimezesinachi (Collaborator, Author):

Not yet, will do so now. Is it okay if I put the new Raft protos in a new location (inside the new replication crate), so they are localised to where they're used?

}

#[derive(Debug)]
struct MemLogStoreInner<C: RaftTypeConfig> {
@deven96 (Owner):

I think it's fine to not offer this at all... clusters should inherently mean you want more reliability, so in-memory feels counter-intuitive for this.

@jimezesinachi (Collaborator, Author):

Makes sense. I was adding it as a test option more than anything else, but we can remove it completely if necessary. Is removing it from the public selection/CLI API and just leaving it in for tests/mocking okay?

#[arg(long, requires = "cluster_enabled")]
pub raft_node_id: u64,
/// Raft internal address host:port
#[arg(long, requires = "cluster_enabled")]
@deven96 (Owner):

I think both addresses should be optional

@jimezesinachi (Collaborator, Author):

See here: #276 (comment)

/// Snapshot after N logs
#[arg(long, default_value_t = 1000, requires = "cluster_enabled")]
pub raft_snapshot_logs: u64,
/// Join existing cluster via admin addr host:port
@deven96 (Owner):

Would this be necessary if we already have a specified admin_addr default where we try to first join or then create a cluster?

@jimezesinachi (Collaborator, Author):

Hmmm, it won't, but at the moment, the reasoning is that we don't specify admin_addr

directml = ["ort/directml"]

[dev-dependencies]
bincode.workspace = true
@deven96 (Owner):

We should replace bincode from the workspace entirely as it isn't used elsewhere

@jimezesinachi (Collaborator, Author) commented Feb 24, 2026:

Makes sense, I'll look into it

ClusterAdminServiceClient::connect(format!("http://{join}")).await
{
let _ = client
.add_learner(tonic::Request::new(AddLearnerRequest { node: Some(node) }))
@deven96 (Owner):

Is joining distinct from all the other tasks we are spawning here? And if so, what happens if the join fails? Seems we are ignoring it.

@jimezesinachi (Collaborator, Author) commented Feb 24, 2026:

Good catch, the order is wrong. Joining is supposed to be tried first if passed, and if it fails, none of the others should spawn
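The intended "join first, then spawn" ordering could be sketched like this. It is a hedged illustration only: `try_join`, `start_node`, and the task names are hypothetical stand-ins for the real `add_learner` call over the cluster admin service and the node's startup tasks; the point is that a failed join propagates an error instead of being discarded with `let _ =`, so no other task is spawned after it.

```rust
#[derive(Debug, PartialEq)]
enum StartupError {
    JoinFailed(String),
}

// Hypothetical stand-in for the real join RPC (`add_learner` against the
// admin address); here it simply succeeds or fails based on a flag.
fn try_join(join_addr: &str, reachable: bool) -> Result<(), StartupError> {
    if reachable {
        Ok(())
    } else {
        Err(StartupError::JoinFailed(format!("could not reach {join_addr}")))
    }
}

// Join-first startup: if a join address was passed, the join must succeed
// before any of the other node tasks are spawned; a failed join aborts
// startup instead of being silently ignored.
fn start_node(join_addr: Option<&str>, join_ok: bool) -> Result<Vec<&'static str>, StartupError> {
    if let Some(addr) = join_addr {
        try_join(addr, join_ok)?; // propagate instead of `let _ =`
    }
    // Only reached once joining (if requested) has succeeded.
    Ok(vec!["raft-loop", "admin-server", "grpc-server"])
}

fn main() {
    assert!(start_node(Some("10.0.0.1:7100"), true).is_ok());
    assert!(start_node(Some("10.0.0.1:7100"), false).is_err());
    assert!(start_node(None, false).is_ok()); // no join requested: start normally
}
```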

}
let snap_guard = stores_snapshot.guard();
for (key, value) in stores_snapshot.iter(&snap_guard) {
let _ = self.stores.try_insert(key.clone(), value.clone(), &guard);
@deven96 (Owner):

We should probably not ignore these errors... i.e. at least aggressively error-log them.

@jimezesinachi (Collaborator, Author):

Makes sense, I'll get on it
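A sketch of what "don't silently ignore the insert result" could look like, under stated assumptions: `apply_snapshot` is hypothetical, a plain `HashMap` stands in for ahnlich's concurrent store map, and `eprintln!` stands in for a real `tracing::error!` call. It mirrors `try_insert`'s fail-if-present semantics but counts and logs every skipped insert instead of discarding the result with `let _ =`.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Restore stores from a snapshot, mirroring `try_insert` semantics
/// (the insert fails if the key is already present) but surfacing every
/// failed insert instead of discarding it.
fn apply_snapshot(stores: &mut HashMap<String, String>, snapshot: &[(String, String)]) -> usize {
    let mut skipped = 0;
    for (key, value) in snapshot {
        match stores.entry(key.clone()) {
            Entry::Vacant(slot) => {
                slot.insert(value.clone());
            }
            Entry::Occupied(_) => {
                // Stand-in for `tracing::error!`: log the collision loudly.
                skipped += 1;
                eprintln!("snapshot restore: store '{key}' already present, insert skipped");
            }
        }
    }
    skipped
}

fn main() {
    let mut stores = HashMap::from([("a".to_string(), "kept".to_string())]);
    let snapshot = vec![
        ("a".to_string(), "ignored".to_string()),
        ("b".to_string(), "added".to_string()),
    ];
    let skipped = apply_snapshot(&mut stores, &snapshot);
    assert_eq!(skipped, 1);
    assert_eq!(stores.get("a").map(String::as_str), Some("kept"));
    assert_eq!(stores.get("b").map(String::as_str), Some("added"));
}
```

Returning the skip count (or the colliding keys) also gives the caller a hook to decide whether a collision during restore should be fatal rather than merely logged.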

