% SPDX-License-Identifier: PMPL-1.0-or-later
% arcvix-octad-data-model.tex — arXiv-style academic paper on the Octad Data Model
% VeriSimDB: Drift-Aware Multi-Modal Data Identity in Federated Systems
%
% Author: Jonathan D.A. Jewell
% Affiliation: Independent Researcher / Hyperpolymath
% Date: March 2026
\documentclass[11pt,twocolumn]{article}
% ── Packages ──────────────────────────────────────────────────────────────────
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{mathtools}
\usepackage{stmaryrd} % ⟦ ⟧ semantic brackets
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{cleveref}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{tikz}
\usetikzlibrary{arrows.meta,positioning,shapes.geometric,calc,fit}
\usepackage[margin=1in]{geometry}
\usepackage{microtype}
\usepackage{enumitem}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{fancyhdr}
\usepackage{xspace}
% ── Theorem environments ──────────────────────────────────────────────────────
\newtheorem{definition}{Definition}[section]
\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{invariant}{Invariant}[section]
\newtheorem{property}{Property}[section]
% ── Macros ────────────────────────────────────────────────────────────────────
\newcommand{\verisimdb}{\textsc{VeriSimDB}\xspace}
\newcommand{\octad}{\mathcal{O}}
\newcommand{\modset}{\mathcal{M}}
\newcommand{\driftfn}{\Delta}
\newcommand{\identity}{\mathcal{I}}
\newcommand{\repairfn}{\mathcal{R}}
\newcommand{\policyfn}{\mathcal{P}}
\newcommand{\queryfn}{\mathcal{Q}}
\newcommand{\sembrack}[1]{\llbracket #1 \rrbracket}
% ── Listing style ─────────────────────────────────────────────────────────────
\definecolor{codebg}{gray}{0.96}
\definecolor{codegreen}{rgb}{0.0,0.5,0.0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codeblue}{rgb}{0.1,0.2,0.6}
\lstset{
basicstyle=\ttfamily\footnotesize,
backgroundcolor=\color{codebg},
keywordstyle=\color{codeblue}\bfseries,
commentstyle=\color{codegreen},
stringstyle=\color{codegray},
numbers=left,
numberstyle=\tiny\color{codegray},
breaklines=true,
frame=single,
framesep=3pt,
xleftmargin=1.5em,
framexleftmargin=1.5em,
captionpos=b
}
% ── Title ─────────────────────────────────────────────────────────────────────
\title{%
\textbf{The Octad Data Model: Drift-Aware Multi-Modal\\
Data Identity in Federated Systems}%
}
\author{
Jonathan D.A.\ Jewell\\
\textit{Independent Researcher}\\
\texttt{j.d.a.jewell@open.ac.uk}
}
\date{March 2026}
% ══════════════════════════════════════════════════════════════════════════════
\begin{document}
\maketitle
% ── Abstract ──────────────────────────────────────────────────────────────────
\begin{abstract}
Modern data systems face a fundamental tension between the richness of
multi-modal representations and the coherence guarantees required by federated
environments. An entity may simultaneously be a node in a knowledge graph, a
dense vector embedding, a multi-dimensional tensor, a semantically typed
artefact, a full-text document, a versioned temporal object, a provenance-tracked
lineage chain, and a spatially located datum. Existing multi-model databases
typically store these facets in loosely coupled subsystems with no unified
identity or consistency semantics.
We present the \emph{Octad Data Model}, the core abstraction of \verisimdb, in
which every entity exists simultaneously across exactly eight \emph{modalities}
--- Graph, Vector, Tensor, Semantic, Document, Temporal, Provenance, and Spatial
--- bound to a single UUID-based identity. Rather than enforcing strict
consistency through conflict-free replicated data types or blockchain-style
consensus, \verisimdb treats cross-modal \emph{drift} as a first-class concept:
drift is detected through statistical and formal methods, classified by type and
severity, and resolved through configurable policy-driven repair strategies that
range from fully automatic to human-supervised. We formalise the octad algebra,
define drift metrics across modality pairs, and describe a federated
verification protocol suitable for sovereign data stores that cannot be centrally
controlled. We report on a Rust-based implementation using Oxigraph, Tantivy,
HNSW indexing, and R-tree spatial indexing, and present experimental results on
drift detection accuracy and cross-modal query performance over synthetic and
real-world datasets.
\end{abstract}
\noindent\textbf{Keywords:}
multi-model databases, data identity, drift detection, federated systems,
knowledge graphs, vector search, temporal databases, provenance tracking
% ── 1. Introduction ──────────────────────────────────────────────────────────
\section{Introduction}\label{sec:intro}
The contemporary data landscape is characterised by an explosion of
heterogeneous representations. A single scientific publication may be
represented as an RDF triple in a bibliographic knowledge graph, a 768-dimensional
dense embedding in a semantic search index, a tensor of citation features in a
recommendation model, a full-text document in a search engine, a versioned
artefact in a temporal database, and a geolocated entity in a spatial index.
Each representation captures a distinct \emph{modality} of the same underlying
entity, yet in practice these modalities are stored in separate, independently
managed systems with no shared notion of identity or consistency.
This fragmentation creates three classes of problems:
\begin{enumerate}[leftmargin=*]
\item \textbf{Identity fragmentation.} The same entity acquires different
identifiers in each system, requiring brittle reconciliation logic.
\item \textbf{Semantic drift.} When one modality is updated (e.g., a
paper is retracted), other modalities may continue to serve stale or
contradictory information indefinitely.
\item \textbf{Federation barriers.} Sovereign institutions cannot share
data across organisational boundaries without either ceding control to a
central authority or accepting the chaos of uncoordinated peer-to-peer
replication.
\end{enumerate}
Existing multi-model databases --- ArangoDB~\cite{arangodb2023}, SurrealDB~\cite{surrealdb2024},
TerminusDB~\cite{terminusdb2023} --- address identity fragmentation to varying
degrees but treat consistency as a binary property: either strongly consistent
(e.g., via MVCC transactions or Raft consensus) or eventually consistent (e.g.,
via CRDTs), with no formal drift semantics in either case. Decentralised systems such as IPFS~\cite{benet2014ipfs} and
Solid~\cite{sambra2016solid} provide content-addressable storage and
access-control mechanisms, but offer no cross-modal coherence guarantees.
We argue that the missing abstraction is \emph{drift-aware multi-modal
identity}: a data model in which (a) an entity's identity is preserved across
all representational modalities, (b) divergence between modalities is detected,
classified, and quantified continuously, and (c) resolution of detected drift is
governed by explicit, auditable policies rather than implicit convergence
assumptions.
This paper makes the following contributions:
\begin{enumerate}[leftmargin=*]
\item We formalise the \emph{Octad Data Model}, in which each entity exists
simultaneously across eight modalities, bound by a single identity and
subject to algebraic invariants (\cref{sec:octad}).
\item We define a \emph{drift calculus} with formal metrics for detecting
and classifying divergence across modality pairs (\cref{sec:drift}).
\item We describe a \emph{policy-driven repair framework} that supports
automatic, semi-automatic, and manual resolution strategies with immutable
audit trails (\cref{sec:repair}).
\item We present a \emph{cross-modal query} language that permits type-safe
access across modality boundaries (\cref{sec:query}).
\item We report on a production-quality implementation in Rust and Elixir,
together with experimental evaluation on drift detection accuracy and
query latency (\cref{sec:implementation,sec:evaluation}).
\end{enumerate}
The remainder of this paper is organised as follows. \Cref{sec:background}
surveys relevant background. \Cref{sec:octad} presents the formal octad model.
\Cref{sec:drift} defines drift detection and classification. \Cref{sec:repair}
describes policy-driven repair. \Cref{sec:query} introduces cross-modal
querying. \Cref{sec:implementation} details the implementation.
\Cref{sec:evaluation} reports experimental results. \Cref{sec:related} discusses
related work, and \cref{sec:conclusion} concludes.
% ── 2. Background ────────────────────────────────────────────────────────────
\section{Background}\label{sec:background}
\subsection{Multi-Model Databases}
The term \emph{multi-model database} denotes a system that natively supports
more than one data model (e.g., document, graph, key-value) within a single
engine~\cite{lu2019multimodel}. ArangoDB~\cite{arangodb2023} combines
document, graph, and key-value access patterns over a shared storage engine.
SurrealDB~\cite{surrealdb2024} extends this with record links, full-text
search, and vector similarity. OrientDB provides document-graph duality.
These systems achieve multi-model access by embedding multiple query
interpreters atop a common storage layer, but they do not formalise the
relationship \emph{between} models for the same entity. An entity's graph
representation and its document representation are conceptually independent;
no mechanism detects when they diverge.
\subsection{Graph Databases and Knowledge Graphs}
RDF-based systems (e.g., Apache Jena, Oxigraph~\cite{oxigraph2023},
Blazegraph) and property-graph systems (e.g., Neo4j, JanusGraph) represent
entities as nodes connected by typed edges. SPARQL and Cypher provide
declarative query interfaces. While graph representations excel at capturing
relational structure, they lack native support for dense vector retrieval,
temporal versioning, or spatial indexing, requiring external systems for these
modalities.
\subsection{Vector Stores and Dense Retrieval}
The rise of transformer-based embeddings has driven the development of
purpose-built vector databases: Pinecone, Weaviate, Milvus, Qdrant, and
Chroma. These systems index high-dimensional vectors using approximate
nearest-neighbour algorithms such as HNSW~\cite{malkov2018hnsw} or
IVF~\cite{jegou2011pq}. While some offer metadata filtering, they provide
no formal mechanism for detecting when an embedding becomes stale relative
to the source document it was derived from.
\subsection{Temporal Databases and Version Control}
Temporal databases~\cite{snodgrass1999temporal} maintain the history of data
changes, supporting time-travel queries (``what was the value at time $t$?'')
and bitemporal models (transaction time vs.\ valid time). TerminusDB~\cite{terminusdb2023}
applies version-control semantics (branches, merges, diffs) to graph data.
Dolt provides git-style versioning for relational tables. However, temporal
guarantees in these systems apply within a single modality; cross-modal
temporal coherence (e.g., ensuring a vector embedding corresponds to the
correct document version) is left to application logic.
\subsection{Provenance and Lineage Tracking}
The W3C PROV model~\cite{w3cprov2013} defines a vocabulary for describing the
provenance of data entities, activities, and agents. Systems such as
Apache Atlas and Amundsen provide metadata lineage for data warehouses.
Content-addressable storage (IPFS~\cite{benet2014ipfs}, Git) ensures
immutability through cryptographic hashing. These approaches track
\emph{where data came from} but do not address \emph{whether} concurrent
modality representations remain consistent with the provenance record.
\subsection{Spatial Databases}
PostGIS, SpatiaLite, and H3~\cite{uber2018h3} provide geospatial indexing
using R-trees, quadtrees, or hexagonal hierarchical grids. The OGC Simple
Features standard and the WGS84 coordinate reference system are widely adopted.
Spatial databases rarely integrate with graph or vector retrieval, creating
yet another silo.
\subsection{FAIR Principles and Federated Data}
The FAIR principles~\cite{wilkinson2016fair} (Findable, Accessible,
Interoperable, Reusable) have become the standard for open-science data
management. Solid~\cite{sambra2016solid} provides user-controlled data pods
with linked-data access control. The European Open Science Cloud (EOSC)
and GAIA-X pursue federated data spaces. These initiatives address access
control and discoverability but lack formal models for cross-modal identity
and drift.
% ── 3. The Octad Data Model ──────────────────────────────────────────────────
\section{The Octad Data Model}\label{sec:octad}
\subsection{Design Rationale}
The central observation motivating the octad model is that multi-modal data
representations are not independent: they are \emph{projections} of a single
underlying entity onto different representational spaces. A knowledge graph
edge stating ``paper $p$ cites paper $q$'' and a vector embedding of $p$ that
places it near $q$ in embedding space are not independent facts --- they are
correlated views that should remain mutually consistent. The octad model
makes this correlation explicit and enforceable.
We identify eight modalities that together form a minimal basis for contemporary
data-intensive systems. This choice is informed by a survey of production
workloads spanning scientific publishing, geospatial intelligence, financial
compliance, and AI/ML pipelines. While any finite enumeration is necessarily
incomplete, we demonstrate that these eight modalities cover the representational
needs of each surveyed domain.
\subsection{Formal Definition}
\begin{definition}[Modality Set]\label{def:modset}
The \emph{modality set} $\modset$ is the fixed eight-element set:
\[
\modset = \{
\mathsf{G}, \mathsf{V}, \mathsf{T}, \mathsf{S},
\mathsf{D}, \mathsf{P}, \mathsf{R}, \mathsf{X}
\}
\]
where $\mathsf{G}$ = Graph, $\mathsf{V}$ = Vector, $\mathsf{T}$ = Tensor,
$\mathsf{S}$ = Semantic, $\mathsf{D}$ = Document, $\mathsf{P}$ = Temporal,
$\mathsf{R}$ = Provenance, and $\mathsf{X}$ = Spatial.
\end{definition}
Each modality $m \in \modset$ has an associated \emph{modality space} $\Sigma_m$:
\begin{itemize}[leftmargin=*]
\item $\Sigma_\mathsf{G}$: the set of all labelled directed multigraphs
(RDF triples or property-graph subgraphs).
\item $\Sigma_\mathsf{V}$: $\mathbb{R}^d$ for a fixed embedding dimension $d$.
\item $\Sigma_\mathsf{T}$: the set of typed multi-dimensional arrays
$\bigcup_{n \geq 1} \bigcup_{d_1, \ldots, d_n \geq 1} \mathbb{R}^{d_1 \times \cdots \times d_n}$.
\item $\Sigma_\mathsf{S}$: the set of CBOR-encoded semantic annotations,
comprising type URIs and proof blobs.
\item $\Sigma_\mathsf{D}$: the set of structured text documents with
inverted-index metadata.
\item $\Sigma_\mathsf{P}$: the set of Merkle-tree version histories over
octad snapshots.
\item $\Sigma_\mathsf{R}$: the set of SHA-256 hash chains recording
provenance events (creation, transformation, access).
\item $\Sigma_\mathsf{X}$: the set of geometric objects in a WGS84
coordinate reference system, indexed by R-tree.
\end{itemize}
\begin{definition}[Octad]\label{def:octad}
An \emph{octad} is a tuple $\octad = (\mathit{id}, \varphi)$ where:
\begin{itemize}[leftmargin=*]
\item $\mathit{id} \in \mathsf{UUID}$ is a universally unique 128-bit identifier.
\item $\varphi : \modset \to \bigcup_{m \in \modset} (\Sigma_m \cup \{\bot\})$
is a \emph{modality function} such that for each $m \in \modset$,
$\varphi(m) \in \Sigma_m \cup \{\bot\}$, where $\bot$ denotes the absence
of a representation in modality $m$.
\end{itemize}
\end{definition}
The modality function $\varphi$ maps each of the eight modalities to either a
concrete value in the corresponding modality space or the distinguished value
$\bot$. An octad may have $\bot$ in some modalities (e.g., an entity without
spatial coordinates), but its identity persists regardless.
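To make \cref{def:modset,def:octad} concrete, the following Rust sketch encodes
the modality set as an enum and an octad as an identifier plus eight optional
slots, with \texttt{None} playing the role of $\bot$. It is illustrative only and
is not taken from the \verisimdb code base: the \texttt{u128} identifier stands in
for a UUID, and each \texttt{Value} variant is a hypothetical, heavily simplified
stand-in for the corresponding modality space $\Sigma_m$. Because \texttt{set}
derives the slot from the value's own variant, a value can only be stored under its
own modality, mirroring the condition $\varphi(m) \in \Sigma_m \cup \{\bot\}$.

\begin{lstlisting}
// Illustrative sketch only: u128 stands in for a UUID and each
// Value variant is a hypothetical stand-in for Sigma_m.

/// The eight modalities of the modality set M.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Modality {
    Graph, Vector, Tensor, Semantic,
    Document, Temporal, Provenance, Spatial,
}

impl Modality {
    pub const ALL: [Modality; 8] = [
        Modality::Graph, Modality::Vector, Modality::Tensor,
        Modality::Semantic, Modality::Document, Modality::Temporal,
        Modality::Provenance, Modality::Spatial,
    ];
}

/// Simplified values for each modality space.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Graph(Vec<(String, String, String)>), // (s, p, o) triples
    Vector(Vec<f64>),
    Tensor { shape: Vec<usize>, data: Vec<f64> },
    Semantic(Vec<String>),          // type URIs; proof blobs omitted
    Document(String),
    Temporal(Vec<u64>),             // version-record hashes
    Provenance(Vec<u64>),           // provenance-event hashes
    Spatial { lat: f64, lon: f64 }, // a single WGS84 point
}

impl Value {
    /// The modality whose space this value belongs to.
    pub fn modality(&self) -> Modality {
        match self {
            Value::Graph(_) => Modality::Graph,
            Value::Vector(_) => Modality::Vector,
            Value::Tensor { .. } => Modality::Tensor,
            Value::Semantic(_) => Modality::Semantic,
            Value::Document(_) => Modality::Document,
            Value::Temporal(_) => Modality::Temporal,
            Value::Provenance(_) => Modality::Provenance,
            Value::Spatial { .. } => Modality::Spatial,
        }
    }
}

/// An octad (id, phi); a None slot is the bottom value.
#[derive(Clone, Debug)]
pub struct Octad {
    pub id: u128,
    phi: [Option<Value>; 8],
}

impl Octad {
    /// A fresh octad with every modality absent.
    pub fn new(id: u128) -> Self {
        Octad { id, phi: Default::default() }
    }

    /// phi(m), or None if modality m is absent.
    pub fn get(&self, m: Modality) -> Option<&Value> {
        self.phi[m as usize].as_ref()
    }

    /// Stores v in its own modality slot, so phi(m) lies in Sigma_m.
    pub fn set(&mut self, v: Value) {
        let slot = v.modality() as usize;
        self.phi[slot] = Some(v);
    }
}
\end{lstlisting}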
\begin{definition}[Octad Store]\label{def:store}
An \emph{octad store} $\mathcal{S}$ is a partial function
$\mathcal{S} : \mathsf{UUID} \rightharpoonup \octad$ together with:
\begin{enumerate}[leftmargin=*]
\item A \emph{modality index} $\mathcal{I}_m$ for each $m \in \modset$,
providing efficient retrieval within that modality space.
\item A \emph{drift monitor} $\driftfn$ that continuously evaluates
cross-modal coherence (\cref{sec:drift}).
\item A \emph{repair engine} $\repairfn$ that resolves detected drift
according to policy (\cref{sec:repair}).
\end{enumerate}
\end{definition}
\subsection{Identity Invariants}
The octad model enforces two fundamental invariants:
\begin{invariant}[Identity Uniqueness]\label{inv:unique}
For any octad store $\mathcal{S}$ and identifiers $\mathit{id}_1 \neq \mathit{id}_2$
in its domain, with $\mathcal{S}(\mathit{id}_i) = (\mathit{id}_i, \varphi_i)$,
the octads $\mathcal{S}(\mathit{id}_1)$ and $\mathcal{S}(\mathit{id}_2)$ share no modality
representations. That is, for every $m \in \modset$, if $\varphi_1(m) \neq \bot$
and $\varphi_2(m) \neq \bot$, then $\varphi_1(m) \neq \varphi_2(m)$.
\end{invariant}
\begin{invariant}[Identity Persistence]\label{inv:persist}
An octad's identifier $\mathit{id}$ is immutable. Mutations to $\varphi$ do
not change $\mathit{id}$; the temporal modality $\varphi(\mathsf{P})$ records the
full history of all mutations.
\end{invariant}
These invariants guarantee that an entity's identity is stable and
unambiguous across all modalities and across time.
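The uniqueness invariant can be checked mechanically over a store snapshot. The
Rust sketch below is a hypothetical illustration rather than the \verisimdb
implementation: it models an octad store as a \texttt{HashMap} from identifiers to
octads, represents each present modality value by an assumed canonical byte
serialisation, and reports the first pair of distinct octads that share a
representation in some modality. Identity Persistence (\cref{inv:persist})
constrains how octads change over time, so it cannot be checked from a single
snapshot and is not covered here.

\begin{lstlisting}
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Modality {
    Graph, Vector, Tensor, Semantic,
    Document, Temporal, Provenance, Spatial,
}

/// Octad whose phi maps each present modality to a canonical
/// byte serialisation (an assumption); a missing key is bottom.
struct Octad {
    id: u128,
    phi: HashMap<Modality, Vec<u8>>,
}

/// Returns Some((m, id_a, id_b)) for the first pair of distinct
/// octads sharing a value in modality m, or None if the
/// Identity Uniqueness invariant holds.
fn check_uniqueness(
    store: &HashMap<u128, Octad>,
) -> Option<(Modality, u128, u128)> {
    // (modality, serialised value) -> id of first octad seen with it
    let mut seen: HashMap<(Modality, &[u8]), u128> = HashMap::new();
    for octad in store.values() {
        for (m, bytes) in &octad.phi {
            let key = (*m, bytes.as_slice());
            match seen.get(&key) {
                Some(&other) if other != octad.id => {
                    return Some((*m, other, octad.id));
                }
                Some(_) => {}
                None => {
                    seen.insert(key, octad.id);
                }
            }
        }
    }
    None
}
\end{lstlisting}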
\subsection{Modality Algebra}\label{sec:modality-algebra}
We define algebraic operations over octads to support composition, projection,
and merging.
\begin{definition}[Projection]\label{def:projection}
For an octad $\octad = (\mathit{id}, \varphi)$ and a subset
$M \subseteq \modset$, the \emph{projection} $\pi_M(\octad)$ is the tuple
$(\mathit{id}, \varphi|_M)$ where $\varphi|_M(m) = \varphi(m)$ if $m \in M$,
and $\varphi|_M(m) = \bot$ otherwise.
\end{definition}
\begin{definition}[Merge]\label{def:merge}
Given two octads $\octad_1 = (\mathit{id}, \varphi_1)$ and
$\octad_2 = (\mathit{id}, \varphi_2)$ sharing the same identifier, their
\emph{merge} $\octad_1 \oplus \octad_2 = (\mathit{id}, \varphi_\oplus)$ is
defined by:
\[
\varphi_\oplus(m) = \begin{cases}
\varphi_1(m) & \text{if } \varphi_2(m) = \bot \\
\varphi_2(m) & \text{if } \varphi_1(m) = \bot \\
\mathsf{resolve}(\varphi_1(m), \varphi_2(m)) & \text{otherwise}
\end{cases}
\]
where $\mathsf{resolve}$ is a policy-dependent conflict resolution function
(\cref{sec:repair}).
\end{definition}
\begin{proposition}[Merge Commutativity]
If $\mathsf{resolve}$ is commutative, then $\oplus$ is commutative:
$\octad_1 \oplus \octad_2 = \octad_2 \oplus \octad_1$.
\end{proposition}
\begin{proof}
Follows directly from the symmetry of the case analysis in
\cref{def:merge} and the commutativity assumption on $\mathsf{resolve}$.
\end{proof}
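The projection and merge operators translate directly into code. In the following
Rust sketch, which is illustrative and not drawn from \verisimdb, modalities are
the single-letter tags of \cref{def:modset}, modality values are simplified to
strings, and absent modalities are simply missing from a \texttt{BTreeMap}. The
\texttt{resolve} closure plays the role of the policy-dependent $\mathsf{resolve}$
function; supplying a commutative one, for instance a lexicographic maximum, makes
\texttt{merge(a, b, resolve)} and \texttt{merge(b, a, resolve)} agree, as the
proposition states.

\begin{lstlisting}
use std::collections::{BTreeMap, BTreeSet};

/// Modalities as single-letter tags (G, V, T, S, D, P, R, X).
type Modality = char;

/// An octad (id, phi) with string-valued modalities; a tag
/// missing from `phi` is bottom.
#[derive(Clone, Debug, PartialEq)]
struct Octad {
    id: u128,
    phi: BTreeMap<Modality, String>,
}

/// Projection pi_M: keep the modalities in `keep`, drop the rest.
fn project(o: &Octad, keep: &BTreeSet<Modality>) -> Octad {
    let phi: BTreeMap<Modality, String> = o
        .phi
        .iter()
        .filter(|(m, _)| keep.contains(*m))
        .map(|(m, v)| (*m, v.clone()))
        .collect();
    Octad { id: o.id, phi }
}

/// Merge: None unless both octads share an identifier.
fn merge<F>(a: &Octad, b: &Octad, resolve: F) -> Option<Octad>
where
    F: Fn(&str, &str) -> String,
{
    if a.id != b.id {
        return None;
    }
    // Starting from a covers every m with phi_2(m) = bottom.
    let mut phi = a.phi.clone();
    for (m, vb) in &b.phi {
        let merged = match a.phi.get(m) {
            None => vb.clone(), // phi_1(m) = bottom
            Some(va) => resolve(va.as_str(), vb.as_str()), // conflict
        };
        phi.insert(*m, merged);
    }
    Some(Octad { id: a.id, phi })
}
\end{lstlisting}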
\begin{definition}[Enrichment]\label{def:enrichment}
An \emph{enrichment} of octad $\octad = (\mathit{id}, \varphi)$ at modality
$m$ is the operation $\mathsf{enrich}_m(\octad, v)$ producing
$(\mathit{id}, \varphi')$ where $\varphi'(m) = v \in \Sigma_m$,
$\varphi'(m') = \varphi(m')$ for all $m' \notin \{m, \mathsf{P}, \mathsf{R}\}$,
and $\varphi'(\mathsf{P})$ and $\varphi'(\mathsf{R})$ are obtained from
$\varphi(\mathsf{P})$ and $\varphi(\mathsf{R})$ by appending a record of the write.
\end{definition}
This enrichment operation is the fundamental write primitive. Because every
modality mutation is routed through it, each mutation is reflected in both the
temporal and provenance modalities, which maintains the full mutation history
required by \cref{inv:persist}.
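A minimal Rust sketch of $\mathsf{enrich}_m$ follows. It is a hypothetical
illustration, not the \verisimdb write path: the temporal modality is simplified to
a list of full snapshots rather than a Merkle tree, and the provenance chain links
each event to its predecessor using the standard library's \texttt{DefaultHasher},
which is \emph{not} cryptographic and merely stands in for SHA-256. What it shows is
structural: one call updates $\varphi(m)$ and appends exactly one record to each of
the temporal and provenance histories, and any later edit to a logged event breaks
the chain.

\begin{lstlisting}
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// One provenance event, linked to its predecessor by hash.
#[derive(Clone, Debug)]
struct Event {
    modality: char,
    value: String,
    prev: u64,
    hash: u64,
}

/// Octad with data modalities in `phi`; the temporal (P) and
/// provenance (R) histories are kept as separate logs.
#[derive(Debug, Default)]
struct Octad {
    phi: BTreeMap<char, String>,
    temporal: Vec<BTreeMap<char, String>>, // one snapshot per write
    provenance: Vec<Event>,
}

/// Chain link: hash of (previous hash, modality, value).
/// DefaultHasher is NOT cryptographic; it stands in for SHA-256.
fn chain_hash(prev: u64, modality: char, value: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (prev, modality, value).hash(&mut h);
    h.finish()
}

impl Octad {
    /// enrich_m(o, v): set phi(m) = v, then append one record
    /// to the temporal log and one to the provenance chain.
    fn enrich(&mut self, m: char, v: &str) {
        self.phi.insert(m, v.to_string());
        self.temporal.push(self.phi.clone());
        let prev = self.provenance.last().map_or(0, |e| e.hash);
        let hash = chain_hash(prev, m, v);
        self.provenance.push(Event {
            modality: m,
            value: v.to_string(),
            prev,
            hash,
        });
    }

    /// True if every event links to its predecessor and its
    /// stored hash matches a recomputation.
    fn chain_is_intact(&self) -> bool {
        let mut prev = 0;
        self.provenance.iter().all(|e| {
            let ok = e.prev == prev
                && e.hash == chain_hash(e.prev, e.modality, &e.value);
            prev = e.hash;
            ok
        })
    }
}
\end{lstlisting}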
\subsection{Cross-Modal Coherence Constraints}
Beyond identity invariants, the octad model supports user-defined
\emph{coherence constraints} that express expected relationships between
modalities.
\begin{definition}[Coherence Constraint]\label{def:coherence}
A \emph{coherence constraint} is a predicate
$c : \Sigma_{m_1} \times \Sigma_{m_2} \to \{0, 1\}$ for modalities
$m_1, m_2 \in \modset$ such that $c(\varphi(m_1), \varphi(m_2)) = 1$ must hold
whenever both $\varphi(m_1) \neq \bot$ and $\varphi(m_2) \neq \bot$.
\end{definition}
Examples of coherence constraints include:
\begin{itemize}[leftmargin=*]
\item The vector embedding $\varphi(\mathsf{V})$ must be derivable from
the document content $\varphi(\mathsf{D})$ via a specified embedding
function $f$: $\|\varphi(\mathsf{V}) - f(\varphi(\mathsf{D}))\|_2 < \epsilon$.
\item The graph modality $\varphi(\mathsf{G})$ must contain exactly the
relations declared in the semantic modality $\varphi(\mathsf{S})$.
\item The spatial modality $\varphi(\mathsf{X})$ must be consistent with
any geographic references in the document modality $\varphi(\mathsf{D})$.
\end{itemize}
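The first example above can be written as an executable predicate. The Rust sketch
below is illustrative only: the embedding function $f$ is replaced by a
hypothetical toy (a normalised letter-frequency vector), whereas a real deployment
would call whichever embedding model the store is configured with, and the
tolerance $\epsilon$ is a caller-supplied parameter.

\begin{lstlisting}
/// Hypothetical toy embedding f: normalised letter-frequency
/// vector in R^26. A real system would call its embedding model.
fn embed(doc: &str) -> Vec<f64> {
    let mut v = vec![0.0; 26];
    for c in doc.chars().filter(|c| c.is_ascii_alphabetic()) {
        v[(c.to_ascii_lowercase() as u8 - b'a') as usize] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}

/// Euclidean distance between two vectors of equal length.
fn l2(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// A coherence constraint c : Sigma_V x Sigma_D -> {0, 1}.
type Constraint = Box<dyn Fn(&[f64], &str) -> bool>;

/// The Vector-Document constraint ||v - f(doc)||_2 < eps.
fn vector_document_constraint(eps: f64) -> Constraint {
    Box::new(move |v: &[f64], doc: &str| l2(v, &embed(doc)) < eps)
}
\end{lstlisting}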
Violation of a coherence constraint constitutes \emph{drift}, which we
formalise in the next section.
% ── 4. Drift Detection and Classification ────────────────────────────────────
\section{Drift Detection and Classification}\label{sec:drift}
\subsection{Drift as a First-Class Concept}
Traditional database systems treat inconsistency as an error condition to be
prevented through transactions, locks, or consensus protocols. In federated
multi-modal settings, this approach is both impractical and undesirable.
Modalities evolve at different rates: a document may be updated before its
embedding is recomputed; a spatial coordinate may be refined independently of
the underlying graph structure; a provenance chain may lag behind the current
state of the data.
\verisimdb adopts a fundamentally different stance: \emph{drift is not an
error; it is an observable quantity that can be measured, classified, and
managed according to policy.} This design choice is motivated by real-world
federated environments where strict consistency would require either
(a)~centralised coordination, which violates sovereignty, or (b)~blocking
updates until all modalities are synchronised, which is impractical for
large-scale systems.
\subsection{Formal Drift Metrics}
We define drift as a function of the divergence between modality
representations of the same octad.
\begin{definition}[Pairwise Drift]\label{def:pairwise-drift}
For modalities $m_1, m_2 \in \modset$ and an octad $\octad = (\mathit{id}, \varphi)$,
the \emph{pairwise drift} is:
\[
\driftfn_{m_1, m_2}(\octad) = d_{m_1, m_2}(\varphi(m_1), \varphi(m_2))
\]
where $d_{m_1, m_2} : (\Sigma_{m_1} \cup \{\bot\}) \times (\Sigma_{m_2} \cup \{\bot\})
\to \mathbb{R}_{\geq 0}$ is a modality-pair-specific distance function, with
$d_{m_1, m_2}(\cdot, \bot) = d_{m_1, m_2}(\bot, \cdot) = 0$ by convention (absent
modalities do not drift).
\end{definition}
\begin{definition}[Aggregate Drift]\label{def:agg-drift}
The \emph{aggregate drift} of an octad is the weighted sum over all modality
pairs:
\[
\driftfn(\octad) = \sum_{\{m_1, m_2\} \subseteq \modset}
w_{m_1, m_2} \cdot \driftfn_{m_1, m_2}(\octad)
\]
where the sum ranges over two-element subsets ($m_1 \neq m_2$) and
$w_{m_1, m_2} \geq 0$ are policy-configured weights satisfying
$\sum_{\{m_1, m_2\}} w_{m_1, m_2} = 1$.
\end{definition}
For eight modalities, there are $\binom{8}{2} = 28$ distinct modality pairs.
In practice, not all pairs have meaningful distance functions; we define
metrics for the five pairs in \cref{sec:drift-metrics}, and pairs without a
defined distance function are not monitored.
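\Cref{def:pairwise-drift,def:agg-drift} reduce to a weighted sum over a sparse
table of pairwise drifts. The following Rust sketch illustrates this under
simplifying assumptions and is not the \verisimdb code: it enumerates the 28
unordered pairs, keys each pair by the order in which its tags appear in
\cref{def:modset}, and treats any pair missing from the table (an absent modality
or an undefined distance function) as contributing zero, consistent with the
$\bot$ convention.

\begin{lstlisting}
use std::collections::HashMap;

/// Modality tags in the order of the modality-set definition.
const MODALITIES: [char; 8] = ['G', 'V', 'T', 'S', 'D', 'P', 'R', 'X'];

/// Unordered pairs {m1, m2}, m1 != m2, each listed once in
/// canonical order: C(8, 2) = 28 pairs.
fn modality_pairs() -> Vec<(char, char)> {
    let mut pairs = Vec::new();
    for i in 0..MODALITIES.len() {
        for j in (i + 1)..MODALITIES.len() {
            pairs.push((MODALITIES[i], MODALITIES[j]));
        }
    }
    pairs
}

/// Aggregate drift: sum of w_{m1,m2} * Delta_{m1,m2}. A pair missing
/// from `pairwise` contributes 0 (bottom or no distance function).
fn aggregate_drift(
    weights: &HashMap<(char, char), f64>,
    pairwise: &HashMap<(char, char), f64>,
) -> f64 {
    assert!(weights.values().all(|w| *w >= 0.0), "weights must be >= 0");
    let total: f64 = weights.values().sum();
    assert!((total - 1.0).abs() < 1e-9, "weights must sum to 1");
    weights
        .iter()
        .map(|(pair, w)| w * pairwise.get(pair).copied().unwrap_or(0.0))
        .sum()
}
\end{lstlisting}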
\subsection{Modality-Pair Distance Functions}\label{sec:drift-metrics}
\paragraph{Vector--Document drift ($d_{\mathsf{V},\mathsf{D}}$).}
Given an embedding function $f : \Sigma_\mathsf{D} \to \mathbb{R}^d$:
\[
d_{\mathsf{V},\mathsf{D}}(\vec{v}, \mathit{doc}) =
1 - \frac{\vec{v} \cdot f(\mathit{doc})}{\|\vec{v}\|\;\|f(\mathit{doc})\|}
\]
This is the cosine distance between the stored embedding and the embedding
that \emph{would} be computed from the current document content. It takes
values in $[0, 2]$: a value near 0 indicates that the embedding is fresh, while
larger values indicate that the document has changed substantially since the
embedding was computed.
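A direct Rust transcription of $d_{\mathsf{V},\mathsf{D}}$ is shown below as a
sketch. It assumes the caller has already recomputed $f(\mathit{doc})$ and passes
it as \texttt{fresh}; the function then returns the cosine distance, or
\texttt{None} when either vector has zero norm and the cosine is undefined (a case
the formula leaves open).

\begin{lstlisting}
/// Cosine distance 1 - cos(stored, fresh), where `fresh` = f(doc)
/// is the embedding recomputed from the current document.
/// The result lies in [0, 2]; None if either norm is zero.
fn vector_document_drift(stored: &[f64], fresh: &[f64]) -> Option<f64> {
    assert_eq!(stored.len(), fresh.len(), "dimensions must match");
    let dot: f64 = stored.iter().zip(fresh).map(|(a, b)| a * b).sum();
    let norm = |v: &[f64]| v.iter().map(|x| x * x).sum::<f64>().sqrt();
    let (ns, nf) = (norm(stored), norm(fresh));
    if ns == 0.0 || nf == 0.0 {
        return None;
    }
    Some(1.0 - dot / (ns * nf))
}
\end{lstlisting}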
\paragraph{Graph--Semantic drift ($d_{\mathsf{G},\mathsf{S}}$).}
Let $E_\mathsf{G}$ be the set of typed edges in the graph modality and
$T_\mathsf{S}$ the set of type assertions in the semantic modality. Define:
\[
d_{\mathsf{G},\mathsf{S}}(g, s) =
1 - \frac{|E_\mathsf{G} \cap T_\mathsf{S}|}{|E_\mathsf{G} \cup T_\mathsf{S}|}
\]
This is the Jaccard distance between the structural assertions in the graph
and the ontological type assertions in the semantic modality.
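The Jaccard distance can be sketched in Rust over hash sets of triples.
Representing both typed edges and type assertions as (subject, predicate, object)
triples is an assumption made so that the intersection in the formula is
well-defined; returning 0 when both sets are empty is likewise our convention,
since the formula divides by zero there.

\begin{lstlisting}
use std::collections::HashSet;

/// A typed edge or type assertion as (subject, predicate, object).
type Triple = (String, String, String);

/// Jaccard distance 1 - |E ∩ T| / |E ∪ T| between graph edges E
/// and semantic type assertions T. Returns 0 when both are empty
/// (assumption: the ratio is undefined there).
fn graph_semantic_drift(edges: &HashSet<Triple>, types: &HashSet<Triple>) -> f64 {
    let union = edges.union(types).count();
    if union == 0 {
        return 0.0;
    }
    let inter = edges.intersection(types).count();
    1.0 - inter as f64 / union as f64
}
\end{lstlisting}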
\paragraph{Temporal--Provenance drift ($d_{\mathsf{P},\mathsf{R}}$).}
Let $H_\mathsf{P}$ be the Merkle tree of version history and $C_\mathsf{R}$
the SHA-256 hash chain of provenance events. Define:
\[
d_{\mathsf{P},\mathsf{R}}(h, c) =
\frac{|\mathsf{leaves}(H_\mathsf{P}) \setminus \mathsf{events}(C_\mathsf{R})|}{
|\mathsf{leaves}(H_\mathsf{P})|}
\]
This measures the fraction of version-history events not accounted for in the
provenance chain --- an indicator of provenance tracking lag.
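The provenance-lag ratio is similarly short. In this Rust sketch, which is
illustrative only, Merkle leaves and chain events are both reduced to 64-bit event
identifiers and the hashing structure is ignored; returning 0 for an empty history
is an assumption, since the ratio is undefined in that case.

\begin{lstlisting}
use std::collections::HashSet;

/// Fraction of version-history leaves (event ids) that do not
/// appear among the provenance-chain events.
fn temporal_provenance_drift(leaves: &[u64], chain_events: &[u64]) -> f64 {
    let leaf_set: HashSet<u64> = leaves.iter().copied().collect();
    if leaf_set.is_empty() {
        return 0.0; // assumption: no history, no lag
    }
    let events: HashSet<u64> = chain_events.iter().copied().collect();
    let missing = leaf_set.difference(&events).count();
    missing as f64 / leaf_set.len() as f64
}
\end{lstlisting}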
\paragraph{Tensor--Vector drift ($d_{\mathsf{T},\mathsf{V}}$).}
When the vector modality is a flattened or projected view of the tensor:
\[
d_{\mathsf{T},\mathsf{V}}(t, \vec{v}) =
\frac{\|\mathsf{proj}(t) - \vec{v}\|_2}{\|\mathsf{proj}(t)\|_2}
\]
where $\mathsf{proj} : \Sigma_\mathsf{T} \to \mathbb{R}^d$ is the configured
projection function.
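Because $\mathsf{proj}$ is configurable, any code for this metric must pick one.
The Rust sketch below assumes, purely for illustration, a rank-2 tensor stored in
row-major order and a hypothetical $\mathsf{proj}$ that mean-pools over the leading
axis; the drift is then the relative $L_2$ error, returned as \texttt{None} when
$\mathsf{proj}(t) = 0$ and the ratio is undefined.

\begin{lstlisting}
/// Hypothetical proj: mean-pool a row-major (rows x cols) tensor
/// over its leading axis, giving a vector in R^cols.
fn mean_pool(data: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    assert!(rows > 0 && data.len() == rows * cols);
    let mut out = vec![0.0; cols];
    for r in 0..rows {
        for c in 0..cols {
            out[c] += data[r * cols + c];
        }
    }
    for x in out.iter_mut() {
        *x /= rows as f64;
    }
    out
}

/// Relative L2 error ||proj(t) - v|| / ||proj(t)||; None if proj(t) = 0.
fn tensor_vector_drift(projected: &[f64], v: &[f64]) -> Option<f64> {
    assert_eq!(projected.len(), v.len());
    let norm_p = projected.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_p == 0.0 {
        return None;
    }
    let diff = projected
        .iter()
        .zip(v)
        .map(|(a, b)| (a - b).powi(2))
        .sum::<f64>()
        .sqrt();
    Some(diff / norm_p)
}
\end{lstlisting}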
\paragraph{Document--Spatial drift ($d_{\mathsf{D},\mathsf{X}}$).}
Using named-entity recognition to extract geographic references $G_\mathsf{D}$
from the document and comparing them with the spatial modality's coordinates
$p$:
\[
d_{\mathsf{D},\mathsf{X}}(\mathit{doc}, p) =
\min\Bigl(1,\; \frac{1}{\rho_{\max}} \min_{g \in G_\mathsf{D}} \mathsf{haversine}(g, p)\Bigr)
\]
where $\rho_{\max}$ is a configurable maximum distance threshold that normalises
the drift to $[0, 1]$.
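A Rust sketch of this metric follows. It assumes the geographic references have
already been extracted (the named-entity recognition step is out of scope), models
the Earth as a sphere of mean radius 6371.0088\,km, which approximates rather than
implements WGS84 geodesics, and returns \texttt{None} when the document yields no
references, a case the formula does not cover.

\begin{lstlisting}
/// A WGS84 point in decimal degrees.
#[derive(Clone, Copy, Debug)]
struct LatLon {
    lat: f64,
    lon: f64,
}

/// Great-circle distance in km by the haversine formula on a sphere
/// of mean radius 6371.0088 km (an approximation to WGS84).
fn haversine_km(a: LatLon, b: LatLon) -> f64 {
    let (phi1, phi2) = (a.lat.to_radians(), b.lat.to_radians());
    let dphi = (b.lat - a.lat).to_radians();
    let dlambda = (b.lon - a.lon).to_radians();
    let h = (dphi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (dlambda / 2.0).sin().powi(2);
    2.0 * 6371.0088 * h.sqrt().min(1.0).asin()
}

/// Distance from p to the nearest extracted reference, divided by
/// max_km and clipped to [0, 1]; None if there are no references.
fn document_spatial_drift(refs: &[LatLon], p: LatLon, max_km: f64) -> Option<f64> {
    refs.iter()
        .map(|g| haversine_km(*g, p))
        .fold(None, |best: Option<f64>, d| Some(best.map_or(d, |b| b.min(d))))
        .map(|d| (d / max_km).min(1.0))
}
\end{lstlisting}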
\subsection{Drift Classification}\label{sec:drift-class}
Not all drift is equal. We classify detected drift into four categories
based on its source and severity:
\begin{definition}[Drift Classes]\label{def:drift-classes}
\begin{enumerate}[leftmargin=*]
\item \textbf{Representational drift} occurs when a derived modality
becomes stale relative to its source (e.g., an embedding not yet
recomputed after a document update). This is typically benign and
automatically repairable.
\item \textbf{Semantic drift} occurs when the meaning of an entity
changes in one modality but not others (e.g., a retracted paper whose
graph edges still assert validity). This requires domain-aware repair.
\item \textbf{Structural drift} occurs when the topology of
relationships in the graph modality diverges from the structure implied
by other modalities (e.g., a document describes a citation that has
no corresponding graph edge).
\item \textbf{Provenance drift} occurs when the audit trail in the
provenance or temporal modalities fails to account for observed changes
in other modalities. This may indicate tampering or system failure.
\end{enumerate}
\end{definition}
The classification is computed by a rule engine that evaluates which modality
pairs exhibit drift and maps the pattern to a drift class. For instance,
drift exclusively in $d_{\mathsf{V},\mathsf{D}}$ is classified as
representational, while drift in $d_{\mathsf{G},\mathsf{S}}$ with concurrent
drift in $d_{\mathsf{P},\mathsf{R}}$ is classified as provenance drift
(indicating an untracked semantic change).
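The rule engine can be sketched as an ordered list of pattern tests over the set of
drifted pairs. In the Rust sketch below, only the first two rules are taken from
the text above; the remaining rules (including mapping Graph--Document drift to
structural drift) are hypothetical placeholders, and pairs are assumed to use the
canonical tag order of \cref{def:modset}.

\begin{lstlisting}
use std::collections::BTreeSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DriftClass {
    Representational,
    Semantic,
    Structural,
    Provenance,
}

/// Maps the set of drifted pairs (canonical tag order) to a class.
/// Rules are tried in order; the first match wins.
fn classify(drifted: &BTreeSet<(char, char)>) -> Option<DriftClass> {
    let has = |a: char, b: char| drifted.contains(&(a, b));
    if drifted.len() == 1 && has('V', 'D') {
        Some(DriftClass::Representational) // rule from the text
    } else if has('G', 'S') && has('P', 'R') {
        Some(DriftClass::Provenance) // rule from the text
    } else if has('P', 'R') {
        Some(DriftClass::Provenance) // hypothetical
    } else if has('G', 'S') {
        Some(DriftClass::Semantic) // hypothetical
    } else if has('G', 'D') {
        Some(DriftClass::Structural) // hypothetical
    } else {
        None // unclassified: escalate to review (hypothetical)
    }
}
\end{lstlisting}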
\subsection{Drift Detection Algorithm}
\begin{algorithm}[t]
\caption{Periodic Drift Detection}\label{alg:drift-detect}
\begin{algorithmic}[1]
\Require Octad store $\mathcal{S}$, sample size $k$, sampling interval $\tau$,
thresholds $\theta_\mathsf{soft} < \theta_\mathsf{hard}$
\Loop
\State $\mathit{sample} \gets \mathsf{random\_sample}(\mathcal{S}, k)$
\For{$\octad \in \mathit{sample}$}
\For{$(m_1, m_2) \in \mathsf{active\_pairs}(\octad)$}
\State $\delta \gets \driftfn_{m_1,m_2}(\octad)$
\If{$\delta > \theta_\mathsf{hard}$}
\State $\mathsf{emit\_alert}(\octad, m_1, m_2, \mathsf{HARD})$
\ElsIf{$\delta > \theta_\mathsf{soft}$}
\State $\mathsf{mark\_drifted}(\octad, m_1, m_2)$
\EndIf
\EndFor
\EndFor
\State $\mathsf{sleep}(\tau)$
\EndLoop
\end{algorithmic}
\end{algorithm}
\Cref{alg:drift-detect} shows the core drift detection loop. The algorithm
operates by random sampling to amortise the cost of cross-modal comparisons
over large stores. The function $\mathsf{active\_pairs}(\octad)$ returns
only those modality pairs for which both modalities are non-$\bot$ and a
distance function is defined. Two thresholds govern the response: a soft
threshold triggers marking (the octad is flagged but no immediate action is
taken), and a hard threshold triggers an alert that initiates repair.
The sample size $k$ and interval $\tau$ are tunable. The most aggressive setting
($k = |\mathcal{S}|$, $\tau = 0$) continuously rescans the full store at the
cost of throughput. Smaller $k$ and longer $\tau$ reduce overhead at the
cost of slower detection.
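A single iteration of \cref{alg:drift-detect} can be sketched in Rust as follows. This is a sketch under stated assumptions, not the \verisimdb implementation:

\begin{itemize}[leftmargin=*]
\item octads and modality pairs are represented by indices;
\item the drift function is supplied as a closure returning \texttt{None} for inactive pairs;
\item a small xorshift generator stands in for a real random number source;
\item the outer loop with $\mathsf{sleep}(\tau)$ is left to the caller.
\end{itemize}

All function names are hypothetical.

```rust
/// Response to one (octad, pair) comparison under the two thresholds.
#[derive(Debug, PartialEq)]
pub enum Response {
    Alert,    // delta > theta_hard: emit a HARD alert
    Mark,     // theta_soft < delta <= theta_hard: mark as drifted
    NoAction, // delta <= theta_soft
}

pub fn respond(delta: f64, theta_soft: f64, theta_hard: f64) -> Response {
    if delta > theta_hard {
        Response::Alert
    } else if delta > theta_soft {
        Response::Mark
    } else {
        Response::NoAction
    }
}

/// Picks min(k, n) distinct indices from 0..n by a partial Fisher-Yates shuffle.
/// The xorshift step is a dependency-free stand-in for a real RNG; seed must be non-zero.
pub fn random_sample(n: usize, k: usize, seed: &mut u64) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..n).collect();
    let k = k.min(n);
    for i in 0..k {
        *seed ^= *seed << 13;
        *seed ^= *seed >> 7;
        *seed ^= *seed << 17;
        let j = i + (*seed as usize) % (n - i);
        idx.swap(i, j);
    }
    idx.truncate(k);
    idx
}

/// One pass of the loop body: sample k octads, score every active pair,
/// and return (octad, pair, response) for each alert or mark.
pub fn detect_once<F>(
    n_octads: usize,
    n_pairs: usize,
    k: usize,
    seed: &mut u64,
    theta_soft: f64,
    theta_hard: f64,
    drift: F,
) -> Vec<(usize, usize, Response)>
where
    F: Fn(usize, usize) -> Option<f64>,
{
    let mut out = Vec::new();
    for o in random_sample(n_octads, k, seed) {
        for p in 0..n_pairs {
            // None models a pair outside active_pairs(o).
            if let Some(delta) = drift(o, p) {
                match respond(delta, theta_soft, theta_hard) {
                    Response::NoAction => {}
                    r => out.push((o, p, r)),
                }
            }
        }
    }
    out
}
```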
% ── 5. Policy-Driven Repair ──────────────────────────────────────────────────
\section{Policy-Driven Repair}\label{sec:repair}
\subsection{Repair Strategies}
Once drift is detected and classified, the system must determine how to
resolve it. We define three \emph{automation levels}:
\begin{definition}[Repair Automation Levels]
\begin{enumerate}[leftmargin=*]
\item \textbf{Automatic repair.} The system identifies the
\emph{authoritative modality} for the drifted pair and regenerates
the non-authoritative modality. Example: re-embedding a document
when $d_{\mathsf{V},\mathsf{D}}$ exceeds threshold.
\item \textbf{Semi-automatic repair.} The system proposes a repair
action and awaits confirmation from a designated \emph{domain custodian}.
Example: updating graph edges when a semantic annotation changes.
\item \textbf{Manual repair.} The system logs the drift event and
creates a review ticket. Example: resolving conflicting provenance
chains that may indicate data tampering.
\end{enumerate}
\end{definition}
\subsection{The Repair Policy Language}
Repair policies are expressed as rules mapping drift class $\times$ severity
to an automation level and a repair action. Formally:
\begin{definition}[Repair Policy]\label{def:repair-policy}
A \emph{repair policy} is a function:
\[
\policyfn : \mathsf{DriftClass} \times \mathbb{R}_{\geq 0} \to
\mathsf{AutoLevel} \times \mathsf{Action}
\]
where $\mathsf{DriftClass}$ is the set of four drift classes in
\cref{def:drift-classes}, the second argument is the drift severity,
$\mathsf{AutoLevel} = \{\mathsf{auto}, \mathsf{semi}, \mathsf{manual}\}$,
and $\mathsf{Action}$ is the set of concrete repair operations.
\end{definition}
Policies are encoded in CBOR and stored in the semantic modality of a
designated \emph{policy octad}, itself subject to the same identity and
versioning invariants as any other octad. This design ensures that policy
changes are auditable and versioned.
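A repair policy can be represented as a rule table. In the Rust sketch below (all names hypothetical), each rule covers one drift class from a minimum severity upward. Among matching rules, the one with the highest minimum severity wins, and unmatched inputs fall back to manual review. The variant names for $\mathsf{DriftClass}$, the concrete actions, and the fallback are assumptions for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DriftClass { Representational, Semantic, Structural, Provenance }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AutoLevel { Auto, Semi, Manual }

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Action { Regenerate, ProposeEdit, OpenTicket }

/// A rule applies to `class` whenever severity >= `min_severity`.
pub struct Rule {
    pub class: DriftClass,
    pub min_severity: f64,
    pub level: AutoLevel,
    pub action: Action,
}

/// pi(class, severity): the applicable rule with the highest threshold wins;
/// with no applicable rule, escalate to manual review (assumed default).
pub fn policy(rules: &[Rule], class: DriftClass, severity: f64) -> (AutoLevel, Action) {
    rules
        .iter()
        .filter(|r| r.class == class && severity >= r.min_severity)
        .max_by(|a, b| a.min_severity.total_cmp(&b.min_severity))
        .map(|r| (r.level, r.action.clone()))
        .unwrap_or((AutoLevel::Manual, Action::OpenTicket))
}
```

Suppose a table contains the rules (structural, $0.0$, semi) and (structural, $0.8$, manual). Moderate structural drift then yields a proposal for the custodian, while severe structural drift escalates to a review ticket.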
\subsection{Authoritative Modality Selection}
When automatic repair is triggered, the system must determine which modality
is \emph{authoritative} --- i.e., which modality's current state should be
treated as ground truth.
\begin{definition}[Authority Function]\label{def:authority}
An \emph{authority function} $\alpha : \modset \times \modset \to \modset$
maps each modality pair to the authoritative member. The default authority
ordering is:
\[
\mathsf{D} > \mathsf{G} > \mathsf{S} > \mathsf{V} > \mathsf{T} >
\mathsf{X} > \mathsf{P} > \mathsf{R}
\]
where $m_1 > m_2$ means $m_1$ is authoritative over $m_2$.
\end{definition}
The rationale for this ordering is that human-authored modalities (documents,
graphs, semantic annotations) take precedence over derived modalities
(embeddings, tensors) and infrastructure modalities (temporal, provenance).
The ordering is configurable per octad store to accommodate domain-specific
requirements.
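Because $\alpha$ is induced by a total order, evaluating it reduces to comparing ranks. The following is a minimal Rust sketch. It assumes the order is supplied as an array, so that a store can substitute its own configuration. The names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Modality { D, G, S, V, T, X, P, R }

/// Default authority ordering from the text, most authoritative first.
pub const DEFAULT_ORDER: [Modality; 8] = [
    Modality::D, Modality::G, Modality::S, Modality::V,
    Modality::T, Modality::X, Modality::P, Modality::R,
];

/// alpha(m1, m2): whichever modality appears earlier in `order`.
/// Passing a different `order` models a per-store configuration.
pub fn alpha(order: &[Modality; 8], m1: Modality, m2: Modality) -> Modality {
    let rank = |m: Modality| {
        order.iter().position(|&x| x == m).expect("modality missing from order")
    };
    if rank(m1) <= rank(m2) { m1 } else { m2 }
}
```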
\subsection{Repair Execution}
\begin{algorithm}[t]
\caption{Policy-Driven Repair Execution}\label{alg:repair}
\begin{algorithmic}[1]
\Require Drifted octad $\octad$, modality pair $(m_1, m_2)$,
drift class $c$, drift value $\delta$
\State $(\mathit{level}, \mathit{action}) \gets \policyfn(c, \delta)$
\If{$\mathit{level} = \mathsf{auto}$}
\State $m_\alpha \gets \alpha(m_1, m_2)$
\State $m_\beta \gets$ the unique element of $\{m_1, m_2\} \setminus \{m_\alpha\}$
\State $v' \gets \mathsf{derive}(\varphi(m_\alpha), m_\beta)$
\State $\octad' \gets \mathsf{enrich}_{m_\beta}(\octad, v')$
\State $\mathsf{commit}(\octad')$
\ElsIf{$\mathit{level} = \mathsf{semi}$}
\State $\mathsf{propose\_repair}(\octad, m_1, m_2, \mathit{action})$
\State \textbf{await} custodian approval or timeout
\Else
\State $\mathsf{log\_for\_review}(\octad, m_1, m_2, c, \delta)$
\EndIf
\end{algorithmic}
\end{algorithm}
The $\mathsf{derive}$ function in \cref{alg:repair} is modality-pair-specific:
for $(\mathsf{D}, \mathsf{V})$, it invokes the embedding model; for
$(\mathsf{G}, \mathsf{S})$, it extracts typed edges from semantic annotations.
Every repair action, regardless of automation level, appends an entry to both
the temporal and provenance modalities.
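The control flow of \cref{alg:repair} can be sketched in Rust, with the policy outcome, $\alpha$, and $\mathsf{derive}$ passed in. This is a simplification under the following assumptions:

\begin{itemize}[leftmargin=*]
\item an octad is modelled as a map from modality to a serialised value;
\item $\mathsf{enrich}$ and $\mathsf{commit}$ collapse into a single map update;
\item the semi-automatic branch returns immediately instead of awaiting approval;
\item the audit log is a plain vector of strings standing in for the temporal and provenance entries.
\end{itemize}

All names are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Modality { D, G, S, V, T, X, P, R }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AutoLevel { Auto, Semi, Manual }

/// Simplified octad: modality -> serialised value (a missing key models bottom).
pub type Octad = HashMap<Modality, String>;

#[derive(Debug, PartialEq)]
pub enum Outcome { Committed, Proposed, LoggedForReview }

/// Dispatch on the automation level; every branch appends an audit entry.
pub fn execute_repair(
    octad: &mut Octad,
    m1: Modality,
    m2: Modality,
    level: AutoLevel,
    alpha: impl Fn(Modality, Modality) -> Modality,
    derive: impl Fn(&str, Modality) -> String,
    audit: &mut Vec<String>,
) -> Outcome {
    match level {
        AutoLevel::Auto => {
            let m_a = alpha(m1, m2);
            let m_b = if m_a == m1 { m2 } else { m1 }; // the other element of {m1, m2}
            let source = octad.get(&m_a).cloned().unwrap_or_default();
            let v = derive(&source, m_b);
            octad.insert(m_b, v); // enrich + commit in one step
            audit.push(format!("auto: regenerated {:?} from {:?}", m_b, m_a));
            Outcome::Committed
        }
        AutoLevel::Semi => {
            audit.push(format!("semi: proposed repair of ({:?}, {:?})", m1, m2));
            Outcome::Proposed
        }
        AutoLevel::Manual => {
            audit.push(format!("manual: logged ({:?}, {:?}) for review", m1, m2));
            Outcome::LoggedForReview
        }
    }
}
```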
\subsection{Immutable Audit Trail}
All repair actions produce an audit record:
\[
\mathsf{AuditRecord} = (
\mathit{timestamp},\;
\mathit{octad\_id},\;
\mathit{drift\_class},\;
\mathit{drift\_value},\;
\mathit{repair\_action},\;
\mathit{actor},\;
\mathit{prev\_hash}
)
\]
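Records form a hash chain: $\mathit{hash}_i = \mathsf{SHA256}(
\mathsf{AuditRecord}_i \| \mathit{hash}_{i-1})$. Here the $\mathit{prev\_hash}$
field of $\mathsf{AuditRecord}_i$ holds $\mathit{hash}_{i-1}$, and
$\mathit{hash}_0$ is a fixed genesis value. Altering any record therefore
changes every subsequent hash. This chain is stored in the provenance modality
and is verifiable by any federated participant.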
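The chaining and verification logic can be sketched as follows. To keep the example dependency-free, it substitutes Rust's \texttt{DefaultHasher} for SHA-256. That hasher is \emph{not} cryptographic and must not be used for real audit trails; a SHA-256 implementation would take its place. Field types are assumptions. For example, the drift value is stored as raw \texttt{f64} bits because \texttt{f64} does not implement \texttt{Hash}.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One audit record; `prev_hash` links it to its predecessor.
#[derive(Clone, Debug, Hash)]
pub struct AuditRecord {
    pub timestamp: u64,
    pub octad_id: String,
    pub drift_class: String,
    pub drift_value_bits: u64, // f64::to_bits of the drift value
    pub repair_action: String,
    pub actor: String,
    pub prev_hash: u64,
}

pub const GENESIS: u64 = 0;

/// STAND-IN for SHA-256 (non-cryptographic). Mirrors
/// hash_i = H(record_i || hash_{i-1}), where record_i already carries prev_hash.
fn digest(r: &AuditRecord) -> u64 {
    let mut h = DefaultHasher::new();
    r.hash(&mut h);
    r.prev_hash.hash(&mut h);
    h.finish()
}

/// Appends a record, linking it to the hash of the current chain tip.
pub fn append(chain: &mut Vec<AuditRecord>, mut rec: AuditRecord) {
    rec.prev_hash = chain.last().map(digest).unwrap_or(GENESIS);
    chain.push(rec);
}

/// Recomputes every link from the genesis value.
pub fn verify(chain: &[AuditRecord]) -> bool {
    let mut expected = GENESIS;
    for rec in chain {
        if rec.prev_hash != expected {
            return false;
        }
        expected = digest(rec);
    }
    true
}
```

Editing any record except the last breaks its successor's link. Protecting the final record additionally requires the tip hash to be published or attested separately.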
% ── 6. Cross-Modal Query ─────────────────────────────────────────────────────
\section{Cross-Modal Query}\label{sec:query}
\subsection{Query Model}
A key advantage of the octad model is that queries can \emph{span modality
boundaries} while maintaining type safety. We define a cross-modal query
language, VQL (VeriSim Query Language), that supports modality-specific
operations composed through a common identity layer.
\begin{definition}[Cross-Modal Query]\label{def:query}
A \emph{cross-modal query} $q$ is a composition of modality-specific
query fragments:
\[
q = \queryfn_{m_1}(\mathit{pred}_1) \bowtie_{\mathit{id}}
\queryfn_{m_2}(\mathit{pred}_2) \bowtie_{\mathit{id}} \cdots
\bowtie_{\mathit{id}} \queryfn_{m_k}(\mathit{pred}_k)
\]
where $\queryfn_{m_i}(\mathit{pred}_i)$ returns a set of octad identifiers
matching predicate $\mathit{pred}_i$ in modality $m_i$, and
$\bowtie_{\mathit{id}}$ is the natural join on octad identity.
\end{definition}
This formulation enables queries such as ``find all entities that are within
10\,km of London (spatial), were authored after 2020 (temporal), cite a
retracted paper (graph), and have a cosine similarity $> 0.9$ to a query
embedding (vector).''
\subsection{Query Planning}
The cross-modal query planner operates in three phases:
\begin{enumerate}[leftmargin=*]
\item \textbf{Decomposition.} The query is parsed into modality-specific
fragments. Each fragment is routed to the appropriate modality index.
\item \textbf{Selectivity estimation.} The planner estimates the
selectivity of each fragment using modality-specific statistics
(cardinality estimates for graph patterns, distance distributions
for vector queries, spatial density for R-tree bounds).
\item \textbf{Execution ordering.} Fragments are executed in order of
increasing estimated result size (most selective first) to minimise the
intermediate result set. The identity join is implemented as a
hash-based intersection of UUID sets.
\end{enumerate}
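The execution-ordering phase can be sketched in Rust. Identifiers are modelled as \texttt{u128} UUIDs. Each fragment carries the planner's cardinality estimate and a deferred evaluation closure, and all names are hypothetical. Fragments run in ascending order of estimate. The identity join is a hash-set intersection that stops early once the intermediate result is empty.

```rust
use std::collections::HashSet;

/// A modality-specific fragment: the planner's cardinality estimate and a
/// deferred evaluation against the modality index, returning octad IDs.
pub struct Fragment {
    pub estimate: usize,
    pub eval: Box<dyn Fn() -> HashSet<u128>>,
}

/// Runs fragments most-selective-first and joins them on octad identity.
pub fn execute(mut fragments: Vec<Fragment>) -> HashSet<u128> {
    fragments.sort_by_key(|f| f.estimate);
    let mut rest = fragments.into_iter();
    let mut acc = match rest.next() {
        Some(first) => (first.eval)(),
        None => return HashSet::new(),
    };
    for f in rest {
        if acc.is_empty() {
            break; // an empty intersection cannot grow; skip remaining fragments
        }
        let ids = (f.eval)();
        acc.retain(|id| ids.contains(id)); // hash-based identity join
    }
    acc
}
```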
\subsection{VQL Syntax (Abbreviated)}
\begin{lstlisting}[language=SQL,caption={Cross-modal VQL query example}]
SELECT octad.id, octad.document.title,
octad.spatial.coordinates
FROM octads
WHERE octad.vector NEAR [0.1, 0.3, ...] LIMIT 100
AND octad.graph MATCHES
(?x :cites ?y WHERE ?y :status "retracted")
AND octad.spatial WITHIN radius(51.5, -0.12, 10km)
AND octad.temporal.created > "2020-01-01"
ORDER BY octad.vector.similarity DESC
\end{lstlisting}
The query above spans four modalities: vector similarity
(line 4), graph pattern matching (lines 5--6), spatial containment (line 7),
and temporal filtering (line 8). The result includes document and spatial
projections of matching octads.
\subsection{Type Safety}
VQL is statically typed: each modality accessor (e.g., \texttt{octad.vector},
\texttt{octad.graph}) returns a value of the corresponding modality type.
The type checker rejects queries that apply modality-inappropriate operations
(e.g., applying \texttt{NEAR} to a document modality or \texttt{MATCHES} to
a vector modality). These checks are performed statically on the parsed
query, before any fragment is executed.
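The check amounts to looking up permitted (modality, operator) combinations across the whole query before execution. In the following Rust sketch, \texttt{NEAR}, \texttt{MATCHES} and \texttt{WITHIN} are paired with the modalities used in the example query. The comparison operator on the temporal modality and all type names are assumptions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ModalityType { Graph, Vector, Tensor, Semantic, Document, Temporal, Provenance, Spatial }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Operator { Near, Matches, Within, Compare }

/// Permitted (modality, operator) pairs; the Temporal/Compare entry is assumed.
fn accepts(m: ModalityType, op: Operator) -> bool {
    use ModalityType::*;
    use Operator::*;
    matches!(
        (m, op),
        (Vector, Near) | (Graph, Matches) | (Spatial, Within) | (Temporal, Compare)
    )
}

/// Static check over every predicate; runs before any fragment executes.
pub fn type_check(preds: &[(ModalityType, Operator)]) -> Result<(), String> {
    for &(m, op) in preds {
        if !accepts(m, op) {
            return Err(format!("operator {:?} is not defined on the {:?} modality", op, m));
        }
    }
    Ok(())
}
```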
% ── 7. Federated Verification ────────────────────────────────────────────────
\section{Federated Verification}\label{sec:federation}
\subsection{Sovereignty Model}
In a federated deployment, each participating institution operates its own
octad store. \verisimdb does not assume a central coordinator or a shared
consensus protocol. Instead, federation is achieved through:
\begin{enumerate}[leftmargin=*]
\item \textbf{Shared identity namespace.} All federated stores use the
same UUID generation scheme, so an octad created at institution A can be
referenced by institution B with negligible risk of identifier collision.
\item \textbf{Bilateral verification.} Two stores that wish to
interoperate exchange \emph{drift attestations}: signed statements of
the form ``octad $\mathit{id}$ has aggregate drift $\driftfn(\octad)
\leq \theta$ as of timestamp $t$, attested by store $S$.''
\item \textbf{Trust policies.} Each store independently configures
which remote attestations it accepts, from which attestors, and
subject to what freshness requirements.
\end{enumerate}
\subsection{Federation Protocol}
\begin{definition}[Drift Attestation]\label{def:attestation}
A \emph{drift attestation} is a signed tuple:
\[
A = (
\mathit{octad\_id},\;
\driftfn(\octad),\;
t,\;
\mathit{store\_id},\;
\mathsf{sign}(\mathit{octad\_id} \| \driftfn(\octad) \| t,\; \mathit{sk})
)
\]
where $\mathit{sk}$ is the attesting store's private key and $t$ is the
attestation timestamp.
\end{definition}
The federation protocol proceeds as follows:
\begin{enumerate}[leftmargin=*]
\item \textbf{Discovery.} Stores publish their public keys and endpoint
URIs to a shared registry (or discover them via DNS-based service
discovery).
\item \textbf{Attestation exchange.} When a store receives a query
referencing a remote octad, it requests a drift attestation from the
remote store.
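\item \textbf{Verification.} The requesting store verifies the
attestation signature against the attesting store's published public key,
checks that $t$ falls within the local freshness window, and checks that
$\driftfn(\octad) \leq \theta$, where $\theta$ is the local trust policy's
maximum acceptable drift.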
\item \textbf{Conditional access.} If verification succeeds, the
requesting store proxies the query to the remote store. If it fails,
the query returns a \emph{drift warning} to the client, who may
choose to proceed with degraded trust or abort.
\end{enumerate}
This protocol is deliberately lightweight: it adds one round-trip (the
attestation request) to cross-store queries, and the attestation itself is
a compact signed message (approximately 200 bytes).
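Step~3 of the protocol can be sketched as a local check in Rust. The sketch assumes a fixed-width big-endian encoding of $\mathit{octad\_id} \| \driftfn(\octad) \| t$ (16 + 8 + 8 bytes) and timestamps in seconds. A \texttt{Verifier} trait stands in for whatever signature scheme a deployment uses, so no particular algorithm is implied. Every failure maps to a drift warning, as in step~4. All names are hypothetical.

```rust
/// Abstract signature check; the concrete scheme is deployment-specific.
pub trait Verifier {
    fn verify(&self, store_id: &str, message: &[u8], signature: &[u8]) -> bool;
}

pub struct Attestation {
    pub octad_id: u128,
    pub drift: f64,
    pub t: u64, // seconds since the epoch (assumed unit)
    pub store_id: String,
    pub signature: Vec<u8>,
}

/// Assumed canonical encoding of octad_id || drift || t (32 bytes).
pub fn signed_message(a: &Attestation) -> Vec<u8> {
    let mut m = Vec::with_capacity(32);
    m.extend_from_slice(&a.octad_id.to_be_bytes());
    m.extend_from_slice(&a.drift.to_be_bytes());
    m.extend_from_slice(&a.t.to_be_bytes());
    m
}

#[derive(Debug, PartialEq)]
pub enum Decision { Accept, DriftWarning(&'static str) }

/// Local trust policy: maximum age `max_age` and drift threshold `theta`.
pub fn check(a: &Attestation, v: &dyn Verifier, now: u64, max_age: u64, theta: f64) -> Decision {
    if !v.verify(&a.store_id, &signed_message(a), &a.signature) {
        return Decision::DriftWarning("bad signature");
    }
    if a.t > now || now - a.t > max_age {
        return Decision::DriftWarning("stale or future-dated attestation");
    }
    if !(a.drift <= theta) {
        // Written as a negation so that a NaN drift value is also rejected.
        return Decision::DriftWarning("drift above local threshold");
    }
    Decision::Accept
}
```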
\subsection{Conflict Resolution in Federation}
When the same octad is hosted by multiple federated stores (replicated
octads), conflicts may arise. \verisimdb uses a \emph{last-writer-wins}
strategy by default, ordering writes by the timestamps recorded in the
temporal modality (which presumes timestamps that are comparable across
stores). Stores may optionally employ application-specific merge functions
via the $\mathsf{resolve}$ function in \cref{def:merge}.
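Default last-writer-wins resolution reduces to taking a maximum over replicated writes. The Rust sketch below assumes integer timestamps from the temporal modality and breaks ties by store identifier. The tie-break is an added assumption that makes the ordering total; the protocol above does not specify it. Names are hypothetical.

```rust
/// A replicated write as observed by one store.
#[derive(Clone, Debug, PartialEq)]
pub struct Write {
    pub timestamp: u64,
    pub store_id: String,
    pub value: String,
}

/// Last-writer-wins: greatest timestamp wins; ties broken by store_id (assumed).
pub fn lww(writes: &[Write]) -> Option<&Write> {
    writes.iter().max_by(|a, b| {
        a.timestamp
            .cmp(&b.timestamp)
            .then_with(|| a.store_id.cmp(&b.store_id))
    })
}
```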
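\begin{theorem}[Attestation Soundness]\label{thm:attestation}
Assume the signature scheme is existentially unforgeable and the private key
$\mathit{sk}$ of the attesting store $S$ has not been compromised. If a drift
attestation $A$ issued under $S$'s identity is valid (signature verification
succeeds and $t$ is within the freshness window), then $S$ signed the drift
value $\driftfn(\octad)$ for the referenced octad at time $t$. If $S$ is
honest, this value is its local drift measurement of that octad at time $t$.
\end{theorem}
\begin{proof}
By unforgeability, no party without $\mathit{sk}$ can produce a valid
signature on $\mathit{octad\_id} \| \driftfn(\octad) \| t$. A verifier who
accepts the signature therefore knows that $S$ produced this exact drift
value for this octad and timestamp. An honest $S$ signs only the result of
computing $\driftfn(\octad)$ at time $t$. Because $\driftfn$ is
deterministic, any participant holding the same octad state can recompute
and check this value. The freshness window guards against replay of stale
attestations.
\end{proof}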
% ── 8. Implementation ────────────────────────────────────────────────────────
\section{Implementation}\label{sec:implementation}
\subsection{Architecture Overview}
\verisimdb is implemented as a two-tier system: a \emph{Rust core} providing
modality-specific storage and query engines, and an \emph{Elixir orchestration
layer} providing distributed coordination, drift monitoring, and query
routing. The two tiers communicate over HTTP, with a C-ABI foreign function
interface used for performance-critical paths.
\begin{table}[t]
\centering
\caption{Modality implementation summary}\label{tab:impl}
\begin{tabularx}{\columnwidth}{@{}lXl@{}}
\toprule
\textbf{Modality} & \textbf{Backend} & \textbf{Crate} \\
\midrule
Graph & Oxigraph (RDF/SPARQL) & \texttt{verisim-graph} \\
Vector & Custom HNSW & \texttt{verisim-vector} \\
Tensor & \texttt{ndarray} / Burn & \texttt{verisim-tensor} \\
Semantic & CBOR (via \texttt{ciborium}) & \texttt{verisim-semantic} \\
Document & Tantivy (inverted index) & \texttt{verisim-document} \\
Temporal & Merkle-tree snapshots & \texttt{verisim-temporal} \\
Provenance & SHA-256 hash chains & \texttt{verisim-provenance} \\
Spatial & R-tree (\texttt{rstar}) & \texttt{verisim-spatial} \\
\bottomrule
\end{tabularx}
\end{table}
\Cref{tab:impl} summarises the backend technology for each modality.
All crates are written in safe Rust; \texttt{unsafe} blocks appear only at
FFI boundaries.
\subsection{Rust Core}
The Rust core is organised as a Cargo workspace with one crate per modality,
plus cross-cutting crates for the octad abstraction (\texttt{verisim-octad}),
drift detection (\texttt{verisim-drift}), and self-normalisation
(\texttt{verisim-normalizer}).
\paragraph{Graph modality.}
Oxigraph provides an embedded RDF store with SPARQL 1.1 support.
\verisimdb extends Oxigraph with property-graph semantics by encoding
property-graph edges as reified RDF statements, enabling a single storage
backend to support both RDF and property-graph query patterns. The graph
modality supports both in-memory and persistent (file-backed) modes.
\paragraph{Vector modality.}
The HNSW index is implemented with configurable parameters
($M = 16$, $\mathit{ef}_\mathsf{construction} = 200$ by default).
Vectors are stored in memory-mapped files, enabling datasets larger than
available RAM. Supported similarity measures are cosine similarity,
Euclidean distance, and inner product.
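For reference, the three measures can be written in Rust over \texttt{f32} slices. This is an illustrative sketch, not the \texttt{verisim-vector} code. A graph-based index such as HNSW ranks by distance. A common convention, assumed here and not stated above, is to use $1 - \cos$ for cosine and the negated inner product for maximum-inner-product search.

```rust
/// Inner product <a, b>; panics on dimension mismatch.
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance.
pub fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let denom = inner_product(a, a).sqrt() * inner_product(b, b).sqrt();
    if denom == 0.0 { 0.0 } else { inner_product(a, b) / denom }
}
```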
\paragraph{Tensor modality.}
Multi-dimensional arrays are stored using the \texttt{ndarray} crate for
CPU operations, with optional Burn integration for GPU-accelerated tensor
computations. Tensors are serialised in a custom binary format with
dimension metadata.
\paragraph{Semantic modality.}
Semantic annotations are CBOR-encoded structures containing type URIs
(linking to external ontologies), proof blobs (for formally verified
properties), and policy references. The \texttt{ciborium} crate provides
serde-based CBOR serialisation, and annotations are validated against their
schema on deserialisation.
\paragraph{Document modality.}
Tantivy provides full-text indexing with BM25 ranking, phrase queries,
and boolean operators. Documents are stored with field-level schema
(title, body, metadata) and support incremental index updates.
\paragraph{Temporal modality.}
Version history is maintained as a Merkle tree of octad snapshots.
Each node in the tree contains a cryptographic hash of the octad state
at that version, enabling efficient verification of version integrity
and time-travel queries of the form ``retrieve the octad as it existed
at timestamp $t$.''
\paragraph{Provenance modality.}
The provenance chain is a SHA-256 linked list of events. Each event
records the actor, action, timestamp, and the hash of the previous event.
The chain can be verified by recomputation: checking any event back to the
genesis event requires $O(n)$ hash computations, where $n$ is the number of
events between them.
\paragraph{Spatial modality.}
The R-tree index (via the \texttt{rstar} crate) supports WGS84
coordinates with point, bounding-box, and polygon geometries.
Supported queries include radius search, bounding-box containment,
and $k$-nearest-neighbour.
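Because WGS84 coordinates are angles, a radius query in kilometres needs a great-circle distance rather than planar distance on latitude and longitude. The Rust sketch below has three features worth noting:

\begin{itemize}[leftmargin=*]
\item it uses the haversine formula on a sphere of mean Earth radius, an approximation to the WGS84 ellipsoid;
\item it scans points linearly;
\item an R-tree would instead prune candidates by bounding box first, then apply the same exact test.
\end{itemize}

Names are hypothetical.

```rust
/// Mean Earth radius in km (spherical approximation of the WGS84 ellipsoid).
const EARTH_RADIUS_KM: f64 = 6371.0088;

/// Great-circle distance between two (lat, lon) points given in degrees.
pub fn haversine_km(a: (f64, f64), b: (f64, f64)) -> f64 {
    let (lat1, lon1) = (a.0.to_radians(), a.1.to_radians());
    let (lat2, lon2) = (b.0.to_radians(), b.1.to_radians());
    let dlat = lat2 - lat1;
    let dlon = lon2 - lon1;
    let h = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
    // Clamp guards against rounding pushing sqrt(h) slightly above 1.
    2.0 * EARTH_RADIUS_KM * h.sqrt().min(1.0).asin()
}

/// Radius search by linear scan over (id, (lat, lon)) points.
pub fn within_radius(points: &[(u32, (f64, f64))], centre: (f64, f64), radius_km: f64) -> Vec<u32> {
    points
        .iter()
        .filter(|(_, p)| haversine_km(*p, centre) <= radius_km)
        .map(|(id, _)| *id)
        .collect()
}
```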
\subsection{Elixir Orchestration Layer}
The Elixir layer is built on OTP (Open Telecom Platform) and provides:
\begin{itemize}[leftmargin=*]
\item \textbf{Entity servers.} Each octad is managed by a GenServer
process that coordinates access across modalities. OTP supervision
ensures fault tolerance: if an entity server crashes, it is