Skip to content

Latest commit

 

History

History
654 lines (481 loc) · 17.2 KB

File metadata and controls

654 lines (481 loc) · 17.2 KB

VQL Formal Semantics

Table of Contents

1. Overview

This document specifies the formal operational semantics and static type system for VeriSim Query Language (VQL). VQL has two execution paths with different semantic guarantees:

  1. Dependent-Type Path - Queries with PROOF clause, formally verified via ZKP

  2. Slipstream Path - Queries without PROOF clause, fast but unverified

2. Notation

We use standard mathematical notation for formal semantics:

Notation Meaning

\(\Gamma\)

Type environment (context)

\(\Gamma \vdash e : \tau\)

Expression e has type \(\tau\) in context \(\Gamma\)

\(e \Rightarrow v\)

Expression e evaluates to value v (big-step semantics)

\(e \rightarrow e'\)

Expression e reduces to e' (small-step semantics)

\(\llbracket e \rrbracket_{\rho}\)

Denotational semantics of e in environment \(\rho\)

\(\pi : \phi\)

Proof term \(\pi\) for proposition \(\phi\)

\(\{x : \tau \mid \phi(x)\}\)

Refinement type: values of type \(\tau\) satisfying predicate \(\phi\)

3. Syntax (Abstract)

We define the abstract syntax of VQL (concrete syntax in vql-grammar.ebnf):

Query       ::= SELECT Modalities FROM Source [WHERE Condition]
                [PROOF ProofSpec] [LIMIT Int] [OFFSET Int]

Modalities  ::= * | Modality, ...
Modality    ::= GRAPH | VECTOR | TENSOR | SEMANTIC | DOCUMENT | TEMPORAL

Source      ::= HEXAD UUID
              | FEDERATION Pattern [WITH DRIFT DriftMode]
              | STORE ID

Condition   ::= SimpleCond | CompoundCond | (Condition)
SimpleCond  ::= GraphCond | VectorCond | TensorCond | ...
CompoundCond::= Condition AND Condition
              | Condition OR Condition
              | NOT Condition

ProofSpec   ::= ProofType(Contract) [WITH Params]
ProofType   ::= EXISTENCE | CITATION | ACCESS | INTEGRITY | PROVENANCE | CUSTOM

4. Type System (Static Semantics)

4.1. Base Types

VQL has the following base types:

τ ::= UUID                          -- 128-bit identifier
    | String                        -- UTF-8 string
    | Int                           -- Integer
    | Float                         -- IEEE 754 double
    | Bool                          -- Boolean
    | Vector[n]                     -- n-dimensional vector
    | Tensor[d₁, ..., dₙ]          -- n-dimensional tensor
    | Timestamp                     -- ISO 8601 datetime
    | RDFTriple                     -- (subject, predicate, object)
    | OctadRef                      -- Reference to a Octad
    | Modality                      -- GRAPH | VECTOR | TENSOR | ...
    | Proof[φ]                      -- Proof of proposition φ

4.2. Modal Type System

Each modality has an associated type projection:

\[\begin{aligned} \text{Graph}(h) &: \text{Set}(\text{RDFTriple}) \\ \text{Vector}(h) &: \text{Vector}[n] \\ \text{Tensor}(h) &: \text{Tensor}[d_1, \ldots, d_n] \\ \text{Semantic}(h) &: \text{Set}(\text{TypeAnnotation}) \\ \text{Document}(h) &: \text{String} \\ \text{Temporal}(h) &: \text{List}(\text{Version}) \end{aligned}\]

4.3. Octad Type

A Octad is a dependent product over modalities:

\[\text{Octad} = \{h : \text{UUID} \mid \exists m_1, \ldots, m_k \in \text{Modalities}. \text{Valid}(h, m_1) \land \cdots \land \text{Valid}(h, m_k)\}\]

Where \(\text{Valid}(h, m)\) asserts that octad h has a valid representation in modality m.

4.4. Refinement Types for Queries

4.4.1. Query Result Type

A query result has a refinement type that depends on the modalities selected:

\[\text{QueryResult}[\mathcal{M}] = \text{List}(\text{Octad}_\mathcal{M})\]

Where \(\mathcal{M} \subseteq \{\text{GRAPH}, \text{VECTOR}, \ldots\}\) and:

\[\text{Octad}_\mathcal{M} = \{h : \text{Octad} \mid \forall m \in \mathcal{M}. \text{Valid}(h, m)\}\]

4.4.2. Dependent-Type Query Result

For queries with PROOF clause, the result type includes a proof term:

\[\text{ProvedResult}[\mathcal{M}, \phi] = \{(r, \pi) : \text{QueryResult}[\mathcal{M}] \times \text{Proof}[\phi] \mid \pi : \phi(r)\}\]

Where \(\phi\) is the proposition specified by the PROOF clause.

4.5. Typing Rules

4.5.1. T-Query (Slipstream Path)

\[\frac{ \Gamma \vdash \text{modalities} : \mathcal{M} \quad \Gamma \vdash \text{source} : \text{Source} \quad \Gamma \vdash \text{condition} : \text{Octad} \to \text{Bool} }{ \Gamma \vdash \texttt{SELECT } \mathcal{M} \texttt{ FROM } \text{source} \texttt{ WHERE } \text{condition} : \text{QueryResult}[\mathcal{M}] }\]

Interpretation: A slipstream query returns a list of octads with selected modalities, without proof guarantees.

4.5.2. T-ProvedQuery (Dependent-Type Path)

\[\frac{ \Gamma \vdash \text{modalities} : \mathcal{M} \quad \Gamma \vdash \text{source} : \text{Source} \quad \Gamma \vdash \text{condition} : \text{Octad} \to \text{Bool} \quad \Gamma \vdash \text{proof-spec} : \text{ProofType} \times \text{Contract} }{ \Gamma \vdash \texttt{SELECT } \mathcal{M} \texttt{ ... PROOF } \text{proof-spec} : \text{ProvedResult}[\mathcal{M}, \phi_{\text{proof-spec}}] }\]

Interpretation: A dependent-type query returns results with proof that the contract \(\phi_{\text{proof-spec}}\) holds.

4.5.3. T-Condition (Graph Condition Example)

\[\frac{ \Gamma \vdash \text{pattern} : \text{SPARQLPattern} }{ \Gamma \vdash \texttt{WHERE } \text{pattern} : \text{Octad} \to \text{Bool} }\]

4.5.4. T-VectorSimilarity

\[\frac{ \Gamma \vdash v : \text{Vector}[n] \quad \Gamma \vdash \text{threshold} : \text{Float} }{ \Gamma \vdash \texttt{WHERE h.embedding SIMILAR TO } v \texttt{ WITHIN } \text{threshold} : \text{Octad} \to \text{Bool} }\]

4.5.5. T-ProofSpec (EXISTENCE)

\[\frac{ \Gamma \vdash \text{contract} : \text{ExistenceContract} }{ \Gamma \vdash \texttt{PROOF EXISTENCE(contract)} : \text{ProofSpec}[\exists h. \text{Valid}(h)] }\]

4.5.6. T-ProofSpec (CITATION)

\[\frac{ \Gamma \vdash \text{contract} : \text{CitationContract} }{ \Gamma \vdash \texttt{PROOF CITATION(contract)} : \text{ProofSpec}[\forall h. \text{CitationValid}(h)] }\]

5. Operational Semantics (Dynamic Semantics)

5.1. Evaluation Environments

We define two evaluation environments:

  1. Store Environment \(\Sigma\): Maps UUIDs to Octad data

  2. Proof Environment \(\Pi\): Maps contracts to proof functions

\[\begin{aligned} \Sigma &: \text{UUID} \rightharpoonup \text{OctadData} \\ \Pi &: \text{Contract} \rightharpoonup (\text{OctadData} \to \text{Proof}) \end{aligned}\]

5.2. Big-Step Semantics (Slipstream Path)

5.2.1. E-SlipstreamQuery

\[\frac{ \llbracket \text{source} \rrbracket_\Sigma = \{h_1, \ldots, h_n\} \quad \forall i. \llbracket \text{condition}(h_i) \rrbracket_\Sigma = \text{true} \Rightarrow h_i \in R \quad \forall i. R_i = \text{project}_\mathcal{M}(h_i) }{ \llbracket \texttt{SELECT } \mathcal{M} \texttt{ FROM } \text{source} \texttt{ WHERE } \text{condition} \rrbracket_\Sigma = [R_1, \ldots, R_k] }\]

Interpretation: 1. Retrieve candidate octads from source 2. Filter by condition 3. Project to selected modalities 4. No proof generation

5.2.2. E-ProjectModality

\[\frac{ \Sigma(h) = \text{octad-data} \quad \text{modality} \in \mathcal{M} }{ \text{project}_\mathcal{M}(h)[\text{modality}] = \text{extract}(\text{octad-data}, \text{modality}) }\]

5.3. Big-Step Semantics (Dependent-Type Path)

5.3.1. E-ProvedQuery

\[\frac{ \llbracket \text{source} \rrbracket_\Sigma = \{h_1, \ldots, h_n\} \quad \forall i. \llbracket \text{condition}(h_i) \rrbracket_\Sigma = \text{true} \Rightarrow h_i \in R \quad \forall i. R_i = \text{project}_\mathcal{M}(h_i) \quad \pi = \text{prove}_\Pi(\text{proof-spec}, [R_1, \ldots, R_k]) \quad \pi : \phi_{\text{proof-spec}}([R_1, \ldots, R_k]) }{ \llbracket \texttt{SELECT } \mathcal{M} \texttt{ ... PROOF } \text{proof-spec} \rrbracket_{\Sigma,\Pi} = ([R_1, \ldots, R_k], \pi) }\]

Interpretation: 1. Same filtering as slipstream path 2. Generate proof \(\pi\) that contract holds 3. Return results with proof term

5.3.2. E-Prove (EXISTENCE)

\[\frac{ \text{contract} = \text{ExistenceContract} \quad \forall h \in R. \Sigma(h) \neq \bot }{ \text{prove}_\Pi(\texttt{EXISTENCE(contract)}, R) = \pi_{\exists} }\]

Where \(\pi_{\exists}\) is a ZKP proving existence without revealing data.

5.3.3. E-Prove (CITATION)

\[\frac{ \text{contract} = \text{CitationContract} \quad \forall h \in R. \text{validateCitationChain}_\Sigma(h) = \text{true} }{ \text{prove}_\Pi(\texttt{CITATION(contract)}, R) = \pi_{\text{cite}} }\]

Where \(\pi_{\text{cite}}\) is a ZKP proving citation chain validity.

5.4. Small-Step Semantics (Drift Handling)

For federated queries with drift, we model evaluation as a state machine:

\[\langle Q, \Sigma_1 \parallel \cdots \parallel \Sigma_n, \text{DRIFT-MODE} \rangle \rightarrow^* \langle R, \Pi \rangle\]

5.4.1. S-DriftDetect

\[\frac{ \Sigma_i(h)[\text{modality}_1] \neq \Sigma_j(h)[\text{modality}_1] \quad i \neq j }{ \langle h, \Sigma_1 \parallel \cdots \parallel \Sigma_n \rangle \xrightarrow{\text{drift}} \langle h, \text{INCONSISTENT} \rangle }\]

5.4.2. S-DriftRepair-LatestWins

\[\frac{ \text{DRIFT-MODE} = \texttt{LATEST} \quad \Sigma_k(h)[\text{timestamp}] = \max_i \Sigma_i(h)[\text{timestamp}] }{ \langle h, \text{INCONSISTENT} \rangle \xrightarrow{\text{repair}} \langle h, \Sigma_k(h) \rangle }\]

5.4.3. S-DriftRepair-Quorum

\[\frac{ \text{DRIFT-MODE} = \texttt{REPAIR} \quad \exists v. |\{i \mid \Sigma_i(h)[\text{field}] = v\}| > n/2 }{ \langle h, \text{INCONSISTENT} \rangle \xrightarrow{\text{quorum}} \langle h, v \rangle }\]

6. Proof Obligations (Dependent-Type Path)

6.1. ZKP Contract Specifications

For each proof type, we specify the proposition \(\phi\) that must be proven:

6.1.1. EXISTENCE Contract

\[\phi_{\text{EXISTENCE}}(h) \equiv \exists \text{data} \in \Sigma. \Sigma(h) = \text{data} \land \text{accessible}(h)\]

Informal: The octad exists in at least one store and is accessible to the querier.

6.1.2. CITATION Contract

\[\phi_{\text{CITATION}}(h) \equiv \forall h' \in \text{citations}(h). \text{ValidCitation}(h, h') \land \phi_{\text{EXISTENCE}}(h')\]

Informal: All citations are valid and cited octads exist.

6.1.3. ACCESS Contract

\[\phi_{\text{ACCESS}}(h, u) \equiv \text{hasPermission}(u, h, \text{READ})\]

Informal: User u has read permission for octad h.

6.1.4. INTEGRITY Contract

\[\phi_{\text{INTEGRITY}}(h) \equiv \forall t \in \text{versions}(h). \text{Hash}(\text{data}(h, t)) = \text{CommittedHash}(h, t)\]

Informal: Data has not been tampered with since commitment.

6.1.5. PROVENANCE Contract

\[\phi_{\text{PROVENANCE}}(h) \equiv \exists \text{lineage}. \text{ValidLineage}(h, \text{lineage}) \land \forall h' \in \text{lineage}. \phi_{\text{INTEGRITY}}(h')\]

Informal: Full lineage is traceable and all ancestors maintain integrity.

6.2. Proof Generation

For dependent-type queries, VQL delegates to proven-library (external ZKP framework):

\[\text{prove}_\Pi(\text{proof-spec}, R) = \text{proven.generate}(\phi_{\text{proof-spec}}, R, \text{witness})\]

Where: - \(\phi_{\text{proof-spec}}\) is the proposition from the contract - R is the query result (public input) - witness is private data used for proof construction

7. Soundness Theorem

7.1. Slipstream Path

Theorem (Slipstream Correctness):

If \(\Gamma \vdash Q : \text{QueryResult}[\mathcal{M}\)] and \(\llbracket Q \rrbracket_\Sigma = R\), then:

\[\forall h \in R. \exists \text{data} \in \Sigma. h = \text{project}_\mathcal{M}(\text{data})\]

Informal: All results are valid projections from the store (but no additional guarantees).

7.2. Dependent-Type Path

Theorem (Dependent-Type Soundness):

If \(\Gamma \vdash Q : \text{ProvedResult}[\mathcal{M}, \phi\)] and \(\llbracket Q \rrbracket_{\Sigma,\Pi} = (R, \pi)\), then:

\[\pi : \phi(R) \land \text{verify}(\pi, \phi(R), \text{public-params}) = \text{true}\]

Informal: The proof term \(\pi\) is a valid proof of proposition \(\phi\) applied to results R, and it verifies correctly.

Proof sketch: By induction on the derivation of \(\Gamma \vdash Q : \text{ProvedResult}[\mathcal{M}, \phi\)]. The critical step uses the soundness of the underlying ZKP system (proven-library).

8. Semantic Equivalence

8.1. Query Equivalence

Two queries \(Q_1\) and \(Q_2\) are semantically equivalent (\(Q_1 \equiv Q_2\)) if:

\[\forall \Sigma. \llbracket Q_1 \rrbracket_\Sigma = \llbracket Q_2 \rrbracket_\Sigma\]

Example equivalences:

  1. Commutativity of AND:

    \[\texttt{WHERE } C_1 \texttt{ AND } C_2 \equiv \texttt{WHERE } C_2 \texttt{ AND } C_1\]
  2. Idempotence of modality selection:

    \[\texttt{SELECT GRAPH, GRAPH} \equiv \texttt{SELECT GRAPH}\]
  3. Filter distribution:

    \[\texttt{WHERE } C_1 \texttt{ AND } (C_2 \texttt{ OR } C_3) \equiv (\texttt{WHERE } C_1 \texttt{ AND } C_2) \texttt{ OR } (\texttt{WHERE } C_1 \texttt{ AND } C_3)\]

8.2. Proof Equivalence

Two proofs \(\pi_1\) and \(\pi_2\) are equivalent if they prove the same proposition:

\[\pi_1 \sim \pi_2 \iff \pi_1 : \phi \land \pi_2 : \phi\]

Note: ZKPs may have different witnesses but prove the same statement.

9. Drift Semantics

9.1. Consistency Levels

We define a consistency lattice for federated queries:

\[\begin{aligned} \text{STRICT} &\sqsubset \text{LATEST} \\ \text{STRICT} &\sqsubset \text{REPAIR} \\ \text{LATEST} &\sqsubset \text{TOLERATE} \\ \text{REPAIR} &\sqsubset \text{TOLERATE} \end{aligned}\]

Where \(A \sqsubset B\) means "A provides stronger guarantees than B".

9.2. Drift-Aware Semantics

For federated queries with drift mode \(D\):

\[\llbracket Q \rrbracket_{\Sigma_1 \parallel \cdots \parallel \Sigma_n, D} = \text{merge}_D(\llbracket Q \rrbracket_{\Sigma_1}, \ldots, \llbracket Q \rrbracket_{\Sigma_n})\]

Where \(\text{merge}_D\) implements the drift policy:

9.2.1. STRICT

\[\text{merge}_{\text{STRICT}}(R_1, \ldots, R_n) = \begin{cases} R_1 & \text{if } R_1 = \cdots = R_n \\ \bot & \text{otherwise} \end{cases}\]

9.2.2. LATEST

\[\text{merge}_{\text{LATEST}}(R_1, \ldots, R_n) = R_i \text{ where } \text{timestamp}(R_i) = \max_j \text{timestamp}(R_j)\]

9.2.3. REPAIR

\[\text{merge}_{\text{REPAIR}}(R_1, \ldots, R_n) = \text{quorum}(R_1, \ldots, R_n)\]

9.2.4. TOLERATE

\[\text{merge}_{\text{TOLERATE}}(R_1, \ldots, R_n) = R_1 \cup \cdots \cup R_n \text{ (with drift annotations)}\]

10. Denotational Semantics (Alternative View)

We can also give a denotational semantics by interpreting queries as functions on stores:

\[\llbracket \cdot \rrbracket : \text{Query} \to (\text{Store} \to \mathcal{P}(\text{Octad}))\]

Where \(\mathcal{P}(\text{Octad})\) is the powerset of octads.

10.1. Basic Queries

\[\llbracket \texttt{SELECT } \mathcal{M} \texttt{ FROM HEXAD } u \rrbracket(\Sigma) = \begin{cases} \{\text{project}_\mathcal{M}(\Sigma(u))\} & \text{if } u \in \text{dom}(\Sigma) \\ \emptyset & \text{otherwise} \end{cases}\]

10.2. Filtered Queries

\[\llbracket \texttt{SELECT } \mathcal{M} \texttt{ ... WHERE } C \rrbracket(\Sigma) = \{h \in \llbracket \texttt{SELECT } \mathcal{M} \texttt{ ...} \rrbracket(\Sigma) \mid \llbracket C \rrbracket(h) = \text{true}\}\]

10.3. Compositional Semantics

Conditions compose via logical operators:

\[\begin{aligned} \llbracket C_1 \texttt{ AND } C_2 \rrbracket(h) &= \llbracket C_1 \rrbracket(h) \land \llbracket C_2 \rrbracket(h) \\ \llbracket C_1 \texttt{ OR } C_2 \rrbracket(h) &= \llbracket C_1 \rrbracket(h) \lor \llbracket C_2 \rrbracket(h) \\ \llbracket \texttt{NOT } C \rrbracket(h) &= \neg \llbracket C \rrbracket(h) \end{aligned}\]

11. Future Extensions

11.1. Dependent Types with Refinements

Future versions may support refinement predicates in query syntax:

SELECT GRAPH, VECTOR
FROM verisim:semantic
WHERE octad : {h : Octad | citationCount(h) > 10}
LIMIT 100;

Formal type:

\[\{h : \text{Octad} \mid \text{citationCount}(h) > 10\}\]

11.2. Liquid Types

Integration with liquid types for automatic predicate inference:

\[\Gamma \vdash Q : \text{QueryResult}[\mathcal{M}]\{\nu : \text{List}(\text{Octad}) \mid \phi(\nu)\}\]

Where \(\phi\) is automatically inferred from the query structure.

12. References