
shareable-data-store

A CRDT-based, offline-capable, real-time-syncable tree-of-data library for Browser and Node.js.

work-in-progress - please ignore for the moment

Data Items are organised in a hierarchy (with optional links between them), can carry arbitrary typed values (text, binary, large blobs), and synchronise across devices and users without conflicts – even when peers are offline for extended periods. The library is split into small, composable packages so you can use only what you need.

The CRDT engine is pluggable: choose from three ready-made backends – Y.js, Loro CRDT, or json-joy – or write your own.

Current Roadmap

Next steps:

  • CLI interface - access your stores from the command line
  • MCP server - access your stores from any MCP-capable AI
  • sidecar - tracks store changes and persists them, triggers WebHooks on specific changes
  • applications - some concrete applications (e.g., a shareable notebook similar to Obsidian or Notion)
  • (an AI-driven "HyperCard on steroids", backed by SDS and generating WebApps for SDS)

Packages

Shared Types

| package | description |
|---------|-------------|
| @rozek/sds-core | Backend-agnostic shared types: SDS_Error, SDS_ChangeSet, SDS_Entry/Data/Link base classes, the SDS_DataStore contract interface, and all provider interfaces. No CRDT engine – the interface must be implemented by a backend. |

CRDT Backends (choose one)

| package | CRDT engine | description |
|---------|-------------|-------------|
| @rozek/sds-core-jj | json-joy 17.x | reference backend; ships with a canonical empty snapshot |
| @rozek/sds-core-yjs | Y.js | no canonical snapshot; Y.js state-vector cursor |
| @rozek/sds-core-loro | Loro | no canonical snapshot; Loro version-vector cursor; Rust/WASM |

All three backend packages expose an identical public API. Import from whichever backend suits your project; application code never calls any CRDT library directly.

Infrastructure (Backend-agnostic)

| package | description |
|---------|-------------|
| @rozek/sds-persistence-node | SQLite persistence for Node.js and Electron |
| @rozek/sds-persistence-browser | IndexedDB persistence for browsers |
| @rozek/sds-network-websocket | WebSocket sync + presence provider |
| @rozek/sds-network-webrtc | WebRTC peer-to-peer sync + presence provider (browser) |
| @rozek/sds-sync-engine | orchestrates persistence, network, and presence |
| @rozek/sds-websocket-server | Hono-based relay server: JWT auth, signalling, token issuance |
| @rozek/sds-command | generic CLI library: one-shot commands, interactive REPL, batch scripts – wire in a backend via runCommand() |
| @rozek/sds-mcp-server | generic MCP server library: access and manipulate stores from any MCP-capable AI – wire in a backend via runMCPServer() |
| @rozek/sds-sidecar | generic sidecar library: tracks store changes, persists them, fires WebHooks – wire in a backend via runSidecar() |

Backend-specific Tools (ready-to-run executables)

| package | binary | CRDT backend |
|---------|--------|--------------|
| @rozek/sds-command-jj | sds-jj | json-joy |
| @rozek/sds-command-loro | sds-loro | Loro |
| @rozek/sds-command-yjs | sds-yjs | Y.js |
| @rozek/sds-mcp-server-jj | sds-mcp-server-jj | json-joy |
| @rozek/sds-mcp-server-loro | sds-mcp-server-loro | Loro |
| @rozek/sds-mcp-server-yjs | sds-mcp-server-yjs | Y.js |
| @rozek/sds-sidecar-jj | sds-sidecar-jj | json-joy |
| @rozek/sds-sidecar-loro | sds-sidecar-loro | Loro |
| @rozek/sds-sidecar-yjs | sds-sidecar-yjs | Y.js |

Each wrapper is a thin ~40-line entry point that wires the corresponding @rozek/sds-core-* backend into the generic library. Install only the package matching your chosen CRDT backend.


How it works

┌─────────────────────────────────────────────────────────────┐
│                      Your Application                       │
│                                                             │
│   SDS_DataStore  ←── read/write ──►  SDS_SyncEngine         │
│   (any backend)                      │      │       │       │
│                               Persistence Network Presence  │
│                                 Provider Provider Provider  │
└─────────────────────────────────────────────────────────────┘
         ▲ CRDT patches                          │ WebSocket / WebRTC
         │                                       ▼
         └──────────────────── SDS WebSocket Server ◄── other peers
                                                    (Browser PWA, Node.js app,
                                                     sds-sidecar, sds-mcp-server, …)

SDS_DataStore is the source of truth. It holds a tree of items and links stored as a conflict-free replicated data type (CRDT). SDS_SyncEngine wires it to any combination of:

  • a persistence provider – saves snapshots and patches so the store survives restarts and works offline
  • a network provider – exchanges CRDT patches with a relay server or other peers in real time
  • a presence provider – shares user name, cursor position, and focus between peers without persisting them

All providers are optional and interchangeable. A browser PWA might use IndexedDB + WebSocket; an Electron app might use SQLite + WebRTC with WebSocket fallback.


Quick start

Choosing a backend

All backends offer the same API. Pick one and import from it:

// Option A – json-joy (reference implementation)
import { SDS_DataStore } from '@rozek/sds-core-jj'

// Option B – Y.js
import { SDS_DataStore } from '@rozek/sds-core-yjs'

// Option C – Loro CRDT
import { SDS_DataStore } from '@rozek/sds-core-loro'

The examples below use @rozek/sds-core-jj but work identically with any backend.

Local-only – no network, no server

import { SDS_DataStore } from '@rozek/sds-core-jj'

const DataStore = SDS_DataStore.fromScratch()

const DataItem = DataStore.newItemAt(undefined, DataStore.RootItem)
DataItem.Label = 'My first data'
DataItem.writeValue('Hello, world!')

// serialise to binary and restore later
const StoreSnapshot = DataStore.asBinary()
const restoredStore = SDS_DataStore.fromBinary(StoreSnapshot)

Browser PWA – offline-first with WebSocket sync

import { SDS_DataStore }                  from '@rozek/sds-core-jj'
import { SDS_BrowserPersistenceProvider } from '@rozek/sds-persistence-browser'
import { SDS_WebSocketProvider }          from '@rozek/sds-network-websocket'
import { SDS_SyncEngine }                 from '@rozek/sds-sync-engine'

const DataStore   = SDS_DataStore.fromScratch()
const Persistence = new SDS_BrowserPersistenceProvider('my-store')
const Network     = new SDS_WebSocketProvider('my-store')

const SyncEngine = new SDS_SyncEngine(DataStore, {
  PersistenceProvider: Persistence,
  NetworkProvider:     Network,
  PresenceProvider:    Network,  // WebSocket provider doubles as presence provider
})

// restores persisted state before resolving
await SyncEngine.start()

// connect to the relay server with a JWT
await SyncEngine.connectTo('wss://my-server.example.com', { Token:'<jwt>' })

// write items β€” the engine handles persistence and sync transparently
const DataItem = DataStore.newItemAt(undefined, DataStore.RootItem)
DataItem.Label = 'Hello from the browser!'

Node.js / Electron – SQLite persistence

import { SDS_DataStore }                  from '@rozek/sds-core-jj'
import { SDS_DesktopPersistenceProvider } from '@rozek/sds-persistence-node'
import { SDS_WebSocketProvider }          from '@rozek/sds-network-websocket'
import { SDS_SyncEngine }                 from '@rozek/sds-sync-engine'

const Store       = SDS_DataStore.fromScratch()
const Persistence = new SDS_DesktopPersistenceProvider('./data/sds.db', 'my-store')
const Network     = new SDS_WebSocketProvider('my-store')

const Engine = new SDS_SyncEngine(Store, {
  PersistenceProvider: Persistence,
  NetworkProvider:     Network,
  PresenceProvider:    Network,
})

await Engine.start()
await Engine.connectTo('wss://my-server.example.com', { Token:'<jwt>' })

Presence – see who is editing what

// broadcast your cursor position and user info
Engine.setPresenceTo({
  UserName: 'Alice',
  UserColor:'#e74c3c',
  UserFocus:{
    EntryId: DataItem.Id,
    Property:'Value',
    Cursor:  { from:0, to:5 },
  },
})

// react when any peer's presence changes
Engine.onPresenceChange((PeerId, State, Origin) => {
  if (Origin === 'remote') {
    console.log(`${State?.UserName ?? PeerId} is editing`, State?.UserFocus?.EntryId)
  }
})

Running the relay server

import { createSDSServer } from '@rozek/sds-websocket-server'
import { serve }           from '@hono/node-server'

const { app:App } = createSDSServer({ JWTSecret:'your-secret-at-least-32-chars' })

serve({ fetch:App.fetch, port:3000 }, () => {
  console.log('SDS relay server listening on port 3000')
})

Concepts

Every store starts with three well-known items:

  • RootItem – root of the user-visible tree
  • TrashItem – deleted entries are moved here; can be permanently purged
  • LostAndFoundItem – entries orphaned by a remote peer's purge are rescued here

Each data item has:

  • Label – plain-text title (collaborative, CRDT string)
  • Type – MIME type string (text/plain, text/markdown, image/png, …)
  • Value – string, Uint8Array, or a reference to a large blob (stored separately)
  • Info – arbitrary key–value metadata map
  • innerEntryList – ordered list of inner items and links

Links are pointer entries: they live inside a container item and point to a target item elsewhere in the tree. They are useful for cross-references, shortcuts, and aliases.


Requirements

Functional requirements

Data Store

  • the store holds an ordered, arbitrarily deep tree of entries. Each entry is either an item or a link.
  • an item carries a MIME type, a string or binary value (or no value at all), a plain-text label, and an arbitrary key–value metadata map (Info).
  • a link is a named pointer to another item. It lives inside a container item but does not own its target.
  • the tree always contains three non-deletable, non-movable well-known items: RootItem, TrashItem, and LostAndFoundItem.
  • entries can be created, labelled, moved, and soft-deleted (moved to TrashItem). Soft-deleted entries can be permanently purged.
  • an item that is still referenced by a link from the live tree (reachable from RootItem) is protected: a purge attempt throws SDS_Error('purge-protected').
  • entries in TrashItem are eligible for automatic permanent deletion after a configurable time-to-live (TrashTTLms). The timestamp is recorded in the entry's Info._trashedAt field (synced via CRDT) when the entry is moved to Trash via deleteEntry.
  • after applying a remote patch, any entry whose declared outer item no longer exists is automatically rescued to LostAndFoundItem. Dangling links whose target no longer exists are recreated in LostAndFoundItem so the link remains valid.
  • ordering of inner entries within a container is collaborative and stable: any peer can insert an entry at any position without conflicting with concurrent inserts on other peers.
  • large string and binary values that exceed configurable thresholds are stored as external blobs referenced by a SHA-256 hash, keeping the CRDT compact.
  • the store can be serialised to a compact binary snapshot (gzip-compressed) or to a base64 string, and can be fully reconstructed from either.
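The blob-externalisation rule above can be sketched as follows. This is an illustrative sketch only – the threshold value and the names BlobRef, BlobStore, and externalizeValue are assumptions, not the actual SDS API; only the "values above a threshold become SHA-256 references" idea comes from the requirements.

```typescript
import { createHash } from 'node:crypto'

// Assumed threshold in bytes; the real library makes this configurable.
const BlobThreshold = 16 * 1024

type BlobRef = { Kind:'blob-ref', Hash:string, Size:number }

const BlobStore = new Map<string, Uint8Array>()   // stand-in for real blob storage

function externalizeValue (Value:Uint8Array): Uint8Array | BlobRef {
  if (Value.byteLength <= BlobThreshold) return Value       // small: keep inline
  const Hash = createHash('sha256').update(Value).digest('hex')
  BlobStore.set(Hash, Value)                                // content-addressed storage
  return { Kind:'blob-ref', Hash, Size:Value.byteLength }   // compact CRDT entry
}
```

Content addressing means identical blobs deduplicate for free, and the CRDT document stays small regardless of value size.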

Synchronisation

  • all mutations are expressed as binary CRDT patches that can be applied in any order and on any peer without conflict.
  • multiple patches originating from a single transact() call are wrapped in a lightweight binary envelope and treated atomically by the receiver.
  • patches accumulated since any given clock position can be exported for incremental sync.
  • the sync engine persists a rolling snapshot plus incremental patches. A new checkpoint (snapshot + prune) is triggered automatically when accumulated patches exceed 512 KB, and always on clean shutdown.
  • outgoing patches generated while offline are queued in memory and flushed in order as soon as the connection is re-established.
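The last point – queue while offline, flush in order on reconnect – can be sketched as a small class. Names here are illustrative, not the actual sync-engine API:

```typescript
// Minimal sketch of the in-memory offline patch queue described above.
class PatchQueue {
  private Pending:Uint8Array[] = []
  private Online = false

  constructor (private send:(Patch:Uint8Array) => void) {}

  enqueue (Patch:Uint8Array):void {
    if (this.Online) { this.send(Patch) }        // connected: send immediately
    else             { this.Pending.push(Patch) } // offline: buffer in order
  }

  setOnline (Online:boolean):void {
    this.Online = Online
    if (Online) {                        // flush buffered patches in FIFO order
      for (const Patch of this.Pending) this.send(Patch)
      this.Pending.length = 0
    }
  }
}
```

Because CRDT patches commute, ordering is not required for convergence, but FIFO delivery keeps intermediate states sensible for observers.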

Presence

  • each sync-engine instance has a unique peer id (UUID).
  • presence state (user name, colour, focused entry, cursor position) is broadcast to all connected peers.
  • remote peers that have not sent an update within the configured timeout (default 120 s) are automatically removed.
  • presence state is not persisted β€” it is ephemeral.
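The timeout rule can be sketched as a pruning pass over a peer map. The function and record names are illustrative; only the 120 s default comes from the requirements above:

```typescript
const PresenceTimeout = 120_000   // default timeout in ms, per the requirements

type PresenceRecord = { State:unknown, lastSeen:number }

// Removes peers whose last update is older than the timeout and
// returns the ids of the peers that were dropped.
function prunePeers (
  Peers:Map<string, PresenceRecord>, now:number, Timeout = PresenceTimeout
):string[] {
  const removed:string[] = []
  for (const [PeerId, Record] of Peers) {
    if (now - Record.lastSeen > Timeout) {   // stale: no update within timeout
      Peers.delete(PeerId)
      removed.push(PeerId)
    }
  }
  return removed
}
```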

Networking

  • two transport providers are available out of the box: WebSocket (browser + Node.js) and WebRTC with WebSocket fallback (browser only).
  • the server authenticates clients with HS256 JWTs and enforces read / write / admin scope per connection.
  • the server exposes a WebRTC signalling endpoint and an admin API for token issuance.
  • large value blobs are transferred as chunked binary frames independent of the CRDT patch stream.

Non-functional requirements

  • Runtime support – browser (modern, ESM-capable) and Node.js 22+; no CommonJS.
  • Module format – ESM-only throughout, with TypeScript declaration files (.d.ts).
  • Conflict-free – all concurrent edits must converge to the same state on all peers without manual conflict resolution.
  • Offline-first – local reads and writes must work without any network connectivity.
  • Composable – each concern (persistence, networking, presence) is encapsulated in a separate, exchangeable package.
  • Compact wire format – CRDT patches are binary-encoded; snapshots are gzip-compressed.
  • No global state – multiple independent store instances must be usable within the same process.
  • Testability – all providers are defined as interfaces so they can be replaced by mocks in unit tests.
  • MIT-compatible dependency chain – all runtime dependencies must use a permissive open-source licence (Apache 2.0, MIT, or equivalent) to allow inclusion in MIT-licensed projects.

Data model

The store as seen by application code

From the application's perspective the store looks like a document-oriented tree:

RootItem
├── Item (text/plain)  – Label="Meeting items", Value="…"
│   ├── Item (text/markdown)  – Label="Action items"
│   └── Link  ──────────────────────────────────────► Item (text/plain)  – Label="Alice's task"
├── Item (image/png)  – Label="Screenshot", Value=<binary blob ref>
└── TrashItem
    └── Item (text/plain)  – Label="Draft (deleted)"

The three well-known items are always present and have fixed UUIDs:

| UUID | role |
|------|------|
| 00000000-0000-4000-8000-000000000000 | RootItem |
| 00000000-0000-4000-8000-000000000001 | TrashItem |
| 00000000-0000-4000-8000-000000000002 | LostAndFoundItem |

Store rules

The following invariants are maintained at all times:

  1. Acyclicity – the outer-item chain of any entry must not form a cycle. moveEntryTo throws SDS_Error('move-would-cycle') if the target item is a descendant of the entry being moved.
  2. Root immobility – RootItem, TrashItem, and LostAndFoundItem can never be moved or deleted.
  3. Trash-only purge – purgeEntry only accepts direct inner entries of TrashItem. Deeper descendants must be moved to the trash root first.
  4. Link protection – an item (and its subtree) is protected if any item within it is the target of a link that is reachable from RootItem. purgeEntry throws SDS_Error('purge-protected') for protected items; purgeExpiredTrashEntries skips them silently.
  5. Trash TTL – deleteEntry records a _trashedAt timestamp (ms since epoch) in the entry's Info object. purgeExpiredTrashEntries(TTLms) permanently removes all direct inner entries of TrashItem whose _trashedAt is older than TTLms. When TrashTTLms is passed to SDS_DataStore.fromScratch, an internal timer fires at TrashCheckIntervalMs intervals automatically.
  6. Orphan rescue – after applying a remote patch, any entry whose outerItemId points to a non-existent item is immediately moved to LostAndFoundItem by recoverOrphans().
  7. Dangling-link rescue – after applying a remote patch, any link whose TargetId points to a non-existent item causes that item to be recreated (empty) in LostAndFoundItem.
  8. Inner-entry ordering – the innerEntryList of an item is always sorted by OrderKey (ascending), with the entry Id as a tie-breaker. Order is stable across concurrent insertions on different peers.
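Rule 8 boils down to a two-level comparator. A sketch (EntryStub and byOrder are illustrative names; the field names mirror those used above):

```typescript
type EntryStub = { Id:string, OrderKey:string }

// Sort by OrderKey ascending; for equal OrderKeys (possible after concurrent
// inserts at the same position) fall back to the entry Id, which is unique,
// so every peer arrives at the same total order.
function byOrder (a:EntryStub, b:EntryStub):number {
  if (a.OrderKey !== b.OrderKey) return a.OrderKey < b.OrderKey ? -1 : 1
  return a.Id < b.Id ? -1 : a.Id > b.Id ? 1 : 0   // deterministic tie-break
}
```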

Flat map layout

Internally the store uses a flat Entries map (keyed by UUID) rather than a nested tree structure. This is true for all backends. The entry's outerItemId and OrderKey fields record its position in the tree. A move is a single field update; patches stay small regardless of tree depth.
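The flat layout can be sketched like this. The sketch is illustrative, not the real implementation: the lookup does the naive O(n) scan that the secondary-index section below is designed to avoid.

```typescript
type FlatEntry = { Id:string, outerItemId:string|null, OrderKey:string }

// One flat map keyed by UUID; tree position lives in each entry's fields.
const Entries = new Map<string, FlatEntry>([
  ['root', { Id:'root', outerItemId:null,   OrderKey:'a0' }],
  ['A',    { Id:'A',    outerItemId:'root', OrderKey:'a0' }],
  ['B',    { Id:'B',    outerItemId:'root', OrderKey:'a1' }],
  ['C',    { Id:'C',    outerItemId:'A',    OrderKey:'a0' }],
])

function moveEntryTo (Id:string, newOuterId:string):void {
  Entries.get(Id)!.outerItemId = newOuterId   // a move is a single field update
}

function innerIdsOf (outerId:string):string[] {   // naive O(n) reconstruction
  return [...Entries.values()]
    .filter(e => e.outerItemId === outerId)
    .sort((a, b) => a.OrderKey !== b.OrderKey
        ? (a.OrderKey < b.OrderKey ? -1 : 1)
        : (a.Id < b.Id ? -1 : 1))                 // Id as tie-breaker
    .map(e => e.Id)
}
```

Because a move rewrites only one field, the resulting CRDT patch is the same size whether the moved subtree is one entry or thousands.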

Fractional indexing for collaborative ordering

CRDT maps are inherently unordered. To give users a stable, collaboratively editable inner-entry order without conflicts, each entry records an OrderKey alongside its outerItemId. OrderKey strings are generated by the fractional-indexing npm package.

The algorithm was originally developed by Evan Wallace at Figma and later published by the Rocicorp team. Just as there is always a rational number between any two distinct rationals, there is always a lexicographically ordered string between any two distinct strings. A key can always be generated between any two neighbours without modifying their keys; concurrent inserts at the same position produce different keys resolved by UUID as a tie-breaker.
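The idea can be sketched with a simplified digit-based midpoint function. This is not the fractional-indexing package itself (which uses a larger alphabet and handles more edge cases) – just a minimal illustration of "there is always a string between two strings":

```typescript
const Digits = '0123456789'

// Returns a key lexicographically strictly between a and b, assuming a < b,
// keys never end in '0', and '' / undefined stand for the lower / upper bound.
function midpoint (a:string, b?:string):string {
  if (b !== undefined) {            // strip the longest common prefix first
    let n = 0
    while (a.charAt(n) === b.charAt(n)) n++
    if (n > 0) return b.slice(0, n) + midpoint(a.slice(n), b.slice(n))
  }
  const digitA = a === '' ? 0 : Digits.indexOf(a.charAt(0))
  const digitB = b === undefined ? Digits.length : Digits.indexOf(b.charAt(0))
  if (digitB - digitA > 1) {        // room for a digit strictly in between
    return Digits.charAt(Math.round((digitA + digitB) / 2))
  }
  if (b !== undefined && b.length > 1) return b.charAt(0)  // b has extra digits
  return Digits.charAt(digitA) + midpoint(a.slice(1))      // recurse into suffix
}
```

Neither neighbour's key ever changes, so two peers inserting at the same position concurrently each generate a valid (different) key, and the Id tie-break decides their relative order.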

In-memory secondary indices

Because the CRDT only stores outerItemId inside the inner entry, reconstructing the inner-entry list of a given item requires scanning all entries – O(n). The store maintains a #ReverseIndex (Map<outerItemId, Set<innerId>>) and a #LinkTargetIndex (Map<targetId, Set<linkId>>) as in-memory secondary indices. Both are rebuilt from scratch once during construction and then kept in sync incrementally for every mutation.

Remote patches use a forward-index diff: companion maps #ForwardIndex and #LinkForwardIndex record the last-known placement of every entry. After a patch is applied, the store iterates the new view once and compares each entry's current placement against the forward index, touching only the buckets that actually changed.
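The reverse-index bookkeeping can be sketched as follows (illustrative names; the real store keeps these as private class fields):

```typescript
// Map<outerItemId, Set<innerId>>: answers "which entries live inside X?"
// in O(1) instead of scanning the whole flat entry map.
const ReverseIndex = new Map<string, Set<string>>()

function indexPlacement (innerId:string, outerId:string):void {
  let Bucket = ReverseIndex.get(outerId)
  if (Bucket == null) { Bucket = new Set<string>(); ReverseIndex.set(outerId, Bucket) }
  Bucket.add(innerId)
}

// On a move, only the two affected buckets are touched - the incremental
// update the text above describes.
function reindexMove (innerId:string, oldOuterId:string, newOuterId:string):void {
  ReverseIndex.get(oldOuterId)?.delete(innerId)
  indexPlacement(innerId, newOuterId)
}
```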


Backend-specific details

Each backend stores the same logical model in a different CRDT representation. Refer to the individual package READMEs for details:

  • packages/core-jj/README.md – json-joy (canonical snapshot, 4-byte cursor)
  • packages/core-yjs/README.md – Y.js (state-vector cursor, Y.Text)
  • packages/core-loro/README.md – Loro CRDT (version-vector cursor, LoroText, Rust/WASM)

The SDS_SyncCursor abstraction

The persistence interface uses an opaque binary cursor (SDS_SyncCursor = Uint8Array) instead of a raw integer clock. Each backend encodes the cursor differently:

| backend | cursor encoding |
|---------|-----------------|
| json-joy | 4-byte big-endian integer (patch sequence number) |
| Y.js | Y.js state vector (Y.encodeStateVector(doc)) |
| Loro | Loro version vector (doc.version().encode()) |

The sync engine and all persistence providers treat the cursor as an opaque blob – they store and pass it without interpretation. This allows the persistence infrastructure to be reused across all backends without change.
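As an illustration of one concrete encoding, the 4-byte big-endian sequence number the table attributes to json-joy might look like this (a sketch; encodeCursor/decodeCursor are not part of the public API, since only the backend ever looks inside the blob):

```typescript
type SDS_SyncCursor = Uint8Array   // opaque to the engine and persistence layers

function encodeCursor (SequenceNumber:number):SDS_SyncCursor {
  const Cursor = new Uint8Array(4)
  new DataView(Cursor.buffer).setUint32(0, SequenceNumber, false)  // big-endian
  return Cursor
}

function decodeCursor (Cursor:SDS_SyncCursor):number {
  return new DataView(Cursor.buffer, Cursor.byteOffset, 4).getUint32(0, false)
}
```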


Migrating between backends

Binary snapshots and CRDT patches are not cross-compatible between backends. To migrate existing data from one backend to another:

  1. load your data with the old backend:

    // Example: migrate from json-joy to Y.js
    import { SDS_DataStore as oldStore } from '@rozek/sds-core-jj'
    import { SDS_DataStore as newStore } from '@rozek/sds-core-yjs'
    
    const oldContents = oldStore.fromBinary(existingSnapshot)
  2. export all entries as JSON:

    const allEntries = oldContents.asJSON()
  3. create a fresh store with the new backend and import the JSON:

    const newContents = newStore.fromJSON(allEntries)
  4. re-persist the new store:

    const newSnapshot = newContents.asBinary()
    // save newSnapshot with your persistence provider

Important:

  • the JSON export/import path preserves all Labels, Values, Info keys, MIME types, and the tree structure, but does not preserve CRDT history. Every peer receiving the migrated snapshot will see it as a single atomic origin – there is no incremental patch history to replay.
  • if you run multiple peers, all peers must migrate simultaneously to the same backend. A json-joy peer and a Y.js peer cannot exchange patches.
  • after migration, re-initialise your persistence store (clear old patches and snapshots rows and seed with the new binary snapshot).
  • all SDS_SyncCursor values stored in the database are backend-specific and must be discarded on migration.

Implementing a new backend

You can add a new CRDT engine by creating a package that implements the same SDS_DataStore class surface. Here is what is required:

Requirements for a new backend

Static Factory Methods

static fromScratch (Options?:SDS_DataStoreOptions):SDS_DataStore
static fromBinary (Data:Uint8Array, Options?:SDS_DataStoreOptions):SDS_DataStore
static fromJSON (Data:unknown, Options?:SDS_DataStoreOptions):SDS_DataStore

fromScratch() must create the three well-known entries (RootItem, TrashItem, LostAndFoundItem) with their fixed UUIDs and set TrashItem and LostAndFoundItem as children of RootItem. Two independent calls to fromScratch() on different peers must produce stores that can exchange patches and converge.

Serialisation

asBinary ():Uint8Array
asJSON ():unknown

asBinary() returns a self-contained, gzip-compressed snapshot that fromBinary() can restore. asJSON() returns a plain-object representation that fromJSON() can restore (used for backend migration).

Sync

get currentCursor ():SDS_SyncCursor
exportPatch (since?:SDS_SyncCursor):Uint8Array
applyRemotePatch (encodedPatch:Uint8Array):void

exportPatch() with no argument exports a full snapshot patch; with a cursor it exports only the operations added since that cursor. applyRemotePatch() merges a remote patch into the local document and fires change handlers with Origin = 'external'.

Mutation Methods – identical to @rozek/sds-core-jj (the reference implementation)

newItemAt (MIMEType:string|undefined, outerItem:SDS_Item, InsertionIndex?:number):SDS_Item
newLinkAt (Target:SDS_Item, outerItem:SDS_Item, InsertionIndex?:number):SDS_Link
EntryWithId (Id:string): SDS_Entry | undefined
moveEntryTo (Entry:SDS_Entry, outerItem:SDS_Item, InsertionIndex?:number):void
deleteEntry (Entry:SDS_Entry):void
purgeEntry (Entry:SDS_Entry):void
deserializeItemInto (Data:unknown, Container:SDS_Item, InsertionIndex?:number):SDS_Item
deserializeLinkInto (Data:unknown, Container:SDS_Item, InsertionIndex?:number):SDS_Link
recoverOrphans ():void
transact (Callback:() => void):void
onChangeInvoke (Handler:ChangeHandler):() => void

Properties

get RootItem ():SDS_Item
get TrashItem ():SDS_Item
get LostAndFoundItem ():SDS_Item

Data Storage Constraints

  • store entries in a flat map keyed by UUID (no nested tree in the CRDT layer).
  • each entry must expose at minimum: Kind, outerItemId, OrderKey, Label, Info, MIMEType, ValueKind and the appropriate value field.
  • use fractional-indexing for OrderKey generation to maintain collaborative ordering.
  • maintain in-memory #ReverseIndex, #ForwardIndex, #LinkTargetIndex, #LinkForwardIndex, and #WrapperCache for efficient traversal and incremental updates.
  • re-use the SDS_Entry, SDS_Item, and SDS_Link classes from @rozek/sds-core (or copy and adapt them) β€” they delegate all CRDT operations back to the store via bracket-notation calls (this._Store['_method']()).

Change Notifications

  • maintain a #TransactDepth counter. Increment it on entry, decrement on exit.
  • collect a SDS_ChangeSet during the transaction.
  • fire all registered ChangeHandlers exactly once when the outermost transaction completes.
  • remote patches must fire handlers with Origin = 'external'; local mutations with Origin = 'internal'.
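A minimal sketch of this transaction bookkeeping, assuming the counter/changeset scheme described above (TransactionTracker and its string-based change set are illustrative stand-ins for the real classes):

```typescript
type Origin = 'internal' | 'external'
type ChangeHandler = (Changes:string[], origin:Origin) => void

class TransactionTracker {
  private Depth = 0                     // the #TransactDepth counter
  private Changes:string[] = []         // collected change set
  private Handlers:ChangeHandler[] = []

  onChangeInvoke (Handler:ChangeHandler):() => void {
    this.Handlers.push(Handler)         // returns an unsubscribe function
    return () => { this.Handlers = this.Handlers.filter(h => h !== Handler) }
  }

  record (Change:string):void { this.Changes.push(Change) }

  transact (Callback:() => void):void {
    this.Depth++                        // increment on entry
    try { Callback() }
    finally {
      if (--this.Depth === 0) {         // outermost transaction completed:
        const Changes = this.Changes    // fire every handler exactly once
        this.Changes = []
        for (const Handler of this.Handlers) Handler(Changes, 'internal')
      }
    }
  }
}
```

A remote-patch path would call the same firing logic with Origin = 'external' instead.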

Package structure

Follow the same layout as packages/core-jj, packages/core-yjs, or packages/core-loro:

packages/core-<name>/
  package.json       ← name: @rozek/sds-core-<name>
  tsconfig.json      ← extends ../../tsconfig.base.json
  vite.config.ts
  src/
    sds-core-<name>.ts   ← entry point (re-export all public symbols)
    store/
      constants.ts
      SDS_DataStore.ts   ← full implementation
      SDS_Entry.ts       ← copied from core, local import
      SDS_Item.ts
      SDS_Link.ts
    error/SDS_Error.ts
    changeset/SDS_ChangeSet.ts
    changeset/SDS_EntryChangeSet.ts
    interfaces/SDS_PersistenceProvider.ts
    tests/
      SDS_Error.test.ts
      SDS_DataStore.construction.test.ts   ← backend-specific
      SDS_DataStore.creation.test.ts
      … (mirror the core test suite)
  TestPlan.md
  TestCases.md
  README.md

Add the new package to pnpm-workspace.yaml under the packages: list and add it as a workspace package to package.json if needed.


Development

# Install all dependencies
pnpm install

# Build all packages
pnpm -r build

# Run all tests
pnpm -r test:run

# Type-check all packages
pnpm -r typecheck

License

MIT License © Andreas Rozek
