ProjSkep is a high-performance, local-first retrieval engine designed to eliminate the distance between creative intent and audio discovery. It transforms a chaotic filesystem of audio assets into a navigable semantic topology, optimized for deep work and creative continuity.
ProjSkep is built as a layered system, separating signal analysis from persistence and interface. This ensures that even massive libraries (100,000+ assets) remain searchable with sub-millisecond latency without impacting DAW stability.
The engine performs deterministic descriptor extraction using a 2048-point FFT. It computes a 256-bin log-spaced magnitude vector for every audio asset, alongside high-level perceptual descriptors:
- Spectral Centroid (Brightness)
- Spectral Flux (Transient Density)
- Wiener Entropy (Noisiness/Flatness)
- HPCP (Tonal Profile)
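As a rough illustration of two of the descriptors above, the sketch below computes a spectral centroid and a Wiener entropy (spectral flatness) value from one frame of FFT magnitudes. The function names, the linear bin spacing, and the `eps` floor are assumptions made for this example; they are not taken from the ProjSkep source.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Spectral centroid in Hz: the magnitude-weighted mean of the bin frequencies.
// `mags` holds magnitudes for linearly spaced FFT bins starting at bin 0 (an assumption).
double spectralCentroid(const std::vector<double>& mags, double sampleRate, std::size_t fftSize) {
    double weighted = 0.0, total = 0.0;
    for (std::size_t k = 0; k < mags.size(); ++k) {
        const double freq = static_cast<double>(k) * sampleRate / static_cast<double>(fftSize);
        weighted += freq * mags[k];
        total += mags[k];
    }
    return total > 0.0 ? weighted / total : 0.0;
}

// Wiener entropy (spectral flatness): geometric mean divided by arithmetic mean
// of the power spectrum. Close to 1 for noise-like frames, close to 0 for tonal ones.
double wienerEntropy(const std::vector<double>& mags) {
    if (mags.empty()) return 0.0;
    constexpr double eps = 1e-12; // hypothetical floor that avoids log(0) on silent bins
    double logSum = 0.0, sum = 0.0;
    for (double m : mags) {
        const double power = m * m + eps;
        logSum += std::log(power);
        sum += power;
    }
    const double n = static_cast<double>(mags.size());
    return std::exp(logSum / n) / (sum / n);
}
```

A perfectly flat spectrum yields a flatness of 1, while a single dominant bin pushes the value toward 0, which is why the descriptor works as a noisiness measure.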
Similarity search is powered by a Hierarchical Navigable Small World (HNSW) graph, which provides approximate nearest neighbor (ANN) retrieval. Query cost grows roughly logarithmically with library size rather than linearly.
- Distance Metric: Cosine similarity or L2 (Euclidean) distance
- Search Complexity: approximately O(log N) per query
- RAM Strategy: Vector embeddings are pinned in memory for instant retrieval.
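The two metrics listed above are closely related: for unit-length vectors, squared L2 distance equals 2 - 2 * cosine similarity, so both produce the same ranking. The sketch below shows that relationship together with an exact linear scan, which is the O(N) baseline that an ANN index approximates. It does not implement HNSW itself, and all names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

float dotProduct(const Vec& a, const Vec& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Scales v to unit length; a zero vector is returned unchanged.
Vec normalized(Vec v) {
    const float norm = std::sqrt(dotProduct(v, v));
    if (norm > 0.0f)
        for (float& x : v) x /= norm;
    return v;
}

float cosineSimilarity(const Vec& a, const Vec& b) {
    const float denom = std::sqrt(dotProduct(a, a)) * std::sqrt(dotProduct(b, b));
    return denom > 0.0f ? dotProduct(a, b) / denom : 0.0f;
}

float squaredL2(const Vec& a, const Vec& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Exact nearest neighbor by linear scan over normalized vectors.
// Returns 0 for an empty collection, so callers should check for that case.
std::size_t nearestExact(const Vec& query, const std::vector<Vec>& items) {
    const Vec q = normalized(query);
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < items.size(); ++i) {
        const float d = squaredL2(q, normalized(items[i]));
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}
```

A scan like this is also a convenient ground truth for measuring the recall of an approximate index on a sample of queries.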
A SQLite 3 backend (WAL mode) manages the metadata graph.
- Delta Scanning: xxHash64-based change detection skips unmodified files.
- Atomic Commits: Writes are batched into transactions, so each batch either commits completely or not at all. This protects database integrity and keeps high-speed ingestion fast.
- Relational Metadata: Links hashes to filesystem paths, BPM analysis, and tonal descriptors.
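Delta scanning can be sketched as a comparison between a freshly computed content hash and the hash stored from the previous scan. To stay dependency-free, the example below substitutes FNV-1a for xxHash64 and an in-memory map for the SQLite table; `ScannedFile`, `HashIndex`, and `deltaScan` are hypothetical names, and file contents are passed in as strings rather than read from disk.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in 64-bit hash (FNV-1a). This is NOT xxHash64; it only keeps the
// sketch free of third-party dependencies.
std::uint64_t contentHash(const std::string& bytes) {
    std::uint64_t h = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 0x100000001b3ULL; // FNV-1a 64-bit prime
    }
    return h;
}

struct ScannedFile {
    std::string path;
    std::string bytes; // file contents, supplied by the caller in this sketch
};

// Hypothetical index mapping a path to the hash stored by the previous scan.
using HashIndex = std::unordered_map<std::string, std::uint64_t>;

// Returns the paths that are new or modified and updates the index in place.
// Files whose hash matches the stored value are skipped.
std::vector<std::string> deltaScan(const std::vector<ScannedFile>& files, HashIndex& index) {
    std::vector<std::string> changed;
    for (const auto& file : files) {
        const std::uint64_t h = contentHash(file.bytes);
        const auto it = index.find(file.path);
        if (it != index.end() && it->second == h) continue; // unmodified
        index[file.path] = h;
        changed.push_back(file.path);
    }
    return changed;
}
```

Only the paths returned by `deltaScan` would need descriptor extraction again, which is what makes rescans of a large, mostly unchanged library cheap.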
The VST3 runtime serves as the operational surface, bringing semantic retrieval directly into VST3-capable DAWs on Windows, such as FL Studio and Ableton Live.
The UI is rendered via a Chromium-based WebView2 instance, isolated from the audio thread.
- Bi-directional Protocol: A JSON-based event bus (BridgeManager) handles communication between the C++ core and the JavaScript frontend.
- Message Marshalling: All UI updates are dispatched via the JUCE Message Thread to ensure thread safety.
- Thread Isolation: Heavy indexing and precomputation occur on a dedicated background thread pool.
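The marshalling pattern above can be sketched in plain C++: worker threads post closures to a queue, and only the message thread drains it and talks to the frontend. `MessageQueue` and `makeEvent` are hypothetical stand-ins written for this example and are not the BridgeManager API. In a JUCE plugin the posting step would more likely go through `juce::MessageManager::callAsync`, and the envelope field names here are assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for a message-thread dispatcher.
// The mutex is shared by workers and the message thread only, never the audio thread.
class MessageQueue {
public:
    // Safe to call from any background thread.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Called on the message thread; runs every task queued so far and
    // returns how many were executed.
    std::size_t drain() {
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(pending, tasks_);
        }
        std::size_t ran = 0;
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
            ++ran;
        }
        return ran;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Builds a minimal JSON event envelope. `payloadJson` must already be valid
// JSON, and `type` is not escaped, so this is illustrative only.
std::string makeEvent(const std::string& type, const std::string& payloadJson) {
    return "{\"type\":\"" + type + "\",\"payload\":" + payloadJson + "}";
}
```

Swapping the queue out under the lock keeps the critical section short, so a worker posting progress updates is never blocked while the UI callbacks run.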
A lock-free playback engine enables instant auditioning of results.
- Non-Blocking I/O: Background workers handle file decoding into pre-allocated ring buffers.
- Zero-Allocation DSP: The audio thread never allocates or takes locks. It reads from the pre-allocated buffers through atomic operations, which guards against buffer underruns and DAW hangs.
- Seamless Auditioning: Automatic crossfading between preview sources prevents clicks during rapid browsing.
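One common way to build this kind of hand-off is a single-producer, single-consumer ring buffer: a decoder thread writes samples, the audio thread reads them, and the two coordinate only through atomic indices. The class below is a minimal sketch of that technique under those assumptions, not the ProjSkep playback engine; `SpscRing` is a hypothetical name, and crossfading is left out.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Fixed-capacity single-producer / single-consumer ring buffer.
// One slot always stays empty so that "full" and "empty" are distinguishable.
template <std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "Capacity must leave room for at least one sample");

public:
    // Producer side (decoder thread). Returns the number of samples written.
    std::size_t write(const float* src, std::size_t count) {
        const std::size_t w = writePos_.load(std::memory_order_relaxed);
        const std::size_t r = readPos_.load(std::memory_order_acquire);
        const std::size_t space = (r + Capacity - w - 1) % Capacity;
        const std::size_t n = count < space ? count : space;
        for (std::size_t i = 0; i < n; ++i) data_[(w + i) % Capacity] = src[i];
        writePos_.store((w + n) % Capacity, std::memory_order_release);
        return n;
    }

    // Consumer side (audio thread). Never blocks or allocates; if fewer than
    // `count` samples are ready, the remainder of `dst` is filled with silence.
    std::size_t read(float* dst, std::size_t count) {
        const std::size_t r = readPos_.load(std::memory_order_relaxed);
        const std::size_t w = writePos_.load(std::memory_order_acquire);
        const std::size_t available = (w + Capacity - r) % Capacity;
        const std::size_t n = count < available ? count : available;
        for (std::size_t i = 0; i < n; ++i) dst[i] = data_[(r + i) % Capacity];
        for (std::size_t i = n; i < count; ++i) dst[i] = 0.0f;
        readPos_.store((r + n) % Capacity, std::memory_order_release);
        return n;
    }

private:
    std::array<float, Capacity> data_{};
    std::atomic<std::size_t> writePos_{0};
    std::atomic<std::size_t> readPos_{0};
};
```

Because storage is a fixed `std::array`, neither side allocates after construction. The release/acquire pairing on the indices ensures the reader only sees samples the writer has finished copying.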
To support 60fps scrolling across large result sets, visual data (waveforms and heatmaps) is hydrated lazily.
- WaveformCache: Precomputes 1024-bin peak maps and 256-frame RMS envelopes.
- Intersection Loading: The UI only requests visual data for assets currently in the viewport.
- Disk-to-UI Streaming: Peaks are streamed from disk on demand and cached in the WebView's memory.
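The precomputation step can be pictured as a reduction from raw samples to a fixed number of display values. The sketch below reduces a mono buffer to a peak map and an RMS envelope; the function names and the integer bin-edge scheme are assumptions for this example, and the bin counts mentioned above (1024 and 256) would simply be passed as arguments.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reduces a mono buffer to `bins` peak values (maximum absolute sample per bin).
std::vector<float> computePeakMap(const std::vector<float>& samples, std::size_t bins) {
    std::vector<float> peaks(bins, 0.0f);
    if (samples.empty()) return peaks;
    for (std::size_t b = 0; b < bins; ++b) {
        // Integer bin edges tile the buffer exactly, with no gaps or overlap.
        const std::size_t begin = b * samples.size() / bins;
        const std::size_t end = (b + 1) * samples.size() / bins;
        for (std::size_t i = begin; i < end; ++i)
            peaks[b] = std::max(peaks[b], std::fabs(samples[i]));
    }
    return peaks;
}

// Reduces a mono buffer to `frames` RMS values. Frames that cover no samples stay at 0.
std::vector<float> computeRmsEnvelope(const std::vector<float>& samples, std::size_t frames) {
    std::vector<float> envelope(frames, 0.0f);
    if (samples.empty()) return envelope;
    for (std::size_t f = 0; f < frames; ++f) {
        const std::size_t begin = f * samples.size() / frames;
        const std::size_t end = (f + 1) * samples.size() / frames;
        if (end == begin) continue;
        double sumSquares = 0.0;
        for (std::size_t i = begin; i < end; ++i)
            sumSquares += static_cast<double>(samples[i]) * samples[i];
        envelope[f] = static_cast<float>(std::sqrt(sumSquares / static_cast<double>(end - begin)));
    }
    return envelope;
}
```

Since the output size is fixed regardless of file length, every asset costs the same amount of data to draw, which keeps per-row rendering predictable while scrolling.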
ProjSkep is designed around the psychology of creative focus, aiming to reduce "Retrieval Entropy."
By replacing folder navigation with semantic similarity, the system reduces the cognitive load associated with file management. The producer describes a sound (via text or reference file), and the system retrieves the closest perceptual match.
The system implements a feedback loop for retrieval failures (Phase 1.6 Beta):
- FALSE_COUSIN: Mathematically similar but perceptually unrelated results.
- DUPLICATE_SWAMP: Redundant file clusters masking unique assets.
- GHOST_RESULT: Missing files or corrupted metadata.
- PATHOLOGY_LEDGER: All reported failures are logged for forensic weight adjustment in the RetrievalFusion layer.
Design principles:
- Evidence over Excitement: Systems must prove operational usefulness before advancing.
- Spatial Trust: Semantic topologies must remain stable and learnable.
- Audio Integrity: All visual and indexing operations must remain isolated from the audio thread.
Technology stack:
- Core Logic: C++20 / JUCE 8
- UI Layer: Vanilla JavaScript / HTML5 / CSS3 (WebView2)
- Database: SQLite 3 (Write-Ahead Logging)
- Hashing: xxHash (v0.8.2)
- Build System: CMake (FetchContent)
System requirements:
- Operating System: Windows 10/11
- Runtime: Microsoft Edge WebView2 Evergreen Runtime
- Hardware: SSD recommended for library indexing and waveform caching.
The project evolved from a collection of Python-based sorting scripts into a professional C++ VST3 runtime. This transition was driven by the need to reduce "Inspiration Latency": the time between having a musical idea and finding the material to execute it.
Focus: High-performance ANN search, spectral analysis, and basic VST3 stability. Goal: Establish retrieval trust through measurable validation.
Focus: Intent-driven retrieval, session memory, and interactive spatial mapping. Goal: Transform retrieval into a collaborative cognitive substrate.
Copyright (c) 2026 SonicOS / ProjSkep. Developed as a high-fidelity creative infrastructure.