A research-driven multimodal biometric system combining Face Recognition, Voice Authentication, Liveness Detection, and Deepfake Detection. The core fusion engine is COSMIC (Categorical Optimal-equilibrium Sheaf-theoretic Multimodal Identity Correspondence) -- a deep equilibrium model that computes provably optimal identity embeddings via a closed-form convex energy minimizer.
Research Highlights:
- COSMIC (CGE): 6.0% EER / 98.1% AUC (449 identities, 6 datasets, 5-fold CV)
- MIMOP-FAME-CEBS: 14.59% EER production fusion (beats FAME 2024 winner's 19.9%)
- Cross-Modal Retrieval: R@5 = 29.4%, AUC = 81.2% on unseen identities from frozen backbones
- Ablation: Removing the CGE solver degrades EER from 8.7% to 31.4% -- the formulation is essential
The identity embedding I* is defined as the unique minimizer of a strictly convex energy functional:
I* = argmin_I [ 1/2 I^T H I - b^T I ] = H^{-1} b (closed-form, unique)
where H is provably positive definite by construction (eigenvalue range [18.6, 94.4], condition number 5.1).
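As a minimal NumPy sketch of this closed-form step (dimensions, random weights, and the `alpha`/`beta`/`gamma`/`eps` values below are illustrative placeholders, not the trained coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
d, dh = 64, 64                                  # identity dim, hidden dim (illustrative)

W     = rng.standard_normal((dh, d)) * 0.3      # fusion weight matrix
rho_f = rng.standard_normal((dh, d)) * 0.3      # face readout
rho_v = rng.standard_normal((dh, d)) * 0.3      # voice readout
h_f   = rng.standard_normal(dh)                 # face hidden state
h_v   = rng.standard_normal(dh)                 # voice hidden state
f_kan = rng.standard_normal(d)                  # cross-attention fusion target
alpha, beta, gamma, eps = 1.0, 1.0, 1.0, 1e-3   # placeholder hyperparameters

# H = W^T W + beta^2 rho_f^T rho_f + gamma^2 rho_v^T rho_v + eps I  (PD by construction)
H = W.T @ W + beta**2 * rho_f.T @ rho_f + gamma**2 * rho_v.T @ rho_v + eps * np.eye(d)
b = alpha * f_kan + beta**2 * rho_f.T @ h_f + gamma**2 * rho_v.T @ h_v

I_star = np.linalg.solve(H, b)                  # unique global minimum of the convex energy
I_star /= np.linalg.norm(I_star)                # project onto the unit sphere S^{d-1}

assert np.all(np.linalg.eigvalsh(H) > 0)        # strict convexity: all eigenvalues positive
```

Because H is the sum of Gram matrices plus a ridge term, it is symmetric positive definite by construction, so `np.linalg.solve` always returns the unique minimizer.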
┌────────────────────────────────────────────────────────────────────────────┐
│ COSMIC FUSION ARCHITECTURE (v5.0) │
│ Categorical Optimal-equilibrium Sheaf-theoretic Fusion (CGE) │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ FROZEN BACKBONES │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐ │
│ │ InsightFace │ │ ECAPA-TDNN │ │ Context Features │ │
│ │ ArcFace (512D) │ │ Voice (192D) │ │ eGeMAPS (88D) │ │
│ │ [frozen] │ │ [frozen] │ │ Prosody (103D) │ │
│ └────────┬─────────┘ └────────┬─────────┘ │ Face AU (27D) │ │
│ │ │ └──────────┬───────────┘ │
│ ▼ ▼ ▼ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ KA-SPLINE PROJECTIONS (Kolmogorov-Arnold) │ │
│ │ face_h = KASpline(512 -> hidden) │ │
│ │ voice_h = KASpline(192 -> hidden) │ │
│ └────────┬──────────────────────┬───────────────────────────────┘ │
│ │ │ │ │
│ │ │ ▼ │
│ │ │ ┌──────────────────────┐ │
│ │ │ │ VIB Context Bottleneck│ │
│ │ │ │ 218D -> mu,sigma │ │
│ │ │ │ z ~ N(mu, sigma^2) │ │
│ │ │ │ -> 64D (disentangled)│ │
│ │ │ │ KL(q||p) penalty │ │
│ │ │ └──────────┬───────────┘ │
│ │ │ │ │
│ ┌────────▼──────────────────────▼───────────────────────▼────────┐ │
│ │ CATEGORICAL KAN EXTENSION FUSION │ │
│ │ │ │
│ │ 6 Cross-Modal Morphisms: phi_fv, phi_vf, phi_fc, phi_cf, │ │
│ │ phi_vc, phi_cv │ │
│ │ │ │
│ │ Equalizer Violation: C = sum ||phi_ij(m_i) - m_j||^2 │ │
│ │ (enforces cross-modal coherence via categorical limit) │ │
│ │ │ │
│ │ f_kan = Perceiver cross-attention fusion target │ │
│ └────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────────▼───────────────────────────────────┐ │
│ │ CONVEX ENERGY CGE SOLVER │ │
│ │ │ │
│ │ H = W^T W + beta^2 rho_f^T rho_f │ │
│ │ + gamma^2 rho_v^T rho_v + epsilon I │ │
│ │ │ │
│ │ b = alpha * f_kan + beta^2 * rho_f^T * h_f │ │
│ │ + gamma^2 * rho_v^T * h_v │ │
│ │ │ │
│ │ I* = H^{-1} b (unique global minimum, closed-form) │ │
│ │ I* = I* / ||I*|| (projected to unit sphere S^63) │ │
│ │ │ │
│ │ Effective rank: 62.1/64 | H cond number: 5.1 │ │
│ └────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────────▼───────────────────────────────────┐ │
│ │ 9-TASK DB-MTL TRAINING (Dynamic Barrier Multi-Task Learning) │ │
│ │ │ │
│ │ 1. CGE convergence 4. Angular (ArcFace) 7. Integrated │ │
│ │ 2. Sheaf (H0+H1) 5. Metric (triplet) 8. Coherence │ │
│ │ 3. Persistent homol. 6. Regularization 9. SNFC │ │
│ │ │ │
│ │ + Conflict-Averse Gradients [CAGrad] + Kendall weighting │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ COMPACT VARIANT (compact=True): │
│ Strips morphisms, sheaf, persistent cohomology, SNFC. │
│ Retains: CGE solver, KA-Spline, VIB, cross-attention, angular, metric. │
│ 6 tasks instead of 9. Used for extended multi-dataset evaluation. │
└────────────────────────────────────────────────────────────────────────────┘
The SNFC module addresses morphism mean-collapse where MSE-trained morphisms predict population centroids instead of per-identity targets. It uses:
- Z-score normalization per morphism for heterogeneous scale alignment
- Cosine contrastive loss (margin=0.1) that pushes phi_fv(face_A) toward voice_A and away from voice_B
- Fisher ratio monitoring for real-time discriminability tracking (Fisher v_fv: 0.0016 -> 1.10, a 695x improvement)
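A rough sketch of these SNFC ingredients, under one plausible reading of the margin formulation (the exact loss in the codebase may differ):

```python
import numpy as np

def zscore(x, eps=1e-8):
    # Per-morphism z-score normalization for heterogeneous scale alignment
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / (x.std() + eps)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def snfc_loss(pred, pos, neg, margin=0.1):
    """One plausible cosine-contrastive form: raise similarity to the true
    target (voice_A) and keep the impostor (voice_B) at least `margin`
    below it. Illustrative, not the project's exact implementation."""
    pred, pos, neg = zscore(pred), zscore(pos), zscore(neg)
    pull = 1.0 - cos(pred, pos)                       # attract to the true target
    push = max(0.0, cos(pred, neg) - cos(pred, pos) + margin)  # repel the impostor
    return pull + push

def fisher_ratio(pos_sims, neg_sims):
    # Between-class separation over within-class spread (discriminability monitor)
    pos, neg = np.asarray(pos_sims), np.asarray(neg_sims)
    return (pos.mean() - neg.mean()) ** 2 / (pos.var() + neg.var() + 1e-12)
```

A morphism that predicts the population centroid scores near-zero Fisher ratio, since genuine and impostor similarities then share the same mean.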
┌─────────────────────────────────────────────────────────────────────────┐
│ PRODUCTION CASCADED PIPELINE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ STAGE 0: LIVENESS GATE (ViT-DINO + MediaPipe EAR + Frequency) │
│ vit_liveness.onnx | 9.66ms | 85.9% accuracy │
│ [SPOOF?] --> REJECT │
│ │
│ STAGE 1: DEEPFAKE GATE (Frequency + Texture Consistency) │
│ deepfake_detector.onnx | 10.47ms | 85.9% val acc │
│ [FAKE?] --> REJECT │
│ │
│ STAGE 2: EMBEDDING EXTRACTION │
│ InsightFace ArcFace (512D) + ECAPA-TDNN (192D) │
│ │
│ STAGE 3: MIMOP-FAME-CEBS FUSION (14.59% EER) │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────────┐ │
│ │ Face Only │ │ Voice Only │ │ Face + Voice │ │
│ │ (6.8 MB) │ │ (6.5 MB) │ │ (10.1 MB) │ │
│ │ ONNX │ │ ONNX │ │ ONNX │ │
│ └──────────────┘ └──────────────┘ └───────────────────┘ │
│ MIMOP: HSIC identity disentanglement │
│ CEBS-DART: Dempster-Shafer evidential belief fusion │
│ │
│ STAGE 4: CONFIDENCE GATE │
│ 40% < confidence < 60% --> LLM Cognitive Challenge │
│ (Groq LLaMA-3.1 for complex answer verification) │
│ │
│ OUTPUT: GRANTED | DENIED | REQUIRE 2FA │
└─────────────────────────────────────────────────────────────────────────┘
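Stage 3's evidential fusion follows Dempster-Shafer belief combination. A generic sketch of Dempster's rule over a two-hypothesis frame {genuine, impostor} with explicit uncertainty mass -- illustrative of the idea, not the project's CEBS-DART code (the example mass values are invented):

```python
def dempster_combine(m1, m2):
    """Dempster's rule over frame {G, I}: focal sets 'G' (genuine),
    'I' (impostor), 'GI' (uncertain). Conflicting mass is renormalized away."""
    focal = ('G', 'I', 'GI')
    combined = {f: 0.0 for f in focal}
    conflict = 0.0
    for a in focal:
        for b in focal:
            prod = m1[a] * m2[b]
            inter = set(a) & set(b)
            if not inter:
                conflict += prod                       # fully conflicting evidence
            else:
                key = 'GI' if inter == {'G', 'I'} else inter.pop()
                combined[key] += prod
    k = 1.0 - conflict                                 # normalization (assumes k > 0)
    return {f: v / k for f, v in combined.items()}

# Face modality asserts "genuine" with some doubt; voice is weaker evidence
face  = {'G': 0.7, 'I': 0.1, 'GI': 0.2}
voice = {'G': 0.5, 'I': 0.2, 'GI': 0.3}
fused = dempster_combine(face, voice)
# Agreeing evidence reinforces: fused belief in 'G' exceeds either input alone
```

This is why evidential fusion can outperform weighted score averaging: two moderately confident, agreeing modalities combine into a belief stronger than either, while disagreement shows up as discounted conflict mass.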
| Configuration | Identities | Datasets | EER (%) | AUC (%) |
|---|---|---|---|---|
| Full COSMIC | 200 | MAV-Celeb | 8.8 +/- 0.9 | 96.8 +/- 0.7 |
| Compact + Extended | 449 | 6 datasets | 6.0 +/- 0.6 | 98.1 +/- 0.2 |
| Zero-shot CREMA-D | 91 | CREMA-D | 5.6 +/- 0.5 | 98.8 +/- 0.2 |

| Variant | EER (%) | AUC (%) | Params |
|---|---|---|---|
| B1-ConcatMLP | 38.1 +/- 2.1 | 67.1 +/- 2.7 | ~327K |
| B2-ScoreFusion | 17.7 +/- 0.6 | 90.4 +/- 0.7 | ~199K |
| B3-CrossAttention | 25.1 +/- 1.6 | 82.6 +/- 1.7 | ~317K |
| A2-NoDEQ (COSMIC w/o CGE) | 31.4 +/- 2.2 | 75.3 +/- 2.5 | ~1.1M |
| A1-Full COSMIC | 8.7 +/- 0.1 | 96.9 +/- 0.1 | ~1.1M |
Removing the CGE solver degrades EER from 8.7% to 31.4%, confirming the convex energy formulation is essential.

| Metric | Value |
|---|---|
| Face-to-Voice R@5 | 29.4% |
| Face-to-Voice R@10 | 51.3% |
| Cross-Modal AUC | 81.2% |
| Chance R@5 (90 IDs) | 5.6% |

| Component | Latency | EER |
|---|---|---|
| MIMOP Authenticator | 2.78 ms | 14.59% |
| ViT Liveness | 9.66 ms | - |
| Deepfake Detector | 10.47 ms | - |
| Full Pipeline | ~24 ms | - |
MIMOP-FAME-CEBS outperforms the FAME 2024 challenge winner (19.9% EER), a 26.6% relative improvement.

| Model | Dataset | Size | Metric | Purpose |
|---|---|---|---|---|
| `cosmic_v4_identity_free_*.pt` | 6 datasets (449 IDs) | ~1.2M params | 6.0% EER | COSMIC CGE research (5 folds) |
| `compact_cosmic_mavceleb_only_best.pt` | MAV-Celeb | ~0.8M params | 8.8% EER | Compact variant |
| `mimop_fame_cebs_fused.onnx` | MAV-Celeb v3 | 10.1 MB | 14.59% EER | Face+Voice ONNX production |
| `mimop_fame_cebs_face_only.onnx` | MAV-Celeb v3 | 6.8 MB | - | Face-only ONNX |
| `mimop_fame_cebs_voice_only.onnx` | MAV-Celeb v3 | 6.5 MB | - | Voice-only ONNX |
| `cebs_darts_best.pt` | MAV-Celeb | - | - | CEBS-DARTS architecture search |
| `edisc_mimop_cebs_final.pt` | MAV-Celeb | - | - | EDISC-MIMOP-CEBS fusion |
| `vit_liveness.onnx` | CelebA-Spoof | ~50 MB | 85.9% acc | Spatial liveness gate |
| `deepfake_detector.onnx` | FaceForensics++ | ~40 MB | 85.9% val acc | Deepfake detection gate |
backend/
core/
cosmic_sota_v3.py # COSMIC V3 - primary model (9 DB-MTL tasks, SNFC, CGE)
cosmic_sota.py # COSMIC V1 baseline
cosmic_authenticator.py # Production wrapper with audit trail
cosmic_losses.py # Loss functions
detector.py # InsightFace face detection (buffalo_l)
embedder.py # Face embeddings (ArcFace, 512D)
voice_embedder.py # Voice embeddings (ECAPA-TDNN, 192D)
recognizer.py # Face recognition pipeline
liveness.py # Liveness detection
deepfake_spatial.py # Spatial deepfake features
deepfake_temporal.py # Temporal deepfake features
context_disentanglement.py # LEACE/SPLICE/ConditionalLEACE, GeneralizableXMAligner
mimop_fame_cebs_authenticator.py # MIMOP-FAME-CEBS production fusion
...
api/
authenticate.py # Main authentication endpoint
enroll.py # Face/voice enrollment
identify.py # Face identification
gallery.py # Gallery management
admin.py # Admin dashboard
...
db/
models.py # SQLAlchemy ORM (PostgreSQL + pgvector)
session.py # Database session management
crud.py # CRUD operations
scripts/
train_cosmic_unified.py # UnifiedCOSMICClassifier training
train_cosmic_continuous.py # COSMIC continuous training (5-fold)
train_cosmic_lite.py # Minimal reproducible variant
run_comprehensive_ablation.py # Full ablation suite (A1-A7, B1-B4)
run_ablation_study.py # Ablation runner
evaluate_checkpoint.py # Deterministic evaluation
extract_all_features_v2.py # Feature extraction pipeline
finetune_cosmic_v3_local.py # Local fine-tuning
build_gallery.py # Gallery builder
benchmark.py # Performance benchmarks
biometric_evaluation.py # FAR/FRR/EER evaluation
convert_to_onnx.py # PyTorch -> ONNX conversion
train_cebs_darts.py # CEBS-DARTS NAS training
train_edisc_mimop.py # EDISC-MIMOP training
...
tests/ # 158+ tests (pytest)
frontend-next/ # Next.js React frontend
config/thresholds.yaml # Threshold configuration
COSMIC/ # COSMIC research docs & visualizations
models/ # Trained model checkpoints
database/ # Enrollment images & training datasets
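The FAR/FRR/EER evaluation in `biometric_evaluation.py` boils down to a threshold sweep over genuine and impostor similarity scores. A self-contained sketch of that computation (synthetic Gaussian scores, not the project's data or its exact script):

```python
import numpy as np

def compute_eer(genuine, impostor):
    """EER via threshold sweep: FAR (impostors accepted) falls and FRR
    (genuines rejected) rises as the accept threshold increases; the EER
    is where the two rates cross."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    thresholds = np.sort(np.unique(np.concatenate([genuine, impostor])))
    best_gap, eer = 2.0, 0.5
    for t in thresholds:
        far = np.mean(impostor >= t)        # impostors wrongly accepted
        frr = np.mean(genuine < t)          # genuines wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)

rng = np.random.default_rng(0)
genuine  = rng.normal(0.7, 0.1, 1000)       # synthetic similarity scores
impostor = rng.normal(0.3, 0.1, 1000)
eer = compute_eer(genuine, impostor)        # small EER for well-separated scores
```

In practice the same sweep yields the full FAR/FRR curves, so operating thresholds other than the EER point (e.g. a low-FAR setting) fall out for free.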
| Component | Reference | Role |
|---|---|---|
| CGE Solver (Convex Global Equilibrium) | Bai et al. NeurIPS 2019 | Identity as unique minimizer of convex energy; closed-form I* = H^{-1}b with provable optimality |
| Categorical Kan Extension | Mac Lane 1971; Shiebler 2022 | Cross-modal fusion via right Kan extension; 6 morphisms enforce categorical coherence |
| KA-Spline Projections | Kolmogorov-Arnold representation | Learnable spline-based input projections replacing linear layers |
| VIB Context Bottleneck | Alemi et al. ICLR 2017 | Variational Information Bottleneck strips identity from context (14.7% probe acc) |
| SNFC | Novel | Sheaf-Normalized Fisher Coherence: cosine contrastive morphism training + Fisher monitoring |
| DB-MTL | Liu et al. ICLR 2021 | Dynamic Barrier Multi-Task Learning for 9-task loss balancing |
| CAGrad | Liu et al. NeurIPS 2021 | Conflict-Averse Gradient Descent for multi-task optimization |
| ConditionalLEACE | Ravfogel et al. ICML 2022 | Within-identity nuisance erasure for cross-modal projection |

| Component | Reference | Role |
|---|---|---|
| MIMOP | Gretton et al. 2005 | HSIC-based identity-modality orthogonal projection |
| FAME 2024 | ACM MM 2024 Challenge | Multi-similarity + cross-modal contrastive |
| CEBS-DART | Novel | Confidence-based Evidential Belief + Dempster-Shafer uncertainty |
| SPOS | Guo et al. 2020 | Single-path NAS avoiding DARTS Matthew Effect |
- Face Recognition: InsightFace ArcFace ResNet-100 (512D embeddings, frozen backbone)
- Voice Authentication: SpeechBrain ECAPA-TDNN (192D embeddings, frozen backbone)
- Context Features: eGeMAPS (88D) + Prosody (103D) + Face AU/Pose/Gaze (27D)
- Liveness Detection: ViT-DINO spatial + MediaPipe EAR blink + Frequency/texture analysis
- Deepfake Detection: Frequency domain GAN fingerprinting + texture consistency
- PostgreSQL + pgvector: HNSW-indexed vector similarity search
- ONNX Runtime: GPU-accelerated inference (~24ms full pipeline)
- Docker Compose: Full-stack containerized deployment
- Rate Limiting: IP-based throttling with exponential backoff
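A minimal per-IP sliding-window limiter illustrating the 120 requests/minute policy (a sketch only; the deployed version also applies exponential backoff, omitted here):

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window rate limiter keyed by client IP."""

    def __init__(self, limit=120, window=60.0):
        self.limit = limit            # max requests per window
        self.window = window          # window length in seconds
        self.hits = defaultdict(deque)

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] > self.window:
            q.popleft()               # evict timestamps outside the window
        if len(q) >= self.limit:
            return False              # over budget: throttle this request
        q.append(now)
        return True
```

The `now` parameter exists so the window logic is testable without sleeping; production callers just use the monotonic-clock default.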
| Component | Technology |
|---|---|
| Research Model | COSMIC with CGE Solver (PyTorch, ~1.2M params) |
| Production Fusion | MIMOP-FAME-CEBS (ONNX Runtime) |
| Face Backbone | InsightFace ArcFace ResNet-100 |
| Voice Backbone | SpeechBrain ECAPA-TDNN |
| Backend | FastAPI + SQLAlchemy (Async) |
| Database | PostgreSQL + pgvector (HNSW) / SQLite fallback |
| Frontend | Next.js (React + TypeScript + Tailwind) |
| Container | Docker Compose |
| Security | Bcrypt, Rate Limiting, Session Tokens |
| Endpoint | Method | Description |
|---|---|---|
| `/identify/` | POST | Multi-modal identification (Rate Limited) |
| `/admin/login` | POST | Secure admin login (Bcrypt) |
| `/admin/logs` | GET | Retrieve security logs |
| `/admin/stats` | GET | System statistics |
| `/enroll/` | POST | Enroll new users/voices |
| `/gallery/rebuild` | POST | Rebuild embeddings gallery |
- Docker Desktop (WSL2 backend recommended)
- Git LFS (for large model files)
- NVIDIA GPU with CUDA 11.8+ (for training; inference works on CPU)
Place user images in database/<username>/ and voice samples (optional) in database/<username>/voice/.
```bash
docker compose up --build
```

- Live Scanner: http://localhost:3000
- Admin Panel: http://localhost:3000/admin
- API Docs: http://localhost:8001/docs
```bash
# Extract features (cached in backend/cache/all_features_v2.pkl)
python backend/scripts/extract_all_features_v2.py

# Train full COSMIC V3 (5-fold, ~2h/fold on RTX 4060)
python -u backend/scripts/train_cosmic_continuous.py

# Run ablation study (A1-A7, B1-B4)
python -u backend/scripts/run_ablation_study.py --variants all --folds 3 --epochs 80
```

| Property | Value | Significance |
|---|---|---|
| Effective rank | 62.1 / 64 | Near-full dimensional utilization, no collapse |
| H condition number | 5.1 | Well-conditioned Hessian |
| H eigenvalue range | [18.6, 94.4] | Strict convexity confirmed |
| VIB active dims | 64 / 64 | All bottleneck dimensions utilized |
| Context -> Identity probe | 14.7% | Context successfully disentangled from identity |
| Identity -> Identity probe | 100.0% | Identity signal fully preserved |
Problem: MSE-trained cross-modal morphisms predict population centroids (the conditional mean) rather than per-identity targets; output variance ratio = 0.44 (<< 1).
Solution: SNFC module with cosine contrastive loss. Fisher ratio improved 695x (0.0016 -> 1.10); attack classification rose from 47% to 81.8%.
Problem: InfoNCE on 300 training identities gives Train R@1 = 89.8% but Val R@1 = 1.0% -- severe overfitting.
Solution: GeneralizableXMAligner with MSE centroid alignment + VICReg + OPL, replacing O(n^2) contrastive losses with O(n*d) losses.
Problem: Coherence MSE hinge dominates morphism gradients (93% by epoch 5), drowning SNFC signal.
Solution: Wrapped impostor violation computations in torch.no_grad(). Only SNFC's cosine contrastive loss trains morphisms.
Problem: SpeechBrain returns [1,1,192] while InsightFace returns (512,), causing silent broadcasting errors.
Solution: Strict tensor normalization decorator enforcing shape consistency before fusion.
Problem: The first inference request takes ~4s (lazy CUDA initialization).
Solution: Model warmup routine during the container startup lifecycle event.
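A minimal framework-agnostic warmup helper; the FastAPI wiring is sketched as a comment, and names like `onnx_session_run` and `dummy_frame` are hypothetical:

```python
import time

def warmup(infer, dummy_input, runs=2):
    """Fire throwaway inferences at startup so one-time lazy costs
    (CUDA context creation, ONNX session allocation) are paid before
    the first real request. Returns per-run latencies in seconds."""
    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer(dummy_input)                    # result intentionally discarded
        latencies.append(time.perf_counter() - t0)
    return latencies

# Wiring sketch for FastAPI (hypothetical names):
# @app.on_event("startup")
# async def _warm():
#     warmup(onnx_session_run, dummy_frame)
```

The first returned latency typically dwarfs the rest, which is a cheap way to confirm the lazy-init cost has actually moved out of the request path.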
| Setting | Value | Description |
|---|---|---|
| DB Backend | PostgreSQL | Auto-fallback to SQLite |
| Face Weight | 0.85 | Primary biometric factor |
| Voice Weight | 0.15 | Secondary factor |
| Rate Limit | 120/min | Per-IP throttling |
| Liveness | Blink | Required for "Live" status |
```bash
docker compose run --rm backend pytest tests/ -v
# 158+ tests covering biometrics, enrollment, gallery, admin, liveness, deepfake detection
```

MIT License





