Hierarchical RAG architecture scaling to 693K chunks on consumer hardware (4GB VRAM). Features 3-address routing, hybrid vector+graph fusion, and SetFit classification.
One-click Windows installer for Z-Image Turbo AI image generation. Optimized for low-VRAM GPUs (4GB+). Features Gradio web UI, automatic setup, and GGUF model support.
Adaptive Hybrid Quantization Framework for deploying 7B+ LLMs on low-VRAM devices (e.g., GTX 1050). Features surgical block alignment and Numba-accelerated inference.
A ComfyUI workflow for low-VRAM users.
Lightweight 6GB VRAM Gradio web app with auto-installer for running AuraFlow locally — no cloud, no clutter.
🎥 Generate high-quality videos on budget hardware with the Wan 2.2 14B Low-VRAM Workflow for ComfyUI, optimized for smooth performance and quick results.
Taiwanese Hokkien (Taigi) speech-to-text transcriber - MediaTek Breeze-ASR-26 with faster-whisper, tuned for RTX 3050 4GB low-VRAM GPUs. Gradio UI, CLI, Docker, SRT/VTT/TXT/JSON.
One-click installer for ACE-Step 1.5
Simple FP16 image upscaler for all GPUs, aimed at low- to mid-end hardware.
Notebooks and workflows configured to run Wan 2.2 Animate inference smoothly with ComfyUI on Kaggle T4 GPUs.
Lightweight Stable Diffusion engine with plugin-based pipelines, VRAM-safe execution, and full 4GB GPU support.
Audit local LLM function calling and agentic reliability. Visual tool-use benchmarking for quantized models on YOUR hardware.
A privacy-first Generative AI pipeline for prototyping 3D-style game assets on consumer hardware. Optimized for low-VRAM (4GB) GPUs using PyTorch, Diffusers, and Streamlit.
Lightweight SDXL LoRA trainer optimized for 8GB VRAM GPUs. GUI with training, auto-captioning (Ollama) and image search (SearXNG).
Technical Showcase: 22B True-MoE Engine running on 6GB VRAM (GTX 1060). Demonstrates "Surgical" NF4 quantization, dynamic expert swapping, and the custom "Grace Hopper" pipeline.
A production-ready, frugal, sovereign AI system that orchestrates India's open-source language models to achieve state-of-the-art reasoning on consumer hardware through Test-Time Compute (TTC) and Cognitive Serialization.
🚀 Run modern 7B LLMs on legacy 4GB GPUs without crashes, breaking the VRAM barrier for developers facing GPU limitations.