🎯 LLM Optimizer

Find the optimal LLM configuration for your GPU's VRAM

A lightweight web application that helps you determine the best Large Language Model, quantization level, and context size based on your available GPU memory.

🌟 Features

🎮 GPU Presets

Quick selection for popular GPUs:

Consumer NVIDIA RTX 50 series: RTX 5060 (8 GB), RTX 5060 Ti (16 GB), RTX 5070 (12 GB), RTX 5070 Ti (16 GB), RTX 5080 (16 GB), RTX 5090 (32 GB)
Consumer NVIDIA RTX 40 series: RTX 4080 (16 GB), RTX 4090 (24 GB)
Data Center NVIDIA: A100 40/80 GB, L40S 48 GB, H100 80 GB, H100 NVL 94 GB, H200 141 GB, B100 192 GB, B200 192 GB, B300 Ultra 288 GB
Data Center AMD: MI300X 192 GB, MI325X 256 GB, MI355X 288 GB

🤖 Supported Models (April 2026)

Category	Models
Tiny (< 5B)	Qwen3.5 0.8B/2B/4B, Llama 3.2 1B/3B, Gemma 3 1B/4B, Phi-4 Mini 3.8B
Small (5–15B)	Qwen3.5 9B, Gemma 3 12B, Mistral Nemo 12B, Ministral 8B, Phi-4 14B
Medium (15–50B)	Qwen3.6 27B, Qwen3.5 27B, Qwen3.6 35B-A3B (MoE), Qwen3.5 35B-A3B (MoE), Gemma 3 27B, Mistral Small 4 (MoE)
Coding	Qwen3-Coder 30B, Devstral 2 (123B dense)
Vision	Qwen3-VL 32B/235B, Llama 3.2 11B Vision, Pixtral 12B
Large (50–150B)	Qwen3.5 122B-A10B (MoE), GLM-4.5 Air (MoE 12B active), Llama 4 Scout (MoE 17B active), Mistral Large 2 (123B)
Huge (150B+)	MiniMax M2.7 (MoE 10B active), DeepSeek-V4-Flash (MoE 13B active), Qwen3.5 397B-A17B (MoE), GLM-5.1 (MoE 40B active), Kimi K2.6 1T (MoE 32B active), DeepSeek-V4-Pro 1.6T (MoE 49B active)

⚙️ Quantization Support

FP16: Maximum precision (2 bytes/param)
FP8: Good quality/performance balance (1 byte/param)
FP4: Maximum VRAM savings (0.5 bytes/param)

🎯 Optimization Modes

Balanced: Best overall compromise
Largest Model: Prioritizes model parameter count
Maximum Context: Optimizes for longest context window
Best Quality: Minimizes quantization

🌐 Multi-language

English (default)
French
Language preference saved in cookies

🚀 Quick Start

Using Docker Compose (Recommended)

git clone https://github.com/YOUR_USERNAME/llm-optimizer.git
cd llm-optimizer
docker-compose up -d

Access at: http://localhost:8080

Using Docker

docker build -t llm-optimizer .
docker run -d -p 8080:80 llm-optimizer

Without Docker

Requirements: PHP 7.4+. From the project root, start PHP's built-in server:

php -S 0.0.0.0:8080

📐 How It Works

Calculation Formula

Total VRAM (GB) = (Parameters in billions × Precision Factor) + (Context Size in tokens × KV_per_token in GB)

Precision Factors (bytes per parameter):

FP16: 2
FP8: 1
FP4: 0.5
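Here is a minimal Python sketch of the formula above. It assumes Parameters is given in billions, so multiplying by the precision factor already gives GB, and that KV_per_token is passed in as GB per token. The names `PRECISION_FACTOR` and `total_vram_gb` are illustrative and are not taken from index.php.

```python
# Precision factors in bytes per parameter, as listed above.
PRECISION_FACTOR = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}


def total_vram_gb(params_b: float, precision: str, context_tokens: int,
                  kv_gb_per_token: float) -> float:
    """Estimate VRAM in GB: weights plus KV cache.

    params_b is in billions, so params_b * bytes/param is already in GB.
    kv_gb_per_token is the per-token KV cache cost in GB.
    """
    weights_gb = params_b * PRECISION_FACTOR[precision]
    kv_cache_gb = context_tokens * kv_gb_per_token
    return weights_gb + kv_cache_gb


# 27B model in FP4 with a 32K context at ~0.00021 GB/token
print(round(total_vram_gb(27, "FP4", 32_768, 0.00021), 1))
```

With these inputs the script prints 20.4: 13.5 GB of weights plus about 6.9 GB of KV cache.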

KV Cache per token (scales with model size via GQA):

kv_per_token = max(0.08, 0.04 × √params_B)  MB/token

This square-root scaling reflects that modern models use Grouped-Query Attention (GQA), where the number of KV heads grows much more slowly than the total parameter count. Calibrated values:

Model size	KV cache/token
8B	~0.11 MB
14B	~0.15 MB
32B	~0.23 MB
70B	~0.33 MB
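The KV-cache rule can be checked in a few lines of Python. `kv_per_token_mb` is a hypothetical helper name. The 0.08 MB floor only matters below 4B parameters, because that is where 0.04 × √params_B falls under 0.08.

```python
import math


def kv_per_token_mb(params_b: float) -> float:
    """KV cache per token in MB: max(0.08, 0.04 * sqrt(params_B))."""
    return max(0.08, 0.04 * math.sqrt(params_b))


for size_b in (1, 8, 14, 32, 70):
    print(f"{size_b}B: {kv_per_token_mb(size_b):.2f} MB/token")
```

For 8B, 14B, 32B and 70B this reproduces the table values (0.11, 0.15, 0.23 and 0.33 MB/token). A 1B model is clamped to the 0.08 MB floor.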

Example Calculations

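Qwen3.6 27B in FP4 with 32K context on a 16 GB GPU:

Model: 27B × 0.5 = 13.5 GB
KV cache: 32,768 × 0.00021 GB = 6.9 GB → 20.4 GB total, which does not fit in 16 GB
With 16 × 0.95 - 13.5 = 1.7 GB left for the KV cache, FP4 tops out at roughly 8K context
For 32K context, drop to a 14B-class model (Phi-4 14B FP4: 7 GB weights + 4.9 GB KV cache ≈ 11.9 GB total)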

GLM-5.1 (754B MoE) in FP4 on 8× H200 (1128 GB total):

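Model: 754B × 0.5 = 377 GB (full weights resident across GPUs)
KV cache at 256K context: 262,144 × 0.0011 GB ≈ 288 GB
Total ≈ 665 GB, well under the ~1072 GB usable (1128 × 0.95), so 256K+ context fits comfortably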
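The sketch below re-checks both worked examples. It applies the same 5% reserve (vram × 0.95) that the Algorithm section uses. Because it uses the unrounded KV rate instead of 0.00021, the 27B case comes out at about 20.3 GB rather than 20.4 GB. The helper names are illustrative.

```python
import math

PRECISION_FACTOR = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}
USABLE_FRACTION = 0.95  # keep 5% of VRAM in reserve, as in the Algorithm section


def kv_gb_per_token(params_b: float) -> float:
    return max(0.00008, 0.00004 * math.sqrt(params_b))


def required_gb(params_b: float, precision: str, context_tokens: int) -> float:
    weights = params_b * PRECISION_FACTOR[precision]
    return weights + context_tokens * kv_gb_per_token(params_b)


def fits(params_b: float, precision: str, context_tokens: int, vram_gb: float) -> bool:
    return required_gb(params_b, precision, context_tokens) <= vram_gb * USABLE_FRACTION


checks = [
    ("Qwen3.6 27B FP4, 32K, 16 GB", 27, "FP4", 32_768, 16),
    ("Phi-4 14B FP4, 32K, 16 GB", 14, "FP4", 32_768, 16),
    ("GLM-5.1 754B FP4, 256K, 8x H200", 754, "FP4", 262_144, 8 * 141),
]
for label, params_b, prec, ctx, vram in checks:
    need = required_gb(params_b, prec, ctx)
    print(f"{label}: {need:.1f} GB needed, fits={fits(params_b, prec, ctx, vram)}")
```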

Algorithm

For each model and quantization level:
- Calculate model memory: params × precision_factor
- Calculate available context memory: (vram × 0.95) - model_memory
- Compute KV cost: max(0.00008, 0.00004 × √params) GB/token
- Find maximum context: context_memory / kv_per_token
- Validate against minimum context constraint
Score configurations based on priority:
- Balanced: (params × 100) + (context / 100) - (precision × 50)
- Model: (params × 1000) - (precision × 100) + (context / 1000)
- Context: (context × 1000) + params
- Quality: ((3 - precision) × 10000) + (params × 100) + (context / 1000)
Return top 3 diversified recommendations + additional viable configurations
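Below is a compact Python sketch of the algorithm above. Two details are assumptions and are not taken from index.php:

- `precision` in the scoring formulas is treated as a quantization rank (0 = FP16, 1 = FP8, 2 = FP4). This is the reading under which Quality mode favours less quantization.
- The minimum context is set to a hypothetical 4,096 tokens.

The sketch returns a plain top-N by score rather than the diversified selection the app performs.

```python
import math
from dataclasses import dataclass

# (bytes per parameter, ASSUMED quantization rank used as "precision" in scoring)
QUANTS = {"FP16": (2.0, 0), "FP8": (1.0, 1), "FP4": (0.5, 2)}
USABLE_FRACTION = 0.95
MIN_CONTEXT = 4096  # hypothetical minimum-context constraint


@dataclass
class Config:
    name: str
    params_b: float
    quant: str
    context: int
    score: float


def max_context(params_b: float, bytes_per_param: float, vram_gb: float) -> int:
    model_gb = params_b * bytes_per_param
    context_gb = vram_gb * USABLE_FRACTION - model_gb
    kv_gb = max(0.00008, 0.00004 * math.sqrt(params_b))
    return int(context_gb / kv_gb) if context_gb > 0 else 0


def score(mode: str, params_b: float, rank: int, context: int) -> float:
    if mode == "balanced":
        return params_b * 100 + context / 100 - rank * 50
    if mode == "model":
        return params_b * 1000 - rank * 100 + context / 1000
    if mode == "context":
        return context * 1000 + params_b
    if mode == "quality":
        return (3 - rank) * 10000 + params_b * 100 + context / 1000
    raise ValueError(f"unknown mode: {mode}")


def recommend(models, vram_gb, mode="balanced", top_n=3):
    configs = []
    for name, params_b in models:
        for quant, (bytes_per_param, rank) in QUANTS.items():
            ctx = max_context(params_b, bytes_per_param, vram_gb)
            if ctx < MIN_CONTEXT:
                continue  # fails the minimum-context check
            configs.append(Config(name, params_b, quant, ctx,
                                  score(mode, params_b, rank, ctx)))
    configs.sort(key=lambda c: c.score, reverse=True)
    return configs[:top_n]


models = [("Phi-4 14B", 14), ("Gemma 3 12B", 12), ("Qwen3.5 9B", 9)]
for cfg in recommend(models, vram_gb=16):
    print(cfg.name, cfg.quant, cfg.context)
```

On a 16 GB card, Balanced mode ranks Phi-4 14B, Gemma 3 12B and Qwen3.5 9B in that order, all at FP4. Passing `mode="quality"` moves all three to FP8.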

🏗️ Architecture

Backend: PHP 8.2-FPM
Web Server: Nginx (Alpine)
Base Image: Alpine Linux
Image Size: ~50 MB
Memory Usage: ~10 MB RAM
Response Time: <100 ms

Project Structure

llm-optimizer/
├── index.php              # Main application
├── Dockerfile             # Container definition
├── docker-compose.yml     # Local development
├── nginx.conf             # Web server config
├── start.sh               # Startup script
└── README.md

🧪 Testing

# Build and test
docker build -t llm-optimizer:test .
docker run -d -p 8888:80 --name test llm-optimizer:test

# Health check (-f makes curl exit non-zero on HTTP errors)
curl -f http://localhost:8888

# Cleanup
docker stop test && docker rm test
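`docker run -d` returns as soon as the container starts, and Nginx may not be accepting connections yet. A single curl can therefore fail spuriously. The sketch below is a small retrying check that uses only Python's standard library. `wait_for_http` and its default values are illustrative.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, attempts: int = 10, delay: float = 1.0,
                  timeout: float = 2.0) -> bool:
    """Poll `url` until it answers HTTP 200; give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # not listening yet, refused, timed out, or HTTP error status
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Call `wait_for_http("http://localhost:8888")` after the `docker run` line. It returns True once the page responds with HTTP 200.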

🌐 Deployment

General Requirements

Docker support
Port 80 available (or custom port mapping)
Minimal resources: 128 MB RAM, 0.1 CPU

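Platform-Specific Guides

The Swarm and Kubernetes examples below reference the image as llm-optimizer:latest. On a multi-node cluster, push the image to a registry the nodes can pull from and use that registry path instead.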

Docker Swarm

docker service create \
  --name llm-optimizer \
  --publish 80:80 \
  --replicas 2 \
  llm-optimizer:latest

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-optimizer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-optimizer
  template:
    metadata:
      labels:
        app: llm-optimizer
    spec:
      containers:
      - name: llm-optimizer
        image: llm-optimizer:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: llm-optimizer
spec:
  selector:
    app: llm-optimizer
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer

Docker Compose Production

version: '3.8'
services:
  app:
    image: llm-optimizer:latest
    restart: always
    ports:
      - "80:80"
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: '0.5'

Reverse Proxy

The application works behind any reverse proxy (Traefik, Nginx, Caddy). It listens on port 80 and supports health checks at /.

🔧 Configuration

Environment Variables

None required. The application is stateless and requires no configuration.

Custom Models

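To add your own model, append an entry to the $models array in index.php (params is the parameter count in billions):

$models = [
    // ... existing entries ...
    ['name' => 'Your Model', 'params' => 13, 'tags' => ['code']],
    // tags: 'code', 'vision', 'reasoning', 'multilingual'
];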
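Before opening a PR, you can sanity-check an entry against the shape shown above. The sketch mirrors the PHP array as a Python dict and rests on a few assumptions:

- `validate_model` is a hypothetical helper.
- `params` is the parameter count in billions.
- The four tags listed in the comment are the only valid ones. The application itself may not enforce this.

```python
from typing import List

# Tags listed in the $models comment; treated here as the only valid ones.
ALLOWED_TAGS = {"code", "vision", "reasoning", "multilingual"}


def validate_model(entry: dict) -> List[str]:
    """Return a list of problems with a model entry (empty list means OK)."""
    problems = []
    name = entry.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name must be a non-empty string")
    params = entry.get("params")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(params, bool) or not isinstance(params, (int, float)) or params <= 0:
        problems.append("params must be a positive number of billions")
    tags = entry.get("tags", [])
    if not isinstance(tags, list):
        problems.append("tags must be a list")
    else:
        unknown = sorted(set(tags) - ALLOWED_TAGS)
        if unknown:
            problems.append(f"unknown tags: {unknown}")
    return problems


print(validate_model({"name": "Your Model", "params": 13, "tags": ["code"]}))
```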

Custom Port

In docker-compose.yml:

ports:
  - "YOUR_PORT:80"

📊 Use Cases

Example 1: RTX 5070 Ti Owner (16 GB)

Question: "What can I run with decent context?"

Results:

⭐ Phi-4 14B (FP4) → 32K context
✓ Gemma 3 12B (FP4) → 64K context
✓ Qwen3.5 9B (FP4) → 64K context

Example 2: Data Center Deployment (H200 141 GB)

Question: "Largest model with 32K+ context?"

Results:

⭐ Qwen3.5 122B-A10B (FP4) → 32K context
✓ Mistral Large 2 (FP4) → 64K context

Example 3: Maximum Context Priority

Question: "Longest possible context window on 16 GB?"

Results:

⭐ Qwen3.5 2B (FP4) → 128K context
✓ Llama 3.2 3B (FP4) → 128K context
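To see how these recommendations relate to the formula, the sketch below computes the raw maximum context the VRAM formula allows on a 16 GB card, including the 5% reserve. `max_context_tokens` is an illustrative helper. The UI may report a rounded-down standard size rather than this raw value.

```python
import math

PRECISION_FACTOR = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}


def max_context_tokens(params_b: float, precision: str, vram_gb: float,
                       usable: float = 0.95) -> int:
    """Largest context (tokens) whose KV cache fits beside the weights."""
    free_gb = vram_gb * usable - params_b * PRECISION_FACTOR[precision]
    kv_gb = max(0.00008, 0.00004 * math.sqrt(params_b))
    return max(0, int(free_gb / kv_gb))


candidates = [
    ("Phi-4 14B", 14, "FP4"),
    ("Gemma 3 12B", 12, "FP8"),
    ("Gemma 3 12B", 12, "FP4"),
    ("Qwen3.5 9B", 9, "FP4"),
    ("Qwen3.5 2B", 2, "FP4"),
    ("Llama 3.2 3B", 3, "FP4"),
]
for name, params_b, prec in candidates:
    print(f"{name} ({prec}) on 16 GB: ~{max_context_tokens(params_b, prec, 16):,} tokens")
```

A 2B model sits on the 0.08 MB/token floor. Its ceiling is therefore roughly (16 × 0.95 - 1) / 0.00008 ≈ 177,500 tokens.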

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Adding New Models

Edit the $models array in index.php
Test locally
Submit PR with model name, parameter count, and tags

Adding Languages

Add a translation array to index.php
Add a language selector button
Test all pages

📝 License

MIT License - feel free to use this project for any purpose.

📧 Support

🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions

Made with ❤️ for the LLM community
