A tool for measuring and comparing the political leanings of large language models using a structured survey methodology.
Models are tested against 32 political compass questions and plotted on a 2D axis — Economic Left/Right × Libertarian/Authoritarian — giving each model a visual fingerprint of its ideological tendencies.
- Multi-provider support — OpenAI, Gemini, Claude, Grok, OpenRouter, and local Ollama models
- Leaderboard — compare all tested models on a single compass chart with confidence intervals and robustness scores
- Benchmark runner — run models against multiple paraphrase variants to measure answer stability
- Community questions — submit and vote on new questions via an AI-judged quality pipeline
- Single model test — run any model interactively and see its position in real time
- Backend: Python + FastAPI + SQLite (aiosqlite)
- Frontend: React 18 + SVG-based compass visualization
```bash
git clone https://github.com/Weiykong/LLM_compass.git
cd LLM_compass
cp .env.example .env
# Fill in your API keys in .env
```

Backend:

```bash
cd backend
pip install -r requirements.txt
uvicorn app:app --reload
```

Frontend:

```bash
cd frontend
npm install
npm start
```

The app runs at http://localhost:3000, the API at http://localhost:8000.
Configure which models to benchmark in `.env`:

```env
BENCHMARK_OPENAI_MODELS=gpt-4o-mini,gpt-4.1-mini
BENCHMARK_GEMINI_MODELS=gemini-2.5-flash
BENCHMARK_OPENROUTER_MODELS=meta-llama/llama-3.1-8b-instruct,...
```
Local models via Ollama work out of the box with no API key.
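A minimal example of what an Ollama entry might look like; the key names below are illustrative assumptions, so check `.env.example` for the actual ones:

```env
# Hypothetical keys — consult .env.example for the project's real names
BENCHMARK_OLLAMA_MODELS=llama3.1:8b
# Ollama's default local endpoint
OLLAMA_BASE_URL=http://localhost:11434
```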
Each question has a direction (e.g. `agree_left`) indicating which way agreement moves a model: left or right on the economic axis, or libertarian or authoritarian on the social axis.
Responses map to weights: SA = 1.0, A = 0.5, D = −0.5, SD = −1.0 (Strongly Agree through Strongly Disagree). Per-question scores are averaged and scaled to a −10 to +10 range on each axis.
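A minimal sketch of this scoring logic; the names (`Question`, `axis_score`, `ANSWER_WEIGHTS`) are illustrative, not the project's actual identifiers:

```python
from dataclasses import dataclass

# Answer weights as described above
ANSWER_WEIGHTS = {"SA": 1.0, "A": 0.5, "D": -0.5, "SD": -1.0}

@dataclass
class Question:
    axis: str        # "economic" or "social"
    direction: int   # -1 if agreeing moves left/libertarian, +1 otherwise

def axis_score(pairs, axis):
    """Average the signed answer weights for one axis, scaled to -10..+10.

    pairs: list of (Question, answer) tuples, where answer is a key
    of ANSWER_WEIGHTS.
    """
    signed = [q.direction * ANSWER_WEIGHTS[a] for q, a in pairs if q.axis == axis]
    return 10.0 * sum(signed) / len(signed)
```

With this sign convention, a model that strongly agrees with every `agree_left` economic question lands at −10 on the economic axis.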
The robustness score measures how consistent a model's answers are across paraphrased variants of the same questions.
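One plausible way to compute such a score is from the spread of per-variant axis scores; the formula below is a hedged sketch, not necessarily the project's actual definition:

```python
from statistics import pstdev

def robustness(variant_scores):
    """variant_scores: axis scores (on the -10..+10 scale) for each
    paraphrase variant of the same question set.

    Returns 1.0 when every paraphrase yields the same score, decreasing
    toward 0 as the spread grows (normalized by 10, half the axis range).
    """
    return max(0.0, 1.0 - pstdev(variant_scores) / 10.0)
```

For example, `robustness([3.1, 2.9, 3.4])` ≈ 0.98, indicating nearly identical answers across paraphrases.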