Reverse proxy for Perplexity.ai — use your existing Pro/Max subscription cookie to access all models via standard APIs.
Exposes three interfaces:
- OpenAI-compatible REST API (`/v1/chat/completions`): streaming, tool calling, thinking
- MCP server (Streamable HTTP + SSE): 5 built-in tools
- Debug chat UI (`/chat`): test everything with real-time OpenAI format validation
Perplexity's web frontend talks to its backend through an internal SSE endpoint (/rest/sse/perplexity_ask). This proxy authenticates with your session cookie via curl_cffi (Chrome TLS fingerprinting), translates requests/responses into OpenAI and MCP formats, and keeps your session alive automatically.
No official API key needed — just your subscription.
All queries use search_focus: "internet" — Perplexity's built-in web search is always active, so models return real-time data (stock prices, weather, news) directly in their answers.
- Full OpenAI format compliance: `system_fingerprint`, `logprobs`, proper `usage` arithmetic, all fields per spec
- Tool calling: OpenAI-style function calling via prompt injection with 3-layer false-positive defense
- Thinking/reasoning: `thinking: true` or `reasoning_effort` param, reasoning streamed as `reasoning_content`
- Account tier support: free/pro/max, only exposes models your tier can access
- Auto-discovery: background task checks model health every 24h, auto-upgrades when versions change
- Response cleaning: strips Perplexity citations `[1][2]`, `<grok:*>` tags, `<?xml?>` declarations, `<script>` tags
- Rate limit tracking: tracks Pro Search quota, auto-fallback to free model when exhausted, notices at every 5th decrement
- Session continuity: tracks Perplexity `backend_uuid` so follow-up turns skip history/instructions entirely, sending only the new query
- Session keep-alive: periodic pings prevent cookie expiry
- Push notifications: ntfy.sh alerts on cookie expiry or model upgrades
- Debug chat UI: `/chat` page with tools toggle, thinking toggle, streaming toggle, and OpenAI format validator
- Dynamic model management: add/remove models at runtime via admin API
- Full input validation: proper error messages for every malformed request
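The response-cleaning feature can be pictured as a handful of regular-expression passes. The sketch below is illustrative only: the pattern names and the `clean_response` function are hypothetical, and the proxy's actual rules may differ (for instance, in how it avoids stripping bracketed numbers inside code).

```python
import re

# Hypothetical patterns for the four kinds of residue listed above.
_SCRIPT = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)
_XML_DECL = re.compile(r"<\?xml[^>]*\?>")
_GROK_TAG = re.compile(r"</?grok:[^>]*>")
_CITATION = re.compile(r"\[\d+\]")


def clean_response(text: str) -> str:
    """Strip citation markers and stray markup from a model answer."""
    text = _SCRIPT.sub("", text)    # drop <script> blocks including their body
    text = _XML_DECL.sub("", text)  # drop <?xml ...?> declarations
    text = _GROK_TAG.sub("", text)  # drop <grok:*> tags, keep the text between them
    text = _CITATION.sub("", text)  # drop [1], [2], ... markers
    return text.strip()
```

Note that a bare `\[\d+\]` pattern would also remove things like `a[0]` from code in an answer, so a production version would need extra guards.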
git clone https://github.com/jamie950315/pplx-proxy.git
cd pplx-proxy
python3 -m venv venv
venv/bin/pip install -r requirements.txt
cp .env.example .env
# Edit .env — set PPLX_COOKIE and ACCOUNT_TYPE
venv/bin/uvicorn server:app --host 0.0.0.0 --port 8892
Then open http://localhost:8892/chat to test with the debug UI.
- Log in to perplexity.ai
- F12 → Application → Cookies → `www.perplexity.ai`
- Copy `__Secure-next-auth.session-token`
- Set `PPLX_COOKIE=<value>` in `.env`
| Model ID | Backend | Tier | Thinking Variant |
|---|---|---|---|
| `auto` | Perplexity Best | free+ | none |
| `sonar` | Sonar | pro+ | none |
| `gpt` | GPT-5.4 | pro+ | `gpt54_thinking` |
| `sonnet` | Claude Sonnet 4.6 | pro+ | `claude46sonnetthinking` |
| `gemini` | Gemini 3.1 Pro | pro+ | always on |
| `nemotron` | Nemotron 3 Super | pro+ | always on |
| `opus` | Claude Opus 4.6 | max | `claude46opusthinking` |
Thinking variants are activated via the `thinking: true` or `reasoning_effort` request parameter; no separate model names are needed.
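As a rough illustration of how the tier column and the thinking switch fit together on the client side, the sketch below filters the model IDs from the table by account tier and builds a request body. The helper names (`models_for_tier`, `build_chat_request`) are hypothetical, and the accepted values for `reasoning_effort` are not documented here, so the example passes the value through unchanged.

```python
from typing import Optional

# Minimum tier per public model ID, copied from the table above.
MODEL_TIERS = {
    "auto": "free",
    "sonar": "pro",
    "gpt": "pro",
    "sonnet": "pro",
    "gemini": "pro",
    "nemotron": "pro",
    "opus": "max",
}
TIER_RANK = {"free": 0, "pro": 1, "max": 2}


def models_for_tier(account_type: str) -> list[str]:
    """Return the model IDs an account of the given tier may use."""
    rank = TIER_RANK[account_type]
    return [m for m, tier in MODEL_TIERS.items() if TIER_RANK[tier] <= rank]


def build_chat_request(
    model: str,
    prompt: str,
    thinking: bool = False,
    reasoning_effort: Optional[str] = None,
) -> dict:
    """Build a /v1/chat/completions body; the model ID stays the same either way."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if thinking:
        body["thinking"] = True
    if reasoning_effort is not None:
        body["reasoning_effort"] = reasoning_effort  # value passed through as-is
    return body
```

The resulting dictionary can be serialized with `json.dumps` and sent with any HTTP client, using the same headers as the curl examples.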
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | `/health` | No | Health check |
| GET | `/chat` | No | Debug chat UI with OpenAI format validator |
| GET | `/v1/models` | Yes | List tier-available models |
| POST | `/v1/chat/completions` | Yes | Chat (streaming + non-streaming + tools + thinking) |
| POST | `/v1/responses` | Yes | OpenAI Responses API compatibility (used by LobeHub web search) |
| POST | `/<api-key>/mcp` | Key in URL | MCP Streamable HTTP |
| GET | `/<api-key>/sse` | Key in URL | MCP SSE |
| GET | `/admin/models` | Yes | Full model map |
| POST | `/admin/update-models` | Yes | Add/replace models |
| POST | `/admin/refresh-cookie` | Yes | Inject new session token |
| POST | `/admin/discover-models` | Yes | Run model discovery |
# Basic chat
curl -X POST http://localhost:8892/v1/chat/completions \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "sonnet", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
# With thinking
curl -X POST http://localhost:8892/v1/chat/completions \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt", "messages": [{"role": "user", "content": "Analyze X"}], "thinking": true}'
# With tool calling
curl -X POST http://localhost:8892/v1/chat/completions \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sonnet",
"messages": [{"role": "user", "content": "Weather in Tokyo"}],
"tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}]
}'
Open http://localhost:8892/chat (or https://your-domain/chat) in a browser:
- Toggle Tools ON/OFF to test tool calling
- Toggle thinking to test reasoning mode
- Toggle stream for streaming vs non-streaming
- Raw tab: shows full request/response JSON
- Format ✓ tab: validates every response field against the OpenAI spec with PASS/FAIL badges
The API key is part of the URL path for MCP authentication:
# Claude Code
claude mcp add pplx-proxy --transport http http://localhost:8892/YOUR_API_KEY/mcp
# SSE transport
# Connect to http://localhost:8892/YOUR_API_KEY/sse
Without `PPLX_PROXY_API_KEY` set, MCP falls back to unauthenticated `/mcp/mcp` and `/sse/sse`.
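The URL scheme described here (key in the first path segment, with the transport name taking its place when no key is configured) can be captured in a small helper. This is a client-side convenience sketch; `mcp_url` is a hypothetical name, not part of the proxy.

```python
from typing import Optional


def mcp_url(base_url: str, api_key: Optional[str], transport: str = "mcp") -> str:
    """Build the MCP endpoint URL for the 'mcp' (Streamable HTTP) or 'sse' transport."""
    if transport not in ("mcp", "sse"):
        raise ValueError("transport must be 'mcp' or 'sse'")
    # With no key configured, the documented fallback paths are /mcp/mcp and /sse/sse.
    prefix = api_key if api_key else transport
    return f"{base_url.rstrip('/')}/{prefix}/{transport}"
```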
MCP Tools:
| Tool | Description |
|---|---|
| `perplexity_search` | Pro Search with model/source selection |
| `perplexity_ask` | Quick auto-mode Q&A |
| `perplexity_reason` | Reasoning with model selection |
| `perplexity_research` | Deep Research |
| `perplexity_models` | List available models for your tier |
All responses strictly match the OpenAI Chat Completions API spec:
- `id` (`chatcmpl-*`), `object`, `created`, `model`, `system_fingerprint` (null)
- `choices[].index`, `choices[].logprobs` (null), `choices[].finish_reason`
- `usage.total_tokens` = `prompt_tokens` + `completion_tokens`
- Streaming: consistent `id`, `system_fingerprint` in every chunk, proper `[DONE]` termination
- Tool calls: `id` (`call_*`), `type` (`function`), `function.name`, `function.arguments` (valid JSON string)
Use /chat to visually verify — the Format ✓ tab runs 20+ checks per response.
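For scripted checks outside the browser, a few of the invariants listed above can be asserted directly on a non-streaming response. The sketch below is not the validator behind the Format ✓ tab; `check_completion` is a hypothetical helper covering only a subset of those checks, and it assumes the token counts are integers.

```python
import json


def check_completion(resp: dict) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if not str(resp.get("id", "")).startswith("chatcmpl-"):
        problems.append("id should start with 'chatcmpl-'")
    for field in ("object", "created", "model"):
        if field not in resp:
            problems.append(f"missing field: {field}")

    usage = resp.get("usage")
    if not isinstance(usage, dict):
        problems.append("missing usage object")
    else:
        expected = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
        if usage.get("total_tokens") != expected:
            problems.append("usage.total_tokens != prompt_tokens + completion_tokens")

    for choice in resp.get("choices", []):
        for field in ("index", "finish_reason"):
            if field not in choice:
                problems.append(f"choice missing field: {field}")
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            if not str(call.get("id", "")).startswith("call_"):
                problems.append("tool call id should start with 'call_'")
            if call.get("type") != "function":
                problems.append("tool call type should be 'function'")
            try:
                json.loads(call["function"]["arguments"])  # must be a JSON string
            except (KeyError, TypeError, ValueError):
                problems.append("function.arguments is not a valid JSON string")
    return problems
```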
Every PROBE_INTERVAL_HOURS (default 24h), pplx-proxy checks if models are still alive. If one dies, it increments the version number (e.g., gpt54 → gpt55 → ... up to +1.0) and auto-upgrades. Thinking variants are auto-derived from _THINKING_MAP.
Manual trigger: POST /admin/discover-models
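One way to read the version-increment rule is as a generator of candidate backend IDs to probe. The sketch below assumes the first run of digits in an ID encodes major and minor version (so `54` means 5.4) and that "up to +1.0" means at most ten single-step increments; both are interpretations, and `version_candidates` is a hypothetical name rather than the proxy's own function.

```python
import re


def version_candidates(model_id: str, max_steps: int = 10) -> list[str]:
    """List the next backend IDs to try, e.g. gpt54 -> gpt55, gpt56, ..."""
    match = re.search(r"\d+", model_id)
    if match is None:
        return []  # no version number to increment
    prefix = model_id[: match.start()]
    suffix = model_id[match.end():]
    width = len(match.group())  # keep zero padding if the ID had any
    current = int(match.group())
    return [
        prefix + str(current + step).zfill(width) + suffix
        for step in range(1, max_steps + 1)
    ]
```

A discovery loop would probe each candidate in order and stop at the first one that responds.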
| Variable | Default | Description |
|---|---|---|
| `PPLX_COOKIE` | none | Session token (required) |
| `PPLX_PROXY_API_KEY` | none | Bearer auth (empty = no auth) |
| `ACCOUNT_TYPE` | `pro` | `free`, `pro`, or `max` |
| `DEFAULT_MODEL` | `gpt` | Default when not specified |
| `PPLX_PROXY_PORT` | `8892` | Listen port |
| `CUSTOM_PROMPTS` | file | Local prompt block prepended to every LobeHub request |
| `KEEPALIVE_HOURS` | `6` | Session ping interval |
| `PROBE_INTERVAL_HOURS` | `24` | Auto-discovery interval |
| `NTFY_TOPIC` | `pplx-proxy` | ntfy.sh topic |
| `NTFY_URL` | `https://ntfy.sh` | ntfy server URL |
| `NTFY_COOLDOWN_SECS` | `3600` | Min interval between alerts |
| `PUBLIC_URL` | `http://localhost:8892` | URL in ntfy messages |
| `PPLX_API_VERSION` | `2.18` | Perplexity internal API version |
| `PPLX_IMPERSONATE` | `chrome` | curl_cffi TLS fingerprint |
| `USER_AGENT` | Chrome/130 | HTTP User-Agent |
| `COOKIE_MAX_AGE_HOURS` | `168` | Cookie cache max age |
| `LOG_LEVEL` | `INFO` | Logging level |
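To make the defaults concrete, here is a minimal sketch of reading a subset of these variables with the standard library. It is not the proxy's own loader: `load_settings` and the returned key names are hypothetical, and treating the hour values as floats is an assumption.

```python
import os
from typing import Mapping

# Defaults for a subset of variables, taken from the table above.
DEFAULTS = {
    "ACCOUNT_TYPE": "pro",
    "DEFAULT_MODEL": "gpt",
    "PPLX_PROXY_PORT": "8892",
    "KEEPALIVE_HOURS": "6",
    "PROBE_INTERVAL_HOURS": "24",
    "NTFY_COOLDOWN_SECS": "3600",
    "LOG_LEVEL": "INFO",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read settings from a mapping, falling back to the documented defaults."""
    cookie = env.get("PPLX_COOKIE", "")
    if not cookie:
        raise RuntimeError("PPLX_COOKIE is required")
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    if raw["ACCOUNT_TYPE"] not in ("free", "pro", "max"):
        raise ValueError("ACCOUNT_TYPE must be free, pro, or max")
    return {
        "cookie": cookie,
        "api_key": env.get("PPLX_PROXY_API_KEY", ""),  # empty means no auth
        "account_type": raw["ACCOUNT_TYPE"],
        "default_model": raw["DEFAULT_MODEL"],
        "port": int(raw["PPLX_PROXY_PORT"]),
        "keepalive_hours": float(raw["KEEPALIVE_HOURS"]),
        "probe_interval_hours": float(raw["PROBE_INTERVAL_HOURS"]),
        "ntfy_cooldown_secs": int(raw["NTFY_COOLDOWN_SECS"]),
        "log_level": raw["LOG_LEVEL"],
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.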
sudo cp pplx-proxy.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now pplx-proxy
Manual inject → keep-alive every 6h → session stays alive indefinitely
↓ (if Perplexity force-revokes)
ntfy alert → manual re-inject
Re-inject without SSH:
curl -X POST https://your-domain/admin/refresh-cookie \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"session_token": "NEW_TOKEN"}'
Why models say "I can't access real-time data": This proxy must handle three issues that cause Perplexity models to ignore their own search results:
- `search_focus: "internet"` must be set in every request. Without it, Perplexity defaults to `"writing"` mode, where models don't incorporate search results. This is the single most important parameter.
- System prompts must be stripped or replaced before sending to Perplexity. Perplexity searches ALL query text: if the system prompt says "You are an AI assistant", Perplexity finds chatbot tutorial pages and the model gets confused. Generic clients keep only whitelist-approved lines; LobeHub requests discard upstream prompt content entirely.
- LobeHub requests always prepend local `CUSTOM_PROMPTS`. The proxy still detects `role: developer` and system-prompt-like user messages so it can classify the request source, but those upstream prompt blocks are never forwarded. Each LobeHub turn sends `instructions=[CUSTOM_PROMPTS]` plus preserved `history` and current `query`.

Rate limit tracking uses FlareSolverr (localhost:8191) to poll Perplexity's `/rest/rate-limit/all` endpoint with the session cookie. It requires FlareSolverr to be running locally. When `remaining_pro` reaches 0, all non-auto models fall back to `auto` (free tier).
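The fallback rule for an exhausted Pro Search quota reduces to a single comparison. The sketch below assumes the polled `remaining_pro` value is available as an integer, or `None` when it could not be fetched; `resolve_model` is a hypothetical helper, and treating an unknown quota as "no fallback" is an assumption rather than documented behavior.

```python
from typing import Optional


def resolve_model(requested: str, remaining_pro: Optional[int]) -> str:
    """Pick the model to actually use given the remaining Pro Search quota."""
    if remaining_pro is None:
        return requested  # quota unknown: leave the request as-is (assumption)
    if remaining_pro <= 0 and requested != "auto":
        return "auto"  # quota exhausted: fall back to the free-tier model
    return requested
```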
See CLAUDE.md for the full technical breakdown and MANUAL.md troubleshooting section for diagnosis steps.
Unofficial reverse proxy for personal use. Relies on Perplexity's internal web API which may change without notice. Use responsibly.
MIT