jamie950315/pplx-proxy

pplx-proxy

Reverse proxy for Perplexity.ai — use your existing Pro/Max subscription cookie to access all models via standard APIs.

Exposes three interfaces:

  • OpenAI-compatible REST API (/v1/chat/completions) — streaming, tool calling, thinking
  • MCP server (Streamable HTTP + SSE) — 5 built-in tools
  • Debug chat UI (/chat) — test everything with real-time OpenAI format validation

How It Works

Perplexity's web frontend talks to its backend through an internal SSE endpoint (/rest/sse/perplexity_ask). This proxy authenticates with your session cookie via curl_cffi (Chrome TLS fingerprinting), translates requests/responses into OpenAI and MCP formats, and keeps your session alive automatically.

No official API key needed — just your subscription.

All queries use search_focus: "internet" — Perplexity's built-in web search is always active, so models return real-time data (stock prices, weather, news) directly in their answers.

Features

  • Full OpenAI format compliance: system_fingerprint, logprobs, proper usage arithmetic, all fields per spec
  • Tool calling — OpenAI-style function calling via prompt injection with 3-layer false-positive defense
  • Thinking/reasoning: enabled with thinking: true or the reasoning_effort param; reasoning is streamed as reasoning_content
  • Account tier support — free/pro/max — only exposes models your tier can access
  • Auto-discovery — background task checks model health every 24h, auto-upgrades when versions change
  • Response cleaning — strips Perplexity citations [1][2], <grok:*> tags, <?xml?> declarations, <script> tags
  • Rate limit tracking — tracks Pro Search quota, auto-fallback to free model when exhausted, notices at every 5th decrement
  • Session continuity — tracks Perplexity backend_uuid so follow-up turns skip history/instructions entirely, sending only the new query
  • Session keep-alive — periodic pings prevent cookie expiry
  • Push notifications: ntfy.sh alerts on cookie expiry or model upgrades
  • Debug chat UI: /chat page with tools toggle, thinking toggle, streaming toggle, and OpenAI format validator
  • Dynamic model management — add/remove models at runtime via admin API
  • Full input validation — proper error messages for every malformed request
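
The response-cleaning feature in the list above can be pictured with a few regular expressions. This is a minimal sketch, not the proxy's own code: the function name `clean_response` and the exact patterns are assumptions based only on the markup named in the feature list.

```python
import re

# Illustrative patterns (assumptions), one per kind of markup named above.
_SCRIPT = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)
_XML_DECL = re.compile(r"<\?xml[^>]*\?>")
_GROK_TAG = re.compile(r"</?grok:[^>]*>")
_CITATION = re.compile(r"\[\d+\]")


def clean_response(text: str) -> str:
    """Strip citation markers and stray markup from a model answer."""
    # Remove whole <script> elements first so their bodies disappear too.
    for pattern in (_SCRIPT, _XML_DECL, _GROK_TAG, _CITATION):
        text = pattern.sub("", text)
    # Collapse runs of spaces left behind by removed markers.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Drop a space that now sits directly before punctuation.
    text = re.sub(r" +([.,;:!?])", r"\1", text)
    return text.strip()
```

A pattern as broad as `\[\d+\]` would also remove legitimate text such as `arr[0]` in a code answer, so a real implementation probably needs to be more selective than this sketch.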

Quick Start

git clone https://github.com/jamie950315/pplx-proxy.git
cd pplx-proxy
python3 -m venv venv
venv/bin/pip install -r requirements.txt

cp .env.example .env
# Edit .env — set PPLX_COOKIE and ACCOUNT_TYPE

venv/bin/uvicorn server:app --host 0.0.0.0 --port 8892

Then open http://localhost:8892/chat to test with the debug UI.

Getting Your Cookie

  1. Log in to perplexity.ai
  2. F12 → Application → Cookies → www.perplexity.ai
  3. Copy __Secure-next-auth.session-token
  4. Set PPLX_COOKIE=<value> in .env

Models

| Model ID | Backend | Tier | Thinking Variant |
|---|---|---|---|
| auto | Perplexity Best | free+ | |
| sonar | Sonar | pro+ | |
| gpt | GPT-5.4 | pro+ | gpt54_thinking |
| sonnet | Claude Sonnet 4.6 | pro+ | claude46sonnetthinking |
| gemini | Gemini 3.1 Pro | pro+ | always on |
| nemotron | Nemotron 3 Super | pro+ | always on |
| opus | Claude Opus 4.6 | max | claude46opusthinking |

Thinking variants are activated via thinking: true or reasoning_effort parameter — no separate model names needed.
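
As a rough illustration of how the table could drive both tier filtering and thinking-variant selection, here is a small sketch. The data is copied from the table; the helper names `models_for_tier` and `thinking_variant` are hypothetical, and treating any non-empty `reasoning_effort` as "thinking on" is an assumption.

```python
from typing import List, Optional

TIER_RANK = {"free": 0, "pro": 1, "max": 2}

# model id -> (minimum tier, thinking variant | "always on" | None), from the table
MODELS = {
    "auto": ("free", None),
    "sonar": ("pro", None),
    "gpt": ("pro", "gpt54_thinking"),
    "sonnet": ("pro", "claude46sonnetthinking"),
    "gemini": ("pro", "always on"),
    "nemotron": ("pro", "always on"),
    "opus": ("max", "claude46opusthinking"),
}


def models_for_tier(account_type: str) -> List[str]:
    """Return the model ids an account tier may use."""
    rank = TIER_RANK[account_type]
    return [m for m, (tier, _) in MODELS.items() if TIER_RANK[tier] <= rank]


def thinking_variant(model: str, thinking: bool = False,
                     reasoning_effort: Optional[str] = None) -> Optional[str]:
    """Return the thinking variant to request, or None to keep the base model."""
    _, variant = MODELS[model]
    if variant is None or variant == "always on":
        return None  # nothing to switch: no variant, or reasoning is built in
    wants_thinking = thinking or reasoning_effort is not None  # assumption
    return variant if wants_thinking else None
```

With this shape, a client keeps sending `"model": "gpt"` and only flips `thinking`, which matches the "no separate model names" behavior described above.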

API Endpoints

| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /health | No | Health check |
| GET | /chat | No | Debug chat UI with OpenAI format validator |
| GET | /v1/models | Yes | List tier-available models |
| POST | /v1/chat/completions | Yes | Chat (streaming + non-streaming + tools + thinking) |
| POST | /v1/responses | Yes | OpenAI Responses API compatibility (used by LobeHub web search) |
| POST | /<api-key>/mcp | Key in URL | MCP Streamable HTTP |
| GET | /<api-key>/sse | Key in URL | MCP SSE |
| GET | /admin/models | Yes | Full model map |
| POST | /admin/update-models | Yes | Add/replace models |
| POST | /admin/refresh-cookie | Yes | Inject new session token |
| POST | /admin/discover-models | Yes | Run model discovery |

Usage

OpenAI API

# Basic chat
curl -X POST http://localhost:8892/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "sonnet", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'

# With thinking
curl -X POST http://localhost:8892/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt", "messages": [{"role": "user", "content": "Analyze X"}], "thinking": true}'

# With tool calling
curl -X POST http://localhost:8892/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonnet",
    "messages": [{"role": "user", "content": "Weather in Tokyo"}],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}]
  }'
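
The same requests can be made from Python with only the standard library. The sketch below builds the request and decodes a streamed reply; `build_chat_request` and `iter_stream_deltas` are hypothetical helper names, and the chunk layout is assumed to follow the usual OpenAI streaming format (`data: {...}` lines ending with `data: [DONE]`).

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt, stream=False, thinking=False):
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    if thinking:
        body["thinking"] = True
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_stream_deltas(lines):
    """Yield the delta dict of each streamed chunk until the [DONE] marker."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            yield choice.get("delta", {})


# Example use against a running proxy (not executed here):
#   req = build_chat_request("http://localhost:8892", "YOUR_KEY", "sonnet", "Hello", stream=True)
#   with urllib.request.urlopen(req) as resp:
#       for delta in iter_stream_deltas(resp):
#           print(delta.get("content") or "", end="")
```

Each delta may carry `content`, `reasoning_content`, or tool-call fragments, so a client that wants the reasoning text should read both keys.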

Debug Chat UI

Open http://localhost:8892/chat (or https://your-domain/chat) in a browser:

  • Toggle Tools ON/OFF to test tool calling
  • Toggle thinking to test reasoning mode
  • Toggle stream for streaming vs non-streaming
  • Raw tab: shows full request/response JSON
  • Format ✓ tab: validates every response field against the OpenAI spec with PASS/FAIL badges

MCP

The API key is part of the URL path for MCP authentication:

# Claude Code
claude mcp add pplx-proxy --transport http http://localhost:8892/YOUR_API_KEY/mcp

# SSE transport
# Connect to http://localhost:8892/YOUR_API_KEY/sse

Without PPLX_PROXY_API_KEY set, MCP falls back to unauthenticated /mcp/mcp and /sse/sse.
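
A client that has to work with or without a configured key could derive both MCP URLs as follows. This is only a sketch of the path rule stated above; `mcp_endpoints` is a hypothetical helper, and percent-encoding the key is an added precaution rather than documented behavior.

```python
from urllib.parse import quote


def mcp_endpoints(base_url: str, api_key: str = "") -> dict:
    """Return the Streamable HTTP and SSE URLs for a given proxy and key."""
    base = base_url.rstrip("/")
    if api_key:
        key = quote(api_key, safe="")  # keep odd characters out of the path
        return {"http": f"{base}/{key}/mcp", "sse": f"{base}/{key}/sse"}
    # No key configured: the unauthenticated fallback paths.
    return {"http": f"{base}/mcp/mcp", "sse": f"{base}/sse/sse"}
```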

MCP Tools:

| Tool | Description |
|---|---|
| perplexity_search | Pro Search with model/source selection |
| perplexity_ask | Quick auto-mode Q&A |
| perplexity_reason | Reasoning with model selection |
| perplexity_research | Deep Research |
| perplexity_models | List available models for your tier |

OpenAI Format Compliance

All responses strictly match the OpenAI Chat Completions API spec:

  • id (chatcmpl-*), object, created, model, system_fingerprint (null)
  • choices[].index, choices[].logprobs (null), choices[].finish_reason
  • usage.total_tokens = prompt_tokens + completion_tokens
  • Streaming: consistent id, system_fingerprint in every chunk, proper [DONE] termination
  • Tool calls: id (call_*), type (function), function.name, function.arguments (valid JSON string)

Use /chat to visually verify — the Format ✓ tab runs 20+ checks per response.
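
For scripted checks outside the browser, a handful of the rules above can be expressed as a small validator. This is a sketch covering only the fields listed in this section, not the 20+ checks the Format tab runs; `check_chat_completion` is a hypothetical name, and the expected `object` value and `finish_reason` set are taken from the public OpenAI spec.

```python
import json


def check_chat_completion(resp: dict) -> list:
    """Return human-readable failures; an empty list means every check passed."""
    failures = []

    def expect(condition, message):
        if not condition:
            failures.append(message)

    expect(str(resp.get("id", "")).startswith("chatcmpl-"), "id must start with chatcmpl-")
    expect(resp.get("object") == "chat.completion", "object must be chat.completion")
    expect(isinstance(resp.get("created"), int), "created must be an integer")
    expect(isinstance(resp.get("model"), str), "model must be a string")
    expect("system_fingerprint" in resp, "system_fingerprint key must be present")

    for i, choice in enumerate(resp.get("choices", [])):
        expect(choice.get("index") == i, f"choices[{i}].index must be {i}")
        expect("logprobs" in choice, f"choices[{i}].logprobs key must be present")
        expect(choice.get("finish_reason") in ("stop", "length", "tool_calls", "content_filter"),
               f"choices[{i}].finish_reason is not a known value")
        for call in (choice.get("message") or {}).get("tool_calls") or []:
            expect(str(call.get("id", "")).startswith("call_"), "tool call id must start with call_")
            expect(call.get("type") == "function", "tool call type must be function")
            fn = call.get("function") or {}
            expect(isinstance(fn.get("name"), str), "function.name must be a string")
            try:
                json.loads(fn.get("arguments"))
            except (TypeError, ValueError):
                failures.append("function.arguments must be a valid JSON string")

    usage = resp.get("usage") or {}
    p, c, t = (usage.get(k) for k in ("prompt_tokens", "completion_tokens", "total_tokens"))
    expect(all(isinstance(v, int) for v in (p, c, t)) and t == p + c,
           "usage.total_tokens must equal prompt_tokens + completion_tokens")
    return failures
```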

Auto-Discovery

Every PROBE_INTERVAL_HOURS (default 24h), pplx-proxy checks whether each model still responds. If one has died, it increments the version number (e.g., gpt54 → gpt55 → ... up to +1.0) and auto-upgrades. Thinking variants are auto-derived from _THINKING_MAP.

Manual trigger: POST /admin/discover-models
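
The version-bump search can be sketched as below. Several things here are assumptions rather than facts about the proxy: that an identifier encodes major.minor as two adjacent digits, that "up to +1.0" means ten steps of 0.1, and that the first responding candidate wins. `candidate_versions` and `first_live_successor` are hypothetical names, and `is_alive` stands in for whatever health probe the proxy actually uses.

```python
import re
from typing import Callable, List, Optional


def candidate_versions(model_id: str, max_steps: int = 10) -> List[str]:
    """List successor identifiers by bumping the first two-digit version."""
    match = re.search(r"\d{2}", model_id)
    if match is None:
        return []
    major, minor = int(match.group()[0]), int(match.group()[1])
    candidates = []
    for _ in range(max_steps):
        minor += 1
        if minor == 10:  # 5.9 rolls over to 6.0
            major, minor = major + 1, 0
        if major > 9:
            break  # a two-digit slot cannot hold 10.x
        candidates.append(
            model_id[:match.start()] + f"{major}{minor}" + model_id[match.end():]
        )
    return candidates


def first_live_successor(model_id: str, is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate the probe accepts, or None if none respond."""
    for candidate in candidate_versions(model_id):
        if is_alive(candidate):
            return candidate
    return None
```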

Configuration

| Variable | Default | Description |
|---|---|---|
| PPLX_COOKIE | | Session token (required) |
| PPLX_PROXY_API_KEY | | Bearer auth (empty = no auth) |
| ACCOUNT_TYPE | pro | free, pro, or max |
| DEFAULT_MODEL | gpt | Default when not specified |
| PPLX_PROXY_PORT | 8892 | Listen port |
| CUSTOM_PROMPTS | file | Local prompt block prepended to every LobeHub request |
| KEEPALIVE_HOURS | 6 | Session ping interval |
| PROBE_INTERVAL_HOURS | 24 | Auto-discovery interval |
| NTFY_TOPIC | pplx-proxy | ntfy.sh topic |
| NTFY_URL | https://ntfy.sh | ntfy server URL |
| NTFY_COOLDOWN_SECS | 3600 | Min interval between alerts |
| PUBLIC_URL | http://localhost:8892 | URL in ntfy messages |
| PPLX_API_VERSION | 2.18 | Perplexity internal API version |
| PPLX_IMPERSONATE | chrome | curl_cffi TLS fingerprint |
| USER_AGENT | Chrome/130 | HTTP User-Agent |
| COOKIE_MAX_AGE_HOURS | 168 | Cookie cache max age |
| LOG_LEVEL | INFO | Logging level |
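
To show how these variables and defaults fit together, here is a sketch that reads a subset of them into a typed object. The `Settings` class and `load_settings` function are hypothetical and not taken from server.py; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    cookie: str
    api_key: str
    account_type: str
    default_model: str
    port: int
    keepalive_hours: float
    probe_interval_hours: float
    ntfy_cooldown_secs: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    cookie = env.get("PPLX_COOKIE", "").strip()
    if not cookie:
        raise ValueError("PPLX_COOKIE is required")
    account_type = env.get("ACCOUNT_TYPE", "pro").strip().lower()
    if account_type not in ("free", "pro", "max"):
        raise ValueError(f"ACCOUNT_TYPE must be free, pro, or max, got {account_type!r}")
    return Settings(
        cookie=cookie,
        api_key=env.get("PPLX_PROXY_API_KEY", ""),  # empty means no auth
        account_type=account_type,
        default_model=env.get("DEFAULT_MODEL", "gpt"),
        port=int(env.get("PPLX_PROXY_PORT", "8892")),
        keepalive_hours=float(env.get("KEEPALIVE_HOURS", "6")),
        probe_interval_hours=float(env.get("PROBE_INTERVAL_HOURS", "24")),
        ntfy_cooldown_secs=int(env.get("NTFY_COOLDOWN_SECS", "3600")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in a test without touching the real environment.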

Deployment (systemd)

sudo cp pplx-proxy.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now pplx-proxy

Cookie Lifecycle

Manual inject → keep-alive every 6h → session stays alive indefinitely
                                      ↓ (if Perplexity force-revokes)
                                      ntfy alert → manual re-inject
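
The alert step in the diagram is limited by NTFY_COOLDOWN_SECS so that a dead cookie does not trigger a notification on every failed request. A minimal sketch of such a throttle follows; the `AlertThrottle` class is hypothetical, and keeping a separate cooldown per alert kind is an assumption. ntfy itself publishes a message when the text is POSTed to the server URL followed by the topic, which is what `ntfy_publish_url` builds.

```python
import time


class AlertThrottle:
    """Allow at most one alert of each kind per cooldown window."""

    def __init__(self, cooldown_secs: float = 3600, clock=time.monotonic):
        self._cooldown = cooldown_secs
        self._clock = clock  # injectable so tests can control time
        self._last_sent = {}

    def should_send(self, kind: str) -> bool:
        now = self._clock()
        last = self._last_sent.get(kind)
        if last is not None and now - last < self._cooldown:
            return False  # still inside the cooldown window
        self._last_sent[kind] = now
        return True


def ntfy_publish_url(ntfy_url: str = "https://ntfy.sh", topic: str = "pplx-proxy") -> str:
    """Return the URL a message body would be POSTed to."""
    return ntfy_url.rstrip("/") + "/" + topic
```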

Re-inject without SSH:

curl -X POST https://your-domain/admin/refresh-cookie \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"session_token": "NEW_TOKEN"}'

Critical Implementation Notes

Why models say "I can't access real-time data": This proxy must handle three issues that cause Perplexity models to ignore their own search results (items 1 to 3 below); item 4 is a separate note on rate-limit tracking:

  1. search_focus: "internet" must be set in every request. Without it, Perplexity defaults to "writing" mode where models don't incorporate search results. This is the single most important parameter.

  2. System prompts must be stripped or replaced before sending to Perplexity. Perplexity searches ALL query text — if the system prompt says "You are an AI assistant", Perplexity finds chatbot tutorial pages and the model gets confused. Generic clients keep only whitelist-approved lines; LobeHub requests discard upstream prompt content entirely.

  3. LobeHub requests always prepend local CUSTOM_PROMPTS. The proxy still detects role: developer and system-prompt-like user messages so it can classify the request source, but those upstream prompt blocks are never forwarded. Each LobeHub turn sends instructions=[CUSTOM_PROMPTS] plus preserved history and current query.

  4. Rate limit tracking uses FlareSolverr (localhost:8191) to poll Perplexity's /rest/rate-limit/all endpoint with the session cookie. Requires FlareSolverr running locally. When remaining_pro reaches 0, all non-auto models fall back to auto (free tier).
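
The notes above boil down to a few small transformations applied before a request goes upstream. The sketch below illustrates three of them under stated assumptions: all function names are hypothetical, the position of `search_focus` inside the real Perplexity payload is not documented here, and matching whitelist entries by line prefix is a guess at how "whitelist-approved lines" might be decided.

```python
from typing import Iterable


def force_internet_search(params: dict) -> dict:
    """Return a copy of the request params with search_focus set to internet."""
    return {**params, "search_focus": "internet"}


def keep_whitelisted_lines(system_prompt: str, allowed_prefixes: Iterable[str]) -> str:
    """Keep only system-prompt lines that start with an approved prefix."""
    prefixes = tuple(allowed_prefixes)
    kept = [
        line.strip()
        for line in system_prompt.splitlines()
        if line.strip().startswith(prefixes)
    ]
    return "\n".join(kept)


def effective_model(requested: str, remaining_pro: int) -> str:
    """Fall back to the free auto model once the Pro Search quota is spent."""
    if requested != "auto" and remaining_pro <= 0:
        return "auto"
    return requested
```

For example, `effective_model("sonnet", 0)` yields `"auto"`, while a request for `"auto"` is never rewritten.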

See CLAUDE.md for the full technical breakdown and MANUAL.md troubleshooting section for diagnosis steps.

Disclaimer

Unofficial reverse proxy for personal use. Relies on Perplexity's internal web API which may change without notice. Use responsibly.

License

MIT
