Add ChatGPT/Codex subscription tier (loopback proxy) #7401
Git-on-my-level wants to merge 2 commits into
Route proactive LLM through a localhost Codex proxy using ~/.codex auth, add Settings enrollment UX, local memory wiki + FTS for search, backend tier activation, and bundle the proxy in run.sh. Adapted for upstream main without hybrid local-daemon dependencies.
Co-authored-by: Cursor <cursoragent@cursor.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bcdc7bbed2
| raise HTTPException( | ||
| status_code=400, | ||
| detail='Invalid fingerprint: expected lowercase hex SHA-256 (64 chars)', | ||
| ) | ||
| users_db.set_chatgpt_active(uid, data.fingerprint) |
Verify subscription ownership before setting chatgpt.active
This endpoint grants ChatGPT enrollment after only a format check on fingerprint, so any authenticated user can POST an arbitrary 64-hex string and flip chatgpt.active to true. Because downstream gating (is_chatgpt_active) is what unlocks unlimited subscription/quota paths, this creates a direct paid-feature bypass without proving the caller actually has a ChatGPT/Codex subscription.
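To make the gap concrete, here is a minimal sketch of why a format-only check proves nothing about ownership. The pattern below is an assumption meant to mirror the backend's `_SHA256_HEX_RE`, whose definition is not shown in the diff; `passes_format_check` and `forged` are hypothetical names.

```python
import re
import secrets

# Assumed equivalent of the backend's _SHA256_HEX_RE (the real pattern is not in the diff).
SHA256_HEX_RE = re.compile(r"[0-9a-f]{64}")


def passes_format_check(fingerprint: str) -> bool:
    """Return True when the value merely looks like a lowercase hex SHA-256 digest."""
    return SHA256_HEX_RE.fullmatch(fingerprint) is not None


# 32 random bytes rendered as hex: never derived from any Codex credential,
# yet indistinguishable from a real digest as far as the format check goes.
forged = secrets.token_hex(32)
```

Since `passes_format_check(forged)` holds for any random value, the check only guards against malformed input, not against a caller without a subscription.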
| "model": if requested_model_hint.trim().is_empty() { Value::Null } else { Value::String(requested_model_hint.clone()) }, | ||
| "choices": [{ | ||
| "index": 0, | ||
| "message": { "role": "assistant", "content": assistant_text }, | ||
| "logprobs": null, |
Preserve tool-call outputs in proxy chat completion responses
The proxy response synthesized here always returns only assistant text content and never includes message.tool_calls, even when the upstream Codex stream represents tool-calling output. CodexLLMClient.performGeminiCompatibleToolRound depends on tool_calls to drive Task/Insight tool loops, so in ChatGPT mode those flows can terminate early or fail on the first required-tool round because no executable tool call is propagated.
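A rough sketch of what propagating tool calls could look like, written in Python for brevity rather than the proxy's Rust. It assumes the proxy has already collected function-call items from the upstream stream as dicts with `call_id`, `name`, and `arguments` keys; that input shape and the `build_chat_completion` helper are hypothetical, and only the output layout follows the OpenAI chat completion format.

```python
import time
import uuid
from typing import Any


def build_chat_completion(
    model: str, text: str, function_calls: list[dict[str, Any]]
) -> dict[str, Any]:
    """Assemble a chat.completion body that keeps tool calls when present.

    `function_calls` is a hypothetical list of already-parsed upstream items,
    each shaped like {"call_id": str, "name": str, "arguments": str}.
    """
    tool_calls = [
        {
            "id": call["call_id"],
            "type": "function",
            "function": {"name": call["name"], "arguments": call["arguments"]},
        }
        for call in function_calls
    ]
    # content is null when the assistant produced only tool calls.
    message: dict[str, Any] = {"role": "assistant", "content": text or None}
    if tool_calls:
        message["tool_calls"] = tool_calls
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": message,
                "logprobs": None,
                "finish_reason": "tool_calls" if tool_calls else "stop",
            }
        ],
    }
```

With this shape, a client that loops on `message.tool_calls` sees the call and a `finish_reason` of `tool_calls` instead of an empty text reply.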
| if isinstance(last_seen, datetime): | ||
| age = (datetime.now(timezone.utc) - last_seen).total_seconds() | ||
| else: | ||
| return False | ||
| return age <= BYOK_HEARTBEAT_TTL_SECONDS |
Refresh ChatGPT enrollment before TTL-based deactivation
is_chatgpt_active expires enrollment strictly by a 7-day TTL on last_seen_at, but desktop enrollment currently updates that timestamp only during explicit connect flow (not on normal app startup/use). As a result, active users can silently drop out of ChatGPT-unlimited status after seven days unless they manually reconnect, which causes unexpected quota/paywall regressions.
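One way to picture the fix is a heartbeat that is refreshed on normal use but throttled so it does not write on every request. The sketch below assumes a 7-day TTL as described above; the one-hour throttle, the constant names, and both helpers are hypothetical, and storage is left out entirely.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CHATGPT_HEARTBEAT_TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day TTL, per the review comment
HEARTBEAT_MIN_INTERVAL_SECONDS = 60 * 60  # hypothetical throttle: one write per hour


def is_active(last_seen: Optional[datetime], now: datetime) -> bool:
    """Enrollment counts as active only while the last heartbeat is within the TTL."""
    if not isinstance(last_seen, datetime):
        return False
    return (now - last_seen).total_seconds() <= CHATGPT_HEARTBEAT_TTL_SECONDS


def refreshed_last_seen(last_seen: Optional[datetime], now: datetime) -> datetime:
    """Return the timestamp to persist: unchanged if refreshed recently, else `now`."""
    if isinstance(last_seen, datetime):
        if (now - last_seen).total_seconds() < HEARTBEAT_MIN_INTERVAL_SECONDS:
            return last_seen  # skip the write; the stored value is fresh enough
    return now
```

Calling `refreshed_last_seen` from app launch or from authenticated requests would keep regular users inside the TTL without requiring a manual reconnect.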
Greptile Summary
This PR adds a ChatGPT/Codex subscription tier that routes proactive LLM workloads through a local loopback Rust proxy (
Confidence Score: 2/5
Not safe to merge as-is: the backend activation endpoint would let any authenticated user claim unlimited LLM access without a real ChatGPT subscription, and the MemorySearchMode default change would silently disable vector-search deduplication for all desktop users.
The backend fingerprint check accepts any syntactically valid SHA-256 hex, allowing any authenticated Omi user to permanently bypass LLM quotas and the trial paywall without owning a ChatGPT subscription. Separately, MemorySearchMode.current falls back to localWiki for users with no stored preference, replacing vector-embedding search globally and breaking deduplication in TaskAssistant and MemoryAssistant for non-ChatGPT users.
backend/routers/users.py (activation endpoint lacks subscription validation), desktop/Desktop/Sources/MemoryWikiStorage.swift (wrong default in MemorySearchMode.current)
| Filename | Overview |
|---|---|
| backend/routers/users.py | Adds ChatGPT enrollment endpoints. Activation only validates fingerprint format — no server-side proof of a real OpenAI subscription, allowing any authenticated user to bypass LLM quotas and the trial paywall. |
| desktop/Desktop/Sources/MemoryWikiStorage.swift | New local wiki + FTS5 storage for memories. MemorySearchMode.current defaults to localWiki for all users (wrong default), silently replacing vector-search deduplication system-wide. |
| desktop/codex-proxy/src/main.rs | New Rust loopback proxy translating OpenAI chat completions to Codex SSE responses with OAuth token refresh. Contains dead code (codex_body_to_chat_completion) used only in tests. |
| backend/database/users.py | Adds ChatGPT state CRUD following the BYOK Firestore pattern. TTL reuses BYOK_HEARTBEAT_TTL_SECONDS, which is semantically misleading but functionally correct. |
| backend/utils/subscription.py | Integrates ChatGPT tier into trial paywall logic and enforce_chat_quota; chatgpt-active users bypass the 3-day paywall. Mirrors BYOK treatment. |
| desktop/Desktop/Sources/CodexProxyService.swift | Manages the proxy subprocess lifecycle with health monitoring and auto-restart. Stderr is silenced, making startup failures hard to diagnose. |
| desktop/Desktop/Sources/CodexEnrollmentCoordinator.swift | Orchestrates ChatGPT enrollment flow. Rollback on failure is correct (clears enrollment, stops proxy). |
| desktop/Desktop/Sources/Rewind/Core/RewindDatabase.swift | Adds createMemoryWikiPages GRDB migration with FTS5 virtual table and three sync triggers. Pattern is consistent with existing FTS migrations. |
Sequence Diagram
sequenceDiagram
participant User as Desktop User
participant Enroll as EnrollmentCoordinator
participant AuthFile as CodexAuthService
participant Proxy as CodexProxyService
participant Backend as Omi Backend
participant OpenAI as Codex API
User->>Enroll: connect()
Enroll->>AuthFile: loadSnapshot()
alt auth file missing
Enroll->>Enroll: launch Terminal (codex login)
Enroll->>AuthFile: poll every 2s
end
AuthFile-->>Enroll: AuthSnapshot
Enroll->>Proxy: ensureRunning()
Proxy-->>Proxy: spawn omi-codex-proxy on loopback port
Enroll->>Backend: POST chatgpt-active (fingerprint)
Backend-->>Backend: format-only validation, write Firestore
Backend-->>Enroll: "active=true"
Note over User, OpenAI: Proactive AI inference
User->>Proxy: POST v1 chat completions
Proxy->>OpenAI: codex responses (SSE)
OpenAI-->>Proxy: SSE stream
Proxy-->>User: OpenAI-compat JSON
Reviews (1): Last reviewed commit: "Add ChatGPT/Codex plan support via local..."
| let raw = UserDefaults.standard.string(forKey: "memory_search_mode") ?? "local_wiki" | ||
| return raw == "vector" ? .vectorEmbeddings : .localWiki |
The fallback value "local_wiki" causes all desktop users (regardless of ChatGPT tier) to use the local wiki search by default. Any user without a stored memory_search_mode preference gets .localWiki, which disables vector embedding search system-wide. The PR description states local wiki should only activate when ChatGPT tier is active, so non-ChatGPT users would lose vector-search deduplication in both TaskAssistant and MemoryAssistant.
Suggested change:
| - let raw = UserDefaults.standard.string(forKey: "memory_search_mode") ?? "local_wiki" |
| - return raw == "vector" ? .vectorEmbeddings : .localWiki |
| + let raw = UserDefaults.standard.string(forKey: "memory_search_mode") ?? "vector" |
| + return raw == "vector" ? .vectorEmbeddings : .localWiki |
| @router.post('/v1/users/me/chatgpt-active', tags=['v1']) | ||
| def activate_chatgpt_endpoint( | ||
| data: ChatGPTActivateRequest, uid: str = Depends(auth.get_current_user_uid_no_byok_validation) | ||
| ): | ||
| """Enroll ChatGPT / Codex subscription tier (LLM workloads only; no provider keys stored).""" | ||
| if not _SHA256_HEX_RE.match(data.fingerprint): | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail='Invalid fingerprint: expected lowercase hex SHA-256 (64 chars)', | ||
| ) | ||
| users_db.set_chatgpt_active(uid, data.fingerprint) | ||
| clear_trial_paywall_cache(uid) | ||
| return {"active": True} |
No server-side validation of ChatGPT subscription
The activation endpoint accepts any 64-char hex string as a valid enrollment fingerprint. The fingerprint is stored in Firestore, but is_chatgpt_active (and therefore enforce_chat_quota) never re-verifies it against OpenAI — it only checks that active=True and the Firestore timestamp is fresh. Any authenticated Omi user can POST an arbitrary SHA-256 hex to permanently bypass the 3-day trial paywall and monthly LLM quota without ever having a ChatGPT subscription. Unlike BYOK, there is no per-request proof-of-possession; once enrolled, the bypass is unconditional.
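As a sketch of a per-request check, the gate could compare the enrolled fingerprint against one presented with each request, for example in a header such as `X-ChatGPT-Fingerprint` (the name used in the follow-up commit message). The helper name is hypothetical. This only binds the bypass to a value the client keeps presenting; because the fingerprint is still client-chosen, it does not by itself prove a subscription.

```python
import hmac
from typing import Optional


def fingerprint_matches(enrolled: Optional[str], presented: Optional[str]) -> bool:
    """Return True only when a presented fingerprint equals the enrolled one.

    Missing values never match, so requests without the header fall back to
    the normal quota path.
    """
    if not enrolled or not presented:
        return False
    # Constant-time comparison avoids leaking how many leading characters match.
    return hmac.compare_digest(enrolled.encode(), presented.encode())
```

A quota gate would then require both a fresh enrollment timestamp and `fingerprint_matches(...)` before skipping the paywall.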
| if isinstance(last_seen, datetime): | ||
| age = (datetime.now(timezone.utc) - last_seen).total_seconds() | ||
| else: | ||
| return False | ||
| return age <= BYOK_HEARTBEAT_TTL_SECONDS | ||
| def set_chatgpt_active(uid: str, fingerprint: str): |
The ChatGPT TTL reuses BYOK_HEARTBEAT_TTL_SECONDS, a constant whose name and comment are semantically tied to the BYOK feature. If the two TTLs ever diverge, this shared constant would be the wrong one to change. A dedicated constant makes the intent explicit.
Suggested change:
| if isinstance(last_seen, datetime): |
| age = (datetime.now(timezone.utc) - last_seen).total_seconds() |
| else: |
| return False |
| - return age <= BYOK_HEARTBEAT_TTL_SECONDS |
| + return age <= CHATGPT_HEARTBEAT_TTL_SECONDS |
| def set_chatgpt_active(uid: str, fingerprint: str): |
| fn codex_body_to_chat_completion(model_fallback: &str, bytes: &[u8]) -> Result<Value, String> { | ||
| let v: Value = serde_json::from_slice(bytes).map_err(|e| format!("upstream json: {e}"))?; | ||
| if v.get("choices").is_some() { | ||
| let mut enriched = v; | ||
| if enriched.get("id").and_then(Value::as_str).is_none() | ||
| || enriched.get("id") == Some(&Value::Null) | ||
| { | ||
| enriched["id"] = Value::String(new_chat_completion_id()); | ||
| } | ||
| if enriched.get("object").and_then(Value::as_str).is_none() | ||
| || enriched.get("object") == Some(&Value::Null) | ||
| { | ||
| enriched["object"] = Value::from("chat.completion"); | ||
| } | ||
| if enriched.get("created").and_then(Value::as_i64).is_none() | ||
| || enriched.get("created") == Some(&Value::Null) | ||
| { | ||
| enriched["created"] = Value::Number(unix_secs().into()); | ||
| } | ||
| Ok(enriched) | ||
| } else { | ||
| let text = extract_assistant_text(&v) | ||
| .ok_or_else(|| serde_json::to_string(&v).unwrap_or_else(|_| "(unprintable)".into()))?; | ||
| let model = chat_model_choice(&v, model_fallback)?; | ||
| Ok(json!({ | ||
| "id": new_chat_completion_id(), | ||
| "object": "chat.completion", | ||
| "created": unix_secs(), | ||
| "model": model, | ||
| "choices": [{ | ||
| "index": 0, | ||
| "message": { "role": "assistant", "content": text}, | ||
| "logprobs": null, | ||
| "finish_reason": infer_finish_reason(&v), | ||
| }], | ||
| "usage": v.get("usage").cloned().unwrap_or(Value::Null), | ||
| })) | ||
| } | ||
| } |
codex_body_to_chat_completion is dead production code
invoke_codex assembles the response inline using collect_text_from_codex_sse and never calls this function. It is referenced only by the maps_responses_like_output_message unit test. Keeping it creates a diverging code path that can mislead future contributors into thinking the proxy has a non-SSE fallback mode.
| proc.standardOutput = FileHandle.nullDevice | ||
| proc.standardError = FileHandle.nullDevice |
Proxy stderr silenced — startup failures are undiagnosable
proc.standardError = FileHandle.nullDevice discards all error output from the proxy process. If the proxy crashes on startup (missing auth file, port conflict, bad token format), the only signal the Swift side sees is a health-check timeout with the generic message "Codex proxy failed to start". Routing stderr to a pipe would allow the error message to be surfaced in lastError.
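The idea of routing stderr to a pipe can be illustrated with a small Python analogue (the desktop code itself is Swift). The `launch` helper, the startup window, and the stand-in failing command are all hypothetical; the point is only that a captured stderr gives a usable error string instead of a generic timeout.

```python
import subprocess
import sys
from typing import Optional


def launch(
    cmd: list[str], startup_timeout: float = 5.0
) -> tuple[subprocess.Popen, Optional[str]]:
    """Start a child process and report its stderr if it dies during startup."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True
    )
    try:
        _, err = proc.communicate(timeout=startup_timeout)
    except subprocess.TimeoutExpired:
        # Still running after the startup window; a real supervisor would keep
        # draining the pipe in the background so the child cannot block on it.
        return proc, None
    if proc.returncode != 0:
        return proc, err.strip() or f"exited with code {proc.returncode}"
    return proc, None


# Stand-in for a proxy binary that fails at startup (hypothetical message).
failing_cmd = [sys.executable, "-c", "import sys; sys.exit('address already in use')"]
proc, last_error = launch(failing_cmd)
```

Here `last_error` carries the child's own message, which is the kind of detail that could be surfaced in lastError.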
Require X-ChatGPT-Fingerprint on requests for quota and subscription bypass, refresh enrollment heartbeat from desktop launch and throttled server updates, resolve Codex transport at call time in GeminiClient, label wiki search hits distinctly from tasks, and include memory id in wiki slugs to avoid collisions.
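For the slug change, a minimal sketch of why appending the memory id removes collisions. The `wiki_slug` function and its normalization rules are assumptions for illustration, not the desktop implementation, and the sketch assumes memory ids are already safe to embed in a slug.

```python
import re


def wiki_slug(title: str, memory_id: str) -> str:
    """Build a page slug from a title, suffixed with the memory id for uniqueness."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "memory"
    return f"{base}-{memory_id}"
```

Two memories that share a title now map to different slugs as long as their ids differ, so one page can no longer overwrite the other.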
Summary
- […] (/v1/users/me/chatgpt-active) with LLM quota bypass (separate from four-key BYOK; transcription gates unchanged).
- […] (desktop/codex-proxy) and desktop Settings UX to sign in via codex login, enroll, and route proactive LLM workloads through the proxy.
Scope / upstream notes
This PR is cherry-picked from local hybrid work and does not include the local-daemon / HybridLLMClient stack. Proactive AI (GeminiClient) uses CodexLLMClient when enrolled; main-window chat still uses the existing pi-mono bridge (ChatGPT tier for chat can follow in a later PR).
Test plan
- backend/tests/unit/test_chatgpt_enrollment.py
- cd desktop/codex-proxy && cargo build --release
- codex login → verify proxy health and subscription shows unlimited LLM
Made with Cursor