fix(codex): harden tier-aware model catalog refresh#2024
fix(codex): harden tier-aware model catalog refresh#2024excelwang wants to merge 5 commits intorouter-for-me:devfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 41cf3135ce
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for _, key := range []string{"plan_type", "chatgpt_plan_type"} { | ||
| if raw, ok := metadata[key].(string); ok { | ||
| if plan := NormalizeCodexPlanType(raw); plan != "" { | ||
| if current, _ := metadata["plan_type"].(string); NormalizeCodexPlanType(current) != plan { | ||
| metadata["plan_type"] = plan |
There was a problem hiding this comment.
Recompute plan_type from refreshed id_token
EnsureCodexPlanTypeMetadata returns immediately when plan_type/chatgpt_plan_type is already present, so it never reconciles against a newer JWT payload. This becomes incorrect after token refreshes because CodexExecutor.Refresh updates metadata.id_token but does not refresh metadata.plan_type; if a user’s subscription changes (especially downgrade), the stale stored plan can continue to register higher-tier Codex models until the auth file is manually reset. Please reconcile plan_type with the current id_token before returning.
Useful? React with 👍 / 👎.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly hardens the Codex model catalog refresh and registration process. It introduces a dedicated, asynchronously updated catalog with embedded fallback data, ensuring model availability and preventing startup blocks. A core focus is on implementing tier-aware model visibility based on user plan types, which is crucial for security and correct model routing. The changes also enhance robustness by providing intelligent fallbacks for unknown plan types and API keys, and by respecting proxy configurations during network operations. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request significantly hardens the Codex model catalog handling by introducing tier-aware models, asynchronous catalog refreshing, and persisting the user's plan type. However, the implementation introduces several security and reliability weaknesses. The file persistence logic in the filestore is not atomic, which could lead to credential loss during updates. Additionally, the tier-aware restriction can be bypassed due to the proxy falling back to granting access to all models when the plan type is unknown, and its reliance on unverified JWT data. There are also opportunities to improve error handling and logging for better debuggability and robustness.
sdk/auth/filestore.go
Outdated
| if file, errOpen := os.OpenFile(path, os.O_WRONLY|os.O_TRUNC, 0o600); errOpen == nil { | ||
| _, _ = file.Write(raw) | ||
| _ = file.Close() | ||
| } |
There was a problem hiding this comment.
The current logic for updating the Codex plan type in readAuthFile overwrites the authentication file non-atomically using os.O_TRUNC. This poses a significant reliability risk, as an interruption during the write operation could corrupt the file and lead to permanent credential loss. To ensure data integrity and atomicity, it is strongly recommended to write the updated metadata to a temporary file and then use os.Rename to replace the original file. Additionally, the current implementation silently ignores several potential errors during file persistence, including JSON marshaling and file I/O, which could prevent plan_type from being saved and hinder debugging. Robust error handling and logging for these operations would improve the system's resilience and debuggability.
| codexPlanType := registry.ResolveCodexPlanType(a.Attributes, a.Metadata) | ||
| if a.Attributes == nil { | ||
| a.Attributes = make(map[string]string) | ||
| } | ||
| if codexPlanType != "" { | ||
| a.Attributes["plan_type"] = codexPlanType | ||
| models = registry.GetCodexModelsForPlan(codexPlanType) | ||
| } else { | ||
| delete(a.Attributes, "plan_type") | ||
| models = registry.GetCodexModelsUnion() | ||
| } |
There was a problem hiding this comment.
The Codex model registration logic contains a fallback mechanism that bypasses tier-based restrictions. If the plan_type is unrecognized or cannot be determined (e.g., due to a missing or invalid id_token), the proxy defaults to registry.GetCodexModelsUnion(), which grants access to all models across all tiers. This behavior defeats the intended security hardening for tier-aware registration. Additionally, the plan_type is extracted from a JWT that is decoded without signature verification (in internal/registry/codex_catalog.go), making it trivial to forge. It is recommended to use a safe default (e.g., the 'free' model set) when the plan type is unknown and to verify JWT signatures if they are used for access control decisions.
| if err != nil { | ||
| return "", false | ||
| } |
There was a problem hiding this comment.
The error from extractCodexPlanTypeFromJWT is silently ignored. This can make it difficult to debug issues with JWT parsing, such as an invalid format or claims structure. It would be beneficial to log this error at a debug level for better observability.
if err != nil {
log.Debugf("failed to extract codex plan type from JWT: %v", err)
return "", false
}
luispater
left a comment
There was a problem hiding this comment.
Summary
Thanks for the hardening follow-up. The tier-aware registration and fallback direction looks right, but I found one blocking issue and one compatibility issue before merge.
Key findings
- Blocking: startup is still synchronously blocked by
registry.StartModelsUpdater(...)incmd/server/main.go, so the new async Codex refresh does not actually remove the existing network wait on process startup. - Non-blocking: plan aliases like
business,enterprise, andeduare normalized toteamfor routing, but device login still persists the raw JWT plan into the credential filename and metadata, which can create inconsistent filenames and skip the team-account hash naming path.
Test plan
- go build -o test-output ./cmd/server
- go test ./...
Code ReviewThis PR introduces tier-aware model catalog refresh with embedded fallback. The architecture is sound. Strengths
Potential Issues1. Race Condition in Initial LoadThe init() function parses embedded catalog, but StartCodexModelsUpdater runs asynchronously. There is a window where:
The consumer gets the embedded catalog but never receives the onUpdated signal. Consider: func StartCodexModelsUpdater(ctx context.Context, cfg *sdkconfig.SDKConfig, onUpdated func()) {
codexUpdaterOnce.Do(func() {
go func() {
// Always call onUpdated, even if refresh fails
err := refreshCodexModelsCatalog(ctx, cfg)
if err != nil {
log.Warnf("codex models refresh failed, using embedded catalog: %v", err)
}
if onUpdated != nil {
onUpdated()
}
}()
})
}2. No Retry LogicIf the fetch fails, it is never retried. Consider adding exponential backoff: func refreshCodexModelsCatalog(ctx context.Context, cfg *sdkconfig.SDKConfig) error {
const maxRetries = 3
const baseDelay = 1 * time.Second
for attempt := 0; attempt < maxRetries; attempt++ {
if err := tryFetchCodexModels(ctx, cfg); err == nil {
return nil
} else if attempt < maxRetries-1 {
time.Sleep(baseDelay * time.Duration(1<<uint(attempt)))
}
}
return fmt.Errorf("failed after %d retries", maxRetries)
}3. No Health CheckConsumers cannot tell if the catalog is using embedded vs remote data. Consider adding a status endpoint or health check that reports:
4. Memory UsageClone on every getter creates unnecessary allocations. Consider caching clones or returning immutable data structures. Recommendations
Good architecture overall. Just needs some operational hardening. |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Addressed the two blocking points from the latest review. What changed
Tests
Additional coverage added
Please take another look when convenient. |
|
Code Review: Overview: Strengths:
Issues:
Recommendations:
Test Coverage: |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Follow-up on xkonjin's review: I picked up the two actionable points from that comment set. Updated
On the nil-check point: Validation
|
Summary
This is a follow-up hardening PR for #2013.
It keeps the network-backed Codex model catalog direction, but fixes the parts that make model visibility and routing unsafe or inconsistent.
What this changes
plan_typefrom Codex OAuth metadata /id_tokenand persists it back to auth filesplan_typeis actually knownplan_typeis unknown, preserving manual model routing compatibilitycodexmodel lookup search the full Codex union instead of a single implicit tierWhy this is needed
After #2013, the model catalog direction is correct, but there are still a few sharp edges:
chatgpt_plan_type, so applying OAuth tier logic to them breaks existing manual model selection in config.This PR fixes those issues without changing the overall network-catalog design.
Behavior after this PR
codex.models[]config still has highest priorityplan_type=> tier-aware registrationplan_type=> full Codex union fallbackmodels[]keep manual model-name compatibilityTests
Added coverage for:
plan_typefrom Codexid_tokenmodels[]are configuredmodels[]overriding union fallback