Skip to content

fix(auth): persist refresh status/message and reduce redundant refresh attempts#2052

Open
xx299x wants to merge 15 commits intorouter-for-me:mainfrom
xx299x:fix/refresh-message
Open

fix(auth): persist refresh status/message and reduce redundant refresh attempts#2052
xx299x wants to merge 15 commits intorouter-for-me:mainfrom
xx299x:fix/refresh-message

Conversation

@xx299x
Copy link

@xx299x xx299x commented Mar 11, 2026

  • Persist a standardized refresh_message (and refresh_status) when Codex token refresh fails, so main exposes consistent, actionable failure context via the management API/UI.
  • Avoid repeated refresh work after a permanent failure by short-circuiting subsequent refresh attempts, improving stability and reducing wasted concurrent refresh traffic on main.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness and observability of Codex token refresh mechanisms. It ensures that permanent refresh failures are clearly communicated through the management API and UI by persisting detailed status and messages. Furthermore, it optimizes the refresh process by preventing repeated attempts for tokens that have already failed permanently, and introduces configurable controls for refresh concurrency and burst timing, leading to a more stable and efficient system.

Highlights

  • Enhanced Error Reporting: Standardized and persisted refresh_message and refresh_status for Codex token refresh failures, making failure context accessible via the management API and UI.
  • Reduced Redundant Refreshes: Implemented logic to short-circuit subsequent token refresh attempts for tokens that have permanently failed, improving stability and reducing unnecessary traffic.
  • Configurable Refresh Behavior: Introduced RefreshConcurrency and RefreshBurstDelay configuration options to control the rate and parallelism of token refresh operations.
  • Expanded Non-Retryable Error Conditions: Updated the IsNonRetryableRefreshErr function to include HTTP 401 status codes as non-retryable refresh errors.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • internal/api/handlers/management/auth_files.go
    • Added refresh_status and refresh_message from auth.Metadata to the auth entry returned by buildAuthFileEntry.
  • internal/auth/codex/openai_auth.go
    • Renamed isNonRetryableRefreshErr to IsNonRetryableRefreshErr (making it public).
    • Extended IsNonRetryableRefreshErr to consider "status 401" as a non-retryable error.
  • internal/auth/codex/token.go
    • Added RefreshStatus and RefreshMessage fields to the CodexTokenStorage struct, with omitempty JSON tags.
  • internal/config/config.go
    • Added RefreshConcurrency and RefreshBurstDelay fields to the Config struct.
    • Set default values for RefreshConcurrency (16) and RefreshBurstDelay (0) during config loading.
  • internal/runtime/executor/codex_executor.go
    • Imported encoding/json and os packages.
    • Added a check in Refresh to skip refresh attempts if refresh_status in metadata is "failed".
    • Modified Refresh to persist refresh_status as "failed" and refresh_message with the error details to the auth's metadata and the corresponding auth file if a non-retryable error occurs.
    • Updated the auth.Status and auth.StatusMessage for UI/API when a permanent refresh failure happens.
  • sdk/cliproxy/auth/conductor.go
    • Modified NewManager to initialize refreshSemaphore with a dynamically configured concurrency value instead of a fixed constant.
    • Introduced getRefreshConcurrency and getRefreshBurstDelay methods to retrieve refresh configuration from the runtime config.
    • Implemented a burst delay mechanism in checkRefreshes before starting individual refresh goroutines.
    • Added logic in refreshAuthWithLimit to dynamically adjust the capacity of the refreshSemaphore if the configured concurrency changes.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the token refresh mechanism by persisting failure states to avoid redundant retries for permanently failed tokens and exposing this status via the management API. It also introduces configuration for refresh concurrency and burst delays. My review focuses on the correctness and robustness of these changes. I've identified a critical issue with direct file persistence that bypasses error handling and a medium-severity issue related to brittle error message string matching. The changes to make refresh behavior configurable are well-implemented.

Copy link
Collaborator

@luispater luispater left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary
This change improves Codex refresh error classification, but the current implementation does not reliably persist the new refresh failure metadata and it also treats permanent refresh failures as successful refreshes in the manager flow.

Key findings

  • Blocking: refresh_status / refresh_message are not persisted consistently across store backends. FileTokenStore injects auth.Metadata into auth.Storage before SaveTokenToFile, but GitTokenStore, ObjectTokenStore, and PostgresStore still write auth.Storage directly. In those backends, Codex auths that still carry auth.Storage will drop the new metadata instead of persisting it.
  • Blocking: CodexExecutor.Refresh returns (auth, nil) for non-retryable refresh failures and for already-failed auths. Manager.refreshAuth treats any nil error as a successful refresh, updates LastRefreshedAt, clears LastError, and persists the auth. That breaks the documented meaning of LastRefreshedAt as the last successful refresh and also means expired permanent-failure auths will keep re-entering the refresh loop.

Test plan

  • Reviewed the PR diff and the affected persistence / refresh paths.
  • Ran go test ./... on the PR head.

@xx299x xx299x requested a review from luispater March 11, 2026 13:21
@xkonjin
Copy link

xkonjin commented Mar 11, 2026

Code Review

Solid improvement to Codex refresh error classification and persistence. Found two blocking issues.

Blocking Issues

1. Inconsistent metadata persistence across backends

refresh_status / refresh_message are injected into auth.Storage in FileTokenStore:

auth.Metadata["refresh_status"] = status
auth.Metadata["refresh_message"] = msg
saveTokenToFile(auth.Storage)

But other backends write auth.Storage directly:

  • GitTokenStore.SaveToken(auth.Storage)
  • ObjectTokenStore.Put(auth.Storage)
  • PostgresStore.SaveToken(auth.Storage)

These backends will DROP the new metadata instead of persisting it.

Fix: Ensure all backends include auth.Metadata when persisting, or unify persistence through a common layer.

2. Permanent failures treated as successful refreshes

CodexExecutor.Refresh:

if isNonRetryableRefreshError(err) || auth.RefreshStatus == RefreshStatusPermanentFailure {
    return auth, nil  // Returns nil error!
}

Manager.refreshAuth:

auth, err := executor.Refresh(ctx, cred.Auth)
if err == nil {  // nil error = success
    cred.Auth.LastRefreshedAt = now
    cred.Auth.LastError = ""
    saveCredential(...)
}

Problems:

  1. Permanent failures update LastRefreshedAt as if successful, breaking its documented meaning
  2. Failed auths will keep re-entering the refresh loop every refresh-interval since LastError is cleared
  3. The persisted "permanent failure" state gets overwritten with "success"

Fix: Return an error from Refresh for non-retryable/permanent failures, so Manager.refreshAuth correctly handles them.

Summary

The direction is right, but these two issues break the durability and semantics of refresh failure tracking. Fix persistence consistency and error handling before merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants