Skip to content

fix(management): harden state updates and config limits#1923

Open
shenshuoyaoyouguang wants to merge 3 commits intorouter-for-me:mainfrom
shenshuoyaoyouguang:pr/management-config-limits
Open

fix(management): harden state updates and config limits#1923
shenshuoyaoyouguang wants to merge 3 commits intorouter-for-me:mainfrom
shenshuoyaoyouguang:pr/management-config-limits

Conversation

@shenshuoyaoyouguang
Copy link
Contributor

Summary

  • harden management state mutation paths
  • enforce and test configuration limits in management/server flows
  • add regression coverage for handler state and server config boundaries

Business value

  • reduces management-side state corruption risk
  • gives maintainers a self-contained PR for config safety instead of mixing it with auth/watcher changes
  • improves rollback safety because the scope stays within management/config handling

Tests

  • go test ./internal/api/handlers/management ./internal/api ./internal/config

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly hardens the management API's state handling and configuration limits. It introduces robust concurrency controls to prevent state corruption during concurrent updates and enforces maximum values for critical configuration settings like log file sizes. These changes improve the overall stability and security of the application by reducing the risk of unexpected behavior due to race conditions or invalid configurations, while also providing better diagnostic information for misconfigured authentication.

Highlights

  • State Synchronization: Introduced a new sync.RWMutex (stateMu) in the management handler to serialize access to runtime state, preventing race conditions during concurrent updates.
  • Configuration Limits Enforcement: Implemented validation and clamping for LogsMaxTotalSizeMB configuration, both at the API endpoint and during configuration loading, to enforce defined maximum limits.
  • Robust Server Initialization: Improved server initialization by ensuring a default access manager is created if none is provided, enhancing application robustness.
  • Enhanced Authentication Middleware: Enhanced error handling in AuthMiddleware to explicitly reject requests with a 500 Internal Server Error and log a warning if the access manager is unconfigured.
  • Comprehensive Test Coverage: Added comprehensive regression tests for state serialization, configuration limits, and access manager initialization to ensure stability and correctness.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • internal/api/handlers/management/config_basic.go
    • Added validation to PutLogsMaxTotalSizeMB to reject values exceeding a defined maximum.
    • Added a trailing newline character to the file.
  • internal/api/handlers/management/handler.go
    • Introduced stateMu (a sync.RWMutex) for thread-safe access to handler state.
    • Added authDirProvider interface and ResolveEffectiveAuthDir function for dynamic authentication directory resolution.
    • Implemented StateMiddleware to serialize state access for management endpoints.
    • Updated state-modifying methods (SetConfig, SetAuthManager, SetUsageStatistics, SetLocalPassword, SetLogDirectory, SetPostAuthHook) to use stateMu for synchronization.
    • Added effectiveAuthDir method to resolve the authentication directory.
  • internal/api/handlers/management/handler_state_test.go
    • Added a new test file for management handler state.
    • Added TestStateMiddleware_BlocksConcurrentSetConfig to ensure concurrent SetConfig calls are blocked by the state middleware.
    • Added TestPutLogsMaxTotalSizeMB_RejectsOversizedValue to verify that the API rejects configuration values exceeding the maximum allowed size.
  • internal/api/server.go
    • Imported the configaccess package.
    • Created a writePendingOAuthCallbackFile helper function for consistent OAuth callback file writing.
    • Modified NewServer to initialize a default accessManager if a nil manager is provided.
    • Applied StateMiddleware to the management routes group to ensure synchronized state access.
    • Updated AuthMiddleware to handle a nil manager by returning a 500 Internal Server Error and logging the issue.
  • internal/api/server_test.go
    • Added TestAuthMiddleware_NilManagerRejectsRequest to test the updated AuthMiddleware behavior when the manager is nil.
    • Added TestNewServer_CreatesAccessManagerWhenNil to verify that NewServer correctly initializes an accessManager if none is provided.
  • internal/config/config.go
    • Defined MaxLogsMaxTotalSizeMB constant to specify the maximum allowed log file size.
    • Implemented clamping logic for LogsMaxTotalSizeMB during configuration loading to ensure it does not exceed the defined maximum.
    • Added a trailing newline character to the file.
  • internal/config/config_limits_test.go
    • Added a new test file for configuration limits.
    • Added TestLoadConfig_ClampsLogsMaxTotalSizeMB to confirm that LogsMaxTotalSizeMB is correctly clamped during configuration loading.
Activity
  • No human activity (comments, reviews, etc.) has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant hardening for management state mutations and configuration limits, improving robustness and security. While the introduction of stateMu and StateMiddleware for serialized access and enhanced configuration validation are positive steps, a potential Denial of Service (DoS) vulnerability was identified. This vulnerability stems from middleware ordering in management routes, where state locking occurs before authentication, which could allow unauthenticated users to tie up management resources. Furthermore, a critical race condition and a likely compilation error were found in handler.go, and a constant in config.go has a suspiciously large value.

Comment on lines +183 to 192
func (h *Handler) effectiveAuthDir() string {
if h == nil {
return ""
}
store := h.tokenStoreWithBaseDir()
if h.cfg == nil {
return ResolveEffectiveAuthDir("", store)
}
return ResolveEffectiveAuthDir(h.cfg.AuthDir, store)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This function has a potential race condition and a reference to an undefined method.

  1. It reads h.cfg without acquiring a lock on h.stateMu. Since other parts of the code modify h.cfg under a write lock (e.g., SetConfig), this is a race condition. A read lock (h.stateMu.RLock()) should be used to ensure thread-safe access.
  2. The method h.tokenStoreWithBaseDir() is not defined in the provided code. It seems likely that this should be h.tokenStore.
Suggested change
func (h *Handler) effectiveAuthDir() string {
if h == nil {
return ""
}
store := h.tokenStoreWithBaseDir()
if h.cfg == nil {
return ResolveEffectiveAuthDir("", store)
}
return ResolveEffectiveAuthDir(h.cfg.AuthDir, store)
}
func (h *Handler) effectiveAuthDir() string {
if h == nil {
return ""
}
h.stateMu.RLock()
defer h.stateMu.RUnlock()
store := h.tokenStore
if h.cfg == nil {
return ResolveEffectiveAuthDir("", store)
}
return ResolveEffectiveAuthDir(h.cfg.AuthDir, store)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 81f497b. effectiveAuthDir now reads handler state under RLock, and I also snapshot the management auth/config values before request auth checks. The tokenStoreWithBaseDir helper already existed in auth_files.go, so that part of the warning was a false positive rather than a missing implementation.


mgmt := s.engine.Group("/v0/management")
mgmt.Use(s.managementAvailabilityMiddleware(), s.mgmt.Middleware())
mgmt.Use(s.managementAvailabilityMiddleware(), s.mgmt.StateMiddleware(), s.mgmt.Middleware())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The StateMiddleware, which acquires an exclusive lock on the management state, is placed before the Middleware which handles authentication. This allows an unauthenticated attacker to acquire the lock by sending requests to management endpoints, potentially causing a Denial of Service (DoS) on the management API. Furthermore, since StateMiddleware is held during the entire request processing, an attacker can send a large request body to endpoints like PutConfigYAML to hold the lock for an extended period, blocking legitimate administrative access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 81f497b. I reordered the management middleware chain so auth/availability run before StateMiddleware, preventing unauthenticated requests from taking the exclusive management state lock.

const (
DefaultPanelGitHubRepository = "https://github.com/router-for-me/Cli-Proxy-API-Management-Center"
DefaultPprofAddr = "127.0.0.1:8316"
MaxLogsMaxTotalSizeMB = 1024 * 1024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The value of MaxLogsMaxTotalSizeMB is set to 1024 * 1024, which is 1,048,576 MB (or ~1 TB). This seems excessively large for a log size limit. Is this value intentional, or was it perhaps intended to be a smaller value, for example 1024 (for 1 GB)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 81f497b. I tightened MaxLogsMaxTotalSizeMB from ~1 TB down to 1024 MB so the management-side clamp now enforces a realistic upper bound.

@shenshuoyaoyouguang shenshuoyaoyouguang marked this pull request as ready for review March 7, 2026 09:39
@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@shenshuoyaoyouguang
Copy link
Contributor Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly hardens the management API by introducing proper state synchronization using mutexes, enforcing configuration limits, and improving the robustness of the server and authentication middleware. However, a potential path traversal vulnerability was identified in the newly introduced writePendingOAuthCallbackFile function, where the state parameter from user-supplied query strings is used in file-writing operations without apparent sanitization, which could allow an attacker to perform arbitrary file writes via path traversal sequences. Additionally, consider improving the robustness of the new locking mechanism by using defer for unlocking mutexes, a common Go idiom for safer concurrency.

Comment on lines +46 to +49
func writePendingOAuthCallbackFile(configAuthDir, provider, state, code, errStr string) {
authDir := managementHandlers.ResolveEffectiveAuthDir(configAuthDir, sdkAuth.GetTokenStore())
_, _ = managementHandlers.WriteOAuthCallbackFileForPendingSession(authDir, provider, state, code, errStr)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The state parameter, which originates from unvalidated user input in OAuth callback routes (e.g., line 381), is passed to WriteOAuthCallbackFileForPendingSession via the writePendingOAuthCallbackFile wrapper. This function likely uses the state value to construct a filename within the authDir. Without proper sanitization, an attacker could provide a malicious state value containing path traversal sequences (e.g., ../../) to write or overwrite files outside the intended directory. It is recommended to sanitize the state parameter to ensure it only contains expected alphanumeric characters before using it in file system operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This is now guarded in the latest head: both the management callback endpoint and the top-level callback routes go through WriteOAuthCallbackFileForPendingSession(...), and the write path still validates state with ValidateOAuthState(...) before constructing the callback filename.

The latest fix also binds the effective auth dir to the OAuth session itself, so the callback writer no longer reuses a raw config path while the waiting side is polling another location. I also added regression coverage for the callback path in the latest update.

Comment on lines +135 to +137
h.stateMu.Lock()
h.cfg = cfg
h.stateMu.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using defer to unlock a mutex is a common Go idiom that makes the code more robust. It ensures the mutex is unlocked even if the function panics or has multiple return paths added in the future.

This suggestion also applies to SetAuthManager, SetUsageStatistics, SetLocalPassword, SetLogDirectory, and SetPostAuthHook.

Suggested change
h.stateMu.Lock()
h.cfg = cfg
h.stateMu.Unlock()
h.stateMu.Lock()
defer h.stateMu.Unlock()
h.cfg = cfg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — using defer here is more idiomatic and safer against future early-return edits.

I kept the latest patch scoped to the blocking auth-dir inconsistency so I did not mix a style/cleanup-only change into the same fix, but I agree with the suggestion and can fold this cleanup into a follow-up if needed.

@shenshuoyaoyouguang
Copy link
Contributor Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively hardens the management API by introducing state protection with mutexes and enforcing configuration limits, significantly improving the robustness and safety of the server's management functionality. The changes are well-structured and include valuable regression tests. However, two security vulnerabilities were identified: the configuration file is stored with insecure (world-readable) permissions, potentially exposing sensitive API keys, and the management middleware's reliance on c.ClientIP() for access control is susceptible to IP spoofing if trusted proxies are not properly configured. Additionally, the locking strategy in the new StateMiddleware could be refined to improve performance and availability of the management endpoints.

Comment on lines +125 to +130
func (h *Handler) StateMiddleware() gin.HandlerFunc {
return func(c *gin.Context) {
h.stateMu.Lock()
defer h.stateMu.Unlock()
c.Next()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The StateMiddleware uses a write lock (h.stateMu.Lock()) for all management API routes. While this is safe, it is overly restrictive for read-only (GET) endpoints and can lead to performance and availability issues. For example, a slow GET request like GetLatestVersion (which makes an external network call) would hold a write lock for its entire duration, blocking all other management API calls and critical operations like server hot-reloads (SetConfig).

To improve this, I suggest using a more granular locking strategy:

  1. Rename StateMiddleware to WriteStateMiddleware.
  2. Create a new ReadStateMiddleware that uses a read lock (h.stateMu.RLock()).
  3. In internal/api/server.go, apply ReadStateMiddleware to GET routes and WriteStateMiddleware to mutation routes (PUT, POST, PATCH, DELETE) individually, instead of applying one middleware to the entire group.

This would allow concurrent execution of read-only handlers, improving throughput and preventing long-running read operations from blocking state mutations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on the tradeoff. In this PR I intentionally kept the first step focused on correctness and state consistency under hot reload / mutation paths.

I would prefer to split read/write middleware in a follow-up PR because it changes route wiring across the management surface and is a broader behavior change than the current bug fix. The latest head already keeps auth/availability ahead of state locking, so unauthenticated requests no longer take the lock, but I agree GET vs mutation lock separation is the right next refinement for throughput and availability.

Comment on lines 208 to 209
clientIP := c.ClientIP()
localClient := clientIP == "127.0.0.1" || clientIP == "::1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The management middleware relies on c.ClientIP() to identify local clients and bypass remote management restrictions. In many production environments, ClientIP() can be spoofed via HTTP headers (e.g., X-Forwarded-For) if the application is behind a proxy and Gin's trusted proxies are not correctly configured. This could allow remote attackers to bypass the allow-remote-management security control.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — this check depends on Gin trusted proxy configuration and is not ideal as a hard security boundary on its own.

I kept the current PR scoped to state/callback correctness, but I agree the management locality check should be hardened in a follow-up so it does not rely on potentially forwarded client IP headers in misconfigured proxy deployments.

@shenshuoyaoyouguang
Copy link
Contributor Author

Review Navigation: This PR mainly focuses on management state / config hardening. Compared to others, this has broader design implications, suggested for second-tier review priority; current CI has passed, key sequencing and state access issues mentioned in earlier automated reviews have been resolved in subsequent commits.

Copy link
Collaborator

@luispater luispater left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary:
The state-mutation hardening and config limit coverage look good overall, but I found one blocking issue in the OAuth callback path.

Key findings:

  • Blocking: the new effective auth-dir handling is only applied to the top-level callback writers (internal/api/server.go:46), while the management callback endpoint still writes to h.cfg.AuthDir (internal/api/handlers/management/oauth_callback.go:90) and the waiting goroutines still poll h.cfg.AuthDir (internal/api/handlers/management/auth_files.go:1054, :1190, :1458). For mirrored stores that override AuthDir() (for example Postgres/Object-backed stores), this means the callback is written to one directory while the login flow waits in another, so OAuth login from the TUI path (internal/tui/oauth_tab.go:299) still breaks.
  • Non-blocking: please run gofmt on the touched files before merge.

Test plan:

  • go test ./internal/api/handlers/management ./internal/api ./internal/config

@shenshuoyaoyouguang
Copy link
Contributor Author

@luispater Thanks for the careful review — you were right about the remaining auth-dir inconsistency in the OAuth callback flow.

I fixed it by binding the effective auth directory to the OAuth session itself, so the callback writer and the waiting goroutine now always use the same session-scoped auth dir instead of re-reading cfg.AuthDir at different points.

Concretely:

  • OAuth sessions now store the resolved effective auth dir at registration time
  • management callback persistence now writes via the session-bound auth dir
  • top-level callback routes also write via the session-bound auth dir
  • waiting goroutines for Anthropic / Gemini / Codex / Antigravity / iFlow now read from that same session-bound auth dir

This closes the mirrored-store case you called out, where AuthDir() can differ from cfg.AuthDir and the callback file could previously be written to one location while the login flow was polling another.

I also added regression coverage for:

  • management OAuth callback writing to the session auth dir
  • top-level callback routes writing to the session auth dir

Validation:

  • go test ./internal/api/handlers/management ./internal/api ./internal/config

I also ran gofmt on the touched files.

@shenshuoyaoyouguang
Copy link
Contributor Author

/gemini review

@gemini-code-assist
Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants