fix(management): harden state updates and config limits#1923
fix(management): harden state updates and config limits#1923shenshuoyaoyouguang wants to merge 3 commits intorouter-for-me:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly hardens the management API's state handling and configuration limits. It introduces robust concurrency controls to prevent state corruption during concurrent updates and enforces maximum values for critical configuration settings like log file sizes. These changes improve the overall stability and security of the application by reducing the risk of unexpected behavior due to race conditions or invalid configurations, while also providing better diagnostic information for misconfigured authentication. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces significant hardening for management state mutations and configuration limits, improving robustness and security. While the introduction of stateMu and StateMiddleware for serialized access and enhanced configuration validation are positive steps, a potential Denial of Service (DoS) vulnerability was identified. This vulnerability stems from middleware ordering in management routes, where state locking occurs before authentication, which could allow unauthenticated users to tie up management resources. Furthermore, a critical race condition and a likely compilation error were found in handler.go, and a constant in config.go has a suspiciously large value.
| func (h *Handler) effectiveAuthDir() string { | ||
| if h == nil { | ||
| return "" | ||
| } | ||
| store := h.tokenStoreWithBaseDir() | ||
| if h.cfg == nil { | ||
| return ResolveEffectiveAuthDir("", store) | ||
| } | ||
| return ResolveEffectiveAuthDir(h.cfg.AuthDir, store) | ||
| } |
There was a problem hiding this comment.
This function has a potential race condition and a reference to an undefined method.
- It reads
h.cfgwithout acquiring a lock onh.stateMu. Since other parts of the code modifyh.cfgunder a write lock (e.g.,SetConfig), this is a race condition. A read lock (h.stateMu.RLock()) should be used to ensure thread-safe access. - The method
h.tokenStoreWithBaseDir()is not defined in the provided code. It seems likely that this should beh.tokenStore.
| func (h *Handler) effectiveAuthDir() string { | |
| if h == nil { | |
| return "" | |
| } | |
| store := h.tokenStoreWithBaseDir() | |
| if h.cfg == nil { | |
| return ResolveEffectiveAuthDir("", store) | |
| } | |
| return ResolveEffectiveAuthDir(h.cfg.AuthDir, store) | |
| } | |
| func (h *Handler) effectiveAuthDir() string { | |
| if h == nil { | |
| return "" | |
| } | |
| h.stateMu.RLock() | |
| defer h.stateMu.RUnlock() | |
| store := h.tokenStore | |
| if h.cfg == nil { | |
| return ResolveEffectiveAuthDir("", store) | |
| } | |
| return ResolveEffectiveAuthDir(h.cfg.AuthDir, store) | |
| } |
There was a problem hiding this comment.
Addressed in 81f497b. effectiveAuthDir now reads handler state under RLock, and I also snapshot the management auth/config values before request auth checks. The tokenStoreWithBaseDir helper already existed in auth_files.go, so that part of the warning was a false positive rather than a missing implementation.
internal/api/server.go
Outdated
|
|
||
| mgmt := s.engine.Group("/v0/management") | ||
| mgmt.Use(s.managementAvailabilityMiddleware(), s.mgmt.Middleware()) | ||
| mgmt.Use(s.managementAvailabilityMiddleware(), s.mgmt.StateMiddleware(), s.mgmt.Middleware()) |
There was a problem hiding this comment.
The StateMiddleware, which acquires an exclusive lock on the management state, is placed before the Middleware which handles authentication. This allows an unauthenticated attacker to acquire the lock by sending requests to management endpoints, potentially causing a Denial of Service (DoS) on the management API. Furthermore, since StateMiddleware is held during the entire request processing, an attacker can send a large request body to endpoints like PutConfigYAML to hold the lock for an extended period, blocking legitimate administrative access.
There was a problem hiding this comment.
Fixed in 81f497b. I reordered the management middleware chain so auth/availability run before StateMiddleware, preventing unauthenticated requests from taking the exclusive management state lock.
internal/config/config.go
Outdated
| const ( | ||
| DefaultPanelGitHubRepository = "https://github.com/router-for-me/Cli-Proxy-API-Management-Center" | ||
| DefaultPprofAddr = "127.0.0.1:8316" | ||
| MaxLogsMaxTotalSizeMB = 1024 * 1024 |
There was a problem hiding this comment.
Fixed in 81f497b. I tightened MaxLogsMaxTotalSizeMB from ~1 TB down to 1024 MB so the management-side clamp now enforces a realistic upper bound.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request significantly hardens the management API by introducing proper state synchronization using mutexes, enforcing configuration limits, and improving the robustness of the server and authentication middleware. However, a potential path traversal vulnerability was identified in the newly introduced writePendingOAuthCallbackFile function, where the state parameter from user-supplied query strings is used in file-writing operations without apparent sanitization, which could allow an attacker to perform arbitrary file writes via path traversal sequences. Additionally, consider improving the robustness of the new locking mechanism by using defer for unlocking mutexes, a common Go idiom for safer concurrency.
internal/api/server.go
Outdated
| func writePendingOAuthCallbackFile(configAuthDir, provider, state, code, errStr string) { | ||
| authDir := managementHandlers.ResolveEffectiveAuthDir(configAuthDir, sdkAuth.GetTokenStore()) | ||
| _, _ = managementHandlers.WriteOAuthCallbackFileForPendingSession(authDir, provider, state, code, errStr) | ||
| } |
There was a problem hiding this comment.
The state parameter, which originates from unvalidated user input in OAuth callback routes (e.g., line 381), is passed to WriteOAuthCallbackFileForPendingSession via the writePendingOAuthCallbackFile wrapper. This function likely uses the state value to construct a filename within the authDir. Without proper sanitization, an attacker could provide a malicious state value containing path traversal sequences (e.g., ../../) to write or overwrite files outside the intended directory. It is recommended to sanitize the state parameter to ensure it only contains expected alphanumeric characters before using it in file system operations.
There was a problem hiding this comment.
Good catch. This is now guarded in the latest head: both the management callback endpoint and the top-level callback routes go through WriteOAuthCallbackFileForPendingSession(...), and the write path still validates state with ValidateOAuthState(...) before constructing the callback filename.
The latest fix also binds the effective auth dir to the OAuth session itself, so the callback writer no longer reuses a raw config path while the waiting side is polling another location. I also added regression coverage for the callback path in the latest update.
| h.stateMu.Lock() | ||
| h.cfg = cfg | ||
| h.stateMu.Unlock() |
There was a problem hiding this comment.
Using defer to unlock a mutex is a common Go idiom that makes the code more robust. It ensures the mutex is unlocked even if the function panics or has multiple return paths added in the future.
This suggestion also applies to SetAuthManager, SetUsageStatistics, SetLocalPassword, SetLogDirectory, and SetPostAuthHook.
| h.stateMu.Lock() | |
| h.cfg = cfg | |
| h.stateMu.Unlock() | |
| h.stateMu.Lock() | |
| defer h.stateMu.Unlock() | |
| h.cfg = cfg |
There was a problem hiding this comment.
Agreed — using defer here is more idiomatic and safer against future early-return edits.
I kept the latest patch scoped to the blocking auth-dir inconsistency so I did not mix a style/cleanup-only change into the same fix, but I agree with the suggestion and can fold this cleanup into a follow-up if needed.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request effectively hardens the management API by introducing state protection with mutexes and enforcing configuration limits, significantly improving the robustness and safety of the server's management functionality. The changes are well-structured and include valuable regression tests. However, two security vulnerabilities were identified: the configuration file is stored with insecure (world-readable) permissions, potentially exposing sensitive API keys, and the management middleware's reliance on c.ClientIP() for access control is susceptible to IP spoofing if trusted proxies are not properly configured. Additionally, the locking strategy in the new StateMiddleware could be refined to improve performance and availability of the management endpoints.
| func (h *Handler) StateMiddleware() gin.HandlerFunc { | ||
| return func(c *gin.Context) { | ||
| h.stateMu.Lock() | ||
| defer h.stateMu.Unlock() | ||
| c.Next() | ||
| } |
There was a problem hiding this comment.
The StateMiddleware uses a write lock (h.stateMu.Lock()) for all management API routes. While this is safe, it is overly restrictive for read-only (GET) endpoints and can lead to performance and availability issues. For example, a slow GET request like GetLatestVersion (which makes an external network call) would hold a write lock for its entire duration, blocking all other management API calls and critical operations like server hot-reloads (SetConfig).
To improve this, I suggest using a more granular locking strategy:
- Rename
StateMiddlewaretoWriteStateMiddleware. - Create a new
ReadStateMiddlewarethat uses a read lock (h.stateMu.RLock()). - In
internal/api/server.go, applyReadStateMiddlewaretoGETroutes andWriteStateMiddlewareto mutation routes (PUT,POST,PATCH,DELETE) individually, instead of applying one middleware to the entire group.
This would allow concurrent execution of read-only handlers, improving throughput and preventing long-running read operations from blocking state mutations.
There was a problem hiding this comment.
Agreed on the tradeoff. In this PR I intentionally kept the first step focused on correctness and state consistency under hot reload / mutation paths.
I would prefer to split read/write middleware in a follow-up PR because it changes route wiring across the management surface and is a broader behavior change than the current bug fix. The latest head already keeps auth/availability ahead of state locking, so unauthenticated requests no longer take the lock, but I agree GET vs mutation lock separation is the right next refinement for throughput and availability.
| clientIP := c.ClientIP() | ||
| localClient := clientIP == "127.0.0.1" || clientIP == "::1" |
There was a problem hiding this comment.
The management middleware relies on c.ClientIP() to identify local clients and bypass remote management restrictions. In many production environments, ClientIP() can be spoofed via HTTP headers (e.g., X-Forwarded-For) if the application is behind a proxy and Gin's trusted proxies are not correctly configured. This could allow remote attackers to bypass the allow-remote-management security control.
There was a problem hiding this comment.
Agreed — this check depends on Gin trusted proxy configuration and is not ideal as a hard security boundary on its own.
I kept the current PR scoped to state/callback correctness, but I agree the management locality check should be hardened in a follow-up so it does not rely on potentially forwarded client IP headers in misconfigured proxy deployments.
|
Review Navigation: This PR mainly focuses on management state / config hardening. Compared to others, this has broader design implications, suggested for second-tier review priority; current CI has passed, key sequencing and state access issues mentioned in earlier automated reviews have been resolved in subsequent commits. |
luispater
left a comment
There was a problem hiding this comment.
Summary:
The state-mutation hardening and config limit coverage look good overall, but I found one blocking issue in the OAuth callback path.
Key findings:
- Blocking: the new effective auth-dir handling is only applied to the top-level callback writers (
internal/api/server.go:46), while the management callback endpoint still writes toh.cfg.AuthDir(internal/api/handlers/management/oauth_callback.go:90) and the waiting goroutines still pollh.cfg.AuthDir(internal/api/handlers/management/auth_files.go:1054,:1190,:1458). For mirrored stores that overrideAuthDir()(for example Postgres/Object-backed stores), this means the callback is written to one directory while the login flow waits in another, so OAuth login from the TUI path (internal/tui/oauth_tab.go:299) still breaks. - Non-blocking: please run gofmt on the touched files before merge.
Test plan:
- go test ./internal/api/handlers/management ./internal/api ./internal/config
|
@luispater Thanks for the careful review — you were right about the remaining auth-dir inconsistency in the OAuth callback flow. I fixed it by binding the effective auth directory to the OAuth session itself, so the callback writer and the waiting goroutine now always use the same session-scoped auth dir instead of re-reading Concretely:
This closes the mirrored-store case you called out, where I also added regression coverage for:
Validation:
I also ran gofmt on the touched files. |
|
/gemini review |
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
Summary
Business value
Tests