feat(cache): add OAuth client cache with redis-aside support (#155)
* feat(cache): add OAuth client cache with redis-aside support
Add a new Cache[OAuthApplication] instance that caches client lookups
by client_id using the cache-aside pattern. store.GetClient() is called
20+ times across all OAuth flows (device code, authorization code,
token exchange, client credentials) — this was the hottest uncached
DB query path.
Key design decisions:
- GetClient() returns cached copy with ClientSecret stripped (defense-in-depth)
- GetClientWithSecret() bypasses cache for secret-verification flows
- Explicit invalidation on all mutations (create, update, delete,
approve, reject, secret regeneration)
- Inject ClientService into DeviceService, TokenService, and
AuthorizationService to replace direct store.GetClient() calls
Configuration: CLIENT_CACHE_TYPE, CLIENT_CACHE_TTL (5m default),
CLIENT_CACHE_CLIENT_TTL (30s), CLIENT_CACHE_SIZE_PER_CONN (32MB)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(cache): use closure vars in fetchFunc and add DB fallback on cache errors
- Use clientID/hash closure variables instead of key param in GetWithFetch
fetchFuncs to avoid using redis-aside prefixed keys for DB lookups
- Add cache-error fallback in GetClient to distinguish infrastructure
failures from genuine not-found, mirroring getAccessTokenByHash pattern
- Apply same prefixed-key fix to getAccessTokenByHash in TokenService
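The bug class fixed here can be shown with a small sketch (all names are illustrative; this only models a `GetWithFetch` that hands its callback a cache-prefixed key, as a redis-aside backend might):

```go
package main

import "fmt"

// asideCache models a backend whose GetWithFetch passes the *prefixed*
// cache key to the fetch callback rather than the raw identifier.
type asideCache struct {
	prefix string
	data   map[string]string
}

func (c *asideCache) GetWithFetch(key string, fetch func(prefixedKey string) (string, error)) (string, error) {
	pk := c.prefix + key
	if v, ok := c.data[pk]; ok {
		return v, nil
	}
	v, err := fetch(pk) // callback receives "oauth:client:<id>", not "<id>"
	if err != nil {
		return "", err
	}
	c.data[pk] = v
	return v, nil
}

func main() {
	db := map[string]string{"client-1": "My App"} // DB keyed by the raw client_id
	c := &asideCache{prefix: "oauth:client:", data: map[string]string{}}

	clientID := "client-1"
	name, err := c.GetWithFetch(clientID, func(key string) (string, error) {
		// Using `key` here would look up "oauth:client:client-1" in the DB and
		// miss; the clientID closure variable keeps the DB lookup unprefixed.
		if v, ok := db[clientID]; ok {
			return v, nil
		}
		return "", fmt.Errorf("client %s not found", clientID)
	})
	fmt.Println(name, err) // My App <nil>
}
```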
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(cache): move GetClient fallback rationale to doc comment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(cache): remove unreachable DB fallback in GetClient
fetchThrough already calls the fetch function on any cache Get error,
so the explicit fallback path could never execute for cache backend
failures. When the DB itself fails, calling it twice is wasteful.
Remove the dead fallback and drop the now-unused gorm import.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(cache): restore DB fallback in GetClient for redis-aside outages
- Restore gorm.ErrRecordNotFound check and DB fallback in GetClient
- RueidisAsideCache.GetWithFetch can return an error without calling
fetchFunc when Redis/RESP3 is unavailable, so the fallback is needed
to avoid treating infrastructure failures as "client not found"
- Add tests: secret stripping, cache hit (fetchFunc called once),
cache invalidation on UpdateClient and RegenerateSecret
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style(services): wrap long function signature in client_test.go
- Break GetWithFetch method signature across multiple lines to satisfy golines formatter
* fix(services): distinguish store errors from cache-backend errors in GetClient
- Wrap fetchFunc store errors with clientFetchErr sentinel to prevent redundant DB retry
- Return the original store error instead of masking it as ErrClientNotFound
- Fix DB fallback to propagate non-ErrRecordNotFound store errors correctly
* refactor(services): remove unreachable ErrRecordNotFound check in GetClient
- fetchFunc always wraps store errors in clientFetchErr, so a raw
gorm.ErrRecordNotFound can never reach this branch
* style(services): remove redundant inline comment in GetClient fetchFunc
* fix(services): evict corrupted cache entry on ErrInvalidValue in GetClient
- On cache.ErrInvalidValue (unmarshal failure), delete the bad key before
falling back to DB so subsequent requests re-populate the cache correctly
instead of hot-looping through the DB fallback on every call
* fix(services): log Delete errors and fix ErrInvalidValue eviction in token cache
- Log cache Delete errors on ErrInvalidValue eviction in GetClient (was silently discarded)
- Apply same ErrInvalidValue + eviction pattern to TokenService.getAccessTokenByHash
to prevent corrupted token cache entries from hot-looping through the DB fallback
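The evict-then-fallback pattern looks roughly like this sketch (the `ErrInvalidValue` sentinel name is taken from the commit message; the cache type, `getToken` helper, and key masking are illustrative):

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

var ErrInvalidValue = errors.New("cache: invalid value") // unmarshal failure

type cache struct{ data map[string][]byte }

func (c *cache) Get(key string, corrupt bool) ([]byte, error) {
	if corrupt {
		return nil, ErrInvalidValue // simulates a corrupted, un-unmarshalable entry
	}
	return c.data[key], nil
}

func (c *cache) Delete(key string) error { delete(c.data, key); return nil }

// getToken evicts the bad key on ErrInvalidValue before falling back to the
// DB, so the next request re-populates the cache instead of every request
// hot-looping through the DB fallback.
func getToken(c *cache, key string, corrupt bool) (string, error) {
	_, err := c.Get(key, corrupt)
	if errors.Is(err, ErrInvalidValue) {
		if delErr := c.Delete(key); delErr != nil {
			// log (don't silently discard) eviction failures; mask the hash
			log.Printf("evicting corrupted cache entry %s****: %v", key[:4], delErr)
		}
		return "token-from-db", nil // fall back to the DB once
	}
	return "token-from-cache", nil
}

func main() {
	c := &cache{data: map[string][]byte{"hash1234": []byte("garbage")}}
	v, _ := getToken(c, "hash1234", true)
	_, stillCached := c.data["hash1234"]
	fmt.Println(v, stillCached) // token-from-db false: bad entry was evicted
}
```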
* style(services): mask token hash in eviction log to match invalidateTokenCache pattern
* refactor(services): add ctx parameter to GetClient and GetClientWithSecret
- Propagate caller context through cache I/O and DB fallback so that
request timeouts/cancellation are respected and tracing can propagate
- Handlers pass c.Request.Context(); service callers pass their ctx;
methods without a context use context.Background() as a fallback
* refactor(services): propagate ctx through GetClientByUserCode, ValidateAuthorizationRequest, AuthenticateClient
- All three methods called from HTTP handlers but lacked ctx parameter;
context.Background() replaced with the actual request context so
cancellation/timeout from handlers flows through to cache and DB
* fix(services): propagate real DB errors from GetClientWithSecret
- Preserve non-404 store errors instead of masking them as ErrClientNotFound
- Remove unnecessary cache invalidation from CreateClient (new clients are never cached)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(services): wrap token store errors to prevent double DB hit
Use tokenFetchErr sentinel (parallel to clientFetchErr) so transient DB
errors inside GetWithFetch fetchFunc are distinguished from cache-backend
failures and short-circuited instead of triggering a redundant DB fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(services): merge clientFetchErr and tokenFetchErr into shared fetchErr
Both types were identical wrappers used to distinguish store errors from
cache-backend errors inside GetWithFetch callbacks. Extract once into
errors.go and remove the per-file duplicates.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test(config): add CLIENT_CACHE_TYPE validation test coverage
Cover all validation branches: invalid type, redis/redis-aside without
REDIS_ADDR, zero CLIENT_CACHE_TTL, and redis-aside with zero CLIENT_CACHE_CLIENT_TTL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(services): deep-copy RedirectURIs slice when caching OAuthApplication
Prevent callers from accidentally corrupting cached backing arrays via
in-place slice mutations. The cached entry now has its own independent
StringArray so modifications to the returned value cannot affect the cache.
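The aliasing hazard this fixes is standard Go slice semantics, shown here in miniature (a plain `[]string` stands in for the gorm `StringArray`; `copyApp` is an illustrative name):

```go
package main

import "fmt"

type OAuthApplication struct {
	ClientID     string
	RedirectURIs []string
}

// copyApp returns an independent copy. Without cloning the slice, the cached
// value and the returned value share one backing array, so an in-place
// mutation by a caller would silently corrupt the cached entry.
func copyApp(app OAuthApplication) OAuthApplication {
	out := app
	out.RedirectURIs = append([]string(nil), app.RedirectURIs...) // deep-copy the slice
	return out
}

func main() {
	cached := OAuthApplication{ClientID: "abc", RedirectURIs: []string{"https://a.example/cb"}}
	returned := copyApp(cached)
	returned.RedirectURIs[0] = "https://evil.example/cb" // caller mutates in place
	fmt.Println(cached.RedirectURIs[0])                  // https://a.example/cb — cache unaffected
}
```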
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add Client Cache and Token Cache sections to CONFIGURATION.md
- Add ## Client Cache section covering backends, configuration vars, TTL
trade-offs, and multi-pod recommendations for CLIENT_CACHE_* settings
- Add ## Token Cache section covering the opt-in token verification cache
with TOKEN_CACHE_* settings, revocation invalidation, and RESP3 notes
- Add both sections to the table of contents
- Mention CLIENT_CACHE_TYPE and TOKEN_CACHE_TYPE in README Scalability section
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
README.md (+1 −1):

```diff
@@ -549,7 +549,7 @@ docker run -d \
 - **SQLite**: Suitable for < 1000 concurrent devices, single-instance deployments
 - **PostgreSQL**: Recommended for production, supports horizontal scaling
-- **Multi-Pod**: Use PostgreSQL + Redis for rate limiting and user cache across pods (`RATE_LIMIT_STORE=redis`, `USER_CACHE_TYPE=redis` or `redis-aside`). Note: `redis-aside` requires Redis >= 7.0.
+- **Multi-Pod**: Use PostgreSQL + Redis for rate limiting, user cache, client cache, and token cache across pods (`RATE_LIMIT_STORE=redis`, `USER_CACHE_TYPE=redis` or `redis-aside`, `CLIENT_CACHE_TYPE=redis` or `redis-aside`, `TOKEN_CACHE_TYPE=redis` or `redis-aside`). Note: `redis-aside` requires Redis >= 7.0.
```
CONFIGURATION.md: new **Client Cache** and **Token Cache** sections, inserted between the User Cache and Rate Limiting sections (lines elided by the diff viewer are marked `[…]`):

## Client Cache

Every OAuth flow (device code, authorization code, token exchange, client credentials) queries the `OAuthApplication` record to validate the client. Caching these lookups reduces database pressure on busy deployments.

The cache is always enabled; no feature flag is required. Mutations (create, update, delete, secret regeneration, approve/reject) always invalidate the cache entry immediately.

### How It Works

The cache uses a **cache-aside pattern**:

1. On the first request for a client ID, the DB is queried and the result is stored in cache with a TTL
2. Client secrets are **stripped before caching** (defense-in-depth: secrets are never stored in the cache backend)
3. Cache entries are invalidated immediately on any write operation (create, update, delete, secret rotation)

[…]

```bash
# Cache backend: memory (default), redis, or redis-aside
CLIENT_CACHE_TYPE=memory

# How long a cached client record is valid (default: 5m); must be > 0.
# Mutations always invalidate immediately, so this is only a fallback TTL.
CLIENT_CACHE_TTL=5m

# Client-side TTL for redis-aside mode only (default: 30s); must be > 0
CLIENT_CACHE_CLIENT_TTL=30s

# Client-side cache size per connection in MB for redis-aside mode only (default: 32MB)
# Total memory per pod = cache_size × connections (~10 based on GOMAXPROCS) → default ~320MB
CLIENT_CACHE_SIZE_PER_CONN=32
```

Redis-based backends also require the shared Redis settings:

```bash
REDIS_ADDR=localhost:6379
REDIS_PASSWORD=
REDIS_DB=0
```

### Multi-Pod Recommendation

```bash
# 2–5 pods: Redis shared cache
CLIENT_CACHE_TYPE=redis
REDIS_ADDR=redis-service:6379

# 5+ pods or DDoS protection: redis-aside with client-side caching
CLIENT_CACHE_TYPE=redis-aside
REDIS_ADDR=redis-service:6379
CLIENT_CACHE_CLIENT_TTL=30s
CLIENT_CACHE_SIZE_PER_CONN=32  # Adjust based on available memory per pod
```

> **Note**: `redis-aside` uses RESP3 client-side caching for automatic invalidation across all pods and requires **Redis >= 7.0**. Memory usage per pod is `CLIENT_CACHE_SIZE_PER_CONN × ~10 connections` (default ~320MB).

---

## Token Cache

`/oauth/tokeninfo` and every request protected by token-based auth call `GetAccessTokenByHash`, which hits the database on every validation. The token cache absorbs these lookups, significantly reducing DB load on high-traffic deployments.

The token cache is **disabled by default** (`TOKEN_CACHE_ENABLED=false`). Enable it for production deployments with significant token validation traffic.

### How It Works

The cache uses a **cache-aside pattern**:

1. On the first validation of a token hash, the DB is queried and the result is stored in cache with a TTL
2. Subsequent validations within the TTL window are served from cache
3. Token revocation, rotation, and status changes always **explicitly invalidate** the cache entry; the TTL is a fallback only

[…]

```bash
# Or redis-aside for real-time invalidation across all pods (requires Redis >= 7.0)
TOKEN_CACHE_ENABLED=true
TOKEN_CACHE_TYPE=redis-aside
REDIS_ADDR=redis-service:6379
TOKEN_CACHE_CLIENT_TTL=1h
TOKEN_CACHE_SIZE_PER_CONN=32
```

> **Note**: `redis-aside` uses RESP3 client-side caching with **real-time invalidation**: when a token is revoked, all pods drop their client-side cache entry immediately via RESP3 push notifications. This requires **Redis >= 7.0**. Memory usage per pod is `TOKEN_CACHE_SIZE_PER_CONN × ~10 connections` (default ~320MB).

---

## Rate Limiting

AuthGate includes built-in rate limiting to protect against brute force attacks, credential stuffing, and API abuse. The rate limiting system is production-ready with support for both single-instance and distributed deployments.