# feat: Add AWS Valkey (ElastiCache) support to the stack

## Background & Context

### Why Valkey
Valkey is an open-source, Apache 2.0-licensed fork of Redis 7.x, created after Redis Ltd. changed Redis's license to non-OSS (SSPL) in March 2024. Valkey is maintained by the Linux Foundation and backed by AWS, Google, Oracle, and others. It is wire-compatible with Redis 7, meaning existing go-redis/v9 clients work without code changes.
Valkey 8.x key improvements over Redis 7:
36–91% lower latency in I/O-bound workloads (I/O threads redesign)
Active memory defragmentation improvements reducing memory bloat by up to 15%
Official clustering improvements reducing split-brain scenarios
First-class support for WAIT and WAITAOF in cluster mode
Continued development of modules API (vs. Redis which is closing this)
### Why AWS ElastiCache (Managed) over Self-hosted

| Dimension | Managed ElastiCache | Self-hosted (e.g., Helm chart on EKS) |
| --- | --- | --- |
| Failover | Automatic (<60s) | Manual or operator-managed |
| Persistence | Snapshots + backups out of the box | Custom setup required |
| Patching | AWS applies security patches | Manual |
| Networking | VPC-native, SG-based isolation | Pod network, harder to restrict |
| Cost at scale | Predictable | Often underestimated (ops + k8s overhead) |
| Observability | CloudWatch metrics built-in | Prometheus exporter + dashboards |
| TLS | Managed certs, in-transit encryption | Self-signed or cert-manager |
### Redis Protocol Compatibility

ElastiCache Valkey speaks the Redis protocol. The existing `go-redis/v9` usage in `lib-commons/commons/redis` and in `pool-manager` requires zero client-side changes to connect to ElastiCache Valkey. The lib-commons `redis.Config` field `Topology.Cluster` already supports cluster-mode addresses.
## Current State (Based on Repo Analysis)

**pool-manager** (module: `github.com/LerianStudio/tenant-manager`) — the service already uses Valkey in docker-compose (`valkey/valkey:8-alpine`) and in `.env.example`. Use cases confirmed in source code:

- **API key validation caching** (`internal/adapters/http/middleware/apikey.go`): Redis GET to cache API key validation results, reducing MongoDB round-trips per request
- **Rate limiting** (`internal/adapters/http/middleware/ratelimit.go`): Redis-backed per-tenant/per-tier rate limiter
- **Idempotency** (`internal/adapters/http/middleware/idempotency.go`): Redis SET NX for request deduplication (TTL 300s)
- **Multi-tenant config caching** (`internal/bootstrap/wire_infra_redis.go`): `SecretsCache` backed by Redis, TTL driven by `MULTI_TENANT_CACHE_TTL` (default 24h)
- **Cache invalidation API** (`internal/adapters/http/handler/cache_handler.go`): SCAN + DEL for pattern-based cache clearing
- **Tenant settings caching** (`internal/adapters/http/handler/tenant_service_handler.go`): Redis-backed tenant settings with TTL

Connection pattern: `lib-commons/v4/commons/redis.Client` (standalone topology), or a direct `go-redis.Client` when `REDIS_USERNAME` is set (ACL auth path). Both paths ping Valkey at startup.

AWS resources in use: no ElastiCache yet. Valkey is self-hosted (Docker Compose locally; production deployment method TBD).
**backoffice-console** (Next.js/TypeScript monorepo):

- No direct Redis dependency — it is a pure frontend/BFF
- Auth: WorkOS (cookie-based sessions)
- Cache interaction: a `CacheRepository` (`apps/backoffice/src/infra/repositories/cache-repository.impl.ts`) calls the pool-manager API (`/cache`, `/cache/keys`, `/cache/pattern`) to view and invalidate Valkey cache entries; it has no direct Redis connection
- No Redis/cache infrastructure changes needed beyond updating environment variables that point to the pool-manager API
**lib-commons** (module: `github.com/LerianStudio/lib-commons/v4`) — Go 1.25.7, shared library used by all Lerian services:

- Already has `redis/go-redis/v9 v9.18.0`
- Already has `go-redsync/redsync/v4` (distributed locks)
- Already has `alicebob/miniredis/v2` (in-memory Redis for tests)

Existing packages:

- `commons/redis/`: full Redis client wrapper (`Client` struct) supporting Standalone, Sentinel, and Cluster topologies, TLS, GCP IAM auth, circuit breaker, reconnection, and OpenTelemetry metrics. Already cluster-mode capable.
- `commons/tenant-manager/cache/`: `ConfigCache` interface (in-memory, process-local) for tenant config caching
- `commons/tenant-manager/valkey/`: key helpers for tenant-namespaced keys (`tenant:{tenantID}:{key}`)

**Gap:** the `commons/redis` package wraps `go-redis.UniversalClient` and supports cluster mode via `ClusterTopology`, but there is no higher-level `CacheClient` interface abstraction that services can depend on without importing go-redis directly. Services currently couple to `redis.UniversalClient` from go-redis. A clean `CacheClient` interface in lib-commons would decouple services and make testing trivial.
## Proposed Architecture

### Option A: Single Instance (Standalone / Replication Group with 1 shard)

- Node type: `cache.t4g.small` (2 vCPU, 1.37 GB RAM), or `cache.r7g.large` for a production-grade single instance
- Endpoint: primary endpoint (e.g., `my-valkey.xxxxx.use2.cache.amazonaws.com:6379`)
- lib-commons `StandaloneTopology` works as-is
- Approximate cost (us-east-2): `cache.t4g.small` ~$24/month on-demand (~$15/month reserved 1yr); `cache.r7g.large` (recommended for staging) ~$110/month

### Option B: Cluster Mode (ElastiCache Valkey Cluster)

- Endpoint: cluster configuration endpoint (e.g., `my-valkey-cluster.xxxxx.clustercfg.use2.cache.amazonaws.com:6379`)
- go-redis client: `ClusterClient` (or `UniversalClient` with cluster addresses)
**Client configuration**

```go
// Cluster mode — lib-commons redis.Config ClusterTopology
cfg := redis.Config{
	Topology: redis.Topology{
		Cluster: &redis.ClusterTopology{
			Addresses: strings.Split(os.Getenv("CACHE_ADDRS"), ","),
		},
	},
}

// go-redis UniversalClient auto-detects cluster when >1 address is given
// or when Cluster topology is selected.
```
**When to use**

- Production environments requiring high availability
- High-throughput workloads (>10k ops/sec)
- Data sets >26 GB that must be horizontally sharded

**Pros**

- ✅ Horizontal scaling (add shards without downtime)
- ✅ Automatic failover per shard (<30s)
- ✅ Total memory = N shards × node RAM (e.g., 3 × r7g.large ≈ 57 GB usable)
- ✅ lib-commons `ClusterTopology` already implemented

**Cons**

- ❌ Multi-key operations (MGET, multi-key DEL, SCAN) require all keys in the same slot
- ❌ Transactions (MULTI/EXEC) only work within a single slot
- ❌ SCAN iterates one node at a time — the SCAN pattern-delete in pool-manager must iterate all nodes
- ❌ Higher cost (~3× a single instance for 3 shards)
- ❌ Key space must use `{}` hash tags for cross-key operations on the same slot
**Key space constraint**

Pool-manager currently uses SCAN for cache invalidation. In cluster mode, SCAN only covers one shard. The invalidation logic must be updated to scan all cluster nodes; go-redis `ClusterClient.ForEachMaster` is the correct API.
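A sketch of what the cluster-aware invalidation could look like, assuming go-redis v9 (`ForEachMaster`, `Scan`, pipelined `Del` — all real go-redis APIs); `deletePattern` is an illustrative name, not the handler's actual function:

```go
package cachehandler

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// deletePattern covers the whole key space by running SCAN on every master.
// Note: a multi-key DEL can fail with CROSSSLOT even against a single node
// (the server rejects multi-key commands spanning slots), so deletions are
// pipelined one key at a time.
func deletePattern(ctx context.Context, cc *redis.ClusterClient, pattern string) error {
	return cc.ForEachMaster(ctx, func(ctx context.Context, node *redis.Client) error {
		var cursor uint64
		for {
			keys, next, err := node.Scan(ctx, cursor, pattern, 100).Result()
			if err != nil {
				return err
			}
			if len(keys) > 0 {
				pipe := node.Pipeline()
				for _, k := range keys {
					pipe.Del(ctx, k)
				}
				if _, err := pipe.Exec(ctx); err != nil {
					return err
				}
			}
			if next == 0 {
				return nil
			}
			cursor = next
		}
	})
}
```

In single-instance mode the same loop works against a plain `*redis.Client` without the `ForEachMaster` wrapper.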
Approximate cost (us-east-2, 3 shards × 1 replica each): `cache.r7g.large` × 6 ~$660/month on-demand (~$420/month reserved 1yr); `cache.t4g.medium` × 6 (smaller prod) ~$180/month.

### Recommendation

- Staging: Option A (Single Instance, `cache.r7g.large`, 1 replica)
- Production: Option B (Cluster Mode, 3 shards × `cache.r7g.large`, 1 replica each)
Rationale: pool-manager's primary use cases (API key caching, rate limiting, idempotency, settings cache) are read-heavy with small values. The key space is tenant-namespaced (tenant:{tenantID}:{key}) which maps cleanly to cluster sharding via hash tags if needed. The existing SCAN-based invalidation must be updated for cluster mode, but that is a one-time fix in cache_handler.go. The lib-commons/commons/redis package already supports both topologies via redis.UniversalClient — no new library code needed for connectivity.
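The hash-tag mechanics mentioned above can be shown in pure Go. This is a self-contained reimplementation of the cluster slot function for illustration only — production code relies on go-redis's own routing:

```go
package main

import (
	"fmt"
	"strings"
)

// hashTag returns the substring a cluster node hashes: the content of the
// first non-empty {...} group if present, otherwise the whole key.
func hashTag(key string) string {
	if i := strings.IndexByte(key, '{'); i >= 0 {
		if j := strings.IndexByte(key[i+1:], '}'); j > 0 {
			return key[i+1 : i+1+j]
		}
	}
	return key
}

// crc16 is the XMODEM CRC-16 used by Redis/Valkey Cluster (poly 0x1021).
func crc16(s string) uint16 {
	var crc uint16
	for i := 0; i < len(s); i++ {
		crc ^= uint16(s[i]) << 8
		for b := 0; b < 8; b++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// clusterSlot maps a key to one of the 16384 hash slots.
func clusterSlot(key string) uint16 {
	return crc16(hashTag(key)) % 16384
}

func main() {
	// Both keys hash only "t-42", so they land on the same slot (and shard):
	// multi-key ops across them are legal in cluster mode.
	fmt.Println(clusterSlot("tenant:{t-42}:apikey") == clusterSlot("tenant:{t-42}:settings")) // true
}
```

This is why the existing `tenant:{tenantID}:{key}` naming maps cleanly to cluster sharding: the tenant ID is already in hash-tag position.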
## Changes Required in lib-commons

### New package: `commons/cache`

Create a clean `CacheClient` interface that services depend on instead of `redis.UniversalClient`. This enables:

- Easy mocking in unit tests (no miniredis needed for simple tests)

File: `commons/cache/cache.go`:
```go
// Copyright (c) 2026 Lerian Studio. All rights reserved.
// Use of this source code is governed by the Elastic License 2.0
// that can be found in the LICENSE file.

// Package cache provides a unified interface for distributed cache operations
// backed by Valkey (Redis-compatible) via AWS ElastiCache.
package cache

import (
	"context"
	"time"
)

// CacheMode selects the Valkey deployment topology.
type CacheMode string

const (
	// SingleInstance connects to a single Valkey node or replication group primary endpoint.
	SingleInstance CacheMode = "single"
	// Cluster connects to a Valkey cluster via cluster configuration endpoint.
	Cluster CacheMode = "cluster"
)

// CacheClient defines the contract for cache operations.
// All implementations must be safe for concurrent use by multiple goroutines.
//
// Available implementations:
//   - NewCacheClient(cfg CacheConfig): factory for SingleInstance and Cluster modes
//   - MockCacheClient: test double for unit tests (use go:generate with mockgen)
type CacheClient interface {
	// Get retrieves a string value by key.
	// Returns ErrCacheMiss if the key does not exist or has expired.
	Get(ctx context.Context, key string) (string, error)

	// Set stores a value with the given TTL.
	// A zero TTL means the key never expires.
	Set(ctx context.Context, key string, value interface{}, ttl time.Duration) error

	// Del removes one or more keys. Returns nil if keys do not exist.
	Del(ctx context.Context, keys ...string) error

	// Exists reports how many of the given keys exist in the cache.
	Exists(ctx context.Context, keys ...string) (int64, error)

	// Incr atomically increments an integer value by 1.
	Incr(ctx context.Context, key string) (int64, error)

	// Expire sets (or resets) the TTL on an existing key.
	Expire(ctx context.Context, key string, ttl time.Duration) error

	// Close releases all resources held by this client.
	Close() error
}

// CacheConfig configures a CacheClient.
type CacheConfig struct {
	// Addrs is the list of Valkey addresses.
	//   SingleInstance: ["host:port"]
	//   Cluster: ["host1:port", "host2:port", ...] (cluster cfg endpoint or seed nodes)
	Addrs []string

	// Password is the Valkey AUTH password (or ACL password when Username is also set).
	Password string

	// Username is the Valkey ACL username. Leave empty for default AUTH.
	Username string

	// TLSEnabled enables TLS for the connection to ElastiCache.
	TLSEnabled bool

	// CACertBase64 is the Base64-encoded PEM CA certificate for TLS verification.
	// Required when TLSEnabled is true and using a custom CA (e.g., ElastiCache in-transit).
	CACertBase64 string

	// Mode selects SingleInstance or Cluster topology.
	Mode CacheMode

	// PoolSize is the maximum number of connections per node. Defaults to 10 when zero.
	PoolSize int

	// MaxRetries is the maximum number of retries on command failure. Defaults to 3 when zero.
	MaxRetries int

	// DialTimeout is the timeout for establishing a connection. Defaults to 5s when zero.
	DialTimeout time.Duration

	// ReadTimeout is the timeout for socket reads. Defaults to 3s when zero.
	ReadTimeout time.Duration

	// WriteTimeout is the timeout for socket writes. Defaults to 3s when zero.
	WriteTimeout time.Duration
}

// ErrCacheMiss is returned by Get when the key does not exist or has expired.
var ErrCacheMiss = errCacheMiss("cache miss")

type errCacheMiss string

func (e errCacheMiss) Error() string { return string(e) }
```
Generate the mock:

```sh
go generate ./commons/cache/...
```

driven by this directive in `cache.go`:

```go
//go:generate go run go.uber.org/mock/mockgen -source=cache.go -destination=mock_cache.go -package=cache
```
**go.mod:** no new dependency needed — `github.com/redis/go-redis/v9` is already in `go.mod`.
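The implementation file (`commons/cache/valkey.go` in this proposal) is not shown. A minimal sketch of the factory, assuming go-redis v9's `UniversalOptions` (all fields used here exist in that struct); the `valkeyClient` wrapper type is a placeholder name:

```go
// NewCacheClient builds a CacheClient from CacheConfig.
// Sketch of the proposed commons/cache/valkey.go; error handling abbreviated.
func NewCacheClient(cfg CacheConfig) (CacheClient, error) {
	if len(cfg.Addrs) == 0 {
		return nil, errors.New("cache: at least one address is required")
	}
	opts := &redis.UniversalOptions{
		Addrs:        cfg.Addrs,
		Username:     cfg.Username,
		Password:     cfg.Password,
		PoolSize:     cfg.PoolSize, // go-redis applies its own default when 0
		MaxRetries:   cfg.MaxRetries,
		DialTimeout:  cfg.DialTimeout,
		ReadTimeout:  cfg.ReadTimeout,
		WriteTimeout: cfg.WriteTimeout,
	}
	if cfg.TLSEnabled {
		tlsCfg := &tls.Config{MinVersion: tls.VersionTLS12}
		if cfg.CACertBase64 != "" {
			pem, err := base64.StdEncoding.DecodeString(cfg.CACertBase64)
			if err != nil {
				return nil, err
			}
			pool := x509.NewCertPool()
			if !pool.AppendCertsFromPEM(pem) {
				return nil, errors.New("cache: invalid CA certificate")
			}
			tlsCfg.RootCAs = pool
		}
		opts.TLSConfig = tlsCfg
	}
	// NewUniversalClient returns a ClusterClient when multiple addresses are
	// given and a single-node Client otherwise, matching Mode single/cluster.
	rdb := redis.NewUniversalClient(opts)
	if err := rdb.Ping(context.Background()).Err(); err != nil {
		return nil, err
	}
	// valkeyClient — the thin wrapper implementing CacheClient over rdb and
	// mapping redis.Nil to ErrCacheMiss — is omitted here for brevity.
	return &valkeyClient{rdb: rdb}, nil
}
```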
## Changes Required in pool-manager

### Wire up `cache.CacheClient`

Currently pool-manager uses `redis.UniversalClient` directly from go-redis. It should be replaced with the lib-commons `cache.CacheClient` interface for testability.
`internal/bootstrap/wire.go` — add a `CacheClient` field to `Application`.

`internal/bootstrap/wire_infra_redis.go` — after connecting, wrap the client:

```go
import lcache "github.com/LerianStudio/lib-commons/v4/commons/cache"

// After obtaining redisClient (UniversalClient), wrap it:
app.CacheClient, err = lcache.NewCacheClientFromUniversal(redisClient)
```
Or simply pass CacheConfig derived from env vars and call lcache.NewCacheClient(cfg) directly.
Specific use cases to wire:
| Component | File | Current | After |
| --- | --- | --- | --- |
| API key cache | `middleware/apikey.go` | `redis.UniversalClient` | `cache.CacheClient` |
| Rate limiter | `middleware/ratelimit.go` | `redis.UniversalClient` | `cache.CacheClient` |
| Idempotency | `middleware/idempotency.go` | `redis.UniversalClient` | `cache.CacheClient` |
| Settings cache | `handler/tenant_service_handler.go` | `redis.UniversalClient` | `cache.CacheClient` |
| Cache handler SCAN | `handler/cache_handler.go` | direct SCAN/DEL | keep `redis.UniversalClient` for SCAN (cluster: `ForEachMaster`) |
Note on SCAN in cluster mode: The cache invalidation handler uses SCAN + DEL to clear patterns. In cluster mode, SCAN only covers the connected shard. The handler must be updated to use ClusterClient.ForEachMaster when cluster mode is active. This is the only code change required specifically for cluster mode compatibility.
Helm values update (`charts/pool-manager/values.yaml`):

```yaml
env:
  # Existing Redis config (replace with CACHE_* vars)
  CACHE_MODE: "single"
  CACHE_ADDRS: ""        # populated from k8s Secret
  CACHE_PASSWORD: ""     # populated from k8s Secret
  CACHE_TLS_ENABLED: "true"
  CACHE_POOL_SIZE: "10"
  CACHE_MAX_RETRIES: "3"
envFrom:
  - secretRef:
      name: valkey-credentials  # k8s Secret with CACHE_ADDRS, CACHE_PASSWORD
```
## Changes Required in backoffice-console

The backoffice-console has no direct Redis connection. It calls the pool-manager API to manage cache entries (`/cache`, `/cache/keys`, `/cache/pattern`). No Redis-specific changes are required.

Only changes needed:

- Update `API_URL` / `NEXT_PUBLIC_TENANT_MANAGER_API_URL` in Helm values to point to the correct pool-manager service endpoint (no change in logic, just confirming the endpoint is reachable)
- Verify the `/cache/keys` list endpoint works correctly when pool-manager is connected to ElastiCache (SCAN pagination — confirm cursor handling is compatible)
Helm values update (`charts/backoffice-console/values.yaml` — no cache vars needed, just confirm):

```yaml
env:
  NEXT_PUBLIC_TENANT_MANAGER_API_URL: "https://api.your-domain.com"
  # No CACHE_* vars needed — console is a pure frontend
```
## Infrastructure / Helm

### Kubernetes Secret

Create a `valkey-credentials` Secret in each namespace (staging, production):
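A minimal sketch of the manifest; the endpoint placeholder mirrors the example endpoints above, and the token value is an assumption:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: valkey-credentials
type: Opaque
stringData:
  CACHE_ADDRS: "my-valkey.xxxxx.use2.cache.amazonaws.com:6379"
  CACHE_PASSWORD: "<elasticache-auth-token>"
```

In practice this would be populated by a secrets operator (e.g., from AWS Secrets Manager) rather than committed as plain YAML.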
## Migration Path

### Current situation

pool-manager already uses Valkey (self-hosted via docker-compose locally). The production deployment method was not found in the cloned repo (no Helm charts in the repo — likely in a separate gitops/infra repo). The migration path assumes moving from self-hosted (container) to managed ElastiCache.
### Steps

1. **Audit existing data** — Valkey is used for ephemeral cache only (API keys, rate limits, idempotency, settings). All data is reconstructible; no data migration needed.
2. **Provision ElastiCache (staging)** via Terraform. Verify connectivity from EKS nodes using a debug pod:

   ```sh
   kubectl run valkey-test --image=valkey/valkey:8 --rm -it -- \
     valkey-cli -h <elasticache-endpoint> -p 6379 -a <auth-token> --tls ping
   ```

3. **Update Helm values** for pool-manager in staging to use the new `CACHE_*` env vars pointing to ElastiCache.
4. **Deploy and validate** — monitor:
   - `checks["redis"]` in the `/health/ready` endpoint
   - Rate limiter functionality (smoke test: hit an endpoint more than `RATE_LIMIT_MAX` times)
   - Idempotency (replay the same request ID twice, expect 200 on replay)
   - Cache hit metrics in CloudWatch (ElastiCache `CacheHits`, `CacheMisses`)
5. **Switch production** — deploy production ElastiCache (cluster mode), update Helm values, deploy.
6. **Decommission self-hosted** — remove the old Valkey container/pod from infrastructure.
### Env var renaming

The existing `REDIS_*` env vars in pool-manager will be replaced by `CACHE_*` vars. During the transition, pool-manager can support both for one release cycle by checking `CACHE_ADDRS` first and falling back to `REDIS_HOST:REDIS_PORT` if absent.
## Testing Requirements

### Unit Tests (lib-commons `commons/cache`)

- Use the generated `MockCacheClient` (mockgen) to test all callers without a real server
- Test the factory function `NewCacheClient` with invalid configs (empty addrs, bad TLS cert)
- Test that `ErrCacheMiss` is returned correctly when go-redis returns `Nil`
- Coverage requirement: ≥ 80%
```go
// Example mock usage in pool-manager tests
func TestAPIKeyMiddleware_CacheHit(t *testing.T) {
	ctrl := gomock.NewController(t)
	defer ctrl.Finish()

	mockCache := cachemock.NewMockCacheClient(ctrl)
	mockCache.EXPECT().Get(gomock.Any(), "apikey:sha256:abc123").Return(`{"valid":true}`, nil)
	// ... test middleware
}
```
### Integration Tests
Use testcontainers-go with a real Valkey container:
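A sketch of the container bootstrap, assuming the `testcontainers-go` generic container API (`GenericContainer`, `wait.ForLog`, `MappedPort` are real APIs; `startValkey` is an illustrative name):

```go
// startValkey starts a throwaway Valkey container and returns its address.
func startValkey(t *testing.T) string {
	ctx := context.Background()
	c, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "valkey/valkey:8-alpine",
			ExposedPorts: []string{"6379/tcp"},
			WaitingFor:   wait.ForLog("Ready to accept connections"),
		},
		Started: true,
	})
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { _ = c.Terminate(ctx) })

	host, err := c.Host(ctx)
	if err != nil {
		t.Fatal(err)
	}
	port, err := c.MappedPort(ctx, "6379")
	if err != nil {
		t.Fatal(err)
	}
	return host + ":" + port.Port()
}
```

Tests then build a `CacheClient` against the returned address and exercise Get/Set/Del/TTL behavior end to end.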
### Terraform Module

Use the AWS `terraform-aws-modules/elasticache/aws` module.

### Security Group

Restrict cache access at the security-group level — ElastiCache is VPC-native with SG-based isolation (see the comparison table above).

## Environment Variables

### Complete Reference

| Variable | Notes |
| --- | --- |
| `CACHE_MODE` | `single` or `cluster` |
| `CACHE_ADDRS` | comma-separated `host:port` list |
| `CACHE_PASSWORD` | AUTH / ACL password |
| `CACHE_USERNAME` | ACL username (optional) |
| `CACHE_TLS_ENABLED` | enable in-transit TLS |
| `CACHE_CA_CERT_BASE64` | Base64-encoded PEM CA certificate |
| `CACHE_POOL_SIZE` | connection pool size per node |
| `CACHE_MAX_RETRIES` | retries on command failure |
| `CACHE_DIAL_TIMEOUT` | default `5s` |
| `CACHE_READ_TIMEOUT` | default `3s` |
| `CACHE_WRITE_TIMEOUT` | default `3s` |

Examples by environment: local development (docker-compose), staging (ElastiCache single node), production (ElastiCache cluster).

## Local Development

Add to `docker-compose.yml` in pool-manager (already done — using `valkey/valkey:8-alpine`); other services that adopt the cache client can use the same image. A local cluster simulation is optional, for cluster-mode testing.
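For services adding the cache client locally, a minimal compose sketch — the service name and password are illustrative, not the repo's actual compose file:

```yaml
services:
  valkey:
    image: valkey/valkey:8-alpine
    ports:
      - "6379:6379"
    command: ["valkey-server", "--requirepass", "localdev"]
```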
testcontainers-gowith a real Valkey container:Cluster Mode Tests
Use a 3-node Valkey cluster container (or
testcontainers-goRedis cluster module) to verify:ForEachMasterSCAN works across all shardsExistswith keys on different shards returns correct countDefinition of Done
lib-commons:commons/cachepackage created withCacheClientinterface (cache.go)lib-commons:CacheModetype (SingleInstance|Cluster) definedlib-commons:CacheConfigstruct defined with all fields documentedlib-commons:NewCacheClient(cfg CacheConfig) (CacheClient, error)factory implemented usinggo-redis UniversalClientlib-commons: SingleInstance mode working with standalone ElastiCache endpointlib-commons: Cluster mode working with cluster configuration endpointlib-commons:ErrCacheMisssentinel error defined and returned byGeton cache misslib-commons:MockCacheClientgenerated viago:generate mockgenlib-commons: Unit tests with mock, coverage ≥ 80%lib-commons: Integration tests with real Valkey container (testcontainers-go)lib-commons: go.mod unchanged (go-redis/v9 already present)lib-commons: CHANGELOG and MIGRATION_MAP updatedpool-manager:Application.CacheClientfield added (typecache.CacheClient)pool-manager:initRedisupdated to callcache.NewCacheClientwith env-driven configpool-manager:CACHE_*env vars defined inConfigstruct (replacing / aliasingREDIS_*)pool-manager: API key middleware wired tocache.CacheClientpool-manager: Rate limiter middleware wired tocache.CacheClientpool-manager: Idempotency middleware wired tocache.CacheClientpool-manager: Settings/secrets cache wired tocache.CacheClientpool-manager:cache_handler.goSCAN updated to useForEachMasterin cluster modepool-manager: Helm values updated withCACHE_*env vars andvalkey-credentialssecretRefpool-manager: Integration tests updated to use new CacheClient interfacepool-manager:.env.exampleupdated withCACHE_*variablesbackoffice-console: Verified/cacheAPI endpoints work with ElastiCache-backed pool-managerbackoffice-console: Helm values confirmed (no CACHE_* vars needed — BFF only)cache.r7g.large, TLS enabled)cache.r7g.large, 1 replica each, TLS enabled, multi-AZ)valkey-credentialscreated in staging and production namespacesEngineCPUUtilization > 80%,CurrConnections > 1000,DatabaseMemoryUsagePercentage > 
80%/cache/patternAPI orvalkey-cli FLUSHDB)CACHE_ADDRS)auth_token_update_strategy = ROTATEin Terraform)