Skip to content

feat: Redis cluster support.#6295

Open
daanvinken wants to merge 4 commits intorunatlantis:mainfrom
daanvinken:redis-cluster-support
Open

feat: Redis cluster support.#6295
daanvinken wants to merge 4 commits intorunatlantis:mainfrom
daanvinken:redis-cluster-support

Conversation

@daanvinken
Copy link
Copy Markdown

what

  • Add Redis Cluster mode support, auto-detected from config (no explicit mode flag)
  • Introduce Config struct + NewWithConfig() constructor, deprecate old New()
  • Change RedisDB.client to redis.Cmdable interface (common to Client and ClusterClient)
  • Replace Keys() with Scan() in lock key migration (Scan fans out across cluster nodes, Keys doesn't)
  • Add --redis-username flag for Redis 6+ ACL auth
  • Add --redis-cluster-addresses flag accepting comma-separated node addresses
  • /healthz now returns 503 when database is unreachable (added Ping() to Database interface)
  • Apply lock check fails closed — rejects apply when lock backend is unreachable instead of silently proceeding
  • Handle Kubernetes service link env var collisions (ATLANTIS_REDIS_PORT=tcp://... parsed as int)

why

  • Picks up where feat: support redis cluster deployment #5529 left off, addressing reviewer feedback (auto-detection over explicit --redis-deployment flag)
  • Keys() is incompatible with Redis Cluster — only queries the connected node. Scan() via go-redis ClusterClient fans out across all nodes.
  • Cmdable interface is the narrowest type that covers both client types. Tradeoff: Close() requires a type switch since Cmdable doesn't expose it.
  • Fail-open on apply lock check meant a Redis outage silently disabled the global apply lock — dangerous in production.
  • K8s auto-creates ATLANTIS_REDIS_PORT=tcp://10.x.x.x:6379 for services in the same namespace; Viper picks it up via the ATLANTIS_ prefix and fails to parse as int.

tests

  • Unit tests for NewWithConfig: single-node, cluster, empty-address edge cases
  • Unit test for sanitizeKubernetesServiceLinks: URL detection and reset to defaults
  • Updated TestApplyCommandRunner_IsLocked to verify fail-closed behavior
  • Deployed against a 3-node Redis Cluster (Bitnami helm chart) in staging. Here I've tested some failovers, locking/unlocking on different PRs etc.

When Redis cluster down:
image

New logging:

{"level":"info","ts":"2026-03-09T15:26:43.365Z","caller":"server/server.go:505","msg":"Utilizing Redis DB in cluster mode, addresses: atlantis-redis-cluster-headless.atlantis.svc.cluster.local:6379","json":{}}

references

Copilot AI review requested due to automatic review settings March 9, 2026 15:36
@dosubot dosubot Bot added feature New functionality/enhancement go Pull requests that update Go code labels Mar 9, 2026
@github-actions github-actions Bot added the docs Documentation label Mar 9, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Redis Cluster support and improves operational safety by enhancing health checks and making apply-lock enforcement fail closed when the lock backend is unreachable.

Changes:

  • Introduces Redis Config + NewWithConfig() to auto-select single-node vs cluster client and adds username/cluster address configuration.
  • Updates lock key migration to use SCAN (cluster-compatible) and adds Database.Ping() for /healthz 503 behavior.
  • Adds Kubernetes service-link env var sanitization to avoid ATLANTIS_REDIS_PORT=tcp://... parsing issues and updates apply-lock error behavior/tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
server/user_config.go Adds config fields for Redis username and cluster addresses.
server/server.go Wires Redis config into server creation and makes /healthz depend on DB reachability.
server/events/apply_command_runner.go Fails closed on apply-lock check errors and posts user-facing PR comment.
server/events/apply_command_runner_test.go Updates expectations for fail-closed apply-lock behavior.
server/core/redis/redis.go Adds Config, cluster support, SCAN migration, Ping(), and Cmdable client abstraction.
server/core/redis/redis_test.go Adds unit tests for NewWithConfig behavior and edge cases.
server/core/db/db.go Extends Database interface with Ping().
server/core/db/mocks/mock_database.go Updates mock to implement Ping().
server/core/boltdb/boltdb.go Implements Ping() for BoltDB.
cmd/server.go Adds flags for Redis username/cluster addresses and sanitizes K8s service-link env collisions.
cmd/server_test.go Adds tests for Kubernetes service-link sanitization.
runatlantis.io/docs/server-configuration.md Documents new Redis username and cluster address options.

Comment thread server/server.go Outdated
Comment thread server/core/redis/redis.go Outdated
Comment thread server/core/redis/redis.go Outdated
Comment thread server/core/redis/redis.go
Comment thread runatlantis.io/docs/server-configuration.md Outdated
@daanvinken daanvinken force-pushed the redis-cluster-support branch from 1451213 to df9221b Compare March 9, 2026 15:44
@daanvinken daanvinken changed the title Redis cluster support feat: Redis cluster support. Mar 9, 2026
Comment thread server/server.go
// and returns 503 if the database is unreachable.
func (s *Server) Healthz(w http.ResponseWriter, _ *http.Request) {
w.Header().Set("Content-Type", "application/json")
if err := s.database.Ping(); err != nil {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this also works for BoltdB and performs a simple read.

@daanvinken daanvinken force-pushed the redis-cluster-support branch from e6213a5 to 1d19bc8 Compare March 9, 2026 16:04
@daanvinken
Copy link
Copy Markdown
Author

@jamengual @javking07 do you mind having a look here?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

server/server.go:535

  • The locking-db-type switch has no default case, so an unsupported value leaves database nil and the server continues initializing. With the new /healthz behavior this can now cause a nil dereference (and even without it, later DB usage will be unsafe). Add a default that returns an error for unknown db types (or otherwise enforce that database is always initialized).
	switch dbtype := userConfig.LockingDBType; dbtype {
	case "redis":
		var clusterAddrs []string
		if userConfig.RedisClusterAddresses != "" {
			rawClusterAddrs := strings.Split(userConfig.RedisClusterAddresses, ",")
			for _, addr := range rawClusterAddrs {
				trimmed := strings.TrimSpace(addr)
				if trimmed == "" {
					continue
				}
				clusterAddrs = append(clusterAddrs, trimmed)
			}
		}
		switch {
		case len(clusterAddrs) > 0:
			logger.Info(fmt.Sprintf("Utilizing Redis DB in cluster mode, addresses: %s", userConfig.RedisClusterAddresses))
		default:
			logger.Info(fmt.Sprintf("Utilizing Redis DB in single-node mode, host: %s, port: %d", userConfig.RedisHost, userConfig.RedisPort))
		}
		database, err = redis.NewWithConfig(redis.Config{
			Hostname:           userConfig.RedisHost,
			Port:               userConfig.RedisPort,
			Password:           userConfig.RedisPassword,
			Username:           userConfig.RedisUsername,
			TLSEnabled:         userConfig.RedisTLSEnabled,
			InsecureSkipVerify: userConfig.RedisInsecureSkipVerify,
			DB:                 userConfig.RedisDB,
			ClusterAddresses:   clusterAddrs,
		})
		if err != nil {
			return nil, err
		}
	case "boltdb":
		logger.Info("Utilizing BoltDB")
		database, err = boltdb.New(userConfig.DataDir)
		if err != nil {
			return nil, err
		}
	}

Comment thread server/core/redis/redis.go
Comment thread server/core/redis/redis.go
Comment thread runatlantis.io/docs/server-configuration.md
Comment thread server/server.go
Comment thread server/server.go
Comment thread server/server.go
Comment thread server/core/redis/redis.go
@daanvinken daanvinken force-pushed the redis-cluster-support branch from 1d19bc8 to 748f230 Compare March 10, 2026 14:40
Add support for Redis Cluster deployments alongside the existing
single-node mode. The mode is auto-detected from configuration:
when --redis-cluster-addresses is set, cluster mode is used.

- Introduce Config struct and NewWithConfig() constructor
- Deprecate old New() function (wraps NewWithConfig)
- Change RedisDB.client type to redis.Cmdable interface
- Replace Keys() with Scan() in lock key migration for cluster compat
- Add --redis-username flag for ACL-based authentication
- Add --redis-cluster-addresses flag for cluster node addresses
- Update server-configuration.md with new flag documentation

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Kubernetes automatically creates environment variables for services
in the same namespace. When a service named e.g. "atlantis-redis"
exists, K8s sets ATLANTIS_REDIS_PORT=tcp://10.x.x.x:6379 which
viper picks up via the ATLANTIS_ env prefix and fails to parse as
an integer.

Detect these URL-formatted values on int flags before unmarshalling
and reset them to their defaults.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Signed-off-by: Daan Vinken <daanvinken@tythus.com>
- Trim whitespace from comma-separated cluster addresses
- Check Set/Del errors during lock key migration
- Use interface assertion for Close() instead of type switch
- Include connection address in ping error message
- Add 5s timeout to Ping() context
- Log sanitized cluster addresses instead of raw input
- TrimSpace in NewWithConfig address filter
- Fix docs: redis-password is optional

Signed-off-by: Daan Vinken <daanvinken@tythus.com>
@daanvinken daanvinken force-pushed the redis-cluster-support branch from 748f230 to e62e255 Compare March 10, 2026 21:39
@matthewmrichter
Copy link
Copy Markdown
Contributor

This would be really wonderful to be able to get horizontal autoscaling going.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Documentation feature New functionality/enhancement go Pull requests that update Go code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants