feat(hpa): Add horizontal pod autoscaling for DragonFly instances #343

Open

smunukutla-mycarrier wants to merge 15 commits into dragonflydb:main from smunukutla-mycarrier:smunukutla-autoscale-dragonflydb

Conversation

@smunukutla-mycarrier

This pull request resolves #320. It adds autoscaling support to the Dragonfly operator, allowing the Horizontal Pod Autoscaler to be used with Dragonfly instances.

Autoscaling (HPA) support

  • Added AutoscalerSpec to DragonflySpec in dragonfly_types.go, allowing users to configure HPA settings such as enabling autoscaling, min/max replicas, target CPU/memory utilization, scaling behavior, and metrics.
  • Implemented logic in dragonfly_instance.go to create, update, or delete HPA resources based on the AutoscalerSpec, including cleanup of HPA when autoscaling is disabled and preservation of replica counts during transitions.
  • Updated RBAC rules and controller setup to manage HPA resources, including new permissions in role.yaml and controller ownership of HorizontalPodAutoscaler objects.
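The shape of the new API described above can be sketched roughly as follows. This is an illustrative Go sketch, not the PR's actual code: apart from TargetCPUUtilizationPercentage (visible in the review snippets later in this thread), the field names and the clamping helper are assumptions.

```go
package main

import "fmt"

// AutoscalerSpec is a hypothetical sketch of the spec added to DragonflySpec.
// Only TargetCPUUtilizationPercentage is confirmed by the review snippets;
// the other fields are assumed for illustration.
type AutoscalerSpec struct {
	// Enabled toggles HPA management for the instance.
	Enabled bool `json:"enabled,omitempty"`
	// MinReplicas is the lower bound the HPA may scale down to.
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	// MaxReplicas is the upper bound the HPA may scale up to.
	MaxReplicas int32 `json:"maxReplicas,omitempty"`
	// TargetCPUUtilizationPercentage (1-100) drives CPU-based scaling.
	TargetCPUUtilizationPercentage *int32 `json:"targetCPUUtilizationPercentage,omitempty"`
}

// desiredReplicas clamps the currently observed replica count into the
// autoscaler's [MinReplicas, MaxReplicas] range, illustrating how an operator
// can preserve an HPA-managed count during reconciliation instead of
// resetting it. Illustrative only; not the PR's actual implementation.
func desiredReplicas(spec AutoscalerSpec, current int32) int32 {
	if spec.MinReplicas != nil && current < *spec.MinReplicas {
		return *spec.MinReplicas
	}
	if current > spec.MaxReplicas {
		return spec.MaxReplicas
	}
	return current
}

func main() {
	min, cpu := int32(2), int32(75)
	spec := AutoscalerSpec{Enabled: true, MinReplicas: &min, MaxReplicas: 10, TargetCPUUtilizationPercentage: &cpu}
	fmt.Println(desiredReplicas(spec, 1), desiredReplicas(spec, 5), desiredReplicas(spec, 12))
}
```

The point of the clamp is the "preservation of replica counts during transitions" bullet: the operator should treat the HPA-set count as authoritative while keeping it inside the configured bounds.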

API and code generation

  • Added imports and deepcopy methods for HPA types in dragonfly_types.go and zz_generated.deepcopy.go to support the new autoscaler configuration.

Documentation and examples

  • Updated README.md to advertise HPA support as a main feature.
  • Added a sample manifest v1alpha1_dragonfly_autoscaler.yaml demonstrating how to configure autoscaling for a Dragonfly instance.
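A manifest along those lines might look like the following. This is an illustrative sketch based on the PR description, not the sample file's exact contents; the field names under autoscaler are assumptions.

```yaml
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-sample
spec:
  replicas: 3
  autoscaler:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 75
```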

E2E tests and minor improvements

  • Improved secret cleanup in e2e tests and added ImagePullPolicy to several test cases for consistency.
  • Minor log formatting improvement in cmd/main.go.

List of E2E tests for autoscaler

  • Should create Dragonfly instance with autoscaler enabled
  • Should create HPA resource
  • Should create StatefulSet with correct initial replica count
  • Should wait for all pods to be ready and have correct roles
  • Should preserve HPA-modified replica count
  • Should configure new pod as replica
  • Should handle master failover when scaled
  • Should handle replica deletion and recreation
  • Should handle HPA scaling down to minimum replicas
  • Should handle HPA scaling up to maximum replicas
  • Should preserve HPA scaling during operator reconciliation
  • Should handle rapid scaling events
  • Should update HPA when autoscaler spec changes
  • Should disable autoscaler and remove HPA
  • Should support custom metrics configuration
  • Should handle multiple concurrent pod deletions

@Abhra303
Contributor

Hi @smunukutla-mycarrier, thanks for the PR! Can you please resolve the conflicts?

Signed-off-by: Siva Munukutla <smunukutla@mycarrier.io>
@smunukutla-mycarrier
Author

@Abhra303 done. Thanks!

@smunukutla-mycarrier
Author

smunukutla-mycarrier commented Aug 22, 2025

@Abhra303 I would really appreciate it if we could merge this soon. We're looking forward to rolling out autoscaling for Dragonfly. Please let me know if there are any issues or concerns. Thanks! :)

@ldiego73

@smunukutla-mycarrier, when do you think we can merge this PR? This is a feature we want to adopt where I work.

@smunukutla-mycarrier
Author

@ldiego73 I'm not a maintainer on this repo. I'm also waiting for a review from the maintainers, when they get a chance.

@myc-jhicks

bumping this, please merge.

@bcarlock-mycarrier

Adding my support for this feature. Please merge.

@Abhra303
Contributor

Hi @smunukutla-mycarrier, the test is not fixed yet. Also, can you elaborate on the reason for supporting HPA? Dragonfly doesn't support multiple masters, so this can only scale reads. Is that the reason you want HPA support?

@smunukutla-mycarrier
Author

@Abhra303 thanks for the feedback. Yes, we need to be able to (horizontally) scale secondary nodes automatically based on traffic/utilization. I'll push a fix for the tests this weekend, hopefully.

@smunukutla-mycarrier
Author

@Abhra303
I’ve pushed a fix for the issue:
CustomResourceDefinition "dragonflies.dragonflydb.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes.
The root cause was an overly long description in the CRD. I’ve set crd:maxDescLen to 0 to prevent hitting the annotation length limit again, especially since the CRD may keep growing as we add features.

I also merged the latest from the main branch and resolved conflicts. Please run the workflow again when you have some time.
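For context, the fix amounts to passing crd:maxDescLen=0 to controller-gen when generating manifests. A hypothetical Makefile excerpt (the actual target name and surrounding flags in the PR may differ):

```make
# maxDescLen=0 drops field descriptions from the generated CRD so the
# manifest stays under the 262144-byte metadata.annotations limit.
manifests: controller-gen
	$(CONTROLLER_GEN) rbac:roleName=manager-role crd:maxDescLen=0 webhook paths="./..." output:crd:artifacts:config=config/crd/bases
```

The trade-off is that the generated CRD loses its inline field documentation in exchange for staying applyable with client-side apply.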

@Mwogi

Mwogi commented Oct 7, 2025

When is this being merged? It will be a great addition. Autoscaling is a great way to have peace of mind; I don't want to worry about someone needing to scale the pods manually when traffic increases.

// +kubebuilder:validation:Maximum=100
TargetCPUUtilizationPercentage *int32 `json:"targetCPUUtilizationPercentage,omitempty"`

// Target memory utilization percentage
Contributor


Does this make sense? How would adding a replica reduce memory usage, given that data is replicated to all replicas?


100% agreed. It doesn't make sense to provide memory utilization as a target. I've updated it along with the tests.

// +kubebuilder:validation:Optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=100
TargetCPUUtilizationPercentage *int32 `json:"targetCPUUtilizationPercentage,omitempty"`
Contributor


Note that adding replicas can only increase your read throughput


Yes. The feature to scale horizontally is to accommodate cache reads as traffic increases, reducing load on the master and existing replicas. It would be great to handle that automatically rather than updating replicas manually.

Author

@smunukutla-mycarrier Nov 6, 2025


@ashotland please review and run the workflow when you get a chance. The tests passed for me locally.

Contributor

Copilot AI left a comment


Pull request overview

This pull request adds Horizontal Pod Autoscaling (HPA) support to the Dragonfly operator, enabling automatic scaling of Dragonfly instances based on CPU/memory utilization or custom metrics. The implementation includes API changes to support autoscaler configuration, controller logic to manage HPA resources, proper replica count handling during autoscaling transitions, and comprehensive e2e tests.

Key Changes:

  • Added AutoscalerSpec to the Dragonfly CRD with support for min/max replicas, metrics, and scaling behavior
  • Implemented HPA resource generation and lifecycle management in the controller
  • Enhanced StatefulSet replica handling to preserve HPA-managed replica counts
  • Added RBAC permissions for managing HPA resources

Reviewed changes

Copilot reviewed 13 out of 16 changed files in this pull request and generated no comments.

Summary per file:

  • api/v1alpha1/dragonfly_types.go: Added AutoscalerSpec type with validation and preservation annotations for Kubernetes objects
  • api/v1alpha1/zz_generated.deepcopy.go: Generated deepcopy methods for AutoscalerSpec
  • internal/resources/resources.go: Implemented HPA generation logic and StatefulSet replica initialization based on autoscaler config
  • internal/controller/dragonfly_instance.go: Added HPA deletion logic and StatefulSet replica preservation during reconciliation
  • internal/controller/dragonfly_controller.go: Added HPA ownership tracking and requeue logic for HPA deletion
  • manifests/dragonfly-operator.yaml: Updated CRD with autoscaler schema (removed descriptions to reduce size) and RBAC rules
  • config/rbac/role.yaml: Added HPA resource permissions
  • config/samples/v1alpha1_dragonfly_autoscaler.yaml: Added sample autoscaler configuration
  • e2e/*.go: Added ImagePullPolicy and improved secret cleanup in tests
  • README.md: Updated to advertise HPA support
  • Makefile: Added test-single target and maxDescLen=0 to reduce CRD size
  • cmd/main.go: Minor log formatting fix


Signed-off-by: Abhradeep Chakraborty <abhradeep@dragonflydb.io>
@Abhra303
Contributor

@smunukutla-mycarrier I see some fields in the manifest files are missing. Can you please check?



Development

Successfully merging this pull request may close these issues.

Plans to Support Horizontal Pod Autoscaler or Automatic Autoscaling Methods?

7 participants