Skip to content

Add standalone ResolverRuleAssociation CRD#65

Merged
ack-prow[bot] merged 14 commits into
aws-controllers-k8s:mainfrom
Mais316:main
Jun 2, 2026
Merged

Add standalone ResolverRuleAssociation CRD#65
ack-prow[bot] merged 14 commits into
aws-controllers-k8s:mainfrom
Mais316:main

Conversation

@Mais316

@Mais316 Mais316 commented May 24, 2026

Copy link
Copy Markdown
Contributor

The existing inline ResolverRule.Spec.Associations field requires ownership of the ResolverRule CR to manage VPC associations. This does not work for shared rules (e.g. rules shared via AWS RAM or managed by a central platform team) where individual clusters only need to associate an existing rule with their VPC.

This adds a standalone ResolverRuleAssociation CRD that enables fleet-scale adoption: each cluster independently associates a shared rule with its own VPC without needing to manage the rule itself.

AWS API mapping:

  • Create: AssociateResolverRule
  • Read: GetResolverRuleAssociation
  • Delete: DisassociateResolverRule
  • Update: N/A (all fields are immutable)

Spec fields:

  • resolverRuleID (required, immutable, with ResolverRule reference support)
  • vpcID (required, immutable)
  • name (optional, immutable)

Status fields:

  • id (primary key, AWS-assigned)
  • status
  • statusMessage

Includes E2E tests for:

  • Basic create/delete lifecycle
  • Multiple VPCs associated with the same rule

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

The existing inline ResolverRule.Spec.Associations field requires ownership
of the ResolverRule CR to manage VPC associations. This does not work for
shared rules (e.g. rules shared via AWS RAM or managed by a central platform
team) where individual clusters only need to associate an existing rule with
their VPC.

This adds a standalone ResolverRuleAssociation CRD that enables fleet-scale
adoption: each cluster independently associates a shared rule with its own
VPC without needing to manage the rule itself.

AWS API mapping:
- Create: AssociateResolverRule
- Read:   GetResolverRuleAssociation
- Delete: DisassociateResolverRule
- Update: N/A (all fields are immutable)

Spec fields:
- resolverRuleID (required, immutable, with ResolverRule reference support)
- vpcID (required, immutable)
- name (optional, immutable)

Status fields:
- id (primary key, AWS-assigned)
- status
- statusMessage

Includes E2E tests for:
- Basic create/delete lifecycle
- Multiple VPCs associated with the same rule
@ack-prow ack-prow Bot requested review from gustavodiaz7722 and sapphirew May 24, 2026 17:27
@ack-prow ack-prow Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 24, 2026
@ack-prow

ack-prow Bot commented May 24, 2026

Copy link
Copy Markdown

Hi @Mais316. Thanks for your PR.

I'm waiting for a aws-controllers-k8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Mais316

Mais316 commented May 24, 2026

Copy link
Copy Markdown
Contributor Author

@michaelhtm appreciate your kind review when you have time please 🙏

@Mais316

Mais316 commented May 26, 2026

Copy link
Copy Markdown
Contributor Author

@michaelhtm gentle reminder 🙏

@Mais316

Mais316 commented May 28, 2026

Copy link
Copy Markdown
Contributor Author

@michaelhtm adding more details here as well as per slack

Thanks again :) for the pointer to ResolverRule. Let me explain our case so it's clearer why I think we still need a standalone association.

Our setup is hub-and-spoke. The resolver endpoints and rules live in the hub VPC (DMZ subnet, since that's the only place allowed to send DNS queries out). The rules are then shared to all spoke accounts via AWS RAM. So from a spoke's perspective the rule is already there, read-only. We can't create it locally because the endpoint has to sit in the DMZ.
What each spoke actually needs is just the association of its own VPC to that shared rule, nothing else.

The problem with the current ResolverRule CRD and its inline Associations array is that we have one build module that provisions VPC + EKS per spoke. Every spoke runs the same code against the same shared rule. If each cluster adopts the rule and writes into the Associations array, they end up fighting over it. Spoke A writes its VPC, spoke B's reconcile loop removes it and writes its own, and so on. Our e2e build can't converge because of this. On top of that the rule isn't really owned by the spoke, it's owned by the platform team in the hub, so spokes shouldn't be authoritative over its spec at all.

This is the same reason Terraform models it as a standalone resource (aws_route53_resolver_rule_association), separate from the rule itself. Each spoke manages only its own association, no shared mutable array. That worked for us before we moved to ACK and it's the shape this PR mirrors.

So what the PR adds is a separate ResolverRuleAssociation CRD where each spoke creates only its own association object, the ResolverRule CR doesn't need to exist in the spoke cluster, and fields are immutable to match the AWS API (AssociateResolverRule / DisassociateResolverRule, no update path).

One question back to you :) am I right that with the current CRD the only path for our case would be to re-adopt the same shared rule in every spoke and let them race on the Associations array? If there's a pattern I'm missing to handle cross-account shared-rule association with ResolverRule alone, I'd be glad to learn it. But from the API shape and our e2e results, a standalone association CRD looks like the natural fit, the same way the AWS API itself models it.
Happy to iterate on the design.

@Mais316

Mais316 commented May 28, 2026

Copy link
Copy Markdown
Contributor Author

@michaelhtm

As per Trevor Feedback on slack

I've addressed this by adding an annotation:

metadata:
  annotations:
    route53resolver.services.k8s.aws/manage-associations: "false"

When set to "false" on a ResolverRule CR, the controller completely skips managing the spec.associations field — no reading, creating, updating, or deleting associations via that CR. This defers all association lifecycle to standalone ResolverRuleAssociation CRs.

Guarded paths:

  • sdkFind — skips ListResolverRuleAssociations (won't populate spec.associations)
  • sdkCreate — skips creating inline associations after rule creation
  • sdkDelete — skips disassociating VPCs before rule deletion (standalone CRs handle their own cleanup via DisassociateResolverRule)
  • customUpdate — skips syncAssociation on spec diff

When the annotation is absent or any value other than "false", the existing inline behavior is fully preserved (backward-compatible).

I also added an E2E test (test_manage_associations_annotation.py) that:

  1. Creates a ResolverRule with manage-associations: "false"
  2. Creates a standalone ResolverRuleAssociation targeting the same rule
  3. Triggers a reconcile on the rule (via tag patch)
  4. Verifies the standalone association is not removed

@Mais316 Mais316 force-pushed the main branch 2 times, most recently from f275965 to a5d56b9 Compare May 29, 2026 22:22
@Mais316

Mais316 commented May 29, 2026

Copy link
Copy Markdown
Contributor Author

@michaelhtm annotation change has been reverted, kindly review when you have time 🙏

Comment thread apis/v1alpha1/generator.yaml Outdated
Comment thread apis/v1alpha1/generator.yaml
@Mais316

Mais316 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

@michaelhtm please review when you have time :)

@knottnt knottnt left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mais316 Left a few comments. Just note, any changes to the generator.yaml should be applied to the generator.yaml file in the root of the repository.

Comment thread pkg/resource/resolver_rule_association/sdk.go Outdated
- Change GetResolverRuleAssociation operation_type from Read to Read_One
  so controller uses Get API instead of List (fixes incorrect first-match)
- Add VPC cross-resource reference on VPCId field (ec2 VPC)
- Add synced.when condition for Status.Status == COMPLETE
- Regenerated controller code
@knottnt

knottnt commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

/test all

Comment thread apis/v1alpha1/types.go
… types

The standalone ResolverRuleAssociation CRD introduced a naming conflict —
ResolverRuleAssociation now refers to the full Kubernetes resource type.
Update getAttachedVPC to use ResolverRuleAssociation_SDK, which is the
inline struct used by ResolverRuleSpec.Associations.
@Mais316

Mais316 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

/test all

@ack-prow

ack-prow Bot commented Jun 1, 2026

Copy link
Copy Markdown

@Mais316: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/test all

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@knottnt

knottnt commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

/test all

@Mais316

Mais316 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

/test all

Could u plz trigger new test, did dome changes

- Add sdk_create_post_set_output hook to requeue after 5s when association
  status is not COMPLETE, ensuring faster convergence instead of relying
  on the default 30s runtime requeue interval.
- Add sdk_read_one_post_set_output hook for the same requeue behavior on
  subsequent reconcile loops.
- Wire hooks in generator.yaml under ResolverRuleAssociation resource.
- Add hooks.go with requeueWhileCreating variable to avoid time import
  in generated sdk.go.
- Bump ASSOCIATION_TIMEOUT_SECONDS from 90s to 180s to give the controller
  adequate time with the requeue cadence.
- Add AssociationTestVPC to bootstrap resources so test_multiple_vpcs_same_rule
  uses a pre-provisioned VPC instead of calling create_vpc at runtime (which
  hit VpcLimitExceeded in the shared CI account).
- Remove unused boto3 import from test file.
@knottnt

knottnt commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

/test all

@@ -0,0 +1,3 @@
if ko.Status.Status != nil && *ko.Status.Status != "COMPLETE" {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mais316 This hooks should be unnecessary when using synced:when. When using that config in generator.yaml code is generated to evaluate whether or not the resource is in a synced status and requeue if it is not. From the e2e test logs while the polling logic did time out we do see the test ResolverRuleAssociation reach a synced (COMPLETE) state in the controller logs.

 ref = CustomResourceReference(group='route53resolver.services.k8s.aws', version='v1alpha1', plural='resolverruleassociations', name='rule-assoc-88rt14u19eq7eeexuls4k', namespace='default')
timeout = 90

    def wait_for_association_complete(ref, timeout=ASSOCIATION_TIMEOUT_SECONDS):
        """Poll the CR until status.status == COMPLETE or timeout."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            cr = k8s.get_resource(ref)
            if cr and cr.get('status', {}).get('status') == 'COMPLETE':
                return cr
            time.sleep(10)
>       pytest.fail(f"Association {ref.name} did not reach COMPLETE within {timeout}s")
E       Failed: Association rule-assoc-88rt14u19eq7eeexuls4k did not reach COMPLETE within 90s

2026-06-01T22:38:28.689Z	DEBUG	ackrt	patched resource status	{"kind": "ResolverRuleAssociation", "namespace": "default", "name": "rule-assoc-88rt14u19eq7eeexuls4k", "account": "", "role": "", "": "us-west-2", "is_adopted": false, "generation": 1, "json": "{\"metadata\":{\"resourceVersion\":\"991\"},\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2026-06-01T22:38:28Z\",\"message\":\"Resource synced successfully\",\"reason\":\"\",\"status\":\"True\",\"type\":\"ACK.ResourceSynced\"},{\"lastTransitionTime\":\"2026-06-01T22:38:28Z\",\"message\":\"Resource synced successfully\",\"reason\":\"\",\"status\":\"True\",\"type\":\"Ready\"}],\"status\":\"COMPLETE\",\"statusMessage\":\"\"}}"}

@Mais316 Mais316 Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knottnt Removed unnecessary hooks synced:when already handles the requeue via IsSynced(). The e2e timeout was 90s in the previous run but the test code has 180s now, so it should pass on the next CI run 🤞

Mais316 added 2 commits June 2, 2026 12:53
The sdk_create_post_set_output and sdk_read_one_post_set_output hooks
manually requeue when Status != COMPLETE, but this is already handled
by the synced:when configuration in generator.yaml which generates
the IsSynced() method in the runtime.

Addresses reviewer feedback in PR aws-controllers-k8s#65.
@knottnt knottnt self-assigned this Jun 2, 2026
@Mais316

Mais316 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

@knottnt please trigger another test, when you have time :)

@michaelhtm

Copy link
Copy Markdown
Member

/test all

pytest-xdist runs test_create_delete and test_multiple_vpcs_same_rule
concurrently. Both fixtures created a SYSTEM rule for the same hardcoded
domain (test-assoc.example.com) and both tried to associate with
ResolverEndpointVPC. AWS enforces one rule per domain per VPC, so the
second association never transitions to COMPLETE.

Fix: derive a unique domain from the random rule name so parallel tests
never collide on the same domain+VPC pair.
@Mais316

Mais316 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

@knottnt one passed the other one failed, pushed a fix, please trigger another test :)

[gw3]�[36m [ 80%] �[0m�[32mPASSED�[0m tests/test_resolver_rule_association.py::TestResolverRuleAssociation::test_create_delete 
[gw4]�[36m [100%] �[0m�[31mFAILED�[0m tests/test_resolver_rule_association.py::TestResolverRuleAssociation::test_multiple_vpcs_same_rule 

@michaelhtm

Copy link
Copy Markdown
Member

/test all

@Mais316

Mais316 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

@michaelhtm all is passed now :)

@michaelhtm

michaelhtm commented Jun 2, 2026

Copy link
Copy Markdown
Member

Thanks @Mais316
Can we add a documentation.yaml where we can put the Associations limitation?
It should have been introduced in the latest code-generator commit
aws-controllers-k8s/code-generator#710

@michaelhtm

Copy link
Copy Markdown
Member

/label release/minor

@ack-prow ack-prow Bot added the release/minor Indicates this PR should trigger a minor version release on merge. label Jun 2, 2026
@Mais316

Mais316 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @Mais316 Can we add a documentation.yaml where we can put the Associations limitation? It should have been introduced in the latest code-generator commit aws-controllers-k8s/code-generator#710

done :)

@Mais316

Mais316 commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

@michaelhtm please advise if it looks good.

@michaelhtm michaelhtm left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
/ok-to-test
/lgtm

@michaelhtm

Copy link
Copy Markdown
Member

/test all

@michaelhtm

Copy link
Copy Markdown
Member

/ok-to-test
/lgtm

@ack-prow ack-prow Bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. lgtm Indicates that a PR is ready to be merged. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 2, 2026
@ack-prow

ack-prow Bot commented Jun 2, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Mais316, michaelhtm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ack-prow ack-prow Bot added the approved label Jun 2, 2026
@ack-prow ack-prow Bot merged commit 7f3dbec into aws-controllers-k8s:main Jun 2, 2026
9 checks passed
ack-prow Bot pushed a commit that referenced this pull request Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release/minor Indicates this PR should trigger a minor version release on merge.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants