Add standalone ResolverRuleAssociation CRD#65
Conversation
The existing inline ResolverRule.Spec.Associations field requires ownership of the ResolverRule CR to manage VPC associations. This does not work for shared rules (e.g. rules shared via AWS RAM or managed by a central platform team) where individual clusters only need to associate an existing rule with their VPC. This adds a standalone ResolverRuleAssociation CRD that enables fleet-scale adoption: each cluster independently associates a shared rule with its own VPC without needing to manage the rule itself. AWS API mapping: - Create: AssociateResolverRule - Read: GetResolverRuleAssociation - Delete: DisassociateResolverRule - Update: N/A (all fields are immutable) Spec fields: - resolverRuleID (required, immutable, with ResolverRule reference support) - vpcID (required, immutable) - name (optional, immutable) Status fields: - id (primary key, AWS-assigned) - status - statusMessage Includes E2E tests for: - Basic create/delete lifecycle - Multiple VPCs associated with the same rule
|
Hi @Mais316. Thanks for your PR. I'm waiting for a aws-controllers-k8s member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@michaelhtm appreciate your kind review when you have time please 🙏 |
|
@michaelhtm gentle reminder 🙏 |
|
@michaelhtm adding more details here as well as per slack Thanks again :) for the pointer to ResolverRule. Let me explain our case so it's clearer why I think we still need a standalone association. Our setup is hub-and-spoke. The resolver endpoints and rules live in the hub VPC (DMZ subnet, since that's the only place allowed to send DNS queries out). The rules are then shared to all spoke accounts via AWS RAM. So from a spoke's perspective the rule is already there, read-only. We can't create it locally because the endpoint has to sit in the DMZ. The problem with the current This is the same reason Terraform models it as a standalone resource ( So what the PR adds is a separate One question back to you :) am I right that with the current CRD the only path for our case would be to re-adopt the same shared rule in every spoke and let them race on the |
|
As per Trevor Feedback on slack I've addressed this by adding an annotation: metadata:
annotations:
route53resolver.services.k8s.aws/manage-associations: "false"When set to Guarded paths:
When the annotation is absent or any value other than I also added an E2E test (
|
f275965 to
a5d56b9
Compare
|
@michaelhtm annotation change has been reverted, kindly review when you have time 🙏 |
|
@michaelhtm please review when you have time :) |
- Change GetResolverRuleAssociation operation_type from Read to Read_One so controller uses Get API instead of List (fixes incorrect first-match) - Add VPC cross-resource reference on VPCId field (ec2 VPC) - Add synced.when condition for Status.Status == COMPLETE - Regenerated controller code
|
/test all |
… types The standalone ResolverRuleAssociation CRD introduced a naming conflict — ResolverRuleAssociation now refers to the full Kubernetes resource type. Update getAttachedVPC to use ResolverRuleAssociation_SDK, which is the inline struct used by ResolverRuleSpec.Associations.
|
/test all |
|
@Mais316: Cannot trigger testing until a trusted user reviews the PR and leaves an DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test all |
Could u plz trigger new test, did dome changes |
- Add sdk_create_post_set_output hook to requeue after 5s when association status is not COMPLETE, ensuring faster convergence instead of relying on the default 30s runtime requeue interval. - Add sdk_read_one_post_set_output hook for the same requeue behavior on subsequent reconcile loops. - Wire hooks in generator.yaml under ResolverRuleAssociation resource. - Add hooks.go with requeueWhileCreating variable to avoid time import in generated sdk.go. - Bump ASSOCIATION_TIMEOUT_SECONDS from 90s to 180s to give the controller adequate time with the requeue cadence. - Add AssociationTestVPC to bootstrap resources so test_multiple_vpcs_same_rule uses a pre-provisioned VPC instead of calling create_vpc at runtime (which hit VpcLimitExceeded in the shared CI account). - Remove unused boto3 import from test file.
|
/test all |
| @@ -0,0 +1,3 @@ | |||
| if ko.Status.Status != nil && *ko.Status.Status != "COMPLETE" { | |||
There was a problem hiding this comment.
@Mais316 This hooks should be unnecessary when using synced:when. When using that config in generator.yaml code is generated to evaluate whether or not the resource is in a synced status and requeue if it is not. From the e2e test logs while the polling logic did time out we do see the test ResolverRuleAssociation reach a synced (COMPLETE) state in the controller logs.
ref = CustomResourceReference(group='route53resolver.services.k8s.aws', version='v1alpha1', plural='resolverruleassociations', name='rule-assoc-88rt14u19eq7eeexuls4k', namespace='default')
timeout = 90
def wait_for_association_complete(ref, timeout=ASSOCIATION_TIMEOUT_SECONDS):
"""Poll the CR until status.status == COMPLETE or timeout."""
deadline = time.time() + timeout
while time.time() < deadline:
cr = k8s.get_resource(ref)
if cr and cr.get('status', {}).get('status') == 'COMPLETE':
return cr
time.sleep(10)
> pytest.fail(f"Association {ref.name} did not reach COMPLETE within {timeout}s")
E Failed: Association rule-assoc-88rt14u19eq7eeexuls4k did not reach COMPLETE within 90s
2026-06-01T22:38:28.689Z DEBUG ackrt patched resource status {"kind": "ResolverRuleAssociation", "namespace": "default", "name": "rule-assoc-88rt14u19eq7eeexuls4k", "account": "", "role": "", "": "us-west-2", "is_adopted": false, "generation": 1, "json": "{\"metadata\":{\"resourceVersion\":\"991\"},\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2026-06-01T22:38:28Z\",\"message\":\"Resource synced successfully\",\"reason\":\"\",\"status\":\"True\",\"type\":\"ACK.ResourceSynced\"},{\"lastTransitionTime\":\"2026-06-01T22:38:28Z\",\"message\":\"Resource synced successfully\",\"reason\":\"\",\"status\":\"True\",\"type\":\"Ready\"}],\"status\":\"COMPLETE\",\"statusMessage\":\"\"}}"}
There was a problem hiding this comment.
@knottnt Removed unnecessary hooks synced:when already handles the requeue via IsSynced(). The e2e timeout was 90s in the previous run but the test code has 180s now, so it should pass on the next CI run 🤞
The sdk_create_post_set_output and sdk_read_one_post_set_output hooks manually requeue when Status != COMPLETE, but this is already handled by the synced:when configuration in generator.yaml which generates the IsSynced() method in the runtime. Addresses reviewer feedback in PR aws-controllers-k8s#65.
|
@knottnt please trigger another test, when you have time :) |
|
/test all |
pytest-xdist runs test_create_delete and test_multiple_vpcs_same_rule concurrently. Both fixtures created a SYSTEM rule for the same hardcoded domain (test-assoc.example.com) and both tried to associate with ResolverEndpointVPC. AWS enforces one rule per domain per VPC, so the second association never transitions to COMPLETE. Fix: derive a unique domain from the random rule name so parallel tests never collide on the same domain+VPC pair.
|
@knottnt one passed the other one failed, pushed a fix, please trigger another test :) |
|
/test all |
|
@michaelhtm all is passed now :) |
|
Thanks @Mais316 |
|
/label release/minor |
done :) |
|
@michaelhtm please advise if it looks good. |
|
/test all |
|
/ok-to-test |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Mais316, michaelhtm The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The existing inline ResolverRule.Spec.Associations field requires ownership of the ResolverRule CR to manage VPC associations. This does not work for shared rules (e.g. rules shared via AWS RAM or managed by a central platform team) where individual clusters only need to associate an existing rule with their VPC.
This adds a standalone ResolverRuleAssociation CRD that enables fleet-scale adoption: each cluster independently associates a shared rule with its own VPC without needing to manage the rule itself.
AWS API mapping:
Spec fields:
Status fields:
Includes E2E tests for:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.