Fix provisioning fallback when first template exhausted (#2005) #2010

Open
zendesk-abhijeet wants to merge 2 commits into jenkinsci:master from zendesk-abhijeet:abhijeet/issue-2005-use-all-temp-with-same-label

zendesk-abhijeet commented May 5, 2026

Fix provisioning fallback to next template when first is exhausted

Issue - #2005

When multiple EC2 templates share the same label, Jenkins would fail to provision agents if the first template reached its instance capacity. The provisioning logic would create PlannedNodes that resolved to null instead of falling back to the next available template with the same label.

Reproduction scenario:

  1. Configure two templates with label my_agent:
    • Template 1 (base): instanceCap = 1
    • Template 2 (spike): instanceCap = 10
  2. Template 1 reaches capacity (1 instance running)
  3. Queue a job requiring my_agent
  4. Expected: Provision from Template 2
  5. Actual: Job stays in queue, no agent provisioned

This is a regression: the fallback behaviour worked correctly in version 2045.v06da_da_a_46422.

Fix

Modified EC2Cloud.java to:

  1. Check capacity upfront: Before creating PlannedNodes, verify that the template has available capacity using getPossibleNewSlavesCount()
  2. Skip exhausted templates: If capacity is 0, log and continue to the next matching template instead of creating PlannedNodes that fail
  3. Refactor provisioning logic: Move the capacity check and provisioning logic into the async future to avoid race conditions
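The skip-and-fall-back behaviour in steps 1 and 2 can be sketched roughly as follows. This is a simplified, self-contained model and not the code in EC2Cloud.java: the `FallbackSketch` class, the `Template` record and `remainingCapacity()` are hypothetical stand-ins, with `remainingCapacity()` playing the role this PR gives to getPossibleNewSlavesCount().

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of the fallback; not the plugin's real classes.
final class FallbackSketch {

    /** Stand-in for an EC2 template: a label, an instance cap and a running count. */
    record Template(String name, String label, int instanceCap, int running) {
        /** Remaining capacity (assumed stand-in for getPossibleNewSlavesCount()). */
        int remainingCapacity() {
            return Math.max(0, instanceCap - running);
        }
    }

    /** Returns the first template with the given label that still has capacity. */
    static Optional<Template> pickTemplate(List<Template> templates, String label) {
        for (Template t : templates) {
            if (!t.label().equals(label)) {
                continue; // label does not match
            }
            if (t.remainingCapacity() == 0) {
                continue; // exhausted: fall back to the next matching template
            }
            return Optional.of(t);
        }
        return Optional.empty(); // every matching template is at capacity
    }

    public static void main(String[] args) {
        List<Template> templates = List.of(
                new Template("base", "my_agent", 1, 1),    // at capacity
                new Template("spike", "my_agent", 10, 0)); // has room
        System.out.println(
                pickTemplate(templates, "my_agent").map(Template::name).orElse("none"));
    }
}
```

With the reproduction scenario above (base at capacity, spike empty), the sketch selects the spike template instead of giving up on the first match.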

Key changes:

  • Added an early capacity check before creating PlannedNodes
  • Used a continue statement when a template has no capacity (allows fallback to the next template)
  • Moved the t.provision() call directly into the provisioning future with double-checked locking
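To illustrate the idea behind checking capacity once up front and again inside the future, here is a minimal sketch of a "check, then re-check under a lock" pattern. It is an assumption about the general shape only: `RecheckSketch`, `Pool`, `tryClaim()` and `provisionAsync()` are hypothetical names, and the real change calls t.provision() inside the plugin's own provisioning future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of re-checking capacity inside the async task.
final class RecheckSketch {

    /** Hypothetical capacity tracker for a single template. */
    static final class Pool {
        private final ReentrantLock lock = new ReentrantLock();
        private final int cap;
        private int running;

        Pool(int cap) {
            this.cap = cap;
        }

        /** Snapshot of the remaining capacity. */
        int remaining() {
            lock.lock();
            try {
                return cap - running;
            } finally {
                lock.unlock();
            }
        }

        /** Re-checks capacity under the lock and claims a slot if one is left. */
        boolean tryClaim() {
            lock.lock();
            try {
                if (running >= cap) {
                    return false; // another request took the last slot
                }
                running++;
                return true;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Cheap up-front check first, authoritative re-check inside the future. */
    static CompletableFuture<Boolean> provisionAsync(Pool pool) {
        if (pool.remaining() == 0) {
            return CompletableFuture.completedFuture(false); // skip this template
        }
        return CompletableFuture.supplyAsync(pool::tryClaim);
    }

    public static void main(String[] args) {
        Pool pool = new Pool(1);
        CompletableFuture<Boolean> a = provisionAsync(pool);
        CompletableFuture<Boolean> b = provisionAsync(pool);
        int wins = (a.join() ? 1 : 0) + (b.join() ? 1 : 0);
        System.out.println("successful claims: " + wins);
    }
}
```

The up-front check avoids scheduling work for a template that is obviously full, while the re-check inside the future ensures that two requests which both passed the first check cannot both claim the last slot.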

Testing done

Added integration tests testProvisionFallbackToSecondTemplateWhenFirstExhausted and testMultipleTemplatesWithSameLabel, which validate that:

  • Multiple templates with the same label are discovered
  • Provisioning considers all matching templates
  • PlannedNodes are created when templates have capacity

Unit test results:
(screenshot: EC2 Plugin Fix Unit Tests)

Test setup:

Template 1 - Base (label: dev_cypress_medium): cap=1, no instances
Template 2 - Spike (label: dev_cypress_medium): cap=1, no instances
Queued 3 jobs requiring label "dev_cypress_medium"
Before fix: Only the base instance provisions
After fix: Both the base and spike instances provision
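The arithmetic of this setup can be checked with a small stand-alone simulation. It models only the counting, not the plugin or the actual tests in EC2CloudTest.java; `TestSetupSketch`, its `Template` class and `provision()` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the test setup: two templates, cap=1 each, 3 queued jobs.
final class TestSetupSketch {

    /** Stand-in for a template with an instance cap and a running count. */
    static final class Template {
        final String name;
        final int cap;
        int running;

        Template(String name, int cap) {
            this.name = name;
            this.cap = cap;
        }
    }

    /** For each queued job, launches on the first template that still has room. */
    static List<String> provision(List<Template> templates, int queuedJobs) {
        List<String> launched = new ArrayList<>();
        for (int job = 0; job < queuedJobs; job++) {
            for (Template t : templates) {
                if (t.running < t.cap) {
                    t.running++;
                    launched.add(t.name);
                    break; // this job is served; move on to the next one
                }
                // template is full: try the next one with the same label
            }
        }
        return launched;
    }

    public static void main(String[] args) {
        List<Template> templates = List.of(new Template("base", 1), new Template("spike", 1));
        System.out.println(provision(templates, 3));
    }
}
```

With two templates capped at one instance each, the three queued jobs yield one base and one spike instance, and the third job waits because both templates are then at capacity.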

Manual test results:
(screenshot: EC2 Plugin Fix Manual Test1)
(screenshot: EC2 Plugin Fix Manual Test2)

Test Checklist

  • Integration test added in EC2CloudTest.java
  • Manual testing completed with multiple templates

Submitter checklist

  • [x] Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • [x] Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

zendesk-abhijeet marked this pull request as ready for review May 5, 2026 08:54