Skip to content

Add feature to configure minimum instances#533

Merged
jtnord merged 17 commits into
jenkinsci:developfrom
gbhat618:minimum-number-of-spare-instances
Mar 31, 2026
Merged

Add feature to configure minimum instances#533
jtnord merged 17 commits into
jenkinsci:developfrom
gbhat618:minimum-number-of-spare-instances

Conversation

@gbhat618
Copy link
Copy Markdown
Contributor

@gbhat618 gbhat618 commented Mar 27, 2026

Adds feature minimum number of instances and minimum number of spare instances - which can be active in a configurable time range.

This feature is implemented by referring to that existing in the ec2-plugin. The logic is almost exactly as that of ec2-plugin, adjustments are made such that code is decoupled for better testing (especially with the retention strategy, and code computation parts). Files related in ec2-plugin

UI visualization

Description screenshot
form with new fields image

Testing done

Automated testing

  • 6 new integration tests are added covering the integration that check all the features introduced in here.
  • 3 new unit test files added (each with several tests) that exercise the core logic part of the computation parts of minimum instances, whether to keep the instance, time range logic.

Manual testing

  • CasC based environment manual testing is conducted to ensure that the VMs are provisioned, stopping the jenkins deletes the idle instances

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

Comment thread src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineMonitor.java Outdated
initCredentials(j);
client = ClientUtil.getClientFactory(j.jenkins, PROJECT_ID).computeClient();
// CasC loaded min/spare values but credentials weren't available yet so fails; trigger provisioning now
SaveableListener.fireOnChange(j.jenkins, null);
Copy link
Copy Markdown
Contributor Author

@gbhat618 gbhat618 Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the credentials initialized here, the tests are throwing an exception, later this SaveableListener call makes the provisioning take place,

error log
   0.586 [id=630]       SEVERE  c.g.j.p.c.ComputeEngineCloud#createClient: Exception when creating GCE client
hudson.AbortException: Failed to initialize HTTP transport: hudson.AbortException: Could not retrieve credentials: tiger-team-testing
        at com.google.jenkins.plugins.computeengine.client.ClientUtil.getClientFactory(ClientUtil.java:59)
        at com.google.jenkins.plugins.computeengine.client.ClientUtil.getClientFactory(ClientUtil.java:73)
        at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.createClient(ComputeEngineCloud.java:190)
        at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.getClient(ComputeEngineCloud.java:208)
        at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.availableNodeCapacity(ComputeEngineCloud.java:387)
        at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.provisionSpares(ComputeEngineCloud.java:310)
        at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.lambda$performCheck$0(MinimumInstanceChecker.java:280)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.lambda$performCheck$1(MinimumInstanceChecker.java:265)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.performCheck(MinimumInstanceChecker.java:265)
        at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:144)
        at com.google.jenkins.plugins.computeengine.InstanceConfiguration$OnSaveListener.onChange(InstanceConfiguration.java:1082)
        at hudson.model.listeners.SaveableListener.lambda$fireOnChange$0(SaveableListener.java:90)
        at jenkins.util.Listeners.lambda$notify$0(Listeners.java:59)
        at jenkins.util.Listeners.notify(Listeners.java:70)
        at hudson.model.listeners.SaveableListener.fireOnChange(SaveableListener.java:90)
        at jenkins.model.Jenkins.save(Jenkins.java:3616)
        at hudson.BulkChange.commit(BulkChange.java:98)
        at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:278)
        at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$configureWith$6(ConfigurationAsCode.java:858)
        at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:808)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:858)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:730)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:699)
        at io.jenkins.plugins.casc.misc.JenkinsConfiguredWithCodeRule.before(JenkinsConfiguredWithCodeRule.java:46)
        at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:653)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.lang.Thread.run(Thread.java:1583)
   0.586 [id=630]       WARNING jenkins.util.Listeners#lambda$notify$0
java.lang.NullPointerException: Cannot invoke "com.google.cloud.graphite.platforms.plugin.client.ComputeClient.listInstancesWithLabel(String, java.util.Map)" because the return value of "com.google.jenkins.plugins.computeengine.ComputeEngineCloud.getClient()" is null
        at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.availableNodeCapacity(ComputeEngineCloud.java:387)
        at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.provisionSpares(ComputeEngineCloud.java:310)
        at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.lambda$performCheck$0(MinimumInstanceChecker.java:280)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.lambda$performCheck$1(MinimumInstanceChecker.java:265)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.performCheck(MinimumInstanceChecker.java:265)
        at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:144)
        at com.google.jenkins.plugins.computeengine.InstanceConfiguration$OnSaveListener.onChange(InstanceConfiguration.java:1082)
        at hudson.model.listeners.SaveableListener.lambda$fireOnChange$0(SaveableListener.java:90)
        at jenkins.util.Listeners.lambda$notify$0(Listeners.java:59)
        at jenkins.util.Listeners.notify(Listeners.java:70)
        at hudson.model.listeners.SaveableListener.fireOnChange(SaveableListener.java:90)
        at jenkins.model.Jenkins.save(Jenkins.java:3616)
        at hudson.BulkChange.commit(BulkChange.java:98)
        at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:278)
        at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$configureWith$6(ConfigurationAsCode.java:858)
        at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:808)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:858)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:730)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:699)
        at io.jenkins.plugins.casc.misc.JenkinsConfiguredWithCodeRule.before(JenkinsConfiguredWithCodeRule.java:46)
        at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:653)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
        at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.lang.Thread.run(Thread.java:1583)

@gbhat618 gbhat618 changed the title Add option to configure minimum instances Add feature to configure minimum instances Mar 30, 2026
@gbhat618
Copy link
Copy Markdown
Contributor Author

/label enhancement

@gbhat618 gbhat618 marked this pull request as ready for review March 30, 2026 17:18
@jglick
Copy link
Copy Markdown
Member

jglick commented Mar 30, 2026

stopping the jenkins deletes the idle instances

You mean when #530 is enabled?

Copy link
Copy Markdown
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK from a quick read.

*/
public List<PlannedNode> provisionSpares(InstanceConfiguration config, int numberToProvision) {
List<PlannedNode> result = new ArrayList<>();
if (Jenkins.get().isQuietingDown() || Jenkins.get().isTerminating()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beware that isTerminating is false immediately after termination begins! It only switches to true sometime later. Longtime annoyance in core (CB-internal references upon request); could use a new API for this.

Comment thread src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineCloud.java Outdated
Comment thread src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineMonitor.java Outdated

@Override
public long getRecurrencePeriod() {
return recurrencePeriod.toMillis();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(would be nice to improve core to allow this to be specified directly as a Duration)

Comment thread src/main/java/com/google/jenkins/plugins/computeengine/InstanceConfiguration.java Outdated
public static final class OnSaveListener extends SaveableListener {
@Override
public void onChange(Saveable o, XmlFile file) {
if (o instanceof Jenkins) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, there is no more specific event for a particular Cloud?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well Clouds each have their own config page now so it is not even using that endpoint to submit changes I think.

@gbhat618
Copy link
Copy Markdown
Contributor Author

stopping the jenkins deletes the idle instances

You mean when #530 is enabled?

yes, right; using terminateIdleDuringShutdown: true in the config.

@gbhat618
Copy link
Copy Markdown
Contributor Author

one of the test ComputeEngineCloudMinimumInstancesIT#testAgentTemporarilyOffline_notCountedAsSpare had started flaking after I tried reduce the CasC configuration test file and started to set the retention time via code - this wouldn't work because the agents are spawned immediately after jenkins is started. Now added the separate CasC file again for this particular test.

As of now all tests are passing. The PR looks to be final with code changes.

Comment thread src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineMonitor.java Outdated
@mikecirioli
Copy link
Copy Markdown

Minor concurrency note on MinimumInstanceChecker:

shouldPreserve() calls incrementPendingTerminations() outside the checking AtomicBoolean guard, while checkForMinimumInstances() calls resetPendingTerminations() inside it. Since shouldPreserve is invoked from ComputeEngineRetentionStrategy.check() on its own retention timer, there's a small race window:

  1. shouldPreserveincrementPendingTerminations(config) (counter = 1)
  2. checkForMinimumInstancesresetPendingTerminations() (counter wiped)
  3. Another shouldPreserve call sees counter = 0, allows a second termination that should have been blocked

Impact is low — worst case an extra agent gets terminated and the next periodic check (≤10 min) replenishes it. Just flagging for awareness, not a blocker.

@gbhat618
Copy link
Copy Markdown
Contributor Author

Minor concurrency note on MinimumInstanceChecker [...]

I see. Thank you for noting it down. Agree on the race condition existence. Maybe the fix for race condition is to not clear the map, but decrement when agent is terminated, perhaps via NodeListener.onDeleted. I am thinking to leave it at here for now, given the periodic task will offset if a spare gets deleted due to the race condition.

@gbhat618
Copy link
Copy Markdown
Contributor Author

facing issue with CI jenkins-infra/helpdesk#5055

@gbhat618
Copy link
Copy Markdown
Contributor Author

retriggering build to see if it works.

@gbhat618
Copy link
Copy Markdown
Contributor Author

All looks good with this PR,

  • reran the integration tests — all 6 are passing
  • did a manual interactive test
    • by not configuring minimum instances - no spares created.
    • configuring min instances on Monday (today is Tuesday) - no spares created
    • configure today - instance created
    • remove today and keep yesterday - spares deleted.

Overall all looks good imo

@gbhat618
Copy link
Copy Markdown
Contributor Author

IMPORTANT: /label enhancement

@jtnord jtnord added the enhancement New feature or request label Mar 31, 2026
@jtnord jtnord merged commit 3634530 into jenkinsci:develop Mar 31, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants