Add feature to configure minimum instances#533
Conversation
| initCredentials(j); | ||
| client = ClientUtil.getClientFactory(j.jenkins, PROJECT_ID).computeClient(); | ||
| // CasC loaded min/spare values but credentials weren't available yet so fails; trigger provisioning now | ||
| SaveableListener.fireOnChange(j.jenkins, null); |
There was a problem hiding this comment.
Due to the credentials initialized here, the tests are throwing an exception, later this SaveableListener call makes the provisioning take place,
error log
0.586 [id=630] SEVERE c.g.j.p.c.ComputeEngineCloud#createClient: Exception when creating GCE client
hudson.AbortException: Failed to initialize HTTP transport: hudson.AbortException: Could not retrieve credentials: tiger-team-testing
at com.google.jenkins.plugins.computeengine.client.ClientUtil.getClientFactory(ClientUtil.java:59)
at com.google.jenkins.plugins.computeengine.client.ClientUtil.getClientFactory(ClientUtil.java:73)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.createClient(ComputeEngineCloud.java:190)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.getClient(ComputeEngineCloud.java:208)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.availableNodeCapacity(ComputeEngineCloud.java:387)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.provisionSpares(ComputeEngineCloud.java:310)
at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.lambda$performCheck$0(MinimumInstanceChecker.java:280)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.lambda$performCheck$1(MinimumInstanceChecker.java:265)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.performCheck(MinimumInstanceChecker.java:265)
at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:144)
at com.google.jenkins.plugins.computeengine.InstanceConfiguration$OnSaveListener.onChange(InstanceConfiguration.java:1082)
at hudson.model.listeners.SaveableListener.lambda$fireOnChange$0(SaveableListener.java:90)
at jenkins.util.Listeners.lambda$notify$0(Listeners.java:59)
at jenkins.util.Listeners.notify(Listeners.java:70)
at hudson.model.listeners.SaveableListener.fireOnChange(SaveableListener.java:90)
at jenkins.model.Jenkins.save(Jenkins.java:3616)
at hudson.BulkChange.commit(BulkChange.java:98)
at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:278)
at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$configureWith$6(ConfigurationAsCode.java:858)
at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:808)
at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:858)
at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:730)
at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:699)
at io.jenkins.plugins.casc.misc.JenkinsConfiguredWithCodeRule.before(JenkinsConfiguredWithCodeRule.java:46)
at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:653)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.lang.Thread.run(Thread.java:1583)
0.586 [id=630] WARNING jenkins.util.Listeners#lambda$notify$0
java.lang.NullPointerException: Cannot invoke "com.google.cloud.graphite.platforms.plugin.client.ComputeClient.listInstancesWithLabel(String, java.util.Map)" because the return value of "com.google.jenkins.plugins.computeengine.ComputeEngineCloud.getClient()" is null
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.availableNodeCapacity(ComputeEngineCloud.java:387)
at com.google.jenkins.plugins.computeengine.ComputeEngineCloud.provisionSpares(ComputeEngineCloud.java:310)
at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.lambda$performCheck$0(MinimumInstanceChecker.java:280)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.lambda$performCheck$1(MinimumInstanceChecker.java:265)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1939)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.performCheck(MinimumInstanceChecker.java:265)
at com.google.jenkins.plugins.computeengine.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:144)
at com.google.jenkins.plugins.computeengine.InstanceConfiguration$OnSaveListener.onChange(InstanceConfiguration.java:1082)
at hudson.model.listeners.SaveableListener.lambda$fireOnChange$0(SaveableListener.java:90)
at jenkins.util.Listeners.lambda$notify$0(Listeners.java:59)
at jenkins.util.Listeners.notify(Listeners.java:70)
at hudson.model.listeners.SaveableListener.fireOnChange(SaveableListener.java:90)
at jenkins.model.Jenkins.save(Jenkins.java:3616)
at hudson.BulkChange.commit(BulkChange.java:98)
at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:278)
at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$configureWith$6(ConfigurationAsCode.java:858)
at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:808)
at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:858)
at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:730)
at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:699)
at io.jenkins.plugins.casc.misc.JenkinsConfiguredWithCodeRule.before(JenkinsConfiguredWithCodeRule.java:46)
at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:653)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.lang.Thread.run(Thread.java:1583)
|
/label enhancement |
You mean when #530 is enabled? |
| */ | ||
| public List<PlannedNode> provisionSpares(InstanceConfiguration config, int numberToProvision) { | ||
| List<PlannedNode> result = new ArrayList<>(); | ||
| if (Jenkins.get().isQuietingDown() || Jenkins.get().isTerminating()) { |
There was a problem hiding this comment.
Beware that isTerminating is false immediately after termination begins! It only switches to true sometime later. Longtime annoyance in core (CB-internal references upon request); could use a new API for this.
|
|
||
| @Override | ||
| public long getRecurrencePeriod() { | ||
| return recurrencePeriod.toMillis(); |
There was a problem hiding this comment.
(would be nice to improve core to allow this to be specified directly as a Duration)
| public static final class OnSaveListener extends SaveableListener { | ||
| @Override | ||
| public void onChange(Saveable o, XmlFile file) { | ||
| if (o instanceof Jenkins) { |
There was a problem hiding this comment.
Oof, there is no more specific event for a particular Cloud?
There was a problem hiding this comment.
looks like not. looked at the ec2-plugin, kubernetes plugin and also tried checking in the jenkins core if there is anything like CloudListener - none found.
- https://github.com/jenkinsci/ec2-plugin/blob/9c48b882174fe2131e7fe9611e3e03ecc68326f0/src/main/java/hudson/plugins/ec2/SlaveTemplate.java#L3255-L3263
- https://github.com/jenkinsci/kubernetes-plugin/blob/a82d78a0087e9eaa10ec2339b48891549345c1a4/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesClientProvider.java#L124-L128
perhaps we need to implement one in core maybe https://github.com/jenkinsci/jenkins/blob/b679ff1627f7f4be7c0f255399b0460b0d593e73/core/src/main/java/hudson/slaves/Cloud.java#L323, https://github.com/jenkinsci/jenkins/blob/b679ff1627f7f4be7c0f255399b0460b0d593e73/core/src/main/java/hudson/slaves/Cloud.java#L313 and maybe also https://github.com/jenkinsci/jenkins/blob/b679ff1627f7f4be7c0f255399b0460b0d593e73/core/src/main/java/jenkins/model/Jenkins.java#L4033 but only when clouds added or removed.
There was a problem hiding this comment.
Well Clouds each have their own config page now so it is not even using that endpoint to submit changes I think.
yes, right; using |
…taskCompletedWithProblems when oneShot
…atic field reference to avoid class loading and property getting set to default
…ountedAsSpare became flake after removing the casc file and setting the retention time in code - go back to casc file for deterministic test execution
|
one of the test As of now all tests are passing. The PR looks to be final with code changes. |
|
Minor concurrency note on
Impact is low — worst case an extra agent gets terminated and the next periodic check (≤10 min) replenishes it. Just flagging for awareness, not a blocker. |
I see. Thank you for noting it down. Agree on the race condition existence. Maybe the fix for race condition is to not clear the map, but decrement when agent is terminated, perhaps via |
|
facing issue with CI jenkins-infra/helpdesk#5055 |
|
retriggering build to see if it works. |
|
All looks good with this PR,
Overall all looks good imo |
|
IMPORTANT: |
Adds feature minimum number of instances and minimum number of spare instances - which can be active in a configurable time range.
This feature is implemented by referring to that existing in the ec2-plugin. The logic is almost exactly as that of ec2-plugin, adjustments are made such that code is decoupled for better testing (especially with the retention strategy, and code computation parts). Files related in ec2-plugin
oneShotconfiguration needed to be considered and race condition possibilities.UI visualization
Testing done
Automated testing
Manual testing
Submitter checklist