Skip to content

support fail fast for node removal#998

Closed
car-roll wants to merge 5 commits intojenkinsci:masterfrom
car-roll:extendAbstractCloudSlave
Closed

support fail fast for node removal#998
car-roll wants to merge 5 commits intojenkinsci:masterfrom
car-roll:extendAbstractCloudSlave

Conversation

@car-roll
Copy link
Copy Markdown
Contributor

@car-roll car-roll commented Oct 28, 2024

jenkinsci/workflow-durable-task-step-plugin#372 added fail fast for cloud node removal. However, this feature is not supported by the ec2 plugin.

This PR refactors EC2AbstractSlave and EC2Computer in order to take advantage of jenkinsci/workflow-durable-task-step-plugin#372

Testing done

  • Perform manual testing by killing spot instance manually and watching startup

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

@car-roll car-roll changed the title Use AbstractCloudSlave and AbstractCloudComputer to support FailFast support fail fast for node removal Oct 28, 2024
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing in CI.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this one is mystifying me. I'm still trying to figure out why

if (!stopOnTerminate) {
terminate();
try {
terminate();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ignore WS)

Comment thread src/main/java/hudson/plugins/ec2/EC2OndemandSlave.java Outdated
Comment thread src/test/java/hudson/plugins/ec2/EC2AbstractSlaveTest.java Outdated
Comment thread src/test/java/hudson/plugins/ec2/EC2AbstractSlaveTest.java Outdated
Comment thread src/test/java/hudson/plugins/ec2/EC2RetentionStrategyTest.java Outdated
Comment thread src/test/java/hudson/plugins/ec2/EC2RetentionStrategyTest.java Outdated
@jglick
Copy link
Copy Markdown
Member

jglick commented Oct 29, 2024

Looks right, but has it been tested?

@car-roll
Copy link
Copy Markdown
Contributor Author

Cannot read field "terminateScheduled" because "node" is null
Comparing the test logs from master and this PR, everything looks similar up until the very end. Both versions show the ec2 instance idle timeout and terminated/removed, but then this version has this error that the node is null. I guess something funky is going on with this version's terminate. Terminated too soon maybe?

Copy link
Copy Markdown
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding #998 (comment) I guess that

EC2AbstractSlave node = c.getNode();
assertTrue(node.terminateScheduled.await(10, TimeUnit.SECONDS));
was not right to begin with (Computer.getNode may return null if the agent definition has been removed from the system before the executor is released and the computer destroyed) but happened to pass before due to different timing conditions. It is not exactly clear to me what the test is supposed to be asserting here. I guess you could just skip the last assertion in case node == null since this would be normal if the build is finishing.

Comment thread src/main/java/hudson/plugins/ec2/EC2OndemandSlave.java Outdated
Comment thread src/main/java/hudson/plugins/ec2/EC2SpotSlave.java Outdated
Comment thread src/main/java/hudson/plugins/ec2/EC2SpotSlave.java Outdated
Comment thread src/main/java/hudson/plugins/ec2/EC2SpotSlave.java Outdated
@res0nance res0nance added the enhancement Feature additions or enhancements label Nov 4, 2024
@car-roll
Copy link
Copy Markdown
Contributor Author

car-roll commented Dec 2, 2024

closing in favor of #1015 due to the branch conflicts

@car-roll car-roll closed this Dec 2, 2024
@jglick
Copy link
Copy Markdown
Member

jglick commented Dec 3, 2024

For the future, please retain the same PR and just resolve conflicts in it. In the worst case, if running a merge tool became completely unwieldy and it was really easiest to start from scratch, just reset your branch to origin/master, reapply changes, and force push. At least the context of previous reviews is retained.

@car-roll
Copy link
Copy Markdown
Contributor Author

car-roll commented Dec 4, 2024

@jglick yeah I was having a terrible time with the merge conflicts but I didn't think about force pushing. Will do that next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Feature additions or enhancements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants