Enforce VM size minimum during node selection#875
Open
simple-agent-manager[bot] wants to merge 10 commits intomainfrom
Open
Enforce VM size minimum during node selection#875simple-agent-manager[bot] wants to merge 10 commits intomainfrom
simple-agent-manager[bot] wants to merge 10 commits intomainfrom
Conversation
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
Validation
pnpm lintpnpm typecheckpnpm testAdditional validation:
pnpm --filter @simple-agent-manager/shared test -- vm-sizespnpm --filter @simple-agent-manager/api test -- durable-objects/task-runner-node-selection services/node-selectorpnpm --filter @simple-agent-manager/shared typecheckpnpm --filter @simple-agent-manager/api typecheckgit diff --checkStaging Verification (REQUIRED for all code changes — merge-blocking)
Deploy Stagingworkflow triggered manually and passed for this branchapp.sammy.partyand passedStaging Verification Evidence
2f89ce1a; later commits148fdd36,af7412f1, and95097497only refreshed PR/task evidence and added a backlog deferral for audit persistence.UI Compliance Checklist (Required for UI changes)
N/A: no runtime UI changes.
packages/ui/tests/ButtonGroup.test.tsxwas updated only to tolerate equivalent CSS zero-value serialization (0/0px).End-to-End Verification (Required for multi-component changes)
.claude/rules/10-e2e-verification.md)Data Flow Trace
apps/api/src/routes/tasks/submit.tsresolves explicit task value, then agent profile value, thenproject.defaultVmSize, then platform default.apps/api/src/services/task-runner-do.ts:startTaskRunnerDO()forwardsTaskRunConfig.vmSize.apps/api/src/durable-objects/task-runner/node-steps.ts:handleNodeSelection()readsnodes.vm_sizeand callscanSatisfyVmSize(node.vm_size, state.config.vmSize).apps/api/src/durable-objects/task-runner/node-steps.ts:tryClaimWarmNode()filters warm candidates withcanSatisfyVmSize.apps/api/src/durable-objects/task-runner/node-steps.ts:findNodeWithCapacity()filters running candidates withcanSatisfyVmSize.apps/api/src/services/node-selector.ts:selectNodeForTaskRun()filters warm and existing candidates withcanSatisfyVmSize.Untested Gaps
Full live VM provisioning was not run to avoid staging node quota/cost. Automated coverage exercises helper semantics, standalone selector warm/existing paths, and TaskRunner preferred/warm/existing paths with mixed VM sizes.
Post-Mortem (Required for bug fix PRs)
What broke
Node reuse treated requested VM size as a soft preference. A larger request could reuse a smaller node when no exact-size node was selected first.
Root cause
Standalone and TaskRunner node selection sorted by exact VM-size match but did not reject undersized candidates before reuse.
Class of bug
Selection logic treated a hard compatibility constraint as a ranking preference.
Why it wasn't caught
Existing tests covered helper-like behavior and source structure, but did not execute mixed-size node selection paths with incompatible candidates.
Process fix included in this PR
.claude/rules/10-e2e-verification.mdnow requires behavioral tests for compatibility constraints in selection logic, including incompatible candidates that would otherwise rank well.Post-mortem file
docs/notes/2026-05-01-vm-size-minimum-selection-postmortem.mdSpecialist Review Evidence (Required for agent-authored PRs)
needs-human-reviewlabel added and merge deferred to human — resolved after successful retryVM_SIZE_RANKfinding fixed by deriving ordering fromDEFAULT_VM_SIZE_VCPUS; re-check passed.tasks/backlog/2026-05-01-persist-task-requested-vm-size.mdand linking the explicit deferral from the active task. Final re-check passed with no blocking findings.Exceptions (If any)
Agent Preflight (Required)
Classification
External References
N/A: no external API behavior changed. Investigation used repo code and staging D1 state.
Codebase Impact Analysis
Affected paths:
packages/shared/src/constants/vm-sizes.tsapps/api/src/services/node-selector.tsapps/api/src/durable-objects/task-runner/node-steps.tsDocumentation & Specs
Updated:
tasks/active/2026-05-01-vm-size-minimum-selection.mddocs/notes/2026-05-01-vm-size-minimum-selection-postmortem.md.claude/rules/10-e2e-verification.mdConstitution & Risk Check
Checked Principle XI. Initial review found a duplicate hardcoded VM-size rank table; fixed by deriving ordering from the existing
DEFAULT_VM_SIZE_VCPUSsource of truth.