[ee] perf: add agent-batch mode with batch pull and batch result commit #8758
Draft
rubenfiszel wants to merge 5 commits into main
Deploying windmill
| Latest commit: | c41565b |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://596d810b.windmill.pages.dev |
| Branch Preview URL: | https://worker-batch-pull-write.windmill.pages.dev |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Adds `MODE=agent-batch`, a new worker mode that batches both job pulling and result writing for higher throughput on dedicated workers. All batch behavior is strictly gated on `Mode::AgentBatch`: zero behavioral change from main in any other mode (standalone, worker, agent, server).
Benchmark Results
Dedicated Worker (deno identity script, 1 worker, 500 jobs)
SQL-level (raw DB operations, no job execution)
Why +74% end-to-end (not 90x)
The 90x SQL speedup measures only the pull query. End-to-end, the bottleneck was actually result writes: each `commit_completed_job` opens its own transaction (BEGIN + INSERT into `v2_job_completed` + DELETE from `v2_job_queue` + COMMIT = 4 DB round-trips per job). Batch result commit wraps N jobs in a single transaction (1 BEGIN + N inserts + N deletes + 1 COMMIT), which is 2N+2 round-trips instead of 4N, eliminating 2×(N-1) round-trips.
Architecture
Batch Pull
Problem: Each worker pull does `FOR UPDATE SKIP LOCKED LIMIT 1`, which costs one DB round-trip per job.
Solution: New CTE query with `LIMIT $2` and `WHERE id IN (SELECT id FROM peek)` instead of `WHERE id = (SELECT id FROM peek)`. It pulls up to N jobs atomically, stores them in a local `VecDeque`, and serves them one at a time.
SQL: see `make_batch_pull_query` in `windmill-common/src/worker.rs`.
Worker-side (`worker.rs`): Two buffers: `batch_pull_buffer: VecDeque<PulledJob>` for the SQL path and `agent_batch_buffer: VecDeque<JobAndPerms>` for the HTTP path. Both are only populated when `batch_pull_size > 0` (which is only true in `Mode::AgentBatch`).
Server-side (EE `ee.rs`): New `POST /api/agent_workers/batch_pull` endpoint accepts `batch_size: i32`, calls `batch_pull()` + `pulled_job_to_job_and_perms()` for each job, and returns `Vec<JobAndPerms>`.
Batch Result Commit
Problem: Each completed job goes through `process_jc` → `add_completed_job` → `commit_completed_job`, which opens a transaction, runs INSERT + DELETE, and commits. With dedicated workers doing ~1ms per job, the 4 DB round-trips (~3.5ms total) are the bottleneck.
Solution: In the background result processor, accumulate "simple" completed jobs and commit them in a single transaction.
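To make the round-trip arithmetic concrete, here is a small sketch that counts one round-trip per statement, following the BEGIN / INSERT / DELETE / COMMIT breakdown given earlier. The function names are illustrative and not part of the PR, and the per-job time estimate assumes every round-trip costs the same (3.5ms / 4 ≈ 875µs), which is a simplification rather than a measurement.

```rust
/// Round-trips when every job commits in its own transaction:
/// BEGIN + INSERT + DELETE + COMMIT for each job.
fn per_job_round_trips(n: u64) -> u64 {
    4 * n
}

/// Round-trips when n jobs share one transaction:
/// 1 BEGIN + n INSERTs + n DELETEs + 1 COMMIT.
/// Assumption: each INSERT and DELETE is still sent as its own statement.
fn batched_round_trips(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        2 * n + 2
    }
}

/// Round-trips saved by batching n jobs together.
fn round_trips_saved(n: u64) -> u64 {
    per_job_round_trips(n) - batched_round_trips(n)
}

/// Estimated DB time per job (in microseconds) for a batch of n jobs.
/// Assumption: uniform cost per round-trip; returns 0 for an empty batch.
fn batched_db_micros_per_job(n: u64, micros_per_round_trip: u64) -> u64 {
    if n == 0 {
        return 0;
    }
    batched_round_trips(n) * micros_per_round_trip / n
}
```

Under these assumptions a batch of 50 jobs needs 102 round-trips instead of 200, and a batch of 1 saves nothing, so the gain depends on how many completed jobs are available to commit together.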
What qualifies as "simple" (`is_batchable()` in `result_processor.rs`):
- `success == true`
- not a flow step (`!job.is_flow_step()`)
This covers the dedicated worker use case (simple script executions). Complex jobs (flows, failures, scheduled, concurrent) go through the unchanged `process_jc` path.
Accumulation logic (`start_background_processor` in `result_processor.rs`):
1. Receive a `JobCompleted` from the channel
2. If `batch_mode && is_batchable(jc)` → push to `batch_result_buffer`
3. Drain more with `bounded_rx.try_recv()` (up to 50)
4. Non-batchable jobs → `process_jc` immediately
5. Flush: `batch_commit_completed_jobs()` with all accumulated jobs
6. On failure → fall back to `process_jc`
Batch SQL: see `batch_commit_completed_jobs` in `jobs.rs`. One transaction for N jobs instead of N transactions.
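The accumulation steps above can be sketched roughly as follows. This is a simplified model, not the code in `result_processor.rs`: `JobCompleted` is reduced to three fields, `Sink` is a hypothetical stand-in for the database, only the two `is_batchable()` conditions named above are checked, and the cap of 50 is assumed to apply to the number of drained items.

```rust
use std::sync::mpsc::{channel, Receiver};

#[derive(Debug, Clone, PartialEq)]
struct JobCompleted {
    id: u64,
    success: bool,
    is_flow_step: bool,
}

fn job(id: u64, success: bool, is_flow_step: bool) -> JobCompleted {
    JobCompleted { id, success, is_flow_step }
}

/// Reduced version of the "simple job" check (two conditions only).
fn is_batchable(jc: &JobCompleted) -> bool {
    jc.success && !jc.is_flow_step
}

/// Assumed cap on how many extra items one drain pass may take.
const MAX_DRAIN: usize = 50;

/// Hypothetical stand-in for the database: records what was committed how.
#[derive(Default)]
struct Sink {
    batches: Vec<Vec<u64>>,
    singles: Vec<u64>,
    fail_batches: bool,
}

impl Sink {
    /// Models the unchanged per-job path.
    fn process_one(&mut self, jc: JobCompleted) {
        self.singles.push(jc.id);
    }

    /// Models one transaction covering all jobs in the slice.
    fn commit_batch(&mut self, jobs: &[JobCompleted]) -> Result<(), String> {
        if self.fail_batches {
            return Err("batch commit failed".to_string());
        }
        self.batches.push(jobs.iter().map(|j| j.id).collect());
        Ok(())
    }
}

fn handle(first: JobCompleted, rx: &Receiver<JobCompleted>, batch_mode: bool, sink: &mut Sink) {
    // Outside batch mode, or for complex jobs, use the per-job path.
    if !(batch_mode && is_batchable(&first)) {
        sink.process_one(first);
        return;
    }
    let mut buffer = vec![first];
    // Drain whatever is already queued, without blocking.
    for _ in 0..MAX_DRAIN {
        match rx.try_recv() {
            Ok(jc) if is_batchable(&jc) => buffer.push(jc),
            Ok(jc) => sink.process_one(jc),
            Err(_) => break,
        }
    }
    // Flush in one transaction; on failure fall back to the per-job path.
    if sink.commit_batch(&buffer).is_err() {
        for jc in buffer {
            sink.process_one(jc);
        }
    }
}

/// Feeds all jobs through a channel and returns what the sink recorded.
fn run(jobs: Vec<JobCompleted>, batch_mode: bool, fail_batches: bool) -> Sink {
    let (tx, rx) = channel();
    for jc in jobs {
        tx.send(jc).unwrap();
    }
    drop(tx);
    let mut sink = Sink { fail_batches, ..Sink::default() };
    while let Ok(first) = rx.recv() {
        handle(first, &rx, batch_mode, &mut sink);
    }
    sink
}
```

In this model a non-batchable job met during the drain is processed before the batch is flushed, so completion order across the two paths is not preserved; whether that matters in the real processor is not something this sketch can answer.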
Mode Gating
All batch code paths are gated on `Mode::AgentBatch`:
- `worker.rs:2078`: `batch_pull_size` is 0 unless `is_agent_batch`
- `result_processor.rs:324`: `batch_mode` is false unless `Mode::AgentBatch`
- `batch_pull_size == 0`: all `if batch_pull_size > 0` checks short-circuit, falling through to the original single-pull code
- `batch_mode == false`: the `if batch_mode && is_batchable` check short-circuits, falling through to the original `process_jc` code
Configuration
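As a rough sketch of how the mode and the configured batch size interact with the pull path: the batch size only takes effect in agent-batch mode, and a size of 0 falls through to single pulls. Everything here is illustrative. `FakeQueue`, `Puller`, `effective_batch_pull_size`, and `drain_all` are hypothetical names, jobs are plain `u64` ids, and apart from `agent-batch` the mode spellings in `parse_mode` are assumptions.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Standalone,
    Worker,
    Agent,
    AgentBatch,
    Server,
}

/// Parses a MODE value. Only "agent-batch" is spelled out in the PR text;
/// the other spellings are assumptions for this sketch.
fn parse_mode(s: &str) -> Option<Mode> {
    match s {
        "standalone" => Some(Mode::Standalone),
        "worker" => Some(Mode::Worker),
        "agent" => Some(Mode::Agent),
        "agent-batch" => Some(Mode::AgentBatch),
        "server" => Some(Mode::Server),
        _ => None,
    }
}

/// The configured size is ignored (treated as 0) outside agent-batch mode.
fn effective_batch_pull_size(mode: Mode, configured: usize) -> usize {
    if mode == Mode::AgentBatch {
        configured
    } else {
        0
    }
}

/// Hypothetical in-memory queue that counts one round-trip per pull.
struct FakeQueue {
    jobs: VecDeque<u64>,
    round_trips: usize,
}

impl FakeQueue {
    fn new(n: u64) -> Self {
        FakeQueue { jobs: (1..=n).collect(), round_trips: 0 }
    }

    /// Removes and returns up to `limit` jobs in a single round-trip.
    fn pull(&mut self, limit: usize) -> Vec<u64> {
        self.round_trips += 1;
        let take = limit.min(self.jobs.len());
        self.jobs.drain(..take).collect()
    }
}

struct Puller {
    batch_pull_size: usize,
    buffer: VecDeque<u64>,
}

impl Puller {
    fn next_job(&mut self, queue: &mut FakeQueue) -> Option<u64> {
        if self.batch_pull_size > 0 {
            // Batch path: refill the local buffer only when it is empty.
            if self.buffer.is_empty() {
                self.buffer.extend(queue.pull(self.batch_pull_size));
            }
            return self.buffer.pop_front();
        }
        // Original path: one job per round-trip.
        queue.pull(1).pop()
    }
}

/// Serves every job in a fresh queue; returns the ids and the round-trip count.
fn drain_all(mode: Mode, configured: usize, n_jobs: u64) -> (Vec<u64>, usize) {
    let mut queue = FakeQueue::new(n_jobs);
    let mut puller = Puller {
        batch_pull_size: effective_batch_pull_size(mode, configured),
        buffer: VecDeque::new(),
    };
    let mut served = Vec::new();
    while let Some(id) = puller.next_job(&mut queue) {
        served.push(id);
    }
    (served, queue.round_trips)
}
```

With 10 queued jobs and a configured size of 5, this model serves the same ids in the same order in both cases, using 11 pulls in worker mode (10 jobs plus the final empty pull) and 3 pulls in agent-batch mode.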
Files Changed
OSS (7 files)
- `src/main.rs`: `Mode::Agent` → `matches!(mode, Mode::Agent | Mode::AgentBatch)` at 4 sites
- `windmill-common/src/utils.rs`: `Mode::AgentBatch` variant, `"agent-batch"` parsing, Display impl
- `windmill-common/src/worker.rs`: `make_batch_pull_query()`, `format_batch_pull_query()`, `BATCH_PULL_SIZE` lazy_static, `WORKER_BATCH_PULL_QUERIES` global, update `store_pull_query()` to populate batch queries
- `windmill-queue/src/jobs.rs`: `batch_pull()`, `batch_commit_completed_jobs()`
- `windmill-worker/src/agent_workers.rs`: `batch_pull_jobs()` HTTP client function
- `windmill-worker/src/worker.rs`: `is_agent_batch` flag, `batch_pull_size`, `batch_pull_buffer`, `agent_batch_buffer`, batch pull logic in both `Connection::Sql` and `Connection::Http` branches
- `windmill-worker/src/result_processor.rs`: `is_batchable()` helper, `batch_result_buffer`, batch accumulation + drain + flush logic, fallback on failure
EE (1 file)
- `windmill-api-agent-workers/src/ee.rs`: `batch_pull_handler` endpoint, route registration
Known Limitations
- The batch pull query excludes same_worker jobs with `AND id NOT IN (SELECT id FROM v2_job WHERE same_worker = true)`, since same_worker jobs need co-location guarantees
Test Plan
- `cargo check` passes (CE build)
- `cargo check --features deno_core,enterprise,private,license,agent_worker_server` passes (EE build)
- No behavioral change when `MODE != agent-batch`
Generated with Claude Code