feat: pool worker threads in full indexing (avoid 1 Worker per file) #139

@kapral18

Context

Incremental indexing already uses a reusable worker pool (src/commands/incremental_index_command.ts) to avoid per-file worker startup/teardown overhead.

Full indexing (src/commands/full_index_producer.ts) still spawns one Worker per file, which can be a major perf and stability hit on large repos.

Historical context / likely rationale

I searched the repo's merged PR history for explicit rationale around producer worker lifecycle (worker pools vs per-file workers) and did not find a PR/issue that directly explains this choice.

The closest related production-history signals are about consumer-side overload when Elasticsearch is slow or timing out:

Those fixes are about not overwhelming memory when ES is slow. They don’t directly require per-file producer workers; producer concurrency is already bounded by p-queue.

Plausible reasons for per-file workers (no historical ticket found):

  • Simplicity: spawn a worker, parse one file, terminate.
  • Defensive memory reset: if tree-sitter/native parsing accumulates memory over time, terminating per file forces a reset of native allocations.

Why this matters

  • Worker startup/teardown overhead can dominate total runtime when indexing large codebases, because it is paid once per file.
  • High worker churn increases memory pressure and can hit OS limits or cause slowdowns.

Where in code

  • src/commands/full_index_producer.ts: creates a new Worker inside the per-file loop.

Suggested fix

Pool worker threads for full indexing while keeping safety under load:

  • Create N workers upfront (N = min(CPU_CORES, configured pool size, file count)).
  • Maintain an idle-worker queue and assign jobs.
  • Ensure event listeners are cleaned up per job.
  • Terminate workers only once, after the queue has drained.
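
The four bullets above could look roughly like the sketch below. This is not the existing code in src/commands/full_index_producer.ts: `WorkerLike`, `computePoolSize` and `parseAllWithPool` are hypothetical names, the worker is assumed to answer each posted file path with exactly one "message", and the factory is injected so the sketch does not depend on a real worker script. The `Worker` class from node:worker_threads is intended to fit the `WorkerLike` shape.

```typescript
// Minimal surface the pool relies on (hypothetical interface).
interface WorkerLike {
  postMessage(value: unknown): void;
  on(event: "message" | "error", listener: (arg: any) => void): unknown;
  off(event: "message" | "error", listener: (arg: any) => void): unknown;
  terminate(): unknown;
}

// N = min(CPU_CORES, configured pool size, file count), never negative.
function computePoolSize(cpuCores: number, configured: number, fileCount: number): number {
  return Math.max(0, Math.min(cpuCores, configured, fileCount));
}

// Parses every file through a fixed set of workers and returns results in input order.
async function parseAllWithPool<R>(
  files: string[],
  poolSize: number,
  createWorker: () => WorkerLike,
): Promise<R[]> {
  if (files.length === 0) return [];
  if (poolSize < 1) throw new Error("poolSize must be at least 1");

  // Create all workers upfront; nothing is spawned inside the per-file path.
  const workers = Array.from({ length: poolSize }, () => createWorker());
  const idle = [...workers];
  const waiters: Array<(w: WorkerLike) => void> = [];

  const acquire = (): Promise<WorkerLike> => {
    const w = idle.pop();
    return w ? Promise.resolve(w) : new Promise((resolve) => waiters.push(resolve));
  };
  const release = (w: WorkerLike): void => {
    const next = waiters.shift();
    if (next) next(w);
    else idle.push(w);
  };

  const runOne = async (file: string): Promise<R> => {
    const worker = await acquire();
    try {
      return await new Promise<R>((resolve, reject) => {
        // Remove both listeners once the job settles so they do not pile up
        // on a worker that is reused for many files.
        const cleanup = (): void => {
          worker.off("message", onMessage);
          worker.off("error", onError);
        };
        const onMessage = (result: R): void => { cleanup(); resolve(result); };
        const onError = (err: Error): void => { cleanup(); reject(err); };
        worker.on("message", onMessage);
        worker.on("error", onError);
        worker.postMessage(file);
      });
    } finally {
      release(worker);
    }
  };

  try {
    return await Promise.all(files.map(runOne));
  } finally {
    // Terminate each worker exactly once, after all jobs finished or one failed.
    await Promise.all(workers.map((w) => w.terminate()));
  }
}
```

Two simplifications to be aware of: the sketch fails fast on the first error instead of replacing a crashed worker, and it does its own queueing rather than plugging into the existing p-queue concurrency limit.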

Optional safety guardrail (to preserve the likely “memory reset” benefit of per-file workers):

  • Add a worker recycle policy (terminate/recreate a worker after N files or after a memory threshold).
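
A recycle policy along these lines might be enough, sketched here under assumptions: `RecyclePolicy`, `shouldRecycle` and `RecyclingSlot` are hypothetical names, and the memory figure is whatever number the worker reports about itself. How to measure tree-sitter's native allocations reliably is left open.

```typescript
// Hypothetical policy: recycle after maxJobs files, or once reported memory
// reaches maxMemoryBytes (if that limit is set).
interface RecyclePolicy {
  maxJobs: number;
  maxMemoryBytes?: number;
}

interface WorkerStats {
  jobsHandled: number;
  memoryBytes: number | undefined;
}

function shouldRecycle(stats: WorkerStats, policy: RecyclePolicy): boolean {
  if (stats.jobsHandled >= policy.maxJobs) return true;
  return (
    policy.maxMemoryBytes !== undefined &&
    stats.memoryBytes !== undefined &&
    stats.memoryBytes >= policy.maxMemoryBytes
  );
}

// One pool slot that swaps its worker for a fresh one when the policy says so.
class RecyclingSlot<W extends { terminate(): unknown }> {
  private worker: W;
  private jobsHandled = 0;
  private readonly create: () => W;
  private readonly policy: RecyclePolicy;

  constructor(create: () => W, policy: RecyclePolicy) {
    this.create = create;
    this.policy = policy;
    this.worker = create();
  }

  get current(): W {
    return this.worker;
  }

  // Call after each finished job. Resolves to true if the worker was replaced.
  async jobFinished(memoryBytes?: number): Promise<boolean> {
    this.jobsHandled += 1;
    if (!shouldRecycle({ jobsHandled: this.jobsHandled, memoryBytes }, this.policy)) {
      return false;
    }
    await this.worker.terminate();
    this.worker = this.create();
    this.jobsHandled = 0;
    return true;
  }
}
```

With `maxJobs` set to 1 this degenerates to today's per-file behaviour, so the same code path can serve as a fallback if pooling turns out to leak.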

Config knob:

Test plan

  • Unit test asserting Worker is instantiated at most poolSize times during full indexing for many files.
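
The shape of that test could be something like the sketch below. It assumes the producer can take an injectable worker factory, which is not how the current code is known to work; if `Worker` stays hard-wired, the count would have to come from mocking the module instead. `FakeWorker`, `countingFactory` and `indexFilesStandIn` are hypothetical, and the stand-in only exists so the sketch runs on its own.

```typescript
// Hypothetical worker shape for the test: one async call per file.
interface FakeWorker {
  parse(file: string): Promise<void>;
}

// Factory that records how many workers were created through it.
function countingFactory(): { create: () => FakeWorker; created: () => number } {
  let count = 0;
  return {
    create: () => {
      count += 1;
      return { parse: async () => {} };
    },
    created: () => count,
  };
}

// Stand-in for the code under test; the real test would call the full
// indexing producer here instead.
async function indexFilesStandIn(
  files: string[],
  poolSize: number,
  createWorker: () => FakeWorker,
): Promise<void> {
  const queue = [...files];
  const workers = Array.from({ length: Math.min(poolSize, files.length) }, () => createWorker());
  // Each worker keeps pulling files until the shared queue is empty.
  await Promise.all(
    workers.map(async (worker) => {
      let file: string | undefined;
      while ((file = queue.shift()) !== undefined) {
        await worker.parse(file);
      }
    }),
  );
}

// The assertion from the test plan: many files, at most poolSize workers.
async function testWorkerCountIsBounded(): Promise<void> {
  const poolSize = 4;
  const files = Array.from({ length: 500 }, (_, i) => `src/file_${i}.ts`);
  const factory = countingFactory();
  await indexFilesStandIn(files, poolSize, factory.create);
  if (factory.created() > poolSize) {
    throw new Error(`expected at most ${poolSize} workers, got ${factory.created()}`);
  }
}
```

Against the current per-file implementation the same assertion would report 500 workers, which makes it a useful regression guard.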

Labels: enhancement (New feature or request)
