Skip to content

Memoize Reach.Project.Query.function_index/1#22

Open
jeroen11dijk wants to merge 2 commits into
elixir-vibe:masterfrom
jeroen11dijk:fix/function-index-cache
Open

Memoize Reach.Project.Query.function_index/1#22
jeroen11dijk wants to merge 2 commits into
elixir-vibe:masterfrom
jeroen11dijk:fix/function-index-cache

Conversation

@jeroen11dijk

@jeroen11dijk jeroen11dijk commented Jun 8, 2026

Copy link
Copy Markdown

Symptom

On a 4,700-line Elixir diff against a 146kLoC / 1,099-module project, mix reach.check --changed --base main doesn't complete in 20+ minutes. With this patch it completes
normally.

Root cause

Reach.Project.Query.function_index/1 rebuilds the full index on every call — four
Enum.group_by passes over project.nodes plus build_node_to_function_index/1. It's
invoked from multiple hot paths (find_function_at_location/3, find_function/2,
resolve_by_function_name/2, find_named_module/2). reset_cache/0 already existed as a
stub, and Reach.CLI.Project.load/1 already calls it on every project load — the caching
layer was clearly intended, just never wired up.

What this PR does

  • Adds cache_key: reference() | nil to %Reach.Project{}, set via make_ref/0 in both
    terminal constructors (source_project/2, merge_project/2).
  • function_index/1 memoizes in the process dictionary keyed on project.cache_key.
    Distinct projects carry distinct refs, so the existing isolation test ("function indexes do
    not leak between projects in the same process") still holds.
  • reset_cache/0 deletes the cached entry.
  • Adds positive memoization + reset tests alongside the existing isolation test.

Design notes (happy to revisit)

  • Struct field vs explicit threading. Added a field rather than threading the index
    through callers because function_index/1 has many call sites across Query. The new field
    defaults to nil so existing pattern matches and constructions are unaffected, but it's
    still a minor-version change technically.
  • Process dictionary vs ETS. Per-CLI-invocation scope matches process scope, and
    Reach.CLI.Project.load/1 already invalidates between loads. ETS would allow cross-process
    sharing but introduces lifecycle management that doesn't seem warranted yet — happy to
    switch if you'd prefer.
  • Memory. The cached index is ~tens of MB for a 1k-module project. The CLI's existing
    Query.reset_cache() after each Project.load bounds it. Worth a moduledoc note if you
    want — happy to add.

Not in this PR (but worth a follow-up)

The deeper algorithmic issue is in Reach.Check.Changed.changed_functions/3 — it expands
every changed range to individual lines and calls find_function_at_location/3 per line:

ranges
|> Enum.flat_map(fn {first, last} -> first..last end)
|> Enum.map(&Query.find_function_at_location(project, file, &1))

jeroen11dijk and others added 2 commits June 8, 2026 13:54
`function_index/1` is called from hot paths — notably
`find_function_at_location/3`, which `Reach.Check.Changed.changed_functions/3`
invokes once per changed line. With no caching, every call rebuilt the entire
function index (four maps + a node-to-function index over `project.nodes`),
producing O(diff_lines × project_nodes) work on `mix reach.check --changed`.

A 4,700-line diff against a 146kLoC / 1,099-module project failed to
complete in 20+ minutes pre-patch and completes in normal time after.

The previous `reset_cache/0` stub already pointed at the intent. This wires
it up:

  - Add `cache_key: reference() | nil` to `Reach.Project`, set via
    `make_ref/0` at both terminal constructors (`source_project/2`,
    `merge_project/2`).
  - `function_index/1` memoizes the built index in the process dictionary,
    keyed on `project.cache_key`. A `nil` cache_key (e.g. a struct built
    by external code) bypasses the cache and rebuilds on every call, so
    callers can't accidentally serve stale data.
  - `reset_cache/0` deletes the entry.

The existing isolation test ("function indexes do not leak between
projects in the same process") still passes because distinct projects
carry distinct refs. Adds positive memoization, reset, and bypass tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ExUnit runs each test in its own process, so the per-test setup that
reset the process-dict cache was a no-op.

The nil cache_key clause was a safety net for projects constructed
outside Reach.Project's own constructors — both constructors now
always set make_ref/0, so the safety net just adds API surface and
test scaffolding without protecting against a realistic case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jeroen11dijk

Copy link
Copy Markdown
Author

Furthermore mix reach.check --changed --base main gave low risk for this immense PR so Im unsure about its usefulness. Especially since something like it is nice for existing code bases with many violations already

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant