Memoize Reach.Project.Query.function_index/1#22
Open
jeroen11dijk wants to merge 2 commits into
Open
Conversation
`function_index/1` is called from hot paths — notably
`find_function_at_location/3`, which `Reach.Check.Changed.changed_functions/3`
invokes once per changed line. With no caching, every call rebuilt the entire
function index (four maps + a node-to-function index over `project.nodes`),
producing O(diff_lines × project_nodes) work on `mix reach.check --changed`.
A 4,700-line diff against a 146kLoC / 1,099-module project failed to
complete in 20+ minutes pre-patch and completes in normal time after.
The previous `reset_cache/0` stub already pointed at the intent. This wires
it up:
- Add `cache_key: reference() | nil` to `Reach.Project`, set via
`make_ref/0` at both terminal constructors (`source_project/2`,
`merge_project/2`).
- `function_index/1` memoizes the built index in the process dictionary,
keyed on `project.cache_key`. A `nil` cache_key (e.g. a struct built
by external code) bypasses the cache and rebuilds on every call, so
callers can't accidentally serve stale data.
- `reset_cache/0` deletes the entry.
The existing isolation test ("function indexes do not leak between
projects in the same process") still passes because distinct projects
carry distinct refs. Adds positive memoization, reset, and bypass tests.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ExUnit runs each test in its own process, so the per-test setup that reset the process-dict cache was a no-op. The nil cache_key clause was a safety net for projects constructed outside Reach.Project's own constructors — both constructors now always set make_ref/0, so the safety net just adds API surface and test scaffolding without protecting against a realistic case. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Author
|
Furthermore |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Symptom
On a 4,700-line Elixir diff against a 146kLoC / 1,099-module project,
mix reach.check --changed --base maindoesn't complete in 20+ minutes. With this patch it completesnormally.
Root cause
Reach.Project.Query.function_index/1rebuilds the full index on every call — fourEnum.group_bypasses overproject.nodesplusbuild_node_to_function_index/1. It'sinvoked from multiple hot paths (
find_function_at_location/3,find_function/2,resolve_by_function_name/2,find_named_module/2).reset_cache/0already existed as astub, and
Reach.CLI.Project.load/1already calls it on every project load — the cachinglayer was clearly intended, just never wired up.
What this PR does
cache_key: reference() | nilto%Reach.Project{}, set viamake_ref/0in bothterminal constructors (
source_project/2,merge_project/2).function_index/1memoizes in the process dictionary keyed onproject.cache_key.Distinct projects carry distinct refs, so the existing isolation test ("function indexes do
not leak between projects in the same process") still holds.
reset_cache/0deletes the cached entry.Design notes (happy to revisit)
through callers because
function_index/1has many call sites acrossQuery. The new fielddefaults to
nilso existing pattern matches and constructions are unaffected, but it'sstill a minor-version change technically.
Reach.CLI.Project.load/1already invalidates between loads. ETS would allow cross-processsharing but introduces lifecycle management that doesn't seem warranted yet — happy to
switch if you'd prefer.
Query.reset_cache()after eachProject.loadbounds it. Worth a moduledoc note if youwant — happy to add.
Not in this PR (but worth a follow-up)
The deeper algorithmic issue is in
Reach.Check.Changed.changed_functions/3— it expandsevery changed range to individual lines and calls
find_function_at_location/3per line: