Add local engine profiling workflow #44
Merged
Code review
Found 1 issue:
agentgrep/scripts/benchmark.py Lines 645 to 651 in 1afd722
Emit site for the dropped attribute: agentgrep/src/agentgrep/__init__.py Lines 3687 to 3689 in 1afd722
🤖 Generated with Claude Code
- If this code review was useful, please react with 👍. Otherwise, react with 👎.
why: Fast-path planning work needs timings that separate engine discovery, planning, collection, and rendering costs. A dormant, sanitized profiler lets benchmarks capture those boundaries without exposing prompt text, command argv, or local paths.
what:
- Add typed engine profiling primitives with context-local spans and redacted subprocess samples.
- Add a standalone engine profiling script plus benchmark entries whose names and descriptions disclose their limits.
- Add tests for profiling phase output, subprocess redaction, default-path import avoidance, and benchmark coverage.
why: streamline-02 is scoped to observability, so bottleneck work needs a stable, privacy-safe way to profile command-shaped engine paths before changing planner or executor behavior.
what:
- Add profile_engine component routing for prompt search, conversation search, grep-shaped prompt and conversation search, and prompt-source find runs.
- Add explicit capped benchmark selectors for the profile-engine components.
- Cover component expansion, query redaction, result caps, and prompt/conversation scope selection with typed pytest cases.
why: streamline-02 should leave a clear, repeatable observability workflow for agents and maintainers without adding CI artifact upload or performance changes to this branch.
what:
- Document profile_engine components and the required pre-commit gate in AGENTS.md and developer docs.
- Add repo-local $profile guidance and extend $benchmark with component selectors.
- Test that repo-local agent skills keep the component argument and core profiler components visible.
why: The profile_engine all component accepts content search terms for search-like runs, but find-prompts enumerates prompt sources rather than prompt contents. Passing those terms into find-prompts made batch profiles undercount source enumeration versus the standalone profiler benchmark.
what:
- Make ProfileRunSpec explicitly opt legacy find into using terms as a source metadata pattern.
- Report effective term_count per component so batch find-prompts shows no find pattern was applied.
- Add regression coverage for all tmux find-prompts parity and legacy find term filtering.
why: Engine profiles need both machine-readable child-run streams and a readable top-spans view so bottleneck evidence can be inspected without exposing local paths or query text.
what:
- Add json, ndjson, and rich renderers to scripts/profile_engine.py.
- Keep the existing JSON document shape as the default while flattening batch runs for NDJSON.
- Cover renderer behavior and entry-point parsing with profile-engine script tests.
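The difference between the JSON document and the flattened NDJSON stream can be sketched as below. The `runs` key and both function names are assumptions made for illustration; the actual payload shape in scripts/profile_engine.py may differ.

```python
import json
from typing import Any


def render_json(document: dict[str, Any]) -> str:
    """Render the whole profile document as one JSON value."""
    return json.dumps(document, indent=2, sort_keys=True)


def render_ndjson(document: dict[str, Any]) -> str:
    """Render one JSON object per line, flattening a batch into its child runs.

    Assumes (hypothetically) that a batch document keeps its child runs in a
    list under the "runs" key; a single-run document is emitted as one line.
    """
    runs = document.get("runs")
    rows = runs if isinstance(runs, list) else [document]
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)
```

Each NDJSON line parses on its own, which suits line-oriented tools that stream child runs one at a time.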
why: Benchmark JSON and NDJSON output need to carry profiler details without leaking local command paths, and dry-run rows need an explicit marker for machine consumers.
what:
- Add dry_run, profile_payload, and profile_capture_error fields to benchmark measurements.
- Sanitize rendered benchmark command strings before serialization.
- Capture profile_engine JSON for profiler benchmark rows after timing.
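One possible way to sanitize a rendered command string is to drop directory components from absolute and home-relative paths. This is a sketch under that assumption only: `sanitize_command` and `_strip_directories` are hypothetical helpers, and the rule the PR actually applies is not shown in this conversation.

```python
import os
import shlex


def _strip_directories(value: str) -> str:
    """Reduce an absolute or home-relative path to its final component."""
    if value.startswith(("/", "~")):
        return os.path.basename(value.rstrip("/")) or "root"
    return value


def sanitize_command(command: str) -> str:
    """Return the command with local directory prefixes removed.

    Handles bare path tokens and --option=value tokens; relative paths such
    as scripts/profile_engine.py are left untouched.
    """
    cleaned = []
    for token in shlex.split(command):
        if token.startswith("-") and "=" in token:
            option, value = token.split("=", 1)
            cleaned.append(f"{option}={_strip_directories(value)}")
        else:
            cleaned.append(_strip_directories(token))
    return shlex.join(cleaned)
```

Sanitizing at serialization time keeps the timed command unchanged while the saved artifact stays safe to paste into an issue.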
why: The profiler and benchmark helpers now emit richer artifact shapes, so local workflows and agent skills need to explain which fields are shareable and how to inspect slow spans.
what:
- Document profile_engine output formats and top-span rendering.
- Document benchmark dry-run and profile payload artifact fields.
- Update agent-facing benchmark and profile skill guidance.
why: Profile and benchmark artifacts are intended to be copied into issues and consumed by tools. Small explicit schema markers make those machine artifacts durable without introducing a migration framework.
what:
- Add schema_version and artifact_kind markers to profile run and batch payloads.
- Add schema_version and artifact_kind markers to benchmark JSON roots and measurement rows.
- Cover JSON and NDJSON artifact metadata in profiler and benchmark tests.
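A rough illustration of how the two markers might be stamped on a payload and checked by a consumer. The field names `schema_version` and `artifact_kind` come from the commit message; the version value `1`, the kind string `profile_batch`, and both helper names are placeholders, not values confirmed by the PR.

```python
from typing import Any

# Placeholder value for illustration; the real version number is not stated here.
SCHEMA_VERSION = 1


def with_artifact_metadata(payload: dict[str, Any], artifact_kind: str) -> dict[str, Any]:
    """Return a copy of the payload carrying explicit schema markers."""
    return {**payload, "schema_version": SCHEMA_VERSION, "artifact_kind": artifact_kind}


def check_artifact(document: dict[str, Any], expected_kind: str) -> None:
    """Raise ValueError when a document is not the artifact a tool expects."""
    if document.get("artifact_kind") != expected_kind:
        raise ValueError(f"expected artifact_kind={expected_kind!r}")
    if document.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version={document.get('schema_version')!r}")
```

A consumer can then reject an unexpected artifact up front instead of failing later on a missing field.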
why: Profile-engine benchmark rows already preserve child profile payloads, but the rich renderer only showed timing tables. Rendering the slowest nested spans keeps local terminal runs useful without changing machine formats.
what:
- Add --top-spans to benchmark run and compare commands.
- Render sanitized nested profile_payload spans in rich output.
- Cover rich span rendering, suppression, and CLI parsing in tests.
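Selecting the slowest nested spans amounts to walking the span tree and sorting by duration. The sketch below assumes a hypothetical span shape (`name`, `duration_ms`, optional `children`) under a top-level `spans` list; the real `profile_payload` layout is not shown in this PR conversation.

```python
from typing import Any, Iterator


def iter_spans(span: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield a span followed by all of its descendants, depth first."""
    yield span
    for child in span.get("children", []):
        yield from iter_spans(child)


def top_spans(profile_payload: dict[str, Any], limit: int = 5) -> list[tuple[str, float]]:
    """Return up to `limit` (name, duration_ms) pairs, slowest first."""
    rows = [
        (span["name"], float(span["duration_ms"]))
        for root in profile_payload.get("spans", [])
        for span in iter_spans(root)
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```

A parent span includes its children's time, so a parent and its slowest child can both appear in the result.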
why: Conversation searches could show collection as the hotspot without enough detail to identify which agents, stores, or source adapters drove the cost. Source-level profile samples provide that evidence while staying inactive outside profiling runs.
what:
- Record discovery groups by agent, store, adapter, path kind, and source kind.
- Record planning, collection, and find-filter samples without paths or prompt text.
- Cover the source-level samples and preserve the no-import fast path when profiling is inactive.
why: The local profiling workflow now exposes durable schema markers, rich top-span summaries, and source-level spans, so repo docs and agent skills need to show how to collect and read those artifacts.
what:
- Document benchmark artifact metadata and --top-spans rich output.
- Document source-level profile samples and privacy constraints.
- Update agent-facing benchmark and profile skill checks.
why: The benchmark skill and profiling workflow treat profile-engine as the natural selector for all engine-only benchmark probes, but the harness only accepted exact benchmark keys. Resolving that group inside the benchmark script keeps the CLI ergonomic while still executing explicit benchmark rows.
what:
- Add a typed profile-engine benchmark group and expand it in --commands.
- Show available command groups in list-commands output and keep bad selectors to a plain CLI error.
- Document the selector in repo docs and agent benchmark guidance.
why: A blank --commands value or separator-only selector produced a successful benchmark artifact with zero rows. That can make automation look green while measuring nothing.
what:
- Raise a BadParameter when --commands expands to an empty benchmark list.
- Cover blank and separator-only selectors in the benchmark validation tests.
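Group expansion and the empty-selector check from the two commits above can be sketched together. Everything named here is hypothetical: the benchmark keys, the `COMMAND_GROUPS` table, and `BadSelector`, which stands in for the CLI framework's `BadParameter` so the example has no third-party dependency.

```python
# Hypothetical benchmark keys and group table, for illustration only.
BENCHMARK_KEYS = ("profile-engine-search", "profile-engine-grep", "cli-search")
COMMAND_GROUPS = {"profile-engine": ("profile-engine-search", "profile-engine-grep")}


class BadSelector(ValueError):
    """Stand-in for the CLI framework's BadParameter error."""


def expand_commands(raw: str) -> list[str]:
    """Expand a comma-separated --commands value into explicit benchmark keys."""
    selected: list[str] = []
    for part in raw.split(","):
        name = part.strip()
        if not name:
            continue  # tolerate stray separators; emptiness is checked below
        for key in COMMAND_GROUPS.get(name, (name,)):
            if key not in BENCHMARK_KEYS:
                raise BadSelector(f"unknown benchmark selector: {name}")
            if key not in selected:
                selected.append(key)
    if not selected:
        # A blank or separator-only value must fail instead of measuring nothing.
        raise BadSelector("--commands expanded to an empty benchmark list")
    return selected
```

Under these assumptions, `expand_commands("profile-engine")` yields the two explicit rows, and `expand_commands(" , ")` raises instead of producing an empty but successful run.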
why: Profile-engine benchmark artifacts already preserve timing samples and nested spans, but repeated bottleneck reports still required ad hoc jq queries. A dedicated analyzer turns saved benchmark JSON/NDJSON into stable human and machine reports without rerunning searches.
what:
- Add benchmark artifact loading and typed analysis summaries for command timings, slow spans, grouped spans, and warnings.
- Add rich/json/ndjson analysis reporters and the `benchmark.py analyze` subcommand.
- Cover JSON/NDJSON artifact loading, sanitization, reporter output, span limits, and CLI output.
why: The benchmark analyzer replaces ad hoc jq summaries for saved profile-engine artifacts. Documenting the workflow keeps agent and human operators on the same repeatable path.
what:
- Document `benchmark.py analyze` in developer benchmark docs and AGENTS profiling guidance.
- Update the repo benchmark skill to use analyzer reports for bottleneck summaries.
- Assert the benchmark skill advertises the analyzer artifact shape.
why: Running a profiler component without an explicit machine-output flag should be readable at the terminal. JSON remains available for saved artifacts, but it should be opt-in rather than surprising interactive output.
what:
- Default profile_engine.py output to the Rich reporter.
- Add --json and --ndjson shortcuts alongside the existing --format selector.
- Cover every profiler component and legacy alias at the script entry point.
why: The profiler now favors terminal-readable output unless a machine format is requested. Documentation and agent skills need to show --json for saved artifacts so examples do not write Rich tables into JSON files.
what:
- Document Rich as the default profile_engine.py output.
- Update profiler artifact examples to pass --json explicitly.
- Teach the profile skill to use --json and --ndjson for machine-readable output.
why: The profiler now defaults to Rich output for humans, but benchmark profile-engine rows still promise sanitized JSON profile payloads. Without an explicit machine format the timing rows succeed while nested profile capture reports invalid JSON.
what:
- Add explicit --format json to every committed profile-engine benchmark command.
- Validate before run and compare sweeps that profile-engine commands request JSON output (--format json, --format=json, or --json), and hint at --format json when payload capture sees invalid JSON.
- Cover the validator, the informational-command tolerance, and the committed-benchmark format assertions in tests.
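The three accepted spellings listed above suggest a token-level check along these lines. `requests_json_output` is a hypothetical name, and the real validator also tolerates informational commands, which this sketch does not model.

```python
import shlex


def requests_json_output(command: str) -> bool:
    """Return True when a command line asks for JSON output.

    Accepts the three spellings named in the commit message:
    --format json, --format=json, and --json.
    """
    tokens = shlex.split(command)
    for index, token in enumerate(tokens):
        if token in ("--json", "--format=json"):
            return True
        if token == "--format" and tokens[index + 1 : index + 2] == ["json"]:
            return True
    return False
```

Checking whole tokens rather than searching for a substring keeps `--format ndjson` from counting as a JSON request.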
why: Cursor IDE history lives in SQLite state.vscdb stores, but the committed profile-engine benchmark group only covered all-agent workloads. A Cursor-specific group makes DB profiling repeatable through scripts/benchmark.py and keeps branch-tip baselines comparable from streamline-02.
what:
- Add Cursor IDE search, grep, and find profile-engine benchmark entries plus a dedicated command group.
- Keep SQLite sources out of binary root prefiltering so Cursor state.vscdb stores reach parser collection.
- Document the Cursor IDE profiling commands and cover the benchmark group and synthetic SQLite profile path in tests.
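One way a binary prefilter can avoid dropping SQLite stores is to recognize the database header before applying a NUL-byte heuristic. This is only a sketch of that idea: SQLite files do begin with the 16-byte string `SQLite format 3\0`, but `skip_as_binary` and the NUL-byte heuristic are assumptions, not the prefilter agentgrep actually uses.

```python
from pathlib import Path

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def skip_as_binary(path: Path, sample_size: int = 1024) -> bool:
    """Decide whether a prefilter should drop a file as opaque binary data.

    SQLite databases contain NUL bytes, so a naive check would discard them;
    they are recognized first and passed on to the database reader.
    """
    with path.open("rb") as handle:
        head = handle.read(sample_size)
    if head.startswith(SQLITE_MAGIC):
        return False
    return b"\x00" in head
```

With the header check first, a `state.vscdb` store is handed to the parser even though its bytes look binary to a text-oriented scan.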
why: Record the streamline-02 deliverables for the unreleased version so readers know broad searches no longer skip Cursor IDE's SQLite history store and that a local profiling workflow now exists for bottleneck evidence.
what:
- Add a Fixes deliverable describing how the search pre-scan could silently drop SQLite-backed stores before the database reader ran.
- Add Development deliverables for scripts/profile_engine.py, the profile-engine benchmark group with profile capture and top-span rendering, and the benchmark.py analyze subcommand for saved artifacts.
Summary
- `scripts/profile_engine.py` component runs for prompt search, conversation search, grep-shaped searches, and prompt-source enumeration, with Rich terminal output by default and explicit JSON/NDJSON machine formats.
- `profile-engine` command group, `benchmark.py analyze` for saved artifacts, rejection of empty `--commands` selectors, and load-time validation that profile-engine benchmarks request JSON output.
- Cursor IDE command group (`profile-engine-cursor-ide`) exercising the SQLite `state.vscdb` read path.

Refs #42. CI artifact upload remains separate in #43.
Test Plan
rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run ty check; uv run py.test --reruns 0 -vvv; just build-docs;