Add local engine profiling workflow by tony · Pull Request #44 · tony/agentgrep

tony · 2026-05-31T23:53:29Z

Summary

Adds dormant, privacy-safe engine profiling spans around discovery, planning, collection, source-level decisions, and subprocess calls.
Adds scripts/profile_engine.py component runs for prompt search, conversation search, grep-shaped searches, and prompt-source enumeration, with Rich terminal output by default and explicit JSON/NDJSON machine formats.
Extends benchmark artifacts with schema markers, sanitized profile payload capture, Rich nested-span rendering, a profile-engine command group, benchmark.py analyze for saved artifacts, rejection of empty --commands selectors, and load-time validation that profile-engine benchmarks request JSON output.
Adds a Cursor IDE profile-engine benchmark set (profile-engine-cursor-ide) exercising the SQLite state.vscdb read path.
Fixes broad searches dropping SQLite-backed stores during binary root prefiltering, so Cursor IDE matches reliably reach parser collection.
Updates AGENTS.md, repo agent skills, and developer docs for the local profiling and benchmark-analysis workflow, and records the deliverables in CHANGES.

Refs #42. CI artifact upload remains separate in #43.

Test Plan

rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run ty check; uv run py.test --reruns 0 -vvv; just build-docs;

tony · 2026-06-04T23:32:40Z

Code review

Found 1 issue:

_safe_profile_attribute_dict filters span attributes by key substring ("path" in denied_key_parts), which silently drops agentgrep_path_kind from benchmark.py analyze reports and Rich span tables. The value is a safe classifier literal (history_file / session_file / sqlite_db / store_file), not a filesystem path, and direct profile_engine.py artifacts retain it — so benchmark analysis loses the store-type dimension that the source-level spans were added to provide. Consider exact-key denial or an explicit allowance for agentgrep_path_kind.

agentgrep/scripts/benchmark.py

Lines 645 to 651 in 1afd722

    safe: dict[str, object] = {}
    denied_key_parts = ("argv", "command", "path", "query")
    for key, value in sorted(attributes.items()):
        if not isinstance(key, str):
            continue
        if any(part in key.casefold() for part in denied_key_parts):
            continue
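
The "explicit allowance" option from the review could be sketched as follows. This is a minimal illustration, not the code in scripts/benchmark.py: the function name `safe_attributes`, the `ALLOWED_KEYS` set, and the sample keys in the usage note are hypothetical. Substring denial stays in place, and only keys known to hold safe classifier literals bypass it.

```python
# Hypothetical sketch: substring denial plus an explicit allowlist.
DENIED_KEY_PARTS = ("argv", "command", "path", "query")
# Keys whose values are classifier literals, not filesystem paths.
ALLOWED_KEYS = frozenset({"agentgrep_path_kind"})


def safe_attributes(attributes: dict) -> dict[str, object]:
    """Return a sorted copy of ``attributes`` without potentially private keys."""
    safe: dict[str, object] = {}
    # Drop non-string keys before sorting so mixed key types cannot raise.
    str_items = ((k, v) for k, v in attributes.items() if isinstance(k, str))
    for key, value in sorted(str_items, key=lambda item: item[0]):
        folded = key.casefold()
        denied = any(part in folded for part in DENIED_KEY_PARTS)
        if denied and folded not in ALLOWED_KEYS:
            continue
        safe[key] = value
    return safe
```

With this shape, a key such as `source_path` is still dropped while `agentgrep_path_kind` survives. An exact-key denylist would be the stricter alternative, at the cost of having to enumerate every sensitive key.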

Emit site for the dropped attribute:

agentgrep/src/agentgrep/__init__.py

Lines 3687 to 3689 in 1afd722

    "agentgrep_adapter_id": source.adapter_id,
    "agentgrep_path_kind": source.path_kind,
    "agentgrep_source_kind": source.source_kind,

🤖 Generated with Claude Code


why: Fast-path planning work needs timings that separate engine discovery, planning, collection, and rendering costs. A dormant, sanitized profiler lets benchmarks capture those boundaries without exposing prompt text, command argv, or local paths.
what:
- Add typed engine profiling primitives with context-local spans and redacted subprocess samples.
- Add a standalone engine profiling script plus benchmark entries whose names and descriptions disclose their limits.
- Add tests for profiling phase output, subprocess redaction, default-path import avoidance, and benchmark coverage.

why: streamline-02 is scoped to observability, so bottleneck work needs a stable, privacy-safe way to profile command-shaped engine paths before changing planner or executor behavior.
what:
- Add profile_engine component routing for prompt search, conversation search, grep-shaped prompt and conversation search, and prompt-source find runs.
- Add explicit capped benchmark selectors for the profile-engine components.
- Cover component expansion, query redaction, result caps, and prompt/conversation scope selection with typed pytest cases.

why: streamline-02 should leave a clear, repeatable observability workflow for agents and maintainers without adding CI artifact upload or performance changes to this branch.
what:
- Document profile_engine components and the required pre-commit gate in AGENTS.md and developer docs.
- Add repo-local $profile guidance and extend $benchmark with component selectors.
- Test that repo-local agent skills keep the component argument and core profiler components visible.

why: The profile_engine all component accepts content search terms for search-like runs, but find-prompts enumerates prompt sources rather than prompt contents. Passing those terms into find-prompts made batch profiles undercount source enumeration versus the standalone profiler benchmark.
what:
- Make ProfileRunSpec explicitly opt legacy find into using terms as a source metadata pattern.
- Report effective term_count per component so batch find-prompts shows no find pattern was applied.
- Add regression coverage for all tmux find-prompts parity and legacy find term filtering.

why: Engine profiles need both machine-readable child-run streams and a readable top-spans view so bottleneck evidence can be inspected without exposing local paths or query text.
what:
- Add json, ndjson, and rich renderers to scripts/profile_engine.py.
- Keep the existing JSON document shape as the default while flattening batch runs for NDJSON.
- Cover renderer behavior and entry-point parsing with profile-engine script tests.

why: Benchmark JSON and NDJSON output need to carry profiler details without leaking local command paths, and dry-run rows need an explicit marker for machine consumers.
what:
- Add dry_run, profile_payload, and profile_capture_error fields to benchmark measurements.
- Sanitize rendered benchmark command strings before serialization.
- Capture profile_engine JSON for profiler benchmark rows after timing.

why: The profiler and benchmark helpers now emit richer artifact shapes, so local workflows and agent skills need to explain which fields are shareable and how to inspect slow spans.
what:
- Document profile_engine output formats and top-span rendering.
- Document benchmark dry-run and profile payload artifact fields.
- Update agent-facing benchmark and profile skill guidance.

why: Profile and benchmark artifacts are intended to be copied into issues and consumed by tools. Small explicit schema markers make those machine artifacts durable without introducing a migration framework.
what:
- Add schema_version and artifact_kind markers to profile run and batch payloads.
- Add schema_version and artifact_kind markers to benchmark JSON roots and measurement rows.
- Cover JSON and NDJSON artifact metadata in profiler and benchmark tests.
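
As a rough sketch of how such markers can be stamped and checked: the `schema_version` and `artifact_kind` field names come from the commit message, while the helper names, the version number `1`, and the kind string `"profile-run"` are assumptions for illustration.

```python
import json

SCHEMA_VERSION = 1  # assumed value, not taken from the repository


def with_schema_markers(payload: dict[str, object], artifact_kind: str) -> dict[str, object]:
    """Return a copy of ``payload`` with the two schema markers placed first."""
    return {
        "schema_version": SCHEMA_VERSION,
        "artifact_kind": artifact_kind,
        **payload,
    }


def load_artifact(raw: str, expected_kind: str) -> dict[str, object]:
    """Parse a JSON artifact and reject unknown kinds or versions early."""
    document = json.loads(raw)
    if document.get("artifact_kind") != expected_kind:
        raise ValueError(f"unexpected artifact_kind: {document.get('artifact_kind')!r}")
    if document.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {document.get('schema_version')!r}")
    return document
```

A consumer that checks both markers before reading any other field can fail with a clear message when an older or unrelated artifact is pasted in, which is the durability the commit aims for without a migration framework.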

why: Profile-engine benchmark rows already preserve child profile payloads, but the rich renderer only showed timing tables. Rendering the slowest nested spans keeps local terminal runs useful without changing machine formats.
what:
- Add --top-spans to benchmark run and compare commands.
- Render sanitized nested profile_payload spans in rich output.
- Cover rich span rendering, suppression, and CLI parsing in tests.
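
Selecting the slowest nested spans amounts to flattening the span tree and sorting by duration. The sketch below assumes a hypothetical payload shape in which each span is a dict with `name`, `duration_s`, and an optional `children` list; the real profile_payload layout may differ.

```python
from collections.abc import Iterator


def iter_spans(span: dict) -> Iterator[dict]:
    """Yield a span and all of its descendants, depth first."""
    yield span
    for child in span.get("children", ()):
        yield from iter_spans(child)


def top_spans(root: dict, limit: int) -> list[tuple[str, float]]:
    """Return the ``limit`` slowest spans as (name, duration) pairs."""
    flat = [(span["name"], span["duration_s"]) for span in iter_spans(root)]
    return sorted(flat, key=lambda item: item[1], reverse=True)[:limit]
```

Because parent spans include their children's time, the root usually ranks first; a renderer may prefer to show nesting depth alongside each row so that overlap is obvious.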

why: Conversation searches could show collection as the hotspot without enough detail to identify which agents, stores, or source adapters drove the cost. Source-level profile samples provide that evidence while staying inactive outside profiling runs.
what:
- Record discovery groups by agent, store, adapter, path kind, and source kind.
- Record planning, collection, and find-filter samples without paths or prompt text.
- Cover the source-level samples and preserve the no-import fast path when profiling is inactive.

why: The local profiling workflow now exposes durable schema markers, rich top-span summaries, and source-level spans, so repo docs and agent skills need to show how to collect and read those artifacts.
what:
- Document benchmark artifact metadata and --top-spans rich output.
- Document source-level profile samples and privacy constraints.
- Update agent-facing benchmark and profile skill checks.

why: The benchmark skill and profiling workflow treat profile-engine as the natural selector for all engine-only benchmark probes, but the harness only accepted exact benchmark keys. Resolving that group inside the benchmark script keeps the CLI ergonomic while still executing explicit benchmark rows.
what:
- Add a typed profile-engine benchmark group and expand it in --commands.
- Show available command groups in list-commands output and keep bad selectors to a plain CLI error.
- Document the selector in repo docs and agent benchmark guidance.

why: A blank --commands value or separator-only selector produced a successful benchmark artifact with zero rows. That can make automation look green while measuring nothing.
what:
- Raise a BadParameter when --commands expands to an empty benchmark list.
- Cover blank and separator-only selectors in the benchmark validation tests.
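
Group expansion and the empty-selector check from the two commits above can be sketched together. The group contents and the function name are hypothetical, the comma separator is an assumption, and `ValueError` stands in for the CLI framework's BadParameter so the example has no third-party dependency.

```python
# Hypothetical group table; the real benchmark keys are not reproduced here.
COMMAND_GROUPS: dict[str, tuple[str, ...]] = {
    "profile-engine": ("profile-engine-search", "profile-engine-find"),
}


def expand_commands(selector: str) -> list[str]:
    """Expand a comma-separated selector into explicit benchmark keys."""
    names: list[str] = []
    for token in selector.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray separators, but see the check below
        names.extend(COMMAND_GROUPS.get(token, (token,)))
    if not names:
        # Stand-in for BadParameter: an empty run must not look successful.
        raise ValueError("--commands selected no benchmarks")
    return names
```

Failing at parse time means a blank or separator-only selector can never produce a zero-row artifact that automation would read as a pass.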

why: Profile-engine benchmark artifacts already preserve timing samples and nested spans, but repeated bottleneck reports still required ad hoc jq queries. A dedicated analyzer turns saved benchmark JSON/NDJSON into stable human and machine reports without rerunning searches.
what:
- Add benchmark artifact loading and typed analysis summaries for command timings, slow spans, grouped spans, and warnings.
- Add rich/json/ndjson analysis reporters and the `benchmark.py analyze` subcommand.
- Cover JSON/NDJSON artifact loading, sanitization, reporter output, span limits, and CLI output.
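
Loading a saved artifact in either format and summarizing command timings might look like the sketch below. The field names `measurements`, `command`, and `seconds` are assumptions chosen for illustration and are not taken from the real artifact schema.

```python
import json
from statistics import mean


def load_rows(text: str) -> list[dict]:
    """Load measurement rows from a JSON document or an NDJSON stream."""
    stripped = text.strip()
    try:
        document = json.loads(stripped)
    except json.JSONDecodeError:
        # More than one top-level value: treat each non-empty line as a row.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    if isinstance(document, dict) and "measurements" in document:
        return list(document["measurements"])
    return [document]  # a single NDJSON row is also valid JSON


def mean_seconds_by_command(rows: list[dict]) -> dict[str, float]:
    """Group rows by command name and average their timings."""
    grouped: dict[str, list[float]] = {}
    for row in rows:
        grouped.setdefault(row["command"], []).append(row["seconds"])
    return {name: mean(values) for name, values in sorted(grouped.items())}
```

Working from saved rows rather than rerunning searches keeps repeated reports comparable, since every summary is derived from the same captured measurements.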

why: The benchmark analyzer replaces ad hoc jq summaries for saved profile-engine artifacts. Documenting the workflow keeps agent and human operators on the same repeatable path.
what:
- Document `benchmark.py analyze` in developer benchmark docs and AGENTS profiling guidance.
- Update the repo benchmark skill to use analyzer reports for bottleneck summaries.
- Assert the benchmark skill advertises the analyzer artifact shape.

why: Running a profiler component without an explicit machine-output flag should be readable at the terminal. JSON remains available for saved artifacts, but it should be opt-in rather than surprising interactive output.
what:
- Default profile_engine.py output to the Rich reporter.
- Add --json and --ndjson shortcuts alongside the existing --format selector.
- Cover every profiler component and legacy alias at the script entry point.
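
One way to wire shortcut flags next to a `--format` selector is to point all three options at the same destination. The sketch assumes `argparse`; the script's actual argument parser and the `output_format` attribute name are not confirmed by this page.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where --json and --ndjson are aliases for --format."""
    parser = argparse.ArgumentParser(prog="profile_engine.py")
    # --format is added first so its default ("rich") wins for the shared dest.
    parser.add_argument(
        "--format",
        choices=("rich", "json", "ndjson"),
        default="rich",
        dest="output_format",
    )
    parser.add_argument(
        "--json", action="store_const", const="json", dest="output_format"
    )
    parser.add_argument(
        "--ndjson", action="store_const", const="ndjson", dest="output_format"
    )
    return parser
```

With no flags the parsed `output_format` is `"rich"`, so interactive runs stay readable, while `--json` and `--format json` resolve to the same value for saved artifacts.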

why: The profiler now favors terminal-readable output unless a machine format is requested. Documentation and agent skills need to show --json for saved artifacts so examples do not write Rich tables into JSON files.
what:
- Document Rich as the default profile_engine.py output.
- Update profiler artifact examples to pass --json explicitly.
- Teach the profile skill to use --json and --ndjson for machine-readable output.

why: The profiler now defaults to Rich output for humans, but benchmark profile-engine rows still promise sanitized JSON profile payloads. Without an explicit machine format the timing rows succeed while nested profile capture reports invalid JSON.
what:
- Add explicit --format json to every committed profile-engine benchmark command.
- Validate before run and compare sweeps that profile-engine commands request JSON output (--format json, --format=json, or --json), and hint at --format json when payload capture sees invalid JSON.
- Cover the validator, the informational-command tolerance, and the committed-benchmark format assertions in tests.
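
The three accepted spellings named in the commit (`--format json`, `--format=json`, `--json`) can be checked against an argv list as sketched here. The function name is hypothetical and the real validator may handle more cases, such as the informational-command tolerance mentioned above.

```python
def requests_json(argv: list[str]) -> bool:
    """Return True when ``argv`` asks for JSON output in any accepted form."""
    for index, arg in enumerate(argv):
        if arg in ("--json", "--format=json"):
            return True
        # Slicing avoids an IndexError when --format is the last argument.
        if arg == "--format" and argv[index + 1 : index + 2] == ["json"]:
            return True
    return False
```

Running this check at load time, before any timing sweep starts, surfaces a missing format flag as a configuration error instead of a payload capture failure after the run.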

why: Cursor IDE history lives in SQLite state.vscdb stores, but the committed profile-engine benchmark group only covered all-agent workloads. A Cursor-specific group makes DB profiling repeatable through scripts/benchmark.py and keeps branch-tip baselines comparable from streamline-02.
what:
- Add Cursor IDE search, grep, and find profile-engine benchmark entries plus a dedicated command group.
- Keep SQLite sources out of binary root prefiltering so Cursor state.vscdb stores reach parser collection.
- Document the Cursor IDE profiling commands and cover the benchmark group and synthetic SQLite profile path in tests.
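
The prefilter fix can be illustrated with a sketch that exempts SQLite files from a binary check. How agentgrep actually classifies sources is not shown on this page, so the helper names and the NUL-byte heuristic are assumptions; the 16-byte header string is the documented SQLite file signature.

```python
from pathlib import Path

# Every SQLite 3 database file starts with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite(path: Path) -> bool:
    """Detect a SQLite database by its file header."""
    with path.open("rb") as handle:
        return handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def looks_binary(path: Path) -> bool:
    """Heuristic binary check: a NUL byte in the first 4 KiB."""
    with path.open("rb") as handle:
        return b"\x00" in handle.read(4096)


def keep_for_collection(path: Path) -> bool:
    """Keep text files and SQLite stores; skip other binary files."""
    return is_sqlite(path) or not looks_binary(path)
```

A plain binary check rejects every SQLite store, because the header itself contains a NUL byte. Testing for the signature first lets a store such as state.vscdb reach the database reader.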

why: Record the streamline-02 deliverables for the unreleased version so readers know broad searches no longer skip Cursor IDE's SQLite history store and that a local profiling workflow now exists for bottleneck evidence.
what:
- Add a Fixes deliverable describing how the search pre-scan could silently drop SQLite-backed stores before the database reader ran.
- Add Development deliverables for scripts/profile_engine.py, the profile-engine benchmark group with profile capture and top-span rendering, and the benchmark.py analyze subcommand for saved artifacts.

tony force-pushed the streamline-02 branch from 88b3298 to 455bb06 Compare June 4, 2026 23:02

tony added 4 commits June 4, 2026 19:13

tony force-pushed the streamline-02 branch from 32dd6c9 to fa95ed4 Compare June 5, 2026 00:17

tony added 13 commits June 4, 2026 19:49

tony added 3 commits June 4, 2026 19:49

tony force-pushed the streamline-02 branch from fa95ed4 to d40447f Compare June 5, 2026 00:51

tony merged commit 562fe0d into master Jun 5, 2026
3 checks passed

tony deleted the streamline-02 branch June 5, 2026 01:03

tony merged 20 commits into master from streamline-02
