
Add local engine profiling workflow #44

Merged
tony merged 20 commits into master from streamline-02 on Jun 5, 2026


tony (Owner) commented May 31, 2026

Summary

  • Adds dormant, privacy-safe engine profiling spans around discovery, planning, collection, source-level decisions, and subprocess calls.
  • Adds scripts/profile_engine.py component runs for prompt search, conversation search, grep-shaped searches, and prompt-source enumeration, with Rich terminal output by default and explicit JSON/NDJSON machine formats.
  • Extends benchmark artifacts with schema markers, sanitized profile payload capture, Rich nested-span rendering, a profile-engine command group, benchmark.py analyze for saved artifacts, rejection of empty --commands selectors, and load-time validation that profile-engine benchmarks request JSON output.
  • Adds a Cursor IDE profile-engine benchmark set (profile-engine-cursor-ide) exercising the SQLite state.vscdb read path.
  • Fixes broad searches that dropped SQLite-backed stores during binary root prefiltering, so Cursor IDE matches now reliably reach parser collection.
  • Updates AGENTS.md, repo agent skills, and developer docs for the local profiling and benchmark-analysis workflow, and records the deliverables in CHANGES.

Refs #42. CI artifact upload remains separate in #43.

Test Plan

  • rm -rf docs/_build; uv run ruff check . --fix --show-fixes; uv run ruff format .; uv run ty check; uv run py.test --reruns 0 -vvv; just build-docs;

tony (Owner, Author) commented Jun 4, 2026

Code review

Found 1 issue:

  1. _safe_profile_attribute_dict filters span attributes by key substring ("path" in denied_key_parts), which silently drops agentgrep_path_kind from benchmark.py analyze reports and Rich span tables. The value is a safe classifier literal (history_file / session_file / sqlite_db / store_file), not a filesystem path, and direct profile_engine.py artifacts retain it — so benchmark analysis loses the store-type dimension that the source-level spans were added to provide. Consider exact-key denial or an explicit allowance for agentgrep_path_kind.

safe: dict[str, object] = {}
denied_key_parts = ("argv", "command", "path", "query")
for key, value in sorted(attributes.items()):
    if not isinstance(key, str):
        continue
    if any(part in key.casefold() for part in denied_key_parts):
        continue

Emit site for the dropped attribute:

"agentgrep_adapter_id": source.adapter_id,
"agentgrep_path_kind": source.path_kind,
"agentgrep_source_kind": source.source_kind,
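The suggested fix can be sketched as follows. This is a minimal illustration, not the project's code: `filter_by_substring`, `filter_with_allowance`, `ALLOWED_KEYS`, and the `agentgrep_source_path` key are hypothetical, and only the denied key parts and the `agentgrep_*` attribute names come from the excerpts above.

```python
from collections.abc import Mapping

# Denied substrings, copied from the excerpt above.
DENIED_KEY_PARTS = ("argv", "command", "path", "query")
# Hypothetical allowlist of classifier keys whose values are safe literals.
ALLOWED_KEYS = frozenset({"agentgrep_path_kind"})


def filter_by_substring(attributes: Mapping[str, object]) -> dict[str, object]:
    """Current behaviour: drop every key that contains a denied part."""
    return {
        key: value
        for key, value in sorted(attributes.items())
        if not any(part in key.casefold() for part in DENIED_KEY_PARTS)
    }


def filter_with_allowance(attributes: Mapping[str, object]) -> dict[str, object]:
    """Suggested behaviour: keep allowlisted classifiers, deny the rest by substring."""
    safe: dict[str, object] = {}
    for key, value in sorted(attributes.items()):
        if key in ALLOWED_KEYS:
            safe[key] = value
        elif not any(part in key.casefold() for part in DENIED_KEY_PARTS):
            safe[key] = value
    return safe


attributes = {
    "agentgrep_adapter_id": "example-adapter",  # placeholder value
    "agentgrep_path_kind": "sqlite_db",
    "agentgrep_source_path": "/home/me/.config/state.vscdb",  # hypothetical unsafe key
}
print(filter_by_substring(attributes))
print(filter_with_allowance(attributes))
```

Keeping substring denial for every non-allowlisted key preserves the fail-closed behaviour for new path-bearing keys; a pure exact-key denylist containing "path" would let a key such as the hypothetical `agentgrep_source_path` through.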

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

tony added 4 commits June 4, 2026 19:13
why: Fast-path planning work needs timings that separate engine discovery, planning, collection, and rendering costs. A dormant, sanitized profiler lets benchmarks capture those boundaries without exposing prompt text, command argv, or local paths.

what:
- Add typed engine profiling primitives with context-local spans and redacted subprocess samples.
- Add a standalone engine profiling script plus benchmark entries whose names and descriptions disclose their limits.
- Add tests for profiling phase output, subprocess redaction, default-path import avoidance, and benchmark coverage.

why: streamline-02 is scoped to observability, so bottleneck work needs a stable, privacy-safe way to profile command-shaped engine paths before changing planner or executor behavior.
what:
- Add profile_engine component routing for prompt search, conversation search, grep-shaped prompt and conversation search, and prompt-source find runs.
- Add explicit capped benchmark selectors for the profile-engine components.
- Cover component expansion, query redaction, result caps, and prompt/conversation scope selection with typed pytest cases.

why: streamline-02 should leave a clear, repeatable observability workflow for agents and maintainers without adding CI artifact upload or performance changes to this branch.
what:
- Document profile_engine components and the required pre-commit gate in AGENTS.md and developer docs.
- Add repo-local $profile guidance and extend $benchmark with component selectors.
- Test that repo-local agent skills keep the component argument and core profiler components visible.

why: The profile_engine "all" component accepts content search terms for search-like runs, but find-prompts enumerates prompt sources rather than prompt contents. Passing those terms into find-prompts made batch profiles undercount source enumeration compared with the standalone profiler benchmark.
what:
- Make ProfileRunSpec require an explicit opt-in before legacy find treats terms as a source metadata pattern.
- Report effective term_count per component so batch find-prompts shows no find pattern was applied.
- Add regression coverage for all tmux find-prompts parity and legacy find term filtering.
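The per-component term accounting can be sketched as below. The field `terms_as_find_pattern`, the property `effective_term_count`, and the component name "search" are hypothetical; only the ProfileRunSpec name, the legacy find and find-prompts components, and the idea of reporting an effective term_count come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileRunSpec:
    """Hypothetical shape; the real ProfileRunSpec fields may differ."""

    component: str
    terms: tuple[str, ...] = ()
    # Explicit opt-in: only legacy `find` treats terms as a source metadata pattern.
    terms_as_find_pattern: bool = False

    @property
    def effective_term_count(self) -> int:
        """Terms that actually shape this run, reported per component."""
        is_find = self.component in ("find", "find-prompts")
        if is_find and not self.terms_as_find_pattern:
            return 0  # source enumeration ignores content search terms
        return len(self.terms)


terms = ("tmux",)
batch = [ProfileRunSpec("search", terms), ProfileRunSpec("find-prompts", terms)]
legacy = ProfileRunSpec("find", terms, terms_as_find_pattern=True)
print([(spec.component, spec.effective_term_count) for spec in batch])
print(legacy.effective_term_count)
```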
tony added 13 commits June 4, 2026 19:49
why: Engine profiles need both machine-readable child-run streams and a readable top-spans view so bottleneck evidence can be inspected without exposing local paths or query text.

what:
- Add json, ndjson, and rich renderers to scripts/profile_engine.py.
- Keep the existing JSON document shape as the default while flattening batch runs for NDJSON.
- Cover renderer behavior and entry-point parsing with profile-engine script tests.
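Flattening a batch for NDJSON might look like the sketch below. It assumes a batch payload keeps its child runs under a "runs" key and uses "search" as a component name; both are hypothetical, as is the `iter_ndjson_lines` helper.

```python
import json
from collections.abc import Iterator, Mapping
from typing import Any


def iter_ndjson_lines(payload: Mapping[str, Any]) -> Iterator[str]:
    """Yield one compact JSON document per line.

    A batch payload (assumed to keep child runs under a hypothetical "runs"
    key) is flattened so each child run becomes its own line; a single-run
    payload is emitted as one line.
    """
    runs = payload.get("runs")
    if isinstance(runs, list):
        for run in runs:
            yield json.dumps(run, sort_keys=True)
    else:
        yield json.dumps(dict(payload), sort_keys=True)


batch = {
    "runs": [
        {"component": "search", "elapsed_ms": 12.5},
        {"component": "find-prompts", "elapsed_ms": 3.0},
    ]
}
for line in iter_ndjson_lines(batch):
    print(line)
```

`json.dumps` without `indent` never emits a raw newline (newlines inside strings are escaped), so each yielded string is safe as a single NDJSON record.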

why: Benchmark JSON and NDJSON output need to carry profiler details without leaking local command paths, and dry-run rows need an explicit marker for machine consumers.
what:
- Add dry_run, profile_payload, and profile_capture_error fields to benchmark measurements.
- Sanitize rendered benchmark command strings before serialization.
- Capture profile_engine JSON for profiler benchmark rows after timing.
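The commit does not say how command strings are sanitized. As one hypothetical approach, a sketch could tokenize with `shlex` and replace absolute-path arguments (including `--flag=/abs/path` values) with a placeholder; `sanitize_command`, `REDACTED_PATH`, and the `--output` flag are illustrative names only, and relative paths are deliberately kept.

```python
import os
import shlex


def sanitize_command(command: str, placeholder: str = "REDACTED_PATH") -> str:
    """Replace absolute-path tokens in a command line with a placeholder."""
    sanitized: list[str] = []
    for token in shlex.split(command):
        flag, sep, value = token.partition("=")
        if sep and os.path.isabs(value):
            sanitized.append(f"{flag}={placeholder}")  # --flag=/abs/path
        elif os.path.isabs(token):
            sanitized.append(placeholder)  # bare /abs/path argument
        else:
            sanitized.append(token)  # relative paths and options are kept
    return shlex.join(sanitized)


print(sanitize_command(
    "uv run /home/me/agentgrep/scripts/profile_engine.py --json --output=/tmp/profile.json"
))
```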

why: The profiler and benchmark helpers now emit richer artifact shapes, so local workflows and agent skills need to explain which fields are shareable and how to inspect slow spans.
what:
- Document profile_engine output formats and top-span rendering.
- Document benchmark dry-run and profile payload artifact fields.
- Update agent-facing benchmark and profile skill guidance.

why: Profile and benchmark artifacts are intended to be copied into issues and consumed by tools. Small explicit schema markers make those machine artifacts durable without introducing a migration framework.
what:
- Add schema_version and artifact_kind markers to profile run and batch payloads.
- Add schema_version and artifact_kind markers to benchmark JSON roots and measurement rows.
- Cover JSON and NDJSON artifact metadata in profiler and benchmark tests.
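A minimal sketch of how producers and consumers might use the two markers. The marker names schema_version and artifact_kind come from the commit; the version number, the "profile-batch" kind string, and both helper functions are hypothetical.

```python
from typing import Any

SCHEMA_VERSION = 1  # hypothetical value


def with_artifact_markers(payload: dict[str, Any], artifact_kind: str) -> dict[str, Any]:
    """Return a copy of payload tagged with schema_version and artifact_kind."""
    return {**payload, "schema_version": SCHEMA_VERSION, "artifact_kind": artifact_kind}


def require_artifact(document: dict[str, Any], artifact_kind: str) -> dict[str, Any]:
    """Fail fast when a consumer is handed the wrong kind or an unknown version."""
    if document.get("artifact_kind") != artifact_kind:
        raise ValueError(f"expected {artifact_kind!r}, got {document.get('artifact_kind')!r}")
    if document.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version {document.get('schema_version')!r}")
    return document


doc = with_artifact_markers({"runs": []}, "profile-batch")
require_artifact(doc, "profile-batch")
```

Placing the markers after `**payload` means a stray key in the payload can never override them.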

why: Profile-engine benchmark rows already preserve child profile payloads, but the rich renderer only showed timing tables. Rendering the slowest nested spans keeps local terminal runs useful without changing machine formats.
what:
- Add --top-spans to benchmark run and compare commands.
- Render sanitized nested profile_payload spans in rich output.
- Cover rich span rendering, suppression, and CLI parsing in tests.
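Selecting the slowest nested spans for a --top-spans table could be sketched as below. The span keys "name", "duration_ms", and "children" are assumptions about the payload shape, and treating a limit of 0 as suppression is an assumption as well.

```python
import heapq
from collections.abc import Iterable, Iterator, Mapping
from typing import Any


def iter_spans(span: Mapping[str, Any], depth: int = 0) -> Iterator[tuple[int, str, float]]:
    """Walk a nested span depth-first, yielding (depth, name, duration_ms)."""
    yield depth, str(span["name"]), float(span["duration_ms"])
    for child in span.get("children", ()):
        yield from iter_spans(child, depth + 1)


def top_spans(roots: Iterable[Mapping[str, Any]], limit: int) -> list[tuple[int, str, float]]:
    """Return the `limit` slowest spans; a limit of 0 suppresses the table."""
    if limit <= 0:
        return []
    flat = [row for root in roots for row in iter_spans(root)]
    return heapq.nlargest(limit, flat, key=lambda row: row[2])


payload_spans = [
    {
        "name": "search",
        "duration_ms": 10.0,
        "children": [
            {"name": "collection", "duration_ms": 7.5, "children": []},
            {"name": "planning", "duration_ms": 1.0},
        ],
    }
]
for depth, name, duration in top_spans(payload_spans, 2):
    print(f"{'  ' * depth}{name}: {duration:.1f} ms")
```

Keeping the depth alongside each row lets the renderer indent child spans even after sorting by duration.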

why: Conversation searches could show collection as the hotspot without enough detail to identify which agents, stores, or source adapters drove the cost. Source-level profile samples provide that evidence while staying inactive outside profiling runs.
what:
- Record discovery groups by agent, store, adapter, path kind, and source kind.
- Record planning, collection, and find-filter samples without paths or prompt text.
- Cover the source-level samples and preserve the no-import fast path when profiling is inactive.
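Grouping discovered sources without leaking paths might be sketched like this. `DiscoveredSource`, `discovery_samples`, and the field values "ide", "cursor-ide", "conversation", and "example-agent" are hypothetical; the grouping dimensions and the "sqlite_db" / "history_file" path kinds come from this PR.

```python
from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveredSource:
    agent: str
    store: str
    adapter_id: str
    path_kind: str
    source_kind: str
    path: str  # local path: used for reading, never copied into samples


def discovery_samples(sources: Iterable[DiscoveredSource]) -> list[dict[str, object]]:
    """Count sources per (agent, store, adapter, path kind, source kind) group."""
    counts = Counter(
        (s.agent, s.store, s.adapter_id, s.path_kind, s.source_kind) for s in sources
    )
    return [
        {
            "agent": agent,
            "store": store,
            "adapter_id": adapter_id,
            "path_kind": path_kind,
            "source_kind": source_kind,
            "count": count,
        }
        for (agent, store, adapter_id, path_kind, source_kind), count in sorted(counts.items())
    ]


sources = [
    DiscoveredSource("cursor", "ide", "cursor-ide", "sqlite_db", "conversation", "/a/state.vscdb"),
    DiscoveredSource("cursor", "ide", "cursor-ide", "sqlite_db", "conversation", "/b/state.vscdb"),
    DiscoveredSource("example-agent", "cli", "example", "history_file", "prompt", "/c/history"),
]
print(discovery_samples(sources))
```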

why: The local profiling workflow now exposes durable schema markers, rich top-span summaries, and source-level spans, so repo docs and agent skills need to show how to collect and read those artifacts.
what:
- Document benchmark artifact metadata and --top-spans rich output.
- Document source-level profile samples and privacy constraints.
- Update agent-facing benchmark and profile skill checks.

why: The benchmark skill and profiling workflow treat profile-engine as the natural selector for all engine-only benchmark probes, but the harness only accepted exact benchmark keys. Resolving that group inside the benchmark script keeps the CLI ergonomic while still executing explicit benchmark rows.

what:
- Add a typed profile-engine benchmark group and expand it in --commands.
- Show available command groups in list-commands output and report bad selectors as a plain CLI error.
- Document the selector in repo docs and agent benchmark guidance.
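Expanding group names inside --commands could be sketched as follows. The group keys profile-engine and profile-engine-cursor-ide appear in this PR, but their member benchmark names, the comma separator, and the `expand_selectors` helper are hypothetical.

```python
from collections.abc import Iterable, Mapping

# Hypothetical group table; member benchmark names are placeholders.
COMMAND_GROUPS: Mapping[str, tuple[str, ...]] = {
    "profile-engine": ("profile-engine-search", "profile-engine-grep", "profile-engine-find"),
    "profile-engine-cursor-ide": ("profile-engine-cursor-ide-search",),
}


def expand_selectors(raw: str, known: Iterable[str]) -> list[str]:
    """Expand comma-separated names and groups into explicit benchmark keys."""
    known_set = set(known)
    selected: list[str] = []
    for token in (part.strip() for part in raw.split(",")):
        if not token:
            continue
        for name in COMMAND_GROUPS.get(token, (token,)):
            if name not in known_set:
                raise ValueError(f"unknown benchmark or group: {token!r}")
            if name not in selected:  # keep first-seen order, drop duplicates
                selected.append(name)
    return selected


known = [name for group in COMMAND_GROUPS.values() for name in group] + ["baseline"]
print(expand_selectors("profile-engine, profile-engine-grep, baseline", known))
```

The harness still runs explicit benchmark rows; groups are only a selector convenience resolved before execution.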

why: A blank --commands value or separator-only selector produced a successful benchmark artifact with zero rows. That can make automation look green while measuring nothing.

what:
- Raise a BadParameter when --commands expands to an empty benchmark list.
- Cover blank and separator-only selectors in the benchmark validation tests.
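The empty-selection guard reduces to a check after parsing. In this sketch, `BadParameter` is a local stand-in class so the example stays dependency-free (the commit says the real script raises a BadParameter), and `parse_commands` plus the comma separator are hypothetical.

```python
class BadParameter(ValueError):
    """Local stand-in for the CLI framework's BadParameter error."""


def parse_commands(raw: str) -> list[str]:
    """Split a --commands value, refusing selections that name nothing."""
    names = [part.strip() for part in raw.split(",") if part.strip()]
    if not names:
        # An empty selection would otherwise write a "successful" zero-row artifact.
        raise BadParameter("--commands did not select any benchmarks")
    return names


print(parse_commands("profile-engine, ,baseline"))
```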

why: Profile-engine benchmark artifacts already preserve timing samples and nested spans, but repeated bottleneck reports still required ad hoc jq queries. A dedicated analyzer turns saved benchmark JSON/NDJSON into stable human and machine reports without rerunning searches.
what:
- Add benchmark artifact loading and typed analysis summaries for command timings, slow spans, grouped spans, and warnings.
- Add rich/json/ndjson analysis reporters and the `benchmark.py analyze` subcommand.
- Cover JSON/NDJSON artifact loading, sanitization, reporter output, span limits, and CLI output.
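Loading either artifact format and summarizing timings could be sketched like this. The dry_run field comes from this PR; the "measurements" root key, the "command" and "elapsed_ms" row fields, and both helpers are hypothetical.

```python
import json
import statistics
from collections import defaultdict
from typing import Any


def load_rows(text: str, fmt: str) -> list[dict[str, Any]]:
    """Load measurement rows from a saved JSON or NDJSON benchmark artifact."""
    if fmt == "ndjson":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    document = json.loads(text)
    return list(document.get("measurements", []))  # hypothetical root key


def median_by_command(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Median elapsed time per command, ignoring dry-run rows."""
    samples: defaultdict[str, list[float]] = defaultdict(list)
    for row in rows:
        if row.get("dry_run"):
            continue
        samples[row["command"]].append(float(row["elapsed_ms"]))
    return {command: statistics.median(values) for command, values in sorted(samples.items())}


ndjson_text = (
    '{"command": "a", "elapsed_ms": 10}\n'
    '{"command": "a", "elapsed_ms": 30}\n'
    '{"command": "b", "elapsed_ms": 5, "dry_run": true}\n'
)
print(median_by_command(load_rows(ndjson_text, "ndjson")))
```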

why: The benchmark analyzer replaces ad hoc jq summaries for saved profile-engine artifacts. Documenting the workflow keeps agent and human operators on the same repeatable path.
what:
- Document `benchmark.py analyze` in developer benchmark docs and AGENTS profiling guidance.
- Update the repo benchmark skill to use analyzer reports for bottleneck summaries.
- Assert the benchmark skill advertises the analyzer artifact shape.

why: Running a profiler component without an explicit machine-output flag should be readable at the terminal. JSON remains available for saved artifacts, but it should be opt-in rather than surprising interactive output.
what:
- Default profile_engine.py output to the Rich reporter.
- Add --json and --ndjson shortcuts alongside the existing --format selector.
- Cover every profiler component and legacy alias at the script entry point.
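The flag layout can be illustrated with a hypothetical argparse sketch; the real script's CLI framework is not stated. Only the --format, --json, and --ndjson flags and the rich/json/ndjson format names come from this PR.

```python
import argparse
from collections.abc import Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="profile_engine.py")
    parser.add_argument("--format", choices=("rich", "json", "ndjson"))
    # Shortcuts write to the same destination; argparse applies flags in order,
    # so the last output flag on the command line wins.
    parser.add_argument("--json", dest="format", action="store_const", const="json")
    parser.add_argument("--ndjson", dest="format", action="store_const", const="ndjson")
    parser.set_defaults(format="rich")  # humans get Rich unless they ask otherwise
    return parser


def resolve_format(argv: Sequence[str]) -> str:
    return build_parser().parse_args(list(argv)).format


print(resolve_format([]), resolve_format(["--json"]), resolve_format(["--format=ndjson"]))
```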

why: The profiler now favors terminal-readable output unless a machine format is requested. Documentation and agent skills need to show --json for saved artifacts so examples do not write Rich tables into JSON files.
what:
- Document Rich as the default profile_engine.py output.
- Update profiler artifact examples to pass --json explicitly.
- Teach the profile skill to use --json and --ndjson for machine-readable output.
tony added 3 commits June 4, 2026 19:49
why: The profiler now defaults to Rich output for humans, but benchmark
profile-engine rows still promise sanitized JSON profile payloads.
Without an explicit machine format the timing rows succeed while nested
profile capture reports invalid JSON.

what:
- Add explicit --format json to every committed profile-engine
  benchmark command.
- Before run and compare sweeps, validate that profile-engine commands
  request JSON output (--format json, --format=json, or --json), and
  hint at --format json when payload capture sees invalid JSON.
- Cover the validator, the informational-command tolerance, and the
  committed-benchmark format assertions in tests.
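The load-time check can be sketched as a token scan over each benchmark command. The three accepted spellings come from the commit; detecting profile-engine commands by the substring "profile_engine.py" (which also lets informational commands through) and both helper names are assumptions.

```python
import shlex
from collections.abc import Mapping
from typing import Optional


def requested_format(command: str) -> Optional[str]:
    """Return the output format a command line asks for; later flags win."""
    fmt: Optional[str] = None
    tokens = iter(shlex.split(command))
    for token in tokens:
        if token == "--json":
            fmt = "json"
        elif token == "--ndjson":
            fmt = "ndjson"
        elif token == "--format":
            fmt = next(tokens, None)  # value is the following token
        elif token.startswith("--format="):
            fmt = token.partition("=")[2]
    return fmt


def validate_profile_engine_commands(commands: Mapping[str, str]) -> None:
    """Reject profile-engine benchmarks that would not emit JSON payloads."""
    offenders = sorted(
        name
        for name, command in commands.items()
        if "profile_engine.py" in command and requested_format(command) != "json"
    )
    if offenders:
        raise ValueError(
            "profile-engine benchmarks must request JSON (--format json): " + ", ".join(offenders)
        )
```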

why: Cursor IDE history lives in SQLite state.vscdb stores, but the committed
profile-engine benchmark group only covered all-agent workloads. A Cursor-specific
group makes DB profiling repeatable through scripts/benchmark.py and keeps
branch-tip baselines comparable from streamline-02.

what:
- Add Cursor IDE search, grep, and find profile-engine benchmark entries plus a
  dedicated command group.
- Keep SQLite sources out of binary root prefiltering so Cursor state.vscdb
  stores reach parser collection.
- Document the Cursor IDE profiling commands and cover the benchmark group and
  synthetic SQLite profile path in tests.
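The prefilter change can be sketched as an early exemption for SQLite stores. `CandidateSource` and `survives_prefilter` are hypothetical; the path_kind literals come from the review above. The WAL comment gives one known way raw database bytes can miss row text, not necessarily the cause the PR fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSource:
    path_kind: str  # e.g. "history_file", "session_file", "sqlite_db", "store_file"
    raw: bytes  # stand-in for the file bytes a prefilter would scan


def survives_prefilter(source: CandidateSource, needle: bytes) -> bool:
    """Cheap byte scan before parsing, except for SQLite stores."""
    if source.path_kind == "sqlite_db":
        # Raw database bytes are not a reliable proxy for row text (for example,
        # recent writes may still sit in a WAL sidecar file), so let the SQLite
        # reader decide instead of dropping the store here.
        return True
    return needle in source.raw


sources = [
    CandidateSource("sqlite_db", b"SQLite format 3\x00"),
    CandidateSource("history_file", b'{"prompt": "tmux split"}\n'),
    CandidateSource("history_file", b'{"prompt": "git rebase"}\n'),
]
kept = [source.path_kind for source in sources if survives_prefilter(source, b"tmux")]
print(kept)
```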

why: Record the streamline-02 deliverables for the unreleased version
so readers know broad searches no longer skip Cursor IDE's SQLite
history store and that a local profiling workflow now exists for
bottleneck evidence.

what:
- Add a Fixes deliverable describing how the search pre-scan could
  silently drop SQLite-backed stores before the database reader ran.
- Add Development deliverables for scripts/profile_engine.py, the
  profile-engine benchmark group with profile capture and top-span
  rendering, and the benchmark.py analyze subcommand for saved
  artifacts.
tony force-pushed the streamline-02 branch from fa95ed4 to d40447f on June 5, 2026 00:51
tony merged commit 562fe0d into master on Jun 5, 2026
3 checks passed
tony deleted the streamline-02 branch on June 5, 2026 01:03