[codex] Redesign Minimap as a lean navigation graph by himattm · Pull Request #1 · himattm/minimap

himattm · 2026-05-28T18:48:58Z

Summary

This PR resets Minimap around the narrower product goal we aligned on: an Android-only navigation memory layer for agents. It replaces the heavier proposal/journal-oriented model with a lean graph of semantic places and deterministic UI edges that agents can grow, replay, and validate over time.

Key changes:

Introduces the v1 .minimap/graph model with semantic places, place variants, deterministic edge files, explicit viewport compatibility for coordinate taps, and one active Android app profile.
Refactors the CLI around the agent-facing commands: init, doctor, whereami, layout, tap, scroll, back, and go.
Uses Android layout observations to recognize known places, add variants for changed screens, record new paths, and avoid silently claiming unknown screens as known places.
Adds reuse of fresh session and layout data so an agent can navigate via Minimap and verify against the cached observed layout without immediately re-dumping Android UI state.
Updates the Codex/Claude agent skill packaging so agents are guided to use Minimap first and raw Android only as fallback evidence.
Adds benchmark notes and an active-development change benchmark protocol for known-route replay, graph growth, route repair, and changed screens.
Keeps the earlier selector compatibility fix so selector taps work against the real `android layout` output shape.
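To make the viewport-compatibility point concrete, here is a minimal sketch, not the actual schema in crates/minimap-schemas. Every type and field name (`Viewport`, `EdgeAction`, `Edge`, `is_replayable`) is hypothetical. It assumes a coordinate tap is only replayed when the current viewport equals the one it was recorded on, while selector taps and back presses are treated as viewport-independent.

```rust
/// Screen dimensions a coordinate tap was recorded against (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Viewport {
    pub width: u32,
    pub height: u32,
}

/// A deterministic UI action attached to an edge (illustrative, not the real schema).
#[derive(Debug, Clone, PartialEq)]
pub enum EdgeAction {
    /// Tap resolved through a selector; assumed to be viewport-independent.
    TapSelector { key: String, value: String },
    /// Raw coordinate tap; only meaningful on the viewport it was recorded on.
    TapPoint { x: u32, y: u32, recorded_on: Viewport },
    /// System back press.
    Back,
}

/// A directed edge between two semantic places, identified here by slug.
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub from_place: String,
    pub to_place: String,
    pub action: EdgeAction,
}

/// Returns true when the edge may be replayed on the `current` viewport.
/// Assumption: compatibility means exact equality of width and height.
pub fn is_replayable(edge: &Edge, current: Viewport) -> bool {
    match &edge.action {
        EdgeAction::TapPoint { recorded_on, .. } => *recorded_on == current,
        EdgeAction::TapSelector { .. } | EdgeAction::Back => true,
    }
}
```

Under this reading, a route containing a coordinate tap recorded on one device would be refused on a device with a different screen size rather than tapping a wrong location.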

Review Focus

Please review this as a breaking v1 redesign rather than an incremental compatibility patch.

High-value review areas:

CLI contract in crates/minimap-cli/src/main.rs and crates/minimap-cli/tests/cli_contract.rs.
Place matching and variant behavior in crates/minimap-core/src/lib.rs.
Graph path resolution and viewport compatibility in crates/minimap-graph/src/lib.rs.
Repository layout and init behavior in crates/minimap-repo/src/lib.rs.
Schema names and serialized graph shape in crates/minimap-schemas/src/lib.rs.
Agent instructions in plugins/minimap-claude-code/skills/minimap-app-navigation/SKILL.md.
Benchmark interpretation in docs/MINIMAP_BENCHMARK_NOTES.md and docs/MINIMAP_CHANGE_BENCHMARK_PROTOCOL.md.

Known local-only files intentionally excluded from the PR: .claude/ settings/checkpoints.

Validation

Ran locally:

cargo fmt --check
cargo test
cargo clippy --all-targets -- -D warnings
git diff --cached --check

Also ran a controlled change smoke test against the installed Minimap CLI using fake `android layout` and fake adb commands:

/private/tmp/minimap-change-bench/runs/20260528-134010/change-smoke-results.json

Smoke coverage:

Existing place grew -> known_changed, one place variant, no edge churn.
New option/new screen -> needs_label, then new place plus new edge.
Known route with changed destination -> go succeeded, destination variant added.
Renamed selector -> old route surfaced config_error, repair recorded a replacement edge.
Removed option -> old route surfaced config_error, no new edge recorded.
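The smoke cases above can be read as a mapping from reported status to expected graph growth. The sketch below encodes that reading only. `Status`, `GraphDelta`, `parse_status`, and `expected_delta` are hypothetical names, the status list is limited to the three strings named in the smoke coverage, and the real CLI may report more statuses and record more detail.

```rust
/// Statuses named in the smoke coverage (a subset; the CLI may report others).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    KnownChanged,
    NeedsLabel,
    ConfigError,
}

/// Counts of graph files expected to be added by one observation (illustrative).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct GraphDelta {
    pub places: u32,
    pub variants: u32,
    pub edges: u32,
}

/// Parses the status strings exactly as they appear in the smoke results.
pub fn parse_status(raw: &str) -> Option<Status> {
    match raw {
        "known_changed" => Some(Status::KnownChanged),
        "needs_label" => Some(Status::NeedsLabel),
        "config_error" => Some(Status::ConfigError),
        _ => None,
    }
}

/// Expected growth per status, following the smoke cases:
/// known_changed adds one variant and no edges; needs_label adds a place and
/// an edge only once a label is supplied; config_error adds nothing by itself
/// (a repair, if any, is a separate recorded edge).
pub fn expected_delta(status: Status, label_supplied: bool) -> GraphDelta {
    match status {
        Status::KnownChanged => GraphDelta { variants: 1, ..GraphDelta::default() },
        Status::NeedsLabel if label_supplied => {
            GraphDelta { places: 1, edges: 1, ..GraphDelta::default() }
        }
        Status::NeedsLabel | Status::ConfigError => GraphDelta::default(),
    }
}
```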

Notes

The real Compose sample changed-app benchmark is not included yet. The current benchmark evidence covers known-path replay on Jetsnack plus deterministic change-case smoke. A follow-up should run the same protocol against a modified Compose sample build before treating the performance numbers as product claims.

Update (2026-06-11): hardening + device targeting

Two commits landed since the original push, closing the CI failure and the validated hardening backlog:

Harden matching tolerance, graph writes, and CLI safety

Safety/correctness fixes, each with regression tests: edge_id panic on long selectors; no_compatible_path vs no_known_path reachability; atomic graph writes; pending-transition TTL + dangling-edge guard; cross-device cache bleed (serial-less -> no cache); arg validation before mutation; duplicate-id detection in validate_graph; overlay android:id/button1 false-positive removed; doctor exit code routed through exit_code_for_status; 0600/0700 cache-file perms; default-deny redaction with tightened email/numeric heuristics.
Matching tolerance: per-dimension scoring is now a blend of Jaccard + containment so one-sided scroll drift heals without size-mismatched false merges; role histogram replaced with presence-set + min/max count term; normalize_label uses deunicode transliteration with a never-empty fallback; duplicate slugs surface label_mismatch unless --allow-duplicate-label is passed. KNOWN_CHANGED_THRESHOLD tuned to 0.80, approximately the center (0.7955) of the measured clean gap [0.689, 0.902] on Jetsnack. Sibling detail screens (e.g. two product details) intentionally merge into one item-agnostic place.
Fixes the CI failure on the previous push (clippy manual_option_zip, rustfmt drift).
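As an illustration of why blending Jaccard with containment tolerates one-sided scroll drift, here is a minimal sketch. It is not the scoring code in crates/minimap-core: the function names, the generic set representation, and the 50/50 weighting in the usage note are assumptions, and only the 0.80 threshold and the [0.689, 0.902] gap come from the measurements above.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Threshold quoted in this PR; everything else in this block is illustrative.
pub const KNOWN_CHANGED_THRESHOLD: f64 = 0.80;

/// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets count as identical.
pub fn jaccard<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Containment: |A ∩ B| / min(|A|, |B|). It stays high when one set is a
/// superset of the other, e.g. after scrolling reveals extra items.
pub fn containment<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>) -> f64 {
    let smaller = a.len().min(b.len());
    if smaller == 0 {
        // One empty side: identical only if both are empty.
        return if a.len() == b.len() { 1.0 } else { 0.0 };
    }
    a.intersection(b).count() as f64 / smaller as f64
}

/// Weighted blend of the two measures. `jaccard_weight` is a hypothetical
/// parameter in [0, 1]; the real blend ratio is not stated in the PR.
pub fn blended_score<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>, jaccard_weight: f64) -> f64 {
    jaccard_weight * jaccard(a, b) + (1.0 - jaccard_weight) * containment(a, b)
}

/// Midpoint of a measured gap, used to sanity-check a threshold choice.
pub fn band_center(low: f64, high: f64) -> f64 {
    (low + high) / 2.0
}
```

For a stored set of four elements and an observed superset of six, Jaccard alone gives 4/6 ≈ 0.667, which falls below the threshold, while an equal-weight blend gives (0.667 + 1.0) / 2 ≈ 0.833 and clears it. `band_center(0.689, 0.902)` evaluates to 0.7955, which rounds to the 0.80 threshold.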

Thread device serial through adb and android CLI calls

New global --serial flag with ANDROID_SERIAL env fallback; every adb call now carries -s <serial>, android layout carries --device=<serial>, and android screen subcommands get ANDROID_SERIAL on the child process. With a configured serial, cache scoping no longer depends on adb get-serialno succeeding.
doctor now detects the multiple-devices-without-serial condition and reports an actionable hint instead of a raw adb failure; with a serial it reports the targeted device.
Contract tests use serial-asserting fakes that fail on any serial-less invocation, so the threading is proven end to end.
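The serial threading described above might look roughly like the following sketch. The helper names (`resolve_serial`, `adb_args`, `android_layout_args`) are hypothetical and are not taken from crates/minimap-cli. The `-s <serial>` form is the standard adb option, and `--device=<serial>` is the `android layout` form quoted in this PR. Treating an empty flag or empty environment value as unset is an assumption.

```rust
/// Trims a candidate value and treats empty strings as unset (assumption).
fn non_empty(value: Option<&str>) -> Option<String> {
    value
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
}

/// Picks the device serial: an explicit --serial flag wins, otherwise the
/// ANDROID_SERIAL environment value is used. The caller passes both in so the
/// function stays pure and easy to test.
pub fn resolve_serial(flag: Option<&str>, env_value: Option<&str>) -> Option<String> {
    non_empty(flag).or_else(|| non_empty(env_value))
}

/// Builds adb arguments, prefixing `-s <serial>` when a serial is configured.
pub fn adb_args(serial: Option<&str>, rest: &[&str]) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(s) = serial {
        args.push("-s".to_string());
        args.push(s.to_string());
    }
    args.extend(rest.iter().map(|a| a.to_string()));
    args
}

/// Builds `android layout` arguments, adding `--device=<serial>` when set.
pub fn android_layout_args(serial: Option<&str>) -> Vec<String> {
    let mut args = vec!["layout".to_string()];
    if let Some(s) = serial {
        args.push(format!("--device={}", s));
    }
    args
}
```

A caller would typically pass `std::env::var("ANDROID_SERIAL").ok().as_deref()` as `env_value` and hand the resulting vectors to `std::process::Command::args`.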

Validation: cargo fmt --check, cargo clippy --all-targets -- -D warnings, cargo test -> 100 passed, 0 failed. Earlier live e2e on Jetsnack validated the full loop (init -> label -> grow -> re-identify known -> go replay -> viewport-mismatch refusal).

Deferred follow-ups (intentionally out of scope for v1):

Overlay detector false-negative on stock AlertDialogs (the android layout CLI emits their buttons as text without the resource-ids the detector scans).
Session-cache 30s staleness window: whereami can report a stale place if the screen changed via raw adb in between.
1000ms settle is occasionally too short right after an app relaunch; skipped_edges diagnostics are noisy.
No per-command layout-call counter, so the "second pass cheaper" benchmark claim is inferable but not directly measured.
The real changed-Compose-app benchmark from the change protocol still needs a run before the performance numbers become product claims.
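To illustrate the session-cache staleness item, here is a minimal sketch of a time-based freshness check. It assumes the cache stores the time of its last layout observation; `SESSION_CACHE_TTL` and `is_cache_fresh` are hypothetical names, and only the 30-second window comes from the list above. A check like this cannot see screen changes made through raw adb inside the window, which is the limitation being deferred.

```rust
use std::time::{Duration, SystemTime};

/// The 30s window mentioned in the follow-ups; the constant name is illustrative.
pub const SESSION_CACHE_TTL: Duration = Duration::from_secs(30);

/// Returns true when a cached observation is still within the staleness window.
/// Assumptions: the boundary is inclusive, and an observation timestamped in
/// the future (clock moved backwards) is treated as stale.
pub fn is_cache_fresh(observed_at: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(observed_at) {
        Ok(age) => age <= SESSION_CACHE_TTL,
        Err(_) => false,
    }
}
```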

`android layout` emits a flat array of nodes with hyphenated keys (content-desc, resource-id) and a stringified center "[x,y]". `resolve_selector_point` was looking for the legacy UIAutomator shape (camelCase contentDescription/testTag, bounds object) and never matched against real CLI output. Live smoke confirmed: `minimap tap --selector content_desc=Settings` always failed with "Selector not found" against the running emulator.

Extend the resolver to accept both shapes so existing fake-adb test fixtures keep working and real CLI output is now supported:

- Selector key lookup tries hyphenated first, then camelCase.
- center_of falls back to parsing the "[x,y]" string when no bounds object is present.

Two new tests: tap_selector_resolves_real_cli_shape exercises the end-to-end selector path against a real-CLI-shaped layout fixture; parse_center_string_parses_bracketed_pair covers the parser edges.

Test count: 49 -> 51 passing, 0 failed, 1 ignored.
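The two fallbacks described in this commit message can be sketched as follows. This is an illustration under assumptions, not the code in the repository: a `HashMap<String, String>` stands in for a layout node (the real resolver presumably works on parsed JSON), and `parse_center` and `lookup_attr` are hypothetical names and signatures.

```rust
use std::collections::HashMap;

/// Parses a stringified center such as "[540,1200]" into (x, y).
/// Returns None for anything that is not a bracketed pair of integers.
pub fn parse_center(raw: &str) -> Option<(i32, i32)> {
    let inner = raw.trim().strip_prefix('[')?.strip_suffix(']')?;
    let (x_raw, y_raw) = inner.split_once(',')?;
    let x: i32 = x_raw.trim().parse().ok()?;
    let y: i32 = y_raw.trim().parse().ok()?;
    Some((x, y))
}

/// Looks up a node attribute, trying the hyphenated key emitted by the real
/// CLI first and the legacy camelCase key second.
pub fn lookup_attr<'a>(
    node: &'a HashMap<String, String>,
    hyphenated: &str,
    camel_case: &str,
) -> Option<&'a str> {
    node.get(hyphenated)
        .or_else(|| node.get(camel_case))
        .map(String::as_str)
}
```

Trying the hyphenated key first means real CLI output is matched on the first lookup, while legacy-shaped fixtures still resolve through the camelCase fallback.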

himattm added 4 commits May 21, 2026 18:29

Redesign minimap as lean navigation graph

1d40efd

Harden matching tolerance, graph writes, and CLI safety

d2ef0e5

Thread device serial through adb and android CLI calls

dd0b962

himattm wants to merge 4 commits into main from codex/lean-minimap-navigation-graph
