feat: wire data_diff tool for deterministic data validation #107
suryaiyer95 wants to merge 10000 commits into main from
- Restore .trim() on models API JSON to prevent syntax error in generated models-snapshot.ts
- Fix archive path for scoped package names (@altimate/cli-*) in release tarball/zip creation
- Remove gh release upload from build.ts (handled by github-release job)
- Add CHANGELOG.md entry for v0.1.5

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Redesign M as 5-wide with visible V-valley to distinguish from A
- Change E top from full bar to open-right, distinguishing from T
- Fix T with full-width crossbar and I as narrow column
- Fix D shape in CODE
- Render CODE in theme.accent (purple) instead of theme.primary (peach)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- publish.ts: change glob from `*/package.json` to `**/package.json` to find scoped package directories (@altimate/cli-*) which are 2 levels deep
- release.yml: add skip-existing to PyPI publish so it doesn't fail when the engine version hasn't changed between releases

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The npm org is @AltimateAI, not @Altimate. Update all package names, workspace dependencies, imports, and documentation to use the correct scope so npm publish succeeds.

Name mapping:
- @altimate/cli → @altimateai/altimate-code
- @altimate/cli-sdk → @altimateai/altimate-code-sdk
- @altimate/cli-plugin → @altimateai/altimate-code-plugin
- @altimate/cli-util → @altimateai/altimate-code-util
- @altimate/cli-script → @altimateai/altimate-code-script

Also updates publish.ts to emit the wrapper package as @altimateai/altimate-code (no -ai suffix) and hardcodes the bin entry to altimate-code.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Two issues:
1. TypeScript permission-task tests: test fixture wrote config to `opencode.json` but the config loader only looks for `altimate-code.json`. Updated fixture to use correct filename.
2. Python tests: `pytest: command not found` because pyproject.toml had no `dev` optional dependency group. Added `dev` extras with pytest and ruff.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rename opencode references to altimate-code in all test files

Update test files to use the correct names after the config loader was renamed from opencode to altimate-code:
- `opencode.json` → `altimate-code.json`
- `.opencode/` → `.altimate-code/`
- `.git/opencode` → `.git/altimate-code`
- `OPENCODE_*` env vars → `ALTIMATE_CLI_*`
- Cache dir `opencode` → `altimate-code`
- Schema URL `opencode.ai` → `altimate-code.dev`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve remaining test failures and build import issue

- Fix build.ts solid-plugin import to use bare specifier for monorepo hoisting
- Update agent tests: "build" → "builder", "plan" → "analyst" for disabled fallback
- Fix well-known config mock URL in config.test.ts
- Fix message-v2 test: "OpenCode" → "Altimate CLI"
- Fix retry.test.ts: replace unsupported test.concurrent with test
- Fix read.test.ts: update agent name to "builder"
- Fix agent-color.test.ts: update config keys to "builder"
- Fix registry.test.ts: remove unpublished plugin dep from test fixture
- Skip adding plugin dependency in local dev mode (installDependencies)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Sentry review comments and Python CI deps

- Update theme schema URL from opencode.ai to altimate-code.dev (33 files)
- Rename opencode references in ACP README.md and AGENTS.md docs
- Update test fixture tmp dir prefix to altimate-code-test-
- Install warehouse extras in Python CI for duckdb/boto3 test deps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Python CI - SqlGuardResult allows None data, restrict pytest to tests/

- Allow SqlGuardResult.data to be None (fixes lineage.check Pydantic error)
- Set testpaths = ["tests"] in pyproject.toml to exclude src/test_local.py from pytest collection (it's a source module, not a test)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve ruff lint errors in Python engine

- Remove unused imports in server.py (duplicate imports, unused models)
- Remove unused `json` import in schema/cache.py
- Remove unused `os` import in sql/feedback_store.py
- Add noqa for keyring availability check import

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use import.meta.resolve to find the @opentui/core package directory instead of hardcoding node_modules path, which fails with monorepo hoisting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…aming
- Build: output binary as altimate-code instead of opencode
- Bin wrapper: look for @altimateai/altimate-code-* scoped packages
- Postinstall: resolve @AltimateAI scoped platform packages
- Publish: update Docker/AUR/Homebrew refs to AltimateAI/altimate-code
- Publish: make Docker/AUR/Homebrew non-fatal (infra not set up yet)
- Dockerfile: update binary paths and names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Jérôme Benoit <jerome.benoit@piment-noir.org>
…sion (#15762) Co-authored-by: Test User <test@test.com> Co-authored-by: Shoubhit Dash <shoubhit2005@gmail.com>
Co-authored-by: Adam <2363879+adamdotdevin@users.noreply.github.com>
…ck required tools)
Co-Authored-By: Kai (Claude Opus 4.6) <noreply@anthropic.com>
…onfig fix: restore TUI crash after upstream merge
fix: correct TEAM_MEMBERS ref from 'dev' to 'main' in pr-standards workflow
- Add `AltimateApi` client for datamate CRUD and integration resolution
- Add `datamate` tool with 9 operations: list, show, create, update, delete, add (MCP connect), remove (MCP disconnect), list-integrations, status
- Extract shared MCP config utilities (`resolveConfigPath`, `addMcpToConfig`, `removeMcpFromConfig`, `listMcpInConfig`) to `mcp/config.ts`
- Add `/datamate-setup` skill for guided datamate onboarding
- Register datamate tool in tool registry and TUI sync context
- Add test suite for `AltimateApi` credential loading and API methods
feat: datamate manager — dynamic MCP server management
Replace arc-runner-altimate-code with ubuntu-latest across all workflows to eliminate security risk on public repo. Closes #109 Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- New `data-diff` primary agent mode for cross-database data validation with progressive checks: row counts → column profiles → segment checksums → row-level diffs
- New `/data-validate` skill with dialect-specific SQL templates for Snowflake, Postgres, BigQuery, DuckDB, Databricks, ClickHouse, MySQL
- Prompt covers 4 validation levels, cross-database checksum awareness, and structured PASS/FAIL reporting
- Added `/data-validate` to migrator and validator skill lists so both modes can invoke it

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… data validation
Adds the full pipeline: TypeScript tool → Bridge → Python orchestrator → Rust engine.
- `data-diff-run.ts`: TypeScript tool wrapping `Bridge.call("data_diff.run")`
- `data_diff.py`: Python orchestrator driving the cooperative state machine loop
via `altimate_core.ReladiffSession` (start → execute SQL → step → repeat)
- `server.py`: Added `data_diff.run` dispatch to JSON-RPC bridge
- `protocol.ts`: `DataDiffRunParams`/`DataDiffRunResult` interfaces + bridge method
- `registry.ts`: Registered `DataDiffRunTool` in tool registry
- `agent.ts`: Added `data_diff: "allow"` to data-diff agent permissions
- `data-diff.txt`: Rewrote prompt to use `data_diff` tool as primary approach,
with manual SQL as fallback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
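
The cooperative loop named above (start → execute SQL → step → repeat) can be sketched roughly as follows. This is only an illustration: `FakeSession`, `Action`, and `drive` are hypothetical names, and the real `altimate_core.ReladiffSession` API is not shown in this PR, so its method names and return shapes are assumptions here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Rows = list[list[str]]


@dataclass
class Action:
    """What the engine wants next: SQL to run, or a final result."""
    sql: Optional[str] = None
    result: Optional[dict] = None


@dataclass
class FakeSession:
    """Hypothetical stand-in for the engine session.

    It asks for two row counts, then reports whether they match.
    """
    queries: list[str] = field(default_factory=lambda: [
        "SELECT COUNT(*) FROM source_table",
        "SELECT COUNT(*) FROM target_table",
    ])
    answers: list[Rows] = field(default_factory=list)

    def start(self) -> Action:
        return Action(sql=self.queries[0])

    def step(self, rows: Rows) -> Action:
        self.answers.append(rows)
        if len(self.answers) < len(self.queries):
            return Action(sql=self.queries[len(self.answers)])
        source, target = (int(a[0][0]) for a in self.answers)
        return Action(result={"match": source == target,
                              "source": source, "target": target})


def drive(session, executor: Callable[[str], Rows], max_steps: int = 100) -> dict:
    """Run SQL on behalf of the engine until it yields a final result."""
    action = session.start()
    for _ in range(max_steps):
        if action.result is not None:
            return action.result
        action = session.step(executor(action.sql))
    raise RuntimeError("engine did not finish within max_steps")


def fake_executor(sql: str) -> Rows:
    # Pretend the source table has 3 rows and the target has 4.
    return [["3"]] if "source_table" in sql else [["4"]]


outcome = drive(FakeSession(), fake_executor)
```

The point of the shape is that the engine never touches a database driver; the caller owns SQL execution and feeds rows back in.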
…xecutor
The executor returns `row_count=0` with a synthetic `["Query executed successfully"]` row when SQL returns no results. Without this guard, the Rust engine interprets the status row as actual data, causing false "duplicate key" errors in JoinDiff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
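
A minimal sketch of such a guard, assuming the executor result is a dict with `row_count` and `rows` keys (that shape and the name `rows_for_engine` are assumptions for illustration, not the actual code):

```python
def rows_for_engine(result: dict) -> list[list]:
    """Return only real data rows.

    When row_count is 0, any row present is the synthetic status row,
    so nothing should be handed to the diff engine.
    """
    if result.get("row_count", 0) == 0:
        return []
    return result["rows"]


empty = {"row_count": 0, "rows": [["Query executed successfully"]]}
full = {"row_count": 2, "rows": [[1, "a"], [2, "b"]]}

empty_rows = rows_for_engine(empty)
full_rows = rows_for_engine(full)
```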
…ut formatting
- Add `source_where_clause` and `target_where_clause` params to bridge protocol
- Update `run_data_diff` to pass per-table WHERE to reladiff engine
- Enhance tool output formatting with column-level match rates and sample mismatches
- Expand system prompt with progressive validation guidance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add 519 integration tests (516 pass + 3 xfail) across 120 test classes
- Tests cover: DuckDB, Postgres, cross-warehouse, all 6 algorithms
- Edge cases: NULL semantics, numeric precision, reserved keywords, composite keys
- Add Docker Compose for Postgres 16 test environment
- Add 28 research documents (themes A-Z) covering data validation landscape

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0ab4371 to
289cdde
…on` to run
- Add `integration` marker to `test_data_diff_integration.py`
- Configure `pyproject.toml` to skip integration tests by default
- Integration tests require Docker Postgres and matching `altimate-core` build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onfig
These files are internal development artifacts not appropriate for the open-source repository:
- `docs/research/`: 28 internal research documents (2.1MB)
- `test_data_diff_integration.py`: requires Docker Postgres + Rust engine
- `docker-compose.yml`: test infrastructure

Unit tests (`test_data_diff.py`) remain; they use mocks and need no external deps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… diff engine
When `execute_sql` fails, it returns `columns=['error']` with the error message as a data row. Previously this was silently passed to the Rust engine as data, causing confusing downstream failures. Now raises `RuntimeError` immediately so the error propagates to the user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
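
The fail-fast behaviour described here might look roughly like this. The dict shape with `columns` and `rows` keys and the helper name `check_execute_result` are assumptions made for the sketch; only the `columns=['error']` convention and the `RuntimeError` come from the commit message.

```python
def check_execute_result(result: dict) -> dict:
    """Raise instead of letting an error row reach the diff engine."""
    if result.get("columns") == ["error"]:
        rows = result.get("rows") or [["unknown error"]]
        raise RuntimeError(f"SQL execution failed: {rows[0][0]}")
    return result


ok = check_execute_result({"columns": ["id"], "rows": [[1]]})

try:
    check_execute_result({"columns": ["error"], "rows": [["relation t does not exist"]]})
    message = None
except RuntimeError as exc:
    message = str(exc)
```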
- Add `_validate_where_clause()` to reject injection patterns (semicolons, comments)
- Validate all WHERE clause parameters before passing to Rust engine
- Remove unused `_SIDE_MAP` constant from `data_diff.py`
- Add null guards for `diff_percent` and `match_percent` in TypeScript to prevent NaN display
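
For illustration, a WHERE-clause check along these lines could be written as below. It is a sketch of the idea only, not the PR's `_validate_where_clause()`; the exact patterns and error text are assumptions. A pattern check this simple also rejects harmless string literals that happen to contain `;` or `--`.

```python
import re
from typing import Optional

# Statement separators and SQL comment markers (assumed pattern set).
_FORBIDDEN = re.compile(r";|--|/\*|\*/")


def validate_where_clause(clause: Optional[str]) -> Optional[str]:
    """Return the clause unchanged, or raise if it looks like an injection attempt."""
    if clause is None:
        return None
    match = _FORBIDDEN.search(clause)
    if match:
        raise ValueError(f"WHERE clause contains forbidden token {match.group()!r}")
    return clause


accepted = validate_where_clause("created_at >= '2024-01-01' AND region = 'EU'")

rejected = []
for bad in ("1=1; DROP TABLE users", "id > 0 -- hide the rest", "id > 0 /* note */"):
    try:
        validate_where_clause(bad)
    except ValueError:
        rejected.append(bad)
```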
- Delete `altimate_engine/sql/data_diff.py`: all Python orchestration now lives in `altimate_core.data_diff` (altimate-core-internal repo)
- Delete `tests/test_data_diff.py`: tests moved to altimate-core-internal
- Update `server.py` to import from `altimate_core.data_diff` with inline `_executor` and `_resolve_dialect` callbacks
- Add try/catch on import with install instructions
- Expand data-diff agent prompt with Cascade/Recon/Profile details

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
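
A guarded import of this kind could be sketched as follows. The helper `load_data_diff` and the wording of the hint are hypothetical; the PR does not show the actual install instructions, so the message below is a placeholder.

```python
import importlib
from typing import Callable, Optional, Tuple


def load_data_diff(module_name: str = "altimate_core.data_diff",
                   ) -> Tuple[Optional[Callable], Optional[str]]:
    """Return (run_data_diff, None) on success, or (None, hint) if the module is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Placeholder hint; the real install instructions are not shown in this PR.
        hint = f"data_diff is unavailable ({exc}); install a matching altimate-core build."
        return None, hint
    return getattr(module, "run_data_diff", None), None


# Demonstrate the failure path with a module name that cannot exist.
missing_fn, missing_hint = load_data_diff("definitely_missing_module_for_demo")
```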
Multi-Model Code Review: Data Validation Mode
Verdict: APPROVE (with recommended fixes)
Major Issues
1. TypeScript

| # | Issue | Location | Fix | Flagged By |
|---|---|---|---|---|
| 2 | `conn.type` None → KeyError in dialect lookup | `server.py:_resolve_dialect()` | Use `.get(conn.type, "generic")` | MiniMax, GLM-5 |
| 3 | Recon mode unreachable from agent: no rules parameter in tool | `data-diff-run.ts` + `data_diff.py` | Wire up `recon_rules` or document as "coming soon" | Kimi |
| 4 | `str()` conversion loses Decimal precision ("0.10" → "0.1") | `server.py:231-233` | Known trade-off of string-based checksumming. Document it. | Claude |
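
The fix suggested for issue 2 can be illustrated with a small sketch. The `DIALECTS` mapping and the `SimpleNamespace` connection object are hypothetical stand-ins; only the `.get(conn.type, "generic")` fallback comes from the review.

```python
from types import SimpleNamespace

# Hypothetical mapping from connection type to engine dialect name.
DIALECTS = {"snowflake": "snowflake", "postgres": "postgres", "duckdb": "duckdb"}


def resolve_dialect(conn) -> str:
    """Look up the dialect, falling back to "generic" for None or unknown types."""
    return DIALECTS.get(conn.type, "generic")


known = resolve_dialect(SimpleNamespace(type="postgres"))
untyped = resolve_dialect(SimpleNamespace(type=None))
unknown = resolve_dialect(SimpleNamespace(type="oracle"))

# Indexing with [] instead of .get() is what raises for a None type.
try:
    DIALECTS[None]
    raised = False
except KeyError:
    raised = True
```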
Positive Observations
- TypeScript tool definition is thorough: Zod schemas with descriptive parameter docs, proper error handling, well-structured `formatOutcome()` for human-readable reports
- Production-quality agent prompt (`data-diff.txt`): Algorithm decision matrix, cost awareness, progressive strategy guide, concrete examples
- Clean Python orchestrator: `run_data_diff()` with dependency-injected `executor` and `dialect_resolver` callbacks, with zero coupling to DB drivers
- Skill documentation (`SKILL.md`): Clear progressive validation workflow guide
Missing Tests
- `formatOutcome` with actual Rust serde output shapes (verify mode detection works)
- `conn.type=None` in dialect resolution
- `_executor` with `Decimal`/`datetime` types
- Recon mode end-to-end (currently unreachable from agent)
- Empty `key_columns` array handling
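
As a starting point for the `Decimal`/`datetime` test, the sketch below shows why plain string conversion is fragile for string-based comparison: Python's `str()` keeps a `Decimal`'s trailing zeros, so numerically equal values can stringify differently, and a detour through `float` drops them. The `canonical` helper is a hypothetical normalizer, not something the PR's `_executor` is known to do.

```python
from datetime import datetime
from decimal import Decimal


def canonical(value) -> str:
    """Hypothetical normalizer: one string form per logical value."""
    if isinstance(value, Decimal):
        # normalize() strips trailing zeros; "f" avoids exponent notation.
        return format(value.normalize(), "f")
    if isinstance(value, datetime):
        return value.isoformat(sep=" ")
    return str(value)


a, b = Decimal("0.10"), Decimal("0.1")

naive_equal = str(a) == str(b)            # "0.10" vs "0.1"
via_float = str(float(a))                 # float path drops the trailing zero
canonical_equal = canonical(a) == canonical(b)
whole_number = canonical(Decimal("100"))
timestamp = canonical(datetime(2024, 1, 2, 3, 4, 5))
```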
🤖 Multi-model review by Claude Opus 4.6, Kimi K2.5, Grok, MiniMax M2.5, GLM-5
✅ Tests: All Passed
- TypeScript: passed
- Python: passed
Tested at
What does this PR do?
Adds a `data_diff` tool and `data-diff` agent mode that wraps the Rust reladiff engine for deterministic table-to-table data validation. Tested end-to-end on Snowflake with up to 1M rows.

Pipeline: TypeScript tool → Bridge → Python orchestrator → Rust engine

Files changed:
- `data-diff-run.ts`: TypeScript tool calling `Bridge.call("data_diff.run")`
- `data_diff.py`: Python orchestrator driving the cooperative state machine loop
- `server.py`: Registers `data_diff.run` in JSON-RPC dispatcher
- `protocol.ts`: `DataDiffRunParams`/`DataDiffRunResult` bridge protocol types
- `agent.ts`: `data-diff` agent mode with SQL/warehouse tool permissions
- `data-diff.txt`: System prompt for data-diff agent
- `SKILL.md`: `/data-validate` skill for guided validation workflows
- `guard.py`: Updated docstrings (no longer requires API keys)

Type of change
How did you verify your code works?
End-to-end tested on Snowflake across all 4 algorithms and at scale (up to 1M rows, <12s).
Issue for this PR
Internal feature — data validation mode for altimate-code CLI.
Checklist