feat: wire data_diff tool through reladiff engine #102
suryaiyer95 wants to merge 3 commits into main
- New `data-diff` primary agent mode for cross-database data validation with progressive checks: row counts → column profiles → segment checksums → row-level diffs
- New `/data-validate` skill with dialect-specific SQL templates for Snowflake, Postgres, BigQuery, DuckDB, Databricks, ClickHouse, MySQL
- Prompt covers 4 validation levels, cross-database checksum awareness, and structured PASS/FAIL reporting
- Added `/data-validate` to migrator and validator skill lists so both modes can invoke it

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
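The progressive order of checks (row counts → column profiles → segment checksums → row-level diffs) can be illustrated with a small sketch. This is not the skill's actual SQL or the engine's algorithm: the `validate` function, the `src`/`tgt` tables, the `id`/`amount` columns, and the fixed segment bounds are all hypothetical, SQLite stands in for a real warehouse, and checksums are computed client-side here rather than with dialect-specific SQL.

```python
import hashlib
import sqlite3


def validate(conn: sqlite3.Connection, src: str, tgt: str) -> dict:
    """Run checks from cheapest to most expensive; stop at the first failure."""
    cur = conn.cursor()

    def q(sql: str) -> list:
        return cur.execute(sql).fetchall()

    # Level 1: row counts.
    if q(f"SELECT COUNT(*) FROM {src}") != q(f"SELECT COUNT(*) FROM {tgt}"):
        return {"status": "FAIL", "level": "row_counts"}

    # Level 2: column profile (min, max, null count) for one column.
    profile = "SELECT MIN(amount), MAX(amount), SUM(amount IS NULL) FROM {}"
    if q(profile.format(src)) != q(profile.format(tgt)):
        return {"status": "FAIL", "level": "column_profiles"}

    # Level 3: one checksum per key segment, hashed client-side over ordered rows.
    def checksum(table: str, lo: int, hi: int) -> str:
        rows = q(
            f"SELECT id, amount FROM {table} "
            f"WHERE id >= {lo} AND id < {hi} ORDER BY id"
        )
        return hashlib.md5(repr(rows).encode()).hexdigest()

    # Segment bounds are hard-coded for the demo tables (ids 1 to 4).
    bad_segments = [
        (lo, lo + 2)
        for lo in range(1, 5, 2)
        if checksum(src, lo, lo + 2) != checksum(tgt, lo, lo + 2)
    ]
    if not bad_segments:
        return {"status": "PASS", "level": "segment_checksums"}

    # Level 4: row-level diff, restricted to the segments that disagreed.
    diffs = []
    for lo, hi in bad_segments:
        diffs += q(
            f"SELECT s.id, s.amount, t.amount FROM {src} s "
            f"JOIN {tgt} t ON s.id = t.id "
            f"WHERE s.id >= {lo} AND s.id < {hi} AND s.amount IS NOT t.amount"
        )
    return {"status": "FAIL", "level": "row_level", "diffs": diffs}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE tgt (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO src VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO tgt VALUES (1, 10), (2, 20), (3, 35), (4, 40);
""")
report = validate(conn, "src", "tgt")
```

In this demo the counts and the profile agree, so only the segment checksums expose the mismatch, and the row-level query then runs on a single segment instead of the whole table.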
… data validation
Adds the full pipeline: TypeScript tool → Bridge → Python orchestrator → Rust engine.
- `data-diff-run.ts`: TypeScript tool wrapping `Bridge.call("data_diff.run")`
- `data_diff.py`: Python orchestrator driving the cooperative state machine loop
via `altimate_core.ReladiffSession` (start → execute SQL → step → repeat)
- `server.py`: Added `data_diff.run` dispatch to JSON-RPC bridge
- `protocol.ts`: `DataDiffRunParams`/`DataDiffRunResult` interfaces + bridge method
- `registry.ts`: Registered `DataDiffRunTool` in tool registry
- `agent.ts`: Added `data_diff: "allow"` to data-diff agent permissions
- `data-diff.txt`: Rewrote prompt to use `data_diff` tool as primary approach,
with manual SQL as fallback
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
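The cooperative loop described above (start → execute SQL → step → repeat) can be sketched roughly as follows. This does not reproduce the real `altimate_core.ReladiffSession` API: `FakeSession`, its `start`/`step` methods, the action dictionaries, and `stub_run_sql` are hypothetical stand-ins that only show the control flow between the orchestrator and the engine.

```python
from typing import Callable, Optional

Row = list[Optional[str]]


class FakeSession:
    """Hypothetical stand-in for the engine session (not the real API).

    It asks for row counts on both sides, then reports whether they match.
    """

    def start(self) -> dict:
        # First action: ask the host to run two SQL tasks.
        return {
            "type": "execute_sql",
            "tasks": [
                {"id": "src_count", "sql": "SELECT COUNT(*) FROM source_table"},
                {"id": "tgt_count", "sql": "SELECT COUNT(*) FROM target_table"},
            ],
        }

    def step(self, responses: list[dict]) -> dict:
        # Consume the host's results and decide what happens next.
        counts = {r["id"]: r["rows"][0][0] for r in responses}
        match = counts["src_count"] == counts["tgt_count"]
        return {"type": "done", "outcome": {"row_counts_match": match}}


def drive(session: FakeSession, run_sql: Callable[[str], list[Row]]) -> dict:
    """Run the loop: start, execute the requested SQL, step, repeat until done."""
    action = session.start()
    while action["type"] != "done":
        responses = [
            {"id": task["id"], "rows": run_sql(task["sql"])}
            for task in action["tasks"]
        ]
        action = session.step(responses)
    return action["outcome"]


def stub_run_sql(sql: str) -> list[Row]:
    # Stub executor: pretend every table holds 42 rows.
    return [["42"]]


result = drive(FakeSession(), stub_run_sql)
```

The point of the cooperative design, as far as this PR describes it, is that the engine never touches a database itself: it only emits SQL tasks and consumes rows, so the host decides how and where queries run.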
    """Execute a single SQL task against the given warehouse."""
    result = execute_sql(
        SqlExecuteParams(sql=task["sql"], warehouse=warehouse, limit=100_000)
    )

    # Convert SqlExecuteResult rows to the format expected by ReladiffSession.step()
    rows: list[list[str | None]] = []
    for row in result.rows:
        rows.append([str(v) if v is not None else None for v in row])

    return {"id": task["id"], "rows": rows}
Bug: The `_execute_task` function doesn't check for error results from `execute_sql`, so it treats error messages as valid data rows and passes them to the Rust engine.
Severity: HIGH
Suggested Fix
In `_execute_task`, check whether `result.columns == ["error"]`. If so, propagate the error up to the caller (`run_data_diff`) so it can be handled properly, instead of processing the rows. This prevents malformed data from reaching the Rust engine and ensures failures are reported correctly.
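A minimal sketch of the suggested check, assuming the error shape described in this review (`columns == ["error"]`, with the message in the first cell of the first row). `SqlExecuteResult` below is a simplified stand-in dataclass, and `SqlExecutionError` and `rows_or_raise` are hypothetical names that are not part of the PR.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SqlExecuteResult:
    """Simplified stand-in for the real result model (assumption)."""

    columns: list[str]
    rows: list[list[Any]] = field(default_factory=list)


class SqlExecutionError(RuntimeError):
    """Hypothetical exception raised when a SQL task fails."""


def rows_or_raise(task_id: str, result: SqlExecuteResult) -> dict:
    """Stringify result rows, or raise if the result is an error sentinel."""
    if result.columns == ["error"]:
        # Assumed shape: the error message is the first cell of the first row.
        has_message = bool(result.rows) and bool(result.rows[0])
        message = result.rows[0][0] if has_message else "unknown error"
        raise SqlExecutionError(f"task {task_id} failed: {message}")

    rows: list[list[Optional[str]]] = [
        [str(v) if v is not None else None for v in row] for row in result.rows
    ]
    return {"id": task_id, "rows": rows}
```

With a check like this, the caller could catch the exception and report a failed run rather than passing the error text to the engine as data.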
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: packages/altimate-engine/src/altimate_engine/sql/data_diff.py#L52-L63
Potential issue: The `execute_sql` function returns errors, such as connection issues or
invalid SQL, as a special `SqlExecuteResult` object with `columns=["error"]` instead of
raising an exception. The `_execute_task` function in `data_diff.py` does not check for
this error state and processes the error message as if it were a valid data row. This
leads to malformed data being passed to the Rust `ReladiffSession` engine, which can
result in incorrect diffs or opaque crashes. The `run_data_diff` function will
incorrectly report `success: True` even when a SQL execution has failed.
Reflects altimate-core change: `column_lineage` and `track_lineage` now work without credentials. SDK logging activates when initialized.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window. Feel free to open a new pull request that follows our guidelines.
Summary
- Added `data_diff` TypeScript tool that wraps the Rust reladiff engine via Bridge → Python orchestrator → `altimate_core.ReladiffSession`
- Added `data_diff.py` Python orchestrator that drives the cooperative state machine loop (start → execute SQL via ConnectionRegistry → step → repeat)
- Added `data_diff.run` method in the JSON-RPC bridge dispatcher
- Added `DataDiffRunParams`/`DataDiffRunResult` to the bridge protocol
- Rewrote the prompt to use the `data_diff` tool as primary approach (deterministic Rust engine) with manual SQL as fallback

Pipeline
Test plan
- Typecheck (`tsc --noEmit`)
- Build (`bun run build`, 11 platform targets)
- `data_diff` tool appears in data-diff mode tool list

🤖 Generated with Claude Code