feat: wire data_diff tool through reladiff engine by suryaiyer95 · Pull Request #102 · AltimateAI/altimate-code

suryaiyer95 · 2026-03-10T03:44:22Z

Summary

- Adds a `data_diff` TypeScript tool that wraps the Rust reladiff engine via Bridge → Python orchestrator → `altimate_core.ReladiffSession`
- Creates `data_diff.py`, a Python orchestrator that drives the cooperative state-machine loop (start → execute SQL via `ConnectionRegistry` → step → repeat)
- Registers the `data_diff.run` method in the JSON-RPC bridge dispatcher
- Adds `DataDiffRunParams`/`DataDiffRunResult` to the bridge protocol
- Updates the data-diff agent prompt to use the `data_diff` tool as the primary approach (deterministic Rust engine), with manual SQL as a fallback
- Depends on: the AltimateAI/altimate-core-internal PR for the reladiff Rust module

Pipeline

LLM (data-diff mode) → data_diff tool (TS) → Bridge.call("data_diff.run")
→ JSON-RPC → server.py → run_data_diff() → altimate_core.ReladiffSession (Rust)
→ cooperative loop (SQL tasks ↔ ConnectionRegistry) → structured result

Test plan

- TypeScript type check passes (`tsc --noEmit`)
- Build succeeds (`bun run build`, 11 platform targets)
- End-to-end test with configured warehouse connections
- Verify the `data_diff` tool appears in the data-diff mode tool list

🤖 Generated with Claude Code

- New `data-diff` primary agent mode for cross-database data validation with progressive checks: row counts → column profiles → segment checksums → row-level diffs
- New `/data-validate` skill with dialect-specific SQL templates for Snowflake, Postgres, BigQuery, DuckDB, Databricks, ClickHouse, MySQL
- Prompt covers 4 validation levels, cross-database checksum awareness, and structured PASS/FAIL reporting
- Added `/data-validate` to migrator and validator skill lists so both modes can invoke it

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
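The progression from cheap to expensive checks can be illustrated on in-memory tables. This is a sketch under stated assumptions, not the `/data-validate` templates or the reladiff algorithm: it skips the column-profile level, uses integer keys, fixed-size key segments and Python's `hashlib`, and the function names are hypothetical. In a real cross-database run each side would compute its checksum in its own SQL dialect, which is why checksum functions that differ between warehouses need care.

```python
import hashlib


def segment_checksum(table: dict[int, tuple], lo: int, hi: int) -> str:
    """Hash every row whose key falls in [lo, hi), visiting keys in sorted order."""
    h = hashlib.sha256()
    for key in sorted(k for k in table if lo <= k < hi):
        h.update(repr((key, table[key])).encode())
    return h.hexdigest()


def progressive_diff(src: dict[int, tuple], tgt: dict[int, tuple], segment_size: int = 100) -> dict:
    # Level 1: row counts (cheap, but cannot say where a difference is).
    report: dict = {"row_count_match": len(src) == len(tgt), "diffs": []}
    keys = set(src) | set(tgt)
    if not keys:
        return report
    first, stop = min(keys), max(keys) + 1
    # Level 3 (level 2, column profiles, is omitted here): compare per-segment
    # checksums so that only mismatched segments are examined further.
    for lo in range(first, stop, segment_size):
        hi = min(lo + segment_size, stop)
        if segment_checksum(src, lo, hi) == segment_checksum(tgt, lo, hi):
            continue
        # Level 4: row-level diff restricted to the mismatched segment.
        for key in range(lo, hi):
            if src.get(key) != tgt.get(key):
                report["diffs"].append((key, src.get(key), tgt.get(key)))
    return report
```

For example, with 250 source rows where the target has one changed row (key 120) and one missing row (key 200), only the segment covering keys 101 to 200 is compared row by row; the other two segments are skipped because their checksums match.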

… data validation

Adds the full pipeline: TypeScript tool → Bridge → Python orchestrator → Rust engine.

- `data-diff-run.ts`: TypeScript tool wrapping `Bridge.call("data_diff.run")`
- `data_diff.py`: Python orchestrator driving the cooperative state-machine loop via `altimate_core.ReladiffSession` (start → execute SQL → step → repeat)
- `server.py`: Added `data_diff.run` dispatch to the JSON-RPC bridge
- `protocol.ts`: `DataDiffRunParams`/`DataDiffRunResult` interfaces + bridge method
- `registry.ts`: Registered `DataDiffRunTool` in the tool registry
- `agent.ts`: Added `data_diff: "allow"` to data-diff agent permissions
- `data-diff.txt`: Rewrote the prompt to use the `data_diff` tool as the primary approach, with manual SQL as a fallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
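For readers unfamiliar with the bridge, a method-name dispatcher of the kind `server.py` gains here might look roughly like this. It assumes JSON-RPC 2.0 framing (`jsonrpc`, `id`, `method`, `params`, and the standard error codes -32601 "Method not found" and -32603 "Internal error"); the `METHODS` table, the `method` decorator and the placeholder `run_data_diff` body are hypothetical, not the engine's actual code.

```python
import json
from typing import Any, Callable

Handler = Callable[[dict], Any]

# Hypothetical handler table keyed by JSON-RPC method name.
METHODS: dict[str, Handler] = {}


def method(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler under a JSON-RPC method name."""
    def register(fn: Handler) -> Handler:
        METHODS[name] = fn
        return fn
    return register


@method("data_diff.run")
def run_data_diff(params: dict) -> dict:
    # Placeholder: the real orchestrator would drive ReladiffSession here.
    return {"success": True, "echo": params}


def dispatch(raw: str) -> str:
    """Decode one request, call the matching handler, encode the response."""
    req = json.loads(raw)
    base = {"jsonrpc": "2.0", "id": req.get("id")}
    handler = METHODS.get(req.get("method"))
    if handler is None:
        return json.dumps({**base, "error": {"code": -32601, "message": "Method not found"}})
    try:
        return json.dumps({**base, "result": handler(req.get("params") or {})})
    except Exception as exc:  # report handler failures instead of crashing the bridge
        return json.dumps({**base, "error": {"code": -32603, "message": str(exc)}})
```

Registering handlers in a table keeps adding a method like `data_diff.run` to a one-line change, and wrapping the call in `try` means an orchestrator failure reaches the TypeScript side as a structured error rather than a dropped connection.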

sentry · 2026-03-10T03:46:57Z

packages/altimate-engine/src/altimate_engine/sql/data_diff.py

+    """Execute a single SQL task against the given warehouse."""
+    result = execute_sql(
+        SqlExecuteParams(sql=task["sql"], warehouse=warehouse, limit=100_000)
+    )
+
+    # Convert SqlExecuteResult rows to the format expected by ReladiffSession.step()
+    rows: list[list[str | None]] = []
+    for row in result.rows:
+        rows.append([str(v) if v is not None else None for v in row])
+
+    return {"id": task["id"], "rows": rows}


Bug: The _execute_task function doesn't check for error results from execute_sql, causing it to treat error messages as valid data rows and pass them to the Rust engine.
Severity: HIGH

Suggested Fix

In _execute_task, check if result.columns == ["error"]. If it is, propagate the error up to the caller (run_data_diff) so it can be handled properly, instead of processing the rows. This will prevent malformed data from reaching the Rust engine and ensure failures are reported correctly.
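The suggested fix could look roughly like the sketch below. `SqlExecuteResult` here is a stand-in carrying only the `columns`/`rows` fields the review mentions; `SqlTaskError`, `to_task_result` and `collect_results` are hypothetical names, and reading the error text from the first cell of the first row is an assumption about how `execute_sql` packs its error. Conversion is split from execution so the check can be exercised without a warehouse.

```python
from dataclasses import dataclass, field


@dataclass
class SqlExecuteResult:
    """Stand-in with only the fields the review mentions (columns, rows)."""
    columns: list[str]
    rows: list[list[object]] = field(default_factory=list)


class SqlTaskError(RuntimeError):
    """Hypothetical exception carrying a failed SQL task up to the caller."""


def to_task_result(task: dict, result: SqlExecuteResult) -> dict:
    # execute_sql reports failures as columns == ["error"] rather than raising,
    # so check before treating the rows as data.
    if result.columns == ["error"]:
        # Assumption: the error message sits in the first cell of the first row.
        detail = result.rows[0][0] if result.rows and result.rows[0] else "unknown error"
        raise SqlTaskError(f"SQL task {task['id']} failed: {detail}")
    rows = [[str(v) if v is not None else None for v in row] for row in result.rows]
    return {"id": task["id"], "rows": rows}


def collect_results(pairs: list[tuple[dict, SqlExecuteResult]]) -> dict:
    """Hypothetical caller: report success=False instead of passing bad rows onward."""
    try:
        outputs = [to_task_result(task, result) for task, result in pairs]
    except SqlTaskError as exc:
        return {"success": False, "error": str(exc)}
    return {"success": True, "outputs": outputs}
```

Raising at the conversion step, rather than returning a sentinel, means the caller cannot forget to check: any failed task stops the loop before the engine sees a row built from an error message.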

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: packages/altimate-engine/src/altimate_engine/sql/data_diff.py#L52-L63 Potential issue: The `execute_sql` function returns errors, such as connection issues or invalid SQL, as a special `SqlExecuteResult` object with `columns=["error"]` instead of raising an exception. The `_execute_task` function in `data_diff.py` does not check for this error state and processes the error message as if it were a valid data row. This leads to malformed data being passed to the Rust `ReladiffSession` engine, which can result in incorrect diffs or opaque crashes. The `run_data_diff` function will incorrectly report `success: True` even when a SQL execution has failed.



github-actions · 2026-03-10T04:19:36Z

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

github-actions · 2026-03-10T06:45:40Z

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

suryaiyer95 and others added 2 commits March 9, 2026 19:07

github-actions bot added the contributor label Mar 10, 2026

sentry bot reviewed Mar 10, 2026


docs: update lineage guard docstrings — no longer requires API keys

1713848

Reflects altimate-core change: `column_lineage` and `track_lineage` now work without credentials. SDK logging activates when initialized. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot added the needs:compliance label Mar 10, 2026

github-actions bot removed the needs:compliance label Mar 10, 2026

github-actions bot closed this Mar 10, 2026

suryaiyer95 wants to merge 3 commits into main from feat/data-validation-mode
