amar-python
diff --git a/.gitignore b/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 2 additions & 2 deletions
diff --git a/evals/FAILURE_MODES.md b/evals/FAILURE_MODES.md
Lines changed: 89 additions & 0 deletions
diff --git a/evals/HANDOFF.md b/evals/HANDOFF.md
Lines changed: 124 additions & 0 deletions
diff --git a/evals/PLAN.md b/evals/PLAN.md
Lines changed: 131 additions & 0 deletions
@@ -17,6 +17,7 @@ test_output/
 *.out
 *.log
 csv/logs/
+evals/reports/
 *_report_*.txt
 *_skipped_*.csv
 *_valid_*.csv
 
@@ -328,8 +328,8 @@ Realistic Australian T&E data is loaded automatically when `include_seed_data` i
 # Against all environments
 ./tests/run_tests.sh
 
-# Run Python validator tests
-python -m unittest -v tests/test_csv_validator.py
+# Run Python tests
+python -m unittest discover -s tests -p "test*.py" -v
 
 # Manually via psql
 psql -U postgres -d te_mgmt_dev \
 
@@ -0,0 +1,89 @@
+# Failure-mode catalogue — PostgreDataMigrationApp
+
+Each row is a real-world data or operational scenario that *could* break the framework. For each: a concrete example, the **expected** behaviour, and the eval scenario number that proves it.
+
+Legend:
+- ✅ **handles correctly** — framework already produces the right outcome
+- ⚠️ **partial** — works but not asserted by an eval today
+- ❌ **gap** — silently accepts, crashes, or fails noisily without a clean signal
+
+---
+
+## Tier P — Python CSV validator (`csv/validator.py`)
+
+These run without any database. Driven by `evals/runner.py` against the actual `csv/validator.py` script as a subprocess.
+
+| # | Failure mode | Example input | Expected behaviour | Current | Eval ID |
+|---|--------------|--------------|--------------------|---------|---------|
+| P1 | Missing required env vars | (no CSV_FILE) | exit 1; stderr contains "Missing required environment variables" | ✅ | 19 |
+| P2 | CSV file path doesn't exist | `CSV_FILE=/tmp/nope.csv` | exit 1; stderr contains "CSV file not found" | ✅ | 20 |
+| P3 | Empty file (0 bytes) | (0-byte file) | exit 1; stderr contains "CSV file is empty" | ✅ | 02 |
+| P4 | Header-only file (no data rows) | `id,name\n` | exit 1; stderr contains "No valid rows found" | ✅ | 04 |
+| P5 | All valid rows | `id,name\n1,Alice\n2,Bob\n3,Carol\n` | exit 0; valid count 3, skip count 0 | ✅ | 01 |
+| P6 | Mixed valid + skipped | header + 2 valid + empty row + col-mismatch row | exit 0; valid 2, skip 2, reasons "empty row" and "column mismatch" | ✅ | 05 |
+| P7 | Completely empty row | `id,name\n1,Alice\n,\n2,Bob\n` | row 3 skipped as "empty row" | ✅ | 08 |
+| P8 | Whitespace-only row | `id,name\n1,Alice\n  ,  \n` | row 3 skipped as "empty row" (`cell.strip()` returns an empty string for every cell) | ✅ | 16 |
+| P9 | Row with fewer fields than header | header has 3, row has 2 | row skipped as "column mismatch" | ✅ | 07 |
+| P10 | Row with more fields than header | header has 2, row has 3 | row skipped as "column mismatch" | ✅ | 07b |
+| P11 | Duplicate column names | `id,id,name\n…` | exit 0 but stdout contains "Duplicate column names" warning | ✅ | 06 |
+| P12 | Leading/trailing whitespace in headers | ` id , name \n…` | headers normalised to `id`, `name` (stripped) | ✅ | 17 |
+| P13 | UTF-8 BOM at start of file | `\ufeffid,name\n…` | BOM stripped (file opened with `utf-8-sig`); behaves as if no BOM | ✅ | 09 |
+| P14 | UTF-8 emoji in value | `1,Alice 👋` | preserved through; row valid | ✅ | 10 |
+| P15 | UTF-8 CJK characters | `1,田中花子` | preserved through; row valid | ✅ | 18 |
+| P16 | CRLF line endings | `id,name\r\n1,Alice\r\n` | csv module handles natively; row valid | ✅ | 11 |
+| P17 | Quoted field containing comma | `id,note\n1,"Smith, John"\n` | parsed as single field `Smith, John` | ✅ | 13 |
+| P18 | Quoted field containing newline | `id,note\n1,"line1\nline2"\n` | parsed as single row, note has embedded newline | ✅ | 14 |
+| P19 | Quoted field containing escaped quote | `id,note\n1,"she said ""hi"""\n` | parsed as `she said "hi"` | ✅ | 15 |
+| P20 | Unicode RTL (Arabic) | `1,محمد` | preserved through; row valid | ✅ | 21 |
+| P21 | Very long row (50KB single field) | massive single value | accepted as one row | ✅ | 22 |
+| P22 | Mixed encoding (Latin-1 bytes inside UTF-8) | `\xe9` bytes in middle of file | clean exit 1; stderr reports unexpected decode error; no traceback | ✅ | 23 |
+
+**Of 22 modes:** all 22 are now covered by Tier P evals. Scenario numbering is 01–23 because the short and long column-mismatch cases are separate eval fixtures.
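The skip rules in the table (P6 to P13) can be sketched in a few lines of Python. This is a minimal illustration, not the code in `csv/validator.py`: the function name `classify_rows` is hypothetical, and reporting skips by physical line number is an assumption chosen to match the "row 3 skipped" wording above. Only the reason strings "empty row" and "column mismatch" come from the catalogue.

```python
import csv
import io


def classify_rows(raw_bytes):
    """Split CSV bytes into a header, valid rows, and (line_no, reason) skips."""
    # utf-8-sig drops a leading BOM if one is present (P13).
    text = raw_bytes.decode("utf-8-sig")
    # newline="" leaves CRLF and embedded newlines for the csv module to handle.
    reader = csv.reader(io.StringIO(text, newline=""))
    # Headers are stripped of surrounding whitespace (P12).
    header = [cell.strip() for cell in next(reader)]
    valid, skipped = [], []
    for row in reader:
        if all(cell.strip() == "" for cell in row):
            skipped.append((reader.line_num, "empty row"))        # P7, P8
        elif len(row) != len(header):
            skipped.append((reader.line_num, "column mismatch"))  # P9, P10
        else:
            valid.append(row)
    return header, valid, skipped


# BOM + padded headers + CRLF + empty row + quoted comma + short row.
sample = b'\xef\xbb\xbf id , name \r\n1,Alice\r\n,\r\n2,"Smith, John"\r\n3\r\n'
header, valid, skipped = classify_rows(sample)
print(header)
print(valid)
print(skipped)
```

The sketch does not handle a 0-byte input (P3); `next(reader)` would raise `StopIteration` there, so a real validator needs an explicit check before reading the header.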
+
+---
+
+## Tier I — Idempotency
+
+| # | Failure mode | Example scenario | Expected behaviour | Current | Eval ID |
+|---|--------------|------------------|--------------------|---------|---------|
+| I1 | Re-run `deploy_all.sh dev` on a deployed DB | run twice in a row | exit 0 both times; no new rows inserted on 2nd run (`ON CONFLICT DO NOTHING` seed pattern); no `CREATE TABLE` errors | ✅ | 01 |
+| I2 | Re-run after `\set` changes to identifiers (table renamed) | edit env_dev.sql to rename a table, re-run | should fail safely (rename isn't auto-detected) — manual migration needed | ⚠️ | not in initial set |
+| I3 | Schema deployed; user manually drops a table; re-run | DROP TABLE te_dev.organisations; re-run env_dev.sql | table re-created; FK-dependent seed inserts pass | ✅ | not in initial set |
+| I4 | Re-run with PG service offline | stop PG, run deploy_all.sh | exit non-zero; stderr contains connection-refused | ✅ | not in initial set |
+
+Tier I initial scope: only I1 (the core re-runnability claim from the README).
+
+---
+
+## Tier S — SQL suite integration
+
+| # | Failure mode | Example scenario | Expected behaviour | Current | Eval ID |
+|---|--------------|------------------|--------------------|---------|---------|
+| S1 | Fresh deploy + all 5 suites must pass | `deploy_all.sh dev` then `run_tests.sh dev` | exit 0; "ALL TESTS PASSED" appears in stdout; overall pass_rate 100.0% | ✅ | 01 |
+| S2 | Deploy without seed data; suites should fail predictably | `\set include_seed_data false`; run suites | some suites expecting non-zero row counts fail; runner reports the count drift | ⚠️ | not in initial set |
+| S3 | Deploy with corrupted seed (intentionally invalid FK) | manually edit seed insert to break an FK | deploy_all fails at seed step; no schema corruption | ⚠️ | not in initial set |
+
+Tier S initial scope: only S1.
+
+---
+
+## What this catalogue does NOT yet cover
+
+- **Multi-DB equivalence** (cross-engine schema parity) — deferred until PG is locked in.
+- **Performance / scale** (1M-row load timing) — separate suite if needed later.
+- **Cross-environment structural equivalence** (Dev vs Test vs Staging vs Prod) — Tier E, future.
+- **Domain-rule deep dives beyond suite 05** — Tier D, future.
+- **Validator behaviour on >128KB single field** — beyond the current 50KB eval and Python `csv` default field-size assumptions.
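The 128 KB boundary in the last bullet corresponds to the `csv` module's field-size limit, which in CPython defaults to 131072 characters. The sketch below shows where that limit bites using only the standard library; it says nothing about how `csv/validator.py` reacts to the resulting error, which is the part that remains uncatalogued. The helper name `parse_single_field` is hypothetical.

```python
import csv
import io

# Current limit for this process; 131072 unless something has changed it.
limit = csv.field_size_limit()


def parse_single_field(size):
    """Parse a one-column CSV whose only value is `size` characters long."""
    data = "note\n" + "x" * size + "\n"
    return list(csv.reader(io.StringIO(data, newline="")))


# 50 KB, the size the current eval covers, parses as a single field.
rows = parse_single_field(50 * 1024)
print(len(rows[1][0]))

# One character past the limit makes the csv module raise csv.Error.
try:
    parse_single_field(limit + 1)
    oversized_error = None
except csv.Error as exc:
    oversized_error = str(exc)
print(oversized_error)
```

A future eval for this case would need to decide whether the expected outcome is a clean exit 1 (as in P22) or a raised limit via `csv.field_size_limit(new_limit)`.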
+
+---
+
+## Summary
+
+| Tier | Modes catalogued | Modes in initial eval set | Deferred |
+|------|------------------|---------------------------|----------|
+| P | 22 | 22 | 0 |
+| I | 4 | 1 | 3 |
+| S | 3 | 1 | 2 |
+| **Total** | **29** | **24** | **5** |
+
+The current eval set covers every catalogued Tier P mode plus the initial Tier I and Tier S operational scenarios. The remaining deferred items are PostgreSQL operational and cross-engine scenarios.
@@ -0,0 +1,124 @@
+# Hand-off summary — evals/ package
+
+**Date:** 2026-05-26
+**Scope this round:** PostgreSQL only (Tiers P + I + S). MariaDB / SQLite / etc. deferred.
+
+## What was delivered
+
+| File / folder | What it is |
+|--------------|-----------|
+| `evals/PLAN.md` | Scope, folder layout, tiering, phases |
+| `evals/FAILURE_MODES.md` | 29 catalogued failure modes; all 22 Tier P modes now covered |
+| `evals/README.md` | How to run; CLI flags; exit codes |
+| `evals/runner.py` | Single-file scenario discovery + diff engine + JSON report writer (606 lines) |
+| `evals/datasets/tier_p/01-23/` | 23 CSV scenarios (happy + empty + malformed + unicode + generated edge cases) |
+| `evals/datasets/tier_i/01_deploy_dev_twice/` | Idempotency scenario (README only — runner drives the work) |
+| `evals/datasets/tier_s/01_fresh_deploy_then_all_tests_pass/` | SQL-suite integration scenario |
+| `evals/expected/tier_p/*.json` | 23 expected-outcome files |
+| `evals/expected/tier_i/01_deploy_dev_twice.json` | Expected outcome (exit codes + row-count parity) |
+| `evals/expected/tier_s/01_fresh_deploy_then_all_tests_pass.json` | Expected outcome (85/85 + ALL TESTS PASSED) |
+| `evals/reports/` | Auto-created at runtime; one folder per run with `summary.json` |
+
+## What was executed
+
+| Tier | Result | Notes |
+|------|--------|-------|
+| **P** (23 scenarios) | **23 PASS / 0 FAIL** | Executed locally against the real `csv/validator.py` |
+| **I** (1 scenario)   | **SKIP** | No PostgreSQL in the sandbox — clean skip with diagnostic |
+| **S** (1 scenario)   | **SKIP** | Same |
+
+Latest local run report: `evals/reports/20260531T094230Z-223fa5/summary.json`
+
+## What you should do next (in order)
+
+### 1. Verify Tier P on your machine
+
+```powershell
+cd "$env:USERPROFILE\OneDrive\Desktop\Migration using ai\PostgreDataMigrationApp"
+python evals\runner.py
+```
+
+Expect: `total: 23, passed: 23, failed: 0, skipped: 0` and exit code 0.
+
+### 2. Run Tier I + S against your local PostgreSQL
+
+```powershell
+cd "$env:USERPROFILE\OneDrive\Desktop\Migration using ai\PostgreDataMigrationApp"
+# Make sure libpq vars are set or peer auth works:
+#   $env:PGHOST = 'localhost'
+#   $env:PGUSER = 'postgres'
+#   $env:PGPASSWORD = '...'
+python evals\runner.py --tiers p,i,s
+```
+
+Expect with PostgreSQL available: 23 P pass, 1 I pass, 1 S pass = **25 / 25**.
+
+If Tier I fails on the row-count parity, check the `actual.row_counts_first` vs
+`actual.row_counts_second` in `summary.json` — that pinpoints the table whose seed
+inserts aren't using `ON CONFLICT DO NOTHING`.
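Comparing the two row-count snapshots by hand gets tedious with many tables, so a small helper can list only the tables that drifted. This is a sketch under the assumption that `actual.row_counts_first` and `actual.row_counts_second` are flat mappings of table name to row count; the function name `find_count_drift` and the sample counts are hypothetical.

```python
def find_count_drift(first, second):
    """Return {table: (first_count, second_count)} for tables whose counts differ."""
    drift = {}
    # Union of keys so a table present in only one snapshot is also reported.
    for table in sorted(set(first) | set(second)):
        before, after = first.get(table), second.get(table)
        if before != after:
            drift[table] = (before, after)
    return drift


# Hypothetical values standing in for the two snapshots in summary.json.
row_counts_first = {"organisations": 5, "employees": 40}
row_counts_second = {"organisations": 5, "employees": 80}
drift = find_count_drift(row_counts_first, row_counts_second)
print(drift)
```

An empty result means the second deploy inserted nothing new, which is the idempotency claim Tier I checks.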
+
+If Tier S fails, the runner captures the last 2 KB of the suite's stdout in
+`actual.stdout_tail` — that almost always pinpoints which assertion failed.
+
+### 3. Archive the workspace cruft
+
+The parent folder `Migration using ai\` still contains my earlier Python demo
+(`src/`, `tests/`, `evals/` at root, `config.yaml`, etc.) that has nothing to do
+with `PostgreDataMigrationApp`. A cleanup script is ready:
+
+```powershell
+cd "$env:USERPROFILE\OneDrive\Desktop\Migration using ai"
+.\cleanup_workspace.ps1            # preview (dry run)
+.\cleanup_workspace.ps1 -Apply     # commit the moves
+```
+
+Everything is **moved** to `_archive_demo\` (not deleted), so you can undo the cleanup by moving the items back.
+`PostgreDataMigrationApp\` is on the never-touch list.
+
+### 4. (Optional) Wire evals into CI
+
+Your existing GitHub Actions workflow `python-validator-tests.yml` only runs
+the Python unit tests in `tests/`. To also run the 23 Tier P evals, add this step
+to the workflow:
+
+```yaml
+    - name: Run Tier P evals
+      run: python evals/runner.py --tiers p
+      working-directory: PostgreDataMigrationApp
+```
+
+Tier I and S can be added when you have PostgreSQL in your CI runner.
+
+## What's deferred to a later round
+
+| Item | Why deferred |
+|------|--------------|
+| Tier X — cross-DB schema equivalence (MariaDB / SQLite / Teradata) | You explicitly said "ignore them for now" |
+| Tier E — cross-environment (Dev/Test/Staging/Prod) structural parity | Not needed until you ship beyond Dev |
+| Tier D — extended domain-rule evals beyond suite 05 | Existing 85 assertions already cover the high-value rules |
+| Performance / scale (1M+ rows) | Would need a fixture generator — separate round |
+| AI-assisted anomaly detection | Out of scope for deterministic evals |
+| Single-field >128 KB | Current eval covers 50KB; larger-than-default `csv` field-size behavior remains future scope |
+
+Catalogued in `FAILURE_MODES.md` so they're not lost.
+
+## File counts
+
+| Category | Count |
+|----------|-------|
+| Markdown docs (PLAN, FAILURE_MODES, README, HANDOFF) | 4 |
+| Python (`runner.py`) | 1 |
+| CSV input fixtures | 20 (scenarios 19, 20, 22, and 23 are generated/no-input scenarios) |
+| Expected JSON files | 25 |
+| Scenario README/TXT files | 4 |
+| **Total files created** | **55** |
+
+## Open items
+
+Nothing blocking. If you want me to take the next step, ask for:
+
+- "extend Tier P to N more scenarios" (e.g. a single field larger than 128 KB)
+- "add Tier X — cross-DB equivalence" once you have other DB engines installed
+- "wire the evals into the GitHub Actions workflow"
+- "build a sample-data generator that produces 100 K / 1 M row CSVs for perf"
@@ -0,0 +1,131 @@
+# Evals package — PostgreDataMigrationApp
+
+This folder holds **data-driven evals** for the T&E Database Framework. Evals complement (but don't replace) the existing SQL test suites under `tests/suites/` and the Python unit tests in `tests/test_csv_validator.py`.
+
+## Why a separate `evals/` folder
+
+| Aspect | `tests/` (existing) | `evals/` (this folder) |
+|--------|--------------------|--------------------|
+| Driver | Code (SQL assertions + Python unittest) | Data (CSV fixtures + expected JSON) |
+| Adding a new scenario | Edit Python or SQL | Drop a new `datasets/<tier>/<scenario>/input.csv` and `expected/<tier>/<scenario>.json` |
+| What gets reviewed | Code diffs | Data fixtures (much easier to eyeball) |
+| Failure granularity | One Python assert, one SQL assert | exit_code + stderr + stdout + output-file contents — all in one report |
+| Scope | Schema correctness, business rules, validator unit behaviour | End-to-end behaviour: validator interface, idempotency, full SQL-suite-passes-after-load |
+
+In short: `tests/` proves the **code is correct**; `evals/` proves the **framework behaves correctly on real-world data and operational scenarios**.
+
+## Scope (this round)
+
+- **Database engine:** PostgreSQL only.
+  - MariaDB / SQLite / InfluxDB / Redis / Teradata adapters are deliberately out of scope until Postgres evals are stable. The eval structure is engine-agnostic so they can be added later.
+- **Tiers in scope:**
+  - **Tier P** — Python CSV validator (`csv/validator.py`). Pure data-in / files-out. No DB.
+  - **Tier I** — Idempotency of `deploy_all.sh` against a clean Dev PostgreSQL.
+  - **Tier S** — SQL test suite integration: deploy fresh + run all 5 suites and assert 85/85.
+- **Tiers deferred:**
+  - **Tier X** — Cross-DB schema equivalence (MariaDB/SQLite). Out until Postgres is locked in.
+  - **Tier E** — Cross-environment (Dev/Test/Staging/Prod) structural equivalence.
+  - **Tier D** — Extended domain-rule evals beyond what suite 05 already covers.
+
+## Folder layout
+
+```
+PostgreDataMigrationApp/
+└── evals/
+    ├── PLAN.md                          ← this file
+    ├── FAILURE_MODES.md                 ← failure-mode catalogue
+    ├── README.md                        ← how to run it
+    ├── runner.py                        ← scenario discovery + diff engine + report
+    │
+    ├── datasets/
+    │   ├── tier_p/                      ← Python CSV validator scenarios
+    │   │   ├── 01_happy_path/
+    │   │   │   └── input.csv
+    │   │   ├── 02_empty_file/
+    │   │   └── … (23 scenarios)
+    │   │
+    │   ├── tier_i/                      ← idempotency scenarios
+    │   │   └── 01_deploy_dev_twice/
+    │   │       └── README.md            ← what the runner does (no CSV needed)
+    │   │
+    │   └── tier_s/                      ← SQL suite integration
+    │       └── 01_fresh_deploy_then_all_tests_pass/
+    │           └── README.md
+    │
+    ├── expected/
+    │   ├── tier_p/
+    │   │   ├── 01_happy_path.json
+    │   │   ├── 02_empty_file.json
+    │   │   └── …
+    │   ├── tier_i/
+    │   │   └── 01_deploy_dev_twice.json
+    │   └── tier_s/
+    │       └── 01_fresh_deploy_then_all_tests_pass.json
+    │
+    └── reports/                         ← runtime output (gitignored)
+        └── <run_id>/
+            ├── tier_p_summary.json
+            ├── tier_i_summary.json
+            ├── tier_s_summary.json
+            └── overall.json
+```
+
+Each scenario is self-contained. To add a new one: create a new folder under `datasets/<tier>/` and a matching `expected/<tier>/<scenario>.json`. No code edits.
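The folder-to-expected-file pairing described above could be discovered along these lines. It is a sketch of the convention only, not the discovery code in `runner.py`; the function name `discover_scenarios` and the choice to return `None` for a missing expected file are assumptions.

```python
import tempfile
from pathlib import Path


def discover_scenarios(evals_root, tier):
    """Pair each datasets/<tier>/<scenario>/ folder with its expected JSON path."""
    datasets = Path(evals_root) / "datasets" / tier
    expected = Path(evals_root) / "expected" / tier
    pairs = []
    for folder in sorted(p for p in datasets.iterdir() if p.is_dir()):
        expected_file = expected / f"{folder.name}.json"
        # None marks a scenario folder that has no matching expected file.
        pairs.append((folder.name, expected_file if expected_file.is_file() else None))
    return pairs


# Build a throwaway layout to show the pairing.
with tempfile.TemporaryDirectory() as root:
    for name in ("01_happy_path", "02_empty_file"):
        (Path(root) / "datasets" / "tier_p" / name).mkdir(parents=True)
    (Path(root) / "expected" / "tier_p").mkdir(parents=True)
    (Path(root) / "expected" / "tier_p" / "01_happy_path.json").write_text("{}")
    found = [(name, path is not None) for name, path in discover_scenarios(root, "tier_p")]

print(found)
```

Sorting the folders keeps the run order stable, so reports from different runs can be compared line by line.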
+
+## Expected JSON contract
+
+Each `expected/<tier>/<scenario>.json` declares what the runner should see. Only the fields populated are diffed — everything else is ignored. Example:
+
+```json
+{
+  "scenario": "05_mixed_valid_skipped",
+  "expected": {
+    "exit_code": 0,
+    "stdout_contains": ["Valid rows   : 2", "Skipped rows : 2"],
+    "stderr_contains": [],
+    "valid_csv_rows": [["id", "name"], ["1", "Alice"], ["3", "Bob"]],
+    "skip_csv_rows_count": 2,
+    "skip_reasons_contain": ["empty row", "column mismatch"]
+  },
+  "notes": "Two valid rows, two skipped — one empty, one with too few fields."
+}
+```
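The "only populated fields are diffed" rule can be illustrated with a small comparison function. This sketch covers `exit_code` and the `stdout_contains` / `stderr_contains` substring checks; the file-content keys in the example (`valid_csv_rows`, `skip_csv_rows_count`, `skip_reasons_contain`) would need their own handling and are left out. The names `diff_expected` and `SUBSTRING_KEYS`, and the shape of the `actual` dict, are assumptions rather than the real `runner.py` interface.

```python
# Maps an expected key to the captured-output field it searches.
SUBSTRING_KEYS = {"stdout_contains": "stdout", "stderr_contains": "stderr"}


def diff_expected(expected, actual):
    """Compare only the keys present in `expected`; return a list of mismatches."""
    failures = []
    for key, want in expected.items():
        if key in SUBSTRING_KEYS:
            text = actual.get(SUBSTRING_KEYS[key], "")
            failures += [f"{key}: missing {s!r}" for s in want if s not in text]
        elif actual.get(key) != want:
            failures.append(f"{key}: expected {want!r}, got {actual.get(key)!r}")
    return failures


expected = {
    "exit_code": 0,
    "stdout_contains": ["Valid rows   : 2", "Skipped rows : 2"],
    "stderr_contains": [],
}
# Extra keys in `actual` (here duration_ms) are ignored because nothing expects them.
actual = {
    "exit_code": 0,
    "stdout": "Valid rows   : 2\nSkipped rows : 1\n",
    "stderr": "",
    "duration_ms": 12,
}
failures = diff_expected(expected, actual)
print(failures)
```

Returning every mismatch instead of stopping at the first keeps the report useful when a scenario fails for more than one reason.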
+
+## Runner behaviour
+
+`runner.py`:
+
+1. Discovers every folder under `datasets/<tier>/`.
+2. For Tier P: copies the scenario's `input.csv` to a temp file, sets env vars, runs `python3 csv/validator.py`, captures exit + stdout + stderr + output files, diffs against `expected/<tier>/<scenario>.json`.
+3. For Tier I and Tier S: runs the orchestration script (`deploy_all.sh dev`, `tests/run_tests.sh dev`) and asserts the documented outcomes.
+4. Prints one line per scenario: `PASS / FAIL / SKIPPED`.
+5. Writes per-tier and overall JSON summaries under `reports/<run_id>/`.
+
+Exit code: 0 if all scenarios in selected tiers pass, 1 otherwise. CI-friendly.
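Step 2 above boils down to running a script with controlled environment variables and capturing what it emits. The sketch below shows that pattern with `subprocess.run`. To stay runnable on its own it executes a tiny stand-in script instead of `csv/validator.py`; the helper name `run_script`, the timeout value, and the stand-in are all hypothetical. `CSV_FILE` and the "CSV file not found" message are the only details taken from the failure-mode catalogue.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(script_path, env_overrides, timeout=60):
    """Run a Python script as a subprocess and capture its observable outcome."""
    env = {**os.environ, **env_overrides}
    proc = subprocess.run(
        [sys.executable, str(script_path)],
        env=env, capture_output=True, text=True, timeout=timeout,
    )
    return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


# Stand-in for the validator: it only checks that CSV_FILE names an existing file.
STAND_IN = '''
import os, sys
path = os.environ.get("CSV_FILE", "")
if not os.path.isfile(path):
    sys.exit("CSV file not found")
print("ok")
'''

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "stand_in_validator.py"
    script.write_text(STAND_IN)
    result = run_script(script, {"CSV_FILE": str(Path(tmp) / "nope.csv")})

print(result["exit_code"], result["stderr"].strip())
```

Using `sys.executable` instead of a hard-coded `python3` keeps the child process on the same interpreter as the runner, which matters on Windows where `python3` may not be on PATH.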
+
+## Tiering
+
+- **Tier P** runs anywhere — only needs Python + the validator script. **Fully offline.**
+- **Tier I** and **Tier S** need a live PostgreSQL instance + `psql` on PATH + the values in `config.local.env`. They skip cleanly if no DB is reachable.
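A clean skip needs a cheap pre-flight check that explains why the tier was skipped. The following is one possible shape for it, assuming the host and port would come from `config.local.env` (5432 is PostgreSQL's default port). The function name `db_skip_reason` and its messages are hypothetical, and a successful TCP connection only shows that something is listening, not that it is PostgreSQL or that credentials work.

```python
import shutil
import socket


def db_skip_reason(host="localhost", port=5432, which=shutil.which, timeout=2.0):
    """Return a human-readable skip reason, or None if the DB tiers can proceed."""
    # `which` is injectable so the check can be exercised without touching PATH.
    if which("psql") is None:
        return "psql not found on PATH"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return f"PostgreSQL not reachable at {host}:{port} ({exc})"


# Simulate a machine without psql installed.
reason = db_skip_reason(which=lambda name: None)
print(reason)
```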
+
+## Phasing
+
+| Phase | What | Status |
+|-------|------|--------|
+| 0 | This PLAN + FAILURE_MODES + README | **DONE** |
+| 1 | Tier P dataset folders (23 scenarios) + expected JSONs | DONE |
+| 2 | Tier P runner.py | DONE |
+| 3 | Execute Tier P locally; show results | DONE / awaiting your review |
+| 4 | Tier I scaffolding + runner extension | next |
+| 5 | Tier S scaffolding + runner extension | next |
+| 6 | (Future) Tier X across MariaDB/SQLite once Postgres is locked in | deferred |
+
+## What this DOES NOT do
+
+- Doesn't replace the existing 85-assertion SQL suite under `tests/suites/` — Tier S runs those after a fresh deploy and counts the pass total.
+- Doesn't replace the Python unit tests in `tests/` — Tier P is broader (23 scenarios), while unit tests keep fast in-Python coverage for the validator and eval runner.
+- Doesn't include performance/load testing yet (scope creep — separate suite if needed later).
+
+## Open questions
+
+None blocking — proceeding with the plan above. If you want to change scope or add scenarios after seeing the Tier P results, edit `FAILURE_MODES.md` and let me know.