
Add Ocotillo data migration guide #671

Draft
jeremyzilar wants to merge 2 commits into staging from data-migration-guide

@jeremyzilar
Contributor

What this adds

A new guide at docs/data-migration-guide.md for the Ocotillo Data Services team covering the full migration process. This was written in the context of the upcoming geothermal migration but is intended as the team's reference for all future migrations.

Why now

The April 2026 Geothermal Discovery Report identified several process gaps from the NM_Aquifer migration: a spreadsheet tracker that became stale and untrustworthy, no clear place to track outstanding work, schema decisions that were not recorded, and no standardized way to verify a migration was complete. This guide addresses each of those directly.

What the guide covers

  • What we were doing and what we are changing — plain-language summary of the four problems from the NM_Aquifer migration and the specific practice that addresses each one
  • The two types of work — Alembic schema migrations vs Python transfer scripts, with commands
  • Schema mapping — the existing Ocotillo data hierarchy as a Mermaid ER diagram, the NM_Wells priority source schema as a second ER diagram, tools for mapping work (dbdiagram.io, field mapping tracker CSV, alembic --sql preview), and the two required ADRs before any geothermal script is written
  • Three tools, three jobs — JIRA for outstanding work, the field mapping tracker CSV for per-field decisions, and oco transfer-results for row-count verification. Includes the 13-step workflow that connects all three and explains when each source of truth is updated
  • Migration phases — six phases from discovery through production, with the gate condition for moving to the next
  • Transfer script conventions — idempotency pattern, Core batch inserts, logging standards, error handling, and a minimal template
  • Audit CLI: oco transfer-results usage, output column meanings, and the convention of committing the summary to git after every run
  • Staging and production checklist — step-by-step with checkboxes
  • Geothermal-specific guidance — source database context, five-step recommended migration order, table-to-target mapping for priority NM_Wells tables, known data quality issues (unit inconsistency, duplicates, provenance gaps, transcription errors), tables confirmed not to migrate, and a note on the commented-out db/geothermal.py stubs
  • Future tooling — notes on a potential reconciliation dashboard in Ocotillo and automated CI/CD checks
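
For reviewers who want a concrete picture of the idempotency and row-count ideas listed above, here is a minimal sketch. It is not taken from the guide: it uses the standard-library sqlite3 module in place of SQLAlchemy Core so it runs anywhere, and the table names (src_wells, wells), column names, and function names are all hypothetical. It is also not how oco transfer-results is implemented; it only illustrates the kind of check that command is described as performing.

```python
import sqlite3

BATCH_SIZE = 2  # deliberately small so the demo exercises more than one batch


def make_demo_databases():
    """Build two in-memory databases standing in for source and target (hypothetical schema)."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE src_wells (well_id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany(
        "INSERT INTO src_wells VALUES (?, ?)",
        [(1, "Well A"), (2, "Well B"), (3, "Well C")],
    )
    source.commit()

    target = sqlite3.connect(":memory:")
    # The source key is the target primary key, so a repeated insert is detectable.
    target.execute("CREATE TABLE wells (source_id INTEGER PRIMARY KEY, name TEXT)")
    target.commit()
    return source, target


def transfer_wells(source, target):
    """Copy rows in batches; return how many rows were newly inserted.

    Idempotent: rows whose source_id already exists in the target are skipped,
    so re-running after a partial or complete run does not create duplicates.
    """
    inserted = 0
    cursor = source.execute("SELECT well_id, name FROM src_wells ORDER BY well_id")
    while True:
        batch = cursor.fetchmany(BATCH_SIZE)
        if not batch:
            break
        before = target.total_changes
        target.executemany(
            "INSERT OR IGNORE INTO wells (source_id, name) VALUES (?, ?)",
            batch,
        )
        # total_changes only counts rows actually written, not ignored ones.
        inserted += target.total_changes - before
    target.commit()
    return inserted


def row_counts(source, target):
    """Return (source_rows, target_rows) for a simple completeness check."""
    src = source.execute("SELECT COUNT(*) FROM src_wells").fetchone()[0]
    dst = target.execute("SELECT COUNT(*) FROM wells").fetchone()[0]
    return src, dst


if __name__ == "__main__":
    source, target = make_demo_databases()
    first_run = transfer_wells(source, target)
    second_run = transfer_wells(source, target)  # safe to repeat
    print(first_run, second_run, row_counts(source, target))
```

Running the transfer twice and comparing counts is the point of the sketch: the second run should insert nothing, and the source and target counts should match. A real transfer script would follow the conventions and template in the guide itself.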

FigJam note

A FigJam board with both schemas side by side (with source-to-target mapping arrows) is planned as a companion. A placeholder is left in the schema mapping section. The guide's Mermaid diagrams are immediately usable in the meantime.

Also includes

  • Added a reference to the guide in CLAUDE.md under Additional Resources

Documents the full migration process: what we learned from the NM_Aquifer
migration, the two types of work (Alembic vs transfer scripts), the three-tool
workflow (JIRA + field mapping tracker + oco transfer-results), migration phases,
transfer script conventions with a template, the audit CLI, staging/production
checklists, and geothermal-specific guidance grounded in the April 2026
Geothermal Discovery Report.

Includes inline Mermaid ER diagrams of the Ocotillo core schema and the
NM_Wells priority source tables. A FigJam URL placeholder is left in the
schema mapping section to be filled once the visual board is generated.
@jeremyzilar marked this pull request as draft on May 5, 2026 01:18