Document ETL CSV usage across the repository #94

Open
GVCUTV wants to merge 1 commit into main from codex/trace-etl-csv-usage-across-project

GVCUTV (Owner) commented Jan 31, 2026

Motivation

  • Provide a single, code-driven reference that maps every ETL-produced CSV to its producer, schema, semantic role, and downstream consumers.
  • Surface mismatches and fragilities between ETL outputs and simulation/validation consumers to aid auditing, validation, and refactoring.

Description

  • Add docs/analysis/etl_csv_usage_analysis.md describing ETL dataflow (download → clean/merge → enrich → fit), one section per CSV (producer, inferred schema, semantics, consumers), and auxiliary exploration outputs.
  • Enumerate direct consumers in simulation/ and validation/ (notably tickets_prs_merged.csv and fit_summary.csv), orphan/legacy CSVs, and observed schema/selection drifts (e.g., fit_summary.csv generator differences and feedback/capacity column dependencies).
  • Summarize critical dependencies, fragilities, and potential sources of inconsistency to guide next steps for alignment and testing.
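
The kind of schema mismatch this analysis documents could also be caught mechanically. The following is a minimal sketch, not code from this PR: the helper name missing_columns and the sample column names (key, created, resolved, feedback_count) are hypothetical and do not reflect the actual schema of any ETL output. It compares a CSV header against the columns a consumer expects to read.

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names absent from the CSV header, sorted."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty input yields an empty header
    present = {name.strip() for name in header}
    return sorted(set(required) - present)


# Hypothetical header and consumer requirements, for illustration only.
sample = "key,created,resolved\nA-1,2024-01-01,2024-01-05\n"
print(missing_columns(sample, ["key", "resolved", "feedback_count"]))
```

A check like this, run against each CSV a consumer loads, would make a column-dependency drift fail early instead of surfacing later inside the simulation or validation code.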

Testing

  • Documentation-only change; no automated tests were executed as part of this update.

Codex Task
