Convert the .csv files from wide format to long format #6

@goth-coder

Currently, in CSV files like processamentoViniferas.csv, each year is a separate column, and each row represents a unique combination of id, control, and cultivar.

This wide layout is quite common in older data systems and spreadsheets because it is easy for humans to read, but it goes against relational database principles and is problematic from an analytical perspective.

Why?

  • It's harder to query:
    You can't just filter WHERE year = 1999, and querying a range of years requires column selection, not row filtering.

  • It's less flexible:
    Adding a new year means adding a new column.

  • It's harder to use for automation and statistical modeling.

In our case (an API plus a data processing pipeline), long format is better because:

  1. We want flexible filtering, joining, aggregating.
  2. We'll move towards database ingestion.
  3. We want to write clean ETL pipelines.
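
As a rough illustration of the conversion, here is a minimal sketch using only the Python standard library. It assumes the identifier columns are exactly id, control, and cultivar (as described above) and that every other column header is a year. The `;` delimiter, the `wide_to_long` function name, the `year` and `value` output column names, and the sample rows are all assumptions made up for this example, not taken from the real files.

```python
import csv
import io

# Identifier columns named in this issue; every other column is assumed to be a year.
ID_COLUMNS = ("id", "control", "cultivar")


def wide_to_long(wide_file, long_file, delimiter=";"):
    """Write one (id, control, cultivar, year, value) row per year column."""
    reader = csv.DictReader(wide_file, delimiter=delimiter)
    year_columns = [c for c in reader.fieldnames if c not in ID_COLUMNS]

    writer = csv.writer(long_file, delimiter=delimiter, lineterminator="\n")
    writer.writerow([*ID_COLUMNS, "year", "value"])
    for row in reader:
        for year in year_columns:
            # int(year) fails loudly if a non-year column slips through.
            writer.writerow([*(row[c] for c in ID_COLUMNS), int(year), row[year]])


# Made-up sample data, only to show the shape of the transformation.
sample = (
    "id;control;cultivar;1998;1999\n"
    "1;ti_A;Cultivar A;10;20\n"
    "2;ti_B;Cultivar B;30;40\n"
)

out = io.StringIO()
wide_to_long(io.StringIO(sample), out)

# In long format, selecting a year is a plain row filter.
long_rows = list(csv.DictReader(io.StringIO(out.getvalue()), delimiter=";"))
rows_1999 = [r for r in long_rows if r["year"] == "1999"]
```

With the data in this shape, a new year becomes new rows instead of a new column, and the same row filter maps directly to `WHERE year = 1999` once the data is in a database.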
