Currently, in CSV files like processamentoViniferas.csv, each year is a separate column, and each row represents a unique combination of id, control, and cultivar.
This wide layout is common in older data systems and spreadsheets because it is easy for humans to read, but from a relational/analytical perspective it's problematic.
Why?
- It's harder to query: you can't just filter WHERE year = 1999.
- It's less flexible: adding a new year → new column.
- Querying ranges → requires column selection, not row filtering.
- Harder for automation and statistical modeling.
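A tiny sketch of the problem, using made-up values and the id/control/cultivar layout described above (column names and data are illustrative assumptions, not the real file contents):

```python
import pandas as pd

# Wide-format sample: one column per year, as in processamentoViniferas.csv.
wide = pd.DataFrame({
    "id": [1, 2],
    "control": ["VINIFERAS", "VINIFERAS"],
    "cultivar": ["Alicante", "Bonarda"],
    "1998": [100, 200],
    "1999": [110, 210],
})

# "Filter by year" in wide format means picking columns by name,
# not filtering rows:
y1999 = wide[["id", "control", "cultivar", "1999"]]

# A year range means building the column list by hand:
cols = ["id", "control", "cultivar"] + [str(y) for y in range(1998, 2000)]
subset = wide[cols]
```

Every new year changes the schema (a new column), so every query and model that lists columns has to change with it.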
In our case (API + data processing pipeline), long format is better because:
- We want flexible filtering, joining, aggregating.
- We'll move towards database ingestion.
- We want to write clean ETL pipelines.
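The wide-to-long conversion is a one-step `melt` in pandas. The sketch below simulates the file contents with a string (the semicolon separator, column names, and values are assumptions; adjust them to the real export):

```python
import io
import pandas as pd

# Simulated contents of a wide-format file like processamentoViniferas.csv.
csv_text = """id;control;cultivar;1998;1999
1;VINIFERAS;Alicante;100;110
2;VINIFERAS;Bonarda;200;210
"""
df = pd.read_csv(io.StringIO(csv_text), sep=";")

id_cols = ["id", "control", "cultivar"]                  # non-year columns
year_cols = [c for c in df.columns if c not in id_cols]  # every year column

# melt() turns each (row, year) pair into its own row: long format.
long_df = df.melt(id_vars=id_cols, value_vars=year_cols,
                  var_name="year", value_name="quantity")
long_df["year"] = long_df["year"].astype(int)

# Now "WHERE year = 1999" is a plain row filter, and a new year
# arrives as new rows -- no schema change.
rows_1999 = long_df[long_df["year"] == 1999]
```

From here the long table ingests cleanly into a database, since the schema (id, control, cultivar, year, quantity) stays fixed as years are added.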