Support configurable header row for CSV/XLSX sources #2

@SerylLns

Problem

Some real-world files have metadata or description rows before the actual headers. For example:

(empty row)
*required, *required, *required, Ex: 45.398792, ...
public name, private name, address, GPS latitude, ...
ADRET, ADRET, COURCHEVEL 1550, ...

Currently, Sources::Csv and Sources::Xlsx always read the first row as headers (csv_content.lines.first / sheet.simple_rows.first). There is no way to skip leading rows.

Proposal

Add a header_row option (1-based index, default: 1) configurable per target or via import config:

Target DSL:

class HousingTarget < DataPorter::Target
  sources :csv, :xlsx
  header_row 3  # skip 2 description rows
end

Or via import config (runtime):

import.config = { "header_row" => 3 }
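
One way the two configuration paths could fit together is sketched below. `SketchTarget`, `SketchHousingTarget`, and `resolve_header_row` are hypothetical names used only for illustration; they are not DataPorter's actual classes. The precedence shown (runtime config overrides the target-level value) is an assumption, since the proposal does not specify one.

```ruby
# Hypothetical stand-in for DataPorter::Target, only to illustrate the DSL.
class SketchTarget
  class << self
    # Called with an argument it stores the value (DSL style);
    # called without one it returns the stored value, defaulting to 1.
    def header_row(value = nil)
      return @header_row || 1 if value.nil?

      unless value.is_a?(Integer) && value >= 1
        raise ArgumentError, "header_row must be an Integer >= 1"
      end

      @header_row = value
    end
  end
end

class SketchHousingTarget < SketchTarget
  header_row 3 # skip 2 description rows
end

# Assumed precedence: a "header_row" key in the runtime config wins,
# otherwise the target-level value (or its default of 1) is used.
def resolve_header_row(target_class, config = {})
  Integer(config.fetch("header_row") { target_class.header_row })
end
```

With these definitions, `resolve_header_row(SketchHousingTarget)` returns 3, and `resolve_header_row(SketchHousingTarget, { "header_row" => 5 })` returns 5.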

Both Sources::Csv#headers / #fetch and Sources::Xlsx#headers / #fetch would skip header_row - 1 rows before reading headers, and parse data rows from header_row + 1 onward.
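
The skipping rule can be sketched for the CSV case using only Ruby's standard `csv` library. `HeaderRowReader` is a hypothetical helper written for this illustration, not part of DataPorter; the real change would live inside Sources::Csv and Sources::Xlsx, whose internals are not shown in this issue.

```ruby
require "csv"

# Hypothetical helper, not DataPorter code: illustrates the proposed
# header_row semantics (1-based index, default 1) on raw CSV content.
class HeaderRowReader
  def initialize(csv_content, header_row: 1)
    unless header_row.is_a?(Integer) && header_row >= 1
      raise ArgumentError, "header_row must be an Integer >= 1"
    end

    # CSV.parse keeps blank lines as empty rows by default,
    # so row positions match line positions in simple files.
    @rows = CSV.parse(csv_content)
    @header_row = header_row
  end

  # The row at position header_row supplies the column names,
  # i.e. header_row - 1 leading rows are skipped.
  def headers
    (@rows[@header_row - 1] || []).map { |h| h.to_s.strip }
  end

  # Rows from header_row + 1 onward become Hashes keyed by header.
  def fetch
    keys = headers
    @rows.drop(@header_row).map { |row| keys.zip(row).to_h }
  end
end

content = <<~CSV

  *required,*required,Ex: 45.398792
  public name,address,GPS latitude
  ADRET,COURCHEVEL 1550,45.41
CSV

reader = HeaderRowReader.new(content, header_row: 3)
reader.headers # => ["public name", "address", "GPS latitude"]
reader.fetch   # => [{"public name"=>"ADRET", "address"=>"COURCHEVEL 1550", "GPS latitude"=>"45.41"}]
```

Leaving `header_row` at its default of 1 keeps the current behavior of treating the first row as headers.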

Use case

On property management platforms, operators often export templates with instruction rows above the actual column headers (e.g. *required, Ex: 45.398792). Asking users to clean these files manually before import adds friction.

Labels: enhancement (New feature or request)