feat(processors): auto-detect delimiter for CSV/TSV/DSV files#3334

Open
pierreeurope wants to merge 1 commit into keplergl:master from pierreeurope:fix/multi-delimiter-support-202

@pierreeurope
Contributor

Summary

Add automatic delimiter detection to processCsvData so that files using tabs, semicolons, or pipe characters as delimiters are parsed correctly without requiring any user configuration.

Problem

Kepler.gl only supports comma-separated CSV files (#202). Users with tab-separated (TSV), semicolon-separated (common in European locales), or pipe-separated files cannot load their data without first converting it.

Solution

  • Add a detectDelimiter function that examines the first line of the input and tests each supported delimiter (comma `,`, tab `\t`, semicolon `;`, pipe `|`) to find which one produces the most columns
  • Use d3-dsv's dsvFormat so that quoted fields are parsed correctly with any delimiter
  • Fall back to comma if no delimiter produces more than one column
  • Add .tsv and .dsv to accepted file extensions
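
A minimal TypeScript sketch of the extension check, assuming a hypothetical `isDelimitedTextFile` helper and `DELIMITED_TEXT_EXTENSIONS` list. Neither is kepler.gl's actual API, and the real accepted-format configuration may be structured differently and also lists other formats. In this sketch the extension only routes a file to the delimited-text parser. The delimiter itself still comes from detection, so a `.csv` file that uses semicolons is handled the same way as a `.dsv` file.

```typescript
// Hypothetical list for illustration; kepler.gl's real accepted-format
// configuration may be structured differently and includes other formats.
const DELIMITED_TEXT_EXTENSIONS: readonly string[] = ["csv", "tsv", "dsv"];

// Returns true when the file's last extension (case-insensitive) is one of
// the delimited-text extensions above.
function isDelimitedTextFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false; // no extension at all
  const ext = fileName.slice(dot + 1).toLowerCase();
  return DELIMITED_TEXT_EXTENSIONS.includes(ext);
}
```

Only the last extension is checked, so `archive.csv.zip` is not treated as delimited text.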

Delimiter Detection Logic

  1. Extract the first line of the raw data
  2. For each candidate delimiter, parse the first line and count resulting columns
  3. Pick the delimiter that produces the most columns, provided it yields at least 2 (otherwise keep the comma fallback)
  4. Use the appropriate d3-dsv parser for that delimiter
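
The first three steps can be sketched in TypeScript as follows. This is a dependency-free illustration, not the PR's code. `countColumns` is a hypothetical, simplified stand-in for parsing the line with d3-dsv (`dsvFormat(delimiter).parseRows(line)`), and it only understands double-quoted fields with `""` escapes. Ties between delimiters are assumed to go to the earlier candidate in the order comma, tab, semicolon, pipe, because the PR description does not specify a tie-break.

```typescript
// Candidate delimiters in the order this PR lists them: comma, tab, semicolon, pipe.
const CANDIDATE_DELIMITERS = [",", "\t", ";", "|"] as const;
type Delimiter = (typeof CANDIDATE_DELIMITERS)[number];

// Hypothetical helper: counts fields on one line, ignoring delimiters inside
// double-quoted fields. A simplified stand-in for d3-dsv's
// dsvFormat(delimiter).parseRows(line)[0].length.
function countColumns(line: string, delimiter: string): number {
  let count = 1;
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      if (inQuotes && line[i + 1] === '"') {
        i++; // "" inside a quoted field is an escaped quote; skip both characters
      } else {
        inQuotes = !inQuotes; // opening or closing quote
      }
    } else if (ch === delimiter && !inQuotes) {
      count++;
    }
  }
  return count;
}

function detectDelimiter(text: string): Delimiter {
  // Step 1: the first physical line (handles \n, \r\n, and \r line endings).
  const firstLine = text.split(/\r\n|\r|\n/, 1)[0];

  // Fallback: comma, unless some candidate yields at least 2 columns.
  let best: Delimiter = ",";
  let bestCount = 1;

  // Steps 2-3: count columns for each candidate and keep the maximum.
  for (const delimiter of CANDIDATE_DELIMITERS) {
    const columns = countColumns(firstLine, delimiter);
    // Strict ">" means ties keep the earlier candidate (an assumed tie-break).
    if (columns > bestCount) {
      best = delimiter;
      bestCount = columns;
    }
  }
  return best;
}
```

Quote awareness matters for headers such as `"Smith, John";"Paris, FR";42`. A naive `split(",")` and a naive `split(";")` both count three fields there, while the quote-aware count gives 1 for comma and 3 for semicolon, so semicolon wins. Step 4 then corresponds to a d3-dsv call such as `dsvFormat(detectDelimiter(text)).parse(text)`. Because step 1 cuts at the first physical line break, a quoted header field that contains a newline would be truncated in this sketch.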

This is a superset of PR #3313 (TSV support) and also handles semicolons and pipes.

Fixes #202
Related: #168

Add automatic delimiter detection to processCsvData so that files using
tabs, semicolons, or pipe characters as delimiters are parsed correctly
without requiring any user configuration.

The detectDelimiter function checks the first line of the input against
supported delimiters (comma, tab, semicolon, pipe) and picks the one
that produces the most columns, using d3-dsv for proper handling of
quoted fields.

Also adds .tsv and .dsv file extensions to the accepted file formats
so users can drag-and-drop these files directly.

Fixes keplergl#202
Related: keplergl#168

Signed-off-by: pierreeurope <pierre.europe@pm.me>

Linked issue: Enhance Kepler.gl to parse different delimiter files
