feat(processors): auto-detect delimiter for CSV/TSV/DSV files #3334
Open
pierreeurope wants to merge 1 commit into keplergl:master from
Add automatic delimiter detection to processCsvData so that files using tabs, semicolons, or pipe characters as delimiters are parsed correctly without requiring any user configuration.

The detectDelimiter function checks the first line of the input against supported delimiters (comma, tab, semicolon, pipe) and picks the one that produces the most columns, using d3-dsv for proper handling of quoted fields.

Also adds .tsv and .dsv file extensions to the accepted file formats so users can drag-and-drop these files directly.

Fixes keplergl#202
Related: keplergl#168

Signed-off-by: pierreeurope <pierre.europe@pm.me>
Summary
Add automatic delimiter detection to processCsvData so that files using tabs, semicolons, or pipe characters as delimiters are parsed correctly without requiring any user configuration.

Problem
Kepler.gl only supports comma-separated CSV files (#202). Users with tab-separated (TSV), semicolon-separated (common in European locales), or pipe-separated files cannot load their data without first converting it.
Solution
- Adds a detectDelimiter function that examines the first line of the input and tests each supported delimiter (comma ",", tab "\t", semicolon ";", pipe "|") to find which one produces the most columns
- Uses d3-dsv's dsvFormat for proper parsing of quoted fields with any delimiter
- Adds .tsv and .dsv to accepted file extensions

Delimiter Detection Logic
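The detection step described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR. The PR relies on d3-dsv's dsvFormat to split the first line. To stay dependency-free, the hypothetical countColumns helper below stands in for it with a simple quote-aware counter. Falling back to comma when candidates tie is also an assumption.

```typescript
// Hypothetical sketch; the detectDelimiter in this PR may differ in detail.
const CANDIDATE_DELIMITERS: string[] = [",", "\t", ";", "|"];

// Count the columns in a single line, ignoring delimiters that appear inside
// double-quoted fields. Stand-in (assumption) for splitting the line with
// d3-dsv's dsvFormat, which is what the PR uses.
function countColumns(line: string, delimiter: string): number {
  let columns = 1;
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      // An escaped quote ("") toggles twice, so the state is unchanged.
      inQuotes = !inQuotes;
    } else if (ch === delimiter && !inQuotes) {
      columns += 1;
    }
  }
  return columns;
}

function detectDelimiter(rawData: string): string {
  // Only the first line (normally the header row) is examined.
  const newlineIndex = rawData.search(/\r?\n/);
  const firstLine = newlineIndex === -1 ? rawData : rawData.slice(0, newlineIndex);

  // Assumption: comma is the default and wins ties.
  let best = ",";
  let bestColumns = countColumns(firstLine, best);
  for (const delimiter of CANDIDATE_DELIMITERS) {
    const columns = countColumns(firstLine, delimiter);
    if (columns > bestColumns) {
      best = delimiter;
      bestColumns = columns;
    }
  }
  return best;
}
```

With this sketch, a header such as `"lat,lng";name` resolves to a semicolon, because the comma sits inside a quoted field and does not count as a separator.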
This is a superset of PR #3313 (TSV support) and also handles semicolons and pipes.
Fixes #202
Related: #168