
Support multi-character input delimiters #86

@vmvarela

Description

The -d / --delimiter flag currently accepts only a single character. Several common real-world formats use multi-character separators (e.g. ||, ;;, <TAB><TAB>, or even fixed-width delimiters such as a run of two spaces). Extending delimiter support to strings would let sql-pipe read these data sources without extra preprocessing.

Example

$ cat data.psv | sql-pipe -d '||' 'SELECT * FROM t'
$ cat report.txt | sql-pipe --delimiter '  ' 'SELECT * FROM t'   # two spaces

Acceptance Criteria

  • -d / --delimiter accepts strings of 1 or more characters (current single-char still works)
  • CSV parser correctly splits fields on multi-character delimiter strings
  • Quoting rules still apply: fields containing the delimiter string must be quoted
  • --tsv remains a shorthand for --delimiter $'\t' (single char, unchanged)
  • Invalid delimiter (empty string, or longer than a reasonable maximum, e.g. 8 chars) produces a usage error
  • Documented in --help, README.md, and docs/sql-pipe.1.scd
  • Tests cover: 2-char delimiter, 3-char delimiter, quoted field containing delimiter string
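The validation criterion above could be sketched roughly as follows. This is a language-neutral illustration in Python, not the project's Zig code. `validate_delimiter`, `UsageError`, and `MAX_DELIMITER_LEN` are hypothetical names, the cap of 8 is only the "e.g." value suggested above, and measuring length in bytes rather than characters is an assumption.

```python
MAX_DELIMITER_LEN = 8  # assumed cap, taken from the "e.g. 8 chars" suggestion


class UsageError(ValueError):
    """Hypothetical error type standing in for the CLI's usage error."""


def validate_delimiter(raw: str) -> bytes:
    """Return the delimiter as UTF-8 bytes, or raise UsageError if invalid."""
    delim = raw.encode("utf-8")
    if len(delim) == 0:
        raise UsageError("delimiter must not be empty")
    if len(delim) > MAX_DELIMITER_LEN:
        raise UsageError(
            f"delimiter is {len(delim)} bytes; maximum is {MAX_DELIMITER_LEN}"
        )
    return delim
```

A single-character delimiter passes through unchanged, so existing invocations would keep working under this check.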

Notes

  • Current parser in src/csv.zig assumes single-byte delimiter — will need refactoring
  • Quoting/escaping semantics: if field contains the multi-char delimiter, it must be quoted (RFC 4180 extended)
  • Consider KMP or simple std.mem.indexOf for delimiter scanning
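As a rough illustration of the scanning and quoting rules in these notes, the sketch below splits one record on a multi-character delimiter. It is written in Python for brevity and is not the src/csv.zig implementation. `split_record` is a hypothetical name, and it assumes RFC 4180 style quoting (a doubled quote inside a quoted field is an escaped quote) and a single record with no embedded newlines. It uses a plain prefix check at each position rather than KMP, which is likely sufficient if delimiters are capped at a few bytes.

```python
def split_record(line: str, delim: str, quote: str = '"') -> list[str]:
    """Split one record on a multi-character delimiter, honoring quoted fields."""
    if not delim:
        raise ValueError("delimiter must not be empty")
    fields = []
    buf = []  # characters of the field currently being built
    i = 0
    n = len(line)
    in_quotes = False
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and line[i + 1] == quote:
                    buf.append(quote)  # doubled quote is an escaped quote
                    i += 2
                    continue
                in_quotes = False  # closing quote
                i += 1
                continue
            buf.append(ch)  # delimiter text inside quotes is literal
            i += 1
        elif ch == quote and not buf:
            in_quotes = True  # opening quote at the start of a field
            i += 1
        elif line.startswith(delim, i):
            fields.append("".join(buf))  # full delimiter matched: end field
            buf = []
            i += len(delim)
        else:
            buf.append(ch)  # includes partial delimiter matches such as "|"
            i += 1
    fields.append("".join(buf))
    return fields
```

For example, `split_record('a||"b||c"||d', '||')` yields the three fields `a`, `b||c`, and `d`, while a lone `|` inside an unquoted field is kept as data because only the full delimiter string ends a field.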

Metadata

Assignees

No one assigned

    Labels

    priority:medium (Should be done soon)
    size:m (Medium, 4 to 8 hours)
    status:ready (Refined and ready for sprint selection)
    type:feature (New functionality)

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests
