Skip to content

Latest commit

 

History

History
737 lines (557 loc) · 19.3 KB

File metadata and controls

737 lines (557 loc) · 19.3 KB

RFC 4180 Migration Guide

Overview

This guide helps developers migrate to csvToJson v3.21.0, which introduces full RFC 4180 compliance. RFC 4180 is the standard specification for CSV (Comma-Separated Values) files.

⏱️ Estimated reading time: 15-20 minutes


Table of Contents

  1. What is RFC 4180?
  2. Breaking Changes
  3. Migration Checklist
  4. Quick Migration Examples
  5. Detailed Migration Scenarios
  6. Testing Your Migration
  7. Troubleshooting
  8. FAQ

What is RFC 4180?

RFC 4180 is the Internet Engineering Task Force (IETF) standard specification for CSV files. It defines:

Key Specifications

Aspect RFC 4180 Standard
Field Delimiter Comma (,)
Record Delimiter CRLF (\r\n) or LF (\n)
Quote Character Double-quote (")
Quoted Fields Allowed for any field
Quote Escaping Use double quotes ("")
MIME Type text/csv
Charset UTF-8 (recommended)

RFC 4180 CSV Example

firstName,lastName,email,status
"Smith, John",Smith,john@example.com,"Active, Verified"
Jane,Doe,jane@example.com,"Inactive"
"Cooper, Andy",Cooper,"andy@company.com, andy.cooper@home.com","Active"

Key Features Demonstrated:

  • Comma delimiters
  • Quoted fields containing commas
  • Escaped quotes using ""
  • Field names (header row)

Breaking Changes

1. Default Delimiter: Semicolon → Comma

The most significant breaking change

Context

  • Before: Default delimiter was semicolon (;)
  • After: Default delimiter is comma (,) - per RFC 4180 standard
  • Reason: RFC 4180 specifies comma as the standard delimiter

Impact Analysis

Scenario Impact Action Required
Using explicit .fieldDelimiter() ✅ None No changes needed
Parsing comma-delimited CSV ✅ Improved May see better results
Parsing semicolon-delimited CSV ❌ Breaking Add .fieldDelimiter(';')
No delimiter configuration ❌ Breaking Verify CSV format

Before Code (Old Behavior)

const csvToJson = require('convert-csv-to-json');

// Assuming file uses semicolons as delimiter
// data.csv:
// name;email;age
// John;john@example.com;30

let result = csvToJson.getJsonFromCsv('data.csv');
// Works - automatically used semicolon

After Code (New RFC 4180 Behavior)

const csvToJson = require('convert-csv-to-json');

// data.csv (with semicolons):
// name;email;age
// John;john@example.com;30

let result = csvToJson.getJsonFromCsv('data.csv');
// ❌ BREAKS - expects comma delimiter now
// Result: { 'name;email;age': 'John;john@example.com;30' }

// ✅ FIX: Explicitly set semicolon delimiter
let result = csvToJson.fieldDelimiter(';').getJsonFromCsv('data.csv');
// Works correctly

2. Removed Deprecated Function jsonToCsv()

This function has been completely removed

Before

csvToJson.jsonToCsv('input.csv', 'output.json');

After

csvToJson.generateJsonFileFromCsv('input.csv', 'output.json');

3. Stricter Quoted Field Parsing

RFC 4180 compliance means stricter adherence to the standard.

Quote Escaping Changes

Before (Non-compliant):

// Input CSV: name,description
//            "John","He said ""Hello"""

let result = csvToJson.supportQuotedField(true).csvStringToJson(csv);
// Result might have inconsistent quote handling

After (RFC 4180 Compliant):

// Input CSV: name,description
//            "John","He said ""Hello"""

let result = csvToJson.supportQuotedField(true).csvStringToJson(csv);
// Result correctly unescapes to: He said "Hello"

Multi-line Field Handling

Before:

  • Multi-line fields in quoted regions might not parse correctly

After:

  • Full support for multi-line fields within quoted regions
  • Newlines preserved inside quoted fields
  • Line ending detection: CRLF, LF, CR all supported
let csv = 'id,description\n' +
          '1,"Multi-line\ndescription\nhere"\n' +
          '2,"Single line"\n';

let result = csvToJson.supportQuotedField(true).csvStringToJson(csv);
// Correctly handles newlines inside quoted fields

Migration Checklist

Use this checklist to identify and fix breaking changes:

  • Audit CSV files - Identify delimiter used (,, ;, |, \t, etc.)
  • Search codebase - Find all csvToJson calls
  • Check default usage - Look for calls without .fieldDelimiter()
  • Update non-comma delimiters - Add .fieldDelimiter() calls
  • Replace deprecated functions - Change jsonToCsv() to generateJsonFileFromCsv()
  • Test quoted fields - Verify fields with quotes parse correctly
  • Test multi-line fields - Verify newlines in fields work
  • Run test suite - Execute existing tests to find issues
  • Update documentation - Document any changes to your API

Quick Migration Examples

Scenario 1: Semicolon-Delimited Files

Affected Code:

// Before (v3.20 and earlier)
const csvToJson = require('convert-csv-to-json');
let data = csvToJson.getJsonFromCsv('data.csv'); // Worked with semicolons

Migration:

// After (v3.21)
const csvToJson = require('convert-csv-to-json');
let data = csvToJson.fieldDelimiter(';').getJsonFromCsv('data.csv');

Scenario 2: Tab-Delimited Files

Affected Code:

// Before
let data = csvToJson.getJsonFromCsv('data.tsv');

Migration:

// After
let data = csvToJson.fieldDelimiter('\t').getJsonFromCsv('data.tsv');

Scenario 3: Pipe-Delimited Files

Affected Code:

// Before
let data = csvToJson.getJsonFromCsv('data.psv');

Migration:

// After
let data = csvToJson.fieldDelimiter('|').getJsonFromCsv('data.psv');

Scenario 4: Standard Comma-Delimited CSV

No Changes Needed:

// Before (already works the same)
let data = csvToJson.getJsonFromCsv('data.csv');

// After (still works the same)
let data = csvToJson.getJsonFromCsv('data.csv');
// Now works out-of-the-box without explicit delimiter!

Scenario 5: Remove Deprecated Function

Before:

csvToJson.jsonToCsv('input.csv', 'output.json');

After:

csvToJson.generateJsonFileFromCsv('input.csv', 'output.json');

Detailed Migration Scenarios

Scenario A: Express.js CSV Upload Handler

Original Code (Pre-RFC 4180):

const express = require('express');
const csvToJson = require('convert-csv-to-json');
const app = express();

app.post('/upload', (req, res) => {
  try {
    // Assumes semicolon delimiter (old default)
    let result = csvToJson.getJsonFromCsv(req.file.path);
    res.json(result);
  } catch (error) {
    res.status(400).json({ error: error.message });
  }
});

Migrated Code (RFC 4180 Compliant):

const express = require('express');
const csvToJson = require('convert-csv-to-json');
const app = express();

app.post('/upload', (req, res) => {
  try {
    // Detect delimiter or use user-provided delimiter
    const delimiter = req.body.delimiter || ','; // Default to comma (RFC 4180)
    
    let result = csvToJson
      .fieldDelimiter(delimiter)
      .supportQuotedField(true)
      .getJsonFromCsv(req.file.path);
    
    res.json(result);
  } catch (error) {
    res.status(400).json({ error: error.message });
  }
});

Scenario B: Batch CSV Processing Script

Original Code:

const csvToJson = require('convert-csv-to-json');
const fs = require('fs');
const path = require('path');

// Process all CSV files in a directory
const csvDir = './data/csv';
fs.readdirSync(csvDir).forEach(file => {
  if (file.endsWith('.csv')) {
    const csvPath = path.join(csvDir, file);
    const result = csvToJson.getJsonFromCsv(csvPath);
    console.log(`Processed ${file}: ${result.length} records`);
  }
});

Migrated Code:

const csvToJson = require('convert-csv-to-json');
const fs = require('fs');
const path = require('path');

// Configuration by file extension or name pattern
const delimiterConfig = {
  'semicolon': ';',
  'tab': '\t',
  'pipe': '|',
  'default': ','
};

function detectDelimiter(filename) {
  // Example: detect based on filename pattern
  if (filename.includes('semi')) return delimiterConfig.semicolon;
  if (filename.includes('tsv')) return delimiterConfig.tab;
  if (filename.includes('pipe')) return delimiterConfig.pipe;
  return delimiterConfig.default; // RFC 4180 default
}

// Process all CSV files in a directory
const csvDir = './data/csv';
fs.readdirSync(csvDir).forEach(file => {
  if (file.endsWith('.csv') || file.endsWith('.tsv')) {
    const csvPath = path.join(csvDir, file);
    const delimiter = detectDelimiter(file);
    
    try {
      const result = csvToJson
        .fieldDelimiter(delimiter)
        .supportQuotedField(true)
        .getJsonFromCsv(csvPath);
      console.log(`✓ Processed ${file}: ${result.length} records`);
    } catch (error) {
      console.error(`✗ Error processing ${file}: ${error.message}`);
    }
  }
});

Scenario C: Data Import with Validation

Original Code:

const csvToJson = require('convert-csv-to-json');

function importUserData(csvFile) {
  let data = csvToJson.getJsonFromCsv(csvFile);
  
  // Validate and import
  data.forEach(user => {
    if (user.email && user.name) {
      saveUser(user);
    }
  });
}

Migrated Code:

const csvToJson = require('convert-csv-to-json');

function importUserData(csvFile, options = {}) {
  const delimiter = options.delimiter || ','; // RFC 4180 default
  const hasQuotedFields = options.hasQuotedFields !== false; // Default true
  
  try {
    let data = csvToJson
      .fieldDelimiter(delimiter)
      .supportQuotedField(hasQuotedFields)
      .formatValueByType(true) // Format values by type
      .getJsonFromCsv(csvFile);
    
    // Validate and import
    const imported = [];
    const errors = [];
    
    data.forEach((user, index) => {
      if (user.email && user.name) {
        try {
          saveUser(user);
          imported.push(user.email);
        } catch (error) {
          errors.push(`Row ${index + 2}: ${error.message}`);
        }
      } else {
        errors.push(`Row ${index + 2}: Missing email or name`);
      }
    });
    
    return { imported, errors };
  } catch (error) {
    throw new Error(`CSV processing failed: ${error.message}`);
  }
}

Scenario D: Backward Compatibility Wrapper

If you need to maintain backward compatibility:

const csvToJsonLib = require('convert-csv-to-json');

// Wrapper with backward-compatible defaults
class CSVConverter {
  constructor(options = {}) {
    // Use RFC 4180 standard by default, but allow override
    this.defaultDelimiter = options.defaultDelimiter || ',';
    this.legacyMode = options.legacyMode || false;
  }
  
  getJsonFromCsv(filepath, options = {}) {
    const delimiter = options.delimiter || this.defaultDelimiter;
    
    // In legacy mode, default to semicolon
    if (this.legacyMode && !options.delimiter) {
      return csvToJsonLib
        .fieldDelimiter(';')
        .getJsonFromCsv(filepath);
    }
    
    return csvToJsonLib
      .fieldDelimiter(delimiter)
      .supportQuotedField(options.supportQuotedField !== false)
      .getJsonFromCsv(filepath);
  }
}

// Usage - RFC 4180 mode (default)
const converter = new CSVConverter();
let data = converter.getJsonFromCsv('data.csv');

// Usage - Legacy mode (backward compatible)
const legacyConverter = new CSVConverter({ legacyMode: true });
let legacyData = legacyConverter.getJsonFromCsv('legacy_data.csv');

// Usage - Custom delimiter
let tabData = converter.getJsonFromCsv('data.tsv', { delimiter: '\t' });

Testing Your Migration

Step 1: Unit Tests

Create tests to verify both old and new behavior:

const csvToJson = require('convert-csv-to-json');

describe('RFC 4180 Migration Tests', () => {
  test('should parse comma-delimited CSV (RFC 4180 default)', () => {
    const csv = 'name,age\nJohn,30\nJane,25\n';
    const result = csvToJson.csvStringToJson(csv);
    
    expect(result.length).toBe(2);
    expect(result[0].name).toBe('John');
  });
  
  test('should parse semicolon-delimited CSV with explicit delimiter', () => {
    const csv = 'name;age\nJohn;30\nJane;25\n';
    const result = csvToJson
      .fieldDelimiter(';')
      .csvStringToJson(csv);
    
    expect(result.length).toBe(2);
    expect(result[0].name).toBe('John');
  });
  
  test('should handle quoted fields per RFC 4180', () => {
    const csv = 'name,description\n"Smith, John","He said ""Hello"""\n';
    const result = csvToJson
      .supportQuotedField(true)
      .csvStringToJson(csv);
    
    expect(result[0].name).toBe('Smith, John');
    expect(result[0].description).toBe('He said "Hello"');
  });
});

Step 2: Integration Tests

Test with real CSV files:

describe('Real CSV File Integration', () => {
  test('should process standard RFC 4180 CSV', () => {
    const result = csvToJson.getJsonFromCsv('./test/data/standard.csv');
    expect(result.length).toBeGreaterThan(0);
  });
  
  test('should process legacy semicolon-delimited CSV', () => {
    const result = csvToJson
      .fieldDelimiter(';')
      .getJsonFromCsv('./test/data/legacy.csv');
    expect(result.length).toBeGreaterThan(0);
  });
});

Step 3: Validation Checklist

// Test for correct parsing
 Basic parsing works
 Field count matches headers
 Empty fields handled correctly
 Quoted fields with commas work
 Escaped quotes unescaped correctly
 Multi-line fields preserved
 Line endings detected correctly (CRLF, LF, CR)
 Special characters preserved
 Type formatting works
 Boolean/number conversion works
 Error handling for malformed CSV

Troubleshooting

Problem 1: "Headers Not Found" Error

Symptom:

// Works in v3.20, breaks in v3.21
csvToJson.getJsonFromCsv('mydata.csv');
// Error: No header row found in CSV

Cause: File likely uses semicolon delimiter (old default changed to comma)

Solution:

// Add explicit delimiter
csvToJson.fieldDelimiter(';').getJsonFromCsv('mydata.csv');

// Or inspect the file first
const fs = require('fs');
const firstLine = fs.readFileSync('mydata.csv', 'utf8').split('\n')[0];
console.log('First line:', firstLine);
// Count delimiters to detect type

Problem 2: Fields Not Splitting Correctly

Symptom:

// One field shows entire row content
[{ 'field1;field2;field3': '...value...value...value' }]

Cause: Wrong delimiter configured or auto-detection failed

Solution:

// Explicitly set the delimiter
csvToJson.fieldDelimiter(';').getJsonFromCsv('file.csv');

// Or auto-detect:
function detectDelimiter(filePath) {
  const fs = require('fs');
  const line = fs.readFileSync(filePath, 'utf8').split('\n')[0];
  
  const delimiters = [',', ';', '|', '\t'];
  for (let delim of delimiters) {
    if (line.includes(delim)) {
      return delim;
    }
  }
  return ','; // Default to RFC 4180
}

const delimiter = detectDelimiter('file.csv');
csvToJson.fieldDelimiter(delimiter).getJsonFromCsv('file.csv');

Problem 3: Quoted Fields Not Working

Symptom:

// With input: name,value
//             "John","He said ""Hi"""
let result = csvToJson.getJsonFromCsv('file.csv');
// Result shows: { name: '"John"', value: '"He said ""Hi"""' }

Cause: .supportQuotedField() not enabled

Solution:

let result = csvToJson
  .supportQuotedField(true)
  .getJsonFromCsv('file.csv');
// Now works correctly

Problem 4: jsonToCsv Function Not Found

Symptom:

csvToJson.jsonToCsv('input.csv', 'output.json');
// TypeError: csvToJson.jsonToCsv is not a function

Cause: Function was removed (it was deprecated)

Solution:

// Replace with the non-deprecated function
csvToJson.generateJsonFileFromCsv('input.csv', 'output.json');

Problem 5: Multi-line Fields Not Preserved

Symptom:

// Input has field with newlines inside quotes
// But newlines are stripped or cause parsing errors

Cause: Parsing multi-line data requires supportQuotedField(true)

Solution:

let result = csvToJson
  .supportQuotedField(true)
  .getJsonFromCsv('file.csv');
// Newlines inside quoted fields now preserved

FAQ

Q: Do I need to update my code?

A: Only if you:

  • Don't explicitly set .fieldDelimiter() and use non-comma delimiters
  • Use the removed jsonToCsv() function
  • Parse quoted CSV files without .supportQuotedField(true)

Q: Is this a breaking change?

A: Yes, the default delimiter changed from semicolon to comma. However, most code using explicit .fieldDelimiter() is unaffected.

Q: Why change the default delimiter?

A: RFC 4180 (the standard specification) defines comma as the standard delimiter. This makes the library RFC 4180 compliant out-of-the-box.

Q: Can I still use semicolons?

A: Yes, just use: .fieldDelimiter(';').getJsonFromCsv('file.csv')

Q: What about backward compatibility?

A: Existing code with explicit delimiters works without changes. Only code relying on the old default delimiter needs updates.

Q: Can I detect the delimiter automatically?

A: Yes, see the troubleshooting section for delimiter detection code.

Q: Are all my CSV files now broken?

A: Only if they use non-comma delimiters AND you were relying on the old default. Explicitly set the delimiter to fix.

Q: What's RFC 4180?

A: It's the official IETF standard for CSV file format. Read more: https://tools.ietf.org/html/rfc4180

Q: Do I need to update my tests?

A: Yes, update tests that assume the old semicolon default or test the removed jsonToCsv() function.

Q: How do I test my migration?

A: Run existing tests first to identify failures, then update code and delimiters as needed. See "Testing Your Migration" section above.

Q: What if I have legacy code I can't change?

A: You can create a wrapper class (see Scenario D) that uses the old default delimiter and maintains backward compatibility.

Q: Where can I get help?

A: Check GitHub Issues: https://github.com/iuccio/csvToJson/issues


Summary

Key Points to Remember:

  1. Default delimiter is now comma (,) - per RFC 4180 standard
  2. Explicitly set delimiters for non-comma files: .fieldDelimiter(';')
  3. Remove jsonToCsv() calls - use generateJsonFileFromCsv() instead
  4. Enable quoted field support for complex CSV: .supportQuotedField(true)
  5. Test thoroughly - especially if not using explicit delimiters
  6. Refer to RFC 4180 - for standard CSV format specification

Next Steps

  1. ✅ Review breaking changes above
  2. ✅ Audit your codebase for affected code
  3. ✅ Update code with explicit delimiters as needed
  4. ✅ Run tests to identify issues
  5. ✅ Update affected tests
  6. ✅ Deploy with confidence!

For more information, see: