Arrow-csv does not update null flag in Field on inference #9380

@realonbebeto

Describe the bug

Format::infer_schema does not update the nullable flag on the inferred fields; every Field is created with nullable set to true.

To Reproduce

Run the following on any CSV file: every Field in the inferred schema is marked nullable, even for columns that contain no null values.

use std::fs::File;
use arrow::csv::Format;

let mut file = File::open(path)?;
let (schema, _) = Format::default()
    .with_header(source.metadata.has_header)
    .infer_schema(&mut file, Some(source.metadata.num_rows))?;

The line in infer_schema that hard-codes the flag to true:

        // build schema from inference results
        let fields: Fields = column_types
            .iter()
            .zip(&headers)
            .map(|(inferred, field_name)| Field::new(field_name, inferred.get(), true))
            .collect();

Expected behaviour

After schema inference on a CSV file, each field's nullable flag should reflect whether null values were actually found in that column. In addition, there is a disconnect between the configurable null regex and the nullable flag, which is always true.
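
To make the expected behaviour concrete, here is a minimal standalone sketch of per-column null tracking. It is not arrow-csv's implementation and uses only the standard library: infer_nullable is a hypothetical helper, and the is_null closure is an assumed stand-in for the reader's null regex.

```rust
/// Hypothetical helper, not part of arrow-csv: for each column, report
/// whether any of the given rows holds a null-like value.
fn infer_nullable<F>(rows: &[Vec<&str>], num_columns: usize, is_null: F) -> Vec<bool>
where
    F: Fn(&str) -> bool,
{
    // Start from "not nullable" and flip a column once a null is seen.
    let mut nullable = vec![false; num_columns];
    for row in rows {
        for (idx, flag) in nullable.iter_mut().enumerate() {
            match row.get(idx) {
                // A present, non-null cell leaves the flag unchanged.
                Some(&value) if !is_null(value) => {}
                // A null-like cell or a missing trailing cell marks the column nullable.
                _ => *flag = true,
            }
        }
    }
    nullable
}

fn main() {
    // Made-up sample rows, already split into cells.
    let rows = vec![
        vec!["1", "alice", "3.5"],
        vec!["2", "", "4.0"],
        vec!["3", "carol", "NULL"],
    ];
    // Assumed null rule for illustration: empty string or the literal "NULL".
    let is_null = |s: &str| s.is_empty() || s == "NULL";
    let nullable = infer_nullable(&rows, 3, is_null);
    println!("{:?}", nullable); // prints [false, true, true]
}
```

In infer_schema, flags like these could replace the hard-coded true passed to Field::new. One caveat: when inference reads only a sample of the file, a false flag only means that no null was seen in that sample, so rows beyond the sample may still contain nulls.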

Additional context