
feat: Add Parquet Export Support #194

@BingqingLyu

Description

Feature

Add Parquet file export support to NeuG's COPY TO command, enabling users to export query results to columnar Parquet files for analytics and data lake integration.

User Scenarios

  1. Export for external analytics: Data engineers can export Cypher query results to Parquet files for analysis with Spark, DuckDB, or Presto
  2. Compression optimization: Configure compression settings (none, snappy, zlib, zstd) to balance file size and performance
  3. Large dataset export: Stream results to disk in batches so exports of millions of rows run in bounded memory

Requirements

P1: Core Parquet Export

  • Basic export with default SNAPPY compression
  • Streaming batch writes (no OOM for large datasets)
  • Type mapping (NeuG types to Arrow/Parquet types)
  • Arrow schema metadata preservation

P2: Compression and Performance Options

  • Configurable compression: none, snappy, zlib, zstd
  • Configurable row group size
  • Dictionary encoding control

P3: Complex Data Type Support

  • Vertex/edge object serialization
  • List/array property handling
  • Date and timestamp serialization

Syntax

Basic export (default SNAPPY):
COPY (MATCH (n:person) RETURN n.*) TO 'person.parquet';

With options:
COPY (MATCH (n:person) RETURN n.*) TO 'person.parquet' (compression='zstd', row_group_size=65536);
