Memory scale for large extracted array CSVs #95

@htsukamoto5

Background

Spun out from #72 (item 2). The other items in that issue (nested-object expansion depth and primitive array detail) have been resolved; this one remains open.

Problem

extractedArrays in JsPsychMetadata (packages/metadata/src/index.ts) is an in-memory Map<string, Array<Record<string, any>>> that accumulates every extracted row across all processed files before anything is written to disk. Peak memory therefore grows with the total row count of the study rather than with the size of any single file, which can be expensive for large datasets:

  • Real Tobii eye-tracking dataset (DataPipe exports, ~26–70 MB JSON, ~1,700 trials/file, 38–39 files): tobii_data alone produced 94,198 rows per file → roughly 3.5 M rows held in memory simultaneously across a full study.
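To make the scale concrete, here is a minimal sketch of the accumulation pattern described above, plus the row arithmetic from the quoted figures. It is not the actual JsPsychMetadata code: the `accumulate` and `estimateRows` helpers are hypothetical, and the estimate assumes every file yields about the same number of tobii_data rows.

```typescript
// Hypothetical stand-in for the accumulation described in this issue;
// the real implementation lives in packages/metadata/src/index.ts.
type Row = Record<string, any>;

const extractedArrays = new Map<string, Row[]>();

// Append one file's extracted rows under the array's name (e.g. "tobii_data").
// Nothing is released until every file has been processed.
function accumulate(name: string, rows: Row[]): void {
  const existing = extractedArrays.get(name);
  if (existing) {
    // Push one at a time: push(...rows) can overflow the call stack on huge arrays.
    for (const row of rows) existing.push(row);
  } else {
    extractedArrays.set(name, rows.slice());
  }
}

// Back-of-envelope total, assuming a uniform row count per file.
const ROWS_PER_FILE = 94198; // figure quoted in this issue
function estimateRows(fileCount: number): number {
  return ROWS_PER_FILE * fileCount;
}

console.log(estimateRows(38)); // 3579524
console.log(estimateRows(39)); // 3673722
```

Both totals are in line with the rough 3.5 M figure above. The actual memory cost per row depends on how many keys each row carries, which this sketch does not model.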

What to investigate / decide

  1. Is it actually a problem in practice? Profile memory usage on the full 39-file Tobii dataset. If Node's GC keeps up and peak RSS is acceptable, the current approach may be fine.
  2. If it is a problem: consider streaming / chunked writes — flush each file's extracted rows to disk as they're processed rather than accumulating across all files.
  3. Interaction with the browser frontend: the frontend holds the same data as FileSystemFileHandle blobs; the scale concern may be more acute in the browser than in Node.
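Items 1 and 2 can be explored together in Node with a sketch like the one below, which flushes each file's rows to a CSV as soon as they are extracted and records peak RSS along the way. Everything here is an assumption for illustration: `csvCell`, `flushRows`, `processAll`, and the `extract` callback are hypothetical names, and the column list is assumed to be known up front, which may not hold if columns differ between files.

```typescript
import { appendFileSync, existsSync } from "node:fs";

type Row = Record<string, unknown>;

// Quote a value when it contains a comma, double quote, or line break.
function csvCell(value: unknown): string {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Append one file's rows to csvPath. The header is written only when the
// CSV does not exist yet. Assumes a fixed, known column list.
function flushRows(csvPath: string, columns: string[], rows: Row[]): void {
  const lines: string[] = [];
  if (!existsSync(csvPath)) lines.push(columns.map(csvCell).join(","));
  for (const row of rows) {
    lines.push(columns.map((column) => csvCell(row[column])).join(","));
  }
  if (lines.length > 0) appendFileSync(csvPath, lines.join("\n") + "\n");
}

// Process files one at a time, flushing after each, and return the peak
// resident set size (in bytes) sampled after every file.
function processAll(
  files: string[],
  extract: (file: string) => Row[], // hypothetical per-file extractor
  csvPath: string,
  columns: string[],
): number {
  let peakRss = 0;
  for (const file of files) {
    // The rows are unreachable after this call, so the GC can reclaim them.
    flushRows(csvPath, columns, extract(file));
    peakRss = Math.max(peakRss, process.memoryUsage().rss);
  }
  return peakRss;
}
```

Sampling `process.memoryUsage().rss` the same way around the current accumulate-everything path would give a like-for-like baseline for item 1. Sampling once per file can miss spikes inside a single file, so this is a coarse lower bound on the true peak.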

Related

Labels: enhancement
