Background
Spun out from #72 (item 2). The other items in that issue (nested-object expansion depth and primitive array detail) have been resolved; this one remains open.
Problem
extractedArrays in JsPsychMetadata (packages/metadata/src/index.ts) is an in-memory Map<string, Array<Record<string, any>>> that accumulates every extracted row across all processed files before anything is written to disk. For large datasets this can be expensive:
- Real Tobii eye-tracking dataset (DataPipe exports, ~26–70 MB JSON, ~1,700 trials/file, 38–39 files): tobii_data alone produced 94,198 rows per file → roughly 3.5 M rows held in memory simultaneously across a full study.
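To make the scale concrete, here is a minimal sketch of the accumulate-then-write pattern described above, together with the row-count arithmetic. The class and method names (AccumulatingExtractor, addRows, totalRows) are hypothetical stand-ins and are not taken from packages/metadata/src/index.ts; only the Map shape and the figures come from this issue.

```typescript
type Row = Record<string, any>;

// Hypothetical stand-in for the pattern in question, NOT the real JsPsychMetadata class.
class AccumulatingExtractor {
  // Same shape as the map described in this issue.
  readonly extractedArrays = new Map<string, Row[]>();

  // Append one file's rows under `name`. Nothing is released until the caller
  // drops the whole map, so memory grows with every processed file.
  addRows(name: string, rows: Row[]): void {
    const bucket = this.extractedArrays.get(name) ?? [];
    // Loop instead of push(...rows): spreading ~94k arguments can overflow the call stack.
    for (const row of rows) bucket.push(row);
    this.extractedArrays.set(name, bucket);
  }

  // Number of rows currently held in memory across all extracted arrays.
  totalRows(): number {
    let total = 0;
    this.extractedArrays.forEach((rows) => {
      total += rows.length;
    });
    return total;
  }
}

// Row counts implied by the figures above: 94,198 rows/file over 38 or 39 files.
const ROWS_PER_FILE = 94198;
const estimatedTotals = [38, 39].map((files) => files * ROWS_PER_FILE);
console.log(estimatedTotals); // 3579524 and 3673722
```

Both totals are consistent with the "roughly 3.5 M" estimate; they count rows only and say nothing about bytes per row, which is what profiling would need to establish.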
What to investigate / decide
- Is it actually a problem in practice? Profile memory usage on the full 39-file Tobii dataset. If Node's GC keeps up and peak RSS is acceptable, the current approach may be fine.
- If it is a problem: consider streaming / chunked writes — flush each file's extracted rows to disk as they're processed rather than accumulating across all files.
- Interaction with the browser frontend: the frontend holds the same data as FileSystemFileHandle blobs; the scale concern may be more acute in the browser than in Node.
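One possible shape for the first two points, as a sketch only: flush each file's extracted rows to disk as soon as that file is processed, and sample RSS along the way to see whether peak memory is actually a concern. It assumes Node, an NDJSON output format, and a caller-supplied extract function; flushRows, processFiles and the .ndjson naming are hypothetical and do not reflect the package's current API or output format.

```typescript
import { appendFileSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type Row = Record<string, any>;

// Hypothetical helper: append one file's rows to <outDir>/<name>.ndjson.
function flushRows(outDir: string, name: string, rows: Row[]): void {
  if (rows.length === 0) return;
  const lines = rows.map((row) => JSON.stringify(row)).join("\n") + "\n";
  appendFileSync(join(outDir, `${name}.ndjson`), lines);
}

// Hypothetical driver. `extract` stands in for whatever turns one input file
// into named row arrays; its rows become garbage-collectable after each iteration.
function processFiles(
  files: string[],
  extract: (file: string) => Map<string, Row[]>,
  outDir: string,
): { rowsWritten: number; peakRssMiB: number } {
  let rowsWritten = 0;
  let peakRssMiB = 0;
  for (const file of files) {
    extract(file).forEach((rows, name) => {
      flushRows(outDir, name, rows);
      rowsWritten += rows.length;
    });
    // Coarse sample: RSS is read once per file, so spikes inside extract() are missed.
    peakRssMiB = Math.max(peakRssMiB, process.memoryUsage().rss / 2 ** 20);
  }
  return { rowsWritten, peakRssMiB };
}

// Usage with a fake extractor that yields two rows per file.
const outDir = mkdtempSync(join(tmpdir(), "extracted-"));
const fakeExtract = (file: string) =>
  new Map<string, Row[]>([["tobii_data", [{ file, t: 0 }, { file, t: 1 }]]]);
const summary = processFiles(["a.json", "b.json"], fakeExtract, outDir);
console.log(summary.rowsWritten, summary.peakRssMiB.toFixed(1));
```

Running the same RSS sampling against the current accumulate-everything path on the full 39-file dataset would give a baseline to compare against. This sketch does not cover the browser case, where the equivalent would need a different write path than node:fs.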
Related