
Validation fails on real jsPsych dataset: time_elapsed always seeded, unnamed columns dropped, join-key prompt not gated #109

Description

@Mandyx22

Summary

Ran the CLI end-to-end against a real, published jsPsych study dataset (a mass/count language study; 3 condition folders, each with response*.csv trial data + demog*.csv demographics, exported via R). The pipeline works — it ingests, resolves columns, copies files, generates variableMeasured, and validates — but every generated project fails Psych-DS validation, for three independent reasons. Two are bugs in this repo; one is a data quirk we handle poorly.

Reproduced identically across all three condition folders.


1. time_elapsed is always seeded into variableMeasured → VARIABLE_MISSING_FROM_CSV_COLUMNS

VariablesMap's constructor unconditionally seeds trial_type, trial_index, and time_elapsed as variables (packages/metadata/src/VariablesMap.ts:116-126). This dataset's CSVs contain trial_type and trial_index as real columns but no time_elapsed column, so the validator flags time_elapsed as a variable present in variableMeasured but absent from every CSV.

This is the most impactful finding: it breaks validation for any jsPsych dataset whose exported CSVs don't include a time_elapsed column. Confirmed with an isolation run on a single clean response.csv (no other files) — it fails with exactly this one error and nothing else.

Suggested fix: only register trial_type / trial_index / time_elapsed when a column with that name is actually observed in the data, rather than seeding them in the constructor.
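
A minimal sketch of that fix, assuming the set of column names seen across the ingested CSVs is available before the defaults are registered. `JSPSYCH_DEFAULTS` and `defaultsToRegister` are hypothetical names for illustration, not the current VariablesMap API:

```typescript
// Hypothetical helper: not part of the existing VariablesMap class.
// The three names the constructor currently seeds unconditionally.
const JSPSYCH_DEFAULTS = ["trial_type", "trial_index", "time_elapsed"] as const;

// Return only the jsPsych default variables that actually appear as
// columns in the observed data, in the order listed above.
function defaultsToRegister(observedColumns: Iterable<string>): string[] {
  const observed = new Set(observedColumns);
  return JSPSYCH_DEFAULTS.filter((name) => observed.has(name));
}

// Header shape from this dataset: trial_type and trial_index, no time_elapsed.
const toSeed = defaultsToRegister(["subject_id", "trial_type", "trial_index", "response"]);
console.log(toSeed); // [ 'trial_type', 'trial_index' ]
```

With this shape, time_elapsed would never reach variableMeasured unless some CSV really has that column.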

2. Unnamed leading column dropped silently → CSV_COLUMN_MISSING_FROM_METADATA

The R-exported files carry a row-index column, so their header starts with a bare comma (",subject_id,...", i.e. an empty-string column name). The library refuses to create a variable with an empty name and logs "Name field is missing. Variable not added" (repeated many times in verbose mode), leaving that column present in the CSV but missing from variableMeasured.

Suggested fix: detect unnamed/empty-header columns up front and either (a) drop the column with a single clear warning, or (b) synthesize a stable name. Either way, avoid producing a dataset that can't validate, and collapse the per-row warning spam into one message.
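
Option (b) could look roughly like the sketch below. It assumes the header row has already been parsed into an array of strings; `nameUnnamedColumns` and the `unnamed_` prefix are hypothetical choices, not existing code in this repo:

```typescript
// Hypothetical helper: gives every empty-header column a stable,
// position-based name and reports all renames in one warning.
function nameUnnamedColumns(
  header: string[],
  prefix = "unnamed_",
): { names: string[]; warning: string | null } {
  // Names already in use, so a synthesized name never collides with a real one.
  const taken = new Set(header.map((h) => h.trim()).filter((h) => h !== ""));
  const renamed: number[] = [];

  const names = header.map((raw, index) => {
    const name = raw.trim();
    if (name !== "") return name;
    let candidate = `${prefix}${index}`;
    while (taken.has(candidate)) candidate = `_${candidate}`;
    taken.add(candidate);
    renamed.push(index);
    return candidate;
  });

  const warning =
    renamed.length > 0
      ? `Synthesized names for ${renamed.length} unnamed column(s) at index ${renamed.join(", ")}`
      : null;
  return { names, warning };
}

// R-style export: the row-index column has an empty header.
const fixed = nameUnnamedColumns(["", "subject_id", "trial_index"]);
console.log(fixed.names); // [ 'unnamed_0', 'subject_id', 'trial_index' ]
```

For the result to validate, the copied CSV would need its header rewritten with the same synthesized names, so that the file and variableMeasured agree.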

3. Join-key prompt fires even in fully non-interactive runs

packages/cli/src/index.ts:737 calls the interactive join-key checkbox whenever trial_index isn't unique — and the call isn't gated by isNonInteractive. Multi-subject data always restarts trial_index at 1 per subject, so it's never unique. A run with all three flags (--psych-ds-dir, --data-dir, --metadata-options) and no TTY therefore still tries to prompt and aborts with ✘ User force closed the prompt.

Suggested fix: when the run is non-interactive (no TTY / isNonInteractive), skip the prompt and fall back to a deterministic default (e.g. append a detected unique-making column like subject_id, or proceed with a warning) instead of blocking.
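
One possible shape for the deterministic default, as a sketch only. It assumes the parsed rows are available as plain objects; `defaultJoinKey` and the candidate list are hypothetical, and the caller would use this only when isNonInteractive is set (or there is no TTY):

```typescript
type Row = Record<string, unknown>;

// True when the combination of `keys` identifies every row uniquely.
function isUniqueKey(rows: Row[], keys: string[]): boolean {
  const seen = new Set<string>();
  for (const row of rows) {
    const id = JSON.stringify(keys.map((k) => row[k] ?? null));
    if (seen.has(id)) return false;
    seen.add(id);
  }
  return true;
}

// Hypothetical fallback: try the base column alone, then base plus each
// candidate in order. Returns null if nothing makes the rows unique, in
// which case the caller can proceed with a warning instead of prompting.
function defaultJoinKey(rows: Row[], base: string, candidates: string[]): string[] | null {
  if (isUniqueKey(rows, [base])) return [base];
  for (const extra of candidates) {
    if (isUniqueKey(rows, [base, extra])) return [base, extra];
  }
  return null;
}

// Two subjects, trial_index restarting per subject.
const rows: Row[] = [
  { subject_id: "s1", trial_index: 1 },
  { subject_id: "s1", trial_index: 2 },
  { subject_id: "s2", trial_index: 1 },
  { subject_id: "s2", trial_index: 2 },
];
console.log(defaultJoinKey(rows, "trial_index", ["subject_id"])); // [ 'trial_index', 'subject_id' ]
```

Trying candidates in a fixed order keeps the result reproducible across runs, which matters more in CI than picking the "best" column.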


Repro

# A data dir with Psych-DS-compliant filenames, e.g.:
#   task-resp_data.csv   (jsPsych trial data: has trial_type + trial_index, no time_elapsed)
# A skeleton Psych-DS project (dataset_description.json with empty variableMeasured)
# A metadata-options.json with name/description/author

node packages/cli/dist/cjs/index.cjs \
  --psych-ds-dir   <project> \
  --data-dir       <data> \
  --metadata-options <options.json> \
  --verbose

Notes
