Summary
Ran the CLI end-to-end against a real, published jsPsych study dataset (a mass/count language study; 3 condition folders, each with response*.csv trial data + demog*.csv demographics, exported via R). The pipeline works: it ingests, resolves columns, copies files, generates variableMeasured, and validates. But every generated project fails Psych-DS validation, and fully non-interactive runs abort at a prompt. There are three independent causes: two are bugs in this repo (1 and 3), and one is a data quirk we handle poorly (2).
Reproduced identically across all three condition folders.
1. time_elapsed is always seeded into variableMeasured → VARIABLE_MISSING_FROM_CSV_COLUMNS
VariablesMap's constructor unconditionally seeds trial_type, trial_index, and time_elapsed as variables (packages/metadata/src/VariablesMap.ts:116-126). This dataset's CSVs contain trial_type and trial_index as real columns but no time_elapsed column, so the validator flags time_elapsed as a variable present in variableMeasured but absent from every CSV.
This is the most impactful finding: it breaks validation for any jsPsych dataset whose exported CSVs don't include a time_elapsed column. Confirmed with an isolation run on a single clean response.csv (no other files) — it fails with exactly this one error and nothing else.
Suggested fix: only register trial_type / trial_index / time_elapsed when a column with that name is actually observed in the data, rather than seeding them in the constructor.
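A minimal sketch of that fix, assuming the headers of the ingested CSVs are available before any variables are registered. The names observedDefaults and JSPSYCH_DEFAULTS are hypothetical and are not the actual VariablesMap API.

```typescript
// Hypothetical sketch: these names are illustrative, not the real VariablesMap API.
const JSPSYCH_DEFAULTS: string[] = ["trial_type", "trial_index", "time_elapsed"];

/**
 * Return only the jsPsych default variables that appear as a header
 * in at least one of the ingested CSV files.
 */
function observedDefaults(headersPerFile: string[][]): string[] {
  return JSPSYCH_DEFAULTS.filter((name) =>
    headersPerFile.some((headers) => headers.indexOf(name) !== -1)
  );
}

// Headers shaped like the ones in this dataset: no time_elapsed column.
const registered = observedDefaults([
  ["subject_id", "trial_type", "trial_index", "response"],
]);
// registered holds trial_type and trial_index, but not time_elapsed.
```

With this shape, a default variable reaches variableMeasured only when some CSV actually has the column, so the validator has nothing to flag as missing.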
2. Unnamed leading column dropped silently → CSV_COLUMN_MISSING_FROM_METADATA
The R-exported files carry a row-index column, so their header starts with a bare comma (,subject_id,... → an empty-string column name). The library refuses to create a variable with an empty name and logs "Name field is missing. Variable not added" (repeated many times in verbose mode), leaving that column present in the CSV but missing from variableMeasured.
Suggested fix: detect unnamed/empty-header columns up front and either (a) drop the column with a single clear warning, or (b) synthesize a stable name. Either way, avoid producing a dataset that can't validate, and collapse the per-row warning spam into one message.
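Option (b) could look roughly like the sketch below. It assumes headers are resolved once per file before variables are created. The resolveHeaders helper and the unnamed_ prefix are hypothetical choices, not existing library behavior.

```typescript
// Hypothetical sketch: resolveHeaders and the "unnamed_" prefix are illustrative only.
interface HeaderResolution {
  names: string[];    // header names, with empty entries replaced by synthesized ones
  warnings: string[]; // at most one warning per file, not one per row
}

function resolveHeaders(rawHeaders: string[], fileName: string): HeaderResolution {
  const unnamedIndexes: number[] = [];
  const names = rawHeaders.map((raw, index) => {
    const name = raw.trim();
    if (name !== "") return name;
    unnamedIndexes.push(index);
    return "unnamed_" + index; // stable: derived from the column position
  });
  const warnings: string[] =
    unnamedIndexes.length === 0
      ? []
      : [
          fileName + ": " + unnamedIndexes.length +
            " unnamed column(s) at index " + unnamedIndexes.join(", ") +
            "; using synthesized names",
        ];
  return { names, warnings };
}

// An R-style export whose header starts with a bare comma.
const resolved = resolveHeaders(["", "subject_id", "response"], "task-resp_data.csv");
```

For the project to validate, the same synthesized name would also have to be written into the header of the copied CSV; otherwise the mismatch just moves from the metadata side to the CSV side. The sketch also does not guard against a real column already being called unnamed_0.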
3. Join-key prompt fires even in fully non-interactive runs
packages/cli/src/index.ts:737 calls the interactive join-key checkbox whenever trial_index isn't unique — and the call isn't gated by isNonInteractive. Multi-subject data always restarts trial_index at 1 per subject, so it's never unique. A run with all three flags (--psych-ds-dir, --data-dir, --metadata-options) and no TTY therefore still tries to prompt and aborts with ✘ User force closed the prompt.
Suggested fix: when the run is non-interactive (no TTY / isNonInteractive), skip the prompt and fall back to a deterministic default (e.g. append a detected unique-making column like subject_id, or proceed with a warning) instead of blocking.
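One possible deterministic fallback, sketched under the assumption that the parsed rows are in memory when the uniqueness check runs. The defaultJoinKeys helper and its candidate list are hypothetical; only trial_index and subject_id come from the dataset above.

```typescript
// Hypothetical sketch: helper names and the candidate list are illustrative only.
type Row = { [column: string]: string };

/** True when no two rows share the same combination of values for `keys`. */
function isUnique(rows: Row[], keys: string[]): boolean {
  const seen: { [composite: string]: boolean } = Object.create(null);
  for (const row of rows) {
    const composite = keys.map((k) => row[k]).join("\u0000");
    if (seen[composite]) return false;
    seen[composite] = true;
  }
  return true;
}

/**
 * Pick join keys without prompting: use `base` alone if it is unique,
 * otherwise add the first candidate column that makes it unique.
 * Returns null when nothing works, so the caller can proceed with a warning.
 */
function defaultJoinKeys(rows: Row[], base: string, candidates: string[]): string[] | null {
  if (isUnique(rows, [base])) return [base];
  for (const candidate of candidates) {
    if (isUnique(rows, [base, candidate])) return [base, candidate];
  }
  return null;
}

// Multi-subject data: trial_index restarts at 1 for each subject.
const rows: Row[] = [
  { subject_id: "s1", trial_index: "1" },
  { subject_id: "s1", trial_index: "2" },
  { subject_id: "s2", trial_index: "1" },
  { subject_id: "s2", trial_index: "2" },
];
const keys = defaultJoinKeys(rows, "trial_index", ["subject_id"]);
```

The interactive checkbox would then be reached only when the run is interactive; in the non-interactive path the result of a helper like this (or a warning when it returns null) replaces the prompt.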
Repro
# A data dir with Psych-DS-compliant filenames, e.g.:
# task-resp_data.csv (jsPsych trial data: has trial_type + trial_index, no time_elapsed)
# A skeleton Psych-DS project (dataset_description.json with empty variableMeasured)
# A metadata-options.json with name/description/author
node packages/cli/dist/cjs/index.cjs \
--psych-ds-dir <project> \
--data-dir <data> \
--metadata-options <options.json> \
--verbose
Notes
(… subject_id resolves uniqueness correctly).
… levels values came through with no encoding issues.