From fd9bfc3eaeabae623dcbdd5051c0af3d399ef605 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Fri, 19 Jun 2026 13:57:03 +0000 Subject: [PATCH] chore(release): version packages --- .changeset/boolean-no-levels.md | 5 - .changeset/cli-prompt-wording.md | 5 - .changeset/cli-psychds-ignore-raw.md | 8 - .changeset/cli-stderr-diagnostic-output.md | 5 - .changeset/cli-test-validate-psych-ds.md | 5 - .changeset/cli-tests.md | 5 - .changeset/cli-utils-data-tests.md | 5 - .changeset/csv-completeness-tests.md | 5 - .changeset/csv-parse-browser-esm.md | 5 - .changeset/deduplicate-directory-traversal.md | 5 - .changeset/deep-unnest-nested-objects.md | 5 - .changeset/description-object-to-text.md | 5 - .changeset/drop-unnamed-columns.md | 14 -- .changeset/expand-nested-json-columns.md | 6 - .../extract-object-columns-to-sidecar.md | 6 - .changeset/extract-primitive-arrays.md | 8 - .changeset/fix-always-empty-columns.md | 5 - .../fix-cli-spurious-prewrite-validation.md | 11 -- .changeset/fix-metadata-esm-entry.md | 12 -- .changeset/fix-metadata-generation-bugs.md | 18 --- .changeset/fix-mixed-type-column.md | 12 -- .changeset/fix-plugin-cache-parsing.md | 5 - .changeset/fix-validator-windows-path.md | 5 - .changeset/fix-whitespace-numeric-coercion.md | 5 - .../frontend-convert-data-to-psychds-csv.md | 13 -- .changeset/frontend-in-browser-validation.md | 5 - .changeset/json-to-csv-conversion.md | 9 -- .changeset/jsonl-ingestion.md | 17 --- .changeset/jsonl-source-record-id.md | 15 -- .changeset/lazy-system-variables.md | 7 - .changeset/metadata-shared-utils.md | 7 - .changeset/more-stress-tests.md | 12 -- .changeset/noninteractive-join-key.md | 7 - .changeset/psych-ds-validator-integration.md | 5 - .changeset/recurse-array-element-unnesting.md | 6 - .changeset/register-array-element-fields.md | 6 - .changeset/smart-rename-strategies.md | 5 - .changeset/stress-tests-in-ci.md | 13 -- .changeset/test-handlefiles.md | 
5 - .changeset/unknown-descriptions-prompt.md | 5 - .changeset/unwrap-trials-wrapper.md | 8 - .changeset/validation-exit-code.md | 5 - .changeset/webgazer-description-parsing.md | 5 - packages/cli/CHANGELOG.md | 138 ++++++++++++++++++ packages/cli/package.json | 2 +- packages/frontend/CHANGELOG.md | 76 ++++++++++ packages/frontend/package.json | 4 +- packages/metadata/CHANGELOG.md | 129 ++++++++++++++++ packages/metadata/package.json | 2 +- 49 files changed, 347 insertions(+), 329 deletions(-) delete mode 100644 .changeset/boolean-no-levels.md delete mode 100644 .changeset/cli-prompt-wording.md delete mode 100644 .changeset/cli-psychds-ignore-raw.md delete mode 100644 .changeset/cli-stderr-diagnostic-output.md delete mode 100644 .changeset/cli-test-validate-psych-ds.md delete mode 100644 .changeset/cli-tests.md delete mode 100644 .changeset/cli-utils-data-tests.md delete mode 100644 .changeset/csv-completeness-tests.md delete mode 100644 .changeset/csv-parse-browser-esm.md delete mode 100644 .changeset/deduplicate-directory-traversal.md delete mode 100644 .changeset/deep-unnest-nested-objects.md delete mode 100644 .changeset/description-object-to-text.md delete mode 100644 .changeset/drop-unnamed-columns.md delete mode 100644 .changeset/expand-nested-json-columns.md delete mode 100644 .changeset/extract-object-columns-to-sidecar.md delete mode 100644 .changeset/extract-primitive-arrays.md delete mode 100644 .changeset/fix-always-empty-columns.md delete mode 100644 .changeset/fix-cli-spurious-prewrite-validation.md delete mode 100644 .changeset/fix-metadata-esm-entry.md delete mode 100644 .changeset/fix-metadata-generation-bugs.md delete mode 100644 .changeset/fix-mixed-type-column.md delete mode 100644 .changeset/fix-plugin-cache-parsing.md delete mode 100644 .changeset/fix-validator-windows-path.md delete mode 100644 .changeset/fix-whitespace-numeric-coercion.md delete mode 100644 .changeset/frontend-convert-data-to-psychds-csv.md delete mode 100644 
.changeset/frontend-in-browser-validation.md delete mode 100644 .changeset/json-to-csv-conversion.md delete mode 100644 .changeset/jsonl-ingestion.md delete mode 100644 .changeset/jsonl-source-record-id.md delete mode 100644 .changeset/lazy-system-variables.md delete mode 100644 .changeset/metadata-shared-utils.md delete mode 100644 .changeset/more-stress-tests.md delete mode 100644 .changeset/noninteractive-join-key.md delete mode 100644 .changeset/psych-ds-validator-integration.md delete mode 100644 .changeset/recurse-array-element-unnesting.md delete mode 100644 .changeset/register-array-element-fields.md delete mode 100644 .changeset/smart-rename-strategies.md delete mode 100644 .changeset/stress-tests-in-ci.md delete mode 100644 .changeset/test-handlefiles.md delete mode 100644 .changeset/unknown-descriptions-prompt.md delete mode 100644 .changeset/unwrap-trials-wrapper.md delete mode 100644 .changeset/validation-exit-code.md delete mode 100644 .changeset/webgazer-description-parsing.md diff --git a/.changeset/boolean-no-levels.md b/.changeset/boolean-no-levels.md deleted file mode 100644 index 885a9ca..0000000 --- a/.changeset/boolean-no-levels.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -Boolean variables no longer record `levels`. Genuine boolean values (`typeof === "boolean"`) are typed `value:"boolean"` with no `levels`/`minValue`/`maxValue`, and string `"true"`/`"false"` values are kept as strings so they surface as `levels: ["true","false"]` (no longer coerced to boolean). A manual `value:"boolean"` override now drops any detected levels and warns when the detected values don't map cleanly to true/false (anything other than `true`/`false`/`0`/`1`). This also fixes a bug where raw booleans were pushed into the `levels` array, producing inconsistent `[false]`/empty output. 
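
The boolean rule in this changeset can be illustrated with a minimal sketch. The helper name `inferBooleanOrString` and the `Inferred` type are hypothetical, introduced only for illustration; they are not the actual `@jspsych/metadata` implementation, and the handling of columns that mix booleans with strings is an assumption.

```typescript
// Hypothetical helper illustrating the rule described above; the names are
// assumptions, not the actual @jspsych/metadata API.
type Inferred =
  | { value: "boolean" }
  | { value: "string"; levels: string[] };

function inferBooleanOrString(cells: Array<boolean | string>): Inferred {
  // Genuine booleans (typeof === "boolean") are typed "boolean" and never
  // record levels, minValue, or maxValue.
  if (cells.length > 0 && cells.every((c) => typeof c === "boolean")) {
    return { value: "boolean" };
  }
  // String "true"/"false" cells are kept as strings and surface as levels.
  // (Assumption: anything else in the column is stringified as well.)
  const levels = Array.from(new Set(cells.map((c) => String(c))));
  return { value: "string", levels };
}
```

Under these assumptions, `[true, false]` yields `{ value: "boolean" }` with no `levels`, while `["true", "false"]` yields a string variable whose `levels` are `["true", "false"]`.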
diff --git a/.changeset/cli-prompt-wording.md b/.changeset/cli-prompt-wording.md deleted file mode 100644 index 05a2af4..0000000 --- a/.changeset/cli-prompt-wording.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -Improve CLI prompt wording for clarity. Messages, choice labels, descriptions, and error text have been rewritten to use plain language, avoid jargon, and be more actionable for researchers unfamiliar with Psych-DS terminology. diff --git a/.changeset/cli-psychds-ignore-raw.md b/.changeset/cli-psychds-ignore-raw.md deleted file mode 100644 index 5898e1e..0000000 --- a/.changeset/cli-psychds-ignore-raw.md +++ /dev/null @@ -1,8 +0,0 @@ ---- -"@jspsych/metadata": patch -"@jspsych/metadata-cli": patch ---- - -The CLI now writes a `.psychds-ignore` at the dataset root when it preserves raw jsPsych originals under `data/raw/`, so the validator no longer flags them as `FILE_NOT_CHECKED`. This mirrors the behavior the frontend already had. - -The `.psychds-ignore` filename and content (`**/raw/` plus a self-reference, dictated by validator quirks) are now exported from `@jspsych/metadata` as `PSYCHDS_IGNORE_FILENAME` and `PSYCHDS_IGNORE_CONTENT`, so the CLI and frontend share one definition instead of duplicating the literal string. diff --git a/.changeset/cli-stderr-diagnostic-output.md b/.changeset/cli-stderr-diagnostic-output.md deleted file mode 100644 index 66d86d5..0000000 --- a/.changeset/cli-stderr-diagnostic-output.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -Send Psych-DS validation errors and warnings to stderr. The failure header and error details use console.error; warning details and the verbose hint use console.warn. The success line remains on stdout. 
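
The stream routing described in this changeset might look roughly like the sketch below. The `ValidationReport` shape, the `Sink` interface, the `printReport` name, and the success message text are all hypothetical; only the split between `console.error`, `console.warn`, and stdout comes from the changeset.

```typescript
// Hypothetical report shape and printer; a sketch of the routing only.
interface ValidationReport {
  errors: string[];
  warnings: string[];
}

// Minimal sink so the routing can be exercised without touching real streams.
interface Sink {
  log(message: string): void;
  warn(message: string): void;
  error(message: string): void;
}

function printReport(report: ValidationReport, out: Sink = console): void {
  if (report.errors.length > 0) {
    // Failure header and error details go to stderr via console.error.
    out.error("✘ Psych-DS validation failed");
    for (const e of report.errors) out.error(`  ${e}`);
  } else {
    // The success line stays on stdout (message text is an assumption).
    out.log("Psych-DS validation passed");
  }
  // Warning details go to stderr via console.warn.
  for (const w of report.warnings) out.warn(`  ${w}`);
}
```

Keeping the success line on stdout means a caller that pipes stdout elsewhere still sees diagnostics on the terminal.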
diff --git a/.changeset/cli-test-validate-psych-ds.md b/.changeset/cli-test-validate-psych-ds.md deleted file mode 100644 index 7e797df..0000000 --- a/.changeset/cli-test-validate-psych-ds.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -Add unit tests for validatePsychDS covering clean pass, errors, warnings (verbose and non-verbose), plural forms, and validator-throws scenarios. diff --git a/.changeset/cli-tests.md b/.changeset/cli-tests.md deleted file mode 100644 index d16c64e..0000000 --- a/.changeset/cli-tests.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -Add Jest test infrastructure and tests for the CLI package. Tests cover `utils.ts`, `validatefunctions.ts`, and `data.ts` (27 tests). Also modernizes `saveTextToPath` from a fire-and-forget callback to an `async` function returning `Promise`. diff --git a/.changeset/cli-utils-data-tests.md b/.changeset/cli-utils-data-tests.md deleted file mode 100644 index a095355..0000000 --- a/.changeset/cli-utils-data-tests.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -Add unit tests for `preAnalyzeDirectory` in `data.ts`, covering unreadable directories, JSON and CSV duplicate detection, ignored files, worst-file selection, one-subdirectory-deep traversal, and custom join keys. diff --git a/.changeset/csv-completeness-tests.md b/.changeset/csv-completeness-tests.md deleted file mode 100644 index fbcfee4..0000000 --- a/.changeset/csv-completeness-tests.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -Add tests verifying variableMeasured completeness for CSV input. Covers always-empty columns, null-string columns, partially-empty columns, and sparse multi-trial-type CSVs where different trial types populate different columns. 
diff --git a/.changeset/csv-parse-browser-esm.md b/.changeset/csv-parse-browser-esm.md deleted file mode 100644 index 69bb961..0000000 --- a/.changeset/csv-parse-browser-esm.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -Fix stray empty-string expression in parseCSV and remove stale tsconfig paths entry for csv-parse/browser/esm (was pointing to a non-existent path in the installed csv-parse version). diff --git a/.changeset/deduplicate-directory-traversal.md b/.changeset/deduplicate-directory-traversal.md deleted file mode 100644 index 4254d38..0000000 --- a/.changeset/deduplicate-directory-traversal.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -Deduplicate directory traversal logic in data.ts. Extracts a shared `collectDataFiles` helper used by `processDirectory`, `enumerateDataFiles`, and `preAnalyzeDirectory`, replacing three near-identical implementations of the top-level + one-subdir-deep walk. Behavior is preserved: `processDirectory` still sorts `dataset_description.json` first and counts directory read errors as failures. Diagnostics (the "can only read subdirectories one level deep" warning and directory-read errors) are gated behind a `warn` flag that only `processDirectory` sets, so the silent pre-passes (`enumerateDataFiles`, `preAnalyzeDirectory`) don't duplicate warnings the user already sees once on the same directory in the same run. diff --git a/.changeset/deep-unnest-nested-objects.md b/.changeset/deep-unnest-nested-objects.md deleted file mode 100644 index cbe32dd..0000000 --- a/.changeset/deep-unnest-nested-objects.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata": minor ---- - -Recursively expand nested JSON objects more than one level deep. Previously `expandObjectFields` only expanded a single level, so a value like `response: {"Q0":{"score":4,"meta":{"valid":true}}}` registered `response.Q0` as an opaque `value:"object"` leaf and lost its sub-fields. 
Now nested plain objects are fully expanded into dotted sub-variables (`response.Q0.score`, `response.Q0.meta.valid`) with correct types and min/max/levels tracking at any depth. Arrays nested inside objects are now correctly typed as `value:"array"` instead of `"object"`, and nested arrays-of-objects are extracted into their own Psych-DS CSV files keyed by their dotted column name — mirroring how top-level array columns are handled. diff --git a/.changeset/description-object-to-text.md b/.changeset/description-object-to-text.md deleted file mode 100644 index 6c05ea8..0000000 --- a/.changeset/description-object-to-text.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -`variableMeasured.description` is now always serialized as a single schema.org Text value. When a column accumulated genuinely different descriptions from multiple plugins, `getList()` previously emitted `description` as an object (`{ pluginType: text }`), which made the Psych-DS validator raise an `OBJECT_TYPE_MISSING` warning. The distinct descriptions are now joined into one string with `" | "`. `getList()` is also idempotent now (a second call no longer mangles an already-collapsed string description), and empty descriptions collapse to `"unknown"`. diff --git a/.changeset/drop-unnamed-columns.md b/.changeset/drop-unnamed-columns.md deleted file mode 100644 index c7a270f..0000000 --- a/.changeset/drop-unnamed-columns.md +++ /dev/null @@ -1,14 +0,0 @@ ---- -"@jspsych/metadata": patch -"@jspsych/metadata-cli": patch -"frontend": patch ---- - -Drop unnamed columns so R-exported datasets validate. R's `write.csv` (with the default `row.names = TRUE`) prepends an unnamed row-index column, so the exported CSV header starts with a bare comma — an empty-string column name. Psych-DS variables require a name, so the column can never appear in `variableMeasured`; left in the on-disk CSV it fails validation with `CSV_COLUMN_MISSING_FROM_METADATA`. 
- -The strip now lives in the shared data-file path so the CLI and frontend behave identically: - -- `generate()` strips empty/whitespace-only columns from the parsed data up front, with a single warning instead of per-row spam (keeps `variableMeasured` clean and standalone library use safe), via a new exported `stripUnnamedColumns` helper. -- `buildPsychDSDataFiles` strips the main table before emitting it: a clean CSV keeps its exact bytes (verbatim `mainContent`), while a file with an unnamed column is re-serialised from the cleaned rows. Both the CLI (rename-plan and non-plan paths) and the frontend feed parsed `mainRows`, so the written/zipped/validated CSV always matches the metadata. - -Fixes finding #2 of #109. diff --git a/.changeset/expand-nested-json-columns.md b/.changeset/expand-nested-json-columns.md deleted file mode 100644 index 07a676b..0000000 --- a/.changeset/expand-nested-json-columns.md +++ /dev/null @@ -1,6 +0,0 @@ ---- -"@jspsych/metadata": minor -"@jspsych/metadata-cli": minor ---- - -Detect and expand JSON-serialized nested columns in `generate()`. Flat JSON objects (e.g. `response: {"Q0":4,"Q1":3}`) are expanded into dotted sub-variables (`response.Q0`, `response.Q1`) in `variableMeasured` with correct types and min/max tracking. JSON arrays of objects are extracted into separate Psych-DS compliant CSV files (`{stem}_measure-{col}_data.csv`) with `trial_index` and `element_index` as join keys. diff --git a/.changeset/extract-object-columns-to-sidecar.md b/.changeset/extract-object-columns-to-sidecar.md deleted file mode 100644 index 5d7d5f6..0000000 --- a/.changeset/extract-object-columns-to-sidecar.md +++ /dev/null @@ -1,6 +0,0 @@ ---- -"@jspsych/metadata": minor -"@jspsych/metadata-cli": minor ---- - -Extract plain (non-array) object columns into separate Psych-DS CSV files so their expanded sub-variables resolve to real columns. `expandObjectFields` registers dotted sub-variables for object columns (e.g. 
`response.cb_1`, `calibration_data.type`), but those names previously had no corresponding CSV column, so Psych-DS validation reported `VARIABLE_MISSING_FROM_CSV_COLUMNS` for every one. Object columns are now accumulated into a new `extractedObjects` map (exposed via `getExtractedObjects()`) as one row per trial, and the CLI writes a per-file sidecar CSV (`{stem}_measure-{col}_data.csv`) — mirroring the existing array-of-objects extraction. The row is threaded through the recursive expansion so a column is recorded for every registered descendant (leaf scalars, intermediate object nodes, and nested-array parents), and it reuses the same configurable `arrayJoinKeys` (one row per trial, no `element_index`). diff --git a/.changeset/extract-primitive-arrays.md b/.changeset/extract-primitive-arrays.md deleted file mode 100644 index 8a4088a..0000000 --- a/.changeset/extract-primitive-arrays.md +++ /dev/null @@ -1,8 +0,0 @@ ---- -"@jspsych/metadata": minor -"@jspsych/metadata-cli": minor ---- - -Extract arrays of primitives into sidecar CSVs so their elements become real, typed variables. Previously an array of numbers or strings (`block_order: [16,100,4,1]`, `images: [...]`) was recorded only as a single `value:"array"` column with no per-element detail. Such arrays are now extracted like arrays-of-objects, but — since primitives have no field name — each element is recorded under a synthetic `.value` column (distinct from the array parent, which stays `value:"array"`). The element variable gets its proper type with `minValue`/`maxValue` (numeric) or `levels` (string), joinable to its row via the existing join keys + `element_index`. This composes with the nested-array recursion (an array of arrays of numbers yields a grandchild table with a `.value` column) and completes Psych-DS round-tripping for all four cell shapes: scalar, object, array-of-objects, and array-of-primitives. 
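
As a rough sketch of the primitive-array extraction described above: the helper `extractPrimitiveArray` and its row types are hypothetical, and naming the synthetic column `<column>.value` is an assumption about how the `.value` suffix is applied. Only the join keys plus `element_index` layout is taken from the changeset.

```typescript
// Hypothetical helper; a sketch, not the library's implementation.
type Primitive = string | number | boolean;
type SidecarRow = Record<string, Primitive>;

function extractPrimitiveArray(
  rows: Array<Record<string, unknown>>,
  column: string,
  joinKeys: string[] = ["trial_index"],
): SidecarRow[] {
  const out: SidecarRow[] = [];
  for (const row of rows) {
    const cell = row[column];
    // Rows whose cell is not an array contribute nothing to the sidecar.
    if (!Array.isArray(cell)) continue;
    cell.forEach((element, elementIndex) => {
      const sidecar: SidecarRow = {};
      // Copy the join keys so each element can be joined back to its row.
      for (const key of joinKeys) sidecar[key] = row[key] as Primitive;
      sidecar["element_index"] = elementIndex;
      // Assumption: the synthetic element column is "<column>.value".
      sidecar[`${column}.value`] = element as Primitive;
      out.push(sidecar);
    });
  }
  return out;
}
```

For a row such as `{ trial_index: 0, block_order: [16, 100] }`, this sketch produces two sidecar rows, one per element, each carrying `trial_index`, `element_index`, and `block_order.value`.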
- -Tradeoff: every non-empty primitive-array column now produces its own sidecar CSV, so datasets with many such columns generate substantially more files (e.g. one eye-tracking export grew from 304 to 380 data files). Extraction is the default and there is no new prompt. A future opt-in `primitiveArrayMode: "extract" | "summarize"` could offer an in-place summary alternative, but is intentionally not added here to avoid complicating the CLI flow. diff --git a/.changeset/fix-always-empty-columns.md b/.changeset/fix-always-empty-columns.md deleted file mode 100644 index 942fa45..0000000 --- a/.changeset/fix-always-empty-columns.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -Fix always-empty columns being silently dropped from variableMeasured. Columns whose values are null or empty across all rows in a dataset now appear in variableMeasured with a minimal `"value": "unknown"` entry, satisfying the Psych-DS requirement that every CSV column header has a corresponding entry. diff --git a/.changeset/fix-cli-spurious-prewrite-validation.md b/.changeset/fix-cli-spurious-prewrite-validation.md deleted file mode 100644 index 879f67a..0000000 --- a/.changeset/fix-cli-spurious-prewrite-validation.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -fix(cli): don't print a spurious validation failure for existing projects - -When opening an existing project, validation ran before the data files were -copied into the project, so it always failed with `MISSING_DATA_DIRECTORY` and -printed a misleading `✘ Psych-DS validation failed` to stderr even when the final -output was valid. Removed that pre-write call; the post-write validation that -actually gates the result is unchanged. 
diff --git a/.changeset/fix-metadata-esm-entry.md b/.changeset/fix-metadata-esm-entry.md deleted file mode 100644 index a31d69d..0000000 --- a/.changeset/fix-metadata-esm-entry.md +++ /dev/null @@ -1,12 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -fix(metadata): make the Node ESM entry (`dist/index.js`) loadable - -The build runs esbuild (which emits the bundled `dist/index.js`) followed by -`tsc`. With `declaration: true` and `outDir: ./dist` but no `emitDeclarationOnly`, -`tsc` re-emitted an unbundled `dist/index.js` over esbuild's bundle, leaving -extensionless relative imports (e.g. `./utils`) that Node's ESM loader rejects. -Added `emitDeclarationOnly: true` so `tsc` emits only the `.d.ts` declarations and -esbuild's working bundle survives; type-checking and `dist/index.d.ts` are unchanged. diff --git a/.changeset/fix-metadata-generation-bugs.md b/.changeset/fix-metadata-generation-bugs.md deleted file mode 100644 index 9e90894..0000000 --- a/.changeset/fix-metadata-generation-bugs.md +++ /dev/null @@ -1,18 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -fix(metadata): preserve string descriptions and primitive column types across generate() calls - -Two related bugs fixed in metadata generation: - -1. **String descriptions wiped on re-generate** — `VariablesMap.updateDescription` previously - replaced any non-object description with `{}` before merging, discarding user-written - descriptions loaded from an existing `dataset_description.json`. Non-object descriptions - are now promoted to `{ default: string }` so they survive subsequent `generate()` calls. - -2. **Mixed-type column typed as "array" instead of "string"** — When a column's rows contain - a mix of primitive values and arrays/objects (e.g. a `response` column with keyboard-trial - strings and survey-trial objects), later rows previously overwrote the column type to - `"array"`. 
The array-type override now only fires when the existing type is not already a - concrete primitive (`"string"`, `"number"`, or `"boolean"`). diff --git a/.changeset/fix-mixed-type-column.md b/.changeset/fix-mixed-type-column.md deleted file mode 100644 index 1753286..0000000 --- a/.changeset/fix-mixed-type-column.md +++ /dev/null @@ -1,12 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -fix(metadata): treat mixed-type columns as categorical, not numeric+categorical - -A column containing both numeric and non-numeric values previously produced -contradictory metadata: `value: "number"` alongside both `minValue`/`maxValue` -and `levels`. The fix decides at the cell level — once a non-numeric value -arrives in a column that had numeric min/max (or vice versa), the column is -downgraded to categorical: min/max fields are removed, boundary values are -preserved as string levels, and a `console.warn` is emitted once per column. diff --git a/.changeset/fix-plugin-cache-parsing.md b/.changeset/fix-plugin-cache-parsing.md deleted file mode 100644 index 19db901..0000000 --- a/.changeset/fix-plugin-cache-parsing.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -Fix `PluginCache` parsing errors for standard and custom jsPsych plugins. The data block was extracted with a lazy regex that overshot into the rest of the info object; replaced with brace-counting extraction that handles any nesting depth. Non-ok HTTP responses (e.g. 404 for unknown plugins) are now caught before reaching the parser rather than passing HTML error pages as source code. Additionally, JSDoc descriptions for parameters inside a `nested:` sub-object (e.g. `view_history`'s `page_index` and `viewing_time` in `jsPsych-instructions`) are now correctly extracted; previously the first nested parameter was silently consumed by the parent variable's regex match and never added to the cache. 
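
The brace-counting extraction mentioned for `PluginCache` can be sketched as follows. `extractBraceBlock` is a hypothetical name, and for brevity this version does not skip braces that appear inside string literals or comments, which a real parser of plugin source would likely need to handle.

```typescript
// Hypothetical helper illustrating brace-counting extraction.
// Simplification: braces inside strings or comments are not special-cased.
function extractBraceBlock(source: string, openIndex: number): string | null {
  if (source[openIndex] !== "{") return null;
  let depth = 0;
  for (let i = openIndex; i < source.length; i++) {
    const ch = source[i];
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      // Depth returning to zero marks the brace matching the opening one.
      if (depth === 0) return source.slice(openIndex, i + 1);
    }
  }
  return null; // Unbalanced input: no matching close brace was found.
}
```

Unlike a regex, the counter stops at the matching close brace regardless of nesting depth, so it cannot run on into the rest of the enclosing object.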
diff --git a/.changeset/fix-validator-windows-path.md b/.changeset/fix-validator-windows-path.md deleted file mode 100644 index ff6018a..0000000 --- a/.changeset/fix-validator-windows-path.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -Fix Psych-DS validation always failing on Windows. The relative path passed to the validator contained backslashes on Windows, which the validator could not resolve — causing spurious MISSING_DATAFILE and MISSING_DATASET_DESCRIPTION errors even when the project was generated correctly. Normalize path separators to forward slashes before validation. diff --git a/.changeset/fix-whitespace-numeric-coercion.md b/.changeset/fix-whitespace-numeric-coercion.md deleted file mode 100644 index 2349d1b..0000000 --- a/.changeset/fix-whitespace-numeric-coercion.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -Fix whitespace-only string values being misdetected as numeric (#70). A cell containing only whitespace (e.g. a single space) passed the `isNaN(Number(value))` check because `Number(" ")` is `0`, but `parseFloat(" ")` is `NaN` — leaking through as `NaN` `minValue`/`maxValue` (serialized to `null`) on otherwise-categorical string columns. The numeric check now requires non-empty trimmed content and uses `Number` for both the test and the conversion so they cannot disagree. diff --git a/.changeset/frontend-convert-data-to-psychds-csv.md b/.changeset/frontend-convert-data-to-psychds-csv.md deleted file mode 100644 index 622400a..0000000 --- a/.changeset/frontend-convert-data-to-psychds-csv.md +++ /dev/null @@ -1,13 +0,0 @@ ---- -"@jspsych/metadata": minor -"frontend": minor -"@jspsych/metadata-cli": patch ---- - -Convert uploaded JSON data to Psych-DS CSV in the frontend so datasets validate instead of failing with `MISSING_DATAFILE`. 
- -Previously the frontend placed uploaded jsPsych JSON files into `data/` unchanged, so the in-browser validator (and the downloadable zip) always failed — Psych-DS only recognises CSV/TSV datafiles whose names match its keyword pattern. - -- `@jspsych/metadata` gains two shared, filesystem-agnostic helpers, `buildPsychDSDataFiles` and `deriveFallbackBase`, that turn a parsed data file (plus any extracted nested array/object columns) into its set of Psych-DS-named CSV outputs. Used by both the CLI and the frontend so the conversion lives in one place. -- The frontend's Data step now builds a converted `data/` payload during generation — a compliant main CSV, one sidecar per nested array/object column, and the original JSON preserved under `data/raw/` — and Review uses it for both validation and the zip. Auto-derived filenames use the official `subject` keyword (`subject-`) to avoid the unofficial-keyword warning, and a `.psychds-ignore` is emitted so the preserved `data/raw/` originals don't surface as `FILE_NOT_CHECKED`. -- The CLI's non-rename-plan conversion path now delegates to the shared `buildPsychDSDataFiles`. No behaviour change. diff --git a/.changeset/frontend-in-browser-validation.md b/.changeset/frontend-in-browser-validation.md deleted file mode 100644 index 746bda0..0000000 --- a/.changeset/frontend-in-browser-validation.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"frontend": minor ---- - -Add in-browser Psych-DS validation to the Review step. A "Validate dataset" button runs the official `psychds-validator` web bundle directly in the browser against the generated `dataset_description.json` and the uploaded data files, showing a pass/error/warning report inline instead of only pointing users to the CLI. The validator bundle is code-split and lazy-loaded on first use, and the command-line instructions remain available as a fallback. 
diff --git a/.changeset/json-to-csv-conversion.md b/.changeset/json-to-csv-conversion.md deleted file mode 100644 index ca96fad..0000000 --- a/.changeset/json-to-csv-conversion.md +++ /dev/null @@ -1,9 +0,0 @@ ---- -"@jspsych/metadata-cli": minor ---- - -Convert jsPsych JSON data files to CSV and normalize all generated data filenames to the Psych-DS `[keyword-value_]+data.csv` pattern, so generated projects pass the Psych-DS validator. - -- Each `.json` data file is converted to a `.csv` in `data/` (nested objects/arrays serialized as JSON strings via `objectsToCSV`, so no data is lost), with the untouched original preserved under `data/raw/`. The project scaffold creates the `data/raw/` directory, and `dataset_description.json` is left untouched. -- Output filenames follow the Psych-DS naming pattern. Already-compliant names are kept; for non-compliant ones the CLI prompts once for a keyword (official keywords offered to avoid validator warnings; custom keywords allowed), with the file's current name becoming the value (camelCased, since Psych-DS values forbid hyphens/underscores). The same normalized base names the converted/copied CSV and its extracted-array CSVs, fixing previously invalid extracted-array names. -- Same-named source files from different subdirectories are kept and disambiguated with a validator-safe counter — both the CSV in `data/` and the original in `data/raw/` — instead of being skipped or silently overwritten. Non-interactive runs fail with a clear message rather than inventing a keyword. diff --git a/.changeset/jsonl-ingestion.md b/.changeset/jsonl-ingestion.md deleted file mode 100644 index b92ce8e..0000000 --- a/.changeset/jsonl-ingestion.md +++ /dev/null @@ -1,17 +0,0 @@ ---- -"@jspsych/metadata": patch -"@jspsych/metadata-cli": patch -"frontend": patch ---- - -Accept JSON-Lines (JSONL) experiment data, not just a single JSON array. 
Several jsPsych labs — and JATOS exports — write data as newline-delimited JSON, with one JSON value per line (typically one participant's full trial array per line) rather than one big array. Previously `generate()` ran `JSON.parse` on the whole string, so every such file failed with `Unexpected non-whitespace character after JSON` and produced no metadata. - -A new exported `parseJsonData` helper handles both shapes: a well-formed single document is returned unchanged (no behaviour change for existing single-array callers), and only when whole-string parsing fails does it fall back to parsing line by line, flattening any per-line arrays into one observation stream. It is now used wherever JSON data files are parsed: - -- `generate()` (the library) for the main ingestion path. -- the CLI's data-file reader, join-key pre-pass, and CSV-conversion path. -- the frontend's join-key pre-flight and Psych-DS file builder. - -The `.jsonl` file extension is now also recognised as a JSON data file (these exports are conventionally named `.jsonl`). The CLI processes `.jsonl` exactly like `.json` — including filename-normalization, raw-original preservation, and CSV conversion — and the frontend normalises a `.jsonl` upload to the JSON path. - -Verified end to end against the raw `.jsonl` exports in `vucml/online_experiments`: all 15 files now generate metadata and pass the Psych-DS validator with zero errors (they failed at parse time before). diff --git a/.changeset/jsonl-source-record-id.md b/.changeset/jsonl-source-record-id.md deleted file mode 100644 index abc77b3..0000000 --- a/.changeset/jsonl-source-record-id.md +++ /dev/null @@ -1,15 +0,0 @@ ---- -"@jspsych/metadata": patch -"@jspsych/metadata-cli": patch -"frontend": patch ---- - -Synthesize a `source_record_id` join key for multi-record JSON-Lines exports. 
Raw jsPsych exports carry no per-row identifier, so once JSONL is flattened (one record per line) `trial_index` repeats across records and can't uniquely key the extracted array/object sidecar CSVs — every record's trial 0 collapsed onto the same `(trial_index, element_index)` key, making the sidecars impossible to join back to a single parent trial. - -The synthesized column is named `source_record_id` rather than `participant_id` because a JSON-Lines line is only guaranteed to be one *source record* — usually, but not always, one participant. The honest name avoids overclaiming for exports where a line isn't a single subject. - -`parseJsonData` now takes an opt-in `{ tagSourceRecordId }` flag: in the JSON-Lines path it stamps each line's object rows with a 0-based `source_record_id` (a no-op on the single-array fast path), and reports via an optional `stats` out-param whether it actually synthesized the id. A line that already carries a `source_record_id` or a real `participant_id` is left untouched — the experiment's own identifier already groups those rows. `generate()` enables this for JSON input and promotes the identifier to the leading join key, preferring the synthesized `source_record_id` and falling back to a real `participant_id` already present in the export (`['source_record_id', 'trial_index']` or `['participant_id', 'trial_index']`), so the sidecars join unambiguously. CSV inputs are unaffected. - -When — and only when — the id was actually synthesized (i.e. absent from the source), it is given an explicit description that makes its synthetic origin unmistakable ("Synthetic source-record identifier … NOT a real subject ID from the experiment …") so a downstream user can't mistake it for a real subject ID; this also avoids serializing an empty `{}` description (an object with no `@type`, which trips the validator's `OBJECT_TYPE_MISSING`). 
The CLI's join-key pre-analysis/prompt and the frontend's pre-flight mirror this promotion so multi-record JSONL is no longer falsely flagged as having a non-unique join key. - -Verified end to end against the raw `.jsonl` exports in `vucml/online_experiments` (`block_cat`): the combined 30-record export generates metadata, passes the Psych-DS validator (0 errors), synthesizes `source_record_id` 0–29, and writes sidecars whose `(source_record_id, trial_index, element_index)` keys are fully unique — including the doubly-nested `recall_responses` case. Notably `subjectId` collides across the two merged datasets (two records share `601`), which `source_record_id` correctly keeps distinct. diff --git a/.changeset/lazy-system-variables.md b/.changeset/lazy-system-variables.md deleted file mode 100644 index cbdee36..0000000 --- a/.changeset/lazy-system-variables.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -Register jsPsych system variables (`trial_type`, `trial_index`, `time_elapsed`, `extension_type`, `extension_version`) lazily instead of seeding them in the `VariablesMap` constructor. They now appear in `variableMeasured` only when their column is actually present in the data. Previously `time_elapsed` (and the others) were always emitted, so any dataset whose CSVs omit `time_elapsed` — common for processed/aggregated jsPsych exports — failed Psych-DS validation with `VARIABLE_MISSING_FROM_CSV_COLUMNS`. Datasets that do contain these columns are unaffected. - -This also removes the eager `generateDefaultExtensionVariables()` seeding path, which registered both `extension_type` and `extension_version` whenever `extension_type` was observed — orphaning `extension_version` for any dataset that lacked that column. The extension variables now register lazily per-column like the other system variables. 
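The lazy registration described in the `lazy-system-variables` changeset can be pictured roughly as follows. This is a minimal sketch and not the package's `VariablesMap` code: the class `LazyVariablesSketch` and its methods are hypothetical, and only the five system variable names are taken from the changeset text.

```typescript
// System variable names as listed in the changeset.
const SYSTEM_VARIABLES: string[] = [
  "trial_type",
  "trial_index",
  "time_elapsed",
  "extension_type",
  "extension_version",
];

// Hypothetical stand-in for a variables registry (assumption, not the real class).
class LazyVariablesSketch {
  // Nothing is seeded up front; names are added only when a column is observed.
  private readonly names: string[] = [];

  observeRow(row: Record<string, unknown>): void {
    for (const column of Object.keys(row)) {
      if (this.names.indexOf(column) === -1) {
        this.names.push(column);
      }
    }
  }

  // Every variable seen so far, in first-seen order.
  variableNames(): string[] {
    return this.names.slice();
  }

  // Only the system variables whose columns were actually present in the data.
  systemVariableNames(): string[] {
    return this.names.filter((n) => SYSTEM_VARIABLES.indexOf(n) !== -1);
  }
}

const vars = new LazyVariablesSketch();
vars.observeRow({ trial_type: "html-keyboard-response", trial_index: 0, rt: 512 });
console.log(vars.systemVariableNames());
```

Because `time_elapsed` never appears in the observed row, the sketch never lists it, which is the property that lets datasets without that column avoid `VARIABLE_MISSING_FROM_CSV_COLUMNS`.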
diff --git a/.changeset/metadata-shared-utils.md b/.changeset/metadata-shared-utils.md deleted file mode 100644 index 0e6ed38..0000000 --- a/.changeset/metadata-shared-utils.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -"@jspsych/metadata": minor ---- - -Export Psych-DS utility functions from the core package: `isValidPsychDSDataFilename`, `toPsychDSValue`, `deriveArrayFilename`, `objectsToCSV`, `disambiguateArrayFilename`. Previously these lived only in the CLI. Moving them to core makes them available to any downstream consumer (e.g. the frontend) and ensures the CLI and any future tools share a single implementation. - -The CLI now imports these functions from `@jspsych/metadata` instead of defining them locally. No behaviour change. diff --git a/.changeset/more-stress-tests.md b/.changeset/more-stress-tests.md deleted file mode 100644 index c47a272..0000000 --- a/.changeset/more-stress-tests.md +++ /dev/null @@ -1,12 +0,0 @@ ---- -"@jspsych/metadata": patch -"@jspsych/metadata-cli": patch ---- - -Extend the stress-test regression guards with three more Jest suites covering the CSV ingestion path, generation at scale, and cross-file output-name collisions. - -- `@jspsych/metadata` — `csv-input.stress`: pins how `generate(data, {}, "csv")` re-infers types from string cells (numeric coercion incl. whitespace/scientific-notation/`Infinity`/`NaN` rejection, mixed-column downgrade, `"true"`/`"false"` staying categorical, RFC-4180 quoting, unicode, empty/literal-`null` cells, the 50-char level cap, JSON-in-a-cell extraction), and asserts CSV/JSON parity for unambiguously-typed columns. -- `@jspsych/metadata` — `scale.stress`: feeds a 5,000-row dataset and checks exact numeric extremes, categorical dedup, high-cardinality level accumulation, boolean handling, and a throughput ceiling that guards against accidental O(n²) regressions. 
-- `@jspsych/metadata-cli` — `array-collision.stress`: two same-stem files in different subdirectories sharing a nested array column, asserting `processDirectory` disambiguates every main CSV, sidecar, and preserved raw original (no overwrites, all still Psych-DS compliant) — the cross-file collision gap left by the earlier rename suite. - -Test-only change; no library or CLI behavior is modified. diff --git a/.changeset/noninteractive-join-key.md b/.changeset/noninteractive-join-key.md deleted file mode 100644 index 67e1378..0000000 --- a/.changeset/noninteractive-join-key.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -Don't block non-interactive runs on the join-key prompt. When `trial_index` isn't unique (the norm for multi-subject data, where it restarts per subject), the CLI previously always opened an interactive checkbox to pick additional join keys — even in a fully-flagged headless run (`--psych-ds-dir` + `--data-dir` + `--metadata-options`, no TTY), which aborted with `✘ User force closed the prompt`. The prompt is now gated on having a terminal; without one, join keys are resolved deterministically via `resolveJoinKeysNonInteractive` (add a sufficient single column, else a minimal sufficient combination, else proceed with a warning that extracted CSVs may contain duplicate rows). Fixes finding #3 of #109. - -Also hardens the rest of the non-interactive path so that "no terminal ⇒ never prompt" holds universally, not just when all three flags are supplied. The remaining prompts (metadata-options fallback, unknown-variable descriptions, missing-required-field loop) now gate on `canPrompt` rather than the flag-only `isNonInteractive`, so a no-TTY run that omits `--metadata-options` falls back to generated defaults with a notice instead of aborting with `✘ User force closed the prompt`. 
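The deterministic fallback described above (a sufficient single column, else a minimal sufficient combination, else proceed with a warning) might look roughly like the sketch below. It is an illustration under assumptions, not the CLI's `resolveJoinKeysNonInteractive`: the function names, the return shape, and the choice to append extra keys after the base keys are all hypothetical.

```typescript
type Row = Record<string, unknown>;

// True when the tuple of values under `keys` is distinct for every row.
function isUniqueKey(rows: Row[], keys: string[]): boolean {
  const seen: Record<string, boolean> = {};
  for (const row of rows) {
    const tuple = JSON.stringify(keys.map((k) => row[k]));
    if (seen[tuple]) return false;
    seen[tuple] = true;
  }
  return true;
}

// All subsets of `items` with exactly `size` elements, in input order.
function combinations(items: string[], size: number): string[][] {
  if (size === 0) return [[]];
  if (items.length < size) return [];
  const head = items[0];
  const rest = items.slice(1);
  const withHead = combinations(rest, size - 1).map((c) => [head].concat(c));
  return withHead.concat(combinations(rest, size));
}

// Hypothetical resolver: tries the smallest additions first.
function resolveJoinKeysSketch(
  rows: Row[],
  baseKeys: string[],
  candidates: string[]
): { keys: string[]; unique: boolean } {
  if (isUniqueKey(rows, baseKeys)) return { keys: baseKeys, unique: true };
  // size 1 is "a sufficient single column"; larger sizes are combinations.
  for (let size = 1; size <= candidates.length; size++) {
    for (const combo of combinations(candidates, size)) {
      const keys = baseKeys.concat(combo);
      if (isUniqueKey(rows, keys)) return { keys, unique: true };
    }
  }
  // Nothing suffices: the caller would warn that extracted CSVs may contain duplicate rows.
  return { keys: baseKeys, unique: false };
}

const rows: Row[] = [
  { trial_index: 0, subject: "a", block: 1 },
  { trial_index: 1, subject: "a", block: 1 },
  { trial_index: 0, subject: "b", block: 1 },
  { trial_index: 1, subject: "b", block: 1 },
];
// `block` alone does not disambiguate, `subject` does.
console.log(resolveJoinKeysSketch(rows, ["trial_index"], ["block", "subject"]));
// { keys: ["trial_index", "subject"], unique: true }
```

Trying sizes in increasing order keeps the resulting key as small as possible, at the cost of an exponential search when many candidate columns exist; a real implementation may well bound or order the candidates differently.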
diff --git a/.changeset/psych-ds-validator-integration.md b/.changeset/psych-ds-validator-integration.md deleted file mode 100644 index df04f1a..0000000 --- a/.changeset/psych-ds-validator-integration.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": minor ---- - -Integrate psych-ds validator: the CLI now runs Psych-DS validation after loading an existing dataset and after writing the final dataset_description.json, printing a compliance summary with errors always shown and warnings shown under --verbose. diff --git a/.changeset/recurse-array-element-unnesting.md b/.changeset/recurse-array-element-unnesting.md deleted file mode 100644 index 237bdc2..0000000 --- a/.changeset/recurse-array-element-unnesting.md +++ /dev/null @@ -1,6 +0,0 @@ ---- -"@jspsych/metadata": minor -"@jspsych/metadata-cli": minor ---- - -Recursively unnest nested data inside extracted array elements. Previously an array-of-objects column was extracted one level deep, so an element field that was itself an object (`pointData.point`) or an array (`pointData.gazeSamples`) was kept as a single opaque JSON column. Now element fields recurse: a nested plain object is expanded into deeper dotted columns in the same sidecar row (`pointData.point.x`, `pointData.point.y`), and a nested array-of-objects is extracted into its own grandchild CSV (`..._measure-...GazeSamples_data.csv`). Grandchild tables remain joinable to their specific parent element via a qualified `.element_index` key carried alongside the existing join keys (e.g. `trial_index` + `validation_data.pointData.element_index` + the grandchild's own `element_index`), and every such key/column is registered in `variableMeasured`. This completes Psych-DS round-tripping for arbitrarily nested object/array data — arrays nested inside arrays inside objects now fully expand instead of bottoming out as JSON. 
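The dotted expansion of nested plain objects described above can be illustrated with a small recursive helper. This is a simplified sketch under assumptions, not the library's code: `expandDotted` and `isPlainObject` are hypothetical names, arrays are simply left in place here (the changeset describes them being extracted into their own CSVs), and intermediate object nodes get no column of their own.

```typescript
// True for non-null, non-array objects.
function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively flatten nested plain objects into dotted column names.
function expandDotted(
  value: Record<string, unknown>,
  prefix: string,
  out: Record<string, unknown> = {}
): Record<string, unknown> {
  for (const key of Object.keys(value)) {
    const name = prefix ? `${prefix}.${key}` : key;
    const field = value[key];
    if (isPlainObject(field)) {
      expandDotted(field, name, out); // descend: point -> point.x, point.y
    } else {
      out[name] = field; // scalars and arrays stop the recursion in this sketch
    }
  }
  return out;
}

const element = { point: { x: 0.5, y: 0.25 }, gazeSamples: [{ t: 1 }, { t: 2 }] };
console.log(Object.keys(expandDotted(element, "pointData")));
// ["pointData.point.x", "pointData.point.y", "pointData.gazeSamples"]
```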
diff --git a/.changeset/register-array-element-fields.md b/.changeset/register-array-element-fields.md deleted file mode 100644 index bc8b014..0000000 --- a/.changeset/register-array-element-fields.md +++ /dev/null @@ -1,6 +0,0 @@ ---- -"@jspsych/metadata": minor -"@jspsych/metadata-cli": minor ---- - -Register array-of-objects element fields in `variableMeasured` so extracted sidecar CSVs have no undeclared columns. Previously `accumulateArrayColumn` wrote each element's fields as bare columns (e.g. `x`, `y`) plus `element_index` into the extracted-array CSV, but never added them to `variableMeasured`, so Psych-DS validation reported `CSV_COLUMN_MISSING_FROM_METADATA`. Element fields are now emitted under dotted names (`tobii_data.x`, `validation_data.pointData.point`) — avoiding collisions between same-named fields of different array columns — and each is registered with its correct type and min/max/levels tracking. `element_index` is registered once. Object- and array-valued element fields are recorded one level deep (a single dotted JSON column, `value:"object"`/`"array"`); they are not further expanded or extracted. This is the array-side counterpart to the plain-object sidecar fix and completes Psych-DS column/variable round-tripping for nested data. diff --git a/.changeset/smart-rename-strategies.md b/.changeset/smart-rename-strategies.md deleted file mode 100644 index 61d662b..0000000 --- a/.changeset/smart-rename-strategies.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": minor ---- - -Add smart rename strategies for data files whose names don't follow the Psych-DS pattern. Instead of a single keyword prompt, the CLI now offers a strategy menu with a live old → new sample preview per option: use an identifier column found inside the data (e.g. 
participant_id, recommended when available), keep only the part that differs between the filenames, give the files fresh sequential names (subject-001, subject-002, …), or keep the whole old filename as the value. Every strategy ends in a full rename preview with collision detection, per-file manual editing, and the option to switch strategies before anything is written. The preview now also lists the sidecar CSVs each file will produce (one per extracted array/object column, e.g. `subject-01_measure-mouseTracking_data.csv`), and a single planner resolves every output name — mains and sidecars together — so the names shown are exactly the names written; if the data and the approved plan ever disagree the run aborts rather than writing an unapproved name. Files whose names are technically valid but use unofficial keywords (e.g. data-xyz.json, which draw a validator warning) now get an opt-in to join the rename flow instead of being silently kept. diff --git a/.changeset/stress-tests-in-ci.md b/.changeset/stress-tests-in-ci.md deleted file mode 100644 index 49251e0..0000000 --- a/.changeset/stress-tests-in-ci.md +++ /dev/null @@ -1,13 +0,0 @@ ---- -"@jspsych/metadata": patch -"@jspsych/metadata-cli": patch ---- - -Add stress-test regression guards to the automated suite so previously-fixed nested-data and filename-normalization behavior can't silently regress. - -Four Jest suites, ported from the standalone `stress-tests/` harnesses so they run under plain `npm test` (and CI) without a build step: - -- `@jspsych/metadata`: `generate()` coherence over a comprehensive nested-data fixture (deep objects, arrays of objects/arrays, mixed-type columns, a `trial_type`-less row, unicode, empties), plus the Psych-DS filename-normalization helper invariants. 
-- `@jspsych/metadata-cli`: the `processDirectory` conversion end-to-end (compliant main CSV, `data/raw/` preservation, two-way `variableMeasured` ↔ CSV-column cross-check, and a best-effort Psych-DS validation pass), plus the refusal to write a non-compliant filename non-interactively. - -Test-only change; no library or CLI behavior is modified. The shared fixture lives at `dev/stress/`. diff --git a/.changeset/test-handlefiles.md b/.changeset/test-handlefiles.md deleted file mode 100644 index b37a621..0000000 --- a/.changeset/test-handlefiles.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": patch ---- - -Add unit tests for createDirectoryWithStructure in handlefiles.ts. diff --git a/.changeset/unknown-descriptions-prompt.md b/.changeset/unknown-descriptions-prompt.md deleted file mode 100644 index 12d324c..0000000 --- a/.changeset/unknown-descriptions-prompt.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": minor ---- - -Add interactive prompt for unknown variable descriptions. After data processing and metadata options, the CLI now detects user-data variables whose descriptions could not be resolved from plugin source and asks whether to fill them in. Users can skip the entire step or skip individual variables by pressing Enter. diff --git a/.changeset/unwrap-trials-wrapper.md b/.changeset/unwrap-trials-wrapper.md deleted file mode 100644 index cac7a25..0000000 --- a/.changeset/unwrap-trials-wrapper.md +++ /dev/null @@ -1,8 +0,0 @@ ---- -"@jspsych/metadata": minor -"@jspsych/metadata-cli": patch ---- - -Accept jsPsych data exported as a `{ "trials": [...] }` wrapper (e.g. from OSF), not just a bare array. A new `unwrapTrials` helper (exported from `@jspsych/metadata`) unwraps the array when the input is exactly that single-key wrapper; every other JSON shape is returned unchanged, so `generate()` still throws on non-array input and the CLI/frontend still skip it. 
An object with sibling keys (`{ trials: [...], meta: {...} }`) is deliberately left untouched rather than silently discarding its top-level metadata. - -`unwrapTrials` is folded into `parseJsonData`'s whole-document fast path, so every data parse site — `generate()`, the CLI directory pipeline, and the frontend uploader — accepts the wrapper through the one shared parser. A wrapped file is converted to a Psych-DS data CSV (with sidecars) and its literal wrapped original is still preserved under `data/raw/`. Previously such files were silently skipped ("0 files read"). diff --git a/.changeset/validation-exit-code.md b/.changeset/validation-exit-code.md deleted file mode 100644 index 59fe865..0000000 --- a/.changeset/validation-exit-code.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata-cli": minor ---- - -Exit code 1 on validation errors; re-prompt for missing required fields; suggest missing recommended fields. diff --git a/.changeset/webgazer-description-parsing.md b/.changeset/webgazer-description-parsing.md deleted file mode 100644 index 5aa616c..0000000 --- a/.changeset/webgazer-description-parsing.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -"@jspsych/metadata": patch ---- - -Strip JSDoc continuation `*` markers when parsing multi-line plugin/extension variable descriptions, so descriptions like the webgazer extension's `webgazer_data` no longer contain stray asterisks. Adds a regression test for webgazer-shaped multi-line JSDoc. diff --git a/packages/cli/CHANGELOG.md b/packages/cli/CHANGELOG.md index 9427316..3e4efe1 100644 --- a/packages/cli/CHANGELOG.md +++ b/packages/cli/CHANGELOG.md @@ -1,5 +1,143 @@ # @jspsych/metadata-cli +## 0.2.0 + +### Minor Changes + +- a5af08c: Detect and expand JSON-serialized nested columns in `generate()`. Flat JSON objects (e.g. `response: {"Q0":4,"Q1":3}`) are expanded into dotted sub-variables (`response.Q0`, `response.Q1`) in `variableMeasured` with correct types and min/max tracking. 
JSON arrays of objects are extracted into separate Psych-DS compliant CSV files (`{stem}_measure-{col}_data.csv`) with `trial_index` and `element_index` as join keys. +- aab8da8: Extract plain (non-array) object columns into separate Psych-DS CSV files so their expanded sub-variables resolve to real columns. `expandObjectFields` registers dotted sub-variables for object columns (e.g. `response.cb_1`, `calibration_data.type`), but those names previously had no corresponding CSV column, so Psych-DS validation reported `VARIABLE_MISSING_FROM_CSV_COLUMNS` for every one. Object columns are now accumulated into a new `extractedObjects` map (exposed via `getExtractedObjects()`) as one row per trial, and the CLI writes a per-file sidecar CSV (`{stem}_measure-{col}_data.csv`) — mirroring the existing array-of-objects extraction. The row is threaded through the recursive expansion so a column is recorded for every registered descendant (leaf scalars, intermediate object nodes, and nested-array parents), and it reuses the same configurable `arrayJoinKeys` (one row per trial, no `element_index`). +- 35de4b6: Extract arrays of primitives into sidecar CSVs so their elements become real, typed variables. Previously an array of numbers or strings (`block_order: [16,100,4,1]`, `images: [...]`) was recorded only as a single `value:"array"` column with no per-element detail. Such arrays are now extracted like arrays-of-objects, but — since primitives have no field name — each element is recorded under a synthetic `.value` column (distinct from the array parent, which stays `value:"array"`). The element variable gets its proper type with `minValue`/`maxValue` (numeric) or `levels` (string), joinable to its row via the existing join keys + `element_index`. 
This composes with the nested-array recursion (an array of arrays of numbers yields a grandchild table with a `.value` column) and completes Psych-DS round-tripping for all four cell shapes: scalar, object, array-of-objects, and array-of-primitives. + + Tradeoff: every non-empty primitive-array column now produces its own sidecar CSV, so datasets with many such columns generate substantially more files (e.g. one eye-tracking export grew from 304 to 380 data files). Extraction is the default and there is no new prompt. A future opt-in `primitiveArrayMode: "extract" | "summarize"` could offer an in-place summary alternative, but is intentionally not added here to avoid complicating the CLI flow. + +- 686093e: Convert jsPsych JSON data files to CSV and normalize all generated data filenames to the Psych-DS `[keyword-value_]+data.csv` pattern, so generated projects pass the Psych-DS validator. + + - Each `.json` data file is converted to a `.csv` in `data/` (nested objects/arrays serialized as JSON strings via `objectsToCSV`, so no data is lost), with the untouched original preserved under `data/raw/`. The project scaffold creates the `data/raw/` directory, and `dataset_description.json` is left untouched. + - Output filenames follow the Psych-DS naming pattern. Already-compliant names are kept; for non-compliant ones the CLI prompts once for a keyword (official keywords offered to avoid validator warnings; custom keywords allowed), with the file's current name becoming the value (camelCased, since Psych-DS values forbid hyphens/underscores). The same normalized base names the converted/copied CSV and its extracted-array CSVs, fixing previously invalid extracted-array names. + - Same-named source files from different subdirectories are kept and disambiguated with a validator-safe counter — both the CSV in `data/` and the original in `data/raw/` — instead of being skipped or silently overwritten. 
Non-interactive runs fail with a clear message rather than inventing a keyword. + +- 3960e63: Integrate psych-ds validator: the CLI now runs Psych-DS validation after loading an existing dataset and after writing the final dataset_description.json, printing a compliance summary with errors always shown and warnings shown under --verbose. +- d9e4485: Recursively unnest nested data inside extracted array elements. Previously an array-of-objects column was extracted one level deep, so an element field that was itself an object (`pointData.point`) or an array (`pointData.gazeSamples`) was kept as a single opaque JSON column. Now element fields recurse: a nested plain object is expanded into deeper dotted columns in the same sidecar row (`pointData.point.x`, `pointData.point.y`), and a nested array-of-objects is extracted into its own grandchild CSV (`..._measure-...GazeSamples_data.csv`). Grandchild tables remain joinable to their specific parent element via a qualified `.element_index` key carried alongside the existing join keys (e.g. `trial_index` + `validation_data.pointData.element_index` + the grandchild's own `element_index`), and every such key/column is registered in `variableMeasured`. This completes Psych-DS round-tripping for arbitrarily nested object/array data — arrays nested inside arrays inside objects now fully expand instead of bottoming out as JSON. +- 5fcce14: Register array-of-objects element fields in `variableMeasured` so extracted sidecar CSVs have no undeclared columns. Previously `accumulateArrayColumn` wrote each element's fields as bare columns (e.g. `x`, `y`) plus `element_index` into the extracted-array CSV, but never added them to `variableMeasured`, so Psych-DS validation reported `CSV_COLUMN_MISSING_FROM_METADATA`. 
Element fields are now emitted under dotted names (`tobii_data.x`, `validation_data.pointData.point`) — avoiding collisions between same-named fields of different array columns — and each is registered with its correct type and min/max/levels tracking. `element_index` is registered once. Object- and array-valued element fields are recorded one level deep (a single dotted JSON column, `value:"object"`/`"array"`); they are not further expanded or extracted. This is the array-side counterpart to the plain-object sidecar fix and completes Psych-DS column/variable round-tripping for nested data. +- 7921a10: Add smart rename strategies for data files whose names don't follow the Psych-DS pattern. Instead of a single keyword prompt, the CLI now offers a strategy menu with a live old → new sample preview per option: use an identifier column found inside the data (e.g. participant_id, recommended when available), keep only the part that differs between the filenames, give the files fresh sequential names (subject-001, subject-002, …), or keep the whole old filename as the value. Every strategy ends in a full rename preview with collision detection, per-file manual editing, and the option to switch strategies before anything is written. The preview now also lists the sidecar CSVs each file will produce (one per extracted array/object column, e.g. `subject-01_measure-mouseTracking_data.csv`), and a single planner resolves every output name — mains and sidecars together — so the names shown are exactly the names written; if the data and the approved plan ever disagree the run aborts rather than writing an unapproved name. Files whose names are technically valid but use unofficial keywords (e.g. data-xyz.json, which draw a validator warning) now get an opt-in to join the rename flow instead of being silently kept. +- 58ebde8: Add interactive prompt for unknown variable descriptions. 
After data processing and metadata options, the CLI now detects user-data variables whose descriptions could not be resolved from plugin source and asks whether to fill them in. Users can skip the entire step or skip individual variables by pressing Enter. +- 1435184: Exit code 1 on validation errors; re-prompt for missing required fields; suggest missing recommended fields. + +### Patch Changes + +- 28f1d57: Improve CLI prompt wording for clarity. Messages, choice labels, descriptions, and error text have been rewritten to use plain language, avoid jargon, and be more actionable for researchers unfamiliar with Psych-DS terminology. +- 585d337: The CLI now writes a `.psychds-ignore` at the dataset root when it preserves raw jsPsych originals under `data/raw/`, so the validator no longer flags them as `FILE_NOT_CHECKED`. This mirrors the behavior the frontend already had. + + The `.psychds-ignore` filename and content (`**/raw/` plus a self-reference, dictated by validator quirks) are now exported from `@jspsych/metadata` as `PSYCHDS_IGNORE_FILENAME` and `PSYCHDS_IGNORE_CONTENT`, so the CLI and frontend share one definition instead of duplicating the literal string. + +- 3752739: Send Psych-DS validation errors and warnings to stderr. The failure header and error details use console.error; warning details and the verbose hint use console.warn. The success line remains on stdout. +- 2706ca7: Add unit tests for validatePsychDS covering clean pass, errors, warnings (verbose and non-verbose), plural forms, and validator-throws scenarios. +- da2e8d2: Add Jest test infrastructure and tests for the CLI package. Tests cover `utils.ts`, `validatefunctions.ts`, and `data.ts` (27 tests). Also modernizes `saveTextToPath` from a fire-and-forget callback to an `async` function returning `Promise`. 
+- 07b78e5: Add unit tests for `preAnalyzeDirectory` in `data.ts`, covering unreadable directories, JSON and CSV duplicate detection, ignored files, worst-file selection, one-subdirectory-deep traversal, and custom join keys. +- 9e02b78: Deduplicate directory traversal logic in data.ts. Extracts a shared `collectDataFiles` helper used by `processDirectory`, `enumerateDataFiles`, and `preAnalyzeDirectory`, replacing three near-identical implementations of the top-level + one-subdir-deep walk. Behavior is preserved: `processDirectory` still sorts `dataset_description.json` first and counts directory read errors as failures. Diagnostics (the "can only read subdirectories one level deep" warning and directory-read errors) are gated behind a `warn` flag that only `processDirectory` sets, so the silent pre-passes (`enumerateDataFiles`, `preAnalyzeDirectory`) don't duplicate warnings the user already sees once on the same directory in the same run. +- 8edc7c2: Drop unnamed columns so R-exported datasets validate. R's `write.csv` (with the default `row.names = TRUE`) prepends an unnamed row-index column, so the exported CSV header starts with a bare comma — an empty-string column name. Psych-DS variables require a name, so the column can never appear in `variableMeasured`; left in the on-disk CSV it fails validation with `CSV_COLUMN_MISSING_FROM_METADATA`. + + The strip now lives in the shared data-file path so the CLI and frontend behave identically: + + - `generate()` strips empty/whitespace-only columns from the parsed data up front, with a single warning instead of per-row spam (keeps `variableMeasured` clean and standalone library use safe), via a new exported `stripUnnamedColumns` helper. + - `buildPsychDSDataFiles` strips the main table before emitting it: a clean CSV keeps its exact bytes (verbatim `mainContent`), while a file with an unnamed column is re-serialised from the cleaned rows. 
Both the CLI (rename-plan and non-plan paths) and the frontend feed parsed `mainRows`, so the written/zipped/validated CSV always matches the metadata. + + Fixes finding #2 of #109. + +- 06a84fb: fix(cli): don't print a spurious validation failure for existing projects + + When opening an existing project, validation ran before the data files were + copied into the project, so it always failed with `MISSING_DATA_DIRECTORY` and + printed a misleading `✘ Psych-DS validation failed` to stderr even when the final + output was valid. Removed that pre-write call; the post-write validation that + actually gates the result is unchanged. + +- a5311ba: Fix Psych-DS validation always failing on Windows. The relative path passed to the validator contained backslashes on Windows, which the validator could not resolve — causing spurious MISSING_DATAFILE and MISSING_DATASET_DESCRIPTION errors even when the project was generated correctly. Normalize path separators to forward slashes before validation. +- 585d337: Convert uploaded JSON data to Psych-DS CSV in the frontend so datasets validate instead of failing with `MISSING_DATAFILE`. + + Previously the frontend placed uploaded jsPsych JSON files into `data/` unchanged, so the in-browser validator (and the downloadable zip) always failed — Psych-DS only recognises CSV/TSV datafiles whose names match its keyword pattern. + + - `@jspsych/metadata` gains two shared, filesystem-agnostic helpers, `buildPsychDSDataFiles` and `deriveFallbackBase`, that turn a parsed data file (plus any extracted nested array/object columns) into its set of Psych-DS-named CSV outputs. Used by both the CLI and the frontend so the conversion lives in one place. + - The frontend's Data step now builds a converted `data/` payload during generation — a compliant main CSV, one sidecar per nested array/object column, and the original JSON preserved under `data/raw/` — and Review uses it for both validation and the zip. 
Auto-derived filenames use the official `subject` keyword (`subject-`) to avoid the unofficial-keyword warning, and a `.psychds-ignore` is emitted so the preserved `data/raw/` originals don't surface as `FILE_NOT_CHECKED`. + - The CLI's non-rename-plan conversion path now delegates to the shared `buildPsychDSDataFiles`. No behaviour change. + +- 3c7d1f7: Accept JSON-Lines (JSONL) experiment data, not just a single JSON array. Several jsPsych labs — and JATOS exports — write data as newline-delimited JSON, with one JSON value per line (typically one participant's full trial array per line) rather than one big array. Previously `generate()` ran `JSON.parse` on the whole string, so every such file failed with `Unexpected non-whitespace character after JSON` and produced no metadata. + + A new exported `parseJsonData` helper handles both shapes: a well-formed single document is returned unchanged (no behaviour change for existing single-array callers), and only when whole-string parsing fails does it fall back to parsing line by line, flattening any per-line arrays into one observation stream. It is now used wherever JSON data files are parsed: + + - `generate()` (the library) for the main ingestion path. + - the CLI's data-file reader, join-key pre-pass, and CSV-conversion path. + - the frontend's join-key pre-flight and Psych-DS file builder. + + The `.jsonl` file extension is now also recognised as a JSON data file (these exports are conventionally named `.jsonl`). The CLI processes `.jsonl` exactly like `.json` — including filename-normalization, raw-original preservation, and CSV conversion — and the frontend normalises a `.jsonl` upload to the JSON path. + + Verified end to end against the raw `.jsonl` exports in `vucml/online_experiments`: all 15 files now generate metadata and pass the Psych-DS validator with zero errors (they failed at parse time before). + +- 3c7d1f7: Synthesize a `source_record_id` join key for multi-record JSON-Lines exports. 
Raw jsPsych exports carry no per-row identifier, so once JSONL is flattened (one record per line) `trial_index` repeats across records and can't uniquely key the extracted array/object sidecar CSVs — every record's trial 0 collapsed onto the same `(trial_index, element_index)` key, making the sidecars impossible to join back to a single parent trial. + + The synthesized column is named `source_record_id` rather than `participant_id` because a JSON-Lines line is only guaranteed to be one _source record_ — usually, but not always, one participant. The honest name avoids overclaiming for exports where a line isn't a single subject. + + `parseJsonData` now takes an opt-in `{ tagSourceRecordId }` flag: in the JSON-Lines path it stamps each line's object rows with a 0-based `source_record_id` (a no-op on the single-array fast path), and reports via an optional `stats` out-param whether it actually synthesized the id. A line that already carries a `source_record_id` or a real `participant_id` is left untouched — the experiment's own identifier already groups those rows. `generate()` enables this for JSON input and promotes the identifier to the leading join key, preferring the synthesized `source_record_id` and falling back to a real `participant_id` already present in the export (`['source_record_id', 'trial_index']` or `['participant_id', 'trial_index']`), so the sidecars join unambiguously. CSV inputs are unaffected. + + When — and only when — the id was actually synthesized (i.e. absent from the source), it is given an explicit description that makes its synthetic origin unmistakable ("Synthetic source-record identifier … NOT a real subject ID from the experiment …") so a downstream user can't mistake it for a real subject ID; this also avoids serializing an empty `{}` description (an object with no `@type`, which trips the validator's `OBJECT_TYPE_MISSING`). 
The CLI's join-key pre-analysis/prompt and the frontend's pre-flight mirror this promotion so multi-record JSONL is no longer falsely flagged as having a non-unique join key. + + Verified end to end against the raw `.jsonl` exports in `vucml/online_experiments` (`block_cat`): the combined 30-record export generates metadata, passes the Psych-DS validator (0 errors), synthesizes `source_record_id` 0–29, and writes sidecars whose `(source_record_id, trial_index, element_index)` keys are fully unique — including the doubly-nested `recall_responses` case. Notably `subjectId` collides across the two merged datasets (two records share `601`), which `source_record_id` correctly keeps distinct. + +- ca8dc75: Extend the stress-test regression guards with three more Jest suites covering the CSV ingestion path, generation at scale, and cross-file output-name collisions. + + - `@jspsych/metadata` — `csv-input.stress`: pins how `generate(data, {}, "csv")` re-infers types from string cells (numeric coercion incl. whitespace/scientific-notation/`Infinity`/`NaN` rejection, mixed-column downgrade, `"true"`/`"false"` staying categorical, RFC-4180 quoting, unicode, empty/literal-`null` cells, the 50-char level cap, JSON-in-a-cell extraction), and asserts CSV/JSON parity for unambiguously-typed columns. + - `@jspsych/metadata` — `scale.stress`: feeds a 5,000-row dataset and checks exact numeric extremes, categorical dedup, high-cardinality level accumulation, boolean handling, and a throughput ceiling that guards against accidental O(n²) regressions. + - `@jspsych/metadata-cli` — `array-collision.stress`: two same-stem files in different subdirectories sharing a nested array column, asserting `processDirectory` disambiguates every main CSV, sidecar, and preserved raw original (no overwrites, all still Psych-DS compliant) — the cross-file collision gap left by the earlier rename suite. + + Test-only change; no library or CLI behavior is modified. 
+ +- 5fcd392: Don't block non-interactive runs on the join-key prompt. When `trial_index` isn't unique (the norm for multi-subject data, where it restarts per subject), the CLI previously always opened an interactive checkbox to pick additional join keys — even in a fully-flagged headless run (`--psych-ds-dir` + `--data-dir` + `--metadata-options`, no TTY), which aborted with `✘ User force closed the prompt`. The prompt is now gated on having a terminal; without one, join keys are resolved deterministically via `resolveJoinKeysNonInteractive` (add a sufficient single column, else a minimal sufficient combination, else proceed with a warning that extracted CSVs may contain duplicate rows). Fixes finding #3 of #109. + + Also hardens the rest of the non-interactive path so that "no terminal ⇒ never prompt" holds universally, not just when all three flags are supplied. The remaining prompts (metadata-options fallback, unknown-variable descriptions, missing-required-field loop) now gate on `canPrompt` rather than the flag-only `isNonInteractive`, so a no-TTY run that omits `--metadata-options` falls back to generated defaults with a notice instead of aborting with `✘ User force closed the prompt`. + +- fa17a9e: Add stress-test regression guards to the automated suite so previously-fixed nested-data and filename-normalization behavior can't silently regress. + + Four Jest suites, ported from the standalone `stress-tests/` harnesses so they run under plain `npm test` (and CI) without a build step: + + - `@jspsych/metadata`: `generate()` coherence over a comprehensive nested-data fixture (deep objects, arrays of objects/arrays, mixed-type columns, a `trial_type`-less row, unicode, empties), plus the Psych-DS filename-normalization helper invariants. 
+ - `@jspsych/metadata-cli`: the `processDirectory` conversion end-to-end (compliant main CSV, `data/raw/` preservation, two-way `variableMeasured` ↔ CSV-column cross-check, and a best-effort Psych-DS validation pass), plus the refusal to write a non-compliant filename non-interactively. + + Test-only change; no library or CLI behavior is modified. The shared fixture lives at `dev/stress/`. + +- 4fa760d: Add unit tests for createDirectoryWithStructure in handlefiles.ts. +- 31c5ba9: Accept jsPsych data exported as a `{ "trials": [...] }` wrapper (e.g. from OSF), not just a bare array. A new `unwrapTrials` helper (exported from `@jspsych/metadata`) unwraps the array when the input is exactly that single-key wrapper; every other JSON shape is returned unchanged, so `generate()` still throws on non-array input and the CLI/frontend still skip it. An object with sibling keys (`{ trials: [...], meta: {...} }`) is deliberately left untouched rather than silently discarding its top-level metadata. + + `unwrapTrials` is folded into `parseJsonData`'s whole-document fast path, so every data parse site — `generate()`, the CLI directory pipeline, and the frontend uploader — accepts the wrapper through the one shared parser. A wrapped file is converted to a Psych-DS data CSV (with sidecars) and its literal wrapped original is still preserved under `data/raw/`. Previously such files were silently skipped ("0 files read"). 
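The `{ "trials": [...] }` unwrapping rule described above can be illustrated with a minimal sketch. This is not the actual `unwrapTrials` source: the function name `unwrapTrialsSketch` and the helper `isPlainObject` are hypothetical, and the only behavior assumed is what the entry states (unwrap exactly the single-key wrapper, return every other shape unchanged).

```typescript
// Hypothetical illustration of the wrapper rule; not the library's code.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns the inner array only when `input` is exactly `{ trials: [...] }`.
// Any other shape (bare array, extra sibling keys, non-array `trials`)
// is handed back untouched so the caller can decide what to do with it.
function unwrapTrialsSketch(input: unknown): unknown {
  if (!isPlainObject(input)) return input;
  const keys = Object.keys(input);
  const inner = input["trials"];
  if (keys.length === 1 && keys[0] === "trials" && Array.isArray(inner)) {
    return inner;
  }
  return input;
}

const wrappedExample = { trials: [{ trial_index: 0 }] };
const withSiblingKeys = { trials: [{ trial_index: 0 }], meta: { lab: "demo" } };
console.log(Array.isArray(unwrapTrialsSketch(wrappedExample))); // true
console.log(unwrapTrialsSketch(withSiblingKeys) === withSiblingKeys); // true
```

Returning the sibling-key object by reference, rather than a partial copy, mirrors the stated intent of not silently discarding top-level metadata.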
+ +- Updated dependencies [8731c30] +- Updated dependencies [585d337] +- Updated dependencies [f96e1e6] +- Updated dependencies [ed9c25c] +- Updated dependencies [0f4cc4a] +- Updated dependencies [1511d20] +- Updated dependencies [8edc7c2] +- Updated dependencies [a5af08c] +- Updated dependencies [aab8da8] +- Updated dependencies [35de4b6] +- Updated dependencies [e80e57c] +- Updated dependencies [06a84fb] +- Updated dependencies [03a3ce4] +- Updated dependencies [ae0d01c] +- Updated dependencies [c2426be] +- Updated dependencies [e1cb44e] +- Updated dependencies [585d337] +- Updated dependencies [3c7d1f7] +- Updated dependencies [3c7d1f7] +- Updated dependencies [72f8a4b] +- Updated dependencies [6b0d1d4] +- Updated dependencies [ca8dc75] +- Updated dependencies [d9e4485] +- Updated dependencies [5fcce14] +- Updated dependencies [fa17a9e] +- Updated dependencies [31c5ba9] +- Updated dependencies [55f2f91] + - @jspsych/metadata@0.1.0 + ## 0.1.1 ### Patch Changes diff --git a/packages/cli/package.json b/packages/cli/package.json index 613a909..3e05518 100644 --- a/packages/cli/package.json +++ b/packages/cli/package.json @@ -1,6 +1,6 @@ { "name": "@jspsych/metadata-cli", - "version": "0.1.1", + "version": "0.2.0", "description": "This directory contains tools for interacting and generating CLI through the terminal.", "main": "dist/cjs/index.cjs", "author": "Victor Zhang", diff --git a/packages/frontend/CHANGELOG.md b/packages/frontend/CHANGELOG.md index 23c73c7..21d9092 100644 --- a/packages/frontend/CHANGELOG.md +++ b/packages/frontend/CHANGELOG.md @@ -1,5 +1,81 @@ # frontend +## 0.1.0 + +### Minor Changes + +- 585d337: Convert uploaded JSON data to Psych-DS CSV in the frontend so datasets validate instead of failing with `MISSING_DATAFILE`. 
+ + Previously the frontend placed uploaded jsPsych JSON files into `data/` unchanged, so the in-browser validator (and the downloadable zip) always failed — Psych-DS only recognises CSV/TSV datafiles whose names match its keyword pattern. + + - `@jspsych/metadata` gains two shared, filesystem-agnostic helpers, `buildPsychDSDataFiles` and `deriveFallbackBase`, that turn a parsed data file (plus any extracted nested array/object columns) into its set of Psych-DS-named CSV outputs. Used by both the CLI and the frontend so the conversion lives in one place. + - The frontend's Data step now builds a converted `data/` payload during generation — a compliant main CSV, one sidecar per nested array/object column, and the original JSON preserved under `data/raw/` — and Review uses it for both validation and the zip. Auto-derived filenames use the official `subject` keyword (`subject-`) to avoid the unofficial-keyword warning, and a `.psychds-ignore` is emitted so the preserved `data/raw/` originals don't surface as `FILE_NOT_CHECKED`. + - The CLI's non-rename-plan conversion path now delegates to the shared `buildPsychDSDataFiles`. No behaviour change. + +- 03a3ce4: Add in-browser Psych-DS validation to the Review step. A "Validate dataset" button runs the official `psychds-validator` web bundle directly in the browser against the generated `dataset_description.json` and the uploaded data files, showing a pass/error/warning report inline instead of only pointing users to the CLI. The validator bundle is code-split and lazy-loaded on first use, and the command-line instructions remain available as a fallback. + +### Patch Changes + +- 8edc7c2: Drop unnamed columns so R-exported datasets validate. R's `write.csv` (with the default `row.names = TRUE`) prepends an unnamed row-index column, so the exported CSV header starts with a bare comma — an empty-string column name. 
Psych-DS variables require a name, so the column can never appear in `variableMeasured`; left in the on-disk CSV it fails validation with `CSV_COLUMN_MISSING_FROM_METADATA`. + + The strip now lives in the shared data-file path so the CLI and frontend behave identically: + + - `generate()` strips empty/whitespace-only columns from the parsed data up front, with a single warning instead of per-row spam (keeps `variableMeasured` clean and standalone library use safe), via a new exported `stripUnnamedColumns` helper. + - `buildPsychDSDataFiles` strips the main table before emitting it: a clean CSV keeps its exact bytes (verbatim `mainContent`), while a file with an unnamed column is re-serialised from the cleaned rows. Both the CLI (rename-plan and non-plan paths) and the frontend feed parsed `mainRows`, so the written/zipped/validated CSV always matches the metadata. + + Fixes finding #2 of #109. + +- 3c7d1f7: Accept JSON-Lines (JSONL) experiment data, not just a single JSON array. Several jsPsych labs — and JATOS exports — write data as newline-delimited JSON, with one JSON value per line (typically one participant's full trial array per line) rather than one big array. Previously `generate()` ran `JSON.parse` on the whole string, so every such file failed with `Unexpected non-whitespace character after JSON` and produced no metadata. + + A new exported `parseJsonData` helper handles both shapes: a well-formed single document is returned unchanged (no behaviour change for existing single-array callers), and only when whole-string parsing fails does it fall back to parsing line by line, flattening any per-line arrays into one observation stream. It is now used wherever JSON data files are parsed: + + - `generate()` (the library) for the main ingestion path. + - the CLI's data-file reader, join-key pre-pass, and CSV-conversion path. + - the frontend's join-key pre-flight and Psych-DS file builder. 
+ + The `.jsonl` file extension is now also recognised as a JSON data file (these exports are conventionally named `.jsonl`). The CLI processes `.jsonl` exactly like `.json` — including filename-normalization, raw-original preservation, and CSV conversion — and the frontend normalises a `.jsonl` upload to the JSON path. + + Verified end to end against the raw `.jsonl` exports in `vucml/online_experiments`: all 15 files now generate metadata and pass the Psych-DS validator with zero errors (they failed at parse time before). + +- 3c7d1f7: Synthesize a `source_record_id` join key for multi-record JSON-Lines exports. Raw jsPsych exports carry no per-row identifier, so once JSONL is flattened (one record per line) `trial_index` repeats across records and can't uniquely key the extracted array/object sidecar CSVs — every record's trial 0 collapsed onto the same `(trial_index, element_index)` key, making the sidecars impossible to join back to a single parent trial. + + The synthesized column is named `source_record_id` rather than `participant_id` because a JSON-Lines line is only guaranteed to be one _source record_ — usually, but not always, one participant. The honest name avoids overclaiming for exports where a line isn't a single subject. + + `parseJsonData` now takes an opt-in `{ tagSourceRecordId }` flag: in the JSON-Lines path it stamps each line's object rows with a 0-based `source_record_id` (a no-op on the single-array fast path), and reports via an optional `stats` out-param whether it actually synthesized the id. A line that already carries a `source_record_id` or a real `participant_id` is left untouched — the experiment's own identifier already groups those rows. 
`generate()` enables this for JSON input and promotes the identifier to the leading join key, preferring the synthesized `source_record_id` and falling back to a real `participant_id` already present in the export (`['source_record_id', 'trial_index']` or `['participant_id', 'trial_index']`), so the sidecars join unambiguously. CSV inputs are unaffected. + + When — and only when — the id was actually synthesized (i.e. absent from the source), it is given an explicit description that makes its synthetic origin unmistakable ("Synthetic source-record identifier … NOT a real subject ID from the experiment …") so a downstream user can't mistake it for a real subject ID; this also avoids serializing an empty `{}` description (an object with no `@type`, which trips the validator's `OBJECT_TYPE_MISSING`). The CLI's join-key pre-analysis/prompt and the frontend's pre-flight mirror this promotion so multi-record JSONL is no longer falsely flagged as having a non-unique join key. + + Verified end to end against the raw `.jsonl` exports in `vucml/online_experiments` (`block_cat`): the combined 30-record export generates metadata, passes the Psych-DS validator (0 errors), synthesizes `source_record_id` 0–29, and writes sidecars whose `(source_record_id, trial_index, element_index)` keys are fully unique — including the doubly-nested `recall_responses` case. Notably `subjectId` collides across the two merged datasets (two records share `601`), which `source_record_id` correctly keeps distinct. 
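As a rough illustration of the JSON-Lines fallback and the opt-in tagging described in this entry, the sketch below parses the whole document first and falls back to line-by-line parsing only on failure. It is not the real `parseJsonData`: the function name `parseJsonDataSketch`, the `ParseStats` shape, and its `synthesizedSourceRecordId` field are assumptions, as is the choice to number records by non-blank line and to treat a line as already identified if any of its rows has `source_record_id` or `participant_id`.

```typescript
// Hypothetical sketch; only `tagSourceRecordId`, `source_record_id`, and
// `participant_id` come from the changelog. Everything else is assumed.
type Row = Record<string, unknown>;

interface ParseStats {
  synthesizedSourceRecordId: boolean; // assumed field name
}

function isRow(value: unknown): value is Row {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function parseJsonDataSketch(
  text: string,
  options: { tagSourceRecordId?: boolean } = {},
  stats?: ParseStats,
): unknown {
  // Fast path: a well-formed single document is returned unchanged.
  try {
    return JSON.parse(text);
  } catch {
    // Not a single document; fall through to the JSON-Lines path.
  }

  const out: unknown[] = [];
  let recordId = 0; // assumption: ids count non-blank lines, starting at 0
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    const parsed: unknown = JSON.parse(line); // throws on an invalid line
    const rows: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    // A line that already carries its own identifier is left untouched.
    const hasOwnId = rows.some(
      (r) => isRow(r) && ("source_record_id" in r || "participant_id" in r),
    );
    for (const row of rows) {
      if (options.tagSourceRecordId && !hasOwnId && isRow(row)) {
        out.push({ source_record_id: recordId, ...row });
        if (stats) stats.synthesizedSourceRecordId = true;
      } else {
        out.push(row);
      }
    }
    recordId += 1;
  }
  return out;
}

const exampleJsonl = '[{"trial_index":0},{"trial_index":1}]\n[{"trial_index":0}]';
const exampleStats: ParseStats = { synthesizedSourceRecordId: false };
const exampleRows = parseJsonDataSketch(exampleJsonl, { tagSourceRecordId: true }, exampleStats);
console.log(exampleRows, exampleStats);
```

With the two-line input above, both records contain a `trial_index` of 0, but the pair `(source_record_id, trial_index)` is unique across all three rows, which is the property the sidecar join relies on.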
+ +- Updated dependencies [8731c30] +- Updated dependencies [585d337] +- Updated dependencies [f96e1e6] +- Updated dependencies [ed9c25c] +- Updated dependencies [0f4cc4a] +- Updated dependencies [1511d20] +- Updated dependencies [8edc7c2] +- Updated dependencies [a5af08c] +- Updated dependencies [aab8da8] +- Updated dependencies [35de4b6] +- Updated dependencies [e80e57c] +- Updated dependencies [06a84fb] +- Updated dependencies [03a3ce4] +- Updated dependencies [ae0d01c] +- Updated dependencies [c2426be] +- Updated dependencies [e1cb44e] +- Updated dependencies [585d337] +- Updated dependencies [3c7d1f7] +- Updated dependencies [3c7d1f7] +- Updated dependencies [72f8a4b] +- Updated dependencies [6b0d1d4] +- Updated dependencies [ca8dc75] +- Updated dependencies [d9e4485] +- Updated dependencies [5fcce14] +- Updated dependencies [fa17a9e] +- Updated dependencies [31c5ba9] +- Updated dependencies [55f2f91] + - @jspsych/metadata@0.1.0 + ## 0.0.2 ### Patch Changes diff --git a/packages/frontend/package.json b/packages/frontend/package.json index d101141..a47114d 100644 --- a/packages/frontend/package.json +++ b/packages/frontend/package.json @@ -1,7 +1,7 @@ { "name": "frontend", "private": true, - "version": "0.0.2", + "version": "0.1.0", "type": "module", "scripts": { "dev": "vite", @@ -11,7 +11,7 @@ "test": "jest" }, "dependencies": { - "@jspsych/metadata": "^0.0.3", + "@jspsych/metadata": "^0.1.0", "jsonld": "^8.3.2", "jszip": "^3.10.1", "psychds-validator": "^1.5.0", diff --git a/packages/metadata/CHANGELOG.md b/packages/metadata/CHANGELOG.md index 83ecae2..629077c 100644 --- a/packages/metadata/CHANGELOG.md +++ b/packages/metadata/CHANGELOG.md @@ -1,5 +1,134 @@ # @jspsych/metadata +## 0.1.0 + +### Minor Changes + +- 0f4cc4a: Recursively expand nested JSON objects more than one level deep. 
Previously `expandObjectFields` only expanded a single level, so a value like `response: {"Q0":{"score":4,"meta":{"valid":true}}}` registered `response.Q0` as an opaque `value:"object"` leaf and lost its sub-fields. Now nested plain objects are fully expanded into dotted sub-variables (`response.Q0.score`, `response.Q0.meta.valid`) with correct types and min/max/levels tracking at any depth. Arrays nested inside objects are now correctly typed as `value:"array"` instead of `"object"`, and nested arrays-of-objects are extracted into their own Psych-DS CSV files keyed by their dotted column name — mirroring how top-level array columns are handled. +- a5af08c: Detect and expand JSON-serialized nested columns in `generate()`. Flat JSON objects (e.g. `response: {"Q0":4,"Q1":3}`) are expanded into dotted sub-variables (`response.Q0`, `response.Q1`) in `variableMeasured` with correct types and min/max tracking. JSON arrays of objects are extracted into separate Psych-DS compliant CSV files (`{stem}_measure-{col}_data.csv`) with `trial_index` and `element_index` as join keys. +- aab8da8: Extract plain (non-array) object columns into separate Psych-DS CSV files so their expanded sub-variables resolve to real columns. `expandObjectFields` registers dotted sub-variables for object columns (e.g. `response.cb_1`, `calibration_data.type`), but those names previously had no corresponding CSV column, so Psych-DS validation reported `VARIABLE_MISSING_FROM_CSV_COLUMNS` for every one. Object columns are now accumulated into a new `extractedObjects` map (exposed via `getExtractedObjects()`) as one row per trial, and the CLI writes a per-file sidecar CSV (`{stem}_measure-{col}_data.csv`) — mirroring the existing array-of-objects extraction. 
The row is threaded through the recursive expansion so a column is recorded for every registered descendant (leaf scalars, intermediate object nodes, and nested-array parents), and it reuses the same configurable `arrayJoinKeys` (one row per trial, no `element_index`). +- 35de4b6: Extract arrays of primitives into sidecar CSVs so their elements become real, typed variables. Previously an array of numbers or strings (`block_order: [16,100,4,1]`, `images: [...]`) was recorded only as a single `value:"array"` column with no per-element detail. Such arrays are now extracted like arrays-of-objects, but — since primitives have no field name — each element is recorded under a synthetic `.value` column (distinct from the array parent, which stays `value:"array"`). The element variable gets its proper type with `minValue`/`maxValue` (numeric) or `levels` (string), joinable to its row via the existing join keys + `element_index`. This composes with the nested-array recursion (an array of arrays of numbers yields a grandchild table with a `.value` column) and completes Psych-DS round-tripping for all four cell shapes: scalar, object, array-of-objects, and array-of-primitives. + + Tradeoff: every non-empty primitive-array column now produces its own sidecar CSV, so datasets with many such columns generate substantially more files (e.g. one eye-tracking export grew from 304 to 380 data files). Extraction is the default and there is no new prompt. A future opt-in `primitiveArrayMode: "extract" | "summarize"` could offer an in-place summary alternative, but is intentionally not added here to avoid complicating the CLI flow. + +- 585d337: Convert uploaded JSON data to Psych-DS CSV in the frontend so datasets validate instead of failing with `MISSING_DATAFILE`. 
+ + Previously the frontend placed uploaded jsPsych JSON files into `data/` unchanged, so the in-browser validator (and the downloadable zip) always failed — Psych-DS only recognises CSV/TSV datafiles whose names match its keyword pattern. + + - `@jspsych/metadata` gains two shared, filesystem-agnostic helpers, `buildPsychDSDataFiles` and `deriveFallbackBase`, that turn a parsed data file (plus any extracted nested array/object columns) into its set of Psych-DS-named CSV outputs. Used by both the CLI and the frontend so the conversion lives in one place. + - The frontend's Data step now builds a converted `data/` payload during generation — a compliant main CSV, one sidecar per nested array/object column, and the original JSON preserved under `data/raw/` — and Review uses it for both validation and the zip. Auto-derived filenames use the official `subject` keyword (`subject-`) to avoid the unofficial-keyword warning, and a `.psychds-ignore` is emitted so the preserved `data/raw/` originals don't surface as `FILE_NOT_CHECKED`. + - The CLI's non-rename-plan conversion path now delegates to the shared `buildPsychDSDataFiles`. No behaviour change. + +- 6b0d1d4: Export Psych-DS utility functions from the core package: `isValidPsychDSDataFilename`, `toPsychDSValue`, `deriveArrayFilename`, `objectsToCSV`, `disambiguateArrayFilename`. Previously these lived only in the CLI. Moving them to core makes them available to any downstream consumer (e.g. the frontend) and ensures the CLI and any future tools share a single implementation. + + The CLI now imports these functions from `@jspsych/metadata` instead of defining them locally. No behaviour change. + +- d9e4485: Recursively unnest nested data inside extracted array elements. Previously an array-of-objects column was extracted one level deep, so an element field that was itself an object (`pointData.point`) or an array (`pointData.gazeSamples`) was kept as a single opaque JSON column. 
Now element fields recurse: a nested plain object is expanded into deeper dotted columns in the same sidecar row (`pointData.point.x`, `pointData.point.y`), and a nested array-of-objects is extracted into its own grandchild CSV (`..._measure-...GazeSamples_data.csv`). Grandchild tables remain joinable to their specific parent element via a qualified `.element_index` key carried alongside the existing join keys (e.g. `trial_index` + `validation_data.pointData.element_index` + the grandchild's own `element_index`), and every such key/column is registered in `variableMeasured`. This completes Psych-DS round-tripping for arbitrarily nested object/array data — arrays nested inside arrays inside objects now fully expand instead of bottoming out as JSON. +- 5fcce14: Register array-of-objects element fields in `variableMeasured` so extracted sidecar CSVs have no undeclared columns. Previously `accumulateArrayColumn` wrote each element's fields as bare columns (e.g. `x`, `y`) plus `element_index` into the extracted-array CSV, but never added them to `variableMeasured`, so Psych-DS validation reported `CSV_COLUMN_MISSING_FROM_METADATA`. Element fields are now emitted under dotted names (`tobii_data.x`, `validation_data.pointData.point`) — avoiding collisions between same-named fields of different array columns — and each is registered with its correct type and min/max/levels tracking. `element_index` is registered once. Object- and array-valued element fields are recorded one level deep (a single dotted JSON column, `value:"object"`/`"array"`); they are not further expanded or extracted. This is the array-side counterpart to the plain-object sidecar fix and completes Psych-DS column/variable round-tripping for nested data. +- 31c5ba9: Accept jsPsych data exported as a `{ "trials": [...] }` wrapper (e.g. from OSF), not just a bare array. 
A new `unwrapTrials` helper (exported from `@jspsych/metadata`) unwraps the array when the input is exactly that single-key wrapper; every other JSON shape is returned unchanged, so `generate()` still throws on non-array input and the CLI/frontend still skip it. An object with sibling keys (`{ trials: [...], meta: {...} }`) is deliberately left untouched rather than silently discarding its top-level metadata. + + `unwrapTrials` is folded into `parseJsonData`'s whole-document fast path, so every data parse site — `generate()`, the CLI directory pipeline, and the frontend uploader — accepts the wrapper through the one shared parser. A wrapped file is converted to a Psych-DS data CSV (with sidecars) and its literal wrapped original is still preserved under `data/raw/`. Previously such files were silently skipped ("0 files read"). + +### Patch Changes + +- 8731c30: Boolean variables no longer record `levels`. Genuine boolean values (`typeof === "boolean"`) are typed `value:"boolean"` with no `levels`/`minValue`/`maxValue`, and string `"true"`/`"false"` values are kept as strings so they surface as `levels: ["true","false"]` (no longer coerced to boolean). A manual `value:"boolean"` override now drops any detected levels and warns when the detected values don't map cleanly to true/false (anything other than `true`/`false`/`0`/`1`). This also fixes a bug where raw booleans were pushed into the `levels` array, producing inconsistent `[false]`/empty output. +- 585d337: The CLI now writes a `.psychds-ignore` at the dataset root when it preserves raw jsPsych originals under `data/raw/`, so the validator no longer flags them as `FILE_NOT_CHECKED`. This mirrors the behavior the frontend already had. 
+ + The `.psychds-ignore` filename and content (`**/raw/` plus a self-reference, dictated by validator quirks) are now exported from `@jspsych/metadata` as `PSYCHDS_IGNORE_FILENAME` and `PSYCHDS_IGNORE_CONTENT`, so the CLI and frontend share one definition instead of duplicating the literal string. + +- f96e1e6: Add tests verifying variableMeasured completeness for CSV input. Covers always-empty columns, null-string columns, partially-empty columns, and sparse multi-trial-type CSVs where different trial types populate different columns. +- ed9c25c: Fix stray empty-string expression in parseCSV and remove stale tsconfig paths entry for csv-parse/browser/esm (was pointing to a non-existent path in the installed csv-parse version). +- 1511d20: `variableMeasured.description` is now always serialized as a single schema.org Text value. When a column accumulated genuinely different descriptions from multiple plugins, `getList()` previously emitted `description` as an object (`{ pluginType: text }`), which made the Psych-DS validator raise an `OBJECT_TYPE_MISSING` warning. The distinct descriptions are now joined into one string with `" | "`. `getList()` is also idempotent now (a second call no longer mangles an already-collapsed string description), and empty descriptions collapse to `"unknown"`. +- 8edc7c2: Drop unnamed columns so R-exported datasets validate. R's `write.csv` (with the default `row.names = TRUE`) prepends an unnamed row-index column, so the exported CSV header starts with a bare comma — an empty-string column name. Psych-DS variables require a name, so the column can never appear in `variableMeasured`; left in the on-disk CSV it fails validation with `CSV_COLUMN_MISSING_FROM_METADATA`. 
+ + The strip now lives in the shared data-file path so the CLI and frontend behave identically: + + - `generate()` strips empty/whitespace-only columns from the parsed data up front, with a single warning instead of per-row spam (keeps `variableMeasured` clean and standalone library use safe), via a new exported `stripUnnamedColumns` helper. + - `buildPsychDSDataFiles` strips the main table before emitting it: a clean CSV keeps its exact bytes (verbatim `mainContent`), while a file with an unnamed column is re-serialised from the cleaned rows. Both the CLI (rename-plan and non-plan paths) and the frontend feed parsed `mainRows`, so the written/zipped/validated CSV always matches the metadata. + + Fixes finding #2 of #109. + +- e80e57c: Fix always-empty columns being silently dropped from variableMeasured. Columns whose values are null or empty across all rows in a dataset now appear in variableMeasured with a minimal `"value": "unknown"` entry, satisfying the Psych-DS requirement that every CSV column header has a corresponding entry. +- 06a84fb: fix(metadata): make the Node ESM entry (`dist/index.js`) loadable + + The build runs esbuild (which emits the bundled `dist/index.js`) followed by + `tsc`. With `declaration: true` and `outDir: ./dist` but no `emitDeclarationOnly`, + `tsc` re-emitted an unbundled `dist/index.js` over esbuild's bundle, leaving + extensionless relative imports (e.g. `./utils`) that Node's ESM loader rejects. + Added `emitDeclarationOnly: true` so `tsc` emits only the `.d.ts` declarations and + esbuild's working bundle survives; type-checking and `dist/index.d.ts` are unchanged. + +- 03a3ce4: fix(metadata): preserve string descriptions and primitive column types across generate() calls + + Two related bugs fixed in metadata generation: + + 1. 
**String descriptions wiped on re-generate** — `VariablesMap.updateDescription` previously + replaced any non-object description with `{}` before merging, discarding user-written + descriptions loaded from an existing `dataset_description.json`. Non-object descriptions + are now promoted to `{ default: string }` so they survive subsequent `generate()` calls. + + 2. **Mixed-type column typed as "array" instead of "string"** — When a column's rows contain + a mix of primitive values and arrays/objects (e.g. a `response` column with keyboard-trial + strings and survey-trial objects), later rows previously overwrote the column type to + `"array"`. The array-type override now only fires when the existing type is not already a + concrete primitive (`"string"`, `"number"`, or `"boolean"`). + +- ae0d01c: fix(metadata): treat mixed-type columns as categorical, not numeric+categorical + + A column containing both numeric and non-numeric values previously produced + contradictory metadata: `value: "number"` alongside both `minValue`/`maxValue` + and `levels`. The fix decides at the cell level — once a non-numeric value + arrives in a column that had numeric min/max (or vice versa), the column is + downgraded to categorical: min/max fields are removed, boundary values are + preserved as string levels, and a `console.warn` is emitted once per column. + +- c2426be: Fix `PluginCache` parsing errors for standard and custom jsPsych plugins. The data block was extracted with a lazy regex that overshot into the rest of the info object; replaced with brace-counting extraction that handles any nesting depth. Non-ok HTTP responses (e.g. 404 for unknown plugins) are now caught before reaching the parser rather than passing HTML error pages as source code. Additionally, JSDoc descriptions for parameters inside a `nested:` sub-object (e.g. 
`view_history`'s `page_index` and `viewing_time` in `jsPsych-instructions`) are now correctly extracted; previously the first nested parameter was silently consumed by the parent variable's regex match and never added to the cache. +- e1cb44e: Fix whitespace-only string values being misdetected as numeric (#70). A cell containing only whitespace (e.g. a single space) passed the `isNaN(Number(value))` check because `Number(" ")` is `0`, but `parseFloat(" ")` is `NaN` — leaking through as `NaN` `minValue`/`maxValue` (serialized to `null`) on otherwise-categorical string columns. The numeric check now requires non-empty trimmed content and uses `Number` for both the test and the conversion so they cannot disagree. +- 3c7d1f7: Accept JSON-Lines (JSONL) experiment data, not just a single JSON array. Several jsPsych labs — and JATOS exports — write data as newline-delimited JSON, with one JSON value per line (typically one participant's full trial array per line) rather than one big array. Previously `generate()` ran `JSON.parse` on the whole string, so every such file failed with `Unexpected non-whitespace character after JSON` and produced no metadata. + + A new exported `parseJsonData` helper handles both shapes: a well-formed single document is returned unchanged (no behaviour change for existing single-array callers), and only when whole-string parsing fails does it fall back to parsing line by line, flattening any per-line arrays into one observation stream. It is now used wherever JSON data files are parsed: + + - `generate()` (the library) for the main ingestion path. + - the CLI's data-file reader, join-key pre-pass, and CSV-conversion path. + - the frontend's join-key pre-flight and Psych-DS file builder. + + The `.jsonl` file extension is now also recognised as a JSON data file (these exports are conventionally named `.jsonl`). 
The CLI processes `.jsonl` exactly like `.json` — including filename-normalization, raw-original preservation, and CSV conversion — and the frontend normalises a `.jsonl` upload to the JSON path. + + Verified end to end against the raw `.jsonl` exports in `vucml/online_experiments`: all 15 files now generate metadata and pass the Psych-DS validator with zero errors (they failed at parse time before). + +- 3c7d1f7: Synthesize a `source_record_id` join key for multi-record JSON-Lines exports. Raw jsPsych exports carry no per-row identifier, so once JSONL is flattened (one record per line) `trial_index` repeats across records and can't uniquely key the extracted array/object sidecar CSVs — every record's trial 0 collapsed onto the same `(trial_index, element_index)` key, making the sidecars impossible to join back to a single parent trial. + + The synthesized column is named `source_record_id` rather than `participant_id` because a JSON-Lines line is only guaranteed to be one _source record_ — usually, but not always, one participant. The honest name avoids overclaiming for exports where a line isn't a single subject. + + `parseJsonData` now takes an opt-in `{ tagSourceRecordId }` flag: in the JSON-Lines path it stamps each line's object rows with a 0-based `source_record_id` (a no-op on the single-array fast path), and reports via an optional `stats` out-param whether it actually synthesized the id. A line that already carries a `source_record_id` or a real `participant_id` is left untouched — the experiment's own identifier already groups those rows. `generate()` enables this for JSON input and promotes the identifier to the leading join key, preferring the synthesized `source_record_id` and falling back to a real `participant_id` already present in the export (`['source_record_id', 'trial_index']` or `['participant_id', 'trial_index']`), so the sidecars join unambiguously. CSV inputs are unaffected. + + When — and only when — the id was actually synthesized (i.e. 
absent from the source), it is given an explicit description that makes its synthetic origin unmistakable ("Synthetic source-record identifier … NOT a real subject ID from the experiment …") so a downstream user can't mistake it for a real subject ID; this also avoids serializing an empty `{}` description (an object with no `@type`, which trips the validator's `OBJECT_TYPE_MISSING`). The CLI's join-key pre-analysis/prompt and the frontend's pre-flight mirror this promotion so multi-record JSONL is no longer falsely flagged as having a non-unique join key.
+
+  Verified end to end against the raw `.jsonl` exports in `vucml/online_experiments` (`block_cat`): the combined 30-record export generates metadata, passes the Psych-DS validator (0 errors), synthesizes `source_record_id` 0–29, and writes sidecars whose `(source_record_id, trial_index, element_index)` keys are fully unique — including the doubly-nested `recall_responses` case. Notably `subjectId` collides across the two merged datasets (two records share `601`), which `source_record_id` correctly keeps distinct.
+
+- 72f8a4b: Register jsPsych system variables (`trial_type`, `trial_index`, `time_elapsed`, `extension_type`, `extension_version`) lazily instead of seeding them in the `VariablesMap` constructor. They now appear in `variableMeasured` only when their column is actually present in the data. Previously `time_elapsed` (and the others) were always emitted, so any dataset whose CSVs omit `time_elapsed` — common for processed/aggregated jsPsych exports — failed Psych-DS validation with `VARIABLE_MISSING_FROM_CSV_COLUMNS`. Datasets that do contain these columns are unaffected.
+
+  This also removes the eager `generateDefaultExtensionVariables()` seeding path, which registered both `extension_type` and `extension_version` whenever `extension_type` was observed — orphaning `extension_version` for any dataset that lacked that column.
The extension variables now register lazily per-column like the other system variables.
+
+- ca8dc75: Extend the stress-test regression guards with three more Jest suites covering the CSV ingestion path, generation at scale, and cross-file output-name collisions.
+
+  - `@jspsych/metadata` — `csv-input.stress`: pins how `generate(data, {}, "csv")` re-infers types from string cells (numeric coercion incl. whitespace/scientific-notation/`Infinity`/`NaN` rejection, mixed-column downgrade, `"true"`/`"false"` staying categorical, RFC-4180 quoting, unicode, empty/literal-`null` cells, the 50-char level cap, JSON-in-a-cell extraction), and asserts CSV/JSON parity for unambiguously-typed columns.
+  - `@jspsych/metadata` — `scale.stress`: feeds a 5,000-row dataset and checks exact numeric extremes, categorical dedup, high-cardinality level accumulation, boolean handling, and a throughput ceiling that guards against accidental O(n²) regressions.
+  - `@jspsych/metadata-cli` — `array-collision.stress`: two same-stem files in different subdirectories sharing a nested array column, asserting `processDirectory` disambiguates every main CSV, sidecar, and preserved raw original (no overwrites, all still Psych-DS compliant) — the cross-file collision gap left by the earlier rename suite.
+
+  Test-only change; no library or CLI behavior is modified.
+
+- fa17a9e: Add stress-test regression guards to the automated suite so previously-fixed nested-data and filename-normalization behavior can't silently regress.
+
+  Four Jest suites, ported from the standalone `stress-tests/` harnesses so they run under plain `npm test` (and CI) without a build step:
+
+  - `@jspsych/metadata`: `generate()` coherence over a comprehensive nested-data fixture (deep objects, arrays of objects/arrays, mixed-type columns, a `trial_type`-less row, unicode, empties), plus the Psych-DS filename-normalization helper invariants.
+  - `@jspsych/metadata-cli`: the `processDirectory` conversion end-to-end (compliant main CSV, `data/raw/` preservation, two-way `variableMeasured` ↔ CSV-column cross-check, and a best-effort Psych-DS validation pass), plus the refusal to write a non-compliant filename non-interactively.
+
+  Test-only change; no library or CLI behavior is modified. The shared fixture lives at `dev/stress/`.
+
+- 55f2f91: Strip JSDoc continuation `*` markers when parsing multi-line plugin/extension variable descriptions, so descriptions like the webgazer extension's `webgazer_data` no longer contain stray asterisks. Adds a regression test for webgazer-shaped multi-line JSDoc.
+
 ## 0.0.3
 
 ### Patch Changes
diff --git a/packages/metadata/package.json b/packages/metadata/package.json
index 355201b..4e63a64 100644
--- a/packages/metadata/package.json
+++ b/packages/metadata/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@jspsych/metadata",
-  "version": "0.0.3",
+  "version": "0.1.0",
   "description": "jsPsych package for creating and customizing metadata according to Psych-DS standards.",
   "type": "module",
   "main": "dist/index.js",