multi identifier preference record uploads #444
base: main
Conversation
```ts
allowedIdentifierNames: {
  kind: 'parsed',
  parse: (value: string) => value.split(',').map((s) => s.trim()),
  brief:
    'Identifiers configured for the run. Comma-separated list of identifier names.',
},
identifierColumns: {
  kind: 'parsed',
  parse: (value: string) => value.split(',').map((s) => s.trim()),
  brief:
    'Columns in the CSV that should be used as identifiers. Comma-separated list of column names.',
},
columnsToIgnore: {
  kind: 'parsed',
  parse: (value: string) => value.split(',').map((s) => s.trim()),
  brief:
    'Columns in the CSV that should be ignored. Comma-separated list of column names.',
  optional: true,
```
The preferred pattern for lists is to use the built-in variadic: `','`. This parses the value the same way, but it also (A) provides better error messages for malformed inputs, and (B) self-documents that the flag is a list which can be provided as a comma-separated argument, like this:

```
FLAGS
  --auth                    The Transcend API key.
  [--identifierColumns]...  Identifier names configured for the run. [separator = ,]
```

It also allows users to pass `--identifierColumns 1 --identifierColumns 2 --identifierColumns 3`.
And if any of these are enums (I don't think they are, but) you can also do something like this to validate (and also self-document) the expected inputs:

Suggested change:

```diff
-allowedIdentifierNames: {
-  kind: 'parsed',
-  parse: (value: string) => value.split(',').map((s) => s.trim()),
-  brief:
-    'Identifiers configured for the run. Comma-separated list of identifier names.',
-},
-identifierColumns: {
-  kind: 'parsed',
-  parse: (value: string) => value.split(',').map((s) => s.trim()),
-  brief:
-    'Columns in the CSV that should be used as identifiers. Comma-separated list of column names.',
-},
-columnsToIgnore: {
-  kind: 'parsed',
-  parse: (value: string) => value.split(',').map((s) => s.trim()),
-  brief:
-    'Columns in the CSV that should be ignored. Comma-separated list of column names.',
-  optional: true,
+allowedIdentifierNames: {
+  kind: 'parsed',
+  parse: String,
+  variadic: ',',
+  brief: 'Identifier names configured for the run.',
+},
+identifierColumns: {
+  kind: 'parsed',
+  parse: String,
+  variadic: ',',
+  brief: 'Columns in the CSV that should be used as identifiers.',
+},
+columnsToIgnore: {
+  kind: 'parsed',
+  parse: String,
+  variadic: ',',
+  brief: 'Columns in the CSV that should be ignored.',
+  optional: true,
```
```diff
   ]);
-  currentState.timestampColum = timestampName;
+  currentState.setValue(timestampName, 'timestampColumn');
```
Bug: Incorrect Parameter Order in `setValue` Calls
The `setValue` calls in parsePreferenceFileFormatFromCsv.ts and parsePreferenceAndPurposeValuesFromCsv.ts pass parameters in the wrong order, using `(value, key)` instead of the expected `(key, value)`. This is inconsistent with `getValue`'s key-first convention and likely results in incorrect state updates.
Additional Locations (1)
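A minimal sketch of the key-first convention the report describes (this class and its method signatures are illustrative assumptions, not the repository's actual state store):

```typescript
// Hypothetical state store mirroring the (key, value) convention
// described in the bug report.
class CsvParseState {
  private values = new Map<string, string>();

  // Key comes first, matching getValue's key-first convention.
  setValue(key: string, value: string): void {
    this.values.set(key, value);
  }

  getValue(key: string): string | undefined {
    return this.values.get(key);
  }
}

const currentState = new CsvParseState();
// Correct order: key first, then value.
currentState.setValue('timestampColumn', 'updated_at');
console.log(currentState.getValue('timestampColumn')); // 'updated_at'
```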
```ts
'When uploading preferences to v1/preferences - this is the number of concurrent requests made at any given time by a single process.' +
  "This is NOT the batch size—it's how many batch *tasks* run in parallel. " +
  'The number of total concurrent requests is maxed out at concurrency * uploadConcurrency.',
default: '75', // FIXME 25
```
```ts
Object.entries(datum).reduce(
  (acc, [key, value]) =>
    Object.assign(acc, {
      [key.replace(/[^a-z_.+\-A-Z -~]/g, '')]: value,
```
Code scanning / CodeQL: Overly permissive regular expression range (Medium)

Copilot Autofix (about 2 months ago):
To fix the overly permissive regular expression range, explicitly specify which characters are allowed in the CSV key names, rather than using the ambiguous `-~` range.

- Rewrite the regex so that only the desired characters (`a-z`, `A-Z`, underscores, dots, plus, hyphen, and space, if intended) are allowed.
- If space should be allowed, add it as a literal in the character class, not as a range.
- Place the hyphen at the beginning or end of the character class, or escape it, to avoid it being interpreted as a range character.
- If the tilde is meant to be allowed, include it explicitly, not as part of a range.
- Update the regex on line 105 of src/reconcile-preference-records.ts to include only the specific allowed characters.

The only required code edit is to the character class in the key-replacement regex on line 105.
Copy modified line R105:

```diff
@@ -102,7 +102,7 @@
   Object.entries(datum).reduce(
     (acc, [key, value]) =>
       Object.assign(acc, {
-        [key.replace(/[^a-z_.+\-A-Z -~]/g, '')]: value,
+        [key.replace(/[^a-zA-Z0-9_.+\- ]/g, '')]: value,
       }),
     {} as any,
   ),
```
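The tightened character class can be checked in isolation; a small sketch (the key names are made-up examples):

```typescript
// Strips any character outside the explicit allow-list of letters,
// digits, underscore, dot, plus, hyphen, and space.
const sanitizeKey = (key: string): string =>
  key.replace(/[^a-zA-Z0-9_.+\- ]/g, '');

console.log(sanitizeKey('user_email')); // 'user_email' (unchanged)
console.log(sanitizeKey('col~name!')); // 'colname' (tilde and bang stripped)
```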
Cursor Bugbot has reviewed your changes and found 8 potential issues.
```ts
...pref,
lastUpdatedDate: pref.lastUpdatedDate
  ? pref.lastUpdatedDate
  : new Date('08/24/2025').toISOString(),
```
Hardcoded fallback date will become stale
Low Severity
The fallback date new Date('08/24/2025') is hardcoded for records missing lastUpdatedDate. This magic date will become outdated and may cause data integrity issues when the actual date significantly differs from this static value, making records appear to have stale timestamps.
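One way to avoid the magic date is to fall back to the time of the upload run instead; a sketch under the assumption that `pref` carries an optional ISO-string `lastUpdatedDate` (the helper name is illustrative):

```typescript
interface PreferenceRecord {
  lastUpdatedDate?: string;
}

// Hypothetical helper: prefer the record's own timestamp; otherwise
// stamp it with the run time rather than a hardcoded date.
const withLastUpdatedDate = (
  pref: PreferenceRecord,
  now: Date = new Date(),
): PreferenceRecord => ({
  ...pref,
  lastUpdatedDate: pref.lastUpdatedDate ?? now.toISOString(),
});

console.log(withLastUpdatedDate({}, new Date('2025-08-24T00:00:00Z')));
// { lastUpdatedDate: '2025-08-24T00:00:00.000Z' }
```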
```ts
  .split(',')
  .map((email) => email.trim().toLowerCase());

const keys = Object.keys(preferences[0]);
```
Empty CSV file causes crash in transformCsv
Medium Severity
The transformCsv function accesses preferences[0] without checking if the array is empty. If a CSV file contains only a header row with no data rows, preferences will be an empty array, and Object.keys(preferences[0]) will throw a TypeError because preferences[0] is undefined. This crashes the worker with an unhelpful error message.
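A guard at the top of the function turns the opaque TypeError into a clear failure; a minimal sketch (the function name and error message are illustrative, not the repository's actual code):

```typescript
// Illustrative guard: fail fast when the CSV parsed to zero data
// rows, instead of crashing on preferences[0] being undefined.
function getCsvColumns(preferences: Record<string, string>[]): string[] {
  if (preferences.length === 0) {
    throw new Error(
      'CSV contains no data rows; expected at least one row below the header.',
    );
  }
  return Object.keys(preferences[0]);
}

console.log(getCsvColumns([{ email: 'a@example.com', purpose: 'Marketing' }]));
// ['email', 'purpose']
```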
```
## [9.0.0] - 2025-08-15

FIXME
```
```ts
 */
async function main(): Promise<void> {
  const opts: Options = {
    in: path.resolve('./working/costco/concerns/out.csv'),
```
Development script with hardcoded paths committed
Medium Severity
This 872-line script contains a hardcoded local path ./working/costco/concerns/out.csv and is not referenced anywhere in package.json or other source files. It appears to be a development/debugging utility that was accidentally committed to the repository.
```diff
   // Create got instance with default values
   return got.extend({
-    prefixUrl: customerUrl,
+    prefixUrl: process.env.SOMBRA_URL || customerUrl,
```
Sombra URL log does not match actual URL used
Medium Severity
The log message at line 49 displays customerUrl, but line 52 uses process.env.SOMBRA_URL || customerUrl as the actual prefixUrl. When the environment variable is set, the logged URL differs from the URL actually used, causing misleading diagnostic output during debugging.
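One fix is to resolve the effective URL once and use the same value for both the log line and the client; a sketch (the helper name is an assumption, and the URLs are made-up examples):

```typescript
// Illustrative: compute the effective Sombra URL in one place so the
// diagnostic log can never diverge from the URL actually used.
function resolveSombraUrl(
  customerUrl: string,
  env: Record<string, string | undefined>,
): string {
  return env.SOMBRA_URL || customerUrl;
}

const prefixUrl = resolveSombraUrl('https://sombra.example.com', {
  SOMBRA_URL: 'https://override.example.com',
});
console.log(`Using Sombra URL: ${prefixUrl}`);
// The same prefixUrl value would then be passed to got.extend({ prefixUrl }).
```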
```ts
const shouldLog =
  total % logInterval === 0 ||
  Math.floor((total - identifiers.length) / logInterval) <
    Math.floor(total / logInterval);
```
Progress logging uses wrong variable in boundary check
Medium Severity
The maybeLogProgress function receives a delta parameter but the shouldLog calculation uses identifiers.length (total count) instead of delta in the boundary-crossing check. This causes the condition (total - identifiers.length) to be negative until processing completes, making the comparison always true and logging after every single group instead of at the intended interval.
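The intended boundary check, with `delta` substituted for the total count, can be sketched as a pure function (the signature is illustrative):

```typescript
// Illustrative: log when the running total crosses a logInterval
// boundary, using the size of the just-processed group (delta).
function shouldLogProgress(
  total: number,
  delta: number,
  logInterval: number,
): boolean {
  return (
    total % logInterval === 0 ||
    Math.floor((total - delta) / logInterval) < Math.floor(total / logInterval)
  );
}

console.log(shouldLogProgress(103, 5, 100)); // true  (98 -> 103 crosses 100)
console.log(shouldLogProgress(95, 5, 100)); // false (no boundary crossed)
```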
```graphql
nodes {
  id
  # FIXME remove
  status
```
```ts
main().catch((err) => {
  console.error(err?.stack ?? String(err));
  process.exit(1);
});
```
Unreferenced utility script committed to source
Low Severity
This 341-line standalone script for searching files by content is not referenced anywhere in package.json or other source files. It appears to be a development utility for finding specific strings in CSV/JSON/parquet files that was accidentally committed to the repository.
Note
Major redesign of preference uploads with parallelism, multi-identifier support, and persistent state.
- Parallelism via worker.ts and a live dashboard; many new flags (--allowedIdentifierNames, --identifierColumns, --uploadConcurrency, --maxChunkSize, etc.)
- Persistent state (FileFormatState, schemaState.ts) and receipts (receiptsState.ts) with exponential-backoff reads; aggregates success/fail/pending and exports a failing-updates CSV
- buildInteractiveUploadPreferencePlan (validation/mapping) + interactivePreferenceUploaderFromPlan (batch upload with smarter retries/splitting and email validation)
- getPreferencesForIdentifiers: progress callbacks, recursive split on validation errors, improved retry logic
- uploadPreferenceManagementPreferencesInteractive
- find-exact.ts and reconcile-preference-records.ts; logs active Sombra URL; minor GraphQL retry log tweak
- 9.0.0

Written by Cursor Bugbot for commit 85abeeb. This will update automatically on new commits.