Skip to content

Conversation

@JonnavithulaGirish
Copy link
Member

@JonnavithulaGirish JonnavithulaGirish commented Aug 7, 2025

Related Issues

Usage

pnpm start consent upload-preferences --auth=your-auth-token --partition=your-partition --directory=./examples/pm-test --dryRun=true --skipWorkflowTriggers=true --skipExistingRecordCheck=true --isSilent=true --attributes="Tags:transcend-cli,Source:transcend-cli" --transcendUrl=https://api.transcend.io/ --allowedIdentifierNames="email,personId,memberId" --identifierColumns="email_id,person_id,member_id"

Note

Major redesign of preference uploads with parallelism, multi-identifier support, and persistent state.

  • Replaces single-file flow with a directory-based, multi-process pool (worker.ts) and live dashboard; many new flags (--allowedIdentifierNames, --identifierColumns, --uploadConcurrency, --maxChunkSize, etc.)
  • Adds persistent schema (FileFormatState, schemaState.ts) and receipts (receiptsState.ts) with exponential-backoff reads; aggregates success/fail/pending and exports failing-updates CSV
  • Splits pipeline into plan and execute: buildInteractiveUploadPreferencePlan (validation/mapping) + interactivePreferenceUploaderFromPlan (batch upload with smarter retries/splitting and email validation)
  • Enhances identifier fetching (getPreferencesForIdentifiers): progress callbacks, recursive split on validation errors, improved retry logic
  • Introduces CSV transforms and mapping utilities; removes legacy uploadPreferenceManagementPreferencesInteractive
  • Adds utility scripts find-exact.ts and reconcile-preference-records.ts; logs active Sombra URL; minor GraphQL retry log tweak
  • Updates README usage/examples and CHANGELOG; bumps package to 9.0.0

Written by Cursor Bugbot for commit 85abeeb. This will update automatically on new commits. Configure here.

@linear
Copy link

linear bot commented Aug 7, 2025

Comment on lines +102 to +119
allowedIdentifierNames: {
kind: 'parsed',
parse: (value: string) => value.split(',').map((s) => s.trim()),
brief:
'Identifiers configured for the run. Comma-separated list of identifier names.',
},
identifierColumns: {
kind: 'parsed',
parse: (value: string) => value.split(',').map((s) => s.trim()),
brief:
'Columns in the CSV that should be used as identifiers. Comma-separated list of column names.',
},
columnsToIgnore: {
kind: 'parsed',
parse: (value: string) => value.split(',').map((s) => s.trim()),
brief:
'Columns in the CSV that should be ignored. Comma-separated list of column names.',
optional: true,
Copy link
Member

@bencmbrook bencmbrook Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred pattern for lists is to use the built-in variadic: ','. This will parse it in the same way, but it will also (A) provide better error messages for malformed inputs, and (B) it will also self-document that it's a list which can be provided as a comma-separated argument like this:

FLAGS
      --auth                    The Transcend API key.
     [--identifierColumns]...   Identifier names configured for the run. [separator = ,]

It also allows users to pass --identifierColumns 1 --identifierColumns 2 --identifierColumns 3

And if any of these are enums (I don't think they are, but) you can also do something like this to validate (and also self-document) the expected inputs:
Image

If you

Suggested change
allowedIdentifierNames: {
kind: 'parsed',
parse: (value: string) => value.split(',').map((s) => s.trim()),
brief:
'Identifiers configured for the run. Comma-separated list of identifier names.',
},
identifierColumns: {
kind: 'parsed',
parse: (value: string) => value.split(',').map((s) => s.trim()),
brief:
'Columns in the CSV that should be used as identifiers. Comma-separated list of column names.',
},
columnsToIgnore: {
kind: 'parsed',
parse: (value: string) => value.split(',').map((s) => s.trim()),
brief:
'Columns in the CSV that should be ignored. Comma-separated list of column names.',
optional: true,
allowedIdentifierNames: {
kind: 'parsed',
parse: String,
variadic: ',',
brief: 'Identifier names configured for the run.',
},
identifierColumns: {
kind: 'parsed',
parse: String,
variadic: ',',
brief: 'Columns in the CSV that should be used as identifiers.',
},
columnsToIgnore: {
kind: 'parsed',
parse: String,
variadic: ',',
brief: 'Columns in the CSV that should be ignored.',
optional: true,

@michaelfarrell76 michaelfarrell76 force-pushed the jonnavithulaGirish/multiIdentifier branch from a7aa378 to 0be1c91 Compare August 19, 2025 00:33
@michaelfarrell76 michaelfarrell76 force-pushed the jonnavithulaGirish/multiIdentifier branch from 536ca21 to 702709e Compare August 19, 2025 00:37
cursor[bot]

This comment was marked as outdated.

]);
currentState.timestampColum = timestampName;

currentState.setValue(timestampName, 'timestampColumn');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Incorrect Parameter Order in setValue Calls

The setValue calls in parsePreferenceFileFormatFromCsv.ts and parsePreferenceAndPurposeValuesFromCsv.ts pass parameters in the wrong order, using (value, key) instead of the expected (key, value). This is inconsistent with getValue's key-first convention and likely results in incorrect state updates.

Additional Locations (1)

Fix in Cursor Fix in Web

'When uploading preferences to v1/preferences - this is the number of concurrent requests made at any given time by a single process.' +
"This is NOT the batch size—it's how many batch *tasks* run in parallel. " +
'The number of total concurrent requests is maxed out at concurrency * uploadConcurrency.',
default: '75', // FIXME 25
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Fix: incorrect default uploadConcurrency value persisted

The comment "// FIXME 25" on line 114 suggests the default value for uploadConcurrency should be 25 instead of 75, but was left at 75. This appears to be temporary debugging code or an unfinished change that was accidentally committed.

Fix in Cursor Fix in Web

Object.entries(datum).reduce(
(acc, [key, value]) =>
Object.assign(acc, {
[key.replace(/[^a-z_.+\-A-Z -~]/g, '')]: value,

Check warning

Code scanning / CodeQL

Overly permissive regular expression range Medium

Suspicious character range that overlaps with A-Z in the same character class, and overlaps with a-z in the same character class.

Copilot Autofix

AI about 2 months ago

To fix the overly permissive regular expression range, explicitly specify which characters are allowed in the CSV key names, rather than using the ambiguous -~ range.

  • Rewrite the regex so that only the desired characters (a-z, A-Z, underscores, dots, plus, hyphen, and space, if intended) are allowed.
  • If space should be allowed, add it as a literal in the character class, not as a range.
  • Hyphen should be placed at the beginning or end of the character class or escaped, to avoid being interpreted as a range character.
  • The tilde, if meant to be explicitly allowed, should be included explicitly, not as part of a range.
  • Update the regex in line 105 of src/reconcile-preference-records.ts to only include the specific allowed characters.

The only required code edit is in the characters in the regex within the key replacement on line 105.


Suggested changeset 1
src/reconcile-preference-records.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/reconcile-preference-records.ts b/src/reconcile-preference-records.ts
--- a/src/reconcile-preference-records.ts
+++ b/src/reconcile-preference-records.ts
@@ -102,7 +102,7 @@
     Object.entries(datum).reduce(
       (acc, [key, value]) =>
         Object.assign(acc, {
-          [key.replace(/[^a-z_.+\-A-Z -~]/g, '')]: value,
+          [key.replace(/[^a-zA-Z0-9_.+\- ]/g, '')]: value,
         }),
       {} as any,
     ),
EOF
@@ -102,7 +102,7 @@
Object.entries(datum).reduce(
(acc, [key, value]) =>
Object.assign(acc, {
[key.replace(/[^a-z_.+\-A-Z -~]/g, '')]: value,
[key.replace(/[^a-zA-Z0-9_.+\- ]/g, '')]: value,
}),
{} as any,
),
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 8 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

...pref,
lastUpdatedDate: pref.lastUpdatedDate
? pref.lastUpdatedDate
: new Date('08/24/2025').toISOString(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded fallback date will become stale

Low Severity

The fallback date new Date('08/24/2025') is hardcoded for records missing lastUpdatedDate. This magic date will become outdated and may cause data integrity issues when the actual date significantly differs from this static value, making records appear to have stale timestamps.

Fix in Cursor Fix in Web

.split(',')
.map((email) => email.trim().toLowerCase());

const keys = Object.keys(preferences[0]);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty CSV file causes crash in transformCsv

Medium Severity

The transformCsv function accesses preferences[0] without checking if the array is empty. If a CSV file contains only a header row with no data rows, preferences will be an empty array, and Object.keys(preferences[0]) will throw a TypeError because preferences[0] is undefined. This crashes the worker with an unhelpful error message.

Fix in Cursor Fix in Web


## [9.0.0] - 2025-08-15

FIXME
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME placeholder committed in changelog

Medium Severity

The version 9.0.0 changelog entry contains only "FIXME" as placeholder text instead of actual release notes. This placeholder was committed and will be visible to users.

Fix in Cursor Fix in Web

*/
async function main(): Promise<void> {
const opts: Options = {
in: path.resolve('./working/costco/concerns/out.csv'),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Development script with hardcoded paths committed

Medium Severity

This 872-line script contains a hardcoded local path ./working/costco/concerns/out.csv and is not referenced anywhere in package.json or other source files. It appears to be a development/debugging utility that was accidentally committed to the repository.

Fix in Cursor Fix in Web

// Create got instance with default values
return got.extend({
prefixUrl: customerUrl,
prefixUrl: process.env.SOMBRA_URL || customerUrl,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sombra URL log does not match actual URL used

Medium Severity

The log message at line 49 displays customerUrl, but line 52 uses process.env.SOMBRA_URL || customerUrl as the actual prefixUrl. When the environment variable is set, the logged URL differs from the URL actually used, causing misleading diagnostic output during debugging.

Fix in Cursor Fix in Web

const shouldLog =
total % logInterval === 0 ||
Math.floor((total - identifiers.length) / logInterval) <
Math.floor(total / logInterval);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Progress logging uses wrong variable in boundary check

Medium Severity

The maybeLogProgress function receives a delta parameter but the shouldLog calculation uses identifiers.length (total count) instead of delta in the boundary-crossing check. This causes the condition (total - identifiers.length) to be negative until processing completes, making the comparison always true and logging after every single group instead of at the intended interval.

Fix in Cursor Fix in Web

nodes {
id
# FIXME remove
status
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug field with FIXME comment committed

Low Severity

A status field was added to the GraphQL query with a comment # FIXME remove, indicating this is temporary debugging code that was accidentally committed. This adds unnecessary data fetching overhead and the field may not be used.

Fix in Cursor Fix in Web

main().catch((err) => {
console.error(err?.stack ?? String(err));
process.exit(1);
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreferenced utility script committed to source

Low Severity

This 341-line standalone script for searching files by content is not referenced anywhere in package.json or other source files. It appears to be a development utility for finding specific strings in CSV/JSON/parquet files that was accidentally committed to the repository.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants