multi identifier preference record uploads #444
base: main
Conversation
```ts
allowedIdentifierNames: {
  kind: 'parsed',
  parse: (value: string) => value.split(',').map((s) => s.trim()),
  brief:
    'Identifiers configured for the run. Comma-separated list of identifier names.',
},
identifierColumns: {
  kind: 'parsed',
  parse: (value: string) => value.split(',').map((s) => s.trim()),
  brief:
    'Columns in the CSV that should be used as identifiers. Comma-separated list of column names.',
},
columnsToIgnore: {
  kind: 'parsed',
  parse: (value: string) => value.split(',').map((s) => s.trim()),
  brief:
    'Columns in the CSV that should be ignored. Comma-separated list of column names.',
  optional: true,
```
The preferred pattern for lists is to use the built-in variadic: `','`. This parses the value the same way, but it also (A) provides better error messages for malformed inputs, and (B) self-documents that the flag is a list which can be provided as a comma-separated argument, like this:

```
FLAGS
  --auth                    The Transcend API key.
  [--identifierColumns]...  Identifier names configured for the run. [separator = ,]
```

It also allows users to pass `--identifierColumns 1 --identifierColumns 2 --identifierColumns 3`.
And if any of these are enums (I don't think they are, but) you can also do something like this to validate (and also self-document) the expected inputs:

Suggested change:

```diff
-allowedIdentifierNames: {
-  kind: 'parsed',
-  parse: (value: string) => value.split(',').map((s) => s.trim()),
-  brief:
-    'Identifiers configured for the run. Comma-separated list of identifier names.',
-},
-identifierColumns: {
-  kind: 'parsed',
-  parse: (value: string) => value.split(',').map((s) => s.trim()),
-  brief:
-    'Columns in the CSV that should be used as identifiers. Comma-separated list of column names.',
-},
-columnsToIgnore: {
-  kind: 'parsed',
-  parse: (value: string) => value.split(',').map((s) => s.trim()),
-  brief:
-    'Columns in the CSV that should be ignored. Comma-separated list of column names.',
-  optional: true,
+allowedIdentifierNames: {
+  kind: 'parsed',
+  parse: String,
+  variadic: ',',
+  brief: 'Identifier names configured for the run.',
+},
+identifierColumns: {
+  kind: 'parsed',
+  parse: String,
+  variadic: ',',
+  brief: 'Columns in the CSV that should be used as identifiers.',
+},
+columnsToIgnore: {
+  kind: 'parsed',
+  parse: String,
+  variadic: ',',
+  brief: 'Columns in the CSV that should be ignored.',
+  optional: true,
```
```diff
   ]);
-  currentState.timestampColum = timestampName;
+  currentState.setValue(timestampName, 'timestampColumn');
```
Bug: Incorrect Parameter Order in `setValue` Calls
The `setValue` calls in parsePreferenceFileFormatFromCsv.ts and parsePreferenceAndPurposeValuesFromCsv.ts pass parameters in the wrong order, using `(value, key)` instead of the expected `(key, value)`. This is inconsistent with `getValue`'s key-first convention and likely results in incorrect state updates.
Additional Locations (1)
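A minimal sketch of the key-first convention the report describes (this class and its method signatures are illustrative assumptions, not the repository's actual state store):

```typescript
// Hypothetical state store mirroring the (key, value) convention
// described in the bug report.
class CsvParseState {
  private values = new Map<string, string>();

  // Key comes first, matching getValue's key-first convention.
  setValue(key: string, value: string): void {
    this.values.set(key, value);
  }

  getValue(key: string): string | undefined {
    return this.values.get(key);
  }
}

const currentState = new CsvParseState();
// Correct order: key first, then value.
currentState.setValue('timestampColumn', 'updated_at');
console.log(currentState.getValue('timestampColumn')); // 'updated_at'
```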
```ts
'When uploading preferences to v1/preferences - this is the number of concurrent requests made at any given time by a single process.' +
  "This is NOT the batch size—it's how many batch *tasks* run in parallel. " +
  'The number of total concurrent requests is maxed out at concurrency * uploadConcurrency.',
default: '75', // FIXME 25
```
```ts
Object.entries(datum).reduce(
  (acc, [key, value]) =>
    Object.assign(acc, {
      [key.replace(/[^a-z_.+\-A-Z -~]/g, '')]: value,
```
Code scanning / CodeQL: Overly permissive regular expression range (Medium)

Copilot Autofix (about 2 months ago):
To fix the overly permissive regular expression range, explicitly specify which characters are allowed in the CSV key names, rather than using the ambiguous `-~` range.

- Rewrite the regex so that only the desired characters (`a-z`, `A-Z`, underscores, dots, plus, hyphen, and space, if intended) are allowed.
- If space should be allowed, add it as a literal in the character class, not as a range.
- Place the hyphen at the beginning or end of the character class, or escape it, to avoid it being interpreted as a range character.
- If the tilde is meant to be allowed, include it explicitly, not as part of a range.
- Update the regex on line 105 of src/reconcile-preference-records.ts to include only the specific allowed characters.

The only required code edit is to the character class in the key-replacement regex on line 105.
Copy modified line R105:

```diff
@@ -102,7 +102,7 @@
   Object.entries(datum).reduce(
     (acc, [key, value]) =>
       Object.assign(acc, {
-        [key.replace(/[^a-z_.+\-A-Z -~]/g, '')]: value,
+        [key.replace(/[^a-zA-Z0-9_.+\- ]/g, '')]: value,
       }),
     {} as any,
   ),
```
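The tightened character class can be checked in isolation; a small sketch (the key names are made-up examples):

```typescript
// Strips any character outside the explicit allow-list of letters,
// digits, underscore, dot, plus, hyphen, and space.
const sanitizeKey = (key: string): string =>
  key.replace(/[^a-zA-Z0-9_.+\- ]/g, '');

console.log(sanitizeKey('user_email')); // 'user_email' (unchanged)
console.log(sanitizeKey('col~name!')); // 'colname' (tilde and bang stripped)
```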
Cursor Bugbot has reviewed your changes and found 8 potential issues.
```ts
...pref,
lastUpdatedDate: pref.lastUpdatedDate
  ? pref.lastUpdatedDate
  : new Date('08/24/2025').toISOString(),
```
Hardcoded fallback date will become stale
Low Severity
The fallback date new Date('08/24/2025') is hardcoded for records missing lastUpdatedDate. This magic date will become outdated and may cause data integrity issues when the actual date significantly differs from this static value, making records appear to have stale timestamps.
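One way to avoid the magic date is to fall back to the time of the upload run instead; a sketch under the assumption that `pref` carries an optional ISO-string `lastUpdatedDate` (the helper name is illustrative):

```typescript
interface PreferenceRecord {
  lastUpdatedDate?: string;
}

// Hypothetical helper: prefer the record's own timestamp; otherwise
// stamp it with the run time rather than a hardcoded date.
const withLastUpdatedDate = (
  pref: PreferenceRecord,
  now: Date = new Date(),
): PreferenceRecord => ({
  ...pref,
  lastUpdatedDate: pref.lastUpdatedDate ?? now.toISOString(),
});

console.log(withLastUpdatedDate({}, new Date('2025-08-24T00:00:00Z')));
// { lastUpdatedDate: '2025-08-24T00:00:00.000Z' }
```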
```ts
  .split(',')
  .map((email) => email.trim().toLowerCase());

const keys = Object.keys(preferences[0]);
```
Empty CSV file causes crash in transformCsv
Medium Severity
The transformCsv function accesses preferences[0] without checking if the array is empty. If a CSV file contains only a header row with no data rows, preferences will be an empty array, and Object.keys(preferences[0]) will throw a TypeError because preferences[0] is undefined. This crashes the worker with an unhelpful error message.
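A guard at the top of the function turns the opaque TypeError into a clear failure; a minimal sketch (the function name and error message are illustrative, not the repository's actual code):

```typescript
// Illustrative guard: fail fast when the CSV parsed to zero data
// rows, instead of crashing on preferences[0] being undefined.
function getCsvColumns(preferences: Record<string, string>[]): string[] {
  if (preferences.length === 0) {
    throw new Error(
      'CSV contains no data rows; expected at least one row below the header.',
    );
  }
  return Object.keys(preferences[0]);
}

console.log(getCsvColumns([{ email: 'a@example.com', purpose: 'Marketing' }]));
// ['email', 'purpose']
```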
```
## [9.0.0] - 2025-08-15

FIXME
```
```ts
 */
async function main(): Promise<void> {
  const opts: Options = {
    in: path.resolve('./working/costco/concerns/out.csv'),
```
Development script with hardcoded paths committed
Medium Severity
This 872-line script contains a hardcoded local path ./working/costco/concerns/out.csv and is not referenced anywhere in package.json or other source files. It appears to be a development/debugging utility that was accidentally committed to the repository.
```diff
   // Create got instance with default values
   return got.extend({
-    prefixUrl: customerUrl,
+    prefixUrl: process.env.SOMBRA_URL || customerUrl,
```
Sombra URL log does not match actual URL used
Medium Severity
The log message at line 49 displays customerUrl, but line 52 uses process.env.SOMBRA_URL || customerUrl as the actual prefixUrl. When the environment variable is set, the logged URL differs from the URL actually used, causing misleading diagnostic output during debugging.
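One fix is to resolve the effective URL once and use the same value for both the log line and the client; a sketch (the helper name is an assumption, and the URLs are made-up examples):

```typescript
// Illustrative: compute the effective Sombra URL in one place so the
// diagnostic log can never diverge from the URL actually used.
function resolveSombraUrl(
  customerUrl: string,
  env: Record<string, string | undefined>,
): string {
  return env.SOMBRA_URL || customerUrl;
}

const prefixUrl = resolveSombraUrl('https://sombra.example.com', {
  SOMBRA_URL: 'https://override.example.com',
});
console.log(`Using Sombra URL: ${prefixUrl}`);
// The same prefixUrl value would then be passed to got.extend({ prefixUrl }).
```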
```ts
const shouldLog =
  total % logInterval === 0 ||
  Math.floor((total - identifiers.length) / logInterval) <
    Math.floor(total / logInterval);
```
Progress logging uses wrong variable in boundary check
Medium Severity
The maybeLogProgress function receives a delta parameter but the shouldLog calculation uses identifiers.length (total count) instead of delta in the boundary-crossing check. This causes the condition (total - identifiers.length) to be negative until processing completes, making the comparison always true and logging after every single group instead of at the intended interval.
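The intended boundary check, with `delta` substituted for the total count, can be sketched as a pure function (the signature is illustrative):

```typescript
// Illustrative: log when the running total crosses a logInterval
// boundary, using the size of the just-processed group (delta).
function shouldLogProgress(
  total: number,
  delta: number,
  logInterval: number,
): boolean {
  return (
    total % logInterval === 0 ||
    Math.floor((total - delta) / logInterval) < Math.floor(total / logInterval)
  );
}

console.log(shouldLogProgress(103, 5, 100)); // true  (98 -> 103 crosses 100)
console.log(shouldLogProgress(95, 5, 100)); // false (no boundary crossed)
```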
```graphql
nodes {
  id
  # FIXME remove
  status
```
```ts
main().catch((err) => {
  console.error(err?.stack ?? String(err));
  process.exit(1);
});
```
Unreferenced utility script committed to source
Low Severity
This 341-line standalone script for searching files by content is not referenced anywhere in package.json or other source files. It appears to be a development utility for finding specific strings in CSV/JSON/parquet files that was accidentally committed to the repository.
Note
Major redesign of preference uploads with parallelism, multi-identifier support, and persistent state.
- Parallelism via worker.ts and a live dashboard; many new flags (--allowedIdentifierNames, --identifierColumns, --uploadConcurrency, --maxChunkSize, etc.)
- Persistent state (FileFormatState, schemaState.ts) and receipts (receiptsState.ts) with exponential-backoff reads; aggregates success/fail/pending and exports a failing-updates CSV
- buildInteractiveUploadPreferencePlan (validation/mapping) + interactivePreferenceUploaderFromPlan (batch upload with smarter retries/splitting and email validation)
- getPreferencesForIdentifiers: progress callbacks, recursive split on validation errors, improved retry logic
- uploadPreferenceManagementPreferencesInteractive
- find-exact.ts and reconcile-preference-records.ts; logs active Sombra URL; minor GraphQL retry log tweak
- 9.0.0

Written by Cursor Bugbot for commit 85abeeb. This will update automatically on new commits.