feat(compass-collection): Create Mock Data Generator Prompt and Batching Logic in Compass - CLOUDP-381914#7892

Open
jcobis wants to merge 8 commits into main from CLOUDP-381914

@jcobis (Collaborator) commented Mar 19, 2026

Description

  • Create packages/compass-generative-ai/src/mock-data-generator/ module with Zod tool schema, system prompt, user prompt formatter, and schema batching logic, all ported from the MMS Java backend and translated into TypeScript
  • Move MockDataSchemaResponseShape, MockDataSchemaRawField, and MockDataSchemaResponse from atlas-ai-service.ts into the new module (re-exported for backwards compatibility until the next ticket switches over)
  • Add unit tests for batching logic (covering splitSchemaIntoChunks, mergeChunkResponses, validateSchemaSize, needsBatching)

Checklist

  • New tests and/or benchmarks are included
  • Documentation is changed or added
  • If this change updates the UI, screenshots/videos are added and a design review is requested
  • If this change could impact the load on the MongoDB cluster, please describe the expected and worst case impact
  • I have signed the MongoDB Contributor License Agreement (https://www.mongodb.com/legal/contributor-agreement)

Motivation and Context

Part of the broader effort to migrate all Atlas AI calls to the EDU Knowledge Server.

  • Bugfix
  • New feature
  • Dependency update
  • Misc

Types of changes

  • Backport Needed
  • Patch (non-breaking change which fixes an issue)
  • Minor (non-breaking change which adds functionality)
  • Major (fix or feature that would cause existing functionality to change)

Collaborator Author

Ported from mms: MockDataSchemaGenerationPrompt.buildUserPrompt()

Collaborator Author

Ported from mms: MockDataSchemaGenerationPrompt.java

* Splits a schema into smaller chunks for processing.
* Ported from NaturalLanguageQueryGenerator.splitSchemaIntoChunks()
*/
export function splitSchemaIntoChunks(

Collaborator Author

Ported from mms: NaturalLanguageQueryGenerator.splitSchemaIntoChunks()

* Merges multiple chunk responses into a single response.
* Ported from NaturalLanguageQueryGenerator.mergeChunkResponses()
*/
export function mergeChunkResponses(

Collaborator Author

Ported from mms: NaturalLanguageQueryGenerator.mergeChunkResponses()
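
The diff only shows the doc comment for mergeChunkResponses, so the following is a minimal sketch of one plausible reading of "merges multiple chunk responses into a single response": concatenating the per-chunk field mappings in order. The `FieldMapping` and `ChunkResponse` shapes and the `mergeChunkResponsesSketch` name are assumptions made for illustration; the real MockDataSchemaResponse type in the PR may carry more fields or merge differently.

```typescript
// Hypothetical, simplified shapes; the PR's actual response type may differ.
interface FieldMapping {
  fieldPath: string;
  fakerMethod: string;
}

interface ChunkResponse {
  fields: FieldMapping[];
}

// Concatenates the field mappings of every chunk response, preserving order.
// An empty input produces a response with an empty `fields` array.
function mergeChunkResponsesSketch(responses: ChunkResponse[]): ChunkResponse {
  return { fields: responses.flatMap((response) => response.fields) };
}
```

Because each chunk covers a disjoint set of schema fields, plain concatenation would not need de-duplication under this assumption.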

@jcobis changed the title WIP feat(compass-collection): Create Mock Data Generator Prompt and Batching Logic in Compass - CLOUDP-381914 Mar 24, 2026
@github-actions bot added the feat label Mar 24, 2026
@jcobis marked this pull request as ready for review March 24, 2026 17:43
@jcobis requested a review from a team as a code owner March 24, 2026 17:43
@jcobis requested review from Copilot and ivandevp March 24, 2026 17:43
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a new mock-data-generator module in compass-generative-ai to support generating mock data schema mappings (faker.js field mappings) with prompt content, tool schema definitions, and schema batching utilities, while keeping existing exports working via re-exports.

Changes:

  • Added Zod tool schema/types and prompt text for mock-data-schema generation in a new mock-data-generator/ module.
  • Implemented schema batching utilities (split/merge/size validation) and added unit tests for them.
  • Updated atlas-ai-service.ts to consume the moved schema types/shapes from the new module and re-export them for compatibility.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| packages/compass-generative-ai/src/mock-data-generator/schema.ts | Adds Zod tool schema + moved response types with compatibility aliases. |
| packages/compass-generative-ai/src/mock-data-generator/schema-batching.ts | Adds chunking/merging/limits helpers for large schemas. |
| packages/compass-generative-ai/src/mock-data-generator/schema-batching.spec.ts | Adds unit tests for batching utilities. |
| packages/compass-generative-ai/src/mock-data-generator/prompt.ts | Adds the system prompt used to generate faker mappings. |
| packages/compass-generative-ai/src/mock-data-generator/format-schema-for-prompt.ts | Adds user-prompt formatting for schema + validation rules. |
| packages/compass-generative-ai/src/mock-data-generator/index.ts | Exposes the new module’s public surface area. |
| packages/compass-generative-ai/src/atlas-ai-service.ts | Switches mock-data schema response shape/type to import from the new module and re-exports for compatibility. |

@jcobis requested a review from mabaasit March 24, 2026 17:57
@jcobis added the no release notes label Mar 24, 2026

if (fieldsPerChunk <= 0) {
  throw new Error('fieldsPerChunk must be a positive integer');
}

Collaborator

We can use lodash chunk to clean this:

return chunk(Object.entries(rawSchema), fieldsPerChunk).map(
  (chunk) => Object.fromEntries(chunk)
);
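
To make the suggestion above concrete, here is a minimal, dependency-free sketch of the chunk-based split. The local `chunk` helper stands in for lodash's `chunk(array, size)` so the example runs on its own; the `RawSchema` alias is a simplified placeholder for the PR's type, and the integer check is slightly stricter than the `<= 0` guard shown in the diff. Both are assumptions for illustration, not the PR's final code.

```typescript
// Simplified placeholder for the PR's RawSchema type: field name -> field description.
type RawSchema = Record<string, unknown>;

// Local stand-in for lodash's chunk(array, size), inlined to avoid a dependency.
function chunk<T>(items: T[], size: number): T[][] {
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

function splitSchemaIntoChunks(
  rawSchema: RawSchema,
  fieldsPerChunk: number
): RawSchema[] {
  // Assumption: reject fractions and NaN too, so the check matches the error message.
  if (!Number.isInteger(fieldsPerChunk) || fieldsPerChunk <= 0) {
    throw new Error('fieldsPerChunk must be a positive integer');
  }
  // Split the [key, value] pairs into groups, then rebuild one object per group.
  return chunk(Object.entries(rawSchema), fieldsPerChunk).map((entries) =>
    Object.fromEntries(entries)
  );
}
```

Naming the callback parameter `entries` rather than `chunk` avoids shadowing the helper inside the `map` callback, which keeps the one-liner easier to read.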


${schemaJson}

${validationRulesPhrase}

Collaborator

Do we want to add validation rules to the prompt example as well?

Collaborator Author

Original prompt didn't include this, so how about we keep it the same for now? We tested the original prompt for accuracy and such, so don't want to include too many changes at once. I will implement your other suggestions though

  documentSchema: RawSchema,
  validationRules?: Record<string, unknown> | null
): string {
  const schemaJson = JSON.stringify(documentSchema, null, 2);

Collaborator

You can use the toJSString function from mongodb-query-parser to achieve this (check utils/gen-ai-prompt).


Documents in the collection are described by the following schema:

${schemaJson}

Collaborator

Maybe wrap this in a code fence (so that it is wrapped in backticks).
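
As a rough illustration of the code-fence suggestion, the sketch below wraps the serialized schema in a fenced block inside the user prompt. Only the "Documents in the collection..." sentence and the `JSON.stringify(documentSchema, null, 2)` call come from the diff; the function name `formatSchemaForPromptSketch`, the `json` fence label, and the simplified `RawSchema` alias are assumptions, and the validation-rules part of the prompt is left out.

```typescript
// Simplified placeholder for the PR's RawSchema type.
type RawSchema = Record<string, unknown>;

// Hypothetical formatter showing the schema wrapped in a fenced block.
function formatSchemaForPromptSketch(documentSchema: RawSchema): string {
  const schemaJson = JSON.stringify(documentSchema, null, 2);
  // Build the three-backtick fence programmatically so this example can itself
  // be shown inside a fenced block.
  const fence = '`'.repeat(3);
  return [
    'Documents in the collection are described by the following schema:',
    '',
    `${fence}json`,
    schemaJson,
    fence,
  ].join('\n');
}
```

Fencing the schema gives the model an unambiguous boundary between the instructions and the data, which is the usual motivation for this kind of change.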

@jcobis requested a review from mabaasit March 30, 2026 17:51

Labels

feat, no release notes

3 participants