Skip to content

feat: expose videoSegmentDuration, videoFramesPerSegment, and pageIndices in GenerationConfig#130

Open
dineshreddy91 wants to merge 1 commit intomainfrom
devin/1775697621-expose-video-doc-config
Open

feat: expose videoSegmentDuration, videoFramesPerSegment, and pageIndices in GenerationConfig#130
dineshreddy91 wants to merge 1 commit intomainfrom
devin/1775697621-expose-video-doc-config

Conversation

@dineshreddy91
Copy link
Copy Markdown
Contributor

@dineshreddy91 dineshreddy91 commented Apr 9, 2026

Summary

Adds three new optional parameters to GenerationConfig (and GenerationConfigParams) to expose server-side video transcription and document processing controls:

  • videoSegmentDuration (number): Duration in seconds for each video segment when chunking for transcription (maps to video_segment_duration)
  • videoFramesPerSegment (number): Number of frames to sample per video segment (maps to video_frames_per_segment)
  • pageIndices (number[]): 0-indexed page indices to selectively process document pages (maps to page_indices)

These are conditionally included in toJSON() only when set, to avoid sending unnecessary defaults to the API.

Companion PRs: vlm-lab (adds page_indices to backend GenerationConfig) and vlmrun-python-sdk (same fields in Python SDK).

Review & Testing Checklist for Human

  • Verify the conditional spread pattern in toJSON() (lines 357-359) correctly omits fields when undefined — note this differs from existing fields which are always serialized. Confirm this is the desired behavior with the backend.
  • Confirm no client-side validation is needed for videoSegmentDuration >= 1.0 and videoFramesPerSegment >= 1 (backend enforces these constraints)

Notes

  • No tests added — these are type-level additions with straightforward serialization logic
  • The GenerationConfigParams type and GenerationConfig class both updated to keep them in sync

Link to Devin session: https://app.devin.ai/sessions/2f2d14e8cb55450c80d380356f9899b6
Requested by: @dineshreddy91


Open with Devin

…s to GenerationConfig

Expose new configuration parameters from vlm-lab:
- videoSegmentDuration: control chunk size when splitting video for transcription
- videoFramesPerSegment: control frame sampling per video segment
- pageIndices: specify which document pages to process (0-indexed)

Co-Authored-By: dinesh@vlm.run <dinesh@vlm.run>
@devin-ai-integration
Copy link
Copy Markdown
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 2 additional findings in Devin Review.

Open in Devin Review

Comment on lines +300 to +302
videoSegmentDuration?: number;
videoFramesPerSegment?: number;
pageIndices?: number[];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 New GenerationConfig fields silently dropped by all prediction clients

The three new config fields (videoSegmentDuration, videoFramesPerSegment, pageIndices) are added to GenerationConfigParams/GenerationConfig and properly handled in toJSON(), but all four prediction methods in predictions.ts manually construct the config object sent to the API (lines 206-212, 329-335, 394-400, 457-463), only including the original five fields. This means a user who sets e.g. config: { pageIndices: [0, 1], domain: 'document.invoice' } on a document prediction, or config: { videoSegmentDuration: 30 } on a video prediction, will have those fields silently ignored. The domains.ts:48 path uses GenerationConfig.toJSON() properly, but the core prediction endpoints — which are the primary consumers of these new fields — all drop them.

Affected code in predictions.ts (4 identical manual constructions)
config: {
  detail: config?.detail ?? "auto",
  json_schema: jsonSchema,
  confidence: config?.confidence ?? false,
  grounding: config?.grounding ?? false,
  gql_stmt: config?.gqlStmt ?? null,
  // videoSegmentDuration, videoFramesPerSegment, pageIndices are MISSING
},
Prompt for agents
The new fields videoSegmentDuration, videoFramesPerSegment, and pageIndices are added to GenerationConfigParams and GenerationConfig.toJSON(), but the four prediction methods in src/client/predictions.ts all manually construct the config object for the API request, only including the five original fields (detail, json_schema, confidence, grounding, gql_stmt). The new fields are silently dropped.

The affected methods are:
1. ImagePredictions.generate (line ~206)
2. FilePredictions.generate (line ~329)
3. FilePredictions.execute (line ~394)
4. WebPredictions.generate (line ~457)

The fix should add the three new fields to each of these manual config constructions, using the same conditional spread pattern as in toJSON():
  ...(config?.videoSegmentDuration !== undefined && { video_segment_duration: config.videoSegmentDuration }),
  ...(config?.videoFramesPerSegment !== undefined && { video_frames_per_segment: config.videoFramesPerSegment }),
  ...(config?.pageIndices !== undefined && { page_indices: config.pageIndices }),

Alternatively, refactor these methods to use new GenerationConfig(config).toJSON() instead of manually constructing the config object, which would prevent this class of bug in the future (similar to how src/client/domains.ts:48 does it).
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds videoSegmentDuration, videoFramesPerSegment, and pageIndices to the GenerationConfig class and its associated parameter interface. While the fields are correctly defined and serialized, feedback highlights that they are not yet integrated into the API request logic within src/client/predictions.ts, rendering them non-functional in the current state.

Comment on lines +300 to +302
videoSegmentDuration?: number;
videoFramesPerSegment?: number;
pageIndices?: number[];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The new parameters videoSegmentDuration, videoFramesPerSegment, and pageIndices are correctly added to the GenerationConfigParams interface and GenerationConfig class, but they are not yet integrated into the API request logic.

Currently, the prediction methods in src/client/predictions.ts (specifically ImagePredictions.generate, FilePredictions.generate, FilePredictions.execute, and WebPredictions.generate) manually reconstruct the config object for the API payload and omit these new fields. For example, in FilePredictions.generate (lines 329-335 of src/client/predictions.ts), the config object is hardcoded with only a subset of fields.

To make these new parameters functional, the mapping logic in src/client/predictions.ts must be updated to include these parameters in the payload sent to the API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant