feat: expose videoSegmentDuration, videoFramesPerSegment, and pageIndices in GenerationConfig#130
feat: expose videoSegmentDuration, videoFramesPerSegment, and pageIndices in GenerationConfig#130dineshreddy91 wants to merge 1 commit intomainfrom
Conversation
…s to GenerationConfig Expose new configuration parameters from vlm-lab: - videoSegmentDuration: control chunk size when splitting video for transcription - videoFramesPerSegment: control frame sampling per video segment - pageIndices: specify which document pages to process (0-indexed) Co-Authored-By: dinesh@vlm.run <dinesh@vlm.run>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
| videoSegmentDuration?: number; | ||
| videoFramesPerSegment?: number; | ||
| pageIndices?: number[]; |
There was a problem hiding this comment.
🔴 New GenerationConfig fields silently dropped by all prediction clients
The three new config fields (videoSegmentDuration, videoFramesPerSegment, pageIndices) are added to GenerationConfigParams/GenerationConfig and properly handled in toJSON(), but all four prediction methods in predictions.ts manually construct the config object sent to the API (lines 206-212, 329-335, 394-400, 457-463), only including the original five fields. This means a user who sets e.g. config: { pageIndices: [0, 1], domain: 'document.invoice' } on a document prediction, or config: { videoSegmentDuration: 30 } on a video prediction, will have those fields silently ignored. The domains.ts:48 path uses GenerationConfig.toJSON() properly, but the core prediction endpoints — which are the primary consumers of these new fields — all drop them.
Affected code in predictions.ts (4 identical manual constructions)
config: {
detail: config?.detail ?? "auto",
json_schema: jsonSchema,
confidence: config?.confidence ?? false,
grounding: config?.grounding ?? false,
gql_stmt: config?.gqlStmt ?? null,
// videoSegmentDuration, videoFramesPerSegment, pageIndices are MISSING
},Prompt for agents
The new fields videoSegmentDuration, videoFramesPerSegment, and pageIndices are added to GenerationConfigParams and GenerationConfig.toJSON(), but the four prediction methods in src/client/predictions.ts all manually construct the config object for the API request, only including the five original fields (detail, json_schema, confidence, grounding, gql_stmt). The new fields are silently dropped.
The affected methods are:
1. ImagePredictions.generate (line ~206)
2. FilePredictions.generate (line ~329)
3. FilePredictions.execute (line ~394)
4. WebPredictions.generate (line ~457)
The fix should add the three new fields to each of these manual config constructions, using the same conditional spread pattern as in toJSON():
...(config?.videoSegmentDuration !== undefined && { video_segment_duration: config.videoSegmentDuration }),
...(config?.videoFramesPerSegment !== undefined && { video_frames_per_segment: config.videoFramesPerSegment }),
...(config?.pageIndices !== undefined && { page_indices: config.pageIndices }),
Alternatively, refactor these methods to use new GenerationConfig(config).toJSON() instead of manually constructing the config object, which would prevent this class of bug in the future (similar to how src/client/domains.ts:48 does it).
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Code Review
This pull request adds videoSegmentDuration, videoFramesPerSegment, and pageIndices to the GenerationConfig class and its associated parameter interface. While the fields are correctly defined and serialized, feedback highlights that they are not yet integrated into the API request logic within src/client/predictions.ts, rendering them non-functional in the current state.
| videoSegmentDuration?: number; | ||
| videoFramesPerSegment?: number; | ||
| pageIndices?: number[]; |
There was a problem hiding this comment.
The new parameters videoSegmentDuration, videoFramesPerSegment, and pageIndices are correctly added to the GenerationConfigParams interface and GenerationConfig class, but they are not yet integrated into the API request logic.
Currently, the prediction methods in src/client/predictions.ts (specifically ImagePredictions.generate, FilePredictions.generate, FilePredictions.execute, and WebPredictions.generate) manually reconstruct the config object for the API payload and omit these new fields. For example, in FilePredictions.generate (lines 329-335 of src/client/predictions.ts), the config object is hardcoded with only a subset of fields.
To make these new parameters functional, the mapping logic in src/client/predictions.ts must be updated to include these parameters in the payload sent to the API.
Summary
Adds three new optional parameters to
GenerationConfig(andGenerationConfigParams) to expose server-side video transcription and document processing controls:videoSegmentDuration(number): Duration in seconds for each video segment when chunking for transcription (maps tovideo_segment_duration)videoFramesPerSegment(number): Number of frames to sample per video segment (maps tovideo_frames_per_segment)pageIndices(number[]): 0-indexed page indices to selectively process document pages (maps topage_indices)These are conditionally included in
toJSON()only when set, to avoid sending unnecessary defaults to the API.Companion PRs: vlm-lab (adds
page_indicesto backendGenerationConfig) and vlmrun-python-sdk (same fields in Python SDK).Review & Testing Checklist for Human
toJSON()(lines 357-359) correctly omits fields whenundefined— note this differs from existing fields which are always serialized. Confirm this is the desired behavior with the backend.videoSegmentDuration >= 1.0andvideoFramesPerSegment >= 1(backend enforces these constraints)Notes
GenerationConfigParamstype andGenerationConfigclass both updated to keep them in syncLink to Devin session: https://app.devin.ai/sessions/2f2d14e8cb55450c80d380356f9899b6
Requested by: @dineshreddy91