feat: declarative client section in pipeline YAML#181
feat: declarative client section in pipeline YAML#181staging-devin-ai-integration[bot] wants to merge 17 commits intomainfrom
client section in pipeline YAML#181Conversation
- Add ClientSection, PublishConfig, WatchConfig, InputConfig, OutputConfig, FieldHint, FieldType types to yaml.rs with serde + ts-rs derives - Add client field to UserPipeline::Steps and UserPipeline::Dag variants - Add client field to compiled Pipeline struct in lib.rs - Update compile(), compile_steps(), compile_dag() to forward client - Add pre-validation in parse_yaml() for better error messages - Update populate_session_pipeline() to preserve name/description/mode/client - Update TS type generator to export all client section types - Add clientSection.ts helpers (extractClientSection, deriveSettingsFromClient) - Add comprehensive Rust + TS unit tests Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Add client sections to all 16 dynamic pipelines (gateway + relay patterns) - Add client sections to all 18 oneshot pipelines (file_upload, text, trigger, none) - Remove all 9 '# skit:' magic comments, replaced by client.input.asset_tags and client.input.field_hints - Classify each pipeline with appropriate input/output types Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Replace extractMoqPeerSettings() internals with client-section reads, removing 6 internal graph-scanning helpers (~170 lines deleted) - Update useMonitorPreview to read pipeline.client instead of scanning the compiled node graph for transport::moq::peer nodes - Rewrite moqPeerSettings tests for declarative client-section behavior - Add client: null to Pipeline objects in test files for type compliance Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
…ent section reads - Delete ~10 heuristic functions from ConvertView.tsx: checkIfTranscriptionPipeline, checkIfTTSPipeline, checkIfNoInputPipeline, checkIfVideoPipeline, FORMAT_ACCEPT_MAP, detectInputFormatSpec, resolveFormatsForField, formatHintForField, detectExpectedFormats, detectInputAssetTags - Derive isTranscription/isTTS/isNoInput/isVideo from client section useMemo - Replace format detection with parseAcceptToFormats(client.input.accept) - Add getFieldHint callback using client.input.field_hints - Remove isTranscriptionPipeline/isTTSPipeline/isNoInputPipeline state from useConvertViewState.ts (now derived values) - Update moqPeerSettings.ts to use parseClientFromYaml from clientSection.ts - Move assetMatchesTag outside the component - Add 14 new tests for parseAcceptToFormats and parseClientFromYaml - All 280 tests pass, lint clean Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Add a lint pass that validates the client section against the pipeline mode and internal consistency. Returns warnings (never blocks compilation). Rules: 1. mode-mismatch-dynamic — dynamic with oneshot fields 2. mode-mismatch-oneshot — oneshot with dynamic fields 3. missing-gateway — publish/watch without gateway_path or relay_url 4. publish-no-media — publish with both audio+video false 5. watch-no-media — watch with both audio+video false 6. input-none-with-accept — none input with accept set 7. input-trigger-with-accept — trigger input with accept set 8. field-hints-no-input — field_hints on none input 9. asset-tags-no-input — asset_tags on none/text input 10. text-no-placeholder — text input missing placeholder 11. empty-broadcast — empty broadcast string 12. duplicate-broadcast — publish.broadcast == watch.broadcast Includes 14 lint tests covering all rules plus clean-pass cases. Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Remove the import of MoqPeerSettings type from clientSection.ts to prevent a circular module dependency that caused CI test failures. Inline the return type shape instead. Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
| (fieldName: string): { accept?: string; hint?: string } => { | ||
| const hints = client?.input?.field_hints; | ||
| if (hints) { | ||
| const fh = hints[fieldName]; | ||
| if (fh) { | ||
| const formats = parseAcceptToFormats(fh.accept); | ||
| if (formats && formats.length > 0) { | ||
| const accept = formats.map((f) => `.${f}`).join(','); | ||
| const label = formats.map((f) => f.toUpperCase()).join(', '); | ||
| return { accept, hint: `Expected format${formats.length > 1 ? 's' : ''}: ${label}` }; | ||
| } | ||
| if (fh.accept) return { accept: fh.accept }; | ||
| } | ||
| } | ||
| // Fall back to top-level input accept | ||
| const topFormats = parseAcceptToFormats(client?.input?.accept); | ||
| if (topFormats && topFormats.length > 0) { | ||
| const accept = topFormats.map((f) => `.${f}`).join(','); | ||
| const label = topFormats.map((f) => f.toUpperCase()).join(', '); | ||
| return { accept, hint: `Expected format${topFormats.length > 1 ? 's' : ''}: ${label}` }; | ||
| } | ||
| return {}; | ||
| }, | ||
| [client] |
There was a problem hiding this comment.
🚩 ConvertView field_hints.type is parsed but never used for rendering decisions
The FieldHint type includes a type field (FieldType: 'file' | 'text') at ui/src/types/generated/api-types.ts:520, and the getFieldHint callback at ui/src/views/ConvertView.tsx:645-668 reads field_hints to derive accept and hint strings. However, fh.field_type (or fh.type in TS) is never used to decide whether to render a FileUpload or a TextArea component. Currently, all fields in multi-upload mode get FileUpload components regardless of their declared type. This limits the expressiveness of field_hints — a field marked type: text still gets a file upload control. This is relevant to the bug reported for pocket-tts-voice-clone.yml but is also a broader design gap that could affect future pipelines with mixed field types.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Acknowledged. The field_hints[].type field is defined in the schema but not yet used by ConvertView to switch between FileUpload and TextArea per-field. This is intentional scope deferral — the current TTS pipelines work because input.type: text triggers the TTS code path globally. A future follow-up could use field_hints.type for mixed-mode forms (e.g., one text field + one file field without making the whole pipeline "TTS mode").
There was a problem hiding this comment.
Acknowledged — this is a known limitation. The field_hints[].type field (file | text) is currently only used by the single-field rendering path (which already switches between FileUpload and TextArea based on client.input.type). For multi-field uploads, all fields currently get FileUpload controls. The pocket-tts-voice-clone.yml pipeline works correctly because it uses the single-field TTS path (text input), not multi-field upload. Supporting mixed field types in multi-upload mode is a valid enhancement for a follow-up.
- Box ApiPipeline in ResponsePayload::Pipeline to fix large_enum_variant - Use clone_from() in populate_session_pipeline for assigning_clones - Add missing client field in compositor benchmark Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
- Change pocket-tts-voice-clone.yml input.type from file_upload to text (matches other TTS pipelines — the old heuristic detected text_chunker) - Add fallback in prepareUploads for custom pipelines without a client section that also lack http_input, preventing silent no-ops Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
ui/src/hooks/useMonitorPreview.ts
Outdated
| // Read gateway_path and output_broadcast from the pipeline's client section. | ||
| const client = pipeline?.client ?? null; | ||
| if (client) { | ||
| if (client.gateway_path) { | ||
| const currentUrl = useStreamStore.getState().serverUrl; | ||
| if (currentUrl) { | ||
| previewSetServerUrl(updateUrlPath(currentUrl, client.gateway_path)); | ||
| } | ||
| } | ||
| if (outputBroadcast) { | ||
| previewSetOutputBroadcast(outputBroadcast); | ||
| if (client.watch?.broadcast) { | ||
| previewSetOutputBroadcast(client.watch.broadcast); | ||
| } | ||
| } | ||
|
|
||
| // Detect which media types the pipeline outputs by checking the kinds of | ||
| // nodes connected to the moq_peer's input pins. | ||
| let outputsAudio = true; | ||
| let outputsVideo = true; | ||
| if (pipeline && moqNodeName) { | ||
| outputsAudio = false; | ||
| outputsVideo = false; | ||
| for (const conn of pipeline.connections) { | ||
| if (conn.to_node !== moqNodeName) continue; | ||
| const sourceNode = pipeline.nodes[conn.from_node]; | ||
| if (!sourceNode?.kind) continue; | ||
| if (sourceNode.kind.startsWith('audio::')) outputsAudio = true; | ||
| else if (sourceNode.kind.startsWith('video::')) outputsVideo = true; | ||
| } | ||
| } | ||
| // Media types default to both enabled unless the client section | ||
| // explicitly declares which types the pipeline outputs. | ||
| const outputsAudio = client?.watch?.audio ?? true; | ||
| const outputsVideo = client?.watch?.video ?? true; |
There was a problem hiding this comment.
📝 Info: MonitorView fallback deriveMoqConfigFromNodes only detects gateway peer pattern
The fallback for sessions without a client section (deriveMoqConfigFromNodes at useMonitorPreview.ts:33-58) only searches for transport::moq::peer nodes, not separate transport::moq::publisher/transport::moq::subscriber nodes. The old code in the pre-PR extractMoqPeerSettings handled both patterns. For interactively-created sessions using the external relay pattern (separate pub/sub nodes) without a client section, the fallback returns the default {outputsAudio: true, outputsVideo: true}, which is a reasonable default (just requests both media types). This only affects edge cases since all template-based sessions now have client sections, and the monitor preview would still connect successfully.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Acknowledged — this is an intentional tradeoff. Interactive MoQ session creation via drag-and-drop doesn't go through YAML compilation, so pipeline.client will be null. The media type defaults (true/true) are safe since the preview will attempt to subscribe to both and gracefully handle missing tracks. If interactive MoQ sessions become common, a future change could populate client from node params at session creation time.
There was a problem hiding this comment.
Correct — the fallback only handles the transport::moq::peer (gateway) pattern since that's the pattern used by all template pipelines. Separate publisher/subscriber nodes in interactive sessions without a client section will get the safe default of {outputsAudio: true, outputsVideo: true}, which means preview subscribes to both track types. This is an acceptable tradeoff since all standard pipelines now have explicit client sections.
| export function extractMoqPeerSettings(yamlContent: string): MoqPeerSettings | null { | ||
| try { | ||
| const parsed = load(yamlContent) as ParsedYaml; | ||
|
|
||
| if (!parsed || typeof parsed !== 'object' || !parsed.nodes) { | ||
| return null; | ||
| } | ||
| const client = parseClientFromYaml(yamlContent); | ||
| if (!client) return null; | ||
|
|
||
| // Find the first node with kind 'transport::moq::peer' | ||
| let peerNodeName: string | null = null; | ||
| let peerNodeConfig: ParsedNode | null = null; | ||
| for (const [name, nodeConfig] of Object.entries(parsed.nodes)) { | ||
| if (nodeConfig.kind === 'transport::moq::peer') { | ||
| peerNodeName = name; | ||
| peerNodeConfig = nodeConfig; | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| // Gateway pattern: transport::moq::peer | ||
| if (peerNodeName && peerNodeConfig?.params) { | ||
| const { needsAudio, needsVideo } = detectPeerInputMediaTypes(peerNodeName, parsed.nodes); | ||
| const { outputsAudio, outputsVideo } = detectPeerOutputMediaTypes(peerNodeName, parsed.nodes); | ||
|
|
||
| return { | ||
| gatewayPath: peerNodeConfig.params.gateway_path, | ||
| inputBroadcast: peerNodeConfig.params.input_broadcast, | ||
| outputBroadcast: peerNodeConfig.params.output_broadcast, | ||
| hasInputBroadcast: Boolean(peerNodeConfig.params.input_broadcast), | ||
| needsAudioInput: needsAudio, | ||
| needsVideoInput: needsVideo, | ||
| outputsAudio, | ||
| outputsVideo, | ||
| }; | ||
| } | ||
|
|
||
| // External relay pattern: separate publisher/subscriber nodes | ||
| return extractPubSubSettings(parsed.nodes); | ||
| } catch { | ||
| // Only return settings for dynamic pipelines that declare MoQ transport. | ||
| if (!client.gateway_path && !client.relay_url && !client.publish && !client.watch) { | ||
| return null; | ||
| } | ||
|
|
||
| return deriveSettingsFromClient(client); | ||
| } |
There was a problem hiding this comment.
🚩 extractMoqPeerSettings now returns null for YAML without client section — check all callers
The extractMoqPeerSettings function in ui/src/utils/moqPeerSettings.ts:36-46 was rewritten from scanning the node graph (finding transport::moq::peer, transport::moq::publisher, transport::moq::subscriber nodes) to reading only from the client section. For YAML without a client section, it now returns null instead of extracted settings. If there are callers beyond useMonitorPreview (which was already migrated to read pipeline.client directly), they would lose MoQ detection for pipelines without client. The test file was updated but external callers should be verified.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Verified — extractMoqPeerSettings is called from two places: StreamView.tsx (template selector) and moqPeerSettings.test.ts. Both operate on template YAML strings which all now have client sections. The useMonitorPreview hook reads pipeline.client directly from the API response, not via extractMoqPeerSettings. No external callers are affected.
… type Without #[serde(default)], the generated TS type was 'client: ClientSection | null' instead of 'client?: ClientSection | null', causing runtime undefined vs null mismatch when the JSON key is omitted. Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
There was a problem hiding this comment.
🟡 Hardcoded TTS placeholder ignores per-pipeline client.input.placeholder value
The ConvertView text area uses a hardcoded placeholder mentioning "Kokoro TTS" (ui/src/views/ConvertView.tsx:1328), but the pipeline YAML's client.input.placeholder field (e.g., "Enter text to synthesize" in kokoro-tts.yml, piper-tts.yml, matcha-tts.yml, etc.) is never read. The client object is available at ui/src/views/ConvertView.tsx:586 via useMemo, but its placeholder property is unused. This causes a misleading UX for non-Kokoro TTS pipelines (Piper, Matcha, Supertonic, Pocket TTS) where the hint text incorrectly references Kokoro. The pocket-tts-voice-clone.yml also defines a per-field placeholder ("Enter text to synthesize") via field_hints.text.placeholder that is similarly ignored.
(Refers to line 1328)
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
The hardcoded placeholder predates this PR — it was already there for the existing TTS UI path. Now that client.input.placeholder is available, it would be a nice improvement to use it, but that's a pre-existing UX issue rather than a regression introduced here. I'll note it as a follow-up enhancement.
…and tests - Add lint_client_against_nodes() with 8 cross-validation rules (13-20): input-requires-http-input, input-none-has-http-input, field-hint-unknown-field, publish-no-transport, watch-no-transport, gateway-path-mismatch, relay-url-mismatch, broadcast-mismatch - Add NodeInfo struct for lightweight node representation - Derive Default on ClientSection for ergonomic test construction - Add 16 Rust unit tests covering all 8 new lint rules (positive + negative) - Remove leftover # skit: comment from loadtest sample pipeline - Add 14 UI unit tests for monitor preview config derivation and ConvertView pipeline classification (trigger/none, output-type renderer) Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
All 10 oneshot pipelines that use ogg::demuxer → opus::decoder only accept Ogg/Opus input, not arbitrary audio formats. Change accept from the blanket 'audio/*' to the specific 'audio/ogg' MIME type so the browser file picker filters correctly. Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
audio/opus is the correct MIME type for Opus-encoded audio, which is what all these pipelines actually decode via opus::decoder. Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Parse pipelineYaml once into a shared useMemo and derive both the client section (via extractClientFromParsed) and http_input fields (via deriveHttpInputFieldsFromParsed) from the same parsed object. Previously parseClientFromYaml and deriveHttpInputFields each independently called load(yaml), parsing the same string twice on every pipelineYaml change. Also adds extractClientFromParsed() to clientSection.ts with 4 unit tests. Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
ui/src/utils/clientSection.ts
Outdated
| /** | ||
| * Derives `MoqPeerSettings`-shaped data from a declarative `ClientSection`. | ||
| * | ||
| * The return type mirrors the `MoqPeerSettings` interface defined in | ||
| * `moqPeerSettings.ts`. We avoid importing that interface here to prevent | ||
| * a circular module dependency. | ||
| */ |
There was a problem hiding this comment.
This isn't amazing, surely there's a way to use the original type and avoid the circular import.
There was a problem hiding this comment.
Good call — moved deriveSettingsFromClient into moqPeerSettings.ts where the MoqPeerSettings interface lives, so it now uses the proper type directly instead of duplicating the shape inline. No more circular import workaround needed.
Eliminates the duplicated inline return type by co-locating the function with the MoqPeerSettings interface it returns. No more circular import workaround needed. Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
| const yaml = ` | ||
| client: | ||
| input: | ||
| type: file_upload | ||
| accept: "audio/*" | ||
| output: |
There was a problem hiding this comment.
🟡 Test asserts non-existent json_stream output type instead of valid json
The test at ui/src/utils/clientSection.test.ts:441 uses type: json_stream as an output type, but this value does not exist in the Rust OutputType enum (crates/api/src/yaml.rs:197-206), which defines only Transcription, Json, Audio, and Video (serialized via #[serde(rename_all = "snake_case")] as "transcription", "json", "audio", "video"). The generated TypeScript type confirms this: OutputType = "transcription" | "json" | "audio" | "video" (ui/src/types/generated/api-types.ts:514). The test passes only because parseClientFromYaml uses raw YAML parsing (js-yaml) without schema validation, so it round-trips any string. A pipeline YAML with type: json_stream would be rejected by the Rust backend's serde deserialization. The correct value should be json.
| const yaml = ` | |
| client: | |
| input: | |
| type: file_upload | |
| accept: "audio/*" | |
| output: | |
| type: json | |
| steps: | |
| - kind: streamkit::http_input | |
| `; | |
| const client = parseClientFromYaml(yaml); | |
| expect(client?.output?.type).toBe('json'); |
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Fixed — changed json_stream to json to match the Rust OutputType enum.
- useMonitorPreview now falls back to scanning the pipeline's node graph (transport::moq::peer params + connection graph) when pipeline.client is null. This preserves monitor preview for interactively-created sessions that don't have a client section. - Extracted deriveMoqConfigFromNodes() helper to keep callback complexity within lint thresholds. - Fixed test that used non-existent 'json_stream' output type; the correct Rust enum value is 'json'. Co-Authored-By: Claudio Costa <cstcld91@gmail.com>
| type: file_upload | ||
| accept: "audio/opus" |
There was a problem hiding this comment.
🚩 Asset filtering narrower with explicit accept vs old heuristic format detection
The old detectExpectedFormats scanned the pipeline YAML for demuxer/decoder node kinds and inferred compatible formats (e.g. ogg demuxer → ['ogg', 'opus']). The new code uses parseAcceptToFormats(client?.input?.accept) which only returns formats explicitly listed in the accept attribute. Many sample pipelines declare accept: "audio/opus" which maps to ["opus"], excluding .ogg assets that the pipeline could actually process (since it has containers::ogg::demuxer). If there are audio assets with .ogg format, they would no longer appear in the asset picker for these pipelines. Consider updating sample pipeline accept strings to "audio/opus,audio/ogg" or ".opus,.ogg" if OGG files should remain selectable.
Was this helpful? React with 👍 or 👎 to provide feedback.
Debug
There was a problem hiding this comment.
Good observation. The accept: "audio/opus" value was explicitly chosen per reviewer feedback (streamer45 requested audio/opus over audio/* or audio/ogg for precision). The old heuristic matched both ogg and opus extensions because it scanned for demuxer node kinds, but the actual codec in all these pipelines is Opus — OGG is just the container. If OGG-extension assets need to remain selectable, the accept strings can be broadened to "audio/opus,audio/ogg" in a follow-up, but the current values reflect the reviewer's intent.
Summary
Replaces heuristic-based UI detection (string matching, graph inspection,
# skit:magic comments) with an explicit, declarativeclientsection in pipeline YAML. Theclientsection is UI metadata — forwarded unchanged through compilation and ignored by the engine.5-phase implementation:
Schema — Added
ClientSectionand nested types toyaml.rswith serde + ts-rs derives. Addedclient: Option<ClientSection>to bothUserPipelinevariants andPipeline. Pre-validatesclientinparse_yaml()before untagged enum deserialization. Regenerated TS types. CreatedclientSection.tswith extraction and derivation helpers.Migration — Added
clientsections to all 34 sample pipelines (samples/pipelines/{oneshot,dynamic}/). Removed all 9# skit:magic comments. Removed leftover# skit:comment fromsamples/loadtest/.Dynamic UI — Rewrote
extractMoqPeerSettings()to read from client section viaderiveSettingsFromClient(). Deleted 6 internal graph-inspection helpers (~170 lines). UpdateduseMonitorPreview.tsto readpipeline.clientdirectly.Oneshot UI — Rewrote ConvertView to derive
isTranscription/isTTS/isNoInput/isVideofromclient.output.typeandclient.input.typeinstead of string-matching YAML content. Replaced format detection withparseAcceptToFormats(client.input.accept). AddedgetFieldHintcallback usingclient.input.field_hints. Deleted ~10 heuristic functions. Removed 3 boolean state variables fromuseConvertViewState.ts(now derived viauseMemo).Lint pass — Added
lint_client_section()with 12 structural/mode validation rules, pluslint_client_against_nodes()with 8 node-graph cross-validation rules (20 rules total):Test coverage: 292 UI tests (28 new), 78 Rust tests (30 new — 14 client parsing + 16 cross-validation lint tests). All pass, lint clean.
Post-review fixes:
pocket-tts-voice-clone.ymlinput.typefromfile_upload→text(matches other TTS pipelines)prepareUploadsfor custom pipelines without aclientsection#[serde(default)]onPipeline.clientfor correct TS optional type generationclientSection.ts↔moqPeerSettings.tslarge_enum_variantby boxingApiPipelineinResponsePayload::Pipelineassigning_clonesinpopulate_session_pipelineReview & Testing Checklist for Human
just skit/just ui), navigate to/stream, select a dynamic pipeline template — confirm the stream view correctly shows/hides audio/video input and output controls based on theclientsection/convert, select different oneshot templates (transcription, TTS, file conversion, video generation) — confirm input mode (file upload / text / none) and output mode are correctly derived from theclientsection/convert— confirm it renders a text area + voice file upload (not two file uploads)/convert, select a pipeline withasset_tags(e.g. whisper transcription withspeechtag) — confirm the asset picker filters correctlyfield_hintsor specificacceptvalues — confirm file upload dialogs show correct format filtersclientsection accurately describes the pipeline's behaviorNotes
deriveHttpInputFields()is intentionally kept — the server owns the multipart contract and the UI must respect whatever fieldshttp_inputdeclaresstreamStoreHelpers.tsis unchanged — async wait patterns are timing constraints, not UI metadatasplitTtsFields/resolveTextFieldare kept because they're used bybuildTtsUploadsandgenerateCliCommand, which operate on runtime upload mechanics rather than UI classificationpipeline.clientpopulated — media type defaults (true/true) are safe since the preview gracefully handles missing tracksfield_hints[].typeis defined in the schema but not yet used for per-field rendering decisions — a future follow-up could use it for mixed-mode formsNodeInfostruct andlint_client_against_nodes()are public API for future CLI/editor integrationLink to Devin session: https://staging.itsdev.in/sessions/c7489b29ff9d480bb083cd2582affe67
Requested by: @streamer45