
[FEATURE] Reusable YAML/JSON-based processing pipeline #791

@EchoOfCode


Feature description
Introduce reusable YAML/JSON-based processing pipeline presets that allow users to define and save multi-step workflows. This would let users execute common processing chains with a single command instead of manually repeating each operation every time.

Example configuration:

pipeline:
  - extract_frames
  - remove_background
  - resize: 512x512
  - convert: webp
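The example reads as a list in which each entry is either a bare step name (`extract_frames`) or a single-key mapping from a step name to its argument (`resize: 512x512`). Assuming that convention, a runner might first normalize every entry into a `(name, argument)` pair. The minimal Python sketch below parses the JSON equivalent of the example using only the standard library; loading the YAML version with a YAML parser should produce the same list. `normalize_step` and `parse_size` are hypothetical helpers, not existing project code.

```python
import json
from typing import Any


def normalize_step(step: Any) -> tuple[str, Any]:
    """Turn one pipeline entry into a (name, argument) pair.

    Assumed convention: a bare string has no argument; a single-key
    mapping such as {"resize": "512x512"} carries one argument.
    """
    if isinstance(step, str):
        return step, None
    if isinstance(step, dict) and len(step) == 1:
        (name, arg), = step.items()
        return name, arg
    raise ValueError(f"invalid pipeline step: {step!r}")


def parse_size(value: str) -> tuple[int, int]:
    """Parse a WIDTHxHEIGHT string such as "512x512" into integers."""
    width, sep, height = value.lower().partition("x")
    if not sep or not width.isdigit() or not height.isdigit():
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    return int(width), int(height)


# JSON equivalent of the YAML example above.
config = json.loads("""
{"pipeline": ["extract_frames", "remove_background",
              {"resize": "512x512"}, {"convert": "webp"}]}
""")
steps = [normalize_step(s) for s in config["pipeline"]]
print(steps)
```

Validating arguments such as the resize size at parse time lets a malformed preset fail before any media is touched.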

This would make the project easier to use at scale and in production, particularly for media preprocessing and AI dataset engineering workflows.

Problem this solves
Currently, users need to manually repeat the same sequence of operations for every dataset or media batch. This becomes inefficient and error-prone when working with large-scale preprocessing workflows or iterative experimentation.

For example, users preparing AI training datasets may repeatedly:

  • extract frames
  • resize images
  • remove backgrounds
  • convert formats

Running these steps by hand every time hurts reproducibility and makes the workflow hard to automate.

Proposed solution
Add support for reusable pipeline configuration files using YAML or JSON.
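A minimal sketch of loading such a file, assuming the format is chosen by file extension and that YAML support comes from the third-party PyYAML package (`yaml.safe_load`). `load_pipeline` is a hypothetical name, and the requirement for a top-level `pipeline` list simply mirrors the example configuration.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_pipeline(path: str | Path) -> list[Any]:
    """Load the `pipeline` list from a .json, .yaml or .yml file."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    suffix = path.suffix.lower()
    if suffix == ".json":
        data = json.loads(text)
    elif suffix in (".yaml", ".yml"):
        # PyYAML is imported lazily, so JSON-only use needs no extra dependency.
        import yaml
        data = yaml.safe_load(text)
    else:
        raise ValueError(f"unsupported pipeline file type: {suffix or '(none)'}")
    # Assumed layout: a mapping with a top-level "pipeline" list.
    if not isinstance(data, dict) or not isinstance(data.get("pipeline"), list):
        raise ValueError(f"{path}: expected a top-level 'pipeline' list")
    return data["pipeline"]
```

Using `safe_load` rather than a full YAML loader keeps a shared preset from constructing arbitrary Python objects.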

Possible implementation ideas:

  • Create a pipelines/ directory for user-defined presets
  • Add CLI support such as: reframe run pipeline.yaml
  • Parse and execute pipeline steps sequentially, in the order they are listed
  • Allow parameterized operations (e.g. resize: 512x512)
  • Validate configs before execution
  • Provide execution logs and step-level error reporting
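The validate-then-run flow in these bullets could look roughly like the sketch below. It assumes steps have already been normalized to `(name, argument)` pairs, and the processors are hypothetical stubs that only record what they would do, since the issue does not specify how individual steps are implemented. Every step name is checked before anything runs, and an exception inside a step is re-raised with the step's position and name for step-level error reporting.

```python
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")

# A processor takes the operations applied so far plus the step's argument
# and returns the updated list. These stubs only record what they would do;
# real processors would read and write media files.
Processor = Callable[[list, Any], list]

PROCESSORS: dict[str, Processor] = {
    "extract_frames": lambda ops, _arg: ops + ["extract_frames"],
    "remove_background": lambda ops, _arg: ops + ["remove_background"],
    "resize": lambda ops, size: ops + [f"resize {size}"],
    "convert": lambda ops, fmt: ops + [f"convert {fmt}"],
}


class PipelineError(Exception):
    """Raised when validation fails or a single step errors out."""


def validate(steps: list[tuple[str, Any]]) -> None:
    """Check every step name before anything runs, so typos fail fast."""
    unknown = [name for name, _ in steps if name not in PROCESSORS]
    if unknown:
        raise PipelineError("unknown step(s): " + ", ".join(unknown))


def run(steps: list[tuple[str, Any]]) -> list:
    """Validate, then execute the steps in order, logging each one."""
    validate(steps)
    ops: list = []
    for index, (name, arg) in enumerate(steps, start=1):
        log.info("step %d/%d: %s", index, len(steps), name)
        try:
            ops = PROCESSORS[name](ops, arg)
        except Exception as exc:
            # Step-level error report: which step failed, by position and name.
            raise PipelineError(f"step {index} ({name}) failed: {exc}") from exc
    return ops


if __name__ == "__main__":
    print(run([("extract_frames", None), ("resize", "512x512"), ("convert", "webp")]))
```

Run on those three steps, it logs one line per step and returns `["extract_frames", "resize 512x512", "convert webp"]`; adding an unknown step such as `sharpen` fails during validation, before any step executes.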

Potential future enhancements:

  • conditional steps
  • parallel execution
  • pipeline templates
  • plugin/custom processor support
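For the plugin/custom processor item, one common Python pattern is a decorator-based registry, in which a custom step registers itself under the name used in the pipeline file. This is only one possible design: `processor`, `get_processor`, and the `grayscale` step are hypothetical illustrations, not something the issue specifies.

```python
from typing import Any, Callable

# Assumed convention: a processor takes the operations applied so far plus
# the step's argument and returns the updated list.
Processor = Callable[[list, Any], list]

_REGISTRY: dict[str, Processor] = {}


def processor(name: str) -> Callable[[Processor], Processor]:
    """Decorator that registers a custom step under `name`."""
    def register(func: Processor) -> Processor:
        if name in _REGISTRY:
            raise ValueError(f"processor {name!r} is already registered")
        _REGISTRY[name] = func
        return func
    return register


def get_processor(name: str) -> Processor:
    """Look up a registered step by the name used in the pipeline file."""
    if name not in _REGISTRY:
        raise LookupError(f"no processor registered as {name!r}")
    return _REGISTRY[name]


# Hypothetical custom step, e.g. shipped in a user plugin module.
@processor("grayscale")
def grayscale(ops: list, _arg: Any) -> list:
    return ops + ["grayscale"]


print(get_processor("grayscale")(["resize 512x512"], None))  # ['resize 512x512', 'grayscale']
```

Rejecting duplicate names keeps two plugins from silently overriding each other's steps.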

Alternatives considered
An alternative would be to use shell scripts or to chain CLI commands manually. However, such scripts:

  • are less portable
  • are harder to validate before running
  • are difficult for non-technical users to write and maintain
  • lack a standardized structure

A built-in pipeline system would provide a cleaner and more maintainable workflow experience.

Additional context
This feature could significantly improve usability for:

  • AI dataset preparation
  • bulk media processing
  • automated preprocessing workflows
  • reproducible experiments

It would also make the project more attractive for production and research use cases where repeatable processing pipelines are essential.
