Feature description
Introduce reusable YAML/JSON-based processing pipeline presets that allow users to define and save multi-step workflows. This would let users execute common processing chains with a single command instead of manually repeating each operation every time.
Example configuration:
pipeline:
- extract_frames
- remove_background
- resize: 512x512
- convert: webp
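The example mixes two step forms: bare names (extract_frames) and single-key mappings with an argument (resize: 512x512). A minimal sketch of how a loader might normalize both into (name, argument) pairs is shown below. The function name normalize_step is hypothetical and not part of the project, and the config dict is written by hand to mirror what a YAML loader would typically return for the example, so the snippet needs no third-party parser.

```python
from typing import Any


def normalize_step(entry: Any) -> tuple[str, Any]:
    """Turn one pipeline entry into a (name, argument) pair.

    Hypothetical helper: accepts a bare step name or a single-key mapping.
    """
    if isinstance(entry, str):
        # Bare step such as "extract_frames": no argument.
        return entry, None
    if isinstance(entry, dict) and len(entry) == 1:
        # Parameterized step such as {"resize": "512x512"}.
        (name, arg), = entry.items()
        return str(name), arg
    raise ValueError(f"unsupported step entry: {entry!r}")


# Hand-written equivalent of the parsed example configuration (assumption:
# the loader yields strings for bare steps and dicts for "key: value" steps).
config = {
    "pipeline": [
        "extract_frames",
        "remove_background",
        {"resize": "512x512"},
        {"convert": "webp"},
    ]
}

steps = [normalize_step(entry) for entry in config["pipeline"]]
```

Normalizing early means the rest of the runner only has to handle one step shape, whichever file format the preset was written in.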
This feature would make the project better suited to production media preprocessing and AI dataset engineering workflows, where the same processing chain is applied to many batches.
Problem this solves
Currently, users need to manually repeat the same sequence of operations for every dataset or media batch. This becomes inefficient and error-prone when working with large-scale preprocessing workflows or iterative experimentation.
For example, users preparing AI training datasets may repeatedly:
- extract frames
- resize images
- remove backgrounds
- convert formats
Manually running these steps every time reduces reproducibility and slows down workflow automation.
Proposed solution
Add support for reusable pipeline configuration files using YAML or JSON.
Possible implementation idea:
- Create a pipelines/ directory for user-defined presets
- Add CLI support such as:
reframe run pipeline.yaml
- Parse pipeline steps sequentially
- Allow parameterized operations
- Validate configs before execution
- Provide execution logs and step-level error reporting
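The validation, sequential execution, and step-level error reporting points above could be sketched roughly as follows. Everything here is an assumption for illustration: REGISTRY, validate, run, and PipelineError are hypothetical names, the processors are stand-ins that only record what they were asked to do, and the steps are assumed to be already parsed into (name, argument) pairs.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def _record(name):
    """Build a stand-in processor that appends (name, arg) to the data."""
    def step(data, arg):
        return data + [(name, arg)]
    return step


# Hypothetical registry mapping operation names to callables(data, arg).
REGISTRY = {
    name: _record(name)
    for name in ("extract_frames", "remove_background", "resize", "convert")
}


class PipelineError(Exception):
    """Raised when a step fails; carries the step position and name."""

    def __init__(self, index, step_name, cause):
        super().__init__(f"step {index} ({step_name}) failed: {cause}")
        self.index = index
        self.step_name = step_name


def validate(steps):
    """Return a list of problems; an empty list means the config is usable."""
    problems = []
    for i, (name, _arg) in enumerate(steps, start=1):
        if name not in REGISTRY:
            problems.append(f"step {i}: unknown operation {name!r}")
    return problems


def run(steps, data):
    """Validate first, then apply each step in order, logging progress."""
    problems = validate(steps)
    if problems:
        raise ValueError("; ".join(problems))
    for i, (name, arg) in enumerate(steps, start=1):
        log.info("step %d/%d: %s", i, len(steps), name)
        try:
            data = REGISTRY[name](data, arg)
        except Exception as exc:
            raise PipelineError(i, name, exc) from exc
    return data


steps = [
    ("extract_frames", None),
    ("remove_background", None),
    ("resize", "512x512"),
    ("convert", "webp"),
]
result = run(steps, [])
```

Validating the whole config before the first step runs avoids discovering a typo in the last step after a long batch has already been processed.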
Potential future enhancements:
- conditional steps
- parallel execution
- pipeline templates
- plugin/custom processor support
Alternatives considered
An alternative would be to use shell scripts or to chain CLI commands together manually. However, scripts:
- are less portable
- are harder to validate
- are difficult for non-technical users
- lack a standardized structure
A built-in pipeline system would provide a cleaner and more maintainable workflow experience.
Additional context
This feature could significantly improve usability for:
- AI dataset preparation
- bulk media processing
- automated preprocessing workflows
- reproducible experiments
It would also make the project more attractive for production and research use cases where repeatable processing pipelines are essential.