Skip to content

Designing my video-based pipeline #5888

@sdpetrides

Description

@sdpetrides

Describe the question.

I'm very interested in using DALI for my model training. My simple benchmarks show great speed-ups over OpenCV when loading directly to the GPU.

However, I'd like some insight, from the developer/maintainer/community, on how I can adapt my current data loading paradigm to DALI. I believe I can adapt everything except for two aspects.

First, the datasets I am working with do not have consistent frame rates. They can range from 20 to 60 FPS. Currently, I control for this when I sample/read frames to make sure the "best" frame is chosen for a given scenario. With rn.readers.video having constant stride param complicates this.

Second, I would like the flexibility to sample at different rates for different segments of frame sequence. For example, I would like the ability to sample at 1 FPS for X frames at offset 0, then 2 FPS for Y frames at offset X for a total of X + Y frames in the resulting tensor.

Third, some datasets are stored as images rather than videos I'll need to support both.

Lastly, I'd like to keep the PyTorch DataLoader and Sampler interface, at the top level. (This one I am willing to budge if the other challenges can be solved.) It seems there are ways to do this but they may not be compatible with my demands above.

--
TLDR: I'd like explicit control over that frames are included to the sequence when loading directly to the GPU, and I want to control over batching, sampling, distribution across devices.

Looking forward to any suggestions, thank you!

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report

Metadata

Metadata

Assignees

Labels

questionFurther information is requested

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions