Describe the question.
I'm very interested in using DALI for my model training. My simple benchmarks show great speed-ups over OpenCV when loading directly to the GPU.
However, I'd like some insight from the developers, maintainers, or community on how I can adapt my current data loading paradigm to DALI. I believe I can adapt everything except for the following aspects.
First, the datasets I am working with do not have consistent frame rates. They can range from 20 to 60 FPS. Currently, I control for this when I sample/read frames to make sure the "best" frame is chosen for a given scenario. The constant stride parameter of fn.readers.video complicates this.
Second, I would like the flexibility to sample at different rates for different segments of a frame sequence. For example, I would like the ability to sample at 1 FPS for X frames starting at offset 0, then at 2 FPS for Y frames starting at offset X, for a total of X + Y frames in the resulting tensor.
Third, some datasets are stored as images rather than videos, so I'll need to support both.
Lastly, I'd like to keep the PyTorch DataLoader and Sampler interface at the top level. (I am willing to budge on this one if the other challenges can be solved.) It seems there are ways to do this, but they may not be compatible with my requirements above.
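To make the first two requirements concrete, here is a minimal sketch of the kind of index computation involved, in plain Python. It is not DALI code: the function name segment_frame_indices and its parameters are hypothetical, and it assumes each segment is given as a (target FPS, number of frames) pair and that the nearest native frame is an acceptable choice of "best" frame.

```python
def segment_frame_indices(native_fps, segments, start_frame=0):
    """Return native frame indices for a piecewise sampling schedule.

    native_fps  -- frame rate of the source video (e.g. 20 to 60)
    segments    -- list of (target_fps, num_frames) pairs, applied in order
    start_frame -- native frame index where sampling begins

    Hypothetical helper for illustration; not part of DALI or PyTorch.
    """
    indices = []
    t = start_frame / native_fps  # current sampling time in seconds
    for target_fps, num_frames in segments:
        for _ in range(num_frames):
            # Pick the native frame closest to the desired timestamp.
            indices.append(round(t * native_fps))
            t += 1.0 / target_fps
    return indices


# 1 FPS for X=3 frames, then 2 FPS for Y=4 frames, on a 30 FPS source.
schedule = [(1, 3), (2, 4)]
frames_30 = segment_frame_indices(30, schedule)
# The same schedule on a 60 FPS source covers the same timestamps.
frames_60 = segment_frame_indices(60, schedule)
```

The output has X + Y entries regardless of the source frame rate, which is the behavior I would like to reproduce when decoding on the GPU.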
--
TLDR: I'd like explicit control over which frames are included in the sequence when loading directly to the GPU, and I want control over batching, sampling, and distribution across devices.
Looking forward to any suggestions, thank you!