I'm training LoRAs for Wan 2.1. Wan needs video at 16 fps and in lengths of (n*16)+1 frames, so 17 frames, 33 frames, 49 frames, etc.
Usually I use 50 frames just to be on the safe side, but now I have some videos that are exactly 49 frames and I have no option to change that. Unfortunately diffusion-pipe rejects them all, saying:
'video with frames=48 is being skipped because it is too short'
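To make the constraint concrete, here's a tiny helper for checking it (my own sketch, not code from diffusion-pipe or Wan):

def is_valid_wan_frame_count(frames: int) -> bool:
    # Wan wants clip lengths of (n*16)+1 frames for n >= 1: 17, 33, 49, 65, ...
    return frames >= 17 and (frames - 1) % 16 == 0

for f in (48, 49, 50):
    print(f, is_valid_wan_frame_count(f))  # only 49 passes

So a 49-frame video is exactly a valid length, which is why the off-by-one estimate below matters.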
The problem is due to the way dataset.py calculates how many frames there are.
try:
    if image_file.suffix in VIDEO_EXTENSIONS:
        # 100% accurate frame count, but much slower.
        # frames = 0
        # for frame in imageio.v3.imiter(image_file):
        #     frames += 1
        #     height, width = frame.shape[:2]
        # TODO: this is an estimate of frame count. What happens if variable frame rate? Is
        # it still close enough?
        meta = imageio.v3.immeta(filepath_or_file)
        first_frame = next(imageio.v3.imiter(filepath_or_file))
        height, width = first_frame.shape[:2]
        assert self.framerate is not None, "Need model framerate but don't have it. This shouldn't happen. Is the framerate attribute on the model set?"
        frames = int(self.framerate * meta['duration'])
    else:
        pil_img = Image.open(filepath_or_file)
        width, height = pil_img.size
        frames = 1
There's code to count the exact frames (the commented-out loop above), but it's too slow in practice (I tried it). So instead diffusion-pipe estimates the count from the frame rate and the video duration. Unfortunately this is inaccurate, probably due to floating point issues or the mp4 metadata not reporting the duration exactly.
A quick hack to fix this is to check how close the fractional frame count is to the next whole frame, and if it's close enough, add 1 to the frame count. I set the threshold at 95% of a frame, and since my videos come out at 48.96 frames, this works out fine.
if image_file.suffix in VIDEO_EXTENSIONS:
    meta = imageio.v3.immeta(filepath_or_file)
    first_frame = next(imageio.v3.imiter(filepath_or_file))
    height, width = first_frame.shape[:2]
    assert self.framerate is not None, "Need model framerate but don't have it. This shouldn't happen. Is the framerate attribute on the model set?"
    frames_exact = self.framerate * meta['duration']
    frames = int(frames_exact)
    # HACK: if the frame count is suspiciously close to the next integer, round up.
    # This works around floating point imprecision in meta['duration'] causing e.g.
    # a 49-frame video to be counted as 48 and skipped or bucketed incorrectly.
    if (frames_exact - frames) > 0.95:
        print(f'WARNING (hack): frame count {frames_exact:.6f} rounded up from {frames} to {frames + 1} for {image_file}')
        frames += 1
else:
    pil_img = Image.open(filepath_or_file)
    width, height = pil_img.size
    frames = 1
Now I get a blizzard of warnings about 48.96-frame videos being rounded up to 49 frames, but other than that it loads the videos and trains on them just fine. (I've yet to complete a training run since adding this code, but so far it isn't crashing and seems to work.)
I don't know if this is a viable solution in general. Perhaps it could be enabled as an option for situations where you have the exact number of frames you need for training and can't change it.
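If it were made an option, I imagine it could look roughly like this. To be clear, this is only a sketch: frame_count_round_up_threshold is a config key I made up, and dataset_config stands in for however the dataset options actually get passed around; the rest is the same rounding logic as above.

# Sketch: opt-in rounding threshold, defaulting to 0.0 (current behaviour, never round up).
# 'frame_count_round_up_threshold' and 'dataset_config' are hypothetical names.
threshold = dataset_config.get('frame_count_round_up_threshold', 0.0)

frames_exact = self.framerate * meta['duration']
frames = int(frames_exact)
if threshold > 0 and (frames_exact - frames) > threshold:
    # e.g. threshold = 0.95 turns a 48.96-frame estimate into 49 frames.
    frames += 1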