Feature - stitching multiple video streams #133

Merged
sneakers-the-rat merged 94 commits into main from feat-stitch-video on Apr 21, 2026
Collaborator

@t-sasatani t-sasatani commented Sep 11, 2025

This PR adds a stitching function for multiple data streams based on device metadata. It also makes frame index operations more rigorous to support this.

Sorry that this ended up quite long; the metadata CSV used for testing stitching is also inflating the line count.

Functions

  1. Load multiple pairs of videos (.avi) and metadata (.csv) from parallel recording streams.
  2. Align frames across recordings using frame_num from metadata.
  3. For each aligned frame, score candidates to pick the best version:
    • Metadata scoring: prefer frames with more buffers and less black padding.
    • Image scoring (tiebreaker): use Sobel edge detection to pick the least sharp frame (high edge response likely indicates artifacts/noise).
  4. Reconstruct the selected frames into a single stitched video and metadata file, with reconstructed_frame_index renumbered sequentially.
  5. Optionally export debug outputs (composite comparison video + CSV with per-frame selection decisions, diff pixel counts, and edge scores).
  6. Supports a full workflow pipeline (mio process workflow): stitch → crop (trim start/end frames) → denoise, with upfront validation and frame-count alignment checks at each step.
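
Steps 2 to 4 above can be sketched roughly as follows. This is a minimal illustration in pure Python, not the code in mio/process/stitch.py: the `Candidate` class, its field names (`buffer_count`, `black_padding_px`), the exact ordering of the scoring terms, and the helper names are all assumptions made for the example. Only `frame_num` and `reconstructed_frame_index` come from the description.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One stream's version of a frame. All field names except frame_num are hypothetical."""
    stream: str
    frame_num: int          # frame counter shared across streams via device metadata
    buffer_count: int       # buffers received for this frame (more is better)
    black_padding_px: int   # pixels filled with black padding (fewer is better)
    image: list             # tiny grayscale image as nested lists of ints


def sobel_energy(img):
    """Sum of |Gx| + |Gy| over interior pixels using the 3x3 Sobel kernels."""
    total = 0
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            total += abs(gx) + abs(gy)
    return total


def pick_best(candidates):
    """Metadata first (more buffers, less padding); lowest edge energy breaks ties."""
    return min(
        candidates,
        key=lambda c: (-c.buffer_count, c.black_padding_px, sobel_energy(c.image)),
    )


def stitch(streams):
    """Group candidates by frame_num, pick one per frame, renumber sequentially."""
    by_frame = {}
    for stream in streams:
        for cand in stream:
            by_frame.setdefault(cand.frame_num, []).append(cand)
    rows = []
    for new_index, frame_num in enumerate(sorted(by_frame)):
        best = pick_best(by_frame[frame_num])
        rows.append({
            "reconstructed_frame_index": new_index,
            "frame_num": frame_num,
            "stream": best.stream,
        })
    return rows


# Toy data: a flat image has zero edge response, a hard vertical edge does not.
flat = [[10] * 4 for _ in range(4)]
striped = [[0, 0, 255, 255] for _ in range(4)]
stream_a = [
    Candidate("a", 0, 8, 0, flat),
    Candidate("a", 1, 5, 0, flat),     # fewer buffers than stream b for frame 1
    Candidate("a", 3, 8, 0, striped),  # metadata tie with b, higher edge energy
]
stream_b = [
    Candidate("b", 1, 8, 0, flat),
    Candidate("b", 3, 8, 0, flat),
]
rows = stitch([stream_a, stream_b])
```

In this toy run frame 0 exists only in stream `a`, frame 1 is decided by buffer count, and frame 3 is decided by the edge tiebreaker. Frame 2 is missing from both streams, so `reconstructed_frame_index` runs 0, 1, 2 while `frame_num` keeps the original 0, 1, 3.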

Main updates

  • mio/process/stitch.py: Main part that scores multiple recording videos from a bundle, gets the best ones, and exports while tracking and adjusting the metadata too.
  • mio/process/video.py: Made frame index tracking much more rigorous and improved path resolution for locating a data bundle after a run.
  • mio/cli/process.py: Added workflow and stitch CLI commands alongside the existing denoise, plus interactive prompts to proceed/quit when metadata validation fails.
  • mio/models/stitch.py: Model for tracking frames from multiple video sources.
  • mio/utils.py: Added some validation functions and formatters for video-metadata pairs.
  • tests/test_process/test_stitch.py: Testing for stitching and cropping frames. Tests stuff like alignment of video and metadata, video hash, etc.

📚 Documentation preview 📚: https://miniscope-io--133.org.readthedocs.build/en/133/

@coveralls
Collaborator

coveralls commented Sep 11, 2025

Coverage Status

coverage: 72.344% (-8.3%) from 80.624%
when pulling dbf07a9 on feat-stitch-video
into 338e847 on main.

@t-sasatani force-pushed the feat-stitch-video branch 3 times, most recently from 0c0926b to aba37b4 (September 23, 2025 14:06)
@t-sasatani force-pushed the feat-stitch-video branch 5 times, most recently from e987781 to 9b77947 (October 1, 2025 13:59)
Comment thread mio/cli/process.py
default=0,
help="Number of frames to remove from the end (default: 0).",
)
def workflow(
Collaborator

changes from the original here:

  • removed all the error checking that happens in the middle: if there are errors we should raise them in the processing methods themselves, otherwise casting to Recording validates that the frame counts are identical between the video and the metadata csv
  • didn't override the output_dir in the denoise config: we should use that if it's present, if it should output to another directory, the config should be changed.

otherwise should be the same. please try this out and see if it breaks your workflow, lmk what needs to change or what i have broken
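
The idea of letting the cast to Recording do the validation, rather than checking in the middle of the workflow, could look something like the sketch below. This is not the actual mio model: the class body, its two fields, and the error message are assumptions made purely to illustrate validation at construction time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    """Hypothetical stand-in for a video + metadata CSV pair."""
    video_frame_count: int   # frames counted in the video file
    metadata_rows: int       # rows in the matching metadata CSV

    def __post_init__(self) -> None:
        # Fail at construction so later processing steps can assume alignment.
        if self.video_frame_count != self.metadata_rows:
            raise ValueError(
                f"video has {self.video_frame_count} frames "
                f"but metadata has {self.metadata_rows} rows"
            )
```

With this shape, each workflow step can accept a `Recording` and skip its own frame-count checks, since a mismatched pair can never be constructed in the first place.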

Collaborator

@sneakers-the-rat sneakers-the-rat left a comment

ok returning to you @t-sasatani - managed to take about a thousand lines off while adding the dataset models, I didn't try and retrofit everything here to use them, but hopefully it's clear why they might be useful: we should avoid trying to juggle a million paths manually and should instead create a model of how all this stuff fits together. In the future this should also be some metadata file rather than being based on file naming conventions, but again trying to just set us up for future steps, and it ended up being simpler to add the dataset models now than having multiple different methods for validating and locating metadata files/etc.

I think the major observable difference here is in path manipulation, i am trying to go for a model where all the data for a given dataset stays in that directory, and for now a lot of that is flat with filenames to indicate what things are, but trying not to have multiple different directories where files from the same dataset are stored. once we get a better handle on how we want to organize files we can add back in more subdirectory structure within the context of a dataset.

give this a run and see if it does all you expect it to - the tests do pass, and they seem to test the major functionality, but i want to be sure i didn't break anything you are relying on (if there is anything that breaks when you run it, lmk and i can write more tests)

@sneakers-the-rat
Collaborator

if you're good with this, would you do the honors of writing the changelog and then feel free to merge :) otherwise lmk if changes requested

Comment thread tests/test_cli_process.py Outdated

STITCH_DATA_DIR = Path(__file__).parent / "data" / "stitch"

EXPECTED_STITCHED_VIDEO_HASH = "69217a3079908094e11121d042354a7c1f55b6482ca1a51e1b250dfd1ed0eef9"
Collaborator Author

@t-sasatani t-sasatani Apr 17, 2026

Noticed the output file name was different, and this hash you changed to seems to be a hash for an empty output.

>>> import hashlib
>>> print(hashlib.new('blake2s').hexdigest())
69217a3079908094e11121d042354a7c1f55b6482ca1a51e1b250dfd1ed0eef9
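
One way a test could guard against this kind of mistake is to compare against the digest of empty input explicitly. The following is only a sketch: the helper names `blake2s_file` and `assert_video_hash` are hypothetical and are not taken from the test suite.

```python
import hashlib
from pathlib import Path

# Digest of zero bytes; an expected hash equal to this almost certainly means
# that nothing was actually hashed.
EMPTY_BLAKE2S = hashlib.blake2s().hexdigest()


def blake2s_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos are not read into memory at once."""
    digest = hashlib.blake2s()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assert_video_hash(path: Path, expected: str) -> None:
    """Compare a file's hash to an expected value, rejecting the empty digest."""
    if expected == EMPTY_BLAKE2S:
        raise AssertionError("expected hash is the digest of empty input")
    actual = blake2s_file(path)
    if actual == EMPTY_BLAKE2S:
        raise AssertionError(f"{path} is empty")
    if actual != expected:
        raise AssertionError(f"hash mismatch for {path}: {actual} != {expected}")
```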

Collaborator

Holy shit good catch lmao thank you

@t-sasatani
Collaborator Author

t-sasatani commented Apr 17, 2026

I haven't read through all the changes yet, but I at least ran the script locally and made some fixes.
The stitching part is fairly decoupled from the other parts, so I think it's good to merge it. The following are more like notes. Might be worth just showing better error messages, but also not critical:

  • This is an upstream issue, but there are some datasets where the metadata and frame count are off by one (don't know yet if this is a termination issue or anything else). I think I made a small prompt asking whether the code should move forward, but the current version just stopping is probably better.
  • The code fails when the output filepath exists. My typical way to run this is to change some processing parameters and rerun the same command, expecting to overwrite, so I got confused for a moment. But I guess this is me overfitting to a suboptimal interface, and I'm not sure what a good interface would be.

@sneakers-the-rat
Collaborator

> there are some datasets where the metadata and frame count are off by one

As in the metadata has an extra frame that is not in the video? Or what? Maybe we just trim that from the metadata?

> The code fails when the output filepath exists.

What about a --force flag that enables overwriting?

@t-sasatani
Collaborator Author

> the metadata has an extra frame that is not in the video?

Yes, and it's tricky that I have no idea how to reproduce this. I think removing the extra metadata row before processing makes the most sense.

> What about a --force flag that enables overwriting?

Yes, that'll be perfect. Don't know why I didn't think of this.
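
The two behaviors agreed on here could be sketched as below. Both function names are hypothetical, and the sketch assumes the extra metadata row is the last one, which the thread does not confirm; it is not the code that was merged.

```python
from pathlib import Path


def trim_extra_metadata(rows: list, n_video_frames: int) -> list:
    """Drop a single surplus metadata row; refuse anything larger than off-by-one.

    Assumption: the surplus row is the trailing one.
    """
    extra = len(rows) - n_video_frames
    if extra == 0:
        return rows
    if extra == 1:
        return rows[:-1]
    raise ValueError(
        f"metadata has {len(rows)} rows but video has {n_video_frames} frames"
    )


def check_output_path(path: Path, force: bool = False) -> Path:
    """Refuse to clobber an existing output unless overwriting was requested."""
    if path.exists() and not force:
        raise FileExistsError(f"{path} already exists; pass --force to overwrite")
    return path
```

Keeping the overwrite check in one helper means rerunning the same command with changed processing parameters only needs `force=True` passed through from the CLI flag.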

@sneakers-the-rat
Collaborator

ok added those, will need to write tests for the --force flag and whatnot

@sneakers-the-rat
Collaborator

ok i'm gonna merge this puppy and please raise issues with anything that comes up!!!

@sneakers-the-rat sneakers-the-rat merged commit 52d0257 into main Apr 21, 2026
18 checks passed
4 participants