Skip to content

CLI for removing specified frame indecies from video/metadata/timestamp bundle#154

Merged
sneakers-the-rat merged 9 commits into
mainfrom
remove-frames
Apr 22, 2026
Merged

CLI for removing specified frame indecies from video/metadata/timestamp bundle#154
sneakers-the-rat merged 9 commits into
mainfrom
remove-frames

Conversation

@t-sasatani
Copy link
Copy Markdown
Collaborator

@t-sasatani t-sasatani commented Mar 4, 2026

@sneakers-the-rat
Copy link
Copy Markdown
Collaborator

This looks like it should follow #133 - that will merge in the fn for removing frames from the metadata, and this should be a pretty simple add on top of that.

@sneakers-the-rat
Copy link
Copy Markdown
Collaborator

alright @t-sasatani i caught this up to main and adapted it to use the recording class (not much to change this time), but I am a bit unclear about what these timestamp metadata files are - how do they get made? it seems from the docstring that they are 1-row per frame with a column for the start timestamp of the frame and a column for the end timestamp - that would be something that would be pretty fast to compute with a group_by(), so in the interest of trying to keep the structure of the datasets simple, would that be something that would be ok to access from some @property on the Recording class? or is it important that we also write those out to file

i preserved your work pre-rebase in this branch: https://github.com/miniscope/mio/tree/remove-frames-pre-rebase

Comment thread mio/cli/process.py
"-o",
"--output",
type=click.Path(dir_okay=False),
default=None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm fine with having defaults be adding suffixes, but i think we should encourage paths to be passed explicitly or else have a predictable naming system for items within a dataset

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do want to go towards a predictable naming system, but whichever way you think is better.

@t-sasatani
Copy link
Copy Markdown
Collaborator Author

t-sasatani commented Apr 22, 2026

is it important that we also write those out to file

Yes, this is important now, and I think the file was generated in the stitch PR or somewhere. Might have been dropped in #133 . My bad if it wasn't in the tests.
I was thinking that the output of this whole preprocessing workflow should be just a pair of a video and a CSV file for each frame's timestamps. That is what the later pipelines (Minian, CaImAn, etc) use and passing the whole metadata couples everything to mio.

@t-sasatani
Copy link
Copy Markdown
Collaborator Author

t-sasatani commented Apr 22, 2026

But I wasn't sure where the best place was to generate this cleaned-up timestamp file, because someone might use only stitch or both stitch and denoise. We could just have a separate command to generate this timestamp, and maybe also include that for the end of the workflow command bundle.

@t-sasatani
Copy link
Copy Markdown
Collaborator Author

And I assume you already figured this out, but there are start and end timestamps because each frame has multiple buffers, each with independent timestamps.

@t-sasatani
Copy link
Copy Markdown
Collaborator Author

Never mind the timestamp CSV export still existed, and it's here

def _make_frame_timestamp_csv(csv_df: pd.DataFrame) -> pd.DataFrame:

@t-sasatani
Copy link
Copy Markdown
Collaborator Author

t-sasatani commented Apr 22, 2026

I think this mess started with me, but the overall output going to [SPECIFIED_PATH]/user_data/output is somewhat confusing, so we probably want to think about the export structure.
So for this example

  • video1__video2_stitched_patch.avi
  • video1__video2_stitched_patch_timestamp.csv

are the two files used for later processing (minian, placecell analysis, etc.). The rest are mostly used for diagnosis and evaluating communication quality.

> pdm run mio process workflow -i video1.avi -i video2.avi -o test_output_path -c denoise_example
> tree test_output_path 
test_output_path
├── debug
│   ├── video1__video2_stitched_freq_mask.avi
│   ├── video1__video2_stitched_freq_mask_freq_domain.avi
│   ├── video1__video2_stitched_freq_mask_freq_mask.png
│   ├── video1__video2_stitched_patch_dropped_frames.txt
│   ├── video1__video2_stitched_patch_noisy_frames.avi
│   └── video1__video2_stitched_patch_patched_area.avi
├── user_data
│   └── output
│       ├── video1__video2_stitched_patch.avi
│       ├── video1__video2_stitched_patch.csv
│       └── video1__video2_stitched_patch_timestamp.csv
├── video1__video2_stitched.avi
├── video1__video2_stitched.csv
└── video1__video2_stitched_scores.csv

One way I can think of is having an output directory (maybe .mio or something, just to be clear where the root is) that has subdirectories like:

  • raw
  • [operation_1]
  • [operation_1]_[operation_2]

So basically, having subdirectories named by operation that contain the operation's output and debug files.
And appending operation names in the subdirectory for consecutive operations. This would work for our case, but this also gets closer to defining our own format, which we want to avoid, so I'm not sure what a good way is.

@sneakers-the-rat
Copy link
Copy Markdown
Collaborator

sneakers-the-rat commented Apr 22, 2026

OK that all makes sense to me, thanks for catching me up.

I was thinking that the output of this whole preprocessing workflow should be just a pair of a video and a CSV file for each frame's timestamps. That is what the later pipelines (Minian, CaImAn, etc) use and passing the whole metadata couples everything to mio.

agreed! yes one of the things i am trying to do is sketch out what we want to have count as a recording - some video and some set of associated metadata - and having timestamps be part of that especially makes sense if that's what downstream tools need. I am not a huge fan of filename-based conventions, but the current situation is that for a given {name}.avi, it has the accompanying mio metadata table named {name}.csv, and i think it would be confusing to have {name}.csv mean different things for different recordings. so if {name}_timestamps.csv accompanies all of the videos that could definitely be done.

We could just have a separate command to generate this timestamp, and maybe also include that for the end of the workflow command bundle.

yes definitely, want to avoid having to juggle these and do a specific set of commands in a specific order as possible, so what i was thinking above is to automatically derive this whenever the recording is created - initially i was thinking of just doing this with a @property but if it's better to have it as a file for downstream consumers then that sounds good to me!

I think this mess started with me, but the overall output going to [SPECIFIED_PATH]/user_data/output is somewhat confusing, so we probably want to think about the export structure.
having subdirectories named by operation that contain the operation's output and debug files.

yes i think this is worth sorting out as a set of dedicated PRs instead of trying to do them in-place here :). I like subdirectories with outcomes of different operations, and then within those we can store metadata about those operations (parameters, source files) in the subdirectories (the derived_from/RecordingDerivation thing i sketched previously that doesn't have a concrete implementation). That's definitely better than just stacking a ton of stuff into the filename like {name}_denoised_stitched_patch_final_final_v2.avi. I would prefer having that exist as proper metadata rather than having the metadata embedded in file and directory names in some dataset.yaml file, but that would take a bit more work to make smooth, backporting that into the existing processing operations.


so ok i'll start another issue to discuss dataset structure, and for now will add "timestamps" as an additional field in the Recording model that gets auto-created by the Recording (and we can also revisit that decision later too).

edit: wait we already have #165 . so will add links back there

Comment thread mio/process/video.py
Comment on lines -636 to -675
def _drop_frames_by_index(
df: pd.DataFrame,
dropped_frame_indices: list[int],
index_column: str = "reconstructed_frame_index",
) -> pd.DataFrame:
"""
Drop rows whose ``index_column`` value is in the dropped frame indices,
and then recreate ``index_column`` monotonically increasing from 0.
"""
filtered = df[~df[index_column].isin(dropped_frame_indices)].copy()
filtered[index_column] = filtered.groupby(index_column).ngroup()
return filtered


def _make_frame_timestamp_csv(csv_df: pd.DataFrame) -> pd.DataFrame:
"""
Export a frame-timestamp CSV file mapping reconstructed_frame_index to unix timestamps.

The CSV includes both the first and last buffer timestamps for each frame.

Parameters
----------
output_video_path : Path
Path to the output video file.
csv_df : pd.DataFrame
The modified CSV DataFrame with reconstructed_frame_index and
buffer_recv_unix_time.
"""
frame_timestamps = (
csv_df.groupby("reconstructed_frame_index")["buffer_recv_unix_time"]
.agg(["min", "max"])
.reset_index()
)

frame_timestamps.columns = ["frame", "timestamp_first", "timestamp_last"]

frame_timestamps = frame_timestamps.sort_values("frame")
return frame_timestamps


Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just moved these to the bottom of the file unchanged, was weird having private functions right in the middle there

Comment thread mio/process/video.py
filtered = _drop_frames_by_index(recording.metadata, removal_set)
filtered.to_csv(output_path.with_suffix(".csv"), index=False)

return Recording.from_video(output_path)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so timestamps now created here automatically on instantiating the recording class, since we expect timestamps to be part of the recording model (if metadata exists)

@sneakers-the-rat sneakers-the-rat changed the base branch from feat-stitch-video to main April 22, 2026 23:07
@sneakers-the-rat
Copy link
Copy Markdown
Collaborator

ok kewl merging this, should keep the same behavior as before (with the exception of passing an expected file path rather than expected directory, so we either have {input}_removed.avi in the same directory as the default, or an explicit path, and we can revisit path structure later).

it looks like the windows package manager chocolatey is down right now, which is the cause of all the failing dinwso tests, so going ahead with the red x's

@sneakers-the-rat sneakers-the-rat merged commit 3366e64 into main Apr 22, 2026
30 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants