CLI for removing specified frame indecies from video/metadata/timestamp bundle#154
Conversation
|
This looks like it should follow #133 - that will merge in the fn for removing frames from the metadata, and this should be a pretty simple add on top of that. |
30d115c to
a0a6e7d
Compare
|
alright @t-sasatani i caught this up to main and adapted it to use the recording class (not much to change this time), but I am a bit unclear about what these timestamp metadata files are - how do they get made? it seems from the docstring that they are 1-row per frame with a column for the start timestamp of the frame and a column for the end timestamp - that would be something that would be pretty fast to compute with a i preserved your work pre-rebase in this branch: https://github.com/miniscope/mio/tree/remove-frames-pre-rebase |
| "-o", | ||
| "--output", | ||
| type=click.Path(dir_okay=False), | ||
| default=None, |
There was a problem hiding this comment.
i'm fine with having defaults be adding suffixes, but i think we should encourage paths to be passed explicitly or else have a predictable naming system for items within a dataset
There was a problem hiding this comment.
I do want to go towards a predictable naming system, but whichever way you think is better.
Yes, this is important now, and I think the file was generated in the stitch PR or somewhere. Might have been dropped in #133 . My bad if it wasn't in the tests. |
|
But I wasn't sure where the best place was to generate this cleaned-up timestamp file, because someone might use only stitch or both stitch and denoise. We could just have a separate command to generate this timestamp, and maybe also include that for the end of the workflow command bundle. |
|
And I assume you already figured this out, but there are start and end timestamps because each frame has multiple buffers, each with independent timestamps. |
|
Never mind the timestamp CSV export still existed, and it's here Line 650 in 52d0257 |
|
I think this mess started with me, but the overall output going to
are the two files used for later processing (minian, placecell analysis, etc.). The rest are mostly used for diagnosis and evaluating communication quality. > pdm run mio process workflow -i video1.avi -i video2.avi -o test_output_path -c denoise_example
> tree test_output_path
test_output_path
├── debug
│ ├── video1__video2_stitched_freq_mask.avi
│ ├── video1__video2_stitched_freq_mask_freq_domain.avi
│ ├── video1__video2_stitched_freq_mask_freq_mask.png
│ ├── video1__video2_stitched_patch_dropped_frames.txt
│ ├── video1__video2_stitched_patch_noisy_frames.avi
│ └── video1__video2_stitched_patch_patched_area.avi
├── user_data
│ └── output
│ ├── video1__video2_stitched_patch.avi
│ ├── video1__video2_stitched_patch.csv
│ └── video1__video2_stitched_patch_timestamp.csv
├── video1__video2_stitched.avi
├── video1__video2_stitched.csv
└── video1__video2_stitched_scores.csvOne way I can think of is having an output directory (maybe
So basically, having subdirectories named by operation that contain the operation's output and debug files. |
|
OK that all makes sense to me, thanks for catching me up.
agreed! yes one of the things i am trying to do is sketch out what we want to have count as a recording - some video and some set of associated metadata - and having timestamps be part of that especially makes sense if that's what downstream tools need. I am not a huge fan of filename-based conventions, but the current situation is that for a given
yes definitely, want to avoid having to juggle these and do a specific set of commands in a specific order as possible, so what i was thinking above is to automatically derive this whenever the recording is created - initially i was thinking of just doing this with a
yes i think this is worth sorting out as a set of dedicated PRs instead of trying to do them in-place here :). I like subdirectories with outcomes of different operations, and then within those we can store metadata about those operations (parameters, source files) in the subdirectories (the so ok i'll start another issue to discuss dataset structure, and for now will add "timestamps" as an additional field in the edit: wait we already have #165 . so will add links back there |
| def _drop_frames_by_index( | ||
| df: pd.DataFrame, | ||
| dropped_frame_indices: list[int], | ||
| index_column: str = "reconstructed_frame_index", | ||
| ) -> pd.DataFrame: | ||
| """ | ||
| Drop rows whose ``index_column`` value is in the dropped frame indices, | ||
| and then recreate ``index_column`` monotonically increasing from 0. | ||
| """ | ||
| filtered = df[~df[index_column].isin(dropped_frame_indices)].copy() | ||
| filtered[index_column] = filtered.groupby(index_column).ngroup() | ||
| return filtered | ||
|
|
||
|
|
||
| def _make_frame_timestamp_csv(csv_df: pd.DataFrame) -> pd.DataFrame: | ||
| """ | ||
| Export a frame-timestamp CSV file mapping reconstructed_frame_index to unix timestamps. | ||
|
|
||
| The CSV includes both the first and last buffer timestamps for each frame. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| output_video_path : Path | ||
| Path to the output video file. | ||
| csv_df : pd.DataFrame | ||
| The modified CSV DataFrame with reconstructed_frame_index and | ||
| buffer_recv_unix_time. | ||
| """ | ||
| frame_timestamps = ( | ||
| csv_df.groupby("reconstructed_frame_index")["buffer_recv_unix_time"] | ||
| .agg(["min", "max"]) | ||
| .reset_index() | ||
| ) | ||
|
|
||
| frame_timestamps.columns = ["frame", "timestamp_first", "timestamp_last"] | ||
|
|
||
| frame_timestamps = frame_timestamps.sort_values("frame") | ||
| return frame_timestamps | ||
|
|
||
|
|
There was a problem hiding this comment.
just moved these to the bottom of the file unchanged, was weird having private functions right in the middle there
| filtered = _drop_frames_by_index(recording.metadata, removal_set) | ||
| filtered.to_csv(output_path.with_suffix(".csv"), index=False) | ||
|
|
||
| return Recording.from_video(output_path) |
There was a problem hiding this comment.
so timestamps now created here automatically on instantiating the recording class, since we expect timestamps to be part of the recording model (if metadata exists)
|
ok kewl merging this, should keep the same behavior as before (with the exception of passing an expected file path rather than expected directory, so we either have it looks like the windows package manager chocolatey is down right now, which is the cause of all the failing dinwso tests, so going ahead with the red x's |
📚 Documentation preview 📚: https://miniscope-io--154.org.readthedocs.build/en/154/