add mcap dataset support #10
Open
abhichothani42 wants to merge 9 commits into main
Pull request overview
Adds MCAP as an additional dataset export format to the SLAM dataset-generation pipeline, allowing datasets to be inspected as typed, time-aligned episode recordings (e.g., in Foxglove Studio), while retaining the existing Zarr replay-buffer export.
Changes:
- Added `07_generate_mcap_dataset.py` to write one `.mcap` file per episode with robot state topics and JPEG-compressed camera images (`foxglove.CompressedImage`).
- Renamed/retitled step-7 Zarr generation script to `07_generate_zarr_dataset.py` and updated usage docs accordingly.
- Updated `dataset_generation_pipeline.py` and `README.md` to support `--format {mcap,zarr}` and document both dataset formats; added `mcap` dependency.
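A minimal sketch of how the `--format`/`-f` routing described above could look. The script names and the `mcap` default come from this PR; the helper names `build_parser` and `select_step7_script`, and the mapping dictionary, are hypothetical and not taken from `dataset_generation_pipeline.py`.

```python
import argparse
from pathlib import Path

# Script names are from the PR; this mapping itself is an illustrative assumption.
STEP7_SCRIPTS = {
    "mcap": "07_generate_mcap_dataset.py",
    "zarr": "07_generate_zarr_dataset.py",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the format flag (other pipeline flags omitted)."""
    parser = argparse.ArgumentParser(description="Dataset generation pipeline (sketch)")
    parser.add_argument(
        "-f",
        "--format",
        choices=sorted(STEP7_SCRIPTS),
        default="mcap",
        help="Dataset format produced by step 7.",
    )
    return parser


def select_step7_script(fmt: str, script_dir: Path) -> Path:
    """Return the step-7 generator script that matches the requested format."""
    return script_dir / STEP7_SCRIPTS[fmt]
```

Using `choices` lets `argparse` reject an unsupported format before any earlier pipeline step has run.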
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/scripts_slam_pipeline/07_generate_zarr_dataset.py | Updates usage text to match the renamed Zarr dataset generator entrypoint. |
| scripts/scripts_slam_pipeline/07_generate_mcap_dataset.py | New MCAP exporter producing per-episode MCAP files with JSON-encoded schemas and JPEG images. |
| scripts/dataset_generation_pipeline.py | Adds --format/-f flag and routes step 7 to MCAP (default) or Zarr generator. |
| pyproject.toml | Adds mcap as a runtime dependency. |
| README.md | Documents dataset format selection and output structure for MCAP vs Zarr. |
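For reference, a hedged sketch of what a JSON-encoded `foxglove.CompressedImage` message body looks like. The field names (`timestamp`, `frame_id`, `data`, `format`) follow the Foxglove schema, where JSON encoding carries the image bytes as base64. The function name `compressed_image_payload` and the `camera0` frame ID used in testing are hypothetical; whether the new exporter builds its messages exactly this way is not shown in this PR page.

```python
import base64
import json


def compressed_image_payload(jpeg_bytes: bytes, stamp_ns: int, frame_id: str) -> bytes:
    """Serialize one JPEG frame as a JSON-encoded foxglove.CompressedImage message."""
    # Split a nanosecond timestamp into the {sec, nsec} pair used by the schema.
    sec, nsec = divmod(stamp_ns, 1_000_000_000)
    message = {
        "timestamp": {"sec": sec, "nsec": nsec},
        "frame_id": frame_id,
        # JSON cannot hold raw bytes, so the image data is base64-encoded.
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
        "format": "jpeg",
    }
    return json.dumps(message).encode("utf-8")
```

The returned bytes are what would be passed as the message data when writing to a channel whose message encoding is `json`.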
lukeschmitt-tr (Contributor) requested changes on May 4, 2026:
Please provide us with an example mcap file
    "-or",
    "--out_res",
    type=str,
    default="224,224",
Will this crop the images before they are saved? We should store the images at their native resolution and crop them during replay.
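One way the suggested split could work, as a rough sketch: keep the stored frames at native resolution and compute the crop window only when a frame is read back. The helper names `parse_res` and `center_crop_box` are hypothetical, and treating `--out_res` as `width,height` is an assumption, since the flag's semantics are not shown in this diff hunk.

```python
def parse_res(text: str) -> tuple[int, int]:
    """Parse a 'W,H' string such as the --out_res default '224,224'."""
    width, height = (int(part) for part in text.split(","))
    return width, height


def center_crop_box(native_w: int, native_h: int, out_w: int, out_h: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered window
    that has the target aspect ratio. Integer math avoids rounding drift."""
    if native_w * out_h >= native_h * out_w:
        # Native frame is relatively wider than the target: keep full height.
        crop_h = native_h
        crop_w = native_h * out_w // out_h
    else:
        # Native frame is relatively taller than the target: keep full width.
        crop_w = native_w
        crop_h = native_w * out_h // out_w
    left = (native_w - crop_w) // 2
    top = (native_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h
```

The replay code would then crop to this box and resize to `out_w` by `out_h`, so the target resolution can be changed later without regenerating the dataset.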
57b2fda to d1f96ef
d1f96ef to fd6687f

TL;DR
Adds MCAP as a supported dataset output format.
What changed?
- `07_generate_mcap_dataset.py` writes one `.mcap` file per episode to a `dataset_mcap/` directory. Each file contains time-aligned robot state messages and JPEG-compressed camera images using the `foxglove.CompressedImage` schema.
- `07_generate_replay_buffer.py` has been renamed to `07_generate_zarr_dataset.py` to better reflect its purpose.
- `dataset_generation_pipeline.py` now accepts a `--format`/`-f` flag (`mcap` or `zarr`, defaulting to `mcap`) that selects which generation script to invoke in step 7.
- A `## Dataset Formats` section in the README documents both formats, their output structure, and the data they contain.
- `mcap>=1.3.1` has been added as a dependency.

How to test?
Run the pipeline with each format and verify the outputs:
Open a generated `.mcap` file in Foxglove Studio to verify robot state topics and camera image streams are correctly populated and time-aligned.