
add mcap dataset support #10

Open
abhichothani42 wants to merge 9 commits into main from 05-01-add_mcap_dataset_support

abhichothani42 (Contributor) commented May 1, 2026

TL;DR

Adds MCAP as a supported dataset output format.

What changed?

  • Adds a new script, 07_generate_mcap_dataset.py, that writes one .mcap file per episode to a dataset_mcap/ directory. Each file contains time-aligned robot state messages and JPEG-compressed camera images using the foxglove.CompressedImage schema.
  • The existing 07_generate_replay_buffer.py has been renamed to 07_generate_zarr_dataset.py to better reflect its purpose.
  • dataset_generation_pipeline.py now accepts a --format / -f flag (mcap or zarr, defaulting to mcap) that selects which generation script to invoke in step 7.
  • A new ## Dataset Formats section in the README documents both formats, their output structure, and the data they contain.
  • mcap>=1.3.1 has been added as a dependency.
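A minimal sketch of how the --format / -f flag could route step 7, assuming argparse (the pipeline may use a different CLI library) and assuming the step-7 script takes the session directory as its only argument. `build_parser` and `step7_command` are hypothetical names; the two script paths come from the file summary of this PR.

```python
import argparse
import sys
from pathlib import Path

# Step-7 generator scripts, keyed by --format value (paths from this PR's file list).
STEP7_SCRIPTS = {
    "mcap": "scripts/scripts_slam_pipeline/07_generate_mcap_dataset.py",
    "zarr": "scripts/scripts_slam_pipeline/07_generate_zarr_dataset.py",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Dataset generation pipeline (sketch)")
    parser.add_argument("session_dir", type=Path)
    # Only the two supported formats are accepted; MCAP is the default.
    parser.add_argument("-f", "--format", choices=sorted(STEP7_SCRIPTS), default="mcap")
    return parser


def step7_command(args: argparse.Namespace) -> list[str]:
    """Return the step-7 command line (hypothetical: session_dir is the only argument)."""
    return [sys.executable, STEP7_SCRIPTS[args.format], str(args.session_dir)]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would then invoke the selected generator, and an unsupported value such as `-f hdf5` is rejected by argparse before any work starts.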

How to test?

Run the pipeline with each format and verify the outputs:

# MCAP (default)
uv run python scripts/dataset_generation_pipeline.py <session_dir>

# Zarr
uv run python scripts/dataset_generation_pipeline.py -f zarr <session_dir>

Open a generated .mcap file in Foxglove Studio and verify that the robot state topics and camera image streams are correctly populated and time-aligned.
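For a scripted check alongside Foxglove Studio, here is a minimal sketch that measures the worst gap between each camera frame and its nearest robot state message. It assumes the per-topic log times (in nanoseconds) have already been collected, for example from the `log_time` of each message yielded by `mcap.reader.make_reader(f).iter_messages()`. `max_alignment_gap_ns` is a hypothetical helper, and the acceptable tolerance depends on the dataset.

```python
from bisect import bisect_left


def max_alignment_gap_ns(image_ts: list[int], state_ts: list[int]) -> int:
    """Largest distance from any image timestamp to its nearest state timestamp.

    Both lists hold log times in nanoseconds; state_ts must be sorted ascending.
    """
    if not state_ts:
        raise ValueError("no robot state messages")
    worst = 0
    for t in image_ts:
        i = bisect_left(state_ts, t)
        # Nearest candidates: the state at/after t and the one just before it.
        neighbors = state_ts[max(i - 1, 0): i + 1]
        worst = max(worst, min(abs(t - s) for s in neighbors))
    return worst
```

A gap close to one state period is expected; a gap of many periods would suggest dropped or misaligned messages.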

abhichothani42 (Contributor, Author) commented May 1, 2026

This stack of pull requests is managed by Graphite.

Copilot AI left a comment

Pull request overview

Adds MCAP as an additional dataset export format to the SLAM dataset-generation pipeline, allowing datasets to be inspected as typed, time-aligned episode recordings (e.g., in Foxglove Studio), while retaining the existing Zarr replay-buffer export.

Changes:

  • Added 07_generate_mcap_dataset.py to write one .mcap file per episode with robot state topics and JPEG-compressed camera images (foxglove.CompressedImage).
  • Renamed/retitled step-7 Zarr generation script to 07_generate_zarr_dataset.py and updated usage docs accordingly.
  • Updated dataset_generation_pipeline.py and README.md to support --format {mcap,zarr} and document both dataset formats; added mcap dependency.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

  • scripts/scripts_slam_pipeline/07_generate_zarr_dataset.py: Updates usage text to match the renamed Zarr dataset generator entrypoint.
  • scripts/scripts_slam_pipeline/07_generate_mcap_dataset.py: New MCAP exporter producing per-episode MCAP files with JSON-encoded schemas and JPEG images.
  • scripts/dataset_generation_pipeline.py: Adds --format/-f flag and routes step 7 to MCAP (default) or Zarr generator.
  • pyproject.toml: Adds mcap as a runtime dependency.
  • README.md: Documents dataset format selection and output structure for MCAP vs Zarr.
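Since the MCAP exporter uses JSON-encoded schemas, a JPEG frame could be packed into a foxglove.CompressedImage message roughly as below. This is a sketch, not the PR's code: `compressed_image_json` is a hypothetical helper, and it assumes a channel registered with message encoding `json` and a `jsonschema` schema, where bytes fields travel as base64 strings and the timestamp is a `{sec, nsec}` object.

```python
import base64
import json

NS_PER_S = 1_000_000_000


def compressed_image_json(jpeg_bytes: bytes, stamp_ns: int, frame_id: str) -> bytes:
    """Encode one foxglove.CompressedImage message for a JSON-encoded channel."""
    # Split a nanosecond stamp into whole seconds and the nanosecond remainder.
    sec, nsec = divmod(stamp_ns, NS_PER_S)
    msg = {
        "timestamp": {"sec": sec, "nsec": nsec},
        "frame_id": frame_id,
        # JSON has no bytes type, so the JPEG payload is carried as base64 text.
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
        "format": "jpeg",
    }
    return json.dumps(msg).encode("utf-8")
```

The returned bytes would be the `data` argument to the mcap writer's `add_message`, with `log_time` set to the same nanosecond stamp. Base64 inflates each frame by about a third, which is one trade-off of choosing JSON over a binary encoding such as protobuf.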

Comment thread scripts/scripts_slam_pipeline/07_generate_mcap_dataset.py
Comment thread scripts/scripts_slam_pipeline/07_generate_mcap_dataset.py Outdated
Comment thread scripts/scripts_slam_pipeline/07_generate_mcap_dataset.py Outdated
Comment thread scripts/scripts_slam_pipeline/07_generate_mcap_dataset.py Outdated
lukeschmitt-tr (Contributor) left a comment

Please provide us with an example MCAP file.

"-or",
"--out_res",
type=str,
default="224,224",
Contributor

Will this crop the images before they are saved? We should store the native resolution and crop during replay.
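One way to follow this suggestion, sketched under the assumption that frames are stored at native resolution and the --out_res value (default "224,224") is applied only at replay: compute a centered crop with the target aspect ratio, then resize. `parse_res` and `center_crop_box` are hypothetical helpers, and a centered crop is only one possible policy.

```python
def parse_res(text: str) -> tuple[int, int]:
    """Parse a "W,H" string such as the --out_res default "224,224"."""
    w, h = (int(part) for part in text.split(","))
    return w, h


def center_crop_box(src_w: int, src_h: int, out_w: int, out_h: int) -> tuple[int, int, int, int]:
    """Largest centered box in the source frame with the output aspect ratio.

    Returns (left, top, right, bottom) in pixels, to be cropped before resizing.
    """
    if src_w * out_h > src_h * out_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * out_w // out_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w = src_w
        crop_h = src_w * out_h // out_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h
```

With Pillow, for instance, `img.crop(box).resize(parse_res(out_res))` would apply the box and then scale to the output size, so the stored MCAP frames stay untouched and different resolutions can be tried without regenerating the dataset.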

abhichothani42 force-pushed the 05-01-add_mcap_dataset_support branch from 57b2fda to d1f96ef on May 5, 2026 14:54
abhichothani42 force-pushed the 05-01-add_mcap_dataset_support branch from d1f96ef to fd6687f on May 5, 2026 15:23
Base automatically changed from 04-22-add_timecode_sync_update_readme to main May 5, 2026 15:38
Labels: None yet
Projects: None yet

3 participants