Closed
33 commits
99d02f2
fbcode/deeplearning/projects/sam3_release/sam3/train/data
Jan 27, 2026
2ec3c07
fbcode/deeplearning/projects/sam3_release/sam3/train/data
Feb 3, 2026
ae29316
Fix export edge cases in encoder/decoder
rbavery Feb 5, 2026
3d5e8d4
Add export artifact scripts and update export fixes
rbavery Feb 5, 2026
5f70777
Stabilize artifact export and add benchmarking
rbavery Feb 6, 2026
7b9f84a
Align export scripts with decoder-only artifacts
rbavery Feb 6, 2026
daa7294
Add standalone full pipeline export and timing helpers
rbavery Feb 7, 2026
732232b
Add feature-level control to SAM3 and timing helpers
rbavery Feb 7, 2026
ad2c338
Tidy export timing benchmark output
rbavery Feb 7, 2026
8fd38fd
Clean export logs and mark export tests slow
rbavery Feb 7, 2026
35d756e
Register slow export tests
rbavery Feb 7, 2026
3a5fa59
Relax position encoding check for dynamic prompts
rbavery Feb 12, 2026
f0399e7
Fix compilation of position encoding
Feb 17, 2026
9a35a53
Merge pull request #6 from wherobots/export-2input-debug
rbavery Feb 17, 2026
2d08d73
Introduce torch compile option
raedle Feb 20, 2026
f6e51f5
Add empty-frames guard to cv2 video loader
Feb 24, 2026
86ed770
Make decord import lazy and add image-only inference build target
Mar 16, 2026
9f22cb9
SAM 3.1 Release (#503)
arpitkalla Mar 27, 2026
5a3143c
Apply linter
Mar 27, 2026
e54adc4
Extract shared segmentation inputs into reusable component
Mar 30, 2026
29aecc4
Revert unintended changes from D98651970
Mar 31, 2026
bfbed07
Daily `arc lint --take BLACK`
Mar 31, 2026
44ef224
Fix SAM3 io_utils to handle extensionless video files from OIL
Apr 12, 2026
967fdd6
Fix Unused Import issue in fbcode/deeplearning/projects/sam3_release/…
Apr 16, 2026
2e0009e
v3 code
quantran244 Apr 22, 2026
7567e80
Fix PYRE_MISSING_ANNOTATIONS issues in fbcode/deeplearning/projects/s…
Apr 24, 2026
c3a42ff
SAM3 multi-GPU session as default
quantran244 Apr 24, 2026
e6f0eae
Fix PYRE_MISSING_ANNOTATIONS issues in fbcode/deeplearning/projects/s…
Apr 26, 2026
875ed6f
Fix PYRE_MISSING_ANNOTATIONS issues in fbcode/deeplearning/projects/s…
Apr 26, 2026
eef9c1e
Fix PYRE_MISSING_ANNOTATIONS issues in fbcode/deeplearning/projects/s…
Apr 27, 2026
c97c893
Fix PYRE_MISSING_ANNOTATIONS issues in fbcode/deeplearning/projects/s…
Apr 27, 2026
46c6f23
Merge wherobots/export-artifacts into latest upstream main
rbavery Apr 29, 2026
3bb7a1e
Bump torch to 2.11 and torchvision to 0.26
rbavery Apr 29, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -124,12 +124,15 @@ dmypy.json
# Model weights and checkpoints
*.pth
*.pt
*.pt2
*.bin
*.ckpt
*.safetensors
weights/
checkpoints/
sam3_logs/
artifacts/
tests/export/export_logs/

# Data files
*.h5
15 changes: 14 additions & 1 deletion README.md
@@ -55,6 +55,13 @@ This breakthrough is driven by an innovative data engine that has automatically
<img src="assets/player.gif" width=380 />
</p>

## Latest updates

**03/27/2026 -- SAM 3.1 Object Multiplex is released. It introduces a shared-memory approach for joint multi-object tracking that is significantly faster without sacrificing accuracy.**

- A new suite of improved model checkpoints (denoted as **SAM 3.1**) is released on [Hugging Face](https://huggingface.co/facebook/sam3.1). See [`RELEASE_SAM3p1.md`](RELEASE_SAM3p1.md) for full details.
- To use the new SAM 3.1 checkpoints, you need the latest model code from this repo. If you installed an earlier version, pull the latest code (with `git pull`) and then reinstall following [Installation](#installation) below.

## Installation

### Prerequisites
@@ -74,7 +81,7 @@ conda activate sam3
2. **Install PyTorch with CUDA support:**

```bash
- pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
+ pip install torch==2.10.0 torchvision --index-url https://download.pytorch.org/whl/cu128
```

3. **Clone the repository and install the package:**
@@ -95,6 +102,12 @@ pip install -e ".[notebooks]"
pip install -e ".[train,dev]"
```

5. **Optional dependencies for faster inference**
```bash
pip install einops ninja && pip install flash-attn-3 --no-deps --index-url https://download.pytorch.org/whl/cu128
pip install git+https://github.com/ronghanghu/cc_torch.git
```
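
After the installation steps above, it can help to confirm which packages actually landed in the environment. The following is a minimal sketch, not part of the repository: the helper name `installed_versions` is hypothetical, and it assumes the PyPI distribution names `torch`, `torchvision`, `einops`, and `ninja` match the packages installed above. It reads package metadata only, so it runs even when a package is missing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are assumptions based on the pip commands above.
    for name, ver in installed_versions(
        ["torch", "torchvision", "einops", "ninja"]
    ).items():
        print(f"{name}: {ver or 'not installed'}")
```

A reported `torch` version that differs from the one pinned in step 2 suggests that the environment was not reinstalled after pulling the latest code.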

## Getting Started

⚠️ Before using SAM 3, please request access to the checkpoints on the SAM 3
150 changes: 150 additions & 0 deletions RELEASE_SAM3p1.md
@@ -0,0 +1,150 @@
# Release Notes

## SAM 3.1 — March 27, 2026

SAM 3.1 introduces **Object Multiplex**, a shared-memory approach for joint multi-object tracking that is significantly faster without sacrificing accuracy. This release also includes new model checkpoints and optimized inference.

### Object Multiplex

SAM 3's video pipeline processes each tracked object independently, which scales linearly with the number of objects. Object Multiplex groups objects into fixed-capacity buckets and processes them jointly, drastically reducing redundant computation. For technical details, see Appendix H (Object Multiplex) in the [SAM 3 paper](https://arxiv.org/abs/2511.16719).
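
The bucketing idea can be illustrated with a small sketch. This is not the SAM 3.1 implementation: the function names, the bucket capacity of 16, and the simple in-order assignment are all assumptions made for illustration, since these notes do not state the actual capacity or assignment policy. It only shows how grouping reduces the number of per-frame passes from one per object to one per bucket.

```python
from typing import List, Sequence


def group_into_buckets(object_ids: Sequence[int], capacity: int) -> List[List[int]]:
    """Split object IDs into consecutive buckets holding at most `capacity` IDs."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [
        list(object_ids[start:start + capacity])
        for start in range(0, len(object_ids), capacity)
    ]


def num_passes(num_objects: int, capacity: int) -> int:
    """Number of joint passes needed per frame (ceiling division)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return -(-num_objects // capacity)


if __name__ == "__main__":
    # Hypothetical capacity; the real value is not given in these notes.
    buckets = group_into_buckets(list(range(128)), capacity=16)
    print(f"independent passes: 128, bucketed passes: {len(buckets)}")
```

With 128 objects and the hypothetical capacity of 16, the sketch yields 8 joint passes instead of 128 independent ones. The pass count alone does not predict the measured speedup, which also depends on the cost of each joint pass.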

<p align="center">
<img src="assets/sam3.1_diagram.png" width="720" />
</p>

#### Key Improvements
- **~7x speedup** at 128 objects on a single H100 GPU compared to the SAM 3 November 2025 release
- Inference optimizations that significantly improve multi-object tracking efficiency:
  - Reduced CPU-GPU synchronization in detection-tracker association and other heuristics
  - Enhanced `torch.compile` support with improved operation fusion
  - Batched postprocessing and vision encoder to increase GPU utilization
- Mixed results on SA-Co/VEval video benchmarks, with notable improvement on YT-Temporal-1B (+2.1 cgF1)
- Improved VOS performance on 6 out of 7 benchmarks, including +2.0 on the challenging MOSEv2

#### Inference Efficiency

<p align="center">
<img src="assets/sam3.1_efficiency.png" width="720" />
</p>

#### Video PCS with Text Prompt

<div align="center">
<table style="min-width: 80%; border: 2px solid #ddd; border-collapse: collapse">
<thead>
<tr>
<th rowspan="3" style="border-right: 2px solid #ddd; padding: 12px 16px">Model</th>
<th colspan="6" style="text-align: center; border-right: 2px solid #ddd; padding: 10px 16px">SA-Co/VEval benchmark test split</th>
<th colspan="4" style="text-align: center; padding: 10px 16px">Public benchmarks</th>
</tr>
<tr>
<th colspan="2" style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">SA-V</th>
<th colspan="2" style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">YT-Temporal-1B</th>
<th colspan="2" style="text-align: center; border-right: 2px solid #ddd; padding: 10px 16px">SmartGlasses</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">LVVIS</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">BURST</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">YTVIS21</th>
<th style="text-align: center; padding: 10px 16px">OVIS</th>
</tr>
<tr>
<th style="text-align: center; padding: 10px 16px">cgF1</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">pHOTA</th>
<th style="text-align: center; padding: 10px 16px">cgF1</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">pHOTA</th>
<th style="text-align: center; padding: 10px 16px">cgF1</th>
<th style="text-align: center; border-right: 2px solid #ddd; padding: 10px 16px">pHOTA</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">test mAP</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">test HOTA</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">val mAP</th>
<th style="text-align: center; padding: 10px 16px">val mAP</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border-right: 2px solid #ddd; padding: 10px 16px">SAM 3</td>
<td style="text-align: center; padding: 10px 16px">30.3</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">58.0</td>
<td style="text-align: center; padding: 10px 16px">50.8</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">69.9</td>
<td style="text-align: center; padding: 10px 16px">36.4</td>
<td style="text-align: center; border-right: 2px solid #ddd; padding: 10px 16px">63.6</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">36.3</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">44.5</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">57.4</td>
<td style="text-align: center; padding: 10px 16px">60.5</td>
</tr>
<tr style="border-top: 2px solid #b19c9cff">
<td style="border-right: 2px solid #ddd; padding: 10px 16px">SAM 3.1</td>
<td style="text-align: center; padding: 10px 16px">30.5</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">58.7</td>
<td style="text-align: center; padding: 10px 16px">52.9</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">70.7</td>
<td style="text-align: center; padding: 10px 16px">36.3</td>
<td style="text-align: center; border-right: 2px solid #ddd; padding: 10px 16px">64.4</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">34.3</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">43.3</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">56.6</td>
<td style="text-align: center; padding: 10px 16px">61.5</td>
</tr>
</tbody>
</table>

</div>

#### Video Object Segmentation (VOS)

<div align="center">
<table style="min-width: 60%; border: 2px solid #ddd; border-collapse: collapse">
<thead>
<tr>
<th rowspan="2" style="border-right: 2px solid #ddd; padding: 12px 16px">Model</th>
<th colspan="5" style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">J&amp;F</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">G</th>
<th style="text-align: center; padding: 10px 16px">J&amp;Ḟ</th>
</tr>
<tr>
<th style="text-align: center; padding: 10px 16px">MOSEv1 val</th>
<th style="text-align: center; padding: 10px 16px">DAVIS17 val</th>
<th style="text-align: center; padding: 10px 16px">LVOSv2 val</th>
<th style="text-align: center; padding: 10px 16px">SA-V val</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">SA-V test</th>
<th style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">YTVOS19 val</th>
<th style="text-align: center; padding: 10px 16px">MOSEv2 val</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border-right: 2px solid #ddd; padding: 10px 16px">SAM 3</td>
<td style="text-align: center; padding: 10px 16px">78.4</td>
<td style="text-align: center; padding: 10px 16px">92.2</td>
<td style="text-align: center; padding: 10px 16px">88.5</td>
<td style="text-align: center; padding: 10px 16px">83.5</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">84.4</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">89.7</td>
<td style="text-align: center; padding: 10px 16px">60.3</td>
</tr>
<tr>
<td style="border-right: 2px solid #ddd; padding: 10px 16px">SAM 3.1</td>
<td style="text-align: center; padding: 10px 16px">79.6</td>
<td style="text-align: center; padding: 10px 16px">92.7</td>
<td style="text-align: center; padding: 10px 16px">89.2</td>
<td style="text-align: center; padding: 10px 16px">83.8</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">85.1</td>
<td style="text-align: center; border-right: 1px solid #eee; padding: 10px 16px">89.3</td>
<td style="text-align: center; padding: 10px 16px">62.3</td>
</tr>
</tbody>
</table>
</div>
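
The deltas quoted under Key Improvements can be recomputed from the two tables above. The sketch below copies the table values as (SAM 3, SAM 3.1) pairs; the dictionary and function names are illustrative only.

```python
from typing import Dict, Tuple

Scores = Dict[str, Tuple[float, float]]

# (SAM 3, SAM 3.1) scores copied from the Video PCS table.
VIDEO_PCS: Scores = {
    "SA-V cgF1": (30.3, 30.5),
    "SA-V pHOTA": (58.0, 58.7),
    "YT-Temporal-1B cgF1": (50.8, 52.9),
    "YT-Temporal-1B pHOTA": (69.9, 70.7),
    "SmartGlasses cgF1": (36.4, 36.3),
    "SmartGlasses pHOTA": (63.6, 64.4),
    "LVVIS test mAP": (36.3, 34.3),
    "BURST test HOTA": (44.5, 43.3),
    "YTVIS21 val mAP": (57.4, 56.6),
    "OVIS val mAP": (60.5, 61.5),
}

# (SAM 3, SAM 3.1) scores copied from the VOS table.
VOS: Scores = {
    "MOSEv1 val": (78.4, 79.6),
    "DAVIS17 val": (92.2, 92.7),
    "LVOSv2 val": (88.5, 89.2),
    "SA-V val": (83.5, 83.8),
    "SA-V test": (84.4, 85.1),
    "YTVOS19 val": (89.7, 89.3),
    "MOSEv2 val": (60.3, 62.3),
}


def deltas(scores: Scores) -> Dict[str, float]:
    """Return SAM 3.1 minus SAM 3 per benchmark, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


def count_improved(scores: Scores) -> int:
    """Count benchmarks where SAM 3.1 scores higher than SAM 3."""
    return sum(d > 0 for d in deltas(scores).values())


if __name__ == "__main__":
    for title, scores in (("Video PCS", VIDEO_PCS), ("VOS", VOS)):
        print(f"{title}: {count_improved(scores)} of {len(scores)} improved")
        for name, d in deltas(scores).items():
            print(f"  {name}: {d:+.1f}")
```

For VOS, 6 of 7 benchmarks improve, with YTVOS19 val as the one regression. For Video PCS, 6 of 10 metrics improve and 4 regress, which matches the "mixed results" description.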

### New Checkpoints

The SAM 3.1 checkpoints are available on the [Hugging Face repo](https://huggingface.co/facebook/sam3.1). See [Getting Started](README.md#getting-started) for download and authentication instructions.
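
One possible way to fetch the checkpoints programmatically is the `huggingface_hub` client. This is a hedged sketch rather than the documented workflow: the helper name and the `checkpoints/sam3.1` target directory are assumptions, the repo ID is taken from the link above, and the download may fail until access has been granted and you are authenticated with Hugging Face.

```python
def download_sam31_checkpoints(local_dir: str = "checkpoints/sam3.1") -> str:
    """Download the facebook/sam3.1 repo snapshot and return its local path."""
    # Imported lazily so this module still loads without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # repo_id comes from the Hugging Face link above; local_dir is hypothetical.
    return snapshot_download(repo_id="facebook/sam3.1", local_dir=local_dir)


if __name__ == "__main__":
    print(download_sam31_checkpoints())
```

The default target sits under `checkpoints/`, which the `.gitignore` in this change already excludes from version control.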

### Notebooks

- [`sam3.1_video_predictor_example.ipynb`](examples/sam3.1_video_predictor_example.ipynb): Demonstrates how to use SAM 3.1 with Object Multiplex for video segmentation and dense tracking with text and point prompts.

### Contributors

[Arpit Kalla](https://github.com/arpitkalla), [Chaitanya Ryali](https://scholar.google.com/citations?user=4LWx24UAAAAJ&hl=en), [Christian Puhrsch](https://github.com/cpuhrsch), [Ho Kei Cheng](https://hkchengrex.com/), [Joseph Greer](https://scholar.google.com/citations?user=guL96CkAAAAJ&hl=en), [Meng Wang](https://github.com/mengwa41), [Miran Heo](https://sites.google.com/view/miranheo), [Pengchuan Zhang](https://pzzhang.github.io/pzzhang/), [Roman Rädle](https://scholar.google.com/citations?user=Tpt57v0AAAAJ&hl=en), [Yuan-Ting Hu](https://scholar.google.com/citations?user=E8DVVYQAAAAJ&hl=en)
Binary file added assets/images/cat_dog.jpg
Binary file added assets/sam3.1_diagram.png
Binary file added assets/sam3.1_efficiency.png