diff --git a/.gitignore b/.gitignore
index fcda494..e67edc1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -124,12 +124,15 @@ dmypy.json
# Model weights and checkpoints
*.pth
*.pt
+*.pt2
*.bin
*.ckpt
*.safetensors
weights/
checkpoints/
sam3_logs/
+artifacts/
+tests/export/export_logs/
# Data files
*.h5
diff --git a/README.md b/README.md
index 669242d..9aad534 100644
--- a/README.md
+++ b/README.md
@@ -55,6 +55,13 @@ This breakthrough is driven by an innovative data engine that has automatically
+## Latest updates
+
+**03/27/2026 -- SAM 3.1 Object Multiplex is released. It introduces a shared-memory approach for joint multi-object tracking that is significantly faster without sacrificing accuracy.**
+
+- A new suite of improved model checkpoints (denoted **SAM 3.1**) is available on [Hugging Face](https://huggingface.co/facebook/sam3.1). See [`RELEASE_SAM3p1.md`](RELEASE_SAM3p1.md) for full details.
+ * The SAM 3.1 checkpoints require the latest model code from this repo. If you installed an earlier version, pull the latest code (`git pull`) and reinstall the package following [Installation](#installation) below.
+
## Installation
### Prerequisites
@@ -74,7 +81,7 @@ conda activate sam3
2. **Install PyTorch with CUDA support:**
```bash
-pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
+pip install "torch>=2.11,<2.12" "torchvision>=0.26,<0.27" --index-url https://download.pytorch.org/whl/cu128
```
3. **Clone the repository and install the package:**
@@ -95,6 +102,12 @@ pip install -e ".[notebooks]"
pip install -e ".[train,dev]"
```
+5. **Optional dependencies for faster inference:**
+```bash
+pip install einops ninja && pip install flash-attn-3 --no-deps --index-url https://download.pytorch.org/whl/cu128
+pip install git+https://github.com/ronghanghu/cc_torch.git
+```
+
## Getting Started
⚠️ Before using SAM 3, please request access to the checkpoints on the SAM 3
diff --git a/RELEASE_SAM3p1.md b/RELEASE_SAM3p1.md
new file mode 100644
index 0000000..cf4f66f
--- /dev/null
+++ b/RELEASE_SAM3p1.md
@@ -0,0 +1,150 @@
+# Release Notes
+
+## SAM 3.1 — March 27, 2026
+
+SAM 3.1 introduces **Object Multiplex**, a shared-memory approach for joint multi-object tracking that is significantly faster without sacrificing accuracy. This release also includes new model checkpoints and optimized inference.
+
+### Object Multiplex
+
+SAM 3's video pipeline processes each tracked object independently, so its cost scales linearly with the number of objects. Object Multiplex groups objects into fixed-capacity buckets and processes each bucket jointly, drastically reducing redundant computation. For technical details, see Appendix H (Object Multiplex) in the [SAM 3 paper](https://arxiv.org/abs/2511.16719).
+
+#### Key Improvements
+- **~7x speedup** at 128 objects on a single H100 GPU compared to the SAM 3 November 2025 release
+- Inference optimizations that significantly improve multi-object tracking efficiency:
+ - Reduced CPU-GPU synchronization in detection-tracker association and other heuristics
+ - Enhanced `torch.compile` support with improved operation fusion
+ - Batched postprocessing and vision encoder to increase GPU utilization
+- Mixed results on SA-Co/VEval video benchmarks, with notable improvement on YT-Temporal-1B (+2.1 cgF1)
+- Improved VOS performance on 6 out of 7 benchmarks, including +2.0 on the challenging MOSEv2
+
+#### Inference Efficiency
+
+#### Video PCS with Text Prompt
+
+The first six metric columns are from the SA-Co/VEval benchmark (test split); the last four are public benchmarks.
+
+| Model | SA-V cgF1 | SA-V pHOTA | YT-Temporal-1B cgF1 | YT-Temporal-1B pHOTA | SmartGlasses cgF1 | SmartGlasses pHOTA | LVVIS test mAP | BURST test HOTA | YTVIS21 val mAP | OVIS val mAP |
+|---|---|---|---|---|---|---|---|---|---|---|
+| SAM 3 | 30.3 | 58.0 | 50.8 | 69.9 | 36.4 | 63.6 | 36.3 | 44.5 | 57.4 | 60.5 |
+| SAM 3.1 | 30.5 | 58.7 | 52.9 | 70.7 | 36.3 | 64.4 | 34.3 | 43.3 | 56.6 | 61.5 |
+
+#### Video Object Segmentation (VOS)
+
+| Model | MOSEv1 val (J&F) | DAVIS17 val (J&F) | LVOSv2 val (J&F) | SA-V val (J&F) | SA-V test (J&F) | YTVOS19 val (G) | MOSEv2 val (J&Ḟ) |
+|---|---|---|---|---|---|---|---|
+| SAM 3 | 78.4 | 92.2 | 88.5 | 83.5 | 84.4 | 89.7 | 60.3 |
+| SAM 3.1 | 79.6 | 92.7 | 89.2 | 83.8 | 85.1 | 89.3 | 62.3 |
+
+### New Checkpoints
+
+The SAM 3.1 checkpoints are available on the [Hugging Face repo](https://huggingface.co/facebook/sam3.1). See [Getting Started](README.md#getting-started) for download and authentication instructions.
+
+### Notebooks
+
+- [`sam3.1_video_predictor_example.ipynb`](examples/sam3.1_video_predictor_example.ipynb): Demonstrates how to use SAM 3.1 with Object Multiplex for video segmentation and dense tracking with text and point prompts.
+
+### Contributors
+
+[Arpit Kalla](https://github.com/arpitkalla), [Chaitanya Ryali](https://scholar.google.com/citations?user=4LWx24UAAAAJ&hl=en), [Christian Puhrsch](https://github.com/cpuhrsch), [Ho Kei Cheng](https://hkchengrex.com/), [Joseph Greer](https://scholar.google.com/citations?user=guL96CkAAAAJ&hl=en), [Meng Wang](https://github.com/mengwa41), [Miran Heo](https://sites.google.com/view/miranheo), [Pengchuan Zhang](https://pzzhang.github.io/pzzhang/), [Roman Rädle](https://scholar.google.com/citations?user=Tpt57v0AAAAJ&hl=en), [Yuan-Ting Hu](https://scholar.google.com/citations?user=E8DVVYQAAAAJ&hl=en)
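The Object Multiplex description in `RELEASE_SAM3p1.md` (objects grouped into fixed-capacity buckets that are processed jointly) can be illustrated with a small, framework-free sketch. The helper names `bucket_objects` and `num_forward_passes`, the first-come assignment order, and padding the last bucket with `None` slots to keep shapes static are all illustrative assumptions, not the SAM 3.1 implementation; Appendix H of the SAM 3 paper describes the actual design.

```python
import math
from typing import List, Optional, Sequence


def bucket_objects(
    obj_ids: Sequence[int], capacity: int, pad: bool = True
) -> List[List[Optional[int]]]:
    """Group object IDs into fixed-capacity buckets (hypothetical helper).

    Each bucket stands for one joint forward pass instead of one pass per
    object. With pad=True the last bucket is filled with None so that every
    bucket has the same size, which keeps tensor shapes static.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    buckets: List[List[Optional[int]]] = []
    for start in range(0, len(obj_ids), capacity):
        bucket: List[Optional[int]] = list(obj_ids[start : start + capacity])
        if pad:
            bucket.extend([None] * (capacity - len(bucket)))
        buckets.append(bucket)
    return buckets


def num_forward_passes(num_objects: int, capacity: int) -> int:
    """Joint passes needed under bucketing: ceil(N / capacity) instead of N."""
    return math.ceil(num_objects / capacity)
```

For example, `bucket_objects([1, 2, 3, 4, 5], capacity=2)` yields three buckets, the last one padded with a `None` slot. The pass count is not a speedup estimate: a joint pass over a full bucket costs more than a single-object pass, so the measured gains reported above depend on the model and hardware.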
diff --git a/assets/images/cat_dog.jpg b/assets/images/cat_dog.jpg
new file mode 100644
index 0000000..0368769
Binary files /dev/null and b/assets/images/cat_dog.jpg differ
diff --git a/assets/sam3.1_diagram.png b/assets/sam3.1_diagram.png
new file mode 100644
index 0000000..c08585d
Binary files /dev/null and b/assets/sam3.1_diagram.png differ
diff --git a/assets/sam3.1_efficiency.png b/assets/sam3.1_efficiency.png
new file mode 100644
index 0000000..576020e
Binary files /dev/null and b/assets/sam3.1_efficiency.png differ
diff --git a/examples/sam3.1_video_predictor_example.ipynb b/examples/sam3.1_video_predictor_example.ipynb
new file mode 100644
index 0000000..3867357
--- /dev/null
+++ b/examples/sam3.1_video_predictor_example.ipynb
@@ -0,0 +1,643 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Copyright (c) Meta Platforms, Inc. and affiliates."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Video segmentation and tracking with SAM 3.1 (Object Multiplex)\n",
+ "\n",
+ "This notebook demonstrates how to use SAM 3.1 with Object Multiplex for interactive video segmentation and dense tracking. Object Multiplex groups objects into fixed-capacity buckets and processes them jointly, drastically reducing redundant computation compared to SAM 3's per-object inference. It covers the following capabilities:\n",
+ "\n",
+ "- **Text prompts**: Using natural language descriptions to segment objects (e.g., \"person\", \"shoe\")\n",
+ "- **Point prompts**: Adding positive/negative clicks to segment and refine objects\n",
+ "\n",
+ "We use the terms _segment_ or _mask_ to refer to the model prediction for an object on a single frame, and _masklet_ to refer to the spatio-temporal masks across the entire video. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "using_colab = False"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "if using_colab:\n",
+ " import torch\n",
+ " import torchvision\n",
+ " print(\"PyTorch version:\", torch.__version__)\n",
+ " print(\"Torchvision version:\", torchvision.__version__)\n",
+ " print(\"CUDA is available:\", torch.cuda.is_available())\n",
+ " import sys\n",
+ " !{sys.executable} -m pip install opencv-python matplotlib scikit-learn\n",
+ " !{sys.executable} -m pip install 'git+https://github.com/facebookresearch/sam3.git'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!nvidia-smi"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Set-up"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import sam3\n",
+ "import torch\n",
+ "\n",
+ "sam3_root = os.path.join(os.path.dirname(sam3.__file__), \"..\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Inference and visualization utils"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sam3.model_builder import build_sam3_multiplex_video_predictor\n",
+ "\n",
+ "predictor = build_sam3_multiplex_video_predictor()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "jupyter": {
+ "source_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import glob\n",
+ "import os\n",
+ "\n",
+ "import cv2\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "from PIL import Image\n",
+ "from sam3.visualization_utils import (\n",
+ " load_frame,\n",
+ " prepare_masks_for_visualization,\n",
+ " visualize_formatted_frame_output,\n",
+ ")\n",
+ "\n",
+ "plt.rcParams[\"axes.titlesize\"] = 12\n",
+ "plt.rcParams[\"figure.titlesize\"] = 12\n",
+ "\n",
+ "\n",
+ "def propagate_in_video(predictor, session_id):\n",
+ " outputs_per_frame = {}\n",
+ " for response in predictor.handle_stream_request(\n",
+ " request=dict(\n",
+ " type=\"propagate_in_video\",\n",
+ " session_id=session_id,\n",
+ " )\n",
+ " ):\n",
+ " outputs_per_frame[response[\"frame_index\"]] = response[\"outputs\"]\n",
+ "\n",
+ " return outputs_per_frame\n",
+ "\n",
+ "\n",
+ "def abs_to_rel_coords(coords, IMG_WIDTH, IMG_HEIGHT, coord_type=\"point\"):\n",
+ " \"\"\"Convert absolute pixel coordinates to relative coordinates (0-1 range).\n",
+ "\n",
+ " Args:\n",
+ " coords: List of coordinates\n",
+ " IMG_WIDTH: Image width in pixels\n",
+ " IMG_HEIGHT: Image height in pixels\n",
+ " coord_type: 'point' for [x, y] or 'box' for [x, y, w, h]\n",
+ " \"\"\"\n",
+ " if coord_type == \"point\":\n",
+ " return [[x / IMG_WIDTH, y / IMG_HEIGHT] for x, y in coords]\n",
+ " elif coord_type == \"box\":\n",
+ " return [\n",
+ " [x / IMG_WIDTH, y / IMG_HEIGHT, w / IMG_WIDTH, h / IMG_HEIGHT]\n",
+ " for x, y, w, h in coords\n",
+ " ]\n",
+ " else:\n",
+ " raise ValueError(f\"Unknown coord_type: {coord_type}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Loading an example video\n",
+ "\n",
+ "We assume that the video is stored either as **a folder of JPEG frames with filenames like `<frame_index>.jpg`** or as **an MP4 video**.\n",
+ "\n",
+ "To extract JPEG frames from a video, you can use ffmpeg (https://ffmpeg.org/) as follows:\n",
+ "```\n",
+ "ffmpeg -i <your_video>.mp4 -q:v 2 -start_number 0 <output_dir>/'%05d.jpg'\n",
+ "```\n",
+ "where `-q:v 2` produces high-quality JPEG frames and `-start_number 0` makes ffmpeg number the output files starting from `00000.jpg`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# \"video_path\" needs to be either a JPEG folder or an MP4 video file\n",
+ "video_path = f\"{sam3_root}/assets/videos/0001\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# load \"video_frames_for_vis\" for visualization purposes (they are not used by the model)\n",
+ "if isinstance(video_path, str) and video_path.endswith(\".mp4\"):\n",
+ " cap = cv2.VideoCapture(video_path)\n",
+ " video_frames_for_vis = []\n",
+ " while True:\n",
+ " ret, frame = cap.read()\n",
+ " if not ret:\n",
+ " break\n",
+ " video_frames_for_vis.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))\n",
+ " cap.release()\n",
+ "else:\n",
+ " video_frames_for_vis = glob.glob(os.path.join(video_path, \"*.jpg\"))\n",
+ " try:\n",
+ " video_frames_for_vis.sort(\n",
+ " key=lambda p: int(os.path.splitext(os.path.basename(p))[0])\n",
+ " )\n",
+ " except ValueError:\n",
+ " print(\n",
+ " f'frame names are not in \"<frame_index>.jpg\" format: {video_frames_for_vis[:5]=}, '\n",
+ " f\"falling back to lexicographic sort.\"\n",
+ " )\n",
+ " video_frames_for_vis.sort()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Opening an inference session on this video\n",
+ "\n",
+ "SAM 3.1 requires stateful inference for interactive video segmentation, so we need to initialize an **inference session** on this video.\n",
+ "\n",
+ "During initialization, it loads all the video frames and stores their pixels in the session state."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response = predictor.handle_request(\n",
+ " request=dict(\n",
+ " type=\"start_session\",\n",
+ " resource_path=video_path,\n",
+ " )\n",
+ ")\n",
+ "session_id = response[\"session_id\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Video promptable concept segmentation with text\n",
+ "\n",
+ "With SAM 3.1, you can describe objects in natural language, and the model will automatically detect and track all instances of that object throughout the video.\n",
+ "\n",
+ "In the example below, we add a text prompt on frame 0 and propagate it throughout the video. We use the text prompt \"person\" to detect all people in the video; SAM 3.1 automatically identifies the individual person instances and assigns each a unique object ID.\n",
+ "\n",
+ "Note that the first call may be slower because buffers are being set up. **When measuring speed, rerun the cells below after the first pass.**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# note: if you already ran one text prompt and now want to switch to another one,\n",
+ "# you must reset the session first (otherwise the results will be wrong)\n",
+ "_ = predictor.handle_request(\n",
+ " request=dict(\n",
+ " type=\"reset_session\",\n",
+ " session_id=session_id,\n",
+ " )\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt_text_str = \"person\"\n",
+ "frame_idx = 0 # add a text prompt on frame 0\n",
+ "response = predictor.handle_request(\n",
+ " request=dict(\n",
+ " type=\"add_prompt\",\n",
+ " session_id=session_id,\n",
+ " frame_index=frame_idx,\n",
+ " text=prompt_text_str,\n",
+ " )\n",
+ ")\n",
+ "out = response[\"outputs\"]\n",
+ "\n",
+ "plt.close(\"all\")\n",
+ "visualize_formatted_frame_output(\n",
+ " frame_idx,\n",
+ " video_frames_for_vis,\n",
+ " outputs_list=[prepare_masks_for_visualization({frame_idx: out})],\n",
+ " titles=[\"SAM 3.1 Dense Tracking outputs\"],\n",
+ " figsize=(6, 4),\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# now we propagate the outputs from frame 0 to the end of the video and collect all outputs\n",
+ "outputs_per_frame = propagate_in_video(predictor, session_id)\n",
+ "\n",
+ "# finally, we reformat the outputs for visualization and plot the outputs every 60 frames\n",
+ "outputs_per_frame = prepare_masks_for_visualization(outputs_per_frame)\n",
+ "\n",
+ "vis_frame_stride = 60\n",
+ "plt.close(\"all\")\n",
+ "for frame_idx in range(0, len(outputs_per_frame), vis_frame_stride):\n",
+ " visualize_formatted_frame_output(\n",
+ " frame_idx,\n",
+ " video_frames_for_vis,\n",
+ " outputs_list=[outputs_per_frame],\n",
+ " titles=[\"SAM 3.1 Dense Tracking outputs\"],\n",
+ " figsize=(6, 4),\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Removing objects\n",
+ "\n",
+ "We can remove individual objects by their object ID.\n",
+ "\n",
+ "As an example, let's remove object 2 (the dancer in the front)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# we pick id 2, which is the dancer in the front\n",
+ "obj_id = 2\n",
+ "response = predictor.handle_request(\n",
+ " request=dict(\n",
+ " type=\"remove_object\",\n",
+ " session_id=session_id,\n",
+ " obj_id=obj_id,\n",
+ " )\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# now we propagate the outputs from frame 0 to the end of the video and collect all outputs\n",
+ "outputs_per_frame = propagate_in_video(predictor, session_id)\n",
+ "\n",
+ "# finally, we reformat the outputs for visualization and plot the outputs every 60 frames\n",
+ "outputs_per_frame = prepare_masks_for_visualization(outputs_per_frame)\n",
+ "\n",
+ "vis_frame_stride = 60\n",
+ "plt.close(\"all\")\n",
+ "for frame_idx in range(0, len(outputs_per_frame), vis_frame_stride):\n",
+ " visualize_formatted_frame_output(\n",
+ " frame_idx,\n",
+ " video_frames_for_vis,\n",
+ " outputs_list=[outputs_per_frame],\n",
+ " titles=[\"SAM 3.1 Dense Tracking outputs\"],\n",
+ " figsize=(6, 4),\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Adding new objects with point prompts\n",
+ "\n",
+ "We can add new objects through point prompts.\n",
+ "\n",
+ "Suppose we've changed our mind and now want to add back the dancer in the front (whom we removed in the previous step). We can use interactive clicks to add her back."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sample_img = Image.fromarray(load_frame(video_frames_for_vis[0]))\n",
+ "\n",
+ "IMG_WIDTH, IMG_HEIGHT = sample_img.size"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# let's add back the dancer via point prompts.\n",
+ "# we will use a single positive click to add the dancer back.\n",
+ "\n",
+ "frame_idx = 0\n",
+ "obj_id = 2\n",
+ "points_abs = np.array(\n",
+ " [\n",
+ " [760, 550], # positive click\n",
+ " ]\n",
+ ")\n",
+ "# positive clicks have label 1, while negative clicks have label 0\n",
+ "labels = np.array([1])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# convert points and labels to tensors; also convert to relative coordinates\n",
+ "points_tensor = torch.tensor(\n",
+ " abs_to_rel_coords(points_abs, IMG_WIDTH, IMG_HEIGHT, coord_type=\"point\"),\n",
+ " dtype=torch.float32,\n",
+ ")\n",
+ "points_labels_tensor = torch.tensor(labels, dtype=torch.int32)\n",
+ "\n",
+ "response = predictor.handle_request(\n",
+ " request=dict(\n",
+ " type=\"add_prompt\",\n",
+ " session_id=session_id,\n",
+ " frame_index=frame_idx,\n",
+ " points=points_tensor,\n",
+ " point_labels=points_labels_tensor,\n",
+ " obj_id=obj_id,\n",
+ " )\n",
+ ")\n",
+ "out = response[\"outputs\"]\n",
+ "\n",
+ "plt.close(\"all\")\n",
+ "visualize_formatted_frame_output(\n",
+ " frame_idx,\n",
+ " video_frames_for_vis,\n",
+ " outputs_list=[prepare_masks_for_visualization({frame_idx: out})],\n",
+ " titles=[\"SAM 3.1 Dense Tracking outputs\"],\n",
+ " figsize=(6, 4),\n",
+ " points_list=[points_abs],\n",
+ " points_labels_list=[labels],\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# now we propagate the outputs from frame 0 to the end of the video and collect all outputs\n",
+ "outputs_per_frame = propagate_in_video(predictor, session_id)\n",
+ "\n",
+ "# finally, we reformat the outputs for visualization and plot the outputs every 60 frames\n",
+ "outputs_per_frame = prepare_masks_for_visualization(outputs_per_frame)\n",
+ "\n",
+ "vis_frame_stride = 60\n",
+ "plt.close(\"all\")\n",
+ "for frame_idx in range(0, len(outputs_per_frame), vis_frame_stride):\n",
+ " visualize_formatted_frame_output(\n",
+ " frame_idx,\n",
+ " video_frames_for_vis,\n",
+ " outputs_list=[outputs_per_frame],\n",
+ " titles=[\"SAM 3.1 Dense Tracking outputs\"],\n",
+ " figsize=(6, 4),\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Refining an existing object with point prompts\n",
+ "\n",
+ "We can also refine the segmentation mask of an existing object through point prompts.\n",
+ "\n",
+ "Suppose we change our mind again: for object ID 2 (the dancer in the front whom we just added back), we now want to segment only her T-shirt instead of her whole body. We can adjust the segmentation mask with a few more positive and negative clicks."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
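+ "# The `else` branch refines object 2 (the dancer in the front) to only her T-shirt,\n",
+ "# using 2 positive clicks and 2 negative clicks, as described above.\n",
+ "# With `refine_object_3 = True` (the default), the cell instead refines object 3\n",
+ "# with 1 positive click and 1 negative click; set it to False to follow the walkthrough.\n",
+ "\n",
+ "refine_object_3 = True\n",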
+ "\n",
+ "if refine_object_3:\n",
+ " frame_idx = 0\n",
+ " obj_id = 3\n",
+ " points_abs = np.array(\n",
+ " [\n",
+ " [800, 135], # positive click\n",
+ " [800, 180], # negative click\n",
+ " ]\n",
+ " )\n",
+ " # positive clicks have label 1, while negative clicks have label 0\n",
+ " labels = np.array([1, 0])\n",
+ " \n",
+ "else:\n",
+ " frame_idx = 0\n",
+ " obj_id = 2\n",
+ " points_abs = np.array(\n",
+ " [\n",
+ " [740, 450], # positive click\n",
+ " [760, 630], # negative click\n",
+ " [840, 640], # negative click\n",
+ " [760, 550], # positive click\n",
+ " ]\n",
+ " )\n",
+ " # positive clicks have label 1, while negative clicks have label 0\n",
+ " labels = np.array([1, 0, 0, 1])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# convert points and labels to tensors; also convert to relative coordinates\n",
+ "points_tensor = torch.tensor(\n",
+ " abs_to_rel_coords(points_abs, IMG_WIDTH, IMG_HEIGHT, coord_type=\"point\"),\n",
+ " dtype=torch.float32,\n",
+ ")\n",
+ "points_labels_tensor = torch.tensor(labels, dtype=torch.int32)\n",
+ "\n",
+ "response = predictor.handle_request(\n",
+ " request=dict(\n",
+ " type=\"add_prompt\",\n",
+ " session_id=session_id,\n",
+ " frame_index=frame_idx,\n",
+ " points=points_tensor,\n",
+ " point_labels=points_labels_tensor,\n",
+ " obj_id=obj_id,\n",
+ " )\n",
+ ")\n",
+ "out = response[\"outputs\"]\n",
+ "\n",
+ "plt.close(\"all\")\n",
+ "visualize_formatted_frame_output(\n",
+ " frame_idx,\n",
+ " video_frames_for_vis,\n",
+ " outputs_list=[prepare_masks_for_visualization({frame_idx: out})],\n",
+ " titles=[\"SAM 3.1 Dense Tracking outputs\"],\n",
+ " figsize=(6, 4),\n",
+ " points_list=[points_abs],\n",
+ " points_labels_list=[labels],\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# now we propagate the outputs from frame 0 to the end of the video and collect all outputs\n",
+ "outputs_per_frame = propagate_in_video(predictor, session_id)\n",
+ "\n",
+ "# finally, we reformat the outputs for visualization and plot the outputs every 60 frames\n",
+ "outputs_per_frame = prepare_masks_for_visualization(outputs_per_frame)\n",
+ "\n",
+ "vis_frame_stride = 60\n",
+ "plt.close(\"all\")\n",
+ "for frame_idx in range(0, len(outputs_per_frame), vis_frame_stride):\n",
+ " visualize_formatted_frame_output(\n",
+ " frame_idx,\n",
+ " video_frames_for_vis,\n",
+ " outputs_list=[outputs_per_frame],\n",
+ " titles=[\"SAM 3.1 Dense Tracking outputs\"],\n",
+ " figsize=(6, 4),\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Close session\n",
+ "\n",
+ "Each session is tied to a single video. We can close the session after inference to free up its resources.\n",
+ "\n",
+ "(Then, you may start a new session on another video.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "_ = predictor.handle_request(\n",
+ " request=dict(\n",
+ " type=\"close_session\",\n",
+ " session_id=session_id,\n",
+ " )\n",
+ ")"
+ ]
+ }
+ ],
+ "metadata": {
+ "fileHeader": "",
+ "fileUid": "0c5b0843-1bcb-4dac-8c85-3f149debb325",
+ "isAdHoc": false,
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
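The notebook passes point prompts to `add_prompt` in relative (0-1) coordinates, converting pixel clicks with `abs_to_rel_coords`. Below is a stricter variant, sketched under assumptions: the name `to_relative`, the per-coordinate length check, and rejecting points that fall outside the image are choices made here for illustration, not documented requirements of the SAM 3.1 predictor. The box layout `[x, y, w, h]` follows the notebook's own docstring.

```python
from typing import List, Sequence


def to_relative(
    coords: Sequence[Sequence[float]],
    img_width: int,
    img_height: int,
    coord_type: str = "point",
) -> List[List[float]]:
    """Convert pixel coordinates to the 0-1 range (hypothetical helper).

    coord_type "point" expects [x, y]; "box" expects [x, y, w, h].
    """
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image size must be positive")
    divisors_by_type = {
        "point": (img_width, img_height),
        "box": (img_width, img_height, img_width, img_height),
    }
    if coord_type not in divisors_by_type:
        raise ValueError(f"Unknown coord_type: {coord_type}")
    divisors = divisors_by_type[coord_type]
    out: List[List[float]] = []
    for c in coords:
        if len(c) != len(divisors):
            raise ValueError(
                f"expected {len(divisors)} values per {coord_type}, got {len(c)}"
            )
        rel = [float(v) / d for v, d in zip(c, divisors)]
        # Illustrative choice: refuse clicks that land outside the frame.
        if coord_type == "point" and not all(0.0 <= v <= 1.0 for v in rel):
            raise ValueError(f"point {list(c)} lies outside the image")
        out.append(rel)
    return out
```

Because it only uses `len()` and `float()` on each row, the helper also accepts a NumPy array such as the notebook's `points_abs`; the result can then be wrapped with `torch.tensor(..., dtype=torch.float32)` exactly as in the notebook.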
diff --git a/pyproject.toml b/pyproject.toml
index 9df1b67..cdfafa0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -7,7 +7,7 @@ name = "sam3"
dynamic = ["version"]
description = "SAM3 (Segment Anything Model 3) implementation"
readme = "README.md"
-requires-python = ">=3.8"
+requires-python = ">=3.10,<3.13"
license = {file = "LICENSE"}
authors = [
{name = "Meta AI Research"}
@@ -33,6 +33,10 @@ dependencies = [
"iopath>=0.1.10",
"typing_extensions",
"huggingface_hub",
+ "einops",
+ "psutil",
+ "torch>=2.11,<2.12",
+ "torchvision>=0.26,<0.27",
]
[project.optional-dependencies]
@@ -92,6 +96,23 @@ sam3 = ["assets/*.txt.gz"]
[tool.setuptools.dynamic]
version = {attr = "sam3.__version__"}
+[dependency-groups]
+dev = [
+ "pytest",
+ "pytest-cov",
+ "black==24.2.0",
+ "ufmt==2.8.0",
+ "ruff-api==0.1.0",
+ "usort==1.0.2",
+ "gitpython==3.1.31",
+ "yt-dlp",
+ "pandas",
+ "opencv-python",
+ "pycocotools",
+ "numba",
+ "python-rapidjson",
+]
+
[tool.black]
line-length = 88
target-version = ['py38', 'py39', 'py310', 'py311', 'py312']
@@ -133,3 +154,4 @@ testpaths = ["tests"]
python_files = "test_*.py"
python_classes = "Test*"
python_functions = "test_*"
+markers = ["slow: long-running export and artifact tests"]
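The new `requires-python` and `torch`/`torchvision` constraints are plain `>=lower,<upper` ranges. The stdlib-only sketch below (hypothetical helpers `_release` and `satisfies`) checks a version string against such a range, ignoring local labels like `+cu128`. It handles only numeric release segments, so pre-releases and the rest of the PEP 440 grammar are out of scope; `packaging.specifiers.SpecifierSet` is the standard tool for those.

```python
import operator
import re
from typing import Tuple

# Comparison operators for simple specifiers; two-character operators come
# first so that ">=2.11" is not mistaken for ">".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '2.11.0+cu128' -> (2, 11, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause of `spec`."""
    current = _release(version)
    for clause in spec.split(","):
        clause = clause.strip()
        op = next((o for o in _OPS if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        bound = _release(clause[len(op):])
        # Zero-pad both tuples so that 2.11 and 2.11.0 compare as equal.
        width = max(len(current), len(bound))
        lhs = current + (0,) * (width - len(current))
        rhs = bound + (0,) * (width - len(bound))
        if not _OPS[op](lhs, rhs):
            return False
    return True
```

For instance, `satisfies(platform.python_version(), ">=3.10,<3.13")` checks the running interpreter against `requires-python`, and a `2.10.x` PyTorch build fails `satisfies(torch.__version__, ">=2.11,<2.12")`.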
diff --git a/sam3/__init__.py b/sam3/__init__.py
index 1e75971..3600339 100644
--- a/sam3/__init__.py
+++ b/sam3/__init__.py
@@ -2,8 +2,8 @@
# pyre-unsafe
-from .model_builder import build_sam3_image_model
+from .model_builder import build_sam3_image_model, build_sam3_predictor
__version__ = "0.1.0"
-__all__ = ["build_sam3_image_model"]
+__all__ = ["build_sam3_image_model", "build_sam3_predictor"]
diff --git a/sam3/model/data_misc.py b/sam3/model/data_misc.py
index 298340d..bd1ed30 100644
--- a/sam3/model/data_misc.py
+++ b/sam3/model/data_misc.py
@@ -16,6 +16,53 @@
MyTensor = Union[torch.Tensor, List[Any]]
+class NestedTensor:
+ def __init__(self, tensors, mask):
+ self.tensors = tensors
+ self.mask = mask
+
+ def to(self, *args, **kwargs):
+ cast_tensor = self.tensors.to(*args, **kwargs)
+ cast_mask = self.mask.to(*args, **kwargs) if self.mask is not None else None
+ return type(self)(cast_tensor, cast_mask)
+
+ def clone(self):
+ new_tensors = self.tensors.clone()
+ new_mask = None if self.mask is None else self.mask.clone()
+ return NestedTensor(new_tensors, new_mask)
+
+ def __getitem__(self, idx):
+ return self.tensors[idx]
+
+ def __len__(self):
+ return len(self.tensors)
+
+ @property
+ def device(self):
+ return self.tensors.device
+
+ @property
+ def shape(self):
+ return self.tensors.shape
+
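+ # custom memory pinning method on custom type; DataLoader's pin_memory step
+ # uses the return value, so the pinned object must be returned
+ def pin_memory(self, device=None):
+ self.tensors = self.tensors.pin_memory(device)
+ if self.mask is not None:
+ self.mask = self.mask.pin_memory(device)
+ return self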
+
+
+# Register NestedTensor as a pytree node so tree_map_only can traverse into it
+# (matches onevision/utils/misc.py registration)
+from torch.utils import _pytree as pytree
+
+pytree.register_pytree_node(
+ NestedTensor,
+ lambda x: ([x.tensors, x.mask], None),
+ lambda values, _: NestedTensor(values[0], values[1]),
+)
+
+
def interpolate(
input, size=None, scale_factor=None, mode="nearest", align_corners=None
):
@@ -81,6 +128,15 @@ class FindStage:
# This is beneficial for tracking in videos without the need for pointers.
object_ids: Optional[List[List]] = None # List of objects per query
+ # Multiplex-specific fields (used by sam3_demo_multiplex)
+ img_ids_np: Optional[Any] = None
+ input_boxes_before_embed: Optional[MyTensor] = None
+ input_boxes_before_embed__type = torch.float
+ input_points_before_embed: Optional[MyTensor] = None
+ input_points_before_embed__type = torch.float
+ ptrs: Optional[Any] = None
+ ptrs_seg: Optional[Any] = None
+
@dataclass
class BatchedFindTarget:
@@ -165,6 +221,7 @@ class BatchedDatapoint:
find_targets: List[BatchedFindTarget]
find_metadatas: List[BatchedInferenceMetadata]
raw_images: Optional[List[Any]] = None
+ get_queries: Optional[Any] = None
def convert_my_tensors(obj):
@@ -188,6 +245,7 @@ def is_optional_field(field) -> bool:
):
stack_dim = 0
if field.name in [
+ "input_boxes_before_embed",
"input_boxes",
"input_boxes_label",
]:
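Registering `NestedTensor` with `torch.utils._pytree.register_pytree_node` supplies a flatten function (object to a list of children plus a context) and an unflatten function (children plus context back to an object), which is what lets `tree_map_only` rebuild a `NestedTensor` after transforming its `tensors` and `mask`. The framework-free toy below mimics only that contract: the registry, `register_node`, and this `tree_map_only` are hypothetical stand-ins, not PyTorch's implementation, and `Pair` plays the role of `NestedTensor`.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: type -> (flatten, unflatten), mirroring the two
# callables passed to register_pytree_node.
_REGISTRY: Dict[type, Tuple[Callable[..., Any], Callable[..., Any]]] = {}


def register_node(cls: type, flatten: Callable, unflatten: Callable) -> None:
    _REGISTRY[cls] = (flatten, unflatten)


def tree_map_only(leaf_type: type, fn: Callable[[Any], Any], tree: Any) -> Any:
    """Apply fn to leaves of leaf_type, rebuilding containers around them."""
    if type(tree) in _REGISTRY:
        flatten, unflatten = _REGISTRY[type(tree)]
        children, context = flatten(tree)
        return unflatten([tree_map_only(leaf_type, fn, c) for c in children], context)
    if isinstance(tree, (list, tuple)):
        return type(tree)(tree_map_only(leaf_type, fn, c) for c in tree)
    if isinstance(tree, dict):
        return {k: tree_map_only(leaf_type, fn, v) for k, v in tree.items()}
    # Unregistered objects are opaque leaves: transformed only if they match.
    return fn(tree) if isinstance(tree, leaf_type) else tree


class Pair:
    """Stand-in for NestedTensor: a payload plus an optional mask."""

    def __init__(self, tensors: Any, mask: Any) -> None:
        self.tensors = tensors
        self.mask = mask


# Same flatten/unflatten shape as the NestedTensor registration in data_misc.py.
register_node(
    Pair,
    lambda x: ([x.tensors, x.mask], None),
    lambda values, _: Pair(values[0], values[1]),
)

batch = {"images": Pair([1.0, 2.0], None), "ids": [3, 4]}
doubled = tree_map_only(float, lambda v: v * 2, batch)
```

Without the `register_node` call, `Pair` would be treated as an opaque leaf and its floats would be left untouched, which is the failure mode the registration in `data_misc.py` avoids for `NestedTensor`.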
diff --git a/sam3/model/decoder.py b/sam3/model/decoder.py
index 7a204be..75b10d0 100644
--- a/sam3/model/decoder.py
+++ b/sam3/model/decoder.py
@@ -6,12 +6,17 @@
Inspired from Pytorch's version, adds the pre-norm variant
"""
-from typing import Any, Dict, List, Optional
+import math
+from functools import partial
+from typing import Any, Dict, List, Optional, Union
import numpy as np
import torch
+import torch.nn.functional as torchF
+from sam3.sam.rope import apply_rotary_enc, apply_rotary_enc_real, compute_axial_cis
from sam3.sam.transformer import RoPEAttention
from torch import nn, Tensor
+from torch.nn.attention import sdpa_kernel, SDPBackend
from torchvision.ops.roi_align import RoIAlign
from .act_ckpt_utils import activation_ckpt_wrapper
@@ -151,24 +156,40 @@ def forward(
tgt = tgt + self.catext_dropout(tgt2)
tgt = self.catext_norm(tgt)
- if presence_token is not None:
- presence_token_mask = torch.zeros_like(cross_attn_mask[:, :1, :])
- cross_attn_mask = torch.cat(
- [presence_token_mask, cross_attn_mask], dim=1
- ) # (bs*nheads, 1+nq, hw)
+ if presence_token is not None and cross_attn_mask is not None:
+ if cross_attn_mask.dim() == 4:
+ presence_token_mask = torch.zeros_like(cross_attn_mask[:, :, :1, :])
+ cross_attn_mask = torch.cat(
+ [presence_token_mask, cross_attn_mask], dim=2
+ ) # (bs, nheads, 1+nq, hw)
+ else:
+ presence_token_mask = torch.zeros_like(cross_attn_mask[:, :1, :])
+ cross_attn_mask = torch.cat(
+ [presence_token_mask, cross_attn_mask], dim=1
+ ) # (bs*nheads, 1+nq, hw)
# Cross attention to image
- tgt2 = self.cross_attn(
- query=self.with_pos_embed(tgt, tgt_query_pos),
- key=self.with_pos_embed(memory, memory_pos),
- value=memory,
- attn_mask=cross_attn_mask,
- key_padding_mask=(
- memory_key_padding_mask.transpose(0, 1)
- if memory_key_padding_mask is not None
- else None
- ),
- )[0]
+ key_padding_mask = (
+ memory_key_padding_mask.transpose(0, 1)
+ if memory_key_padding_mask is not None
+ else None
+ )
+ if cross_attn_mask is not None and cross_attn_mask.dim() == 4:
+ tgt2 = self._cross_attn_with_rpb(
+ query=self.with_pos_embed(tgt, tgt_query_pos),
+ key=self.with_pos_embed(memory, memory_pos),
+ value=memory,
+ attn_bias=cross_attn_mask,
+ key_padding_mask=key_padding_mask,
+ )
+ else:
+ tgt2 = self.cross_attn(
+ query=self.with_pos_embed(tgt, tgt_query_pos),
+ key=self.with_pos_embed(memory, memory_pos),
+ value=memory,
+ attn_mask=cross_attn_mask,
+ key_padding_mask=key_padding_mask,
+ )[0]
tgt = tgt + self.dropout1(tgt2)
tgt = self.norm1(tgt)
@@ -183,6 +204,44 @@ def forward(
return tgt, presence_token_out
+ def _cross_attn_with_rpb(
+ self,
+ query: Tensor,
+ key: Tensor,
+ value: Tensor,
+ attn_bias: Tensor,
+ key_padding_mask: Optional[Tensor],
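+ ) -> Tensor:
+ """Cross-attention with a per-head additive bias, via scaled_dot_product_attention.
+
+ Reuses the projection weights of self.cross_attn (an nn.MultiheadAttention with
+ sequence-first inputs of shape (seq_len, batch, dim)). attn_bias has shape
+ (bs, n_heads, nq, hw) or (bs * n_heads, nq, hw); padded keys from
+ key_padding_mask are added to it as -inf.
+ """
+ mha = self.cross_attn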
+ assert isinstance(mha, nn.MultiheadAttention)
+ q, k, v = torchF._in_projection_packed(
+ query, key, value, mha.in_proj_weight, mha.in_proj_bias
+ )
+ tgt_len, bsz, _ = q.shape
+ num_heads = mha.num_heads
+ head_dim = mha.head_dim
+ q = q.contiguous().view(tgt_len, bsz, num_heads, head_dim).permute(1, 2, 0, 3)
+ k = k.contiguous().view(-1, bsz, num_heads, head_dim).permute(1, 2, 0, 3)
+ v = v.contiguous().view(-1, bsz, num_heads, head_dim).permute(1, 2, 0, 3)
+ src_len = k.shape[2]
+ bias = attn_bias
+ if bias.dim() == 3:
+ bias = bias.view(bsz, num_heads, tgt_len, src_len)
+ if key_padding_mask is not None:
+ pad = key_padding_mask[:, None, None, :].to(dtype=q.dtype)
+ pad = pad.masked_fill(pad > 0, float("-inf"))
+ bias = pad if bias is None else bias + pad
+ attn_output = torchF.scaled_dot_product_attention(
+ q,
+ k,
+ v,
+ attn_mask=bias,
+ dropout_p=mha.dropout if self.training else 0.0,
+ is_causal=False,
+ )
+ attn_output = attn_output.permute(2, 0, 1, 3).reshape(tgt_len, bsz, -1)
+ return torchF.linear(attn_output, mha.out_proj.weight, mha.out_proj.bias)
+
class TransformerDecoder(nn.Module):
def __init__(
@@ -333,10 +392,7 @@ def _get_rpb_matrix(self, reference_boxes, feat_size):
self.compilable_cord_cache = self._get_coords(H, W, reference_boxes.device)
self.compilable_stored_size = (H, W)
- if torch.compiler.is_dynamo_compiling() or self.compilable_stored_size == (
- H,
- W,
- ):
+ if torch.compiler.is_dynamo_compiling():
# good, hitting the cache, will be compilable
coords_h, coords_w = self.compilable_cord_cache
else:
@@ -348,8 +404,6 @@ def _get_rpb_matrix(self, reference_boxes, feat_size):
)
coords_h, coords_w = self.coord_cache[feat_size]
- assert coords_h.shape == (H,)
- assert coords_w.shape == (W,)
deltas_y = coords_h.view(1, -1, 1) - boxes_xyxy.reshape(-1, 1, 4)[:, :, 1:4:2]
deltas_y = deltas_y.view(bs, num_queries, -1, 2)
@@ -388,20 +442,13 @@ def _get_rpb_matrix(self, reference_boxes, feat_size):
act_ckpt_enable=self.training and self.use_act_checkpoint,
) # bs, num_queries, H, n_heads
- if not torch.compiler.is_dynamo_compiling():
- assert deltas_x.shape[:3] == (bs, num_queries, W)
- assert deltas_y.shape[:3] == (bs, num_queries, H)
B = deltas_y.unsqueeze(3) + deltas_x.unsqueeze(
2
) # bs, num_queries, H, W, n_heads
- if not torch.compiler.is_dynamo_compiling():
- assert B.shape[:4] == (bs, num_queries, H, W)
B = B.flatten(2, 3) # bs, num_queries, H*W, n_heads
B = B.permute(0, 3, 1, 2) # bs, n_heads, num_queries, H*W
B = B.contiguous() # memeff attn likes ordered strides
- if not torch.compiler.is_dynamo_compiling():
- assert B.shape[2:] == (num_queries, H * W)
return B
def forward(
@@ -510,6 +557,7 @@ def forward(
# conditional query
query_pos = self.ref_point_head(query_sine_embed) # nq, bs, d_model
+ memory_mask = None
if self.boxRPB != "none" and reference_boxes is not None:
assert spatial_shapes.shape[0] == 1, (
"only single scale support implemented"
@@ -518,7 +566,6 @@ def forward(
reference_boxes,
(spatial_shapes[0, 0], spatial_shapes[0, 1]),
)
- memory_mask = memory_mask.flatten(0, 1) # (bs*n_heads, nq, H*W)
if self.training:
assert self.use_act_checkpoint, (
"Activation checkpointing not enabled in the decoder"
@@ -951,3 +998,420 @@ def forward(self, *args: Any, **kwds: Any) -> torch.Tensor:
if self.pre_norm:
return self.forward_pre(*args, **kwds)
raise NotImplementedError
+
+
+def functional_attention(
+ q: Tensor,
+ k: Tensor,
+ v: Tensor,
+ *,
+ dropout: float,
+ num_heads: int,
+ num_k_exclude_rope: int = 0,
+ freqs_cis: Optional[Tensor] = None,
+ freqs_cis_real: Optional[Tensor] = None,
+ freqs_cis_imag: Optional[Tensor] = None,
+ use_fa3: bool = False,
+ use_rope_real: bool = False,
+ rope_k_repeat: bool,
+) -> Union[Tensor, tuple[Tensor, Tensor]]:
+ b, n, cq = q.shape
+ _, m, ck = k.shape
+ _, _, cv = v.shape
+ if b > 1:
+ assert k.shape[0] == v.shape[0] == b
+ else:
+ # broadcast-able
+ assert k.shape[0] == b == 1, f"{q.shape=} {k.shape=} {v.shape=}"
+ assert v.shape[1] == m
+
+ q = q.reshape(b, n, num_heads, cq // num_heads).transpose(1, 2)
+ k = k.reshape(b, m, num_heads, ck // num_heads).transpose(1, 2)
+ v = v.reshape(v.shape[0], m, num_heads, cv // num_heads).transpose(1, 2)
+
+ if freqs_cis is not None:
+ num_k_rope = k.size(-2) - num_k_exclude_rope
+ if use_rope_real:
+ q, k[:, :, :num_k_rope] = apply_rotary_enc_real(
+ q,
+ k[:, :, :num_k_rope],
+ freqs_cis_real=freqs_cis_real,
+ freqs_cis_imag=freqs_cis_imag,
+ repeat_freqs_k=rope_k_repeat,
+ )
+ else:
+ q, k[:, :, :num_k_rope] = apply_rotary_enc(
+ q,
+ k[:, :, :num_k_rope],
+ freqs_cis,
+ repeat_freqs_k=rope_k_repeat,
+ )
+
+ if use_fa3:
+ from sam3.perflib.fa3 import flash_attn_func
+
+ assert dropout == 0.0
+ out = flash_attn_func(q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2))
+ else:
+ with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
+ out = torchF.scaled_dot_product_attention(q, k, v, dropout_p=dropout)
+ out = out.transpose(1, 2) # B * n * n_heads * (cv // num_heads)
+
+ out = out.reshape(b, n, cv)
+ return out
+
+
+class SimpleRoPEAttention(nn.Module):
+ """
+ Attention with rotary position encoding.
+ This class is "simple" because it does not perform q/k/v/out projections.
+ """
+
+ def __init__(
+ self,
+ d_model: int,
+ num_heads: int,
+ dropout_p: float,
+ rope_theta=10000.0,
+ # whether to repeat q rope to match k length
+ # this is needed for cross-attention to memories
+ rope_k_repeat=False,
+ feat_sizes=(64, 64), # [w, h] for stride 16 feats at 1024 resolution
+ use_fa3: bool = False,
+ use_rope_real: bool = False,
+ ):
+ super().__init__()
+
+ self.num_heads = num_heads
+ self.dropout_p = dropout_p
+ self.compute_cis = partial(
+ compute_axial_cis, dim=d_model // num_heads, theta=rope_theta
+ )
+ device = torch.device("cuda") if torch.cuda.is_available() else None
+ self.freqs_cis = self.compute_cis(
+ end_x=feat_sizes[0], end_y=feat_sizes[1], device=device
+ )
+
+ self.use_fa3 = use_fa3
+ self.use_rope_real = use_rope_real
+ if self.use_rope_real:
+ self.freqs_cis_real = self.freqs_cis.real
+ self.freqs_cis_imag = self.freqs_cis.imag
+ self.rope_k_repeat = rope_k_repeat
+
+ def forward(
+ self,
+ q: Tensor,
+ k: Tensor,
+ v: Tensor,
+ num_k_exclude_rope: int = 0,
+ ) -> Union[Tensor, tuple[Tensor, Tensor]]:
+ # Apply rotary position encoding
+ w = h = math.sqrt(q.shape[-2])
+ self.freqs_cis = self.freqs_cis.to(q.device)
+ if self.freqs_cis.shape[0] != q.shape[-2]:
+ self.freqs_cis = self.compute_cis(end_x=w, end_y=h, device=q.device)
+ if self.use_rope_real:
+ self.freqs_cis_real = self.freqs_cis.real
+ self.freqs_cis_imag = self.freqs_cis.imag
+ if q.shape[-2] != k.shape[-2]:
+ assert self.rope_k_repeat
+
+ dropout_p = self.dropout_p if self.training else 0.0
+ out = functional_attention(
+ q,
+ k,
+ v,
+ dropout=dropout_p,
+ num_heads=self.num_heads,
+ num_k_exclude_rope=num_k_exclude_rope,
+ freqs_cis=self.freqs_cis,
+ freqs_cis_real=self.freqs_cis_real if self.use_rope_real else None,
+ freqs_cis_imag=self.freqs_cis_imag if self.use_rope_real else None,
+ use_fa3=self.use_fa3,
+ use_rope_real=self.use_rope_real,
+ rope_k_repeat=self.rope_k_repeat,
+ )
+
+ return out
+
+
+class DecoupledTransformerDecoderLayerv2(nn.Module):
+ def __init__(
+ self,
+ *,
+ activation: str,
+ d_model: int,
+ num_heads: int,
+ dim_feedforward: int,
+ dropout: float,
+ pos_enc_at_attn: bool,
+ pos_enc_at_cross_attn_keys: bool,
+ pos_enc_at_cross_attn_queries: bool,
+ pre_norm: bool,
+ cross_attention_first: bool = False,
+ self_attention_rope: SimpleRoPEAttention,
+ cross_attention_rope: SimpleRoPEAttention,
+ ):
+ super().__init__()
+ self.d_model = d_model
+ self.num_heads = num_heads
+ self.dim_feedforward = dim_feedforward
+ self.dropout_value = dropout
+
+ self.self_attn_q_proj = nn.Linear(d_model, d_model)
+ self.self_attn_k_proj = nn.Linear(d_model, d_model)
+ self.self_attn_v_proj = nn.Linear(d_model, d_model)
+ self.self_attn_out_proj = nn.Linear(d_model, d_model)
+
+ self.cross_attn_q_proj = nn.Linear(d_model, d_model)
+ self.cross_attn_k_proj = nn.Linear(d_model, d_model)
+ self.cross_attn_v_proj = nn.Linear(d_model, d_model)
+ self.cross_attn_out_proj = nn.Linear(d_model, d_model)
+
+ self.image_cross_attn_q_proj = nn.Linear(d_model, d_model)
+ self.image_cross_attn_k_proj = nn.Linear(d_model, d_model)
+
+ self.self_attention_rope = self_attention_rope
+ self.cross_attention_rope = cross_attention_rope
+
+ # Implementation of Feedforward model
+ self.linear1 = nn.Linear(d_model, dim_feedforward)
+ self.dropout = nn.Dropout(dropout)
+ self.linear2 = nn.Linear(dim_feedforward, d_model)
+
+ self.norm1 = nn.LayerNorm(d_model)
+ self.norm2 = nn.LayerNorm(d_model)
+ self.norm3 = nn.LayerNorm(d_model)
+ self.dropout1 = nn.Dropout(dropout)
+ self.dropout2 = nn.Dropout(dropout)
+ self.dropout3 = nn.Dropout(dropout)
+
+ self.activation_str = activation
+ self.activation = get_activation_fn(activation)
+ self.pre_norm = pre_norm
+
+ self.pos_enc_at_attn = pos_enc_at_attn
+ self.pos_enc_at_cross_attn_queries = pos_enc_at_cross_attn_queries
+ self.pos_enc_at_cross_attn_keys = pos_enc_at_cross_attn_keys
+
+ self.cross_attention_first = cross_attention_first
+
+ def _forward_sa(self, tgt, query_pos):
+ # Self-Attention
+ tgt2 = self.norm1(tgt)
+
+ q = k = tgt2 + query_pos if self.pos_enc_at_attn else tgt2
+
+ q = self.self_attn_q_proj(q)
+ k = self.self_attn_k_proj(k)
+ v = self.self_attn_v_proj(tgt2)
+ out = self.self_attention_rope(q, k, v)
+ tgt2 = self.self_attn_out_proj(out)
+
+ tgt = tgt + self.dropout1(tgt2)
+ return tgt
+
+ def _forward_ca(
+ self,
+ *,
+ image,
+ tgt,
+ memory_image,
+ memory,
+ query_pos,
+ memory_image_pos,
+ num_k_exclude_rope=0,
+ ):
+ kwds = {}
+ if num_k_exclude_rope > 0:
+ assert isinstance(self.cross_attention_rope, SimpleRoPEAttention)
+ kwds = {"num_k_exclude_rope": num_k_exclude_rope}
+
+ # Cross-Attention
+ tgt2 = self.norm2(tgt)
+
+ q = self.image_cross_attn_q_proj(image) + self.cross_attn_q_proj(tgt2)
+ if self.pos_enc_at_cross_attn_queries:
+ q = q + query_pos
+ k = self.image_cross_attn_k_proj(memory_image) + self.cross_attn_k_proj(memory)
+ if self.pos_enc_at_cross_attn_keys:
+ k = k + memory_image_pos
+ v = self.cross_attn_v_proj(memory)
+
+ out = self.cross_attention_rope(q, k, v, **kwds)
+ tgt2 = self.cross_attn_out_proj(out)
+
+ tgt = tgt + self.dropout2(tgt2)
+ return tgt
+
+ def forward_pre(
+ self,
+ *,
+ image,
+ tgt,
+ memory_image,
+ memory,
+ image_pos: Optional[Tensor] = None,
+ query_pos: Optional[Tensor] = None,
+ memory_image_pos: Optional[Tensor] = None,
+ memory_pos: Optional[Tensor] = None,
+ num_k_exclude_rope: int = 0,
+ ):
+ if self.cross_attention_first:
+ tgt = self._forward_ca(
+ image=image,
+ tgt=tgt,
+ memory_image=memory_image,
+ memory=memory,
+ query_pos=query_pos,
+ memory_image_pos=memory_image_pos,
+ num_k_exclude_rope=num_k_exclude_rope,
+ )
+ tgt = self._forward_sa(tgt, query_pos)
+ else:
+ tgt = self._forward_sa(tgt, query_pos)
+ tgt = self._forward_ca(
+ image=image,
+ tgt=tgt,
+ memory_image=memory_image,
+ memory=memory,
+ query_pos=query_pos,
+ memory_image_pos=memory_image_pos,
+ num_k_exclude_rope=num_k_exclude_rope,
+ )
+
+ # MLP
+ tgt2 = self.norm3(tgt)
+ tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))
+ tgt = tgt + self.dropout3(tgt2)
+
+ return image, tgt
+
+ def forward(self, *args: Any, **kwds: Any) -> tuple[Tensor, Tensor]:
+ if self.pre_norm:
+ return self.forward_pre(*args, **kwds)
+ raise NotImplementedError
+
+
+class TransformerEncoderDecoupledCrossAttention(nn.Module):
+ def __init__(
+ self,
+ d_model: int,
+ frozen: bool,
+ pos_enc_at_input: bool,
+ layer,
+ num_layers: int,
+ use_act_checkpoint: bool = False,
+ batch_first: bool = False, # Do layers expect batch first input?
+ use_image_in_output: bool = True,
+ ):
+ super().__init__()
+ self.d_model = d_model
+ self.layers = get_clones(layer, num_layers)
+ self.num_layers = num_layers
+ self.norm = nn.LayerNorm(d_model)
+ self.pos_enc_at_input = pos_enc_at_input
+ self.use_act_checkpoint = use_act_checkpoint
+ self.use_image_in_output = use_image_in_output
+
+ if frozen:
+ for p in self.parameters():
+ p.requires_grad_(False)
+
+ self.batch_first = batch_first
+
+ def forward(
+ self,
+ image: Tensor, # image features
+ src: Tensor, # self-attention inputs; object features
+ memory_image: Tensor, # cross-attention inputs; image features
+ memory: Tensor, # cross-attention inputs; object features
+ image_pos: Optional[Tensor] = None, # pos_enc for self-attention inputs
+ src_pos: Optional[Tensor] = None, # pos_enc for self-attention inputs
+ memory_image_pos: Optional[Tensor] = None, # pos_enc for cross-attention inputs
+ memory_pos: Optional[Tensor] = None, # pos_enc for cross-attention inputs
+ num_obj_ptr_tokens: int = 0, # number of object pointer *tokens*
+ ):
+ assert src.shape[1] == memory.shape[1], (
+ "Batch size must be the same for src and memory"
+ )
+ assert image.shape[1] == memory_image.shape[1], (
+ "Batch size must be the same for image and memory_image"
+ )
+
+ output = src
+
+ if self.pos_enc_at_input and src_pos is not None:
+ output = output + 0.1 * src_pos
+
+ if self.batch_first:
+ # Convert to batch first
+ output = output.transpose(0, 1)
+ src_pos = src_pos.transpose(0, 1)
+ image = image.transpose(0, 1)
+ memory = memory.transpose(0, 1)
+ memory_pos = memory_pos.transpose(0, 1)
+ memory_image = memory_image.transpose(0, 1)
+ memory_image_pos = memory_image_pos.transpose(0, 1)
+
+ if memory_image.shape[1] != memory.shape[1]:
+ # Pad memory_image with zeros to accommodate object pointers
+ assert (memory.shape[1] - memory_image.shape[1]) == num_obj_ptr_tokens, (
+ f"{memory.shape[1]} - {memory_image.shape[1]} != {num_obj_ptr_tokens}"
+ )
+ memory_image = torch.cat(
+ [
+ memory_image,
+ torch.zeros(
+ (memory_image.shape[0], num_obj_ptr_tokens)
+ + memory_image.shape[2:],
+ dtype=memory_image.dtype,
+ device=memory_image.device,
+ ),
+ ],
+ dim=1,
+ )
+ if memory_image_pos is not None:
+ assert (
+ memory_pos.shape[1] - memory_image_pos.shape[1]
+ ) == num_obj_ptr_tokens, (
+ f"{memory_pos.shape[1]} - {memory_image_pos.shape[1]} != {num_obj_ptr_tokens}"
+ )
+ # tpos is the same in the batch anyway; note that memory_image always has a batch size of 1
+ memory_image_pos = torch.cat(
+ [
+ memory_image_pos,
+ memory_pos[0:1, -num_obj_ptr_tokens:],
+ ],
+ dim=1,
+ )
+
+ for layer in self.layers:
+ image, output = activation_ckpt_wrapper(layer)(
+ image=image,
+ tgt=output,
+ memory_image=memory_image,
+ memory=memory,
+ image_pos=image_pos,
+ query_pos=src_pos,
+ memory_image_pos=memory_image_pos,
+ memory_pos=memory_pos,
+ num_k_exclude_rope=num_obj_ptr_tokens,
+ act_ckpt_enable=self.training and self.use_act_checkpoint,
+ )
+
+ if self.use_image_in_output:
+ normed_output = self.norm(output + image)
+ else:
+ normed_output = self.norm(output)
+
+ if self.batch_first:
+ # Convert back to seq first
+ normed_output = normed_output.transpose(0, 1)
+ src_pos = src_pos.transpose(0, 1)
+
+ return {
+ "memory": normed_output,
+ "pos_embed": src_pos,
+ }
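The new `_cross_attn_with_rpb` path turns the boolean key-padding mask into an additive float bias. It adds that bias to the 4-D box relative-position bias `(bs, n_heads, nq, H*W)` and passes the sum to `scaled_dot_product_attention` as a float `attn_mask`. Below is a minimal NumPy sketch of that masking arithmetic. It assumes a boolean mask in which `True` marks padded keys. NumPy stands in for PyTorch, and `attention_with_bias` is a hypothetical helper, not part of sam3.

```python
import numpy as np


def attention_with_bias(q, k, v, rpb, key_padding_mask):
    """Scaled dot-product attention with an additive bias (NumPy sketch).

    q: (bs, heads, nq, d); k, v: (bs, heads, hw, d)
    rpb: (bs, heads, nq, hw) additive float bias (e.g. a box relative-position bias)
    key_padding_mask: (bs, hw) bool, True marks padded keys to ignore
    """
    # Boolean padding mask -> additive bias: 0 for valid keys, -inf for padded
    # keys, broadcast over the head and query axes.
    pad = np.where(key_padding_mask[:, None, None, :], -np.inf, 0.0)
    bias = rpb + pad
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1]) + bias
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
bs, heads, nq, hw, d = 2, 4, 3, 5, 8
q = rng.standard_normal((bs, heads, nq, d))
k = rng.standard_normal((bs, heads, hw, d))
v = rng.standard_normal((bs, heads, hw, d))
rpb = rng.standard_normal((bs, heads, nq, hw))
key_padding_mask = np.zeros((bs, hw), dtype=bool)
key_padding_mask[1, -2:] = True  # last two keys of the second image are padding

out, weights = attention_with_bias(q, k, v, rpb, key_padding_mask)
print(out.shape)  # (2, 4, 3, 8)
```

In this sketch, padded keys receive exactly zero attention weight. The per-head bias only shifts the logits of the remaining keys before the softmax.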
diff --git a/sam3/model/encoder.py b/sam3/model/encoder.py
index 3fc9406..49b0219 100644
--- a/sam3/model/encoder.py
+++ b/sam3/model/encoder.py
@@ -538,7 +538,7 @@ def forward(
else None
)
else:
- assert all(x.dim == 4 for x in src), (
+ assert all(x.dim() == 4 for x in src), (
"expected list of (bs, c, h, w) tensors"
)
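The `encoder.py` hunk fixes an assertion that compared the bound method `x.dim` to an integer. A method object never equals `4`, so the old check fails on any non-empty list, even one made only of valid 4-D tensors. The pure-Python illustration below uses a hypothetical `FakeTensor` stand-in rather than `torch.Tensor`:

```python
class FakeTensor:
    """Hypothetical stand-in exposing a dim() method like torch.Tensor."""

    def __init__(self, ndim):
        self._ndim = ndim

    def dim(self):
        return self._ndim


src = [FakeTensor(4), FakeTensor(4)]

# Buggy check: `x.dim` is a bound method, and a method never equals 4.
buggy = all(x.dim == 4 for x in src)
# Fixed check: call the method to get the number of dimensions.
fixed = all(x.dim() == 4 for x in src)
print(buggy, fixed)  # False True
```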
diff --git a/sam3/model/geometry_encoders.py b/sam3/model/geometry_encoders.py
index d60ee54..815d0d3 100644
--- a/sam3/model/geometry_encoders.py
+++ b/sam3/model/geometry_encoders.py
@@ -645,11 +645,28 @@ def _encode_boxes(self, boxes, boxes_mask, boxes_labels, img_feats):
# We need to denormalize, and convert to [x, y, x, y]
boxes_xyxy = box_cxcywh_to_xyxy(boxes)
scale = torch.tensor([W, H, W, H], dtype=boxes_xyxy.dtype)
- scale = scale.pin_memory().to(device=boxes_xyxy.device, non_blocking=True)
+ if (
+ torch.is_tensor(scale)
+ and scale.device.type == "cpu"
+ and boxes_xyxy.device.type != "cpu"
+ and not torch._dynamo.is_compiling()
+ ):
+ scale = scale.pin_memory()
+ scale = scale.to(
+ device=boxes_xyxy.device,
+ non_blocking=boxes_xyxy.device.type != "cpu" and not torch._dynamo.is_compiling(),
+ )
scale = scale.view(1, 1, 4)
boxes_xyxy = boxes_xyxy * scale
+ boxes_xyxy = boxes_xyxy.transpose(0, 1)
+ batch_idx = torch.arange(
+ bs, device=boxes_xyxy.device, dtype=boxes_xyxy.dtype
+ )
+ batch_idx = batch_idx.view(bs, 1, 1).expand(bs, n_boxes, 1)
+ boxes_for_roi = torch.cat([batch_idx, boxes_xyxy], dim=-1)
+ boxes_for_roi = boxes_for_roi.reshape(-1, 5)
sampled = torchvision.ops.roi_align(
- img_feats, boxes_xyxy.float().transpose(0, 1).unbind(0), self.roi_size
+ img_feats, boxes_for_roi.float(), self.roi_size
)
assert list(sampled.shape) == [
bs * n_boxes,
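In `geometry_encoders.py`, `roi_align` now receives a single `(K, 5)` box tensor whose first column is the batch index. Previously it received a per-image list of `(L, 4)` tensors; `torchvision.ops.roi_align` accepts both formats. The packing step is sketched below in NumPy. `to_roi_boxes` is a hypothetical helper, and the input is assumed to be sequence-first `(n_boxes, bs, 4)` as in the encoder. Rows come out image-major, matching the order the previous per-image list produced.

```python
import numpy as np


def to_roi_boxes(boxes_xyxy_seq_first):
    """Pack (n_boxes, bs, 4) xyxy boxes into a (bs * n_boxes, 5) array.

    Column 0 holds the batch index of each box, matching the Tensor[K, 5]
    box format accepted by torchvision.ops.roi_align.
    """
    n_boxes, bs, _ = boxes_xyxy_seq_first.shape
    boxes = boxes_xyxy_seq_first.transpose(1, 0, 2)  # (bs, n_boxes, 4)
    batch_idx = np.arange(bs, dtype=boxes.dtype).reshape(bs, 1, 1)
    batch_idx = np.broadcast_to(batch_idx, (bs, n_boxes, 1))
    rois = np.concatenate([batch_idx, boxes], axis=-1)  # (bs, n_boxes, 5)
    return rois.reshape(-1, 5)


# Two images, three boxes each, laid out sequence-first.
boxes = np.arange(3 * 2 * 4, dtype=np.float32).reshape(3, 2, 4)
rois = to_roi_boxes(boxes)
print(rois[:, 0])  # [0. 0. 0. 1. 1. 1.]
```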
diff --git a/sam3/model/io_utils.py b/sam3/model/io_utils.py
index 067f125..49eedfd 100644
--- a/sam3/model/io_utils.py
+++ b/sam3/model/io_utils.py
@@ -7,7 +7,9 @@
import queue
import re
import time
+import types
from threading import Condition, get_ident, Lock, Thread
+from typing import Optional
import numpy as np
import torch
@@ -132,6 +134,13 @@ def load_video_frames(
match = re.match(r"", video_path)
num_frames = int(match.group(1)) if match else 60
return load_dummy_video(image_size, offload_video_to_cpu, num_frames=num_frames)
+ elif video_path.startswith(" where N is an integer
+ match = re.match(r"", video_path)
+ num_frames = int(match.group(1)) if match else 60
+ return load_dummy_video(
+ image_size, offload_video_to_cpu, num_frames=num_frames, do_zeros=True
+ )
elif os.path.isdir(video_path):
return load_video_frames_from_image_folder(
image_folder=video_path,
@@ -152,7 +161,23 @@ def load_video_frames(
video_loader_type=video_loader_type,
)
else:
- raise NotImplementedError("Only video files and image folders are supported")
+ # No recognized extension (e.g., extensionless OIL paths) — attempt video loading.
+ # Only raise if the loader itself fails to decode frames.
+ try:
+ return load_video_frames_from_video_file(
+ video_path=video_path,
+ image_size=image_size,
+ offload_video_to_cpu=offload_video_to_cpu,
+ img_mean=img_mean,
+ img_std=img_std,
+ async_loading_frames=async_loading_frames,
+ video_loader_type=video_loader_type,
+ )
+ except Exception as e:
+ raise NotImplementedError(
+ f"Only video files and image folders are supported; "
+ f"failed to load '{video_path}' as video: {e}"
+ ) from e
def load_video_frames_from_image_folder(
@@ -300,6 +325,12 @@ def load_video_frames_from_video_file_using_cv2(
cap.release()
pbar.close()
+ if len(frames) == 0:
+ raise RuntimeError(
+ f"No frames could be decoded from video: {video_path}. "
+ f"The file may be corrupted, empty, or encoded with an unsupported codec."
+ )
+
# Convert to tensor
frames_np = np.stack(frames, axis=0).astype(np.float32) # (T, H, W, C)
video_tensor = torch.from_numpy(frames_np).permute(0, 3, 1, 2) # (T, C, H, W)
@@ -316,12 +347,15 @@ def load_video_frames_from_video_file_using_cv2(
return video_tensor, original_height, original_width
-def load_dummy_video(image_size, offload_video_to_cpu, num_frames=60):
+def load_dummy_video(image_size, offload_video_to_cpu, num_frames=60, do_zeros=False):
"""
Load a dummy video with random frames for testing and compilation warmup purposes.
"""
video_height, video_width = 480, 640 # dummy original video sizes
- images = torch.randn(num_frames, 3, image_size, image_size, dtype=torch.float16)
+ if not do_zeros:
+ images = torch.randn(num_frames, 3, image_size, image_size, dtype=torch.float16)
+ else:
+ images = torch.zeros(num_frames, 3, image_size, image_size, dtype=torch.float16)
if not offload_video_to_cpu:
images = images.cuda()
return images, video_height, video_width
@@ -460,7 +494,7 @@ def __init__(self):
self._waiters = queue.Queue()
self._condition = Condition()
- def acquire(self):
+ def acquire(self) -> None:
ident = get_ident()
with self._condition:
self._waiters.put(ident)
@@ -479,7 +513,12 @@ def release(self):
def __enter__(self):
self.acquire()
- def __exit__(self, t, v, tb):
+ def __exit__(
+ self,
+ t: Optional[type[BaseException]],
+ v: Optional[BaseException],
+ tb: Optional[types.TracebackType],
+ ) -> None:
self.release()
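The new `else` branch in `load_video_frames` handles paths without a recognized extension. It attempts video decoding and converts any failure into a `NotImplementedError` chained to the original error. Separately, the cv2 loader now raises a `RuntimeError` when no frames decode. A pure-Python sketch of that control flow follows; `load_with_fallback` and `broken_loader` are hypothetical stand-ins for the real loaders.

```python
def load_with_fallback(path, loader):
    """Try `loader(path)` and re-raise any failure as NotImplementedError.

    The original exception is chained via `from e`, so its traceback is kept.
    """
    try:
        return loader(path)
    except Exception as e:
        raise NotImplementedError(
            "Only video files and image folders are supported; "
            f"failed to load '{path}' as video: {e}"
        ) from e


def broken_loader(path):
    # Hypothetical loader that decodes nothing, like an empty or corrupt file.
    raise RuntimeError(f"No frames could be decoded from video: {path}.")


try:
    load_with_fallback("clip_without_extension", broken_loader)
except NotImplementedError as err:
    caught = err

print(type(caught.__cause__).__name__)  # RuntimeError
```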
diff --git a/sam3/model/maskformer_segmentation.py b/sam3/model/maskformer_segmentation.py
index a2d5c68..2f03640 100644
--- a/sam3/model/maskformer_segmentation.py
+++ b/sam3/model/maskformer_segmentation.py
@@ -107,6 +107,12 @@ def _embed_pixels(
image_ids,
encoder_hidden_states,
) -> torch.Tensor:
+ # Unwrap NestedTensors to plain tensors if needed (multiplex path)
+ from sam3.model.data_misc import NestedTensor
+
+ def _unwrap(x):
+ return x.tensors if isinstance(x, NestedTensor) else x
+
feature_device = backbone_feats[0].device # features could be on CPU
model_device = self.device
image_ids_ = image_ids.to(feature_device)
@@ -116,10 +122,14 @@ def _embed_pixels(
backbone_visual_feats = []
for feat in backbone_feats:
# Copy the img features per query (pixel decoder won't share img feats)
- backbone_visual_feats.append(feat[image_ids_, ...].to(model_device))
+ backbone_visual_feats.append(
+ _unwrap(feat)[image_ids_, ...].to(model_device)
+ )
else:
# Bs=1, we rely on broadcasting for query-based processing
- backbone_visual_feats = [bb_feat.clone() for bb_feat in backbone_feats]
+ backbone_visual_feats = [
+ _unwrap(bb_feat).clone() for bb_feat in backbone_feats
+ ]
# Extract visual embeddings
encoder_hidden_states = encoder_hidden_states.permute(1, 2, 0)
spatial_dim = math.prod(backbone_feats[-1].shape[-2:])
@@ -135,7 +145,7 @@ def _embed_pixels(
else:
pixel_embed = self.pixel_decoder(backbone_visual_feats)
else:
- backbone_feats = [x.to(model_device) for x in backbone_feats]
+ backbone_feats = [_unwrap(x).to(model_device) for x in backbone_feats]
pixel_embed = self.pixel_decoder(backbone_feats)
if pixel_embed.shape[0] == 1:
# For batch_size=1 training, we can avoid the indexing to save memory
diff --git a/sam3/model/memory.py b/sam3/model/memory.py
index 196dbf9..5540d85 100644
--- a/sam3/model/memory.py
+++ b/sam3/model/memory.py
@@ -38,14 +38,19 @@ def __init__(
# Option to interpolate the input mask first before downsampling using convs. In that case, the total_stride is assumed to be after interpolation.
# If set to input resolution or None, we don't interpolate. We default to None to be safe (for older configs or if not explicitly set)
interpol_size=None,
+ # options for incorporating multiplex memory encoding
+ multiplex_count: int = 1,
+ starting_out_chan: int = 1,
+ input_channel_multiplier: int = 1,
):
super().__init__()
num_layers = int(math.log2(total_stride) // math.log2(stride))
+ multiplex_count = multiplex_count * input_channel_multiplier
assert stride**num_layers == total_stride
self.encoder = nn.Sequential()
- mask_in_chans, mask_out_chans = 1, 1
+ mask_in_chans, mask_out_chans = multiplex_count, starting_out_chan
for _ in range(num_layers):
- mask_out_chans = mask_in_chans * (stride**2)
+ mask_out_chans = mask_out_chans * (stride**2)
self.encoder.append(
nn.Conv2d(
mask_in_chans,
@@ -60,6 +65,7 @@ def __init__(
mask_in_chans = mask_out_chans
self.encoder.append(nn.Conv2d(mask_out_chans, embed_dim, kernel_size=1))
+ self.multiplex_count = multiplex_count
self.interpol_size = interpol_size
if self.interpol_size is not None:
assert isinstance(self.interpol_size, (list, tuple)), (
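With the default arguments (`multiplex_count=1`, `starting_out_chan=1`, `input_channel_multiplier=1`), the rewritten channel loop in `memory.py` produces the same conv widths as before. The first conv has one input channel either way, and each layer multiplies the previous output width by `stride**2`. The small pure-Python sketch below re-derives those widths. It assumes SAM 2-style `stride=4, total_stride=16` defaults, which are not read from this diff, and it omits the final 1x1 projection to `embed_dim`.

```python
import math


def mask_downsampler_channels(
    stride=4,  # assumed default
    total_stride=16,  # assumed default
    multiplex_count=1,
    starting_out_chan=1,
    input_channel_multiplier=1,
):
    """List (in, out) channels of each strided conv, following the new loop."""
    num_layers = int(math.log2(total_stride) // math.log2(stride))
    assert stride**num_layers == total_stride
    in_chans = multiplex_count * input_channel_multiplier
    out_chans = starting_out_chan
    layers = []
    for _ in range(num_layers):
        out_chans = out_chans * (stride**2)
        layers.append((in_chans, out_chans))
        in_chans = out_chans
    return layers


def old_channels(stride=4, total_stride=16):
    # Pre-diff rule: start from a single mask channel, multiply by stride**2.
    num_layers = int(math.log2(total_stride) // math.log2(stride))
    in_chans, layers = 1, []
    for _ in range(num_layers):
        out_chans = in_chans * (stride**2)
        layers.append((in_chans, out_chans))
        in_chans = out_chans
    return layers


print(mask_downsampler_channels())  # [(1, 16), (16, 256)]
```

With non-default values, the first conv consumes `multiplex_count * input_channel_multiplier` channels, and the output widths grow from `starting_out_chan` instead of from the input width.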
diff --git a/sam3/model/model_misc.py b/sam3/model/model_misc.py
index d961461..56d51c2 100644
--- a/sam3/model/model_misc.py
+++ b/sam3/model/model_misc.py
@@ -6,18 +6,25 @@
import copy
import math
+import warnings
import weakref
from collections.abc import Iterator
from contextlib import AbstractContextManager
from enum import auto, Enum
-from typing import Dict, List, Optional, Union
+from typing import Dict, List, Optional, Tuple, Union
import numpy as np
import torch
import torch.nn.functional as F
from torch import nn, Tensor
+from torch.overrides import handle_torch_function, has_torch_function
from typing_extensions import override
+try:
+ import xformers
+except ImportError:
+ xformers = None
+
def inverse_sigmoid(x, eps=1e-3):
"""
@@ -30,10 +37,678 @@ def inverse_sigmoid(x, eps=1e-3):
return torch.log(x1 / x2)
-class MultiheadAttentionWrapper(nn.MultiheadAttention):
- def forward(self, *args, **kwargs):
- kwargs["need_weights"] = False
- return super().forward(*args, **kwargs)
+def get_sdpa_settings():
+ if torch.cuda.is_available():
+ old_gpu = torch.cuda.get_device_properties(0).major < 7
+ # only use Flash Attention on Ampere (8.0) or newer GPUs
+ use_flash_attn = torch.cuda.get_device_properties(0).major >= 8
+ if not use_flash_attn:
+ warnings.warn(
+ "Flash Attention is disabled as it requires a GPU with Ampere (8.0) or newer CUDA capability.",
+ category=UserWarning,
+ stacklevel=2,
+ )
+ # keep math kernel for PyTorch versions before 2.2 (Flash Attention v2 is only
+ # available on PyTorch 2.2+, while Flash Attention v1 cannot handle all cases)
+ pytorch_version = tuple(int(v) for v in torch.__version__.split(".")[:2])
+ if pytorch_version < (2, 2):
+ warnings.warn(
+ f"You are using PyTorch {torch.__version__} without Flash Attention v2 support. "
+ "Consider upgrading to PyTorch 2.2+ for Flash Attention v2 (which could be faster).",
+ category=UserWarning,
+ stacklevel=2,
+ )
+ math_kernel_on = pytorch_version < (2, 2) or not use_flash_attn
+ else:
+ old_gpu = True
+ use_flash_attn = False
+ math_kernel_on = True
+
+ return old_gpu, use_flash_attn, math_kernel_on
+
+
+OLD_GPU, USE_FLASH_ATTN, MATH_KERNEL_ON = get_sdpa_settings()
+
+
+class AttentionType:
+ """Type of attention"""
+
+ # Simple dot product attention
+ Vanilla = "Vanilla"
+
+ # Efficient attention from xformers
+ Xformer = "Xformer"
+
+ # Sparse attention
+ Sparse = "Sparse"
+
+ # Deformable attention (not compatible with text)
+ Deformable = "Deformable"
+
+
+def multi_head_attention_forward(
+ query: Tensor,
+ key: Tensor,
+ value: Tensor,
+ embed_dim_to_check: int,
+ num_heads: int,
+ in_proj_weight: Optional[Tensor],
+ in_proj_bias: Optional[Tensor],
+ bias_k: Optional[Tensor],
+ bias_v: Optional[Tensor],
+ add_zero_attn: bool,
+ dropout_p: float,
+ out_proj_weight: Tensor,
+ out_proj_bias: Optional[Tensor],
+ training: bool = True,
+ key_padding_mask: Optional[Tensor] = None,
+ need_weights: bool = True,
+ attn_mask: Optional[Tensor] = None,
+ use_separate_proj_weight: bool = False,
+ q_proj_weight: Optional[Tensor] = None,
+ k_proj_weight: Optional[Tensor] = None,
+ v_proj_weight: Optional[Tensor] = None,
+ static_k: Optional[Tensor] = None,
+ static_v: Optional[Tensor] = None,
+ average_attn_weights: bool = True,
+ is_causal: bool = False,
+ attn_type: AttentionType = AttentionType.Vanilla,
+ attn_sparsity: float = 0.0,
+ attn_bias: Optional[Tensor] = None,
+ use_fa3: bool = False,
+) -> Tuple[Tensor, Optional[Tensor]]:
+ tens_ops = (
+ query,
+ key,
+ value,
+ in_proj_weight,
+ in_proj_bias,
+ bias_k,
+ bias_v,
+ out_proj_weight,
+ out_proj_bias,
+ )
+ if has_torch_function(tens_ops):
+ return handle_torch_function(
+ multi_head_attention_forward,
+ tens_ops,
+ query,
+ key,
+ value,
+ embed_dim_to_check,
+ num_heads,
+ in_proj_weight,
+ in_proj_bias,
+ bias_k,
+ bias_v,
+ add_zero_attn,
+ dropout_p,
+ out_proj_weight,
+ out_proj_bias,
+ training=training,
+ key_padding_mask=key_padding_mask,
+ need_weights=need_weights,
+ attn_mask=attn_mask,
+ is_causal=is_causal,
+ use_separate_proj_weight=use_separate_proj_weight,
+ q_proj_weight=q_proj_weight,
+ k_proj_weight=k_proj_weight,
+ v_proj_weight=v_proj_weight,
+ static_k=static_k,
+ static_v=static_v,
+ average_attn_weights=average_attn_weights,
+ use_fa3=use_fa3,
+ )
+
+ is_batched = True
+
+ if is_causal:
+ raise NotImplementedError("is_causal is not supported in this implementation")
+ attn_mask = None
+
+ if not is_batched:
+ query = query.unsqueeze(1)
+ key = key.unsqueeze(1)
+ value = value.unsqueeze(1)
+ if key_padding_mask is not None:
+ key_padding_mask = key_padding_mask.unsqueeze(0)
+
+ # set up shape vars
+ tgt_len, bsz, embed_dim = query.shape
+ src_len, _, _ = key.shape
+ if key_padding_mask is not None:
+ _kpm_dtype = key_padding_mask.dtype
+ if _kpm_dtype != torch.bool and not torch.is_floating_point(key_padding_mask):
+ raise AssertionError(
+ "only bool and floating types of key_padding_mask are supported"
+ )
+ assert embed_dim == embed_dim_to_check, (
+ f"was expecting embedding dimension of {embed_dim_to_check}, but got {embed_dim}"
+ )
+ if isinstance(embed_dim, torch.Tensor):
+ head_dim = embed_dim.div(num_heads, rounding_mode="trunc")
+ else:
+ head_dim = embed_dim // num_heads
+ assert head_dim * num_heads == embed_dim, (
+ f"embed_dim {embed_dim} not divisible by num_heads {num_heads}"
+ )
+ if use_separate_proj_weight:
+ assert key.shape[:2] == value.shape[:2], (
+ f"key's sequence and batch dims {key.shape[:2]} do not match value's {value.shape[:2]}"
+ )
+ else:
+ assert key.shape == value.shape, (
+ f"key shape {key.shape} does not match value shape {value.shape}"
+ )
+
+ #
+ # compute in-projection
+ #
+ if not use_separate_proj_weight:
+ assert in_proj_weight is not None, (
+ "use_separate_proj_weight is False but in_proj_weight is None"
+ )
+ q, k, v = F._in_projection_packed(
+ query, key, value, in_proj_weight, in_proj_bias
+ )
+ else:
+ assert q_proj_weight is not None, (
+ "use_separate_proj_weight is True but q_proj_weight is None"
+ )
+ assert k_proj_weight is not None, (
+ "use_separate_proj_weight is True but k_proj_weight is None"
+ )
+ assert v_proj_weight is not None, (
+ "use_separate_proj_weight is True but v_proj_weight is None"
+ )
+ if in_proj_bias is None:
+ b_q = b_k = b_v = None
+ else:
+ b_q, b_k, b_v = in_proj_bias.chunk(3)
+ q, k, v = F._in_projection(
+ query,
+ key,
+ value,
+ q_proj_weight,
+ k_proj_weight,
+ v_proj_weight,
+ b_q,
+ b_k,
+ b_v,
+ )
+
+ # prep attention mask
+ if attn_mask is not None:
+ if attn_mask.dtype == torch.uint8:
+ warnings.warn(
+ "Byte tensor for attn_mask in nn.MultiheadAttention is deprecated. Use bool tensor instead."
+ )
+ attn_mask = attn_mask.to(torch.bool)
+ else:
+ assert attn_mask.is_floating_point() or attn_mask.dtype == torch.bool, (
+ f"Only float, byte, and bool types are supported for attn_mask, not {attn_mask.dtype}"
+ )
+ # ensure attn_mask's dim is 3
+ if attn_mask.dim() == 2:
+ correct_2d_size = (tgt_len, src_len)
+ if attn_mask.shape != correct_2d_size:
+ raise RuntimeError(
+ f"The shape of the 2D attn_mask is {attn_mask.shape}, but should be {correct_2d_size}."
+ )
+ attn_mask = attn_mask.unsqueeze(0)
+ elif attn_mask.dim() == 3:
+ correct_3d_size = (bsz * num_heads, tgt_len, src_len)
+ if attn_mask.shape != correct_3d_size:
+ raise RuntimeError(
+ f"The shape of the 3D attn_mask is {attn_mask.shape}, but should be {correct_3d_size}."
+ )
+ else:
+ raise RuntimeError(
+ f"attn_mask's dimension {attn_mask.dim()} is not supported"
+ )
+
+ # add bias along batch dimension (currently second)
+ if bias_k is not None and bias_v is not None:
+ assert static_k is None, "bias cannot be added to static key."
+ assert static_v is None, "bias cannot be added to static value."
+ k = torch.cat([k, bias_k.repeat(1, bsz, 1)])
+ v = torch.cat([v, bias_v.repeat(1, bsz, 1)])
+ if attn_mask is not None:
+ attn_mask = F.pad(attn_mask, (0, 1))
+ if key_padding_mask is not None:
+ key_padding_mask = F.pad(key_padding_mask, (0, 1))
+ else:
+ assert bias_k is None
+ assert bias_v is None
+
+ #
+ # reshape q, k, v for multihead attention and make them batch first
+ #
+ q = q.contiguous().view(tgt_len, bsz * num_heads, head_dim).transpose(0, 1)
+ if static_k is None:
+ k = k.contiguous().view(k.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
+ else:
+ assert static_k.size(0) == bsz * num_heads, (
+ f"expecting static_k.size(0) of {bsz * num_heads}, but got {static_k.size(0)}"
+ )
+ assert static_k.size(2) == head_dim, (
+ f"expecting static_k.size(2) of {head_dim}, but got {static_k.size(2)}"
+ )
+ k = static_k
+ if static_v is None:
+ v = v.contiguous().view(v.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
+ else:
+ assert static_v.size(0) == bsz * num_heads, (
+ f"expecting static_v.size(0) of {bsz * num_heads}, but got {static_v.size(0)}"
+ )
+ assert static_v.size(2) == head_dim, (
+ f"expecting static_v.size(2) of {head_dim}, but got {static_v.size(2)}"
+ )
+ v = static_v
+
+ # add zero attention along batch dimension (now first)
+ if add_zero_attn:
+ zero_attn_shape = (bsz * num_heads, 1, head_dim)
+ k = torch.cat(
+ [k, torch.zeros(zero_attn_shape, dtype=k.dtype, device=k.device)], dim=1
+ )
+ v = torch.cat(
+ [v, torch.zeros(zero_attn_shape, dtype=v.dtype, device=v.device)], dim=1
+ )
+ if attn_mask is not None:
+ attn_mask = F.pad(attn_mask, (0, 1))
+ if key_padding_mask is not None:
+ key_padding_mask = F.pad(key_padding_mask, (0, 1))
+
+ # update source sequence length after adjustments
+ src_len = k.size(1)
+
+ # merge key padding and attention masks
+ if key_padding_mask is not None:
+ assert key_padding_mask.shape == (
+ bsz,
+ src_len,
+ ), (
+ f"expecting key_padding_mask shape of {(bsz, src_len)}, but got {key_padding_mask.shape}"
+ )
+ key_padding_mask = (
+ key_padding_mask.view(bsz, 1, 1, src_len)
+ .expand(-1, num_heads, -1, -1)
+ .reshape(bsz * num_heads, 1, src_len)
+ )
+ if attn_mask is None:
+ attn_mask = key_padding_mask
+ elif attn_mask.dtype == torch.bool:
+ attn_mask = attn_mask.logical_or(key_padding_mask)
+ else:
+ attn_mask = attn_mask.masked_fill(key_padding_mask, float("-inf"))
+
+ # convert mask to float
+ if attn_mask is not None and attn_mask.dtype == torch.bool:
+ new_attn_mask = torch.zeros_like(attn_mask, dtype=q.dtype)
+ new_attn_mask.masked_fill_(attn_mask, float("-inf"))
+ attn_mask = new_attn_mask
+
+ # adjust dropout probability
+ if not training:
+ dropout_p = 0.0
+
+ #
+ # (deep breath) calculate attention and out projection
+ #
+
+ if attn_mask is not None:
+ if attn_mask.size(0) == 1:
+ attn_mask = attn_mask.unsqueeze(0)
+ else:
+ attn_mask = attn_mask.view(bsz, num_heads, -1, src_len)
+
+ if attn_bias is not None:
+ assert attn_bias.shape == (
+ bsz,
+ num_heads,
+ tgt_len,
+ src_len,
+ ), (
+ f"expecting attn_bias shape of {(bsz, num_heads, tgt_len, src_len)}, but got {attn_bias.shape}"
+ )
+ if attn_mask is None:
+ attn_mask = attn_bias
+ else:
+ attn_mask = attn_mask + attn_bias
+
+ q = q.view(bsz, num_heads, tgt_len, head_dim)
+ k = k.view(bsz, num_heads, src_len, head_dim)
+ v = v.view(bsz, num_heads, src_len, head_dim)
+
+ if attn_type == AttentionType.Vanilla:
+ if attn_mask is None and not is_causal and use_fa3:
+ from sam3.perflib.fa3 import flash_attn_func
+
+ assert dropout_p == 0.0
+ attn_output = flash_attn_func(
+ q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
+ ).transpose(1, 2)
+ else:
+ torch.backends.cuda.enable_flash_sdp(True)
+ torch.backends.cuda.enable_math_sdp(True)
+ torch.backends.cuda.enable_mem_efficient_sdp(True)
+
+ attn_output = F.scaled_dot_product_attention(
+ q, k, v, attn_mask, dropout_p, is_causal
+ )
+
+ attn_output = (
+ attn_output.permute(2, 0, 1, 3).contiguous().view(bsz * tgt_len, embed_dim)
+ )
+ elif attn_type == AttentionType.Xformer:
+ attn_output_weights = None
+ assert not need_weights, "need_weights is not supported in efficient mode"
+ attn_output = xformers.ops.memory_efficient_attention(
+ q.transpose(1, 2),
+ k.transpose(1, 2),
+ v.transpose(1, 2),
+ attn_bias=attn_mask,
+ p=dropout_p,
+ )
+ attn_output = attn_output.permute(1, 0, 2, 3).reshape(bsz * tgt_len, embed_dim)
+ elif attn_type == AttentionType.Sparse:
+ attn_output_weights = None
+ assert not need_weights, "need_weights is not supported in efficient mode"
+ # Need to collapse heads and batch dimensions
+ q = q.reshape(bsz * num_heads, tgt_len, head_dim).contiguous()
+ k = k.reshape(bsz * num_heads, src_len, head_dim).contiguous()
+ v = v.reshape(bsz * num_heads, src_len, head_dim).contiguous()
+ row_offsets, column_indices = xformers.ops.find_locations_new(
+ q, k, attn_sparsity, True
+ )
+ attn_output = xformers.ops.sparse_memory_efficient_attention(
+ q, k, v, row_offsets, column_indices, attn_bias=attn_mask
+ ).reshape(bsz, num_heads, tgt_len, head_dim)
+ attn_output = attn_output.permute(2, 0, 1, 3).reshape(bsz * tgt_len, embed_dim)
+ else:
+ raise ValueError(f"Unsupported attention type {attn_type}")
+
+ attn_output = F.linear(attn_output, out_proj_weight, out_proj_bias)
+ attn_output = attn_output.view(tgt_len, bsz, attn_output.size(1))
+
+ if need_weights:
+ attn_output_weights = (q * head_dim**-0.5) @ k.transpose(-2, -1) + (0.0 if attn_mask is None else attn_mask)
+ attn_output_weights = attn_output_weights.softmax(dim=-1)
+ attn_output_weights = attn_output_weights.view(bsz, num_heads, tgt_len, src_len)
+ if average_attn_weights:
+ attn_output_weights = attn_output_weights.sum(dim=1) / num_heads
+
+ if not is_batched:
+ attn_output = attn_output.squeeze(1)
+ attn_output_weights = attn_output_weights.squeeze(0)
+ return attn_output, attn_output_weights
+ else:
+ attn_output_weights = None
+ if not is_batched:
+ attn_output = attn_output.squeeze(1)
+ return attn_output, None
+
+
+class MultiheadAttention(nn.Module):
+ __constants__ = ["batch_first"]
+ bias_k: Optional[torch.Tensor]
+ bias_v: Optional[torch.Tensor]
+
+ def __init__(
+ self,
+ embed_dim,
+ num_heads,
+ dropout=0.0,
+ bias=True,
+ add_bias_kv=False,
+ add_zero_attn=False,
+ kdim=None,
+ vdim=None,
+ batch_first=False,
+ device=None,
+ dtype=None,
+ attn_type: AttentionType = AttentionType.Vanilla,
+ sparsity: float = 0.0,
+ use_act_checkpoint: bool = False,
+ use_fa3: bool = False,
+ ) -> None:
+ factory_kwargs = {"device": device, "dtype": dtype}
+ super(MultiheadAttention, self).__init__()
+ self.embed_dim = embed_dim
+ self.kdim = kdim if kdim is not None else embed_dim
+ self.vdim = vdim if vdim is not None else embed_dim
+ self._qkv_same_embed_dim = self.kdim == embed_dim and self.vdim == embed_dim
+
+ self.num_heads = num_heads
+ self.dropout = dropout
+ self.batch_first = batch_first
+ self.head_dim = embed_dim // num_heads
+ self.use_act_checkpoint = use_act_checkpoint
+ assert self.head_dim * num_heads == self.embed_dim, (
+ "embed_dim must be divisible by num_heads"
+ )
+
+ assert attn_type == AttentionType.Sparse or sparsity == 0.0, (
+ "sparsity is only supported for sparse attention"
+ )
+
+ if not self._qkv_same_embed_dim:
+ self.q_proj_weight = nn.Parameter(
+ torch.empty((embed_dim, embed_dim), **factory_kwargs)
+ )
+ self.k_proj_weight = nn.Parameter(
+ torch.empty((embed_dim, self.kdim), **factory_kwargs)
+ )
+ self.v_proj_weight = nn.Parameter(
+ torch.empty((embed_dim, self.vdim), **factory_kwargs)
+ )
+ self.register_parameter("in_proj_weight", None)
+ else:
+ self.in_proj_weight = nn.Parameter(
+ torch.empty((3 * embed_dim, embed_dim), **factory_kwargs)
+ )
+ self.register_parameter("q_proj_weight", None)
+ self.register_parameter("k_proj_weight", None)
+ self.register_parameter("v_proj_weight", None)
+
+ if bias:
+ self.in_proj_bias = nn.Parameter(
+ torch.empty(3 * embed_dim, **factory_kwargs)
+ )
+ else:
+ self.register_parameter("in_proj_bias", None)
+ self.out_proj = nn.modules.linear.NonDynamicallyQuantizableLinear(
+ embed_dim, embed_dim, bias=bias, **factory_kwargs
+ )
+
+ if add_bias_kv:
+ self.bias_k = nn.Parameter(torch.empty((1, 1, embed_dim), **factory_kwargs))
+ self.bias_v = nn.Parameter(torch.empty((1, 1, embed_dim), **factory_kwargs))
+ else:
+ self.bias_k = self.bias_v = None
+
+ self.add_zero_attn = add_zero_attn
+
+ self.attn_type = attn_type
+ self.sparsity = sparsity
+ self.use_fa3 = use_fa3
+
+ self._reset_parameters()
+
+ def _reset_parameters(self):
+ if self._qkv_same_embed_dim:
+ nn.init.xavier_uniform_(self.in_proj_weight)
+ else:
+ nn.init.xavier_uniform_(self.q_proj_weight)
+ nn.init.xavier_uniform_(self.k_proj_weight)
+ nn.init.xavier_uniform_(self.v_proj_weight)
+
+ if self.in_proj_bias is not None:
+ nn.init.constant_(self.in_proj_bias, 0.0)
+ nn.init.constant_(self.out_proj.bias, 0.0)
+ if self.bias_k is not None:
+ nn.init.xavier_normal_(self.bias_k)
+ if self.bias_v is not None:
+ nn.init.xavier_normal_(self.bias_v)
+
+ def __setstate__(self, state):
+ if "_qkv_same_embed_dim" not in state:
+ state["_qkv_same_embed_dim"] = True
+
+ super(MultiheadAttention, self).__setstate__(state)
+
+ def forward(
+ self,
+ query: Tensor,
+ key: Tensor,
+ value: Tensor,
+ key_padding_mask: Optional[Tensor] = None,
+ need_weights: bool = False,
+ attn_mask: Optional[Tensor] = None,
+ average_attn_weights: bool = True,
+ attn_bias: Optional[Tensor] = None,
+ ) -> Tuple[Tensor, Optional[Tensor]]:
+ is_batched = query.dim() == 3
+ if key_padding_mask is not None:
+ _kpm_dtype = key_padding_mask.dtype
+ if _kpm_dtype != torch.bool and not torch.is_floating_point(
+ key_padding_mask
+ ):
+ raise AssertionError(
+ "only bool and floating types of key_padding_mask are supported"
+ )
+
+ if self.batch_first and is_batched:
+ if key is value:
+ if query is key:
+ query = key = value = query.transpose(1, 0)
+ else:
+ query, key = [x.transpose(1, 0) for x in (query, key)]
+ value = key
+ else:
+ query, key, value = [x.transpose(1, 0) for x in (query, key, value)]
+
+ if not self._qkv_same_embed_dim:
+ if self.use_act_checkpoint:
+ attn_output, attn_output_weights = torch.utils.checkpoint.checkpoint(
+ multi_head_attention_forward,
+ query,
+ key,
+ value,
+ self.embed_dim,
+ self.num_heads,
+ self.in_proj_weight,
+ self.in_proj_bias,
+ self.bias_k,
+ self.bias_v,
+ self.add_zero_attn,
+ self.dropout,
+ self.out_proj.weight,
+ self.out_proj.bias,
+ use_reentrant=False,
+ training=self.training,
+ key_padding_mask=key_padding_mask,
+ need_weights=need_weights,
+ attn_mask=attn_mask,
+ use_separate_proj_weight=True,
+ q_proj_weight=self.q_proj_weight,
+ k_proj_weight=self.k_proj_weight,
+ v_proj_weight=self.v_proj_weight,
+ average_attn_weights=average_attn_weights,
+ attn_type=self.attn_type,
+ attn_sparsity=self.sparsity,
+ attn_bias=attn_bias,
+ use_fa3=self.use_fa3,
+ )
+ else:
+ attn_output, attn_output_weights = multi_head_attention_forward(
+ query,
+ key,
+ value,
+ self.embed_dim,
+ self.num_heads,
+ self.in_proj_weight,
+ self.in_proj_bias,
+ self.bias_k,
+ self.bias_v,
+ self.add_zero_attn,
+ self.dropout,
+ self.out_proj.weight,
+ self.out_proj.bias,
+ training=self.training,
+ key_padding_mask=key_padding_mask,
+ need_weights=need_weights,
+ attn_mask=attn_mask,
+ use_separate_proj_weight=True,
+ q_proj_weight=self.q_proj_weight,
+ k_proj_weight=self.k_proj_weight,
+ v_proj_weight=self.v_proj_weight,
+ average_attn_weights=average_attn_weights,
+ attn_type=self.attn_type,
+ attn_sparsity=self.sparsity,
+ attn_bias=attn_bias,
+ use_fa3=self.use_fa3,
+ )
+ else:
+ if self.use_act_checkpoint:
+ attn_output, attn_output_weights = torch.utils.checkpoint.checkpoint(
+ multi_head_attention_forward,
+ query,
+ key,
+ value,
+ self.embed_dim,
+ self.num_heads,
+ self.in_proj_weight,
+ self.in_proj_bias,
+ self.bias_k,
+ self.bias_v,
+ self.add_zero_attn,
+ self.dropout,
+ self.out_proj.weight,
+ self.out_proj.bias,
+ use_reentrant=False,
+ training=self.training,
+ key_padding_mask=key_padding_mask,
+ need_weights=need_weights,
+ attn_mask=attn_mask,
+ average_attn_weights=average_attn_weights,
+ attn_type=self.attn_type,
+ attn_sparsity=self.sparsity,
+ attn_bias=attn_bias,
+ use_fa3=self.use_fa3)
+ else:
+ attn_output, attn_output_weights = multi_head_attention_forward(
+ query,
+ key,
+ value,
+ self.embed_dim,
+ self.num_heads,
+ self.in_proj_weight,
+ self.in_proj_bias,
+ self.bias_k,
+ self.bias_v,
+ self.add_zero_attn,
+ self.dropout,
+ self.out_proj.weight,
+ self.out_proj.bias,
+ training=self.training,
+ key_padding_mask=key_padding_mask,
+ need_weights=need_weights,
+ attn_mask=attn_mask,
+ average_attn_weights=average_attn_weights,
+ attn_type=self.attn_type,
+ attn_sparsity=self.sparsity,
+ attn_bias=attn_bias,
+ use_fa3=self.use_fa3)
+ if self.batch_first and is_batched:
+ return attn_output.transpose(1, 0), attn_output_weights
+ else:
+ return attn_output, attn_output_weights
+
+
+# Keep backward compatibility alias
+MultiheadAttentionWrapper = MultiheadAttention
class DotProductScoring(torch.nn.Module):
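The mask handling in `multi_head_attention_forward` above expands `key_padding_mask` across heads, merges it with `attn_mask` (logical OR for boolean masks, `-inf` fill for float masks), and converts the result into an additive float mask before calling `F.scaled_dot_product_attention`. The dependency-free sketch below illustrates the boolean path only. The helpers `merge_masks` and `masked_softmax` are hypothetical and not part of the repository; the sketch assumes a 2D boolean `attn_mask` shared by the whole batch (the 3D per-head form is not covered), uses plain Python lists instead of tensors, and orders the output batch-major, head-minor to mirror the `view(bsz, 1, 1, src_len).expand(-1, num_heads, -1, -1).reshape(bsz * num_heads, 1, src_len)` step.

```python
import math
from typing import List

NEG_INF = float("-inf")


def merge_masks(
    attn_mask: List[List[bool]],
    key_padding_mask: List[List[bool]],
    num_heads: int,
) -> List[List[List[float]]]:
    """Build an additive float mask of shape [bsz * num_heads][tgt_len][src_len].

    attn_mask: [tgt_len][src_len], True means "this query may not attend to this key".
    key_padding_mask: [bsz][src_len], True means "this key is padding".
    Entries are 0.0 where attention is allowed and -inf where it is blocked.
    """
    merged = []
    for padded_keys in key_padding_mask:  # one entry per batch element
        for _ in range(num_heads):  # batch-major, head-minor ordering
            merged.append(
                [
                    [
                        NEG_INF if (blocked or padded) else 0.0
                        for blocked, padded in zip(row, padded_keys)
                    ]
                    for row in attn_mask
                ]
            )
    return merged


def masked_softmax(scores: List[float], additive_mask: List[float]) -> List[float]:
    """Softmax over keys after adding the mask; -inf entries get zero weight.

    This sketch assumes at least one key per row is unmasked
    (a fully masked row would produce NaN here).
    """
    shifted = [s + m for s, m in zip(scores, additive_mask)]
    peak = max(shifted)
    exps = [math.exp(s - peak) for s in shifted]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    attn_mask = [[False, False, True]]  # tgt_len=1: the query may not see key 2
    key_padding_mask = [[False, True, False], [False, False, False]]  # bsz=2
    merged = merge_masks(attn_mask, key_padding_mask, num_heads=2)
    print(len(merged))  # 4 == bsz * num_heads
    print(merged[0][0])  # [0.0, -inf, -inf]
    print(merged[2][0])  # [0.0, 0.0, -inf]
    print(masked_softmax([1.0, 2.0, 3.0], merged[0][0]))  # [1.0, 0.0, 0.0]
```

Converting everything to one additive float mask lets the same tensor be summed with `attn_bias` and passed to any of the attention backends, which is why the boolean case is normalized first.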
diff --git a/sam3/model/multiplex_mask_decoder.py b/sam3/model/multiplex_mask_decoder.py
new file mode 100644
index 0000000..d1f4481
--- /dev/null
+++ b/sam3/model/multiplex_mask_decoder.py
@@ -0,0 +1,470 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+
+from typing import List, Optional, Type
+
+import torch
+from sam3.sam.common import LayerNorm2d
+from torch import nn
+from torch.nn import functional as F
+
+
+class MultiplexMaskDecoder(nn.Module):
+ def __init__(
+ self,
+ *,
+ transformer_dim: int,
+ transformer: nn.Module,
+ multiplex_count: int,
+ num_multimask_outputs: int = 3,
+ activation: Type[nn.Module] = nn.GELU,
+ iou_head_depth: int = 3,
+ iou_head_hidden_dim: int = 256,
+ use_high_res_features: bool = False,
+ iou_prediction_use_sigmoid: bool = False,
+ dynamic_multimask_via_stability=False,
+ dynamic_multimask_stability_delta=0.05,
+ dynamic_multimask_stability_thresh=0.98,
+ pred_obj_scores: bool = False,
+ pred_obj_scores_mlp: bool = False,
+ use_multimask_token_for_obj_ptr: bool = False,
+ decode_mask_with_shared_tokens: bool = False,
+ decode_mask_attribute_with_shared_tokens: bool = False,
+ multimask_outputs_only: bool = False,
+ ) -> None:
+ """
+ Predicts masks given image embeddings (and optional per-object embeddings), using a
+ transformer architecture that decodes up to multiplex_count objects jointly.
+
+ Arguments:
+ multiplex_count: the number of masks multiplexed into a single feature map
+ num_multimask_outputs: the number of masks to predict per multiplex output
+ (the total number of masks is (num_multimask_outputs+1) * multiplex_count)
+ use_multimask_token_for_obj_ptr: whether to use multimask tokens for object pointers
+ decode_mask_with_shared_tokens: use a single mask token per object for all multimask outputs,
+ decoded by separate hypernetwork MLPs (requires multimask_outputs_only=True)
+ decode_mask_attribute_with_shared_tokens: predict IoU and object scores from the mask tokens instead of dedicated IoU/object-score tokens
+ multimask_outputs_only: predict num_multimask_outputs masks without the single
+ mask output token (i.e., without the +1)
+ """
+ super().__init__()
+ self.transformer_dim = transformer_dim
+ self.transformer = transformer
+
+ self.multiplex_count = multiplex_count
+ self.num_multimask_outputs = num_multimask_outputs
+ self.multimask_outputs_only = multimask_outputs_only
+ self.decode_mask_with_shared_tokens = decode_mask_with_shared_tokens
+ self.decode_mask_attribute_with_shared_tokens = (
+ decode_mask_attribute_with_shared_tokens
+ )
+
+ if self.decode_mask_with_shared_tokens:
+ assert multimask_outputs_only, (
+ "multimask_outputs_only must be True if decode_mask_with_shared_tokens"
+ )
+
+ if self.multimask_outputs_only:
+ self.num_mask_output_per_object = num_multimask_outputs
+ else:
+ self.num_mask_output_per_object = num_multimask_outputs + 1
+
+ if self.decode_mask_with_shared_tokens:
+ self.num_mask_tokens = multiplex_count
+ else:
+ self.num_mask_tokens = multiplex_count * self.num_mask_output_per_object
+
+ self.pred_obj_scores = pred_obj_scores
+ self.use_multimask_token_for_obj_ptr = use_multimask_token_for_obj_ptr
+
+ if not self.decode_mask_attribute_with_shared_tokens:
+ self.iou_token = nn.Embedding(multiplex_count, transformer_dim)
+ if self.pred_obj_scores:
+ self.obj_score_token = nn.Embedding(multiplex_count, transformer_dim)
+
+ self.mask_tokens = nn.Embedding(self.num_mask_tokens, transformer_dim)
+
+ self.output_upscaling = nn.Sequential(
+ nn.ConvTranspose2d(
+ transformer_dim, transformer_dim // 4, kernel_size=2, stride=2
+ ),
+ LayerNorm2d(transformer_dim // 4),
+ activation(),
+ nn.ConvTranspose2d(
+ transformer_dim // 4, transformer_dim // 8, kernel_size=2, stride=2
+ ),
+ activation(),
+ )
+ self.use_high_res_features = use_high_res_features
+ if use_high_res_features:
+ self.conv_s0 = nn.Conv2d(
+ transformer_dim, transformer_dim // 8, kernel_size=1, stride=1
+ )
+ self.conv_s1 = nn.Conv2d(
+ transformer_dim, transformer_dim // 4, kernel_size=1, stride=1
+ )
+
+ if self.num_multimask_outputs == 0:
+ self.output_hypernetworks_mlp = MLP(
+ transformer_dim, transformer_dim, transformer_dim // 8, 3
+ )
+ else:
+ self.output_hypernetworks_mlps = nn.ModuleList(
+ [
+ MLP(transformer_dim, transformer_dim, transformer_dim // 8, 3)
+ for _ in range(self.num_mask_output_per_object)
+ ]
+ )
+
+ self.iou_prediction_head = MLP(
+ transformer_dim,
+ iou_head_hidden_dim,
+ (
+ 1
+ if (
+ self.decode_mask_attribute_with_shared_tokens
+ and not self.decode_mask_with_shared_tokens
+ )
+ else self.num_mask_output_per_object
+ ),
+ iou_head_depth,
+ sigmoid_output=iou_prediction_use_sigmoid,
+ )
+
+ if self.pred_obj_scores:
+ self.pred_obj_score_head = nn.Linear(transformer_dim, 1)
+ if pred_obj_scores_mlp:
+ self.pred_obj_score_head = MLP(transformer_dim, transformer_dim, 1, 3)
+
+ # When outputting a single mask, optionally we can dynamically fall back to the best
+ # multimask output token if the single mask output token gives low stability scores.
+ self.dynamic_multimask_via_stability = dynamic_multimask_via_stability
+ self.dynamic_multimask_stability_delta = dynamic_multimask_stability_delta
+ self.dynamic_multimask_stability_thresh = dynamic_multimask_stability_thresh
+
+ def forward(
+ self,
+ image_embeddings: torch.Tensor,
+ image_pe: torch.Tensor,
+ multimask_output: bool,
+ high_res_features: Optional[List[torch.Tensor]] = None,
+ extra_per_object_embeddings: Optional[torch.Tensor] = None,
+ ) -> dict[str, torch.Tensor]:
+ """
+ Predict masks given image embeddings and optional per-object embeddings.
+
+ Arguments:
+ image_embeddings (torch.Tensor): the embeddings from the image encoder
+ image_pe (torch.Tensor): positional encoding with the shape of image_embeddings
+ extra_per_object_embeddings (torch.Tensor): a tensor with shape b * multiplex_count * C to be added to the mask tokens
+
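+ Returns: a dict of Tensors indexed by strings (M = multiplex_count; K = num_multimask_outputs if multimask_output else 1)
+ masks: batched predicted mask logits, [B, M, K, H, W]
+ iou_pred: batched predictions of mask quality, [B, M, K]; sam_tokens_out: output mask tokens used for object pointers
+ object_score_logits: batched predictions of object existence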
+ """
+
+ if self.num_multimask_outputs <= 0:
+ assert not multimask_output, (
+ f"multimask_output must be False with {self.num_multimask_outputs=}"
+ )
+
+ if self.multimask_outputs_only:
+ assert multimask_output, (
+ f"multimask_output must be True with {self.multimask_outputs_only=}"
+ )
+
+ out = self.predict_masks(
+ image_embeddings=image_embeddings,
+ image_pe=image_pe,
+ high_res_features=high_res_features,
+ extra_per_object_embeddings=extra_per_object_embeddings,
+ )
+
+ masks = out["masks"] # [B, M, self.num_mask_output_per_object, H, W]
+ iou_pred = out["iou_pred"] # [B, M, self.num_mask_output_per_object]
+ mask_tokens_out = out[
+ "mask_tokens_out"
+ ] # [B, M, self.num_mask_output_per_object (1 with shared mask tokens), C]
+
+ # Select the correct mask or masks for output
+ if multimask_output:
+ if not self.multimask_outputs_only:
+ masks = masks[:, :, 1:, :, :]
+ iou_pred = iou_pred[:, :, 1:]
+ elif self.dynamic_multimask_via_stability and not self.training:
+ masks, iou_pred = self._dynamic_multimask_via_stability(masks, iou_pred)
+ else:
+ masks = masks[:, :, 0:1, :, :]
+ iou_pred = iou_pred[:, :, 0:1]
+
+ if multimask_output and self.use_multimask_token_for_obj_ptr:
+ if self.multimask_outputs_only:
+ sam_tokens_out = mask_tokens_out
+ else:
+ sam_tokens_out = mask_tokens_out[
+ :, :, 1:
+ ] # [B, M, num_multimask_outputs, C] shape
+ else:
+ # Take the mask output token. Here we *always* use the token for single mask output.
+ # At test time, even if we track after 1-click (and using multimask_output=True),
+ # we still take the single mask token here. The rationale is that we always track
+ # after multiple clicks during training, so the past tokens seen during training
+ # are always the single mask token (and we'll let it be the object-memory token).
+ sam_tokens_out = mask_tokens_out[:, :, 0:1] # [B, M, 1, C] shape
+
+ del out["mask_tokens_out"]
+ out["masks"] = masks
+ out["iou_pred"] = iou_pred
+ out["sam_tokens_out"] = sam_tokens_out
+
+ if multimask_output:
+ assert masks.shape[2] == self.num_multimask_outputs, (
+ f"{masks.shape=}, {self.num_multimask_outputs=}"
+ )
+ assert iou_pred.shape[2] == self.num_multimask_outputs, (
+ f"{iou_pred.shape=}, {self.num_multimask_outputs=}"
+ )
+ if self.use_multimask_token_for_obj_ptr:
+ if self.decode_mask_with_shared_tokens:
+ assert sam_tokens_out.shape[2] == 1, f"{sam_tokens_out.shape=}"
+ else:
+ assert sam_tokens_out.shape[2] == self.num_multimask_outputs, (
+ f"{sam_tokens_out.shape=}, {self.num_multimask_outputs=}"
+ )
+ else:
+ assert masks.shape[2] == 1, f"{masks.shape=}"
+ assert iou_pred.shape[2] == 1, f"{iou_pred.shape=}"
+ assert sam_tokens_out.shape[2] == 1, f"{sam_tokens_out.shape=}"
+
+ return out
+
+ def predict_masks(
+ self,
+ image_embeddings: torch.Tensor,
+ image_pe: torch.Tensor,
+ high_res_features: Optional[List[torch.Tensor]] = None,
+ extra_per_object_embeddings: Optional[
+ torch.Tensor
+ ] = None, # num_buckets, multiplex_count, C
+ ) -> dict[str, torch.Tensor]:
+ """Predicts masks. See 'forward' for more details."""
+ # Concatenate output tokens
+ B = image_embeddings.shape[0]
+ token_list = []
+ if self.pred_obj_scores and not self.decode_mask_attribute_with_shared_tokens:
+ token_list.append(self.obj_score_token.weight)
+ if not self.decode_mask_attribute_with_shared_tokens:
+ token_list.append(self.iou_token.weight)
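+ # torch.cat() rejects an empty list, so use an empty [0, C] tensor when only mask tokens are used
+ tokens = torch.cat(token_list, dim=0) if token_list else self.mask_tokens.weight[:0]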
+ tokens = tokens.unsqueeze(0).expand(B, -1, -1)
+
+ if extra_per_object_embeddings is not None:
+ mask_tokens = self.mask_tokens.weight.view(
+ 1, self.multiplex_count, self.num_mask_tokens // self.multiplex_count, -1
+ ).expand(B, -1, -1, -1)
+
+ mask_tokens = mask_tokens + extra_per_object_embeddings.unsqueeze(2)
+ mask_tokens = mask_tokens.flatten(1, 2)
+ else:
+ mask_tokens = self.mask_tokens.weight.unsqueeze(0).expand(B, -1, -1)
+
+ tokens = torch.cat([tokens, mask_tokens], dim=1)
+
+ src = image_embeddings
+
+ assert image_pe.size(0) == 1, (
+ "image_pe should have size 1 in batch dim (from `get_dense_pe()`)"
+ )
+ pos_src = torch.repeat_interleave(image_pe, tokens.shape[0], dim=0)
+ b, c, h, w = src.shape
+
+ # Run the transformer
+ hs, src = self.transformer(src, pos_src, tokens)
+
+ # Parse transformer outputs based on token sharing configuration
+ if self.decode_mask_attribute_with_shared_tokens:
+ assert hs.shape[1] == self.num_mask_tokens, (
+ f"{hs.shape=}, {self.num_mask_tokens=}"
+ )
+ iou_token_out = mask_tokens_out = hs[:, 0 : self.num_mask_tokens]
+ if self.pred_obj_scores:
+ obj_score_token_out = mask_tokens_out
+ else:
+ # Separate tokens for each prediction type
+ s = 0
+ if self.pred_obj_scores:
+ obj_score_token_out = hs[:, s : s + self.multiplex_count, :]
+ s += self.multiplex_count
+
+ iou_token_out = hs[:, s : s + self.multiplex_count, :]
+ s += self.multiplex_count
+ mask_tokens_out = hs[:, s : s + self.num_mask_tokens, :]
+ assert hs.shape[1] == s + self.num_mask_tokens, (
+ f"{hs.shape=}, {s=}, {self.num_mask_tokens=}"
+ )
+
+ # Upscale mask embeddings and predict masks using the mask tokens
+ src = src.transpose(1, 2).view(b, c, h, w)
+ if not self.use_high_res_features:
+ upscaled_embedding = self.output_upscaling(src)
+ else:
+ dc1, ln1, act1, dc2, act2 = self.output_upscaling
+ feat_s0, feat_s1 = high_res_features
+ upscaled_embedding = act1(ln1(dc1(src) + feat_s1))
+ upscaled_embedding = act2(dc2(upscaled_embedding) + feat_s0)
+
+ if self.decode_mask_with_shared_tokens:
+ mask_tokens_out = mask_tokens_out.view(B, self.multiplex_count, 1, -1)
+ else:
+ mask_tokens_out = mask_tokens_out.view(
+ B, self.multiplex_count, self.num_mask_output_per_object, -1
+ )
+ if self.num_multimask_outputs == 0:
+ hyper_in = self.output_hypernetworks_mlp(
+ mask_tokens_out[:, :, 0, :]
+ ).unsqueeze(2) # [B, M, 1, C]
+ else:
+ hyper_in_list: List[torch.Tensor] = []
+ for i in range(self.num_mask_output_per_object):
+ if self.decode_mask_with_shared_tokens:
+ hyper_in_list.append(
+ self.output_hypernetworks_mlps[i](mask_tokens_out[:, :, 0, :])
+ )
+ else:
+ hyper_in_list.append(
+ self.output_hypernetworks_mlps[i](mask_tokens_out[:, :, i, :])
+ )
+ # hyper_in: [B, M, self.num_mask_output_per_object, C]
+ hyper_in = torch.stack(hyper_in_list, dim=2)
+
+ # generate the masks
+ b, c, h, w = upscaled_embedding.shape
+ masks = torch.bmm(
+ hyper_in.flatten(1, 2), upscaled_embedding.view(b, c, h * w)
+ ).view(b, self.multiplex_count, self.num_mask_output_per_object, h, w)
+
+ # Generate mask quality predictions with shape [B, M, self.num_mask_output_per_object]
+ iou_pred = self.iou_prediction_head(iou_token_out).view(
+ b, self.multiplex_count, self.num_mask_output_per_object
+ )
+
+ if self.pred_obj_scores:
+ # Generate object score logits, one per object (summed over the mask tokens when attributes share them)
+ if (
+ self.decode_mask_attribute_with_shared_tokens
+ and not self.decode_mask_with_shared_tokens
+ ):
+ object_score_logits = (
+ self.pred_obj_score_head(obj_score_token_out)
+ .view(b, self.multiplex_count, self.num_mask_output_per_object)
+ .sum(-1, keepdim=True)
+ )
+ else:
+ object_score_logits = self.pred_obj_score_head(obj_score_token_out)
+ else:
+ # Obj score logits default to 10.0 (object assumed present, since sigmoid(10) is ~1), shaped [B, M, 1] like pred_obj_score_head's output
+ object_score_logits = 10.0 * iou_pred.new_ones(
+ iou_pred.shape[0], iou_pred.shape[1], 1
+ )
+
+ outputs = {
+ "masks": masks,
+ "iou_pred": iou_pred,
+ "mask_tokens_out": mask_tokens_out,
+ "object_score_logits": object_score_logits,
+ }
+
+ return outputs
+
+ def _get_stability_scores(self, mask_logits):
+ """
+ Compute stability scores of the mask logits based on the IoU between upper and
+ lower thresholds.
+ """
+ mask_logits = mask_logits.flatten(-2)
+ stability_delta = self.dynamic_multimask_stability_delta
+ area_i = torch.sum(mask_logits > stability_delta, dim=-1).float()
+ area_u = torch.sum(mask_logits > -stability_delta, dim=-1).float()
+ stability_scores = torch.where(area_u > 0, area_i / area_u, 1.0)
+ return stability_scores
+
+ def _dynamic_multimask_via_stability(self, all_mask_logits, all_iou_scores):
+ """
+ When outputting a single mask, if the stability score from the current single-mask
+ output (based on output token 0) falls below a threshold, we instead select from
+ multi-mask outputs (based on output token 1~3) the mask with the highest predicted
+ IoU score. This is intended to ensure a valid mask for both clicking and tracking.
+ """
+ # first, flatten the batch and the multiplex dimensions
+ B, M = all_mask_logits.shape[:2]
+ all_mask_logits = all_mask_logits.flatten(0, 1)
+ all_iou_scores = all_iou_scores.flatten(0, 1)
+
+ # The best mask from multimask output tokens (1~3)
+ multimask_logits = all_mask_logits[:, 1:, :, :]
+ multimask_iou_scores = all_iou_scores[:, 1:]
+ best_scores_inds = torch.argmax(multimask_iou_scores, dim=-1)
+ batch_inds = torch.arange(
+ multimask_iou_scores.size(0), device=all_iou_scores.device
+ )
+ best_multimask_logits = multimask_logits[batch_inds, best_scores_inds]
+ best_multimask_logits = best_multimask_logits.unsqueeze(1)
+ best_multimask_iou_scores = multimask_iou_scores[batch_inds, best_scores_inds]
+ best_multimask_iou_scores = best_multimask_iou_scores.unsqueeze(1)
+
+ # The mask from singlemask output token 0 and its stability score
+ singlemask_logits = all_mask_logits[:, 0:1, :, :]
+ singlemask_iou_scores = all_iou_scores[:, 0:1]
+ stability_scores = self._get_stability_scores(singlemask_logits)
+ is_stable = stability_scores >= self.dynamic_multimask_stability_thresh
+
+ # Dynamically fall back to best multimask output upon low stability scores.
+ mask_logits_out = torch.where(
+ is_stable[..., None, None].expand_as(singlemask_logits),
+ singlemask_logits,
+ best_multimask_logits,
+ )
+ iou_scores_out = torch.where(
+ is_stable.expand_as(singlemask_iou_scores),
+ singlemask_iou_scores,
+ best_multimask_iou_scores,
+ )
+
+ # restore the batch and multiplex dimensions
+ mask_logits_out = mask_logits_out.unflatten(0, (B, M))
+ iou_scores_out = iou_scores_out.unflatten(0, (B, M))
+
+ return mask_logits_out, iou_scores_out
+
+
+# Lightly adapted from
+# https://github.com/facebookresearch/MaskFormer/blob/main/mask_former/modeling/transformer/transformer_predictor.py # noqa
+class MLP(nn.Module):
+ def __init__(
+ self,
+ input_dim: int,
+ hidden_dim: int,
+ output_dim: int,
+ num_layers: int,
+ sigmoid_output: bool = False,
+ ) -> None:
+ super().__init__()
+ self.num_layers = num_layers
+ h = [hidden_dim] * (num_layers - 1)
+ self.layers = nn.ModuleList(
+ nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim])
+ )
+ self.sigmoid_output = sigmoid_output
+
+ def forward(self, x):
+ for i, layer in enumerate(self.layers):
+ x = F.relu(layer(x)) if i < self.num_layers - 1 else layer(x)
+ if self.sigmoid_output:
+ x = F.sigmoid(x)
+ return x
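The single-mask fallback implemented by `_get_stability_scores` and `_dynamic_multimask_via_stability` in `MultiplexMaskDecoder` above can be traced by hand with the dependency-free sketch below. It is illustrative only: the helper names `stability_score` and `select_single_mask` are hypothetical, the sketch handles one object at a time with flat Python lists instead of `[B * M, K, H, W]` tensors, and it uses the constructor defaults `dynamic_multimask_stability_delta=0.05` and `dynamic_multimask_stability_thresh=0.98`. As in the decoder, index 0 is the single-mask output token and indices 1 onward are the multimask tokens.

```python
from typing import List, Tuple


def stability_score(mask_logits: List[float], delta: float = 0.05) -> float:
    """IoU between the masks thresholded at +delta and at -delta.

    For delta >= 0 the +delta mask is a subset of the -delta mask, so their
    intersection is the +delta area and their union is the -delta area.
    An empty union counts as perfectly stable (1.0), as in the decoder.
    """
    area_i = sum(1 for x in mask_logits if x > delta)
    area_u = sum(1 for x in mask_logits if x > -delta)
    return area_i / area_u if area_u > 0 else 1.0


def select_single_mask(
    all_mask_logits: List[List[float]],
    all_iou_scores: List[float],
    delta: float = 0.05,
    thresh: float = 0.98,
) -> Tuple[List[float], float]:
    """Keep output 0 if it is stable; otherwise fall back to the multimask
    output (index >= 1) with the highest predicted IoU."""
    if stability_score(all_mask_logits[0], delta) >= thresh:
        return all_mask_logits[0], all_iou_scores[0]
    # max() returns the first index on ties, like torch.argmax
    best = max(range(1, len(all_iou_scores)), key=lambda i: all_iou_scores[i])
    return all_mask_logits[best], all_iou_scores[best]


if __name__ == "__main__":
    stable = [5.0, 5.0, -5.0, -5.0]
    unstable = [0.01, 0.02, 5.0, -5.0]  # two pixels fall between -delta and +delta
    print(stability_score(stable))  # 1.0
    print(stability_score(unstable))  # 0.3333333333333333
    candidates = [unstable, [1.0, -1.0, 1.0, -1.0], [2.0, 2.0, 2.0, -2.0], [3.0, -3.0, -3.0, -3.0]]
    print(select_single_mask(candidates, [0.5, 0.7, 0.9, 0.8]))  # ([2.0, 2.0, 2.0, -2.0], 0.9)
```

A mask whose logits hover near zero changes shape under small threshold shifts, so its stability score drops; the fallback then trusts the predicted IoU of the multimask outputs instead.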
diff --git a/sam3/model/multiplex_utils.py b/sam3/model/multiplex_utils.py
new file mode 100644
index 0000000..73f5866
--- /dev/null
+++ b/sam3/model/multiplex_utils.py
@@ -0,0 +1,538 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
+
+# pyre-unsafe
+
+import logging
+import math
+from typing import Optional
+
+import torch
+from torch import nn
+
+# Special values for object tracking
+_PADDING_NUM = -1 # Marks empty slots in buckets
+_REMOVED_NUM = -1116 # Marks objects that have been removed
+
+
+logger = logging.getLogger(__name__)
+
+
+class MultiplexState:
+ """
+ MultiplexState records the state of multiplexing, for one or more buckets.
+
+ At a high level, we deal with the conversion of tensors between the data space (batch_size, num_channels, ...)
+ and the multiplex space (num_buckets, multiplex_count, num_channels, ...).
+
+ The multiplex state stores the assignments of each batch element to a slot in a bucket.
+ Each bucket has a fixed number of slots (multiplex_count), and not all slots need to be filled.
+ The batch size should equal total_valid_entries, i.e., the total number of valid entries across all buckets.
+
+ There are two main operations that this class supports:
+ mux: convert tensors in the data space to the multiplex space.
+ The mental model is that we start from a tensor of zeros that has the shape of the output,
+ then we go through the valid entries and place them into the corresponding slots, indicated by the assignments.
+
+ demux: convert tensors in the multiplex space to the data space.
+ This is the reverse operation of mux. Note that zeros were used in mux for the padding slots,
+ and that those slots are ignored in demux.
+
+ There are also two utility functions for object management:
+ add_objects: add new objects to the state by filling in empty slots
+ remove_objects: remove objects from the state by marking them as removed (not the same as empty!)
+ """
+
+ def __init__(
+ self,
+ assignments: list[list[int]],
+ device: torch.device,
+ dtype: torch.dtype,
+ allowed_bucket_capacity: int,
+ *,
+ object_ids: Optional[list[int]] = None,
+ ):
+ """
+ assignments: a list of lists of object indices. Each top-level list is a bucket, and
+ each inner list holds the object indices assigned to that bucket's slots.
+ Object indices must range from 0 to num_valid_entries - 1, except for these special (negative) values:
+ _PADDING_NUM, which denotes a padding (empty) slot
+ _REMOVED_NUM, which denotes a pre-existing object that was removed (currently not used during init)
+ allowed_bucket_capacity: the maximum number of non-padding entries allowed in each bucket
+ object_ids: optional "true" object IDs (e.g., during inference), one per valid entry, for bookkeeping
+ """
+ self.device = device
+ self.dtype = dtype
+
+ # Initialize bucket assignments and precompute matrices
+ self.allowed_bucket_capacity = allowed_bucket_capacity
+ self._initialize_assignments(assignments, object_ids=object_ids)
+
+ def _initialize_assignments(
+ self, assignments: list[list[int]], *, object_ids: Optional[list[int]] = None
+ ):
+ self.assignments = assignments
+ self.num_buckets = len(self.assignments)
+ if self.num_buckets == 0:
+ logger.error("No buckets found in the state")
+ raise ValueError("No buckets found in the state")
+
+ self.multiplex_count = len(self.assignments[0])
+ assert all(
+ len(self.assignments[i]) == self.multiplex_count
+ for i in range(self.num_buckets)
+ )
+
+ # number of non-negative elements in the state
+ self.total_valid_entries = sum(
+ sum(1 for x in bucket if x >= 0) for bucket in self.assignments
+ )
+ self.total_non_padding_entries = sum(
+ sum(1 for x in bucket if x != _PADDING_NUM) for bucket in self.assignments
+ )
+
+ # check the validity of the object IDs
+ self.object_ids = object_ids
+ if self.object_ids is not None:
+ assert len(self.object_ids) == self.total_valid_entries, (
+ "object_ids should map 1:1 to the valid entries"
+ )
+
+ # check the validity of the assignments
+ all_object_idxs = set()
+ for bucket in self.assignments:
+ valid_entries_in_bucket = sum(1 for x in bucket if x != _PADDING_NUM)
+ assert valid_entries_in_bucket <= self.allowed_bucket_capacity, (
+ f"{valid_entries_in_bucket=} > {self.allowed_bucket_capacity=}"
+ )
+ for obj_idx in bucket:
+ if obj_idx >= 0:
+ assert obj_idx < self.total_non_padding_entries, (
+ f"object index {obj_idx} >= {self.total_non_padding_entries}"
+ )
+ assert obj_idx not in all_object_idxs, "object indices must be unique"
+ all_object_idxs.add(obj_idx)
+
+ # Precompute and cache the actual selection matrices
+ self._precompute_transition_matrices(self.device, self.dtype)
+
+ @property
+ def available_slots(self) -> int:
+ # returns the number of available slots in the state
+ return (
+ self.num_buckets * self.allowed_bucket_capacity
+ - self.total_non_padding_entries
+ )
+
+ def find_next_batch_of_available_indices(
+ self,
+ num_objects: int,
+ *,
+ allow_new_buckets: bool = False,
+ prefer_new_buckets: bool = False,
+ ) -> list[int]:
+ # produce a list of consecutive indices that are available in the state
+ # Note: prefer_new_buckets parameter is accepted for API compatibility but not used here
+ # as the actual bucket allocation logic is in add_objects()
+ assert num_objects > 0, f"{num_objects=} must be positive"
+ if not allow_new_buckets:
+ assert self.available_slots >= num_objects, (
+ f"not enough available slots {self.available_slots} < {num_objects}"
+ )
+
+ return list(
+ range(
+ self.total_valid_entries,
+ self.total_valid_entries + num_objects,
+ )
+ )
+
+ def add_objects(
+ self,
+ object_indices: list[int],
+ *,
+ object_ids: Optional[list[int]] = None,
+ allow_new_buckets: bool = False,
+ prefer_new_buckets: bool = False,
+ ):
+ """
+ Add new objects to the state by filling in empty slots and
+ creating new buckets if necessary.
+
+ object_indices must be sorted and follow existing object indices.
+ If prefer_new_buckets is True, we skip filling existing slots and place
+ the objects into freshly created buckets (requires allow_new_buckets=True).
+ """
+ if len(object_indices) == 0:
+ return
+
+ # copy so that the in-place pops below do not mutate the caller's list
+ object_indices = object_indices.copy()
+ assert (object_ids is None) == (self.object_ids is None), (
+ "object_ids must either be always given or always omitted"
+ )
+
+ if object_ids is not None:
+ assert len(object_ids) == len(object_indices), (
+ "object_ids must have the same length as object_indices"
+ )
+ object_ids = object_ids.copy()
+
+ num_new_objects = len(object_indices)
+ assert object_indices == sorted(object_indices), "object_indices must be sorted"
+ object_indices.reverse() # reverse so we can pop from the end
+ if object_ids is not None:
+ object_ids.reverse()
+
+ if prefer_new_buckets:
+ assert allow_new_buckets, "prefer_new_buckets requires allow_new_buckets"
+
+ slots_filled = 0
+ buckets_created = 0
+
+ def _pop_next():
+ idx = object_indices.pop()
+ if object_ids is not None and self.object_ids is not None:
+ self.object_ids.append(object_ids.pop())
+ return idx
+
+ if not prefer_new_buckets:
+ # Fill empty slots in existing buckets first
+ for bucket in self.assignments:
+ for i in range(self.allowed_bucket_capacity):
+ if bucket[i] == _PADDING_NUM:
+ bucket[i] = _pop_next()
+ slots_filled += 1
+ if len(object_indices) == 0:
+ break
+ if len(object_indices) == 0:
+ break
+
+ if len(object_indices) > 0 and not allow_new_buckets:
+ raise ValueError(
+ f"Cannot place objects {list(reversed(object_indices))} without creating new buckets"
+ )
+
+ # Create new buckets for remaining objects (or all objects if prefer_new_buckets)
+ while len(object_indices) > 0:
+ new_bucket = [_PADDING_NUM] * self.multiplex_count
+ for i in range(self.allowed_bucket_capacity):
+ if len(object_indices) == 0:
+ break
+ new_bucket[i] = _pop_next()
+ self.assignments.append(new_bucket)
+ buckets_created += 1
+
+ # reinitialize all the settings
+ original_num_entries = self.total_valid_entries
+ self._initialize_assignments(self.assignments, object_ids=self.object_ids)
+ assert self.total_valid_entries == original_num_entries + num_new_objects, (
+ f"{self.total_valid_entries=} != {original_num_entries=} + {num_new_objects=}"
+ )
+
+ logger.info(
+ f"Filled {slots_filled} slots and created {buckets_created} new buckets"
+ )
+ logger.info(
+ f"{self.num_buckets=}, {self.total_valid_entries=}, {self.total_non_padding_entries=}"
+ )
+
+ def remove_objects(self, object_indices: list[int], strict: bool = True):
+ """
+ Remove objects from the state by marking them as removed.
+ Remove a bucket if all objects in the bucket are removed.
+
+ Args:
+ object_indices: List of object indices to remove
+ strict: If True, will raise an error if any object indices are not found in the state
+
+ Returns:
+ List of bucket indices that we are going to keep
+ """
+ object_indices = object_indices.copy()
+
+ # Mark objects as removed in assignments
+ for bucket_idx, bucket in enumerate(self.assignments):
+ for slot_idx, obj_id in enumerate(bucket):
+ if obj_id in object_indices:
+ self.assignments[bucket_idx][slot_idx] = _REMOVED_NUM
+ object_indices.remove(obj_id)
+
+ if strict:
+ assert len(object_indices) == 0, (
+ f"Failed to remove objects: {object_indices}"
+ )
+
+ # Check which buckets should be completely removed (all objects removed/paddings)
+ # and which buckets we will keep
+ buckets_to_remove = []
+ buckets_to_keep = []
+ for bucket_idx, bucket in enumerate(self.assignments):
+ # Check if all objects in this bucket are removed or are paddings
+ all_removed = all(
+ obj_id in [_PADDING_NUM, _REMOVED_NUM] for obj_id in bucket
+ )
+ if all_removed:
+ buckets_to_remove.append(bucket_idx)
+ logger.info(
+ f"Bucket {bucket_idx} marked for removal - all objects removed/paddings"
+ )
+ else:
+ buckets_to_keep.append(bucket_idx)
+
+ # Remove buckets in reverse order to maintain correct indices
+ for bucket_idx in reversed(buckets_to_remove):
+ del self.assignments[bucket_idx]
+
+ if len(buckets_to_keep) == 0:
+ logger.info(f"Removing all buckets: {buckets_to_remove}; state invalidated")
+ self.assignments = None
+ if self.object_ids is not None:
+ self.object_ids = []
+ return buckets_to_keep
+
+ # After removal, remap the surviving object indices to be contiguous (0..N-1)
+ # Collect all non-negative object indices and map them, in sorted order, to sequential indices
+ all_positive_ids = set()
+ for bucket in self.assignments:
+ for obj_id in bucket:
+ if obj_id >= 0:
+ all_positive_ids.add(obj_id)
+
+ # Create mapping from old IDs to new sequential IDs
+ sorted_ids = sorted(all_positive_ids)
+ id_mapping = {old_id: new_id for new_id, old_id in enumerate(sorted_ids)}
+
+ # Apply the mapping to assignments to make IDs sequential
+ for bucket in self.assignments:
+ for i, obj_id in enumerate(bucket):
+ if obj_id >= 0:
+ bucket[i] = id_mapping[obj_id]
+
+ # Update object_ids if they exist
+ if self.object_ids is not None:
+ # Create new object_ids array based on the remapped indices
+ # We need to preserve the original object_ids for the objects that weren't removed
+ new_object_ids = [None] * len(sorted_ids)
+
+ # Map the original object_ids to their new positions
+ for old_idx, new_idx in id_mapping.items():
+ new_object_ids[new_idx] = self.object_ids[old_idx]
+
+ assert not any(obj_id is None for obj_id in new_object_ids)
+ self.object_ids = new_object_ids
+
+ # Reinitialize the state to update all internal structures
+ self._initialize_assignments(self.assignments, object_ids=self.object_ids)
+
+ logger.info(f"Removed these buckets: {buckets_to_remove}")
+ logger.info(f"Kept these buckets: {buckets_to_keep}")
+ logger.info(
+ f"Remaining buckets: {self.num_buckets}, total valid entries: {self.total_valid_entries}"
+ )
+
+ return buckets_to_keep
+
+ def _precompute_transition_matrices(self, device: torch.device, dtype: torch.dtype):
+ """
+ Precompute the transition matrices for maximum efficiency.
+ Note that these should be partial permutation matrices.
+ """
+ # Create a transition matrix for muxing
+ self.mux_matrix = torch.zeros(
+ self.num_buckets * self.multiplex_count,
+ self.total_valid_entries,
+ device=device,
+ dtype=dtype,
+ )
+
+ # Create a transition matrix for demuxing
+ self.demux_matrix = torch.zeros(
+ self.total_valid_entries,
+ self.num_buckets * self.multiplex_count,
+ device=device,
+ dtype=dtype,
+ )
+
+ # Fill both matrices based on assignments
+ for i in range(self.num_buckets):
+ for j in range(self.multiplex_count):
+ # flat index into the (num_buckets * multiplex_count) slot dimension
+ slot_idx = i * self.multiplex_count + j
+ object_idx = self.assignments[i][j]
+ if object_idx >= 0:
+ self.mux_matrix[slot_idx, object_idx] = 1.0
+ self.demux_matrix[object_idx, slot_idx] = 1.0
+
+ def mux(self, x: torch.Tensor) -> torch.Tensor:
+ """
+ Multiplexing operation.
+ x: (total_valid_entries, ...)
+
+ Returns: (num_buckets, multiplex_count, ...),
+ with padding and removed slots filled with 0
+ """
+ num_valid_entries = x.shape[0]
+ assert num_valid_entries == self.total_valid_entries, (
+ f"{num_valid_entries=} != {self.total_valid_entries=}"
+ )
+ output_shape = (
+ self.num_buckets,
+ self.multiplex_count,
+ ) + x.shape[1:]
+
+ x_flat = x.reshape(num_valid_entries, -1)
+
+ # Apply mux matrix: (num_buckets * multiplex_count, total_valid_entries) @ (total_valid_entries, features)
+ # Result: (num_buckets * multiplex_count, features)
+ result_flat = self.mux_matrix @ x_flat
+
+ result = result_flat.view(output_shape)
+ return result
+
+ def demux(self, x: torch.Tensor) -> torch.Tensor:
+ """
+ Inverse operation of mux (padding and removed slots are dropped).
+ x: (num_buckets, multiplex_count, ...)
+ Returns: (total_valid_entries, ...)
+ """
+ num_buckets, multiplex_count = x.shape[:2]
+ assert num_buckets == self.num_buckets, f"{num_buckets=} != {self.num_buckets=}"
+ assert multiplex_count == self.multiplex_count, (
+ f"{multiplex_count=} != {self.multiplex_count=}"
+ )
+ output_shape = (self.total_valid_entries,) + x.shape[2:]
+
+ x_flat = x.reshape(num_buckets * multiplex_count, -1)
+
+ # Apply demux matrix: (total_valid_entries, num_buckets*multiplex_count) @ (num_buckets*multiplex_count, features)
+ # Result: (total_valid_entries, features)
+ result_flat = self.demux_matrix @ x_flat
+
+ result = result_flat.view(output_shape)
+ return result
+
+ def get_valid_object_mask(self) -> torch.Tensor:
+ """
+ Returns a (num_buckets, multiplex_count) boolean tensor that is True for valid entries and False for padding or removed entries
+ """
+ valid_mask = self.mux_matrix.sum(dim=1) > 0
+ valid_mask = valid_mask.reshape(self.num_buckets, self.multiplex_count)
+
+ return valid_mask
+
+ def get_all_valid_object_idx(self) -> set[int]:
+ """
+ Returns the set of all valid object indices in the state.
+ Note that these are the internal object indices,
+ not the arbitrary object IDs passed in during initialization
+ """
+ all_valid_objects = {
+ obj_idx for bucket in self.assignments for obj_idx in bucket if obj_idx >= 0
+ }
+ return all_valid_objects
+
+
+class MultiplexController(nn.Module):
+ def __init__(
+ self,
+ multiplex_count: int,
+ full_shuffle: bool = False,
+ eval_multiplex_count: int = -1,
+ ):
+ super().__init__()
+
+ self.multiplex_count = multiplex_count
+ self.full_shuffle = full_shuffle
+ if eval_multiplex_count < 0:
+ self.eval_multiplex_count = multiplex_count
+ else:
+ self.eval_multiplex_count = eval_multiplex_count
+ assert self.multiplex_count >= 1
+
+ @property
+ def allowed_bucket_capacity(self) -> int:
+ if self.training:
+ return self.multiplex_count
+ else:
+ return self.eval_multiplex_count
+
+ def get_state(
+ self,
+ num_valid_entries: int,
+ device: torch.device,
+ dtype: torch.dtype,
+ random: bool = True,
+ *,
+ object_ids: Optional[
+ list[int]
+ ] = None, # object_ids is an auxiliary field that we pass to the state unmodified
+ ) -> MultiplexState:
+ # returns a state that maps elements in the batch to buckets of size self.multiplex_count
+
+ allowed_bucket_capacity = self.allowed_bucket_capacity
+
+ # physical bucket size; at eval time only allowed_bucket_capacity of these slots may be filled
+ true_bucket_capacity = self.multiplex_count
+
+ num_buckets = math.ceil(num_valid_entries / allowed_bucket_capacity)
+ # each bucket holds at most allowed_bucket_capacity valid elements; padding elements are marked with _PADDING_NUM
+ # (without full_shuffle, padding within the allowed slots only occurs in the last bucket)
+
+ if self.full_shuffle:
+ # Shuffle all IDs, including the paddings
+ ids = torch.cat(
+ [
+ torch.arange(num_valid_entries, dtype=torch.long),
+ torch.tensor(
+ [_PADDING_NUM]
+ * (num_buckets * true_bucket_capacity - num_valid_entries),
+ dtype=torch.long,
+ ),
+ ],
+ dim=0,
+ )
+ if random:
+ indices = torch.randperm(ids.shape[0], dtype=torch.long)
+ ids = ids[indices]
+
+ # convert to a list of list
+ assignments = []
+ for i in range(num_buckets):
+ assignments.append(
+ ids[
+ i * true_bucket_capacity : (i + 1) * true_bucket_capacity
+ ].tolist()
+ )
+ else:
+ # Only shuffle the IDs within the first num_valid_entries slots; leave all paddings at the end
+ if random:
+ # randomly assign ids to buckets
+ ids = torch.randperm(num_valid_entries, dtype=torch.int64)
+ else:
+ ids = torch.arange(num_valid_entries)
+ # append _PADDING_NUM to make the length a multiple of allowed_bucket_capacity
+ total_elements = num_buckets * allowed_bucket_capacity
+ if ids.shape[0] < total_elements:
+ ids = torch.cat(
+ [
+ ids,
+ torch.tensor([_PADDING_NUM] * (total_elements - ids.shape[0])),
+ ]
+ )
+
+ # convert to a list of list
+ assignments = []
+ for i in range(num_buckets):
+ assignments.append(
+ ids[
+ i * allowed_bucket_capacity : (i + 1) * allowed_bucket_capacity
+ ].tolist()
+ + [_PADDING_NUM] * (true_bucket_capacity - allowed_bucket_capacity)
+ )
+
+ return MultiplexState(
+ assignments,
+ device,
+ dtype,
+ allowed_bucket_capacity=allowed_bucket_capacity,
+ object_ids=object_ids,
+ )
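The mux/demux contract of MultiplexState can be illustrated without torch. The sketch below is a minimal, hypothetical re-implementation in plain Python lists: `PADDING` stands in for `_PADDING_NUM` (its real value is not shown in this diff, so any negative number is assumed), and `build_mux_matrix`, `transpose` and `matmul` are illustrative helpers, not part of the module. It builds the same kind of partial permutation matrix as `_precompute_transition_matrices`, derives the valid-slot mask the way `get_valid_object_mask` does (non-zero rows), and checks that demuxing the muxed features recovers the input.

```python
# PADDING stands in for the module's _PADDING_NUM sentinel; the real value is
# not shown in this diff, so any negative number is assumed here.
PADDING = -1


def build_mux_matrix(assignments: list[list[int]], num_valid: int) -> list[list[float]]:
    """(num_buckets * multiplex_count) x num_valid 0/1 matrix, one row per flat slot."""
    multiplex_count = len(assignments[0])
    num_slots = len(assignments) * multiplex_count
    mux = [[0.0] * num_valid for _ in range(num_slots)]
    for b, bucket in enumerate(assignments):
        for j, obj_idx in enumerate(bucket):
            if obj_idx >= 0:  # padding (and removed) slots stay all-zero rows
                mux[b * multiplex_count + j][obj_idx] = 1.0
    return mux


def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*m)]


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    # (n x k) @ (k x f) -> (n x f)
    return [
        [sum(a[i][k] * b[k][f] for k in range(len(b))) for f in range(len(b[0]))]
        for i in range(len(a))
    ]


# Two buckets of width 2: slot order is [obj 2, obj 0 | obj 1, padding]
assignments = [[2, 0], [1, PADDING]]
x = [[10.0], [20.0], [30.0]]  # one feature per object index 0, 1, 2

mux = build_mux_matrix(assignments, num_valid=3)
demux = transpose(mux)  # the demux matrix is the transpose of the mux matrix
muxed = matmul(mux, x)  # [[30.0], [10.0], [20.0], [0.0]]
demuxed = matmul(demux, muxed)  # recovers x; the padding slot is ignored
valid_mask = [sum(row) > 0 for row in mux]  # [True, True, True, False]
```

Each row and each column holds at most one 1, so both products act as a gather or scatter. One plausible reason for expressing them as matrices is that a single dense matmul handles padding (all-zero rows) with no branching on the device.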
diff --git a/sam3/model/necks.py b/sam3/model/necks.py
index c60f87f..6db174e 100644
--- a/sam3/model/necks.py
+++ b/sam3/model/necks.py
@@ -9,6 +9,7 @@
import torch
import torch.nn as nn
+from sam3.model.data_misc import NestedTensor
class Sam3DualViTDetNeck(nn.Module):
@@ -124,3 +125,145 @@ def forward(
sam2_out.append(sam2_x_out)
sam2_pos.append(sam2_pos_out)
return sam3_out, sam3_pos, sam2_out, sam2_pos
+
+
+class Sam3TriViTDetNeck(nn.Module):
+ def __init__(
+ self,
+ trunk: nn.Module,
+ position_encoding: nn.Module,
+ d_model: int,
+ neck_norm=None,
+ scale_factors=(4.0, 2.0, 1.0),
+ ):
+ """
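+ SimpleFPN neck with three parallel heads (sam3, interactive, propagation) on a shared trunk.
+ Each head upsamples (4.0, 2.0), keeps (1.0) or downsamples (0.5) the last trunk feature map
+ according to scale_factors and projects it to d_model channels.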
+ """
+ super().__init__()
+ self.trunk = trunk
+ self.position_encoding = position_encoding
+ self.convs = nn.ModuleList()
+
+ self.scale_factors = scale_factors
+ use_bias = neck_norm is None
+ dim = self.trunk.channel_list[-1]
+
+ for scale in scale_factors:
+ current = nn.Sequential()
+
+ if scale == 4.0:
+ current.add_module(
+ "dconv_2x2_0",
+ nn.ConvTranspose2d(dim, dim // 2, kernel_size=2, stride=2),
+ )
+ current.add_module(
+ "gelu",
+ nn.GELU(),
+ )
+ current.add_module(
+ "dconv_2x2_1",
+ nn.ConvTranspose2d(dim // 2, dim // 4, kernel_size=2, stride=2),
+ )
+ out_dim = dim // 4
+ elif scale == 2.0:
+ current.add_module(
+ "dconv_2x2",
+ nn.ConvTranspose2d(dim, dim // 2, kernel_size=2, stride=2),
+ )
+ out_dim = dim // 2
+ elif scale == 1.0:
+ out_dim = dim
+ elif scale == 0.5:
+ current.add_module(
+ "maxpool_2x2",
+ nn.MaxPool2d(kernel_size=2, stride=2),
+ )
+ out_dim = dim
+ else:
+ raise NotImplementedError(f"scale_factor={scale} is not supported yet.")
+
+ current.add_module(
+ "conv_1x1",
+ nn.Conv2d(
+ in_channels=out_dim,
+ out_channels=d_model,
+ kernel_size=1,
+ bias=use_bias,
+ ),
+ )
+ current.add_module(
+ "conv_3x3",
+ nn.Conv2d(
+ in_channels=d_model,
+ out_channels=d_model,
+ kernel_size=3,
+ padding=1,
+ bias=use_bias,
+ ),
+ )
+ self.convs.append(current)
+
+ # The interactive and propagation heads start as exact copies of the sam3 head
+ self.interactive_convs = deepcopy(self.convs)
+ self.propagation_convs = deepcopy(self.convs)
+
+ def forward(
+ self,
+ tensor_list,
+ *,
+ need_sam3_out: bool = True,
+ need_interactive_out: bool = True,
+ need_propagation_out: bool = True,
+ ):
+ xs = self.trunk(tensor_list)
+ sam3_out = []
+ interactive_out = []
+ propagation_out = []
+
+ sam3_pos = []
+ interactive_pos = []
+ propagation_pos = []
+ x = xs[-1] # simpleFPN
+ # OSS trunk returns plain tensors; onevision trunk returns NestedTensors.
+ # Use getattr to handle both in a torch.compile-friendly way.
+ x_data = getattr(x, "tensors", x)
+ x_mask = getattr(x, "mask", None)
+ for conv, interactive_conv, propagation_conv in zip(
+ self.convs, self.interactive_convs, self.propagation_convs
+ ):
+ if need_sam3_out:
+ sam3_conv_out = conv(x_data)
+ sam3_x_out = NestedTensor(sam3_conv_out, x_mask)
+ sam3_out.append(sam3_x_out)
+ sam3_pos.append(
+ self.position_encoding(sam3_conv_out).to(sam3_conv_out.dtype)
+ )
+
+ if need_interactive_out:
+ interactive_conv_out_t = interactive_conv(x_data)
+ interactive_conv_out = NestedTensor(interactive_conv_out_t, x_mask)
+ interactive_out.append(interactive_conv_out)
+ interactive_pos.append(
+ self.position_encoding(interactive_conv_out_t).to(
+ interactive_conv_out_t.dtype
+ )
+ )
+
+ if need_propagation_out:
+ propagation_conv_out = propagation_conv(x_data)
+ propagation_x_out = NestedTensor(propagation_conv_out, x_mask)
+ propagation_out.append(propagation_x_out)
+ propagation_pos.append(
+ self.position_encoding(propagation_conv_out).to(
+ propagation_conv_out.dtype
+ )
+ )
+
+ return (
+ sam3_out,
+ sam3_pos,
+ interactive_out,
+ interactive_pos,
+ propagation_out,
+ propagation_pos,
+ )
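As a rough sanity check on the spatial sizes the Sam3TriViTDetNeck heads produce, the sketch below applies PyTorch's documented output-size formulas for ConvTranspose2d (kernel 2, stride 2, no padding) and MaxPool2d (kernel 2, stride 2) to each supported scale factor. The 72x72 input assumes a stride-14 trunk on a 1008px image, as stated in the position-encoding comment of this diff; `head_output_size` and the other helpers are hypothetical, not part of the module.

```python
def conv_transpose_out(n: int, kernel: int = 2, stride: int = 2) -> int:
    # PyTorch ConvTranspose2d size with padding=0, output_padding=0, dilation=1
    return (n - 1) * stride + kernel


def maxpool_out(n: int, kernel: int = 2, stride: int = 2) -> int:
    # PyTorch MaxPool2d size with padding=0, dilation=1, ceil_mode=False
    return (n - kernel) // stride + 1


def head_output_size(n: int, scale: float) -> int:
    """Spatial size after one neck branch; conv_1x1 and conv_3x3 (padding=1) keep the size."""
    if scale == 4.0:
        return conv_transpose_out(conv_transpose_out(n))
    if scale == 2.0:
        return conv_transpose_out(n)
    if scale == 1.0:
        return n
    if scale == 0.5:
        return maxpool_out(n)
    raise NotImplementedError(f"scale_factor={scale} is not supported")


sizes = {s: head_output_size(72, s) for s in (4.0, 2.0, 1.0, 0.5)}
print(sizes)  # {4.0: 288, 2.0: 144, 1.0: 72, 0.5: 36}
```

These are the shapes each head passes to `self.position_encoding`, i.e. strides 3.5, 7, 14 and 28 relative to the input image.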
diff --git a/sam3/model/position_encoding.py b/sam3/model/position_encoding.py
index a6a1266..2ae0a37 100644
--- a/sam3/model/position_encoding.py
+++ b/sam3/model/position_encoding.py
@@ -38,11 +38,17 @@ def __init__(
# Precompute positional encodings under `precompute_resolution` to fill the cache
# and avoid symbolic shape tracing errors in torch.compile in PyTorch 2.4 nightly.
if precompute_resolution is not None:
- # We precompute pos enc for stride 4, 8, 16 and 32 to fill `self.cache`.
+ # We precompute pos enc for all strides used by both DualViTDetNeck and
+ # TriViTDetNeck (scale_factors 4.0, 2.0, 1.0, 0.5 applied to backbone
+ # output at stride 14 from 1008px input → 72x72).
precompute_sizes = [
+ (int(precompute_resolution // 3.5), int(precompute_resolution // 3.5)),
(precompute_resolution // 4, precompute_resolution // 4),
+ (int(precompute_resolution // 7), int(precompute_resolution // 7)),
(precompute_resolution // 8, precompute_resolution // 8),
+ (int(precompute_resolution // 14), int(precompute_resolution // 14)),
(precompute_resolution // 16, precompute_resolution // 16),
+ (int(precompute_resolution // 28), int(precompute_resolution // 28)),
(precompute_resolution // 32, precompute_resolution // 32),
]
for size in precompute_sizes:
@@ -53,7 +59,7 @@ def __init__(
def _encode_xy(self, x, y):
# The positions are expected to be normalized
- assert len(x) == len(y) and x.ndim == y.ndim == 1
+ # torch._check(len(x) == len(y) and x.ndim == y.ndim == 1)
x_embed = x * self.scale
y_embed = y * self.scale
@@ -62,12 +68,8 @@ def _encode_xy(self, x, y):
pos_x = x_embed[:, None] / dim_t
pos_y = y_embed[:, None] / dim_t
- pos_x = torch.stack(
- (pos_x[:, 0::2].sin(), pos_x[:, 1::2].cos()), dim=2
- ).flatten(1)
- pos_y = torch.stack(
- (pos_y[:, 0::2].sin(), pos_y[:, 1::2].cos()), dim=2
- ).flatten(1)
+ pos_x = torch.stack((pos_x[:, 0::2].sin(), pos_x[:, 1::2].cos()), dim=2).flatten(1)
+ pos_y = torch.stack((pos_y[:, 0::2].sin(), pos_y[:, 1::2].cos()), dim=2).flatten(1)
return pos_x, pos_y
@torch.no_grad()
@@ -89,9 +91,9 @@ def encode_points(self, x, y, labels):
@torch.no_grad()
def forward(self, x):
- cache_key = None
cache_key = (x.shape[-2], x.shape[-1])
- if cache_key in self.cache:
+ use_cache = all(isinstance(dim, int) for dim in cache_key)
+ if use_cache and cache_key in self.cache:
return self.cache[cache_key][None].repeat(x.shape[0], 1, 1, 1)
y_embed = (
torch.arange(1, x.shape[-2] + 1, dtype=torch.float32, device=x.device)
@@ -121,6 +123,6 @@ def forward(self, x):
(pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4
).flatten(3)
pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)
- if cache_key is not None:
+ if use_cache:
self.cache[cache_key] = pos[0]
return pos
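For a concrete picture of what the extended cache holds, the following sketch mirrors the `precompute_sizes` list for a `precompute_resolution` of 1008 (the input size named in the comment) and checks that the neck output sizes for a 72x72 stride-14 feature map are all present. `precompute_sizes` here is a free function written for illustration; the module builds the list inline in `__init__`.

```python
def precompute_sizes(res: int) -> list[tuple[int, int]]:
    # Same divisors, in the same order, as the list in __init__:
    # strides 3.5, 7, 14, 28 (new) interleaved with 4, 8, 16, 32 (original).
    divisors = [3.5, 4, 7, 8, 14, 16, 28, 32]
    return [(int(res // d), int(res // d)) for d in divisors]


sizes = precompute_sizes(1008)
print([h for h, _ in sizes])  # [288, 252, 144, 126, 72, 63, 36, 31]

# Neck outputs from a 72x72 map at scale factors 4.0, 2.0, 1.0, 0.5
neck_sizes = {(int(72 * s), int(72 * s)) for s in (4.0, 2.0, 1.0, 0.5)}
assert neck_sizes <= set(sizes)
```

In `forward`, a cache hit returns the cached encoding repeated over the batch; a shape missing from this list falls back to computing the encoding, and is cached only when both spatial dimensions are plain ints.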
diff --git a/sam3/model/sam3_base_predictor.py b/sam3/model/sam3_base_predictor.py
new file mode 100644
index 0000000..fdacda4
--- /dev/null
+++ b/sam3/model/sam3_base_predictor.py
@@ -0,0 +1,329 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
+
+# pyre-unsafe
+
+"""
+Base predictor class shared by SAM3 and SAM3.1 (multiplex) video predictors.
+
+Provides the common handle_request/handle_stream_request API and session management.
+Subclasses only need to override methods where their behavior differs.
+"""
+
+import gc
+import time
+import uuid
+from typing import Dict, List, Optional
+
+import torch
+from sam3.logger import get_logger
+
+logger = get_logger(__name__)
+
+
+class Sam3BasePredictor:
+ """
+ Base class for SAM3 video predictors. Provides:
+ - Session management (start, reset, close)
+ - Request dispatch (handle_request / handle_stream_request)
+ - Common add_prompt / propagate_in_video / remove_object / reset_session / close_session
+
+ Subclasses must set `self.model` and `self._all_inference_states` before use.
+ """
+
+ def __init__(self):
+ # Subclasses must populate these
+ self.model = None
+ self._all_inference_states: Dict[str, dict] = {}
+
+ # ── Request dispatch ──────────────────────────────────────────────
+
+ @torch.inference_mode()
+ def handle_request(self, request):
+ """Dispatch a request based on its type."""
+ request_type = request["type"]
+ if request_type == "start_session":
+ return self.start_session(
+ resource_path=request["resource_path"],
+ session_id=request.get("session_id", None),
+ offload_video_to_cpu=request.get("offload_video_to_cpu", False),
+ offload_state_to_cpu=request.get("offload_state_to_cpu", False),
+ )
+ elif request_type == "add_prompt":
+ return self.add_prompt(
+ session_id=request["session_id"],
+ frame_idx=request["frame_index"],
+ text=request.get("text", None),
+ points=request.get("points", None),
+ point_labels=request.get("point_labels", None),
+ clear_old_points=request.get("clear_old_points", True),
+ bounding_boxes=request.get("bounding_boxes", None),
+ bounding_box_labels=request.get("bounding_box_labels", None),
+ clear_old_boxes=request.get("clear_old_boxes", True),
+ output_prob_thresh=request.get(
+ "output_prob_thresh",
+ getattr(self, "default_output_prob_thresh", 0.5),
+ ),
+ obj_id=request.get("obj_id", None),
+ rel_coordinates=request.get("rel_coordinates", True),
+ )
+ elif request_type == "remove_object":
+ return self.remove_object(
+ session_id=request["session_id"],
+ frame_idx=request.get("frame_index", 0),
+ obj_id=request["obj_id"],
+ )
+ elif request_type == "reset_session":
+ return self.reset_session(session_id=request["session_id"])
+ elif request_type == "cancel_propagation":
+ return self.cancel_propagation(session_id=request["session_id"])
+ elif request_type == "close_session":
+ return self.close_session(
+ session_id=request["session_id"],
+ run_gc_collect=request.get("run_gc_collect", True),
+ )
+ else:
+ raise RuntimeError(f"invalid request type: {request_type}")
+
+ @torch.inference_mode()
+ def handle_stream_request(self, request):
+ """Dispatch a stream request based on its type."""
+ request_type = request["type"]
+ if request_type == "propagate_in_video":
+ yield from self.propagate_in_video(
+ session_id=request["session_id"],
+ propagation_direction=request.get("propagation_direction", "both"),
+ start_frame_idx=request.get("start_frame_index", None),
+ max_frame_num_to_track=request.get("max_frame_num_to_track", None),
+ output_prob_thresh=request.get(
+ "output_prob_thresh",
+ getattr(self, "default_output_prob_thresh", 0.5),
+ ),
+ )
+ else:
+ raise RuntimeError(f"invalid request type: {request_type}")
+
+ # ── Session management ────────────────────────────────────────────
+
+ def start_session(
+ self,
+ resource_path,
+ session_id=None,
+ offload_video_to_cpu=False,
+ offload_state_to_cpu=False,
+ ):
+ """Start a new inference session on a video directory or path."""
+ init_kwargs = dict(
+ resource_path=resource_path,
+ offload_video_to_cpu=offload_video_to_cpu,
+ offload_state_to_cpu=offload_state_to_cpu,
+ )
+ if hasattr(self, "async_loading_frames"):
+ init_kwargs["async_loading_frames"] = self.async_loading_frames
+ if hasattr(self, "video_loader_type"):
+ init_kwargs["video_loader_type"] = self.video_loader_type
+ inference_state = self.model.init_state(**init_kwargs)
+
+ if not session_id:
+ session_id = str(uuid.uuid4())
+ self._all_inference_states[session_id] = {
+ "state": inference_state,
+ "session_id": session_id,
+ "start_time": time.time(),
+ "last_use_time": time.time(),
+ }
+ logger.info(f"started new session {session_id}")
+ return {"session_id": session_id}
+
+ def add_prompt(
+ self,
+ session_id: str,
+ frame_idx: int,
+ text: Optional[str] = None,
+ points=None,
+ point_labels=None,
+ clear_old_points: bool = True,
+ bounding_boxes=None,
+ bounding_box_labels=None,
+ clear_old_boxes: bool = True,
+ output_prob_thresh: float = 0.5,
+ obj_id: Optional[int] = None,
+ rel_coordinates: bool = True,
+ ):
+ """Add text, box and/or point prompt on a specific video frame."""
+ session = self._get_session(session_id)
+ inference_state = session["state"]
+ self._extend_expiration_time(session)
+
+ # Convert lists to tensors if needed
+ if points is not None and not isinstance(points, torch.Tensor):
+ points = torch.tensor(points, dtype=torch.float32)
+ if point_labels is not None and not isinstance(point_labels, torch.Tensor):
+ point_labels = torch.tensor(point_labels, dtype=torch.int32)
+ if bounding_boxes is not None and not isinstance(bounding_boxes, torch.Tensor):
+ bounding_boxes = torch.tensor(bounding_boxes, dtype=torch.float32)
+ if bounding_box_labels is not None and not isinstance(
+ bounding_box_labels, torch.Tensor
+ ):
+ bounding_box_labels = torch.tensor(bounding_box_labels, dtype=torch.int32)
+
+ kwargs = dict(
+ inference_state=inference_state,
+ frame_idx=frame_idx,
+ text_str=text,
+ points=points,
+ point_labels=point_labels,
+ clear_old_points=clear_old_points,
+ boxes_xywh=bounding_boxes,
+ box_labels=bounding_box_labels,
+ clear_old_boxes=clear_old_boxes,
+ output_prob_thresh=output_prob_thresh,
+ rel_coordinates=rel_coordinates,
+ )
+ if obj_id is not None:
+ kwargs["obj_id"] = obj_id
+
+ # Filter kwargs to only pass what the model accepts
+ # (SAM3 has a simpler add_prompt than SAM3.1)
+ import inspect
+
+ sig = inspect.signature(self.model.add_prompt)
+ valid_params = set(sig.parameters.keys())
+ filtered_kwargs = {k: v for k, v in kwargs.items() if k in valid_params}
+
+ with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
+ frame_idx, outputs = self.model.add_prompt(**filtered_kwargs)
+ return {"frame_index": frame_idx, "outputs": outputs}
+
+ def remove_object(
+ self,
+ session_id: str,
+ frame_idx: int = 0,
+ obj_id: int = 0,
+ is_user_action: bool = True,
+ ):
+ """Remove an object from tracking."""
+ session = self._get_session(session_id)
+ inference_state = session["state"]
+ self._extend_expiration_time(session)
+
+ result = self.model.remove_object(
+ inference_state, obj_id, frame_idx=frame_idx, is_user_action=is_user_action
+ )
+ # Handle the possible return conventions: None, a (frame_idx, outputs) tuple, or outputs alone
+ if result is None or (isinstance(result, tuple) and result[1] is None):
+ out_obj_ids = torch.zeros(0, dtype=torch.int64)
+ out_binary_masks = torch.zeros(
+ 0,
+ inference_state["orig_height"],
+ inference_state["orig_width"],
+ dtype=torch.bool,
+ )
+ out_boxes_xywh = torch.zeros(0, 4, dtype=torch.float32)
+ outputs = {
+ "out_obj_ids": out_obj_ids.cpu().numpy(),
+ "out_boxes_xywh": out_boxes_xywh.cpu().numpy(),
+ "out_binary_masks": out_binary_masks.cpu().numpy(),
+ }
+ elif isinstance(result, tuple):
+ _, outputs = result
+ else:
+ outputs = result
+ return {"frame_index": frame_idx, "outputs": outputs}
+
+ def cancel_propagation(self, session_id):
+ """Cancel any ongoing propagation. No-op if not supported by the model."""
+ session = self._get_session(session_id)
+ inference_state = session["state"]
+ self._extend_expiration_time(session)
+ if hasattr(self.model, "cancel_propagation"):
+ self.model.cancel_propagation(inference_state)
+ return {"is_success": True}
+
+ def propagate_in_video(
+ self,
+ session_id,
+ propagation_direction="both",
+ start_frame_idx=None,
+ max_frame_num_to_track=None,
+ output_prob_thresh=0.5,
+ **kwargs,
+ ):
+ """Propagate the added prompts to get results on all video frames."""
+ try:
+ session = self._get_session(session_id)
+ inference_state = session["state"]
+ self._extend_expiration_time(session)
+ if propagation_direction not in ["both", "forward", "backward"]:
+ raise ValueError(
+ f"invalid propagation direction: {propagation_direction}"
+ )
+
+ propagate_kwargs = dict(
+ inference_state=inference_state,
+ start_frame_idx=start_frame_idx,
+ max_frame_num_to_track=max_frame_num_to_track,
+ )
+ # Only pass output_prob_thresh / extra kwargs if the model supports them
+ import inspect
+
+ sig = inspect.signature(self.model.propagate_in_video)
+ if "output_prob_thresh" in sig.parameters:
+ propagate_kwargs["output_prob_thresh"] = output_prob_thresh
+ for k, v in kwargs.items():
+ if k in sig.parameters:
+ propagate_kwargs[k] = v
+
+ # Forward propagation
+ if propagation_direction in ["both", "forward"]:
+ for frame_idx, outputs in self.model.propagate_in_video(
+ **propagate_kwargs,
+ reverse=False,
+ ):
+ yield {"frame_index": frame_idx, "outputs": outputs}
+ # Backward propagation
+ if propagation_direction in ["both", "backward"]:
+ for frame_idx, outputs in self.model.propagate_in_video(
+ **propagate_kwargs,
+ reverse=True,
+ ):
+ yield {"frame_index": frame_idx, "outputs": outputs}
+ finally:
+ logger.info(f"propagation ended in session {session_id}")
+
+ def reset_session(self, session_id):
+ """Reset the session to its initial state."""
+ session = self._get_session(session_id)
+ inference_state = session["state"]
+ self._extend_expiration_time(session)
+ self.model.reset_state(inference_state)
+ return {"is_success": True}
+
+ def close_session(self, session_id, run_gc_collect=True):
+ """Close a session. Idempotent."""
+ session = self._all_inference_states.pop(session_id, None)
+ if session is None:
+ logger.warning(f"cannot close session {session_id} as it does not exist")
+ else:
+ del session
+ if run_gc_collect:
+ gc.collect()
+ logger.info(f"removed session {session_id}")
+ return {"is_success": True}
+
+ def _get_session(self, session_id):
+ session = self._all_inference_states.get(session_id, None)
+ if session is None:
+ raise RuntimeError(
+ f"Cannot find session {session_id}; it might have expired"
+ )
+ return session
+
+ def _extend_expiration_time(self, session):
+ """Update last-use time for session expiration tracking."""
+ session["last_use_time"] = time.time()
+
+ def shutdown(self):
+ """Shutdown the predictor and clear all sessions."""
+ self._all_inference_states.clear()
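`add_prompt` and `propagate_in_video` in Sam3BasePredictor use `inspect.signature` so that one predictor can drive models whose methods accept different keyword arguments. A minimal standalone sketch of that pattern follows; `call_with_supported_kwargs`, `SimpleModel` and `RichModel` are hypothetical names used only for illustration. Unlike the predictor, which filters purely by parameter name, this version also forwards everything when the callee declares `**kwargs`.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(fn: Callable[..., Any], /, **kwargs: Any) -> Any:
    """Call fn, dropping keyword arguments its signature does not declare."""
    params = inspect.signature(fn).parameters
    # A callee with **kwargs can accept any keyword, so nothing is filtered
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(**kwargs)
    return fn(**{k: v for k, v in kwargs.items() if k in params})


class SimpleModel:  # hypothetical stand-in for a model with a smaller add_prompt
    def add_prompt(self, inference_state, frame_idx, text_str=None):
        return frame_idx, {"text": text_str}


class RichModel:  # hypothetical stand-in for a model that also takes obj_id
    def add_prompt(self, inference_state, frame_idx, text_str=None, obj_id=None):
        return frame_idx, {"text": text_str, "obj_id": obj_id}


request_kwargs = dict(inference_state={}, frame_idx=3, text_str="dog", obj_id=7)
print(call_with_supported_kwargs(SimpleModel().add_prompt, **request_kwargs))
# (3, {'text': 'dog'})
print(call_with_supported_kwargs(RichModel().add_prompt, **request_kwargs))
# (3, {'text': 'dog', 'obj_id': 7})
```

Filtering keeps the base class model-agnostic, but it also means a misspelled request key is silently ignored rather than rejected.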
diff --git a/sam3/model/sam3_image.py b/sam3/model/sam3_image.py
index 679300d..ee13d2f 100644
--- a/sam3/model/sam3_image.py
+++ b/sam3/model/sam3_image.py
@@ -16,6 +16,7 @@
from .act_ckpt_utils import activation_ckpt_wrapper
from .box_ops import box_cxcywh_to_xyxy
+from .data_misc import FindStage
from .geometry_encoders import Prompt
from .model_misc import inverse_sigmoid
@@ -442,6 +443,7 @@ def forward_grounding(
find_input,
find_target,
geometric_prompt: Prompt,
+ **kwargs,
):
with torch.profiler.record_function("SAM3Image._encode_prompt"):
prompt, prompt_mask, backbone_out = self._encode_prompt(
@@ -474,10 +476,14 @@ def forward_grounding(
# Run segmentation heads
with torch.profiler.record_function("SAM3Image._run_segmentation_heads"):
+ # Apply id_mapping to img_ids if backbone features were recomputed
+ seg_img_ids = find_input.img_ids
+ if "id_mapping" in backbone_out and backbone_out["id_mapping"] is not None:
+ seg_img_ids = backbone_out["id_mapping"][seg_img_ids]
self._run_segmentation_heads(
out=out,
backbone_out=backbone_out,
- img_ids=find_input.img_ids,
+ img_ids=seg_img_ids,
vis_feat_sizes=encoder_out["vis_feat_sizes"],
encoder_hidden_states=out["encoder_hidden_states"],
prompt=prompt,
@@ -516,6 +522,28 @@ def _postprocess_out(self, out: Dict, multimask_output: bool = False):
return out
+ def _get_geo_prompt_from_find_input(self, find_input: FindStage):
+ """Construct an initial geometric prompt from the find input."""
+ point_embeddings, point_mask, point_labels = None, None, None
+ if find_input.input_points_before_embed is not None:
+ # Point embeddings are batch first, switch to seq first
+ point_embeddings = find_input.input_points_before_embed.transpose(0, 1)
+
+ # they are stored as (x,y,label), so we unpack
+ point_labels = point_embeddings[..., -1]
+ point_embeddings = point_embeddings[..., :-1]
+ point_mask = find_input.input_points_mask
+
+ geometric_prompt = Prompt(
+ box_embeddings=find_input.input_boxes_before_embed,
+ box_mask=find_input.input_boxes_mask,
+ box_labels=find_input.input_boxes_label,
+ point_embeddings=point_embeddings,
+ point_mask=point_mask,
+ point_labels=point_labels,
+ )
+ return geometric_prompt
+
def _get_dummy_prompt(self, num_prompts=1):
device = self.device
geometric_prompt = Prompt(
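The two `sam3_image.py` changes rely on simple tensor manipulations:

- `_get_geo_prompt_from_find_input` switches batch-first point inputs to sequence-first, then splits the trailing `(x, y, label)` channel into coordinates and labels.
- `forward_grounding` remaps `img_ids` by indexing `id_mapping` with them.

The sketch below reproduces both with NumPy standing in for PyTorch (`np.swapaxes(a, 0, 1)` plays the role of `Tensor.transpose(0, 1)`). It assumes points arrive as an array of shape `(batch, num_points, 3)`. The helper names `split_point_prompts` and `remap_img_ids` are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def split_point_prompts(points_batch_first: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper mirroring the point handling in `_get_geo_prompt_from_find_input`.

    points_batch_first: (batch, num_points, 3) array of (x, y, label) rows.
    Returns seq-first coords of shape (num_points, batch, 2) and labels of shape (num_points, batch).
    """
    seq_first = np.swapaxes(points_batch_first, 0, 1)  # batch-first -> seq-first
    labels = seq_first[..., -1]  # last channel holds the point label
    coords = seq_first[..., :-1]  # remaining channels are (x, y)
    return coords, labels


def remap_img_ids(img_ids: np.ndarray, id_mapping: Optional[np.ndarray] = None) -> np.ndarray:
    """Hypothetical helper: gather new image ids when backbone features were recomputed."""
    if id_mapping is None:
        return img_ids
    return id_mapping[img_ids]  # advanced indexing: new_id = id_mapping[old_id]


points = np.array([[[0.1, 0.2, 1.0], [0.3, 0.4, 0.0]]])  # batch=1, two points
coords, labels = split_point_prompts(points)
remapped = remap_img_ids(np.array([0, 2]), np.array([2, 0, 1]))
```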
diff --git a/sam3/model/sam3_multiplex_base.py b/sam3/model/sam3_multiplex_base.py
new file mode 100644
index 0000000..e4df5d3
--- /dev/null
+++ b/sam3/model/sam3_multiplex_base.py
@@ -0,0 +1,2858 @@
+import datetime
+import logging
+import math
+import os
+import sys
+from collections import defaultdict
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Set, Tuple
+
+import numpy as np
+import torch
+import torch.distributed as dist
+import torch.nn.functional as F
+from sam3.logger import get_logger
+from sam3.model.box_ops import fast_diag_box_iou
+from sam3.model.data_misc import BatchedDatapoint, NestedTensor
+from sam3.model.sam3_multiplex_detector import Sam3MultiplexDetector
+from sam3.model.sam3_tracker_utils import fill_holes_in_mask_scores, mask_to_box
+from sam3.model.sam3_video_base import (
+ _associate_det_trk_compilable,
+ LazyAssociateDetTrkResult,
+ MaskletConfirmationStatus,
+ realize_adt_result,
+ RealizedAssociateDetTrkresult,
+ Sam3VideoBase,
+)
+from sam3.perflib.masks_ops import mask_iou
+from sam3.train.masks_ops import rle_encode
+from torch import nn, Tensor
+
+# a short 3-min timeout to quickly detect any synchronization failures
+SAM3_COLLECTIVE_OP_TIMEOUT_SEC = int(os.getenv("SAM3_COLLECTIVE_OP_TIMEOUT_SEC", "180"))
+
+logger = get_logger(__name__)
+
+if torch.cuda.is_available() and torch.cuda.get_device_properties(0).major >= 8:
+ # turn on TensorFloat-32 (TF32) for Ampere or newer GPUs (https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices)
+ torch.backends.cuda.matmul.allow_tf32 = True
+ torch.backends.cudnn.allow_tf32 = True
+
+
+class Sam3MultiplexTrackerPredictor(nn.Module):
+ def __init__(
+ self,
+ config_file,
+ checkpoint_file=None,
+ hydra_overrides=None,
+ per_obj_inference=False,
+ fill_hole_area=0,
+ use_fa3=False,
+ use_rope_real=False,
+ keep_first_cond_frame=False,
+ is_multiplex=False,
+ is_multiplex_dynamic=False,
+ use_memory_selection=False,
+ ):
+ """
+ Initialize the multiplex tracker predictor from the given Hydra configuration and optional checkpoint.
+ Args:
+ config_file (str): Path to the configuration file.
+ checkpoint_file (str, optional): Path to the checkpoint file. If None, the model will be initialized without loading weights.
+ hydra_overrides (list, optional): List of Hydra overrides to apply to the configuration.
+ per_obj_inference (bool): If True, the model will perform per-object inference instead of bucketized batching.
+ """
+
+ super().__init__()
+ #######################################
+ # Load model from config and checkpoint
+ #######################################
+
+ from hydra import compose, initialize_config_module
+ from hydra.core.global_hydra import GlobalHydra
+ from hydra.utils import instantiate
+
+ # Ensure proper Hydra initialization
+ if not GlobalHydra().is_initialized():
+ logger.info("Sam3MultiplexTrackerPredictor: GlobalHydra not initialized")
+ GlobalHydra.instance().clear()
+ initialize_config_module("sam3.config", version_base="1.2")
+
+ if hydra_overrides is None:
+ hydra_overrides = []
+ self.is_multiplex = is_multiplex
+ self.is_multiplex_dynamic = is_multiplex_dynamic
+ self.per_obj_inference = per_obj_inference
+
+ if self.is_multiplex:
+ inference_model_class = "sam3.model.video_tracking_multiplex_demo.Sam3VideoTrackingMultiplexDemo"
+ else:
+ inference_model_class = (
+ "sam3.model.video_tracking_with_prompt_demo_per_obj_inference.Sam3VideoTrackingWithPromptDemoPerObjInference"
+ if per_obj_inference
+ else "sam3.model.video_tracking_with_prompt_demo.Sam3VideoTrackingWithPromptDemo"
+ )
+ hydra_overrides = list(hydra_overrides)
+ hydra_overrides.extend(
+ [
+ "launcher.experiment_log_dir=''",
+ f"++trainer.model._target_={inference_model_class}",
+ # Shared backbone cfg
+ "++trainer.model.image_size=1008",
+ "++trainer.model.backbone_stride=14",
+ "++trainer.model.maskmem_backbone.mask_downsampler.interpol_size=[1152,1152]",
+ "++trainer.model.backbone.forward_in_chunk_for_eval=false",
+ # if true, always start tracking from the frame where we receive the first annotation
+ # (clicks or mask) and ignore the `start_frame_idx` passed to `propagate_in_video` (disabled here)
+ "++trainer.model.always_start_from_first_ann_frame=false",
+ # if true, apply non-overlapping constraints on the object masks in the
+ # memory encoder to avoid/alleviate superposing mask predictions (disabled here)
+ "++trainer.model.non_overlap_masks_for_mem_enc=false",
+ # Do not apply non-overlapping constraints on the output
+ "++trainer.model.non_overlap_masks_for_output=false",
+ # attend to at most 4 temporally closest conditioning frames in the encoder for
+ # better temporal locality and better handling of a large number of annotated frames
+ "++trainer.model.max_cond_frames_in_attn=4",
+ f"++trainer.model.keep_first_cond_frame={keep_first_cond_frame}",
+ # turn off all offloading options in the demo (we handle them separately in the demo class)
+ "++trainer.model.offload_output_to_cpu_for_eval=false",
+ "++trainer.model.trim_past_non_cond_mem_for_eval=false",
+ # torch.compile on the image backbone (w/ `dynamic=false` and `fullgraph=true` to capture a full graph)
+ # "++trainer.model.backbone.compile_mode=max-autotune",
+ # "++trainer.model.backbone.compile_extra_args.fullgraph=true",
+ # "++trainer.model.backbone.compile_extra_args.dynamic=false",
+ "++trainer.model.backbone.visual.trunk.weights_path=null",
+ # Postprocessing/demo options
+ # dynamically fall back to multi-mask if the single mask is not stable
+ "++trainer.model.sam_mask_decoder_extra_args.dynamic_multimask_via_stability=true",
+ "++trainer.model.sam_mask_decoder_extra_args.dynamic_multimask_stability_delta=0.05",
+ "++trainer.model.sam_mask_decoder_extra_args.dynamic_multimask_stability_thresh=0.98",
+ # binarize the sigmoid mask logits on interacted frames (with clicks) in the memory encoder so that the encoded masks are exactly what users see after clicking
+ "++trainer.model.binarize_mask_from_pts_for_mem_enc=true",
+ # only attend to object pointers in the past (before the current frame) in the encoder during evaluation
+ "++trainer.model.only_obj_ptrs_in_the_past_for_eval=true",
+ # clear non-conditioning memory of the surrounding frames (which may contain outdated information) after adding correction clicks
+ "++trainer.model.clear_non_cond_mem_around_input=true",
+ "++trainer.model.transformer.encoder.layer.self_attention.feat_sizes=[72,72]",
+ "++trainer.model.transformer.encoder.layer.cross_attention.feat_sizes=[72,72]",
+ # fill small holes in the final masks up to `fill_hole_area` (after resizing them to the original video resolution)
+ f"++trainer.model.fill_hole_area={fill_hole_area}",
+ f"++trainer.model.transformer.encoder.layer.self_attention.use_fa3={use_fa3}",
+ f"++trainer.model.transformer.encoder.layer.cross_attention.use_fa3={use_fa3}",
+ f"++trainer.model.transformer.encoder.layer.self_attention.use_rope_real={use_rope_real}",
+ f"++trainer.model.transformer.encoder.layer.cross_attention.use_rope_real={use_rope_real}",
+ ]
+ )
+
+ if self.is_multiplex or self.is_multiplex_dynamic:
+ hydra_overrides.extend(
+ [
+ f"++trainer.model.transformer.encoder.layer.self_attention_rope.use_fa3={use_fa3}",
+ f"++trainer.model.transformer.encoder.layer.cross_attention_rope.use_fa3={use_fa3}",
+ f"++trainer.model.transformer.encoder.layer.self_attention_rope.use_rope_real={use_rope_real}",
+ f"++trainer.model.transformer.encoder.layer.cross_attention_rope.use_rope_real={use_rope_real}",
+ ]
+ )
+
+ hydra_overrides.extend(
+ [f"++trainer.model.use_memory_selection={use_memory_selection}"]
+ )
+
+ cfg = compose(config_name=config_file, overrides=hydra_overrides)
+ model = instantiate(cfg.trainer.model, _recursive_=True)
+ del model.backbone # Remove backbone since it is shared with the sam3 model
+ if checkpoint_file is not None:
+ ckpt = torch.load(checkpoint_file, map_location="cpu")
+ model.load_state_dict(ckpt["model"], strict=False)
+ self.model = model
+ self.per_obj_inference = per_obj_inference
+ self.fill_hole_area = fill_hole_area
+ # use bfloat16 inference for Flash Attention kernel
+ self.bf16_context = torch.autocast(device_type="cuda", dtype=torch.bfloat16)
+ self.bf16_context.__enter__() # keep using for the entire model process
+
+ def __getattr__(self, name):
+ # Expose all attributes of the underlying model
+ model = super().__getattr__("model")
+ if name == "model":
+ return model
+ return getattr(model, name)
+
+ def forward(self, *args, **kwargs):
+ raise NotImplementedError(
+ "Use the sam2 predictor APIs instead. Check VideoTrackingWithPromptDemo class for details."
+ )
+
+ def add_output_per_object(self, *args, **kwargs):
+ if self.per_obj_inference:
+ # nothing needs to be done as each object is already stored separately
+ return
+
+ # for batched inference state, we also need to add per-object
+ # memory slices to support instance interactivity
+ self._add_output_per_object(*args, **kwargs)
+
+
+class Sam3MultiplexBase(Sam3VideoBase):
+ def __init__(
+ self,
+ tracker,
+ detector,
+ ckpt_path=None,
+ sam3_ckpt_path=None,
+ # prob threshold for detection outputs -- only detections above this threshold
+ # enter NMS and det-to-track matching
+ score_threshold_detection=0.5,
+ # Detection threshold when running on image-only inputs
+ image_only_det_thresh=0.5,
+ # IoU threshold for detection NMS
+ det_nms_thresh=0.0,
+ # If `det_nms_use_iom` is True, use IoM instead of IoU for NMS
+ det_nms_use_iom=False,
+ # IoU threshold for det-to-track matching -- a detection is considered "matched" to a tracklet if it
+ # overlaps with that tracklet above this threshold -- it is often a loose threshold like 0.1
+ assoc_iou_thresh=0.5,
+ # IoU threshold for det-to-track matching, which is used to determine whether a masklet is "unmatched"
+ # by any detections -- it is often a stricter threshold like 0.5
+ trk_assoc_iou_thresh=0.5,
+ # prob threshold for a detection to be added as a new object
+ new_det_thresh=0.5,
+ # hotstart parameters: we hold off the outputs for `hotstart_delay` frames and
+ # 1) remove those tracklets unmatched by any detections based on `hotstart_unmatch_thresh`
+ # 2) remove those tracklets overlapping with one another based on `hotstart_dup_thresh`
+ hotstart_delay=0,
+ hotstart_unmatch_thresh=3,
+ hotstart_dup_thresh=3,
+ # Whether to suppress masks only within hotstart. If False, we can suppress masks even if they start before hotstart period.
+ suppress_unmatched_only_within_hotstart=True,
+ init_trk_keep_alive=0,
+ max_trk_keep_alive=8,
+ min_trk_keep_alive=-4,
+ # Threshold for suppressing overlapping objects based on recent occlusion
+ suppress_overlapping_based_on_recent_occlusion_threshold=0.0,
+ allow_unoccluded_to_suppress: bool = False,
+ decrease_trk_keep_alive_for_empty_masklets=False,
+ o2o_matching_masklets_enable=False, # Enable Hungarian matching to match existing masklets
+ suppress_det_close_to_boundary=False,
+ fill_hole_area=16,
+ sprinkle_removal_area=16,
+ # The maximum number of objects (masklets) to track across all GPUs (for no limit, set it to -1)
+ max_num_objects=128, # 128 objects (total across all GPUs) should be able to cover nearly all cases
+ max_num_kboxes=20,
+ recondition_every_nth_frame=-1,
+ use_iom_recondition=False,
+ iom_thresh_recondition=0.8,
+ iou_thresh_recondition=0.8,
+ is_multiplex=False,
+ # masklet confirmation status (to suppress unconfirmed masklets)
+ masklet_confirmation_enable=False,
+ # a masklet is confirmed after being consecutively detected and matched for
+ # `masklet_confirmation_consecutive_det_thresh` frames
+ masklet_confirmation_consecutive_det_thresh=3,
+ # bbox heuristic parameters
+ reconstruction_bbox_iou_thresh=0.0,
+ reconstruction_bbox_det_score=0.5,
+ reapply_no_object_pointer: bool = False, # reapply the no object pointer for suppressed objects
+ running_in_prod=False, # Flag to specify if we are running in FBInfra for Insta Edit/Segments
+ use_batched_grounding=False,
+ batched_grounding_batch_size=1,
+ **kwargs,
+ ):
+ nn.Module.__init__(self)
+ assert isinstance(tracker, Sam3MultiplexTrackerPredictor)
+ self.tracker = tracker
+ assert isinstance(detector, Sam3MultiplexDetector)
+ self.detector = detector
+ if sam3_ckpt_path:
+ ckpt = torch.load(sam3_ckpt_path, map_location="cpu", weights_only=True)
+ self.detector.load_state_dict(ckpt["model"], strict=False)
+ elif ckpt_path:
+ self._load_checkpoint(ckpt_path, strict=False)
+ self.score_threshold_detection = score_threshold_detection
+ self.image_only_det_thresh = image_only_det_thresh
+ self.det_nms_thresh = det_nms_thresh
+ self.det_nms_use_iom = det_nms_use_iom
+ self.assoc_iou_thresh = assoc_iou_thresh
+ self.trk_assoc_iou_thresh = trk_assoc_iou_thresh
+ self.new_det_thresh = new_det_thresh
+ self.is_multiplex = is_multiplex
+ self.running_in_prod = running_in_prod
+ self.detector.running_in_prod = running_in_prod
+
+ assert (
+ self.is_multiplex == self.tracker.is_multiplex == self.detector.is_multiplex
+ ), (
+ f"is_multiplex must be the same for all models: {self.is_multiplex=}, {self.tracker.is_multiplex=}, {self.detector.is_multiplex=}"
+ )
+
+ # hotstart parameters
+ if hotstart_delay > 0:
+ assert hotstart_unmatch_thresh <= hotstart_delay
+ assert hotstart_dup_thresh <= hotstart_delay
+ self.hotstart_delay = hotstart_delay
+ self.hotstart_unmatch_thresh = hotstart_unmatch_thresh
+ self.hotstart_dup_thresh = hotstart_dup_thresh
+ self.suppress_unmatched_only_within_hotstart = (
+ suppress_unmatched_only_within_hotstart
+ )
+ self.init_trk_keep_alive = init_trk_keep_alive
+ self.max_trk_keep_alive = max_trk_keep_alive
+ self.min_trk_keep_alive = min_trk_keep_alive
+ self.suppress_overlapping_based_on_recent_occlusion_threshold = (
+ suppress_overlapping_based_on_recent_occlusion_threshold
+ )
+ self.allow_unoccluded_to_suppress = allow_unoccluded_to_suppress
+ self.suppress_det_close_to_boundary = suppress_det_close_to_boundary
+ self.decrease_trk_keep_alive_for_empty_masklets = (
+ decrease_trk_keep_alive_for_empty_masklets
+ )
+ self.o2o_matching_masklets_enable = o2o_matching_masklets_enable
+ self.fill_hole_area = fill_hole_area
+ self.sprinkle_removal_area = sprinkle_removal_area
+ self.eval()
+ self.rank = int(os.getenv("RANK", "0"))
+ self.world_size = int(os.getenv("WORLD_SIZE", "1"))
+ self._dist_pg_cpu = None # CPU process group (lazy-initialized on first use)
+
+ # Initialize profiling variables
+ self._profiler = None
+ self._frame_count = 0
+ self._profile_save_dir = os.getenv("PROFILE_SAVE_DIR", "/tmp/profiling")
+ self._profiling_enabled = os.getenv("ENABLE_PROFILING", "0").lower() == "1"
+
+ # the maximum object number
+ if max_num_objects > 0:
+ multiplex_divisor = (
+ self.tracker.multiplex_controller.allowed_bucket_capacity
+ if self.is_multiplex
+ else 1
+ )
+ num_obj_for_compile = math.ceil(
+ max_num_objects / (self.world_size * multiplex_divisor)
+ )
+ else:
+ max_num_objects = 10000 # no limit
+ num_obj_for_compile = 16
+ logger.info(
+ f"setting `max_num_objects` to {max_num_objects} -- creating {num_obj_for_compile=} objects for torch.compile cache"
+ )
+ self.max_num_objects = max_num_objects
+ self.num_obj_for_compile = num_obj_for_compile
+ self.max_num_kboxes = max_num_kboxes
+ self.recondition_every_nth_frame = recondition_every_nth_frame
+ self.use_iom_recondition = use_iom_recondition
+ self.iom_thresh_recondition = iom_thresh_recondition
+ self.iou_thresh_recondition = iou_thresh_recondition
+ self.masklet_confirmation_enable = masklet_confirmation_enable
+ self.masklet_confirmation_consecutive_det_thresh = (
+ masklet_confirmation_consecutive_det_thresh
+ )
+ self.reconstruction_bbox_iou_thresh = reconstruction_bbox_iou_thresh
+ self.reconstruction_bbox_det_score = reconstruction_bbox_det_score
+ self.reapply_no_object_pointer = reapply_no_object_pointer
+
+ # Batched grounding configuration
+ self.use_batched_grounding = use_batched_grounding
+ self.batched_grounding_batch_size = (
+ batched_grounding_batch_size # Batch size for batched grounding
+ )
+
+ if self.is_multiplex:
+ assert not self.tracker.multiplex_controller.training, (
+ "This model class should only be used for eval."
+ )
+ self.bucket_capacity: int = (
+ self.tracker.multiplex_controller.allowed_bucket_capacity
+ )
+
+ def all_gather_cpu(self, tensor_list, tensor):
+ if self._dist_pg_cpu is None:
+ self._init_dist_pg_cpu()
+ dist.all_gather(tensor_list, tensor, group=self._dist_pg_cpu)
+
+ def all_gather_python_obj_cpu(self, object_list, python_obj):
+ if self._dist_pg_cpu is None:
+ self._init_dist_pg_cpu()
+ dist.all_gather_object(object_list, python_obj, group=self._dist_pg_cpu)
+
+ def broadcast_cpu(self, x, src):
+ if self._dist_pg_cpu is None:
+ self._init_dist_pg_cpu()
+ dist.broadcast(x, src=src, group=self._dist_pg_cpu)
+
+ def _start_profiling(self, frame_idx):
+ """Start profiling for _det_track_one_frame if conditions are met."""
+ self._profiling_enabled = os.getenv("ENABLE_PROFILING", "0").lower() == "1"
+ self._profile_end_frame = int(os.getenv("PROFILE_END_FRAME", "-1"))
+ if not self._profiling_enabled:
+ return False
+
+ if not getattr(self, "_warm_up_complete", False):
+ return False
+
+ if self._profiler is not None:
+ return True
+
+ # Start profiling
+ os.makedirs(self._profile_save_dir, exist_ok=True)
+ profile_path = os.path.join(
+ self._profile_save_dir, f"det_track_frame_rank_{self.rank}.json.gz"
+ )
+
+ self._profiler = torch.profiler.profile(
+ activities=[
+ torch.profiler.ProfilerActivity.CPU,
+ torch.profiler.ProfilerActivity.CUDA,
+ ],
+ record_shapes=True,
+ experimental_config=torch.profiler._ExperimentalConfig(
+ profile_all_threads=True
+ ),
+ )
+ self._profiler.start()
+ self._current_profile_path = profile_path
+ print(f"Started profiling on frame {frame_idx} on rank {self.rank}")
+ return True
+
+ def _stop_profiling(self):
+ """Stop profiling and save trace."""
+ if self._profiler is not None:
+ self._profiler.stop()
+ self._profiler.export_chrome_trace(self._current_profile_path)
+ print(f"Profiling trace saved to: {self._current_profile_path}")
+ print(
+ "You can open this file in Perfetto (https://ui.perfetto.dev/) to visualize the trace"
+ )
+ self._profiler = None
+ self._profiling_enabled = False
+ os.environ["ENABLE_PROFILING"] = "0"
+
+ def _det_track_one_frame(
+ self,
+ frame_idx: int,
+ num_frames: int,
+ reverse: bool,
+ input_batch: BatchedDatapoint,
+ geometric_prompt: Any,
+ tracker_states_local: List[Any],
+ tracker_metadata_prev: Dict[str, Any],
+ feature_cache: Dict,
+ orig_vid_height: int,
+ orig_vid_width: int,
+ is_image_only: bool = False,
+ ):
+ profiling_enabled = self._start_profiling(frame_idx)
+
+ try:
+ return self._det_track_one_frame_impl(
+ frame_idx=frame_idx,
+ num_frames=num_frames,
+ reverse=reverse,
+ input_batch=input_batch,
+ geometric_prompt=geometric_prompt,
+ tracker_states_local=tracker_states_local,
+ tracker_metadata_prev=tracker_metadata_prev,
+ feature_cache=feature_cache,
+ orig_vid_height=orig_vid_height,
+ orig_vid_width=orig_vid_width,
+ is_image_only=is_image_only,
+ )
+ finally:
+ if profiling_enabled:
+ if sys.exc_info()[0] is not None:
+ # If there is an exception, stop profiling
+ self._stop_profiling()
+ else:
+ if (
+ (not reverse and frame_idx == num_frames - 1)
+ or (reverse and frame_idx == 0)
+ or self._profile_end_frame == frame_idx
+ ):
+ # Stop profiling if reached the last frame
+ self._stop_profiling()
+
+ def _det_track_one_frame_impl(
+ self,
+ frame_idx: int,
+ num_frames: int,
+ reverse: bool,
+ input_batch: BatchedDatapoint,
+ geometric_prompt: Any,
+ tracker_states_local: List[Any],
+ tracker_metadata_prev: Dict[str, Any],
+ feature_cache: Dict,
+ orig_vid_height: int,
+ orig_vid_width: int,
+ is_image_only: bool,
+ ):
+ """
+ This function handles one-step inference for the multiplex model in an SPMD manner.
+ At a high-level, all GPUs execute the same function calls as if it's done on a single GPU,
+ while under the hood, some function calls involve distributed computation based on sharded
+ SAM2 states.
+
+ - `input_batch` contains image and other inputs on the entire video; it should be identical across GPUs
+ - `tracker_states_local` holds the local masklet information in this GPU shard
+ - `tracker_metadata_prev` manages the metadata for SAM2 objects, such as which masklet is held on which GPU;
+ it contains both global and local masklet information
+ """
+
+ # Step 1: run backbone and FA in a distributed manner -- this is done via Sam3MultiplexDetector,
+ # a distributed FA model (assigned to `self.detector`) that shards frames in a round-robin manner.
+ # It returns a "det_out" dict for `frame_idx` and fills SAM2 backbone features for `frame_idx`
+ # into `feature_cache`. Despite its distributed inference under the hood, the results would be
+ # the same as if it is running backbone and FA for every frame on a single GPU.
+ with torch.profiler.record_function("run_backbone_and_detection"):
+ det_out, pos_pred_mask = self.run_backbone_and_detection(
+ frame_idx=frame_idx,
+ num_frames=num_frames,
+ reverse=reverse,
+ input_batch=input_batch,
+ geometric_prompt=geometric_prompt,
+ feature_cache=feature_cache,
+ use_batched_grounding=self.use_batched_grounding,
+ batched_grounding_batch_size=self.batched_grounding_batch_size,
+ )
+
+ # Step 2: each GPU propagates its local SAM2 states to get the SAM2 prediction masks.
+ # the returned `tracker_low_res_masks_global` contains the concatenated masklet predictions
+ # gathered from all GPUs (as if they are propagated on a single GPU). Note that this step only
+ # runs the SAM2 propagation step, but doesn't encode new memory for the predicted masks;
+ # we defer memory encoding to `run_tracker_update_execution_phase` after resolving all heuristics.
+ with torch.profiler.record_function("run_tracker_propagation"):
+ if tracker_metadata_prev == {}:
+ # initialize masklet metadata if it's uninitialized (empty dict)
+ tracker_metadata_prev.update(self._initialize_metadata())
+ tracker_low_res_masks_global, tracker_obj_scores_global = (
+ self.run_tracker_propagation(
+ frame_idx=frame_idx,
+ num_frames=num_frames,
+ reverse=reverse,
+ tracker_states_local=tracker_states_local,
+ tracker_metadata_prev=tracker_metadata_prev,
+ )
+ )
+
+ with torch.profiler.record_function("GPU sync and filter"):
+ # Remove leading dimension (assumes batch size 1)
+ assert pos_pred_mask.shape[0] == 1
+ pos_pred_mask = pos_pred_mask.squeeze(0)
+ det_out = {k: det_out[k][0] for k in det_out}
+ # Move the detections we'll actually keep to the top for later logic
+ pos_pred_mask_idx = pos_pred_mask.argsort(descending=True)
+ pos_pred_mask = torch.index_select(
+ pos_pred_mask, dim=0, index=pos_pred_mask_idx
+ )
+ det_out = {
+ k: torch.index_select(det_out[k], dim=0, index=pos_pred_mask_idx)
+ for k in det_out
+ }
+
+ # Step 3: based on detection outputs and the propagated SAM2 prediction masks, we make plans
+ # for SAM2 masklet updates (i.e. which objects to add and remove, how to load-balance them, etc).
+ # We also run SAM2 memory encoder globally in this step to resolve non-overlapping constraints.
+ # **This step should involve all the heuristics needed for any updates.** Most of the update
+ # planning will be done on the master rank (GPU 0) and the resulting plan `sam2_update_plan` is
+ # broadcast to other GPUs (to be executed in a distributed manner). This step also generates the
+ # new masklet metadata `tracker_metadata_new` (based on its previous version `tracker_metadata_prev`).
+ with torch.profiler.record_function("run_tracker_update_planning_phase"):
+ sam2_update_plan, tracker_metadata_new = (
+ self.run_tracker_update_planning_phase(
+ frame_idx=frame_idx,
+ num_frames=num_frames,
+ reverse=reverse,
+ det_out=det_out,
+ det_keep=pos_pred_mask,
+ tracker_low_res_masks_global=tracker_low_res_masks_global,
+ tracker_obj_scores_global=tracker_obj_scores_global,
+ tracker_metadata_prev=tracker_metadata_prev,
+ tracker_states_local=tracker_states_local,
+ is_image_only=is_image_only,
+ )
+ )
+
+ # Get reconditioning info from the update plan
+ reconditioned_obj_ids = sam2_update_plan.get("reconditioned_obj_ids", set())
+ det_to_matched_trk_obj_ids = sam2_update_plan.get(
+ "det_to_matched_trk_obj_ids", {}
+ )
+
+ # Step 4: based on `sam2_update_plan`, each GPU executes the update w.r.t. its local SAM2 inference states
+ with torch.profiler.record_function("run_tracker_update_execution_phase"):
+ tracker_states_local_new = self.run_tracker_update_execution_phase(
+ frame_idx=frame_idx,
+ num_frames=num_frames,
+ reverse=reverse,
+ det_out=det_out,
+ tracker_states_local=tracker_states_local,
+ tracker_update_plan=sam2_update_plan,
+ tracker_metadata_new=tracker_metadata_new,
+ orig_vid_height=orig_vid_height,
+ orig_vid_width=orig_vid_width,
+ feature_cache=feature_cache,
+ )
+
+ # Step 5: finally, build the outputs for this frame (it only needs to be done on GPU 0 since
+ # only GPU 0 will send outputs to the server).
+ with torch.profiler.record_function("build_outputs"):
+ if self.rank == 0:
+ obj_id_to_mask = self.build_outputs(
+ frame_idx=frame_idx,
+ num_frames=num_frames,
+ reverse=reverse,
+ det_out=det_out,
+ tracker_low_res_masks_global=tracker_low_res_masks_global,
+ tracker_obj_scores_global=tracker_obj_scores_global,
+ tracker_metadata_prev=tracker_metadata_prev,
+ sam2_update_plan=sam2_update_plan,
+ orig_vid_height=orig_vid_height,
+ orig_vid_width=orig_vid_width,
+ reconditioned_obj_ids=reconditioned_obj_ids,
+ det_to_matched_trk_obj_ids=det_to_matched_trk_obj_ids,
+ )
+ obj_id_to_score = tracker_metadata_new["obj_id_to_score"]
+ else:
+ obj_id_to_mask, obj_id_to_score = {}, {} # dummy outputs on other GPUs
+ # a few statistics for the current frame as a part of the output
+ frame_stats = {
+ "num_obj_tracked": np.sum(tracker_metadata_new["num_obj_per_gpu"]),
+ "num_obj_dropped": sam2_update_plan["num_obj_dropped_due_to_limit"],
+ }
+ # add SAM2 scores to metadata (skipped when no objects are tracked, e.g. on the first frame)
+ if tracker_obj_scores_global.shape[0] > 0:
+ # Convert tracker_obj_scores_global to sigmoid scores before updating
+ tracker_obj_scores_global = tracker_obj_scores_global.sigmoid()
+ sam2_obj_ids = tracker_metadata_prev["obj_ids_all_gpu"]
+ tracker_metadata_new["obj_id_to_sam2_score_frame_wise"][frame_idx].update(
+ dict(zip(sam2_obj_ids, tracker_obj_scores_global))
+ )
+
+ return (
+ obj_id_to_mask, # a dict: obj_id --> output mask
+ obj_id_to_score, # a dict: obj_id --> output score (prob)
+ tracker_states_local_new,
+ tracker_metadata_new,
+ frame_stats,
+ tracker_obj_scores_global, # a tensor of SAM2 frame-level scores, ordered as `obj_ids_all_gpu`
+ )
+
+ def run_backbone_and_detection(
+ self,
+ frame_idx: int,
+ num_frames: int,
+ input_batch: BatchedDatapoint,
+ geometric_prompt: Any,
+ feature_cache: Dict,
+ reverse: bool,
+ use_batched_grounding: bool = False,
+ batched_grounding_batch_size: int = 16,
+ ):
+ # Step 1: if text feature is not cached in `feature_cache`, compute and cache it
+ text_batch_key = tuple(input_batch.find_text_batch)
+ if "text" not in feature_cache or text_batch_key not in feature_cache["text"]:
+ text_outputs = self.detector.backbone.forward_text(
+ input_batch.find_text_batch, device=self.device
+ )
+ # note: we only cache the text feature of the most recent prompt
+ feature_cache["text"] = {text_batch_key: text_outputs}
+ else:
+ text_outputs = feature_cache["text"][text_batch_key]
+
+ # Step 2: run backbone, FA detection, and post-processing with NMS
+ # Extract max_frame_num_to_track from feature_cache if available
+ tracking_bounds = feature_cache.get("tracking_bounds", {})
+ max_frame_num_to_track = tracking_bounds.get("max_frame_num_to_track")
+ start_frame_idx = tracking_bounds.get("propagate_in_video_start_frame_idx")
+ backbone_out = {
+ "img_batch_all_stages": input_batch.img_batch,
+ **text_outputs,
+ }
+
+ if use_batched_grounding:
+ # Use fully batched forward_grounding approach
+ if "grounding_cache" not in feature_cache:
+ feature_cache["grounding_cache"] = {}
+
+ with torch.profiler.record_function(
+ "forward_video_grounding_batched_multigpu"
+ ):
+ sam3_image_out, _ = (
+ self.detector.forward_video_grounding_batched_multigpu(
+ backbone_out=backbone_out,
+ find_inputs=input_batch.find_inputs,
+ geometric_prompt=geometric_prompt,
+ frame_idx=frame_idx,
+ num_frames=num_frames,
+ grounding_cache=feature_cache["grounding_cache"],
+ track_in_reverse=reverse,
+ return_sam2_backbone_feats=True,
+ run_nms=self.det_nms_thresh > 0.0,
+ nms_prob_thresh=self.score_threshold_detection,
+ nms_iou_thresh=self.det_nms_thresh,
+ nms_use_iom=self.det_nms_use_iom,
+ max_frame_num_to_track=max_frame_num_to_track,
+ propagate_in_video_start_frame_idx=start_frame_idx,
+ feature_cache=feature_cache,
+ batch_size=batched_grounding_batch_size,
+ )
+ )
+ else:
+ # Use existing multi-GPU distributed approach
+ if "multigpu_buffer" not in feature_cache:
+ # "multigpu_buffer" is a buffer cache used by `self.detector` and it needs
+ # to be passed to `forward_video_grounding_multigpu` for every call
+ feature_cache["multigpu_buffer"] = {}
+
+ with torch.profiler.record_function("forward_video_grounding_multigpu"):
+ sam3_image_out, _ = self.detector.forward_video_grounding_multigpu(
+ backbone_out=backbone_out,
+ find_inputs=input_batch.find_inputs,
+ geometric_prompt=geometric_prompt,
+ frame_idx=frame_idx,
+ num_frames=num_frames,
+ multigpu_buffer=feature_cache["multigpu_buffer"],
+ track_in_reverse=reverse,
+ # also get the SAM2 backbone features
+ return_sam2_backbone_feats=True,
+ # run NMS as a part of distributed FA computation
+ run_nms=self.det_nms_thresh > 0.0,
+ nms_prob_thresh=self.score_threshold_detection,
+ nms_iou_thresh=self.det_nms_thresh,
+ nms_use_iom=self.det_nms_use_iom,
+ # pass max_frame_num_to_track to respect tracking limits
+ max_frame_num_to_track=max_frame_num_to_track,
+ propagate_in_video_start_frame_idx=start_frame_idx,
+ # pass feature_cache for buffered backbone computation
+ feature_cache=feature_cache,
+ )
+
+ # note: detections in `sam3_image_out` have already gone through NMS (when `det_nms_thresh > 0`)
+ pred_probs = sam3_image_out["pred_logits"].squeeze(-1).sigmoid()
+ pred_boxes_xyxy = sam3_image_out["pred_boxes_xyxy"]
+ pred_masks = sam3_image_out["pred_masks"]
+ # get the positive detection outputs above threshold
+ pos_pred_mask = pred_probs > self.score_threshold_detection
+
+ if self.suppress_det_close_to_boundary:
+ # Suppress detections too close to image edges (for normalized boxes).
+ keep = self._suppress_detections_close_to_boundary(pred_boxes_xyxy)
+ pos_pred_mask = pos_pred_mask & keep
+
+ det_out = {
+ "bbox": pred_boxes_xyxy,
+ "mask": pred_masks,
+ "scores": pred_probs,
+ }
+
+ # Step 3: build SAM2 backbone features and store them in `feature_cache`
+ backbone_cache = {}
+ if self.is_multiplex:
+ # For the multiplex model we have separate interaction and propagation features
+ # TODO: We do not need the interaction features every frame, so there is room for optimization
+ interaction_sam_mask_decoder = self.tracker.interactive_sam_mask_decoder
+ interaction_backbone_fpn = [
+ interaction_sam_mask_decoder.conv_s0(
+ sam3_image_out["interactive_backbone_fpn_0"]
+ ),
+ interaction_sam_mask_decoder.conv_s1(
+ sam3_image_out["interactive_backbone_fpn_1"]
+ ),
+ sam3_image_out[
+ "interactive_backbone_fpn_2"
+ ], # fpn_2 doesn't need additional conv
+ ]
+ interaction_backbone_out = {
+ "vision_features": interaction_backbone_fpn[-1], # top-level feature
+ "vision_mask": None,
+ "vision_pos_enc": sam3_image_out["interactive_backbone_pos_enc"],
+ "backbone_fpn": [
+ NestedTensor(x, None) for x in interaction_backbone_fpn
+ ],
+ }
+ backbone_cache["interactive"] = interaction_backbone_out
+ sam_mask_decoder = self.tracker.sam_mask_decoder
+ sam2_backbone_fpn = [
+ sam_mask_decoder.conv_s0(sam3_image_out["sam2_backbone_fpn_0"]),
+ sam_mask_decoder.conv_s1(sam3_image_out["sam2_backbone_fpn_1"]),
+ sam3_image_out["sam2_backbone_fpn_2"], # fpn_2 doesn't need additional conv
+ ]
+ sam2_backbone_out = {
+ "vision_features": sam2_backbone_fpn[-1], # top-level feature
+ "vision_mask": None,
+ "vision_pos_enc": sam3_image_out["sam2_backbone_pos_enc"],
+ "backbone_fpn": [NestedTensor(x, None) for x in sam2_backbone_fpn],
+ }
+ backbone_cache["sam2_backbone_out"] = sam2_backbone_out
+
+ with torch.profiler.record_function("run_backbone_and_detection.feature_cache"):
+ feature_cache[frame_idx] = (
+ input_batch.img_batch.tensors[frame_idx],
+ backbone_cache,
+ )
+ # remove old features from `feature_cache` to save GPU memory
+ feature_cache.pop(frame_idx - 1 if not reverse else frame_idx + 1, None)
+ return det_out, pos_pred_mask
+
+ def run_tracker_propagation(
+ self,
+ frame_idx: int,
+ num_frames: int,
+ reverse: bool,
+ tracker_states_local: List[Any],
+ tracker_metadata_prev: Dict[str, np.ndarray],
+ ):
+ # Step 1: propagate the local SAM2 states to get the current frame's prediction
+ # `low_res_masks_local` of the existing masklets on this GPU
+ # - obj_ids_local: List[int] -- list of object IDs
+ # - low_res_masks_local: Tensor -- (num_local_obj, H_mask, W_mask)
+ with torch.profiler.record_function("propagate_tracker_one_frame_local_gpu"):
+ obj_ids_local, low_res_masks_local, obj_scores_local = (
+ self._propogate_tracker_one_frame_local_gpu(
+ tracker_states_local, frame_idx=frame_idx, reverse=reverse
+ )
+ )
+
+ assert np.all(
+ obj_ids_local == tracker_metadata_prev["obj_ids_per_gpu"][self.rank]
+ ), "{} != {}".format(
+ obj_ids_local, tracker_metadata_prev["obj_ids_per_gpu"][self.rank]
+ )
+
+ # Step 2: all-gather `low_res_masks_local` into `low_res_masks_global`
+ # - low_res_masks_global: Tensor -- (num_global_obj, H_mask, W_mask)
+ with torch.profiler.record_function("all_gather_low_res_masks_local"):
+ _, H_mask, W_mask = low_res_masks_local.shape
+ if self.world_size > 1:
+ # `low_res_masks_local` and `obj_scores_local` need to be contiguous and float32
+ # (they could be non-contiguous due to slicing and/or bfloat16 due to autocast)
+ low_res_masks_local = low_res_masks_local.float().contiguous()
+ obj_scores_local = obj_scores_local.float().contiguous()
+ num_obj_this_gpu = tracker_metadata_prev["num_obj_per_gpu"][self.rank]
+ assert low_res_masks_local.size(0) == num_obj_this_gpu
+ assert obj_scores_local.size(0) == num_obj_this_gpu
+ low_res_masks_peers = [
+ low_res_masks_local.new_empty(num_obj, H_mask, W_mask)
+ for num_obj in tracker_metadata_prev["num_obj_per_gpu"]
+ ]
+ obj_scores_peers = [
+ obj_scores_local.new_empty(num_obj)
+ for num_obj in tracker_metadata_prev["num_obj_per_gpu"]
+ ]
+ dist.all_gather(low_res_masks_peers, low_res_masks_local)
+ dist.all_gather(obj_scores_peers, obj_scores_local)
+ low_res_masks_global = torch.cat(low_res_masks_peers, dim=0)
+ obj_scores_global = torch.cat(obj_scores_peers, dim=0)
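+ # Worked example (illustrative sizes): with num_obj_per_gpu = [2, 3], every rank
+ # receives peer buffers of shape (2, H_mask, W_mask) and (3, H_mask, W_mask);
+ # concatenating them gives low_res_masks_global of shape (5, H_mask, W_mask),
+ # ordered by rank, i.e. the same order as obj_ids_all_gpu.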
+ else:
+ low_res_masks_global = low_res_masks_local
+ obj_scores_global = obj_scores_local
+ return low_res_masks_global, obj_scores_global
+
+ def _recondition_masklets(
+ self,
+ frame_idx,
+ det_out: Dict[str, Tensor],
+ trk_id_to_max_iou_high_conf_det: Dict[int, int], # trk_obj_id -> det_idx
+ tracker_states_local: List[Any],
+ tracker_metadata: Dict[str, np.ndarray],
+ tracker_obj_scores_global: Tensor,
+ tracker_low_res_masks_global: Tensor,
+ ):
+ reconditioned_obj_ids = set()
+ HIGH_CONF_THRESH = 0.8
+ input_mask_res = self.tracker.input_mask_size
+
+ if len(trk_id_to_max_iou_high_conf_det) == 0:
+ return tracker_states_local, reconditioned_obj_ids
+
+ # === BATCH ALL INDEX LOOKUPS ON GPU ===
+ trk_obj_ids = list(trk_id_to_max_iou_high_conf_det.keys())
+ det_indices = list(trk_id_to_max_iou_high_conf_det.values())
+
+ # Convert obj_ids_all_gpu to tensor once (keep on GPU)
+ obj_ids_all_gpu_t = torch.from_numpy(tracker_metadata["obj_ids_all_gpu"]).to(
+ device=tracker_obj_scores_global.device
+ )
+ trk_obj_ids_t = torch.tensor(
+ trk_obj_ids, device=tracker_obj_scores_global.device
+ )
+ det_indices_t = torch.tensor(
+ det_indices, device=tracker_obj_scores_global.device
+ )
+
+ # Batched lookup: find obj_idx for each trk_obj_id
+ # Shape: (num_trk, num_all_obj) -> find matching indices
+ matches = trk_obj_ids_t.unsqueeze(1) == obj_ids_all_gpu_t.unsqueeze(0) # (N, M)
+ obj_indices_t = matches.int().argmax(dim=1) # (N,)
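+ # Worked example (illustrative IDs): with trk_obj_ids_t = [7, 3] and
+ # obj_ids_all_gpu_t = [3, 5, 7], `matches` is [[F, F, T], [T, F, F]] and
+ # obj_indices_t = [2, 0]. argmax returns 0 for a row with no match, so a
+ # trk_obj_id missing from obj_ids_all_gpu would silently map to index 0.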
+
+ # Batched score lookup and filtering - NO SYNC until we need CPU decision
+ obj_scores_batch = tracker_obj_scores_global[obj_indices_t].sigmoid() # (N,)
+ high_conf_mask = obj_scores_batch > HIGH_CONF_THRESH # (N,) bool tensor on GPU
+
+ # === SINGLE SYNC POINT: Transfer filter mask to CPU ===
+ high_conf_mask_cpu = high_conf_mask.cpu().numpy()
+
+ # Filter to only high-confidence items
+ valid_trk_obj_ids = [
+ tid for tid, valid in zip(trk_obj_ids, high_conf_mask_cpu) if valid
+ ]
+ valid_det_indices = [
+ did for did, valid in zip(det_indices, high_conf_mask_cpu) if valid
+ ]
+ valid_obj_indices = obj_indices_t[high_conf_mask] # Keep as tensor
+
+ if len(valid_trk_obj_ids) == 0:
+ return tracker_states_local, reconditioned_obj_ids
+
+ # === BATCH MASK OPERATIONS ===
+ valid_det_indices_t = torch.tensor(
+ valid_det_indices, device=det_out["mask"].device
+ )
+
+ # Batch fetch all detection masks at once
+ new_masks = det_out["mask"][valid_det_indices_t] # (K, H, W)
+ new_masks_binary = (
+ F.interpolate(
+ new_masks.unsqueeze(1),
+ size=(input_mask_res, input_mask_res),
+ mode="bilinear",
+ align_corners=False,
+ ).squeeze(1)
+ > 0
+ ) # (K, H, W)
+
+ # Batch update low_res_masks_global
+ old_masks = tracker_low_res_masks_global[valid_obj_indices] # (K, H, W)
+ binary_agreement = (new_masks > 0) == (old_masks > 0)
+ updated_masks = torch.where(binary_agreement, old_masks, new_masks)
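+ # Worked example (illustrative logits): old_masks = [2.0, -1.0, -3.0] and
+ # new_masks = [3.0, 4.0, -0.5] agree in sign at positions 0 and 2, so
+ # updated_masks = [2.0, 4.0, -3.0]: the tracker's logits are kept where both
+ # masks agree, and the detector's logits are used where they disagree.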
+
+ # Batch hole filling
+ updated_masks = fill_holes_in_mask_scores(
+ updated_masks.unsqueeze(1),
+ fill_hole_area=self.fill_hole_area,
+ sprinkle_removal_area=self.sprinkle_removal_area,
+ fill_holes=True,
+ remove_sprinkles=True,
+ ).squeeze(1)
+
+ # Write back (scatter)
+ tracker_low_res_masks_global[valid_obj_indices] = updated_masks
+
+ # === NOW DO THE STATE UPDATES (still needs iteration but with pre-filtered data) ===
+ if self.is_multiplex:
+ state_to_recondition_info = {}
+ for i, trk_obj_id in enumerate(valid_trk_obj_ids):
+ for state_idx, inference_state in enumerate(tracker_states_local):
+ if trk_obj_id in inference_state["obj_ids"]:
+ if state_idx not in state_to_recondition_info:
+ state_to_recondition_info[state_idx] = []
+ state_to_recondition_info[state_idx].append(
+ (trk_obj_id, new_masks_binary[i])
+ )
+ break
+
+ for state_idx, recondition_list in state_to_recondition_info.items():
+ inference_state = tracker_states_local[state_idx]
+ obj_ids_to_recondition = [item[0] for item in recondition_list]
+ masks_to_recondition = torch.stack(
+ [item[1] for item in recondition_list]
+ )
+ with torch.profiler.record_function(
+ "_recondition_masklets.add_new_masks"
+ ):
+ self.tracker.add_new_masks(
+ inference_state=inference_state,
+ frame_idx=frame_idx,
+ obj_ids=obj_ids_to_recondition,
+ masks=masks_to_recondition,
+ reconditioning=True,
+ )
+ reconditioned_obj_ids.update(inference_state["obj_idx_to_id"].values())
+ else:
+ # Non-multiplex: still iterate but masks already computed
+ for i, trk_obj_id in enumerate(valid_trk_obj_ids):
+ for inference_state in tracker_states_local:
+ if trk_obj_id in inference_state["obj_ids"]:
+ self.tracker.add_new_mask(
+ inference_state=inference_state,
+ frame_idx=frame_idx,
+ obj_id=trk_obj_id,
+ mask=new_masks_binary[i],
+ )
+ reconditioned_obj_ids.update(
+ inference_state["obj_idx_to_id"].values()
+ )
+ break
+
+ return tracker_states_local, reconditioned_obj_ids
+
+ def _deepcopy(self, x):
+ # In production we only traverse the video in one direction, so no deepcopy is needed
+ if True:
+ return x
+ return deepcopy(x)
+
+ def run_tracker_update_planning_phase(
+ self,
+ frame_idx: int,
+ num_frames: int,
+ reverse: bool,
+ det_out: Dict[str, Tensor],
+ det_keep: Tensor,
+ tracker_low_res_masks_global: Tensor,
+ tracker_obj_scores_global: Tensor,
+ tracker_metadata_prev: Dict[str, np.ndarray],
+ tracker_states_local: List[Any],
+ is_image_only: bool = False,
+ ):
+ # initialize new metadata from previous metadata (its values will be updated later)
+ with torch.profiler.record_function("initialize_tracker_metadata_new"):
+ tracker_metadata_new = self._create_planning_metadata(tracker_metadata_prev)
+
+ # Initialize reconditioned_obj_ids early to avoid UnboundLocalError
+ reconditioned_obj_ids = set()
+
+ # Step 1: make the update plan and resolve heuristics on GPU 0
+ det_mask_preds: Tensor = det_out["mask"] # low-res mask logits
+ det_scores: Tensor = det_out["scores"].float()
+ # a) match FA and SAM2 masks and find new objects
+ with torch.profiler.record_function("associate_det_trk"):
+ adt_result = self._associate_det_trk(
+ det_masks=det_mask_preds,
+ det_scores=det_scores,
+ det_keep=det_keep,
+ trk_masks=tracker_low_res_masks_global,
+ trk_obj_ids=tracker_metadata_prev["obj_ids_all_gpu"],
+ default_det_thresh=(
+ self.image_only_det_thresh if is_image_only else None
+ ),
+ )
+
+ # b) handle hotstart heuristics to remove objects (GPU-vectorized, no sync!)
+ # here `rank0_metadata` contains metadata stored on (and only accessible to) GPU 0;
+ # we avoid broadcasting them to other GPUs to save communication cost, assuming
+ # that `rank0_metadata` is not needed by other GPUs
+ rank0_metadata_new = self._deepcopy(tracker_metadata_prev["rank0_metadata"])
+ if not hasattr(self, "_warm_up_complete") or self._warm_up_complete:
+ # Call GPU-vectorized hotstart using lazy adt_result (NO realize_adt yet!)
+ with torch.profiler.record_function("_process_hotstart_gpu"):
+ to_remove_mask, to_suppress_mask, gpu_metadata_new = (
+ self._process_hotstart_gpu(
+ frame_idx=frame_idx,
+ reverse=reverse,
+ adt_result=adt_result, # Still lazy - no sync!
+ tracker_metadata_prev=tracker_metadata_prev,
+ gpu_metadata_prev=tracker_metadata_prev["gpu_metadata"],
+ )
+ )
+ # IMPORTANT: From this point, tracker_metadata_new["gpu_metadata"] is updated but CPU metadata (obj_ids_all_gpu, etc.) is NOT
+ tracker_metadata_new["gpu_metadata"] = gpu_metadata_new
+ else:
+ # if warm-up is not complete, we don't remove any objects
+ N_obj = tracker_low_res_masks_global.size(0)
+ to_remove_mask = torch.zeros(
+ N_obj, dtype=torch.bool, device=tracker_low_res_masks_global.device
+ )
+ to_suppress_mask = torch.zeros(
+ N_obj, dtype=torch.bool, device=tracker_low_res_masks_global.device
+ )
+ tracker_metadata_new["rank0_metadata"] = rank0_metadata_new
+
+ # Step 3 (optional): recondition masklets based on high-confidence detections before memory encoding
+ # NOTE: Running this in execution phase (after memory encoding) can lead to suboptimal results
+ should_recondition_iou = False
+
+ # Evaluate tracklets for reconditioning based on bbox IoU mismatch with detections
+ if self.reconstruction_bbox_iou_thresh > 0:
+ adt_result = realize_adt_result(
+ adt_result, tracker_metadata_prev, det_mask_preds
+ )
+ if (
+ self.reconstruction_bbox_iou_thresh > 0
+ and len(adt_result.trk_id_to_max_iou_high_conf_det) > 0
+ ):
+ with torch.profiler.record_function(
+ "evaluate_reconstruction_bbox_iou_thresh"
+ ):
+ trk_obj_ids = adt_result.trk_id_to_max_iou_high_conf_det.keys()
+ sam2_obj_ids_all_gpu = list(tracker_metadata_prev["obj_ids_all_gpu"])
+ trk_ids = [
+ sam2_obj_ids_all_gpu.index(trk_obj_id)
+ for trk_obj_id in trk_obj_ids
+ if trk_obj_id in sam2_obj_ids_all_gpu
+ ]
+ det_ids = list(adt_result.trk_id_to_max_iou_high_conf_det.values())
+
+ det_boxes_bbox_iou = det_out["bbox"][det_ids]
+ det_scores_bbox_iou = det_out["scores"][det_ids]
+ sam2_mask = tracker_low_res_masks_global[trk_ids]
+ mask_binary = sam2_mask > 0
+ sam2_box_pixels = mask_to_box(mask_binary.unsqueeze(1)).squeeze(1)
+ mask_height, mask_width = sam2_mask.shape[-2:]
+ sam2_box_normalized = sam2_box_pixels / torch.tensor(
+ [mask_width, mask_height, mask_width, mask_height],
+ device=sam2_box_pixels.device,
+ )
+ iou = fast_diag_box_iou(det_boxes_bbox_iou, sam2_box_normalized)[0]
+ if iou < self.reconstruction_bbox_iou_thresh and torch.any(
+ det_scores_bbox_iou >= self.reconstruction_bbox_det_score
+ ):
+ should_recondition_iou = True
+
+ if (
+ self.recondition_every_nth_frame > 0
+ and frame_idx % self.recondition_every_nth_frame == 0
+ ):
+ adt_result = realize_adt_result(
+ adt_result, tracker_metadata_prev, det_mask_preds
+ )
+
+ should_recondition_periodic = (
+ self.recondition_every_nth_frame > 0
+ and frame_idx % self.recondition_every_nth_frame == 0
+ and len(adt_result.trk_id_to_max_iou_high_conf_det) > 0
+ )
+
+ # Recondition if periodic or IoU condition met
+ if should_recondition_periodic or should_recondition_iou:
+ adt_result = realize_adt_result(
+ adt_result, tracker_metadata_prev, det_mask_preds
+ )
+ # NOTE: tracker_low_res_masks_global is modified in-place on all GPUs.
+ with torch.profiler.record_function("_recondition_masklets"):
+ tracker_states_local, reconditioned_obj_ids = (
+ self._recondition_masklets(
+ frame_idx,
+ det_out,
+ adt_result.trk_id_to_max_iou_high_conf_det,
+ tracker_states_local,
+ tracker_metadata_prev,
+ tracker_obj_scores_global,
+ tracker_low_res_masks_global,
+ )
+ )
+
+ for state in tracker_states_local:
+ if any(
+ obj_id in reconditioned_obj_ids
+ for obj_id in state.get("obj_ids", [])
+ ):
+ self.tracker.propagate_in_video_preflight(
+ state, run_mem_encoder=True
+ )
+
+ # Step 4: Run SAM2 memory encoder on the current frame's prediction masks
+ # This is done on all GPUs
+ batch_size = tracker_low_res_masks_global.size(0)
+ if batch_size > 0:
+ if not hasattr(self, "_warm_up_complete") or self._warm_up_complete:
+ if self.suppress_overlapping_based_on_recent_occlusion_threshold > 0.0:
+ # NOTE: tracker_low_res_masks_global is updated in-place then returned
+ with torch.profiler.record_function(
+ "_suppress_overlapping_based_on_recent_occlusion"
+ ):
+ tracker_low_res_masks_global = (
+ self._suppress_overlapping_based_on_recent_occlusion(
+ frame_idx,
+ tracker_low_res_masks_global,
+ tracker_metadata_prev,
+ tracker_metadata_new,
+ to_remove_mask, # GPU boolean mask, no sync!
+ reverse,
+ )
+ )
+ with torch.profiler.record_function("_tracker_update_memories"):
+ self._tracker_update_memories(
+ tracker_states_local,
+ frame_idx,
+ tracker_metadata=tracker_metadata_prev,
+ low_res_masks=tracker_low_res_masks_global,
+ )
+
+ # NOW realize adt_result after memory encoding (sync only for GPU load balancing)
+ adt_result = realize_adt_result(
+ adt_result, tracker_metadata_prev, det_mask_preds
+ )
+ new_det_obj_ids, new_det_gpu_ids, num_obj_dropped_due_to_limit = (
+ adt_result.get_new_det_gpu_ids(
+ tracker_metadata_prev, is_image_only, det_scores, self
+ )
+ )
+
+ # Convert GPU removal mask to CPU obj_id set for metadata updates
+ if not hasattr(self, "_warm_up_complete") or self._warm_up_complete:
+ obj_ids_all_gpu = tracker_metadata_prev["obj_ids_all_gpu"]
+ to_remove_cpu = to_remove_mask.cpu().numpy()
+ obj_ids_newly_removed = set(obj_ids_all_gpu[to_remove_cpu].tolist())
+ else:
+ obj_ids_newly_removed = set()
+
+ # Step 5: update the SAM2 metadata based on the update plan
+ # note: except for "rank0_metadata" (that is only available on GPU 0),
+ # the updated `tracker_metadata_new` should be identical on all GPUs
+ for rank in range(self.world_size):
+ new_det_obj_ids_this_gpu = new_det_obj_ids[new_det_gpu_ids == rank]
+ updated_obj_ids_this_gpu = tracker_metadata_new["obj_ids_per_gpu"][rank]
+ if len(new_det_obj_ids_this_gpu) > 0:
+ updated_obj_ids_this_gpu = np.concatenate(
+ [updated_obj_ids_this_gpu, new_det_obj_ids_this_gpu]
+ )
+ if len(obj_ids_newly_removed) > 0:
+ is_removed = np.isin(
+ updated_obj_ids_this_gpu, list(obj_ids_newly_removed)
+ )
+ updated_obj_ids_this_gpu = updated_obj_ids_this_gpu[~is_removed]
+ tracker_metadata_new["obj_ids_per_gpu"][rank] = updated_obj_ids_this_gpu
+ tracker_metadata_new["num_obj_per_gpu"][rank] = len(
+ updated_obj_ids_this_gpu
+ )
+ tracker_metadata_new["obj_ids_all_gpu"] = np.concatenate(
+ tracker_metadata_new["obj_ids_per_gpu"]
+ )
+ # update object scores and the maximum object ID assigned so far
+ if len(new_det_obj_ids) > 0:
+ det_scores_np: np.ndarray = det_scores.cpu().numpy()
+ tracker_metadata_new["obj_id_to_score"].update(
+ zip(new_det_obj_ids, det_scores_np[adt_result.new_det_fa_inds])
+ )
+ # sam2 scores are not available for new objects, use det score instead.
+ # Store as GPU tensors for consistency with SAM2 propagation scores
+ new_det_scores_tensor = det_scores[adt_result.new_det_fa_inds]
+ tracker_metadata_new["obj_id_to_sam2_score_frame_wise"][frame_idx].update(
+ zip(new_det_obj_ids, new_det_scores_tensor)
+ )
+ tracker_metadata_new["max_obj_id"] = max(
+ tracker_metadata_new["max_obj_id"],
+ np.max(new_det_obj_ids),
+ )
+ # for removed objects, we set their scores to a very low value (-1e4) but still
+ # keep them in "obj_id_to_score" (it's easier to handle outputs this way)
+ for obj_id in obj_ids_newly_removed:
+ tracker_metadata_new["obj_id_to_score"][obj_id] = -1e4
+ # Store as GPU tensor for consistency
+ tracker_metadata_new["obj_id_to_sam2_score_frame_wise"][frame_idx][
+ obj_id
+ ] = torch.tensor(-1e4, dtype=torch.float32, device=det_scores.device)
+ tracker_metadata_new["obj_id_to_last_occluded"].pop(obj_id, None)
+ # check that "rank0_metadata" is present in tracker_metadata_new (it is assigned unconditionally above)
+ assert "rank0_metadata" in tracker_metadata_new
+ if self.masklet_confirmation_enable:
+ with torch.profiler.record_function("update_masklet_confirmation_status"):
+ rank0_metadata = self.update_masklet_confirmation_status(
+ rank0_metadata=tracker_metadata_new["rank0_metadata"],
+ obj_ids_all_gpu_prev=tracker_metadata_prev["obj_ids_all_gpu"],
+ obj_ids_all_gpu_updated=tracker_metadata_new["obj_ids_all_gpu"],
+ det_to_matched_trk_obj_ids=adt_result.det_to_matched_trk_obj_ids,
+ new_det_obj_ids=new_det_obj_ids,
+ )
+ tracker_metadata_new["rank0_metadata"] = rank0_metadata
+
+ # Compact GPU metadata NOW (after sync) in preparation for next frame
+ # This removes entries for objects that will be deleted in execution phase
+ # so next frame's _process_hotstart_gpu doesn't need to do sync-inducing compaction
+ if not hasattr(self, "_warm_up_complete") or self._warm_up_complete:
+ if (
+ "gpu_metadata" in tracker_metadata_new
+ and tracker_metadata_new["gpu_metadata"].get("N_obj", 0) > 0
+ ):
+ with torch.profiler.record_function("compact_gpu_metadata"):
+ gpu_meta = tracker_metadata_new["gpu_metadata"]
+ removed_mask = gpu_meta[
+ "removed_mask"
+ ] # (N_obj,) - which objects marked for removal
+ keep_indices = torch.nonzero(~removed_mask, as_tuple=True)[0]
+
+ gpu_meta["obj_first_frame"] = gpu_meta["obj_first_frame"][
+ keep_indices
+ ]
+ gpu_meta["consecutive_unmatch_count"] = gpu_meta[
+ "consecutive_unmatch_count"
+ ][keep_indices]
+ gpu_meta["trk_keep_alive"] = gpu_meta["trk_keep_alive"][
+ keep_indices
+ ]
+ gpu_meta["removed_mask"] = gpu_meta["removed_mask"][
+ keep_indices
+ ] # Should be all False
+ gpu_meta["last_occluded_tensor"] = gpu_meta["last_occluded_tensor"][
+ keep_indices
+ ]
+
+ # Compact pairwise matrix (remove both rows and columns)
+ overlap_counts = gpu_meta["overlap_pair_counts"]
+ overlap_counts = overlap_counts[keep_indices][:, keep_indices]
+ gpu_meta["overlap_pair_counts"] = overlap_counts
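+ # Worked example: with N_obj = 3 and keep_indices = [0, 2], the 3x3 matrix
+ # [[a, b, c], [d, e, f], [g, h, i]] is compacted to [[a, c], [g, i]]
+ # (rows are selected first, then columns).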
+
+ # Update N_obj to reflect post-removal count
+ gpu_meta["N_obj"] = keep_indices.size(0)
+
+ # After compaction, extend gpu_metadata with new objects' initial values
+ # This ensures obj_first_frame is set to the detection frame, not propagation frame
+ num_new = len(new_det_obj_ids)
+ if num_new > 0:
+ with torch.profiler.record_function(
+ "extend_gpu_metadata_for_new_objects"
+ ):
+ gpu_meta = tracker_metadata_new["gpu_metadata"]
+ device = det_scores.device
+ NEVER_OCCLUDED = -1
+
+ # Extend all metadata tensors for new objects
+ gpu_meta["obj_first_frame"] = torch.cat(
+ [
+ gpu_meta.get(
+ "obj_first_frame",
+ torch.empty(0, dtype=torch.long, device=device),
+ ),
+ torch.full(
+ (num_new,), frame_idx, dtype=torch.long, device=device
+ ),
+ ]
+ )
+ gpu_meta["consecutive_unmatch_count"] = torch.cat(
+ [
+ gpu_meta.get(
+ "consecutive_unmatch_count",
+ torch.empty(0, dtype=torch.long, device=device),
+ ),
+ torch.zeros(num_new, dtype=torch.long, device=device),
+ ]
+ )
+ gpu_meta["trk_keep_alive"] = torch.cat(
+ [
+ gpu_meta.get(
+ "trk_keep_alive",
+ torch.empty(0, dtype=torch.long, device=device),
+ ),
+ torch.full(
+ (num_new,),
+ self.init_trk_keep_alive,
+ dtype=torch.long,
+ device=device,
+ ),
+ ]
+ )
+ gpu_meta["removed_mask"] = torch.cat(
+ [
+ gpu_meta.get(
+ "removed_mask",
+ torch.empty(0, dtype=torch.bool, device=device),
+ ),
+ torch.zeros(num_new, dtype=torch.bool, device=device),
+ ]
+ )
+ gpu_meta["last_occluded_tensor"] = torch.cat(
+ [
+ gpu_meta.get(
+ "last_occluded_tensor",
+ torch.empty(0, dtype=torch.long, device=device),
+ ),
+ torch.full(
+ (num_new,),
+ NEVER_OCCLUDED,
+ dtype=torch.long,
+ device=device,
+ ),
+ ]
+ )
+
+ # Grow overlap matrix
+ old_N = gpu_meta.get("N_obj", 0)
+ new_N = old_N + num_new
+ old_overlap = gpu_meta.get(
+ "overlap_pair_counts",
+ torch.zeros((0, 0), dtype=torch.long, device=device),
+ )
+ new_overlap = torch.zeros(
+ (new_N, new_N), dtype=torch.long, device=device
+ )
+ if old_N > 0:
+ new_overlap[:old_N, :old_N] = old_overlap
+ gpu_meta["overlap_pair_counts"] = new_overlap
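+ # Worked example: with old_N = 2 and num_new = 1, the existing 2x2 counts are
+ # copied into the top-left block of a 3x3 zero matrix, so the new object
+ # starts with zero overlap counts against every other object.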
+
+ gpu_meta["N_obj"] = new_N
+
+ sam2_update_plan = {
+ "new_det_fa_inds": adt_result.new_det_fa_inds, # np.ndarray
+ "new_det_obj_ids": new_det_obj_ids, # np.ndarray
+ "new_det_gpu_ids": new_det_gpu_ids, # np.ndarray
+ "unmatched_trk_obj_ids": adt_result.unmatched_trk_obj_ids, # np.ndarray
+ "det_to_matched_trk_obj_ids": adt_result.det_to_matched_trk_obj_ids, # dict
+ "obj_ids_newly_removed": obj_ids_newly_removed, # set
+ "num_obj_dropped_due_to_limit": num_obj_dropped_due_to_limit, # int
+ "trk_id_to_max_iou_high_conf_det": adt_result.trk_id_to_max_iou_high_conf_det, # dict
+ "reconditioned_obj_ids": reconditioned_obj_ids, # set
+ }
+ return sam2_update_plan, tracker_metadata_new
+
+ def _suppress_overlapping_based_on_recent_occlusion(
+ self,
+ frame_idx: int,
+ tracker_low_res_masks_global: Tensor,
+ tracker_metadata_prev: Dict[str, Any],
+ tracker_metadata_new: Dict[str, Any],
+ to_remove_mask: Tensor, # GPU boolean mask (N_obj,) instead of CPU set
+ reverse: bool = False,
+ ):
+ """
+ Suppress overlapping masks based on the most recent occlusion information. If an object is removed by hotstart, we always suppress it if it overlaps with any other object.
+ Args:
+ frame_idx (int): The current frame index.
+ tracker_low_res_masks_global (Tensor): The low-resolution masks for the current frame.
+ tracker_metadata_prev (Dict[str, Any]): The metadata from the previous frame.
+ tracker_metadata_new (Dict[str, Any]): The metadata for the current frame (with updated gpu_metadata from _process_hotstart_gpu).
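+ to_remove_mask (Tensor): GPU boolean mask (N_obj,) indicating which objects are removed.
+ reverse (bool): Whether tracking runs in reverse (backward in time); this flips which occlusion counts as most recent.
+ Returns:
+ Tensor: The updated low-resolution masks with some objects suppressed.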
+ """
+ # NOTE: obj_ids_global is only used for debug logging, so we can use prev (it won't match perfectly but close enough for debugging)
+ # The actual suppression logic uses GPU tensors which ARE in the correct index space from tracker_metadata_new
+ obj_ids_global = tracker_metadata_prev["obj_ids_all_gpu"]
+ binary_tracker_low_res_masks_global = tracker_low_res_masks_global > 0
+ batch_size = tracker_low_res_masks_global.size(0)
+ num_ids = len(obj_ids_global)
+
+ # Fail immediately on a mask/metadata count mismatch to force proper debugging (aligned with merge decision 4.5.2).
+ assert batch_size == num_ids, (
+ f"Mask/metadata count mismatch in _suppress_overlapping: "
+ f"batch_size={batch_size}, num_ids={num_ids}, frame_idx={frame_idx}"
+ )
+
+ if batch_size > 0:
+ assert len(obj_ids_global) == batch_size, (
+ f"Mismatch in number of objects: {len(obj_ids_global)} vs {batch_size}"
+ )
+ NEVER_OCCLUDED = -1
+ ALWAYS_OCCLUDED = 100000 # This value should be larger than any possible frame index, indicates that the object was removed by hotstart logic
+
+ # GPU-vectorized: Build last_occluded_prev tensor without iteration/syncs
+ device = binary_tracker_low_res_masks_global.device
+
+ # Get last_occluded from UPDATED gpu_metadata (already in correct index space from _process_hotstart_gpu)
+ gpu_metadata_new = tracker_metadata_new["gpu_metadata"]
+ last_occluded_prev = gpu_metadata_new["last_occluded_tensor"]
+
+ # Sanity check: ensure last_occluded_tensor is in sync with batch_size
+ assert last_occluded_prev.size(0) == batch_size, (
+ f"last_occluded_tensor size mismatch: {last_occluded_prev.size(0)} vs {batch_size}. "
+ f"This indicates gpu_metadata tensors are out of sync."
+ )
+
+ # Set ALWAYS_OCCLUDED for removed objects (fully vectorized, no sync!)
+ last_occluded_prev = torch.where(
+ to_remove_mask,
+ torch.full_like(last_occluded_prev, ALWAYS_OCCLUDED),
+ last_occluded_prev,
+ )
+
+ to_suppress = self._get_objects_to_suppress_based_on_most_recently_occluded(
+ binary_tracker_low_res_masks_global,
+ last_occluded_prev,
+ obj_ids_global,
+ frame_idx,
+ reverse,
+ )
+
+ # Update metadata with occlusion information (fully vectorized)
+ is_obj_occluded = ~(binary_tracker_low_res_masks_global.any(dim=(-1, -2)))
+ is_obj_occluded_or_suppressed = is_obj_occluded | to_suppress
+ last_occluded_new = last_occluded_prev.clone()
+ last_occluded_new[is_obj_occluded_or_suppressed] = frame_idx
+
+ # Store in gpu_metadata to keep it aligned with other metadata tensors
+ tracker_metadata_new["gpu_metadata"]["last_occluded_tensor"] = (
+ last_occluded_new
+ )
+
+ # Also maintain legacy dict format for backwards compatibility
+ # This conversion happens on CPU AFTER memory encoding, not in critical path
+ tracker_metadata_new[
+ "obj_id_to_last_occluded"
+ ] = {} # Will be populated later if needed
+
+ # Suppress masks by overwriting their logits with a negative value (background) before memory encoding
+ NO_OBJ_LOGIT = -10
+ tracker_low_res_masks_global[to_suppress] = NO_OBJ_LOGIT
+
+ return tracker_low_res_masks_global
+
+ def _create_planning_metadata(self, tracker_metadata_prev):
+ """Extend planning metadata with multiplex-specific fields."""
+ metadata = super()._create_planning_metadata(tracker_metadata_prev)
+ if self.is_multiplex:
+ metadata["num_buc_per_gpu"] = self._deepcopy(
+ tracker_metadata_prev["num_buc_per_gpu"]
+ )
+ metadata["gpu_metadata"] = tracker_metadata_prev["gpu_metadata"]
+ return metadata
+
+ def _post_execution_phase_hook(self, tracker_states_local, tracker_metadata_new):
+ """Update bucket count after execution phase (multiplex-specific)."""
+ if self.is_multiplex and tracker_metadata_new is not None:
+ actual_bucket_count = self._count_buckets_in_states(tracker_states_local)
+ tracker_metadata_new["num_buc_per_gpu"][self.rank] = actual_bucket_count
+
+ def _count_buckets_in_states(self, tracker_states_local: List[Any]) -> int:
+ """Count the total number of buckets across all states."""
+ if not self.is_multiplex:
+ return 0
+ total_buckets = 0
+ for state in tracker_states_local:
+ if "multiplex_state" in state:
+ total_buckets += state["multiplex_state"].num_buckets
+ return total_buckets
+
+ def build_outputs(
+ self,
+ frame_idx: int,
+ num_frames: int,
+ reverse: bool,
+ det_out: Dict[
+ str, Tensor
+ ], # TODO: Only det_out["mask"][new_det_fa_inds_local_t] is needed
+ tracker_low_res_masks_global: Tensor,
+ tracker_obj_scores_global: Tensor,
+ tracker_metadata_prev: Dict[str, np.ndarray],
+ sam2_update_plan: Dict[str, np.ndarray],
+ orig_vid_height: int,
+ orig_vid_width: int,
+ reconditioned_obj_ids: set = None,
+ det_to_matched_trk_obj_ids: dict = None,
+ ):
+ new_det_fa_inds: np.ndarray = sam2_update_plan["new_det_fa_inds"]
+ new_det_obj_ids: np.ndarray = sam2_update_plan["new_det_obj_ids"]
+ obj_id_to_mask = {} # obj_id --> output mask tensor
+
+ # Part 1: masks from previous SAM2 propagation
+ # Align IDs and masks from previous SAM2 propagation
+ existing_masklet_obj_ids_all = tracker_metadata_prev["obj_ids_all_gpu"]
+ existing_masklet_obj_ids_per_gpu = np.concatenate(
+ tracker_metadata_prev["obj_ids_per_gpu"]
+ )
+ use_per_gpu_ids = len(existing_masklet_obj_ids_per_gpu) != len(
+ existing_masklet_obj_ids_all
+ ) or not np.array_equal(
+ existing_masklet_obj_ids_per_gpu, existing_masklet_obj_ids_all
+ )
+ existing_masklet_obj_ids = (
+ existing_masklet_obj_ids_per_gpu
+ if use_per_gpu_ids
+ else existing_masklet_obj_ids_all
+ )
+ existing_masklet_video_res_masks = F.interpolate(
+ tracker_low_res_masks_global.unsqueeze(1),
+ size=(orig_vid_height, orig_vid_width),
+ mode="bilinear",
+ align_corners=False,
+ ) # (num_obj, 1, H_video, W_video)
+ # Pad/truncate masks to match metadata count
+ num_masks = existing_masklet_video_res_masks.size(0)
+ num_ids = len(existing_masklet_obj_ids)
+ if num_masks != num_ids:
+ if num_masks < num_ids:
+ pad = existing_masklet_video_res_masks.new_zeros(
+ (num_ids - num_masks, 1, orig_vid_height, orig_vid_width)
+ )
+ existing_masklet_video_res_masks = torch.cat(
+ [existing_masklet_video_res_masks, pad], dim=0
+ )
+ else:
+ existing_masklet_video_res_masks = existing_masklet_video_res_masks[
+ :num_ids
+ ]
+ existing_masklet_binary = existing_masklet_video_res_masks > 0
+ for obj_id, mask in zip(existing_masklet_obj_ids, existing_masklet_binary):
+ obj_id_to_mask[obj_id] = mask # (1, H_video, W_video)
+
+ # Part 2: masks from new detections
+ new_det_fa_inds_t = torch.from_numpy(new_det_fa_inds)
+ new_det_low_res_masks = det_out["mask"][new_det_fa_inds_t].unsqueeze(1)
+ new_det_low_res_masks = fill_holes_in_mask_scores(
+ new_det_low_res_masks,
+ fill_hole_area=self.fill_hole_area,
+ sprinkle_removal_area=self.sprinkle_removal_area,
+ fill_holes=True,
+ remove_sprinkles=True,
+ )
+ new_masklet_video_res_masks = F.interpolate(
+ new_det_low_res_masks,
+ size=(orig_vid_height, orig_vid_width),
+ mode="bilinear",
+ align_corners=False,
+ ) # (num_obj, 1, H_video, W_video)
+
+ new_masklet_binary = new_masklet_video_res_masks > 0
+ assert len(new_det_obj_ids) == len(new_masklet_video_res_masks)
+ for obj_id, mask in zip(new_det_obj_ids, new_masklet_binary):
+ obj_id_to_mask[obj_id] = mask # (1, H_video, W_video)
+
+ return obj_id_to_mask
+
+ def _get_objects_to_suppress_based_on_most_recently_occluded(
+ self,
+ binary_low_res_masks: Tensor,
+ last_occluded: Tensor, # GPU tensor (N_obj,) with frame indices
+ obj_ids: np.ndarray, # numpy array of object IDs
+ frame_idx: Optional[int] = None,
+ reverse: bool = False,
+ ):
+ # Suppress overlapping masks for objects that were most recently occluded
+ assert binary_low_res_masks.dtype == torch.bool, (
+ f"Expected boolean tensor, got {binary_low_res_masks.dtype}"
+ )
+ to_suppress = torch.zeros(
+ binary_low_res_masks.size(0),
+ device=binary_low_res_masks.device,
+ dtype=torch.bool,
+ )
+ if len(obj_ids) <= 1:
+ return to_suppress
+
+ iou = mask_iou(binary_low_res_masks, binary_low_res_masks) # [N,N]
+
+ # Create masks for upper triangular matrix (i < j) and IoU threshold
+ mask_iou_thresh = (
+ iou >= self.suppress_overlapping_based_on_recent_occlusion_threshold
+ )
+ overlapping_pairs = torch.triu(mask_iou_thresh, diagonal=1) # [N,N]
+
+ last_occ_expanded_i = last_occluded.unsqueeze(1) # (N, 1)
+ last_occ_expanded_j = last_occluded.unsqueeze(0) # (1, N)
+ cmp_op = torch.gt if not reverse else torch.lt
+
+ if self.allow_unoccluded_to_suppress:
+ # Suppress most recently occluded
+ suppress_i_mask = overlapping_pairs & cmp_op(
+ last_occ_expanded_i, last_occ_expanded_j
+ )
+
+ suppress_j_mask = overlapping_pairs & cmp_op(
+ last_occ_expanded_j, last_occ_expanded_i
+ )
+ else:
+ # Suppress most recently occluded
+ suppress_i_mask = (
+ overlapping_pairs
+ & cmp_op(
+ last_occ_expanded_i, last_occ_expanded_j
+ ) # i was occluded more recently than j (in processing order; cmp_op flips when reverse=True)
+ & (last_occ_expanded_j > -1)
+ # j can suppress i only if j was previously occluded
+ )
+
+ suppress_j_mask = (
+ overlapping_pairs
+ & cmp_op(last_occ_expanded_j, last_occ_expanded_i)
+ & (
+ last_occ_expanded_i > -1
+ ) # i can suppress j only if i was previously occluded
+ )
+
+ # Apply suppression
+ to_suppress = suppress_i_mask.any(dim=1) | suppress_j_mask.any(dim=0)
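+ # Worked example (hypothetical values, reverse=False, allow_unoccluded_to_suppress=False):
+ # two overlapping objects with last_occluded = [7, 3]. For the pair (0, 1),
+ # cmp_op(7, 3) is True and last_occluded[1] = 3 > -1, so suppress_i_mask[0, 1] is True
+ # and object 0 (the more recently occluded one) is suppressed. If instead
+ # last_occluded = [7, -1], object 1 was never occluded and cannot suppress object 0;
+ # with allow_unoccluded_to_suppress=True, object 0 would be suppressed.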
+
+ # Log for debugging
+ if (
+ self.rank == 0
+ and logger.isEnabledFor(logging.DEBUG)
+ and frame_idx is not None
+ ):
+ suppress_i_mask = suppress_i_mask.cpu().numpy()
+ suppress_j_mask = suppress_j_mask.cpu().numpy()
+ last_occluded = last_occluded.cpu().numpy()
+
+ # Find all suppression pairs without using torch.where
+ batch_size = suppress_i_mask.shape[0]
+
+ # Log i-suppression cases (where i gets suppressed in favor of j)
+ for i in range(batch_size):
+ for j in range(batch_size):
+ if suppress_i_mask[i, j]:
+ logger.debug(
+ f"{frame_idx=}: Suppressing obj {obj_ids[i]} last occluded {last_occluded[i]} in favor of {obj_ids[j]} last occluded {last_occluded[j]}"
+ )
+
+ # Log j-suppression cases (where j gets suppressed in favor of i)
+ for i in range(batch_size):
+ for j in range(batch_size):
+ if suppress_j_mask[i, j]:
+ logger.debug(
+ f"{frame_idx=}: Suppressing obj {obj_ids[j]} last occluded {last_occluded[j]} in favor of {obj_ids[i]} last occluded {last_occluded[i]}"
+ )
+
+ return to_suppress
+
+ def _propogate_tracker_one_frame_local_gpu(
+ self,
+ inference_states: List[Any],
+ frame_idx: int,
+ reverse: bool,
+ # by default, we disable memory encoding until we gather all outputs
+ run_mem_encoder: bool = False,
+ # When specified, only return masks/scores for these object ids
+ filter_obj_ids: Optional[List[int]] = None,
+ ):
+ """
+ inference_states: List of inference states, each state corresponds to a different set of objects.
+ """
+ obj_ids_local = []
+ low_res_masks_list = []
+ obj_scores_list = []
+ for inference_state in inference_states:
+ if len(inference_state["obj_ids"]) == 0:
+ continue # skip propagation on empty inference states
+
+ # propagate one frame
+ num_frames_propagated = 0
+ with torch.profiler.record_function("sam2_predictor.propagate_in_video"):
+ for out in self.tracker.propagate_in_video(
+ inference_state,
+ start_frame_idx=frame_idx,
+ # end_frame_idx = start_frame_idx + max_frame_num_to_track
+ # (i.e. propagating 1 frame since end_frame_idx is inclusive)
+ max_frame_num_to_track=0,
+ reverse=reverse,
+ tqdm_disable=True,
+ run_mem_encoder=run_mem_encoder,
+ ):
+ # TODO we only need low-res outputs here for all-gather across GPUs,
+ # so we can remove the high-res interpolation in `propagate_in_video`
+ out_frame_idx, out_obj_ids, out_low_res_masks, _, out_obj_scores = (
+ out
+ )
+ num_frames_propagated += 1
+
+ # exactly one frame should have been propagated
+ assert num_frames_propagated == 1 and out_frame_idx == frame_idx, (
+ f"num_frames_propagated: {num_frames_propagated}, out_frame_idx: {out_frame_idx}, frame_idx: {frame_idx}"
+ )
+ assert isinstance(out_obj_ids, list)
+ # Optionally filter to a subset of object ids (for partial propagation).
+ # Indices beyond the available mask/score rows are dropped to avoid CUDA index_select assertions.
+ if filter_obj_ids is not None:
+ if len(out_obj_ids) > 0:
+ max_mask_rows = out_low_res_masks.shape[0]
+ max_score_rows = out_obj_scores.shape[0]
+ # Special case: common single-object refinement path where SAM2 returns a single mask row
+ # but a longer out_obj_ids list for the state. Treat the lone row as the requested object.
+ if (
+ len(filter_obj_ids) == 1
+ and max_mask_rows == 1
+ and max_score_rows == 1
+ ):
+ out_obj_ids = [filter_obj_ids[0]]
+ keep_indices = [0]
+ else:
+ keep_indices = [
+ i
+ for i, oid in enumerate(out_obj_ids)
+ if oid in filter_obj_ids
+ and i < max_mask_rows
+ and i < max_score_rows
+ ]
+ else:
+ keep_indices = []
+ if len(keep_indices) > 0:
+ idx_tensor = torch.as_tensor(
+ keep_indices, device=out_low_res_masks.device, dtype=torch.long
+ )
+ out_low_res_masks = out_low_res_masks.index_select(
+ dim=0, index=idx_tensor
+ )
+ out_obj_scores = out_obj_scores.index_select(
+ dim=0, index=idx_tensor
+ )
+ out_obj_ids = [out_obj_ids[i] for i in keep_indices]
+ else:
+ # no selected objects in this local state; skip appending
+ out_obj_ids = []
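+ # Illustrative example (hypothetical IDs): out_obj_ids=[3, 5, 9] with 3 mask rows and
+ # filter_obj_ids=[5, 9] gives keep_indices=[1, 2], so rows 1 and 2 are kept and
+ # out_obj_ids becomes [5, 9]. If only 2 mask rows were available, index 2 would be
+ # dropped by the bounds check and only object 5 would be kept.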
+
+ if len(out_obj_ids) > 0:
+ obj_ids_local.extend(out_obj_ids)
+ low_res_masks_list.append(out_low_res_masks.squeeze(1))
+ obj_scores_list.append(out_obj_scores.squeeze(1))
+
+ # concatenate the output masklets from all local inference states
+
+ with torch.profiler.record_function(
+ "sam2_predictor.propagate_in_video.fill_holes"
+ ):
+ H_mask = W_mask = self.tracker.low_res_mask_size
+ if len(low_res_masks_list) > 0:
+ low_res_masks_local = torch.cat(low_res_masks_list, dim=0)
+ obj_scores_local = torch.cat(obj_scores_list, dim=0)
+ assert low_res_masks_local.shape[1:] == (H_mask, W_mask)
+
+ # Apply hole filling to the masks
+ low_res_masks_local = fill_holes_in_mask_scores(
+ low_res_masks_local.unsqueeze(1),
+ fill_hole_area=self.fill_hole_area,
+ sprinkle_removal_area=self.sprinkle_removal_area,
+ fill_holes=True,
+ remove_sprinkles=True,
+ )
+ low_res_masks_local = low_res_masks_local.squeeze(1)
+ else:
+ low_res_masks_local = torch.zeros(0, H_mask, W_mask, device=self.device)
+ obj_scores_local = torch.zeros(0, device=self.device)
+
+ if self.is_multiplex and self.tracker.is_multiplex_dynamic:
+ # obj_ids_local might not be sorted, which is problematic because
+ # the rest of the code assumes that they are.
+ # Currently this only happens in the dynamic multiplex setting (since we backfill states)
+ # so we only check for this condition here, but this should be generally applicable.
+ # Note that a similar remapping is necessary when we update the memory, e.g.,
+ # in _tracker_update_memories
+ if obj_ids_local != sorted(obj_ids_local):
+ # Get sorting permutation
+ sort_indices = sorted(
+ range(len(obj_ids_local)), key=lambda i: obj_ids_local[i]
+ )
+ # Apply permutation to reorder everything
+ obj_ids_local = [obj_ids_local[i] for i in sort_indices]
+ low_res_masks_local = low_res_masks_local[sort_indices]
+ obj_scores_local = obj_scores_local[sort_indices]
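+ # Illustrative example (hypothetical IDs): obj_ids_local=[4, 1, 7] gives
+ # sort_indices=[1, 0, 2], so the IDs become [1, 4, 7] and the mask/score rows are
+ # reordered as [row 1, row 0, row 2] to stay aligned with them.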
+
+ return obj_ids_local, low_res_masks_local, obj_scores_local
+
+ def _associate_det_trk(
+ self,
+ det_masks: Tensor,
+ det_scores: Tensor,
+ det_keep: Tensor,
+ trk_masks: Tensor,
+ trk_obj_ids: np.ndarray,
+ default_det_thresh: Optional[float] = None,
+ ):
+ """
+ Match detections on the current frame with the existing masklets.
+
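+ Args:
+ - det_masks: (N, H, W) float tensor of detection mask scores (not binarized)
+ - det_scores: (N,) tensor of detection scores
+ - det_keep: (N,) keep mask for the detections
+ - trk_masks: (M, H, W) float tensor of track mask scores (not binarized)
+ - trk_obj_ids: (M,) array of object IDs corresponding to trk_masks
+ - default_det_thresh: optional override for `self.new_det_thresh`
+
+ Returns:
+ A LazyAssociateDetTrkResult wrapping per-track tensors (trk_is_unmatched,
+ trk_is_nonempty), per-detection tensors (is_new_det, det_to_max_iou_trk_idx,
+ det_is_high_conf, det_is_high_iou, det_keep) and the (N, M) match mask im_mask.
+ Together these encode:
+ - new_det_fa_inds: array of new object indices among the FA detection outputs
+ - unmatched_trk_obj_ids: array of existing masklet object IDs that are not matched
+ to any detections on this frame (for unmatched, we only count masklets with >0 area)
+ - det_to_matched_trk_obj_ids: dict[int, np.ndarray]: mapping from FA detection indices
+ to the list of matched tracklet object IDs
+ - empty_trk_obj_ids: array of existing masklet object IDs with zero area in SAM2 prediction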
+ HIGH_CONF_THRESH = 0.8
+
+ iou_threshold = self.assoc_iou_thresh
+ iou_threshold_trk = self.trk_assoc_iou_thresh
+ new_det_thresh = (
+ self.new_det_thresh if default_det_thresh is None else default_det_thresh
+ )
+
+ assert det_masks.is_floating_point(), "float tensor expected (do not binarize)"
+ assert trk_masks.is_floating_point(), "float tensor expected (do not binarize)"
+ assert trk_masks.size(0) == len(trk_obj_ids), (
+ f"trk_masks and trk_obj_ids should have the same length, {trk_masks.size(0)} vs {len(trk_obj_ids)}"
+ )
+ if trk_masks.size(0) == 0:
+ with torch.profiler.record_function("No tracklets"):
+ num_trk = 0
+ is_new_det = det_scores >= new_det_thresh
+ trk_is_unmatched = torch.zeros(
+ num_trk, dtype=torch.bool, device=det_scores.device
+ )
+ trk_is_nonempty = torch.zeros(
+ num_trk, dtype=torch.bool, device=det_scores.device
+ )
+ num_det = det_scores.shape[0]
+ det_to_max_iou_trk_idx = torch.full(
+ (num_det,), -1, dtype=torch.long, device=det_scores.device
+ )
+ det_is_high_conf = det_scores >= HIGH_CONF_THRESH
+ det_is_high_iou = torch.zeros(
+ num_det, dtype=torch.bool, device=det_scores.device
+ )
+ im_mask = torch.zeros(
+ num_det, num_trk, dtype=torch.bool, device=det_scores.device
+ )
+ return LazyAssociateDetTrkResult(
+ trk_is_unmatched,
+ trk_is_nonempty,
+ is_new_det,
+ det_to_max_iou_trk_idx,
+ det_is_high_conf,
+ det_is_high_iou,
+ det_keep,
+ im_mask,
+ )
+ elif det_masks.size(0) == 0:
+ with torch.profiler.record_function("No detections"):
+ assert det_keep.size(0) == 0 # Make sure the keep mask agrees
+ trk_is_nonempty = (trk_masks > 0).any(dim=(1, 2))
+ num_det = 0
+ num_trk = trk_masks.shape[0]
+ trk_is_unmatched = torch.ones(
+ num_trk, dtype=torch.bool, device=trk_masks.device
+ )
+ trk_is_nonempty_tensor = trk_is_nonempty.to(trk_masks.device)
+ is_new_det = torch.zeros(
+ num_det, dtype=torch.bool, device=trk_masks.device
+ )
+ det_to_max_iou_trk_idx = torch.full(
+ (num_det,), -1, dtype=torch.long, device=trk_masks.device
+ )
+ det_is_high_conf = torch.zeros(
+ num_det, dtype=torch.bool, device=trk_masks.device
+ )
+ det_is_high_iou = torch.zeros(
+ num_det, dtype=torch.bool, device=trk_masks.device
+ )
+ im_mask = torch.zeros(
+ num_det, num_trk, dtype=torch.bool, device=trk_masks.device
+ )
+ return LazyAssociateDetTrkResult(
+ trk_is_unmatched,
+ trk_is_nonempty_tensor,
+ is_new_det,
+ det_to_max_iou_trk_idx,
+ det_is_high_conf,
+ det_is_high_iou,
+ det_keep,
+ im_mask,
+ )
+
+ if det_masks.shape[-2:] != trk_masks.shape[-2:]:
+ # resize to the smaller size to save GPU memory
+ if np.prod(det_masks.shape[-2:]) < np.prod(trk_masks.shape[-2:]):
+ trk_masks = F.interpolate(
+ trk_masks.unsqueeze(1),
+ size=det_masks.shape[-2:],
+ mode="bilinear",
+ align_corners=False,
+ ).squeeze(1)
+ else:
+ # resize detections to track size
+ det_masks = F.interpolate(
+ det_masks.unsqueeze(1),
+ size=trk_masks.shape[-2:],
+ mode="bilinear",
+ align_corners=False,
+ ).squeeze(1)
+
+ with torch.profiler.record_function("associate_det_trk_compilable"):
+ if trk_masks.shape[0] < self.max_num_objects:
+ padding_size = self.max_num_objects - trk_masks.shape[0]
+ trk_masks_padded = torch.cat(
+ [
+ trk_masks,
+ torch.zeros(
+ padding_size,
+ *trk_masks.shape[1:],
+ device=trk_masks.device,
+ dtype=trk_masks.dtype,
+ ),
+ ],
+ dim=0,
+ )
+ else:
+ trk_masks_padded = trk_masks
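+ # Padding to a fixed number of rows presumably keeps the input shapes of
+ # `_associate_det_trk_compilable` constant across frames (avoiding recompilation as
+ # the track count changes). Illustrative example: with a hypothetical
+ # max_num_objects=16 and 5 tracks, 11 all-zero rows are appended, and the outputs
+ # are sliced back to the first 5 tracks after the call.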
+ result = _associate_det_trk_compilable(
+ det_masks,
+ det_scores,
+ det_keep,
+ trk_masks_padded,
+ new_det_thresh,
+ iou_threshold_trk,
+ iou_threshold,
+ HIGH_CONF_THRESH,
+ self.use_iom_recondition,
+ self.o2o_matching_masklets_enable,
+ self.iom_thresh_recondition,
+ self.iou_thresh_recondition,
+ )
+ (
+ trk_is_unmatched,
+ trk_is_nonempty,
+ is_new_det,
+ det_to_max_iou_trk_idx,
+ det_is_high_conf,
+ det_is_high_iou,
+ det_keep,
+ im_mask,
+ ) = result
+ trk_is_unmatched = trk_is_unmatched[: trk_masks.shape[0]]
+ trk_is_nonempty = trk_is_nonempty[: trk_masks.shape[0]]
+ im_mask = im_mask[:, : trk_masks.shape[0]]
+
+ return LazyAssociateDetTrkResult(
+ trk_is_unmatched,
+ trk_is_nonempty,
+ is_new_det,
+ det_to_max_iou_trk_idx,
+ det_is_high_conf,
+ det_is_high_iou,
+ det_keep,
+ im_mask,
+ )
+
+ def _assign_new_det_to_gpus(self, new_det_num, prev_workload_per_gpu):
+ """Distribute the new objects to the GPUs with the least workload."""
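+ # Illustrative example (hypothetical values): with prev_workload_per_gpu=[2, 0, 1]
+ # and new_det_num=3, the non-multiplex path assigns GPUs [1, 1, 2] (np.argmin picks
+ # the first GPU on ties). The multiplex path with bucket_capacity=2 assigns the first
+ # bucket (objects 0-1) to GPU 1; the workload is then [2, 1, 1], so the second
+ # bucket (object 2) also goes to GPU 1, giving [1, 1, 1].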
+ workload_per_gpu: np.ndarray = prev_workload_per_gpu.copy()
+ new_det_gpu_ids = np.zeros(new_det_num, np.int64)
+
+ if self.is_multiplex:
+ # assign objects in batches of `bucket_capacity`; each batch adds 1 to the chosen GPU's workload
+ for i in range(0, new_det_num, self.bucket_capacity):
+ # find the GPU with the least workload
+ min_gpu = np.argmin(workload_per_gpu)
+ new_det_gpu_ids[i : i + self.bucket_capacity] = min_gpu
+ workload_per_gpu[min_gpu] += 1
+ else:
+ # assign the objects one by one
+ for i in range(len(new_det_gpu_ids)):
+ # find the GPU with the least workload
+ min_gpu = np.argmin(workload_per_gpu)
+ new_det_gpu_ids[i] = min_gpu
+ workload_per_gpu[min_gpu] += 1
+ return new_det_gpu_ids
+
+ def _process_hotstart_gpu(
+ self,
+ frame_idx: int,
+ reverse: bool,
+ adt_result, # LazyAssociateDetTrkResult (always lazy now)
+ tracker_metadata_prev: Dict[str, Any],
+ gpu_metadata_prev: Dict[str, Tensor],
+ ) -> Tuple[Tensor, Tensor, Dict[str, Tensor]]:
+ """
+ Compute removal/suppression masks entirely on the GPU, without host-device syncs or data-dependent branches.
+
+ Uses position-indexed metadata (indexed 0 to N_obj-1) instead of obj_id-indexed
+ to avoid needing obj_ids as GPU tensor.
+
+ Returns:
+ to_remove: boolean tensor (N_obj,) - objects to remove this frame
+ to_suppress: boolean tensor (N_obj,) - objects to suppress (unmatched tracks whose trk_keep_alive reached <= 0)
+ gpu_metadata_new: updated GPU metadata for next frame
+ """
+ # Handle edge case: if adt_result is already realized (no tracks exist),
+ # return empty masks since there's nothing to remove
+ if isinstance(adt_result, RealizedAssociateDetTrkresult):
+ # No tracks exist, so no objects to remove/suppress
+ empty_mask = torch.zeros(0, dtype=torch.bool, device=self.device)
+ return empty_mask, empty_mask, {"N_obj": 0}
+
+ device = adt_result.trk_is_unmatched.device
+ N_obj = adt_result.trk_is_unmatched.size(0) # Number of current objects
+
+ # ============================================================================
+ # STEP 1: Initialize/extract position-indexed GPU metadata
+ # ============================================================================
+
+ # All metadata tensors are indexed by POSITION (0 to N_obj-1), not by obj_id
+ # This grows/shrinks each frame as objects are added/removed
+
+ # Get previous frame's metadata (sized for previous N_obj)
+ # NOTE: Metadata is already compacted from previous frame (removed objects are already filtered out)
+ prev_N_obj = gpu_metadata_prev.get("N_obj", 0)
+
+ if prev_N_obj > 0:
+ # Metadata from previous frame (position-indexed, already compacted)
+ obj_first_frame_prev = gpu_metadata_prev["obj_first_frame"] # (prev_N_obj,)
+ consecutive_unmatch_count_prev = gpu_metadata_prev[
+ "consecutive_unmatch_count"
+ ] # (prev_N_obj,)
+ trk_keep_alive_prev = gpu_metadata_prev["trk_keep_alive"] # (prev_N_obj,)
+ removed_mask_prev = gpu_metadata_prev[
+ "removed_mask"
+ ] # (prev_N_obj,) - should be all False after compaction
+ overlap_pair_counts_prev = gpu_metadata_prev[
+ "overlap_pair_counts"
+ ] # (prev_N_obj, prev_N_obj)
+ last_occluded_prev = gpu_metadata_prev[
+ "last_occluded_tensor"
+ ] # (prev_N_obj,)
+ else:
+ # First frame - no previous metadata
+ obj_first_frame_prev = None
+ consecutive_unmatch_count_prev = None
+ trk_keep_alive_prev = None
+ removed_mask_prev = None
+ overlap_pair_counts_prev = None
+ last_occluded_prev = None
+
+ # ============================================================================
+ # STEP 2: Carry forward metadata from previous frame
+ # ============================================================================
+
+ # Current frame has N_obj objects (from propagation)
+ # New objects are added via extend_gpu_metadata_for_new_objects AFTER compaction,
+ # so prev_N_obj should already include objects detected on previous frame.
+ # N_obj should equal prev_N_obj (no new objects mid-planning-phase).
+ assert N_obj == prev_N_obj, (
+ f"N_obj ({N_obj}) should equal prev_N_obj ({prev_N_obj}); new objects handled after compaction"
+ )
+
+ # Carry forward existing metadata (or initialize if first frame)
+ NEVER_OCCLUDED = -1
+ obj_first_frame = (
+ obj_first_frame_prev
+ if obj_first_frame_prev is not None
+ else torch.full((N_obj,), frame_idx, dtype=torch.long, device=device)
+ )
+ consecutive_unmatch_count = (
+ consecutive_unmatch_count_prev
+ if consecutive_unmatch_count_prev is not None
+ else torch.zeros(N_obj, dtype=torch.long, device=device)
+ )
+ trk_keep_alive = (
+ trk_keep_alive_prev
+ if trk_keep_alive_prev is not None
+ else torch.zeros(N_obj, dtype=torch.long, device=device)
+ )
+ removed_mask = (
+ removed_mask_prev
+ if removed_mask_prev is not None
+ else torch.zeros(N_obj, dtype=torch.bool, device=device)
+ )
+ overlap_pair_counts = (
+ overlap_pair_counts_prev
+ if overlap_pair_counts_prev is not None
+ else torch.zeros((N_obj, N_obj), dtype=torch.long, device=device)
+ )
+ last_occluded = (
+ last_occluded_prev
+ if last_occluded_prev is not None
+ else torch.full((N_obj,), NEVER_OCCLUDED, dtype=torch.long, device=device)
+ )
+
+ # ============================================================================
+ # STEP 3: Update keep-alive counters (fully vectorized)
+ # ============================================================================
+
+ # Determine which tracks are matched by ANY detection
+ trk_is_matched = adt_result.im_mask.any(dim=0) # (N_obj,)
+
+ # Update: +1 for matched, -1 for unmatched, clamp to [min, max]
+ trk_keep_alive = torch.where(
+ trk_is_matched, trk_keep_alive + 1, trk_keep_alive - 1
+ )
+ trk_keep_alive = torch.clamp(
+ trk_keep_alive, min=self.min_trk_keep_alive, max=self.max_trk_keep_alive
+ )
+
+ # Also decrement for empty tracklets (if configured)
+ if self.decrease_trk_keep_alive_for_empty_masklets:
+ trk_keep_alive = torch.where(
+ ~adt_result.trk_is_nonempty,
+ torch.clamp(trk_keep_alive - 1, min=self.min_trk_keep_alive),
+ trk_keep_alive,
+ )
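+ # Illustrative example (hypothetical bounds min_trk_keep_alive=-4,
+ # max_trk_keep_alive=8): a matched track already at 8 stays at 8, an unmatched
+ # nonempty track at 1 drops to 0 and may become a suppression candidate in STEP 6
+ # (keep_alive <= 0), and an unmatched track already at -4 stays at -4.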
+
+ # ============================================================================
+ # STEP 4: Update total unmatch counters (fully vectorized)
+ # ============================================================================
+
+ # Increment for unmatched, but DON'T reset for matched
+ # Original logic accumulates total unmatched frames, not consecutive
+ consecutive_unmatch_count = torch.where(
+ adt_result.trk_is_unmatched,
+ consecutive_unmatch_count + 1,
+ consecutive_unmatch_count, # Keep previous value, don't reset
+ )
+
+ # ============================================================================
+ # STEP 5: Update pairwise overlap tracking (fully vectorized)
+ # ============================================================================
+
+ # Find detections matched by multiple tracks
+ tracks_per_det = adt_result.im_mask.sum(dim=1) # (N_det,)
+ multi_match_mask = tracks_per_det > 1 # (N_det,)
+
+ # Build overlap increment matrix using einsum
+ multi_match_tracks = adt_result.im_mask & multi_match_mask.unsqueeze(
+ 1
+ ) # (N_det, N_obj)
+
+ # Compute pairwise overlaps: for each detection, outer product of matched tracks
+ pairwise_overlap_this_frame = torch.einsum(
+ "di,dj->dij", multi_match_tracks.float(), multi_match_tracks.float()
+ ) # (N_det, N_obj, N_obj)
+
+ # Sum across detections
+ overlap_increment = pairwise_overlap_this_frame.sum(dim=0) # (N_obj, N_obj)
+ overlap_increment.fill_diagonal_(0) # No self-overlap
+ overlap_increment = torch.triu(
+ overlap_increment, diagonal=1
+ ) # Upper triangle only
+
+ # Add this frame's increments (accumulate across frames, don't reset)
+ # Original logic: overlap_pair_to_frame_inds[key].append(frame_idx) - never clears
+ overlap_pair_counts = overlap_pair_counts + overlap_increment.long()
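+ # Illustrative example (hypothetical values): with N_obj=3 and one detection matched
+ # by the tracks at positions 0 and 2, that detection's row of multi_match_tracks is
+ # [1, 0, 1]; its outer product is nonzero at (0, 0), (0, 2), (2, 0) and (2, 2). After
+ # zeroing the diagonal and keeping the upper triangle, only (0, 2) remains, so
+ # overlap_pair_counts[0, 2] grows by 1 on this frame.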
+
+ # ============================================================================
+ # STEP 6: Compute removal decisions - UNMATCH criterion (fully vectorized)
+ # ============================================================================
+
+ # Hotstart boundary
+ hotstart_diff = (
+ frame_idx - self.hotstart_delay
+ if not reverse
+ else frame_idx + self.hotstart_delay
+ )
+
+ # Check if objects are within hotstart window
+ is_within_hotstart = (
+ (obj_first_frame > hotstart_diff)
+ if not reverse
+ else (obj_first_frame < hotstart_diff)
+ )
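+ # Illustrative example (hypothetical values): going forward with frame_idx=20 and
+ # hotstart_delay=15, hotstart_diff=5, so an object first seen on frame 8 is still
+ # within the hotstart window while one first seen on frame 3 is not; in reverse,
+ # hotstart_diff=35 and an object first seen on frame 30 is within the window.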
+
+ # Remove if: within hotstart AND unmatched >= threshold AND not already removed
+ remove_by_unmatch = (
+ is_within_hotstart
+ & (consecutive_unmatch_count >= self.hotstart_unmatch_thresh)
+ & ~removed_mask
+ )
+
+ # Suppress if: keep_alive <= 0 AND not hotstart-only mode AND not removed
+ suppress_by_unmatch = (
+ (trk_keep_alive <= 0)
+ & torch.tensor(not self.suppress_unmatched_only_within_hotstart)
+ .pin_memory()
+ .to(device=device, non_blocking=True)
+ & ~removed_mask
+ & ~remove_by_unmatch
+ )
+
+ # ============================================================================
+ # STEP 7: Compute removal decisions - OVERLAP criterion (fully vectorized)
+ # ============================================================================
+
+ # For each object, find max overlap count with any EARLIER object
+ # "Earlier" = appeared earlier in processing order (i.e. on a later frame index when reverse=True)
+
+ # Build matrix: is_earlier[i, j] = True if object i appeared before object j
+ first_frames_i = obj_first_frame.unsqueeze(1) # (N_obj, 1)
+ first_frames_j = obj_first_frame.unsqueeze(0) # (1, N_obj)
+
+ if not reverse:
+ is_earlier_matrix = first_frames_i < first_frames_j # (N_obj, N_obj)
+ else:
+ is_earlier_matrix = first_frames_i > first_frames_j # (N_obj, N_obj)
+
+ # ============================================================================
+ # STEP 8: Combine removal/suppression decisions
+ # ============================================================================
+
+ # Mask overlap counts to only consider earlier objects
+ if N_obj == 0:
+ to_remove = remove_by_unmatch
+ else:
+ overlap_with_earlier = torch.where(
+ is_earlier_matrix,
+ overlap_pair_counts,
+ torch.zeros_like(overlap_pair_counts),
+ )
+
+ # For each object (column j), find max overlap with any earlier object (row i)
+ max_overlap_with_earlier, _ = overlap_with_earlier.max(dim=0) # (N_obj,)
+
+ # Remove if: within hotstart AND overlapped with earlier >= threshold
+ remove_by_overlap = (
+ is_within_hotstart
+ & (max_overlap_with_earlier >= self.hotstart_dup_thresh)
+ & ~removed_mask
+ )
+
+ to_remove = remove_by_unmatch | remove_by_overlap # (N_obj,)
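+ # Illustrative example (hypothetical values, forward mode): if the objects at positions
+ # 0 and 2 first appeared on frames 4 and 10, then is_earlier_matrix[0, 2] is True;
+ # with overlap_pair_counts[0, 2]=3 and hotstart_dup_thresh=3,
+ # max_overlap_with_earlier[2] is at least 3, so object 2 is removed if it is still
+ # within the hotstart window and has not been removed before.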
+
+ to_suppress = suppress_by_unmatch # (N_obj,)
+
+ # Update removed mask for future frames
+ removed_mask = removed_mask | to_remove
+
+ # ============================================================================
+ # STEP 9: Package updated metadata (NO SYNCS)
+ # ============================================================================
+
+ gpu_metadata_new = {
+ "N_obj": N_obj,
+ "obj_first_frame": obj_first_frame,
+ "consecutive_unmatch_count": consecutive_unmatch_count,
+ "trk_keep_alive": trk_keep_alive,
+ "removed_mask": removed_mask,
+ "overlap_pair_counts": overlap_pair_counts,
+ "last_occluded_tensor": last_occluded,
+ }
+
+ return to_remove, to_suppress, gpu_metadata_new
+
+ def _process_hotstart(
+ self,
+ frame_idx: int,
+ num_frames: int,
+ reverse: bool,
+ det_to_matched_trk_obj_ids: Dict[int, np.ndarray],
+ new_det_obj_ids: np.ndarray,
+ empty_trk_obj_ids: np.ndarray,
+ unmatched_trk_obj_ids: np.ndarray,
+ rank0_metadata: Dict[str, Any],
+ tracker_metadata: Dict[str, Any],
+ ):
+ """Handle hotstart heuristics to remove unmatched or duplicated objects."""
+ # obj_id --> first frame index where the object was detected
+ obj_first_frame_idx = rank0_metadata["obj_first_frame_idx"]
+ # obj_id --> [unmatched frame indices]
+ unmatched_frame_inds = rank0_metadata["unmatched_frame_inds"]
+ trk_keep_alive = rank0_metadata["trk_keep_alive"]
+ # (first_appear_obj_id, obj_id) --> [overlap frame indices]
+ overlap_pair_to_frame_inds = rank0_metadata["overlap_pair_to_frame_inds"]
+ # removed_obj_ids: object IDs that have been removed by the hotstart heuristics
+ removed_obj_ids = rank0_metadata["removed_obj_ids"]
+ suppressed_obj_ids = rank0_metadata["suppressed_obj_ids"][frame_idx]
+
+ obj_ids_newly_removed = set() # object IDs to be newly removed on this frame
+ hotstart_diff = (
+ frame_idx - self.hotstart_delay
+ if not reverse
+ else frame_idx + self.hotstart_delay
+ )
+
+ # Step 1: log the frame index where each object ID first appears
+ for obj_id in new_det_obj_ids:
+ if obj_id not in obj_first_frame_idx:
+ obj_first_frame_idx[obj_id] = frame_idx
+ assert obj_id not in trk_keep_alive
+ trk_keep_alive[obj_id] = self.init_trk_keep_alive
+
+ matched_trks = set()
+ # We use the det-->tracks list to check for matched objects. Otherwise, we need to compute areas to decide whether they're occluded
+ for matched_trks_per_det in det_to_matched_trk_obj_ids.values():
+ matched_trks.update(matched_trks_per_det)
+ for obj_id in matched_trks:
+ # NOTE: To minimize number of configurable params, we use the hotstart_unmatch_thresh to set the max value of trk_keep_alive
+ trk_keep_alive[obj_id] = min(
+ self.max_trk_keep_alive, trk_keep_alive[obj_id] + 1
+ )
+ for obj_id in unmatched_trk_obj_ids:
+ unmatched_frame_inds[obj_id].append(frame_idx)
+ # NOTE: To minimize number of configurable params, we use the hotstart_unmatch_thresh to set the min value of trk_keep_alive
+ # The max keep-alive is 2x the min, meaning the model prefers to keep a prediction rather than suppress it once it has been matched for long enough.
+ trk_keep_alive[obj_id] = max(
+ self.min_trk_keep_alive, trk_keep_alive[obj_id] - 1
+ )
+ if self.decrease_trk_keep_alive_for_empty_masklets:
+ for obj_id in empty_trk_obj_ids:
+ # NOTE: To minimize number of configurable params, we use the hotstart_unmatch_thresh to set the min value of trk_keep_alive
+ trk_keep_alive[obj_id] = max(
+ self.min_trk_keep_alive, trk_keep_alive[obj_id] - 1
+ )
+
+ # Step 2: remove tracks that have not matched any detection for `hotstart_unmatch_thresh` frames within the hotstart period
+ # a) the unmatched frame indices for each existing object ID were recorded in Step 1 above;
+ # note that `unmatched_trk_obj_ids` only contains objects whose SAM2 output mask doesn't
+ # match any FA detection on this frame; objects for which SAM2 gives an empty mask are excluded
+ # b) remove a masklet if it first appears after `hotstart_diff` (before it, when reverse=True)
+ # and is unmatched for at least `self.hotstart_unmatch_thresh` frames
+ for obj_id, frame_indices in unmatched_frame_inds.items():
+ if obj_id in removed_obj_ids or obj_id in obj_ids_newly_removed:
+ continue # skip if the object is already removed
+ if len(frame_indices) >= self.hotstart_unmatch_thresh:
+ is_within_hotstart = (
+ obj_first_frame_idx[obj_id] > hotstart_diff and not reverse
+ ) or (obj_first_frame_idx[obj_id] < hotstart_diff and reverse)
+ if is_within_hotstart:
+ obj_ids_newly_removed.add(obj_id)
+ logger.info(
+ f"Removing object {obj_id} at frame {frame_idx} "
+ f"since it is unmatched for frames: {frame_indices}"
+ )
+ if (
+ trk_keep_alive[obj_id] <= 0 # Object has not been matched for too long
+ and not self.suppress_unmatched_only_within_hotstart
+ and obj_id not in removed_obj_ids
+ and obj_id not in obj_ids_newly_removed
+ ):
+ logger.debug(
+ f"Suppressing object {obj_id} at frame {frame_idx}, due to being unmatched"
+ )
+ suppressed_obj_ids.add(obj_id)
+
+ # Step 3: remove tracks that overlap with another track for at least `hotstart_dup_thresh` frames
+ # a) find overlapping tracks; two tracks are considered overlapping if they match the same detection
+ for _, matched_trk_obj_ids in det_to_matched_trk_obj_ids.items():
+ if len(matched_trk_obj_ids) < 2:
+ continue # only count detections that are matched to multiple (>=2) masklets
+ # if there are multiple matched track ids, we need to find the one that appeared first;
+ # these later appearing ids may be removed since they may be considered as duplicates
+ first_appear_obj_id = (
+ min(matched_trk_obj_ids, key=lambda x: obj_first_frame_idx[x])
+ if not reverse
+ else max(matched_trk_obj_ids, key=lambda x: obj_first_frame_idx[x])
+ )
+ for obj_id in matched_trk_obj_ids:
+ if obj_id != first_appear_obj_id:
+ key = (first_appear_obj_id, obj_id)
+ overlap_pair_to_frame_inds[key].append(frame_idx)
+
+ # b) remove a masklet if it first appears after `hotstart_diff` (before it, when reverse=True) and it overlaps
+ # with another masklet (that appears earlier) for at least `self.hotstart_dup_thresh` frames
+ for (first_obj_id, obj_id), frame_indices in overlap_pair_to_frame_inds.items():
+ if obj_id in removed_obj_ids or obj_id in obj_ids_newly_removed:
+ continue # skip if the object is already removed
+ if (obj_first_frame_idx[obj_id] > hotstart_diff and not reverse) or (
+ obj_first_frame_idx[obj_id] < hotstart_diff and reverse
+ ):
+ if len(frame_indices) >= self.hotstart_dup_thresh:
+ obj_ids_newly_removed.add(obj_id)
+ logger.info(
+ f"Removing object {obj_id} at frame {frame_idx} "
+ f"since it overlaps with another track {first_obj_id} at frames: {frame_indices}"
+ )
+
+ removed_obj_ids.update(obj_ids_newly_removed)
+ return obj_ids_newly_removed, rank0_metadata
+
+ def _tracker_update_memories(
+ self,
+ sam2_inference_states: List[Any],
+ frame_idx: int,
+ tracker_metadata: Dict[str, Any],
+ low_res_masks: Tensor,
+ ):
+ """
+ Run the SAM2 memory encoder, enforcing non-overlapping constraints globally.
+ """
+ # TODO: Add most recently occluded heuristic for suppression of overlapping masks
+ if len(sam2_inference_states) == 0:
+ return
+ # Avoid an extra interpolation step by directly interpolating to `interpol_size`
+ high_res_H, high_res_W = (
+ self.tracker.maskmem_backbone.mask_downsampler.interpol_size
+ )
+ # NOTE: inspect this part if we observe OOMs in the demo
+ high_res_masks = F.interpolate(
+ low_res_masks.unsqueeze(1),
+ size=(high_res_H, high_res_W),
+ mode="bilinear",
+ align_corners=False,
+ )
+ # We first apply non-overlapping constraints before memory encoding. This may include some suppression heuristics.
+ with torch.profiler.record_function(
+ "sam2_predictor.propagate_in_video.apply_non_overlapping_constraints"
+ ):
+ # TODO: try _apply_object_wise_non_overlapping_constraints instead
+ high_res_masks = self.tracker._suppress_object_pw_area_shrinkage(
+ high_res_masks
+ )
+ # Instead of gathering the predicted object scores, we use mask areas as a proxy.
+ object_score_logits = torch.where(
+ (high_res_masks > 0).any(dim=(-1, -2)), 10.0, -10.0
+ )
+
+ if self.is_multiplex and self.tracker.is_multiplex_dynamic:
+ # The objects in the masks are ordered w.r.t. object IDs,
+ # which might not be true in the dynamic multiplex case with backfilling
+ # (see also _propogate_tracker_one_frame_local_gpu)
+ # We need to plan globally for the mask assignment here
+ object_idx_assignment: dict[int, list[int]] = {}
+ all_object_ids: list[int] = []
+ object_id_to_state_i: dict[int, int] = {}
+ for state_i, sam2_state in enumerate(sam2_inference_states):
+ obj_ids = sam2_state["obj_ids"]
+ all_object_ids.extend(obj_ids)
+ for obj_id in obj_ids:
+ object_id_to_state_i[obj_id] = state_i
+ object_idx_assignment[state_i] = []
+ sorted_indices = sorted(
+ range(len(all_object_ids)), key=lambda i: all_object_ids[i]
+ )
+ # Build the object_idx_assignment mapping
+ for global_idx, local_idx in enumerate(sorted_indices):
+ obj_id = all_object_ids[local_idx]
+ object_idx_assignment[object_id_to_state_i[obj_id]].append(global_idx)
+
+ # Run the memory encoder on local slices for each GPU
+ start_idx_gpu = sum(tracker_metadata["num_obj_per_gpu"][: self.rank])
+ start_idx_state = start_idx_gpu
+ for state_i, sam2_state in enumerate(sam2_inference_states):
+ num_obj_per_state = len(sam2_state["obj_ids"])
+ if num_obj_per_state == 0:
+ continue
+ # Get the local high-res masks and object score logits for this inference state
+ if self.is_multiplex and self.tracker.is_multiplex_dynamic:
+ local_idx = (
+ torch.tensor(object_idx_assignment[state_i])
+ .pin_memory()
+ .to(device=high_res_masks.device, non_blocking=True)
+ )
+ local_high_res_masks = high_res_masks[local_idx]
+ local_object_score_logits = object_score_logits[local_idx]
+ else:
+ end_idx_state = start_idx_state + num_obj_per_state
+ local_high_res_masks = high_res_masks[start_idx_state:end_idx_state]
+ local_object_score_logits = object_score_logits[
+ start_idx_state:end_idx_state
+ ]
+ local_batch_size = local_high_res_masks.size(0)
+ # Run Sam2 memory encoder. Note that we do not re-enforce the non-overlapping constraint as it is turned off by default
+
+ encoded_mem = self.tracker._run_memory_encoder(
+ sam2_state,
+ frame_idx,
+ local_batch_size,
+ local_high_res_masks,
+ local_object_score_logits,
+ is_mask_from_pts=False,
+ )
+ if self.is_multiplex:
+ (
+ local_maskmem_features,
+ local_maskmem_pos_enc,
+ local_image_features,
+ local_image_pos_enc,
+ ) = encoded_mem
+ else:
+ local_maskmem_features, local_maskmem_pos_enc = encoded_mem
+
+ # Store encoded memories in the local inference state
+ output_dict = sam2_state["output_dict"]
+ for storage_key in ["cond_frame_outputs", "non_cond_frame_outputs"]:
+ if frame_idx not in output_dict[storage_key]:
+ continue
+ output_dict[storage_key][frame_idx]["maskmem_features"] = (
+ local_maskmem_features
+ )
+ output_dict[storage_key][frame_idx]["maskmem_pos_enc"] = [
+ pos for pos in local_maskmem_pos_enc
+ ]
+ if self.is_multiplex:
+ output_dict[storage_key][frame_idx]["image_features"] = (
+ local_image_features
+ )
+ output_dict[storage_key][frame_idx]["image_pos_enc"] = (
+ local_image_pos_enc
+ )
+
+ if self.reapply_no_object_pointer:
+ # reapply the no_object_pointer projection for the objects suppressed by the heuristics
+ newly_suppressed_objects = (
+ output_dict[storage_key][frame_idx]["object_score_logits"]
+ > self.tracker.object_score_logit_threshold
+ ) & (local_object_score_logits < 0)
+ if torch.any(newly_suppressed_objects):
+ existing_pointers = output_dict[storage_key][frame_idx][
+ "obj_ptr"
+ ]
+
+ multiplex_state = sam2_state["multiplex_state"]
+ existing_pointers = multiplex_state.demux(existing_pointers)
+
+ newly_suppressed_objects = newly_suppressed_objects.float()
+ new_pointers = (
+ newly_suppressed_objects
+ * self.tracker.no_obj_ptr_linear(existing_pointers)
+ + (1 - newly_suppressed_objects) * existing_pointers
+ )
+
+ output_dict[storage_key][frame_idx]["obj_ptr"] = (
+ multiplex_state.mux(new_pointers)
+ )
+ elif self.reapply_no_object_pointer:
+ raise NotImplementedError(
+ "reapply_no_object_pointer is not implemented for non-multiplex"
+ )
+
+ # for batched inference state, we also need to add per-object
+ # memory slices to support instance interactivity
+ self.tracker.add_output_per_object(
+ inference_state=sam2_state,
+ frame_idx=frame_idx,
+ current_out=output_dict[storage_key][frame_idx],
+ storage_key=storage_key,
+ )
+ start_idx_state += num_obj_per_state
+
+ def _tracker_add_new_objects(
+ self,
+ frame_idx: int,
+ num_frames: int,
+ new_obj_ids: List[int],
+ new_obj_masks: Tensor,
+ tracker_states_local: List[Any],
+ orig_vid_height: int,
+ orig_vid_width: int,
+ feature_cache: Dict,
+ ):
+ """Add new objects to SAM2 inference states."""
+
+ prev_sam2_state = (
+ tracker_states_local[0] if len(tracker_states_local) > 0 else None
+ )
+ # prepare inference_state
+ if self.tracker.is_multiplex_dynamic:
+ # in multiplex_dynamic mode, we first try to find the best-fit
+ # inference state for the new objects.
+ # Create a new state if needed
+ num_new_objects = len(new_obj_ids)
+
+ # Try to find existing states with available slots
+ best_state = None
+ best_available_slots = float("inf")
+
+ for state in tracker_states_local:
+ available_slots = state["multiplex_state"].available_slots
+ # Find the state with the least available slots that can still fit the new objects
+ if (
+ available_slots >= num_new_objects
+ and available_slots < best_available_slots
+ ):
+ best_state = state
+ best_available_slots = available_slots
+
+ if best_state is not None:
+ # Use the existing state with sufficient available slots
+ new_sam2_state = best_state
+ else:
+ # Need to create a new state
+ new_sam2_state = self.tracker.init_state(
+ cached_features=feature_cache,
+ video_height=orig_vid_height,
+ video_width=orig_vid_width,
+ num_frames=num_frames,
+ )
+ new_sam2_state["backbone_out"] = (
+ prev_sam2_state.get("backbone_out", None)
+ if prev_sam2_state is not None
+ else None
+ )
+ # Add the new state to our local states list
+ tracker_states_local.append(new_sam2_state)
+ else:
+ if self.tracker.per_obj_inference:
+ # in per_obj_inference mode, init_state happens only once,
+ # new obj_ids will be added to the existing inference state
+ if prev_sam2_state is not None:
+ new_sam2_state = prev_sam2_state
+ else:
+ new_sam2_state = self.tracker.init_state(
+ cached_features=feature_cache,
+ video_height=orig_vid_height,
+ video_width=orig_vid_width,
+ num_frames=num_frames,
+ )
+ new_sam2_state["backbone_out"] = None
+ tracker_states_local = [new_sam2_state]
+ else:
+ # batch objects that first appear on the same frame together
+ # Clear inference state. Keep the cached image features if available.
+ new_sam2_state = self.tracker.init_state(
+ cached_features=feature_cache,
+ video_height=orig_vid_height,
+ video_width=orig_vid_width,
+ num_frames=num_frames,
+ )
+ new_sam2_state["backbone_out"] = (
+ prev_sam2_state.get("backbone_out", None)
+ if prev_sam2_state is not None
+ else None
+ )
+ tracker_states_local.append(new_sam2_state)
+
+ assert len(new_obj_ids) == new_obj_masks.size(0)
+ assert new_obj_masks.is_floating_point()
+ # TODO consider removing this interpolation -- it's probably no longer needed
+ # we should edit `self.tracker.add_new_mask` to directly take low-res input masks
+ input_mask_res = self.tracker.input_mask_size
+ new_obj_masks = F.interpolate(
+ new_obj_masks.unsqueeze(1),
+ size=(input_mask_res, input_mask_res),
+ mode="bilinear",
+ align_corners=False,
+ ).squeeze(1)
+ new_obj_masks = new_obj_masks > 0
+
+ if self.is_multiplex:
+ # add all objects at once
+ # NOTE: In the current implementation, add_new_masks also runs the memory encoder
+ # and enforces the non-overlapping constraint
+ self.tracker.add_new_masks(
+ inference_state=new_sam2_state,
+ frame_idx=frame_idx,
+ obj_ids=new_obj_ids,
+ masks=new_obj_masks,
+ add_mask_to_memory=True,
+ )
+ else:
+ # add object one by one
+ for new_obj_id, new_mask in zip(new_obj_ids, new_obj_masks):
+ self.tracker.add_new_mask(
+ inference_state=new_sam2_state,
+ frame_idx=frame_idx,
+ obj_id=new_obj_id,
+ mask=new_mask,
+ add_mask_to_memory=True,
+ )
+ # NOTE: we skip enforcing the non-overlapping constraint **globally** when adding new objects.
+ self.tracker.propagate_in_video_preflight(new_sam2_state, run_mem_encoder=True)
+
+ return tracker_states_local
+
+ def _tracker_remove_objects(
+ self, tracker_states_local: List[Any], obj_ids: list[int]
+ ):
+ """
+ Remove objects from SAM2 inference states. This removes the objects from
+ all frames in the video.
+ """
+ if self.is_multiplex:
+ tracker_states_local_before_removal = tracker_states_local.copy()
+ tracker_states_local.clear()
+ for sam2_inference_state in tracker_states_local_before_removal:
+ # we try to remove `obj_ids` on every inference state with `strict=False`;
+ # it does nothing for an inference state that doesn't contain any of them
+ new_obj_ids, _ = self.tracker.remove_objects(
+ sam2_inference_state, obj_ids, strict=False, need_output=False
+ )
+ # only keep an inference state if it's non-empty after object removal
+ if len(new_obj_ids) > 0:
+ tracker_states_local.append(sam2_inference_state)
+ else:
+ for obj_id in obj_ids:
+ self._tracker_remove_object(tracker_states_local, obj_id)
+
+ def update_masklet_confirmation_status(
+ self,
+ rank0_metadata: Dict[str, Any],
+ obj_ids_all_gpu_prev: np.ndarray,
+ obj_ids_all_gpu_updated: np.ndarray,
+ det_to_matched_trk_obj_ids: Dict[int, np.ndarray],
+ new_det_obj_ids: np.ndarray,
+ ):
+ """
+ Update masklet confirmation status.
+ """
+ confirmation_data = rank0_metadata["masklet_confirmation"]
+ status_prev = confirmation_data["status"]
+ consecutive_det_num_prev = confirmation_data["consecutive_det_num"]
+
+ N_prev = len(obj_ids_all_gpu_prev)
+ N_updated = len(obj_ids_all_gpu_updated)
+
+ # a) Map previous confirmation data to updated positions
+ # For small arrays, simple dict lookup is fast
+ unconfirmed_val = MaskletConfirmationStatus.UNCONFIRMED.value
+ status = np.full(N_updated, unconfirmed_val, dtype=np.int64)
+ consecutive_det_num = np.zeros(N_updated, dtype=np.int64)
+
+ if N_prev > 0 and N_updated > 0:
+ # Build mapping: obj_id -> new index
+ obj_id_to_new_idx = {
+ obj_id: idx for idx, obj_id in enumerate(obj_ids_all_gpu_updated)
+ }
+
+ # Copy previous values for objects that still exist
+ for old_idx, obj_id in enumerate(obj_ids_all_gpu_prev):
+ new_idx = obj_id_to_new_idx.get(obj_id)
+ if new_idx is not None:
+ status[new_idx] = status_prev[old_idx]
+ consecutive_det_num[new_idx] = consecutive_det_num_prev[old_idx]
+
+ # b) Update confirmation status based on current frame detections
+ # Build set of all matched object IDs
+ matched_obj_ids = set(new_det_obj_ids)
+ for matched_trk_ids in det_to_matched_trk_obj_ids.values():
+ matched_obj_ids.update(matched_trk_ids)
+
+ # Update consecutive detection count and status
+ for idx, obj_id in enumerate(obj_ids_all_gpu_updated):
+ if obj_id in matched_obj_ids:
+ consecutive_det_num[idx] += 1
+ else:
+ consecutive_det_num[idx] = 0
+
+ # Update status to CONFIRMED where threshold is met
+ if (
+ consecutive_det_num[idx]
+ >= self.masklet_confirmation_consecutive_det_thresh
+ ):
+ status[idx] = MaskletConfirmationStatus.CONFIRMED.value
+
+ # Store updated arrays
+ confirmation_data["status"] = status
+ confirmation_data["consecutive_det_num"] = consecutive_det_num
+ return rank0_metadata
+
+
+class Sam3MultiplexPredictorWrapper(Sam3MultiplexTrackerPredictor):
+ """
+ Wraps a pre-built multiplex tracker model with the same interface as the
+ onevision Sam3MultiplexTrackerPredictor class. Inherits from Sam3MultiplexTrackerPredictor to pass
+ isinstance checks, but skips Sam3MultiplexTrackerPredictor.__init__ (which requires Hydra).
+
+ Provides bf16 autocast, attribute proxying, and configuration flags
+ needed by Sam3MultiplexTracking.
+
+ The onevision Sam3MultiplexTrackerPredictor builds the tracker from Hydra config and applies
+ extensive hydra_overrides. This version skips Hydra entirely — the caller
+ is responsible for building the tracker via model_builder.py with the
+ correct parameters.
+
+ Key parameters that the onevision Sam3MultiplexTrackerPredictor sets via hydra_overrides
+ (documented here for reference — these must be set in model_builder.py):
+ - image_size=1008, backbone_stride=14
+ - maskmem_backbone.mask_downsampler.interpol_size=[1152,1152]
+ - always_start_from_first_ann_frame=false
+ - non_overlap_masks_for_mem_enc=false, non_overlap_masks_for_output=false
+ - max_cond_frames_in_attn=4
+ - offload_output_to_cpu_for_eval=false, trim_past_non_cond_mem_for_eval=false
+ - sam_mask_decoder_extra_args: dynamic_multimask_via_stability=true, etc.
+ - binarize_mask_from_pts_for_mem_enc=true (SAM2 tracker default)
+ - only_obj_ptrs_in_the_past_for_eval=true
+ - clear_non_cond_mem_around_input=true
+ - transformer.encoder.layer.self_attention.feat_sizes=[72,72]
+ - transformer.encoder.layer.cross_attention.feat_sizes=[72,72]
+ - fill_hole_area=
+ - use_fa3, use_rope_real on self_attention, cross_attention,
+ self_attention_rope, cross_attention_rope
+ - use_memory_selection
+ """
+
+ def __init__(
+ self,
+ model,
+ per_obj_inference=False,
+ fill_hole_area=0,
+ is_multiplex=True,
+ is_multiplex_dynamic=True,
+ ):
+ # Skip Sam3MultiplexTrackerPredictor.__init__ (requires Hydra) — call nn.Module.__init__ directly
+ nn.Module.__init__(self)
+ self.model = model
+ self.per_obj_inference = per_obj_inference
+ self.fill_hole_area = fill_hole_area
+ self.is_multiplex = is_multiplex
+ self.is_multiplex_dynamic = is_multiplex_dynamic
+
+ # use bfloat16 inference for Flash Attention kernel
+ self.bf16_context = torch.autocast(device_type="cuda", dtype=torch.bfloat16)
+ self.bf16_context.__enter__()
diff --git a/sam3/model/sam3_multiplex_detector.py b/sam3/model/sam3_multiplex_detector.py
new file mode 100644
index 0000000..435d6d6
--- /dev/null
+++ b/sam3/model/sam3_multiplex_detector.py
@@ -0,0 +1,943 @@
+import os
+
+import torch
+from sam3.model.vl_combiner import SAM3VLBackbone
+
+try:
+ from sam3.model.vl_combiner import SAM3VLBackboneTri
+except ImportError:
+ SAM3VLBackboneTri = None
+from typing import Dict, List, Optional
+
+import numpy as np
+from sam3.model.data_misc import BatchedDatapoint, FindStage
+from sam3.model.geometry_encoders import Prompt
+from sam3.model.model_misc import SAM3Output
+from sam3.model.sam3_image import Sam3Image
+from sam3.model.sam3_multiplex_detector_utils import nms_masks
+
+
+class Sam3MultiplexImageBase(Sam3Image):
+ """A wrapper class to run Sam3Image on videos for per-frame detection (no tracking)."""
+
+ def __init__(
+ self,
+ *args,
+ tracking_score_thresh: float = 0.0,
+ offload_outputs_to_cpu_for_eval: bool = False,
+ **kwargs,
+ ):
+ super().__init__(*args, **kwargs)
+ self.tracking_score_thresh = tracking_score_thresh
+ self.offload_outputs_to_cpu_for_eval = offload_outputs_to_cpu_for_eval
+ self.trim_outputs_for_eval = True # dummy option -- it doesn't do anything
+
+ def forward(
+ self,
+ input: BatchedDatapoint,
+ is_inference=False, # (a dummy parameter not used anymore)
+ ):
+ assert not self.training, (
+ "Sam3MultiplexImageBase should only be used in eval mode."
+ )
+
+ device = self.device
+ backbone_out = {"img_batch_all_stages": input.img_batch}
+ text_outputs = self.backbone.forward_text(input.find_text_batch, device=device)
+ backbone_out.update(text_outputs)
+ num_frames = len(input.find_inputs)
+
+ previous_stages_out = SAM3Output(
+ iter_mode=SAM3Output.IterMode.LAST_STEP_PER_STAGE
+ )
+ for frame_idx in range(num_frames):
+ find_input = input.find_inputs[frame_idx]
+ find_target = input.find_targets[frame_idx]
+ geometric_prompt = self._get_geo_prompt_from_find_input(find_input)
+ cur_out, _ = self.forward_video_grounding(
+ backbone_out=backbone_out,
+ find_input=find_input,
+ find_target=find_target,
+ geometric_prompt=geometric_prompt,
+ )
+ # offload model outputs to CPU (to save GPU memory) for evaluation
+ if self.offload_outputs_to_cpu_for_eval:
+ cur_out = {k: v.cpu() for k, v in cur_out.items()}
+
+ previous_stages_out.append([cur_out])
+
+ get_queries = None
+ return previous_stages_out, get_queries
+
+ def forward_video_grounding(
+ self,
+ backbone_out,
+ find_input,
+ find_target,
+ geometric_prompt: Prompt,
+ **kwargs,
+ ):
+ # route this to the image grounding forward method
+ out = self.forward_grounding(
+ backbone_out=backbone_out,
+ find_input=find_input,
+ find_target=find_target,
+ geometric_prompt=geometric_prompt,
+ )
+ # trim the output to only include the necessary keys
+ out = {
+ "pred_logits": out["pred_logits"],
+ "pred_boxes": out["pred_boxes"],
+ "pred_boxes_xyxy": out["pred_boxes_xyxy"],
+ "pred_masks": out["pred_masks"],
+ "pred_object_ids": self._get_dummy_object_ids(out["pred_logits"]),
+ }
+ return out, backbone_out
+
+ def _get_dummy_object_ids(self, pred_logits):
+ """Generate dummy object IDs for the detected objects, based on their detection query indices."""
+ # Assuming pred_logits has shape [batch_size, num_queries, num_classes]
+ B, Q, _ = pred_logits.shape
+ is_above_thresh = pred_logits.squeeze(2) > self.tracking_score_thresh
+ dummy_obj_ids = torch.arange(Q, device=self.device).expand(B, -1)
+ dummy_obj_ids = torch.where(is_above_thresh, dummy_obj_ids, -1)
+ return dummy_obj_ids
+
+ def _trim_outputs(self, *args, **kwargs):
+ pass # not needed for image-on-video
+
+ def _batch_find_inputs(
+ self,
+ find_inputs: List[FindStage],
+ chunk_start: int,
+ chunk_end: int,
+ ) -> FindStage:
+ """
+ Batch multiple FindStage objects into a single batched FindStage.
+
+ For each frame in the chunk, creates img_ids that point to the correct
+ frame index. When processing streaming video, the img_ids are the actual
+ frame indices (e.g., [0, 1, 2, ..., 15] for the chunk [0, 16)), and the modulo
+ for circular buffer access is applied later in _get_img_feats.
+
+ Args:
+ find_inputs: List of FindStage objects for all frames.
+ chunk_start: Start index of the chunk.
+ chunk_end: End index of the chunk (exclusive).
+
+ Returns:
+ A single FindStage with batched tensors.
+ """
+ chunk_find_inputs = [
+ find_inputs[i % len(find_inputs)] for i in range(chunk_start, chunk_end)
+ ]
+
+ # Generate img_ids based on chunk frame indices
+ # Each frame in the chunk gets its corresponding frame index
+ # The modulo for circular buffer access is handled in _get_img_feats
+ device = chunk_find_inputs[0].img_ids.device
+ dtype = chunk_find_inputs[0].img_ids.dtype
+ img_ids_list = [
+ torch.tensor([i], device=device, dtype=dtype)
+ for i in range(chunk_start, chunk_end)
+ ]
+ batched_img_ids = torch.cat(img_ids_list, dim=0)
+
+ # Generate img_ids_np to match
+ img_ids_np_list = [np.array([i]) for i in range(chunk_start, chunk_end)]
+ batched_img_ids_np = np.concatenate(img_ids_np_list, axis=0)
+
+ # Concatenate text_ids
+ text_ids_list = [fi.text_ids for fi in chunk_find_inputs]
+ batched_text_ids = torch.cat(text_ids_list, dim=0)
+
+ # Concatenate input_boxes
+ input_boxes_list = [fi.input_boxes for fi in chunk_find_inputs]
+ batched_input_boxes = (
+ torch.cat(input_boxes_list, dim=0)
+ if input_boxes_list[0] is not None
+ else None
+ )
+
+ # Concatenate input_boxes_mask
+ input_boxes_mask_list = [fi.input_boxes_mask for fi in chunk_find_inputs]
+ batched_input_boxes_mask = (
+ torch.cat(input_boxes_mask_list, dim=0)
+ if input_boxes_mask_list[0] is not None
+ else None
+ )
+
+ # Concatenate input_boxes_label
+ input_boxes_label_list = [fi.input_boxes_label for fi in chunk_find_inputs]
+ batched_input_boxes_label = (
+ torch.cat(input_boxes_label_list, dim=0)
+ if input_boxes_label_list[0] is not None
+ else None
+ )
+
+ # Concatenate input_points
+ input_points_list = [fi.input_points for fi in chunk_find_inputs]
+ batched_input_points = (
+ torch.cat(input_points_list, dim=0)
+ if input_points_list[0] is not None
+ else None
+ )
+
+ # Concatenate input_points_mask
+ input_points_mask_list = [fi.input_points_mask for fi in chunk_find_inputs]
+ batched_input_points_mask = (
+ torch.cat(input_points_mask_list, dim=0)
+ if input_points_mask_list[0] is not None
+ else None
+ )
+
+ # Handle optional fields
+ input_boxes_before_embed_list = [
+ fi.input_boxes_before_embed for fi in chunk_find_inputs
+ ]
+ batched_input_boxes_before_embed = (
+ torch.cat(input_boxes_before_embed_list, dim=0)
+ if input_boxes_before_embed_list[0] is not None
+ else None
+ )
+
+ input_points_before_embed_list = [
+ fi.input_points_before_embed for fi in chunk_find_inputs
+ ]
+ batched_input_points_before_embed = (
+ torch.cat(input_points_before_embed_list, dim=0)
+ if input_points_before_embed_list[0] is not None
+ else None
+ )
+
+ # Create batched FindStage
+ batched_find_input = FindStage(
+ img_ids=batched_img_ids,
+ img_ids_np=batched_img_ids_np,
+ text_ids=batched_text_ids,
+ input_boxes=batched_input_boxes,
+ input_boxes_mask=batched_input_boxes_mask,
+ input_boxes_label=batched_input_boxes_label,
+ input_points=batched_input_points,
+ input_points_mask=batched_input_points_mask,
+ ptrs=None, # Not batching pointers for now
+ ptrs_seg=None,
+ object_ids=None,
+ input_boxes_before_embed=batched_input_boxes_before_embed,
+ input_points_before_embed=batched_input_points_before_embed,
+ )
+
+ return batched_find_input
+
+ def _batch_geometric_prompts(
+ self,
+ geometric_prompts: List[Prompt],
+ chunk_start: int,
+ chunk_end: int,
+ ) -> Prompt:
+ """
+ Batch multiple Prompt objects into a single batched Prompt.
+
+ Args:
+ geometric_prompts: List of Prompt objects for all frames.
+ chunk_start: Start index of the chunk.
+ chunk_end: End index of the chunk (exclusive).
+
+ Returns:
+ A single Prompt with batched tensors.
+ """
+ chunk_prompts = [geometric_prompts[i] for i in range(chunk_start, chunk_end)]
+ return self._batch_geometric_prompts_from_list(chunk_prompts)
+
+ def _batch_geometric_prompts_from_list(
+ self,
+ chunk_prompts: List[Prompt],
+ ) -> Prompt:
+ """
+ Batch a list of Prompt objects into a single batched Prompt.
+
+ Prompt uses seq-first, batch-second convention:
+ - box_embeddings: N_boxes x B x C_box - batch along dim 1
+ - box_mask: B x N_boxes - batch along dim 0
+ - box_labels: N_boxes x B - batch along dim 1
+ - point_embeddings: N_points x B x C_point - batch along dim 1
+ - point_mask: B x N_points - batch along dim 0
+ - point_labels: N_points x B - batch along dim 1
+
+ Args:
+ chunk_prompts: List of Prompt objects to batch.
+
+ Returns:
+ A single Prompt with batched tensors.
+ """
+
+ # Helper function to batch tensors along specified dimension
+ def batch_tensors(tensors, dim):
+ if tensors[0] is None:
+ return None
+ return torch.cat(tensors, dim=dim)
+
+ # Batch box embeddings (N_boxes x B x C_box - batch along dim 1)
+ box_embeddings_list = [p.box_embeddings for p in chunk_prompts]
+ batched_box_embeddings = batch_tensors(box_embeddings_list, dim=1)
+
+ # Batch box mask (B x N_boxes - batch along dim 0)
+ box_mask_list = [p.box_mask for p in chunk_prompts]
+ batched_box_mask = batch_tensors(box_mask_list, dim=0)
+
+ # Batch box labels (N_boxes x B - batch along dim 1)
+ box_labels_list = [p.box_labels for p in chunk_prompts]
+ batched_box_labels = batch_tensors(box_labels_list, dim=1)
+
+ # Batch point embeddings (N_points x B x C_point - batch along dim 1)
+ point_embeddings_list = [p.point_embeddings for p in chunk_prompts]
+ batched_point_embeddings = batch_tensors(point_embeddings_list, dim=1)
+
+ # Batch point mask (B x N_points - batch along dim 0)
+ point_mask_list = [p.point_mask for p in chunk_prompts]
+ batched_point_mask = batch_tensors(point_mask_list, dim=0)
+
+ # Batch point labels (N_points x B - batch along dim 1)
+ point_labels_list = [p.point_labels for p in chunk_prompts]
+ batched_point_labels = batch_tensors(point_labels_list, dim=1)
+
+ # Create batched Prompt
+ batched_prompt = Prompt(
+ box_embeddings=batched_box_embeddings,
+ box_mask=batched_box_mask,
+ box_labels=batched_box_labels,
+ point_embeddings=batched_point_embeddings,
+ point_mask=batched_point_mask,
+ point_labels=batched_point_labels,
+ )
+
+ return batched_prompt
+
+
+class Sam3MultiplexDetector(Sam3MultiplexImageBase):
+ def __init__(
+ self,
+ *args,
+ async_all_gather=True,
+ gather_backbone_out=None,
+ is_multiplex=False,
+ **kwargs,
+ ):
+ super().__init__(*args, **kwargs)
+ self.rank = int(os.getenv("RANK", "0"))
+ self.world_size = int(os.getenv("WORLD_SIZE", "1"))
+ self.async_all_gather = async_all_gather
+
+ # if gather_backbone_out is not set, default to gathering only for `SAM3VLBackbone` (or `SAM3VLBackboneTri`)
+ if gather_backbone_out is None:
+ gather_backbone_out = isinstance(self.backbone, SAM3VLBackbone) or (
+ SAM3VLBackboneTri is not None
+ and isinstance(self.backbone, SAM3VLBackboneTri)
+ )
+ self.gather_backbone_out = gather_backbone_out
+ self.is_multiplex = is_multiplex
+
+ def forward_video_grounding_multigpu(
+ self,
+ backbone_out,
+ find_inputs,
+ geometric_prompt: Prompt,
+ frame_idx,
+ num_frames,
+ # `multigpu_buffer` is a dict to cache FA outputs in a chunk between different calls
+ multigpu_buffer,
+ track_in_reverse=False,
+ # whether to also return the SAM2 backbone features (in addition to FA results)
+ return_sam2_backbone_feats=False,
+ # whether to perform NMS and suppress the scores of those detections removed by NMS
+ run_nms=False,
+ nms_prob_thresh=None,
+ nms_iou_thresh=None,
+ nms_use_iom=False,
+ # tracking bounds to respect max_frame_num_to_track
+ max_frame_num_to_track=None,
+ propagate_in_video_start_frame_idx=None,
+ # feature_cache for buffered backbone computation
+ feature_cache=None,
+ **kwargs,
+ ):
+ """
+ Compute the FA detection outputs in a distributed manner, where all GPUs process
+ a chunk of frames (equal to the number of GPUs) at once and store them in cache.
+ """
+ # Calculate valid frame range based on max_frame_num_to_track
+ # We prevent pre-fetching beyond the tracking window relative to the propagation start frame
+ if max_frame_num_to_track is not None:
+ if propagate_in_video_start_frame_idx is None:
+ propagate_in_video_start_frame_idx = 0
+ if track_in_reverse:
+ # When going backwards, limit how far back we can go from the propagation start frame
+ valid_frame_start = max(
+ 0,
+ propagate_in_video_start_frame_idx - max_frame_num_to_track + 1,
+ )
+ valid_frame_end = num_frames
+ else:
+ # When going forwards, limit how far ahead we can go from the propagation start frame
+ valid_frame_start = 0
+ valid_frame_end = min(
+ num_frames,
+ propagate_in_video_start_frame_idx + max_frame_num_to_track,
+ )
+ else:
+ # No tracking limit specified, use full video range
+ valid_frame_start = 0
+ valid_frame_end = num_frames
+
+ # Step 1: fetch the FA outputs in the current chunk from buffer
+ frame_idx_curr_b = frame_idx - frame_idx % self.world_size
+ frame_idx_curr_e = min(frame_idx_curr_b + self.world_size, num_frames)
+
+ # Clamp the current chunk to the valid tracking range
+ frame_idx_curr_b = max(frame_idx_curr_b, valid_frame_start)
+ frame_idx_curr_e = min(frame_idx_curr_e, valid_frame_end)
+ # in case the current frame's FA results are not in the buffer yet, build the current chunk
+ # (this should only happen on the first chunk, since we are also building the next chunk below)
+ if frame_idx not in multigpu_buffer:
+ with torch.profiler.record_function("build_multigpu_buffer_next_chunk1"):
+ self._build_multigpu_buffer_next_chunk(
+ backbone_out=backbone_out,
+ find_inputs=find_inputs,
+ geometric_prompt=geometric_prompt,
+ frame_idx_begin=frame_idx_curr_b,
+ frame_idx_end=frame_idx_curr_e,
+ num_frames=num_frames,
+ multigpu_buffer=multigpu_buffer,
+ run_nms=run_nms,
+ nms_prob_thresh=nms_prob_thresh,
+ nms_iou_thresh=nms_iou_thresh,
+ nms_use_iom=nms_use_iom,
+ feature_cache=feature_cache,
+ )
+
+ # read out the current frame's results from `multigpu_buffer`
+ out = {}
+ for k, (v, handle) in multigpu_buffer[frame_idx].items():
+ if self.is_multiplex:
+ if (
+ k.startswith("interactive_backbone_")
+ or k.startswith("propagation_backbone_")
+ ) and not return_sam2_backbone_feats:
+ continue
+ else:
+ if k.startswith("sam2_backbone_") and not return_sam2_backbone_feats:
+ continue
+ if handle is not None:
+ handle.wait() # wait for async all-gather to finish
+ out[k] = v
+
+ # Step 2: remove FA outputs of the previous chunk from cache to save GPU memory
+ if not track_in_reverse and frame_idx_curr_b - self.world_size >= 0:
+ frame_idx_prev_e = frame_idx_curr_b
+ frame_idx_prev_b = frame_idx_curr_b - self.world_size
+ elif track_in_reverse and frame_idx_curr_e < num_frames:
+ frame_idx_prev_b = frame_idx_curr_e
+ frame_idx_prev_e = min(frame_idx_prev_b + self.world_size, num_frames)
+ else:
+ frame_idx_prev_b = frame_idx_prev_e = None
+ if frame_idx_prev_b is not None:
+ for frame_idx_rm in range(frame_idx_prev_b, frame_idx_prev_e):
+ multigpu_buffer.pop(frame_idx_rm, None)
+
+ # Step 3: compute and cache FA outputs of the next chunk ahead of time
+ # (so that we can overlap computation with all-gather transfer)
+ # Respect tracking bounds when calculating next chunk
+
+ if not track_in_reverse and frame_idx_curr_e < valid_frame_end:
+ frame_idx_next_b = frame_idx_curr_e
+ frame_idx_next_e = min(frame_idx_next_b + self.world_size, valid_frame_end)
+ elif (
+ track_in_reverse and frame_idx_curr_b - self.world_size >= valid_frame_start
+ ):
+ frame_idx_next_e = frame_idx_curr_b
+ frame_idx_next_b = max(
+ frame_idx_curr_b - self.world_size, valid_frame_start
+ )
+ else:
+ frame_idx_next_b = frame_idx_next_e = None
+ if frame_idx_next_b is not None and frame_idx_next_b not in multigpu_buffer:
+ with torch.profiler.record_function("build_multigpu_buffer_next_chunk2"):
+ self._build_multigpu_buffer_next_chunk(
+ backbone_out=backbone_out,
+ find_inputs=find_inputs,
+ geometric_prompt=geometric_prompt,
+ frame_idx_begin=frame_idx_next_b,
+ frame_idx_end=frame_idx_next_e,
+ num_frames=num_frames,
+ multigpu_buffer=multigpu_buffer,
+ run_nms=run_nms,
+ nms_prob_thresh=nms_prob_thresh,
+ nms_iou_thresh=nms_iou_thresh,
+ nms_use_iom=nms_use_iom,
+ feature_cache=feature_cache,
+ )
+
+ return out, backbone_out
+
+ def _build_multigpu_buffer_next_chunk(
+ self,
+ backbone_out,
+ find_inputs,
+ geometric_prompt: Prompt,
+ frame_idx_begin,
+ frame_idx_end,
+ num_frames,
+ multigpu_buffer,
+ run_nms=False,
+ nms_prob_thresh=None,
+ nms_iou_thresh=None,
+ nms_use_iom=False,
+ feature_cache=None,
+ ):
+ """Compute FA outputs on a chunk of frames and store their results in multigpu_buffer."""
+ # each GPU computes FA on one frame in the chunk (in a round-robin manner)
+ frame_idx_local_gpu = min(frame_idx_begin + self.rank, frame_idx_end - 1)
+ # `forward_grounding` (from base class `Sam3MultiplexImageBase`) runs FA on a single frame
+ with torch.profiler.record_function("forward_grounding"):
+ out_local = self.forward_grounding(
+ backbone_out=backbone_out,
+ # HACK: find_inputs lives on GPU and reallocating it is expensive, so in prod (i.e. when using the
+ # streaming frame loader instead of a local file) its values are updated in place and indexed modulo
+ # len(find_inputs). For non-prod, frame_idx_local_gpu < len(find_inputs) always holds, so this is a no-op
+ find_input=find_inputs[frame_idx_local_gpu % len(find_inputs)],
+ find_target=None,
+ geometric_prompt=geometric_prompt,
+ feature_cache=feature_cache,
+ )
+ if run_nms:
+ with torch.profiler.record_function("nms_masks"):
+ # run NMS as a post-processing step on top of the detection outputs
+ assert nms_prob_thresh is not None and nms_iou_thresh is not None
+ pred_probs = out_local["pred_logits"].squeeze(-1).sigmoid()
+ pred_masks = out_local["pred_masks"]
+ # loop over text prompts (not an overhead for demo where there's only 1 prompt)
+ for prompt_idx in range(pred_probs.size(0)):
+ keep = nms_masks(
+ pred_probs=pred_probs[prompt_idx],
+ pred_masks=pred_masks[prompt_idx],
+ prob_threshold=nms_prob_thresh,
+ iou_threshold=nms_iou_thresh,
+ nms_use_iom=nms_use_iom,
+ do_compile=getattr(self, "compile_model", False),
+ running_in_prod=getattr(self, "running_in_prod", False),
+ )
+ # push the logits of detections removed by NMS far below any score threshold
+ out_local["pred_logits"][prompt_idx, :, 0] -= 1e4 * (~keep).float()
+
+ if self.gather_backbone_out:
+ # gather the SAM 2 backbone features across GPUs
+ if self.is_multiplex:
+ # Note: the interactive features should not need to be recomputed on every frame.
+ # TODO: room for optimization
+
+ # Interaction features
+ inte_feats = out_local["prev_encoder_out"]["backbone_out"][
+ "interactive"
+ ]
+ assert inte_feats["vision_mask"] is None
+ assert (
+ len(inte_feats["backbone_fpn"]) == 3
+ ) # SAM2 backbone always has 3 levels
+ assert all(x.mask is None for x in inte_feats["backbone_fpn"])
+ # cast the SAM2 backbone features to bfloat16 for all-gather (usually a no-op,
+ # since under AMP the SAM2 backbone features are typically already bfloat16)
+ inte_backbone_fpn_bf16 = [
+ x.to(torch.bfloat16) for x in inte_feats["backbone_fpn"]
+ ]
+ inte_fpn0, inte_fpn_handle0 = self._gather_tensor(
+ inte_backbone_fpn_bf16[0].tensors
+ )
+ inte_fpn1, inte_fpn_handle1 = self._gather_tensor(
+ inte_backbone_fpn_bf16[1].tensors
+ )
+ inte_fpn2, inte_fpn_handle2 = self._gather_tensor(
+ inte_backbone_fpn_bf16[2].tensors
+ )
+ # vision_pos_enc is the same on all frames, so no need to all-gather them
+ inte_vision_pos_enc = inte_feats["vision_pos_enc"]
+
+ feats = out_local["prev_encoder_out"]["backbone_out"]["sam2_backbone_out"]
+ assert feats["vision_mask"] is None
+ assert len(feats["backbone_fpn"]) == 3 # SAM2 backbone always has 3 levels
+ assert all(x.mask is None for x in feats["backbone_fpn"])
+ # cast the SAM2 backbone features to bfloat16 for all-gather (usually a no-op,
+ # since under AMP the SAM2 backbone features are typically already bfloat16)
+ backbone_fpn_bf16 = [x.to(torch.bfloat16) for x in feats["backbone_fpn"]]
+ fpn0, fpn_handle0 = self._gather_tensor(backbone_fpn_bf16[0].tensors)
+ fpn1, fpn_handle1 = self._gather_tensor(backbone_fpn_bf16[1].tensors)
+ fpn2, fpn_handle2 = self._gather_tensor(backbone_fpn_bf16[2].tensors)
+ # vision_pos_enc is the same on all frames, so no need to all-gather them
+ vision_pos_enc = feats["vision_pos_enc"]
+
+ # trim the FA output to only include the necessary keys
+ out_local = {
+ "pred_logits": out_local["pred_logits"],
+ "pred_boxes": out_local["pred_boxes"],
+ "pred_boxes_xyxy": out_local["pred_boxes_xyxy"],
+ "pred_masks": out_local["pred_masks"],
+ "pred_object_ids": self._get_dummy_object_ids(out_local["pred_logits"]),
+ }
+
+ # gather the results: after this step, each GPU will receive FA outputs on
+ # all frames in the chunk and store them in `multigpu_buffer`
+ out_gathered = {k: self._gather_tensor(v) for k, v in out_local.items()}
+ for rank in range(self.world_size):
+ frame_idx_to_save = frame_idx_begin + rank
+ if frame_idx_to_save >= num_frames:
+ continue
+ frame_buffer = {
+ k: (v[rank], handle) for k, (v, handle) in out_gathered.items()
+ }
+ if self.gather_backbone_out:
+ # also add gathered SAM 2 backbone features to frame_buffer
+ if self.is_multiplex:
+ frame_buffer["interactive_backbone_fpn_0"] = (
+ inte_fpn0[rank],
+ inte_fpn_handle0,
+ )
+ frame_buffer["interactive_backbone_fpn_1"] = (
+ inte_fpn1[rank],
+ inte_fpn_handle1,
+ )
+ frame_buffer["interactive_backbone_fpn_2"] = (
+ inte_fpn2[rank],
+ inte_fpn_handle2,
+ )
+ frame_buffer["interactive_backbone_pos_enc"] = (
+ inte_vision_pos_enc,
+ None,
+ )
+ frame_buffer["sam2_backbone_fpn_0"] = (fpn0[rank], fpn_handle0)
+ frame_buffer["sam2_backbone_fpn_1"] = (fpn1[rank], fpn_handle1)
+ frame_buffer["sam2_backbone_fpn_2"] = (fpn2[rank], fpn_handle2)
+ frame_buffer["sam2_backbone_pos_enc"] = (vision_pos_enc, None)
+
+ multigpu_buffer[frame_idx_to_save] = frame_buffer
+
+ def _gather_tensor(self, x):
+ if self.world_size == 1:
+ return [x], None
+
+ async_op = self.async_all_gather
+ # here `.contiguous()` is required -- otherwise NCCL all_gather
+ # sometimes gives wrong results (based on Ronghang's observations)
+ x = x.contiguous() # ensure contiguous memory for NCCL
+ output_list = [torch.empty_like(x) for _ in range(self.world_size)]
+ handle = torch.distributed.all_gather(output_list, x, async_op=async_op)
+ return output_list, handle
+
+ def forward_video_grounding_batched_multigpu(
+ self,
+ backbone_out,
+ find_inputs,
+ geometric_prompt: Prompt,
+ frame_idx,
+ num_frames,
+ # `grounding_cache` is a dict to cache FA outputs in a chunk between different calls
+ grounding_cache,
+ track_in_reverse=False,
+ # whether to also return the SAM2 backbone features (in addition to FA results)
+ return_sam2_backbone_feats=False,
+ # whether to perform NMS and suppress the scores of those detections removed by NMS
+ run_nms=False,
+ nms_prob_thresh=None,
+ nms_iou_thresh=None,
+ nms_use_iom=False,
+ # tracking bounds to respect max_frame_num_to_track
+ max_frame_num_to_track=None,
+ propagate_in_video_start_frame_idx=None,
+ # feature_cache for buffered backbone computation
+ feature_cache=None,
+ # batch_size for batched forward_grounding (default: 16)
+ batch_size=16,
+ ):
+ """
+ Fully batched forward_grounding that processes chunks of frames together on each GPU.
+
+ Unlike forward_video_grounding_multigpu which processes 1 frame per GPU per chunk,
+ this method processes `batch_size` frames at once using the batched forward_grounding
+ approach from Sam3MultiplexImageBase.
+
+ For single-GPU (world_size=1), this is equivalent to forward_grounding_batched.
+ For multi-GPU, each GPU processes batch_size frames in parallel.
+
+ Args:
+ backbone_out: Dictionary containing backbone outputs and image batch.
+ find_inputs: List of FindStage objects for all frames.
+ geometric_prompt: Prompt object (used as template, individual prompts are
+ constructed from find_inputs for batching).
+ frame_idx: Current frame index to process.
+ num_frames: Total number of frames in the video.
+ grounding_cache: Dictionary to cache grounding outputs.
+ track_in_reverse: If True, processing in reverse frame order.
+ return_sam2_backbone_feats: Whether to also return SAM2 backbone features.
+ run_nms: Whether to perform NMS on detection outputs.
+ nms_prob_thresh: Probability threshold for NMS.
+ nms_iou_thresh: IoU threshold for NMS.
+ nms_use_iom: Whether to use IoM for NMS.
+ max_frame_num_to_track: Maximum number of frames to track.
+ propagate_in_video_start_frame_idx: Start frame index for propagation.
+ feature_cache: Optional dictionary for backbone feature caching.
+ batch_size: Number of frames to batch together per GPU (default: 16).
+
+ Returns:
+ Tuple of (out, backbone_out) where out contains detection results for frame_idx.
+ """
+ # Calculate the half-open [valid_frame_start, valid_frame_end) range based on max_frame_num_to_track
+ if max_frame_num_to_track is not None:
+ if propagate_in_video_start_frame_idx is None:
+ propagate_in_video_start_frame_idx = 0
+ if track_in_reverse:
+ # reverse tracking visits start, start - 1, ..., start - max + 1; the end bound is
+ # exclusive (as in the forward branch), so it must be start + 1 to include frame `start`
+ valid_frame_start = max(
+ propagate_in_video_start_frame_idx - max_frame_num_to_track + 1, 0
+ )
+ valid_frame_end = propagate_in_video_start_frame_idx + 1
+ else:
+ valid_frame_start = propagate_in_video_start_frame_idx
+ valid_frame_end = min(
+ propagate_in_video_start_frame_idx + max_frame_num_to_track, num_frames
+ )
+ else:
+ valid_frame_start = 0
+ valid_frame_end = num_frames
+
+ # Initialize grounding_buffer if not present
+ if "grounding_buffer" not in grounding_cache:
+ grounding_cache["grounding_buffer"] = {}
+
+ # Calculate chunk boundaries - use batch_size instead of world_size
+ chunk_start = (frame_idx // batch_size) * batch_size
+ chunk_end = min(chunk_start + batch_size, valid_frame_end)
+ chunk_key = (chunk_start, chunk_end)
+
+ # Process chunk if not already cached
+ if chunk_key not in grounding_cache["grounding_buffer"]:
+ with torch.profiler.record_function(
+ "forward_grounding_batched.process_chunk"
+ ):
+ chunk_outputs = self._process_grounding_chunk_batched(
+ backbone_out=backbone_out,
+ find_inputs=find_inputs,
+ chunk_start=chunk_start,
+ chunk_end=chunk_end,
+ run_nms=run_nms,
+ nms_prob_thresh=nms_prob_thresh,
+ nms_iou_thresh=nms_iou_thresh,
+ nms_use_iom=nms_use_iom,
+ feature_cache=feature_cache,
+ return_sam2_backbone_feats=return_sam2_backbone_feats,
+ )
+ grounding_cache["grounding_buffer"][chunk_key] = chunk_outputs
+
+ # Auto-cleanup previous chunks
+ self._cleanup_previous_chunks_multigpu(
+ grounding_cache=grounding_cache,
+ current_chunk_key=chunk_key,
+ batch_size=batch_size,
+ num_frames=num_frames,
+ track_in_reverse=track_in_reverse,
+ )
+
+ # Retrieve the cached output for this frame
+ chunk_outputs = grounding_cache["grounding_buffer"][chunk_key]
+ local_idx = frame_idx - chunk_start
+
+ # Slice out the output for this specific frame
+ out = self._slice_batched_output(
+ chunk_outputs, local_idx, return_sam2_backbone_feats
+ )
+
+ return out, backbone_out
+
+ def _process_grounding_chunk_batched(
+ self,
+ backbone_out,
+ find_inputs,
+ chunk_start: int,
+ chunk_end: int,
+ run_nms: bool,
+ nms_prob_thresh,
+ nms_iou_thresh,
+ nms_use_iom: bool,
+ feature_cache,
+ return_sam2_backbone_feats: bool,
+ ):
+ """
+ Process a chunk of frames through the full forward_grounding pipeline in batch.
+ """
+ chunk_size = chunk_end - chunk_start
+
+ # Build geometric prompts for the chunk
+ chunk_geo_prompts = [
+ self._get_geo_prompt_from_find_input(find_inputs[i % len(find_inputs)])
+ for i in range(chunk_start, chunk_end)
+ ]
+
+ # Batch the find_inputs for this chunk
+ batched_find_input = self._batch_find_inputs(
+ find_inputs, chunk_start, chunk_end
+ )
+
+ # Batch the geometric prompts
+ batched_geometric_prompt = self._batch_geometric_prompts_from_list(
+ chunk_geo_prompts
+ )
+
+ # Run forward_grounding on the batched input
+ with torch.profiler.record_function("forward_grounding_batched.forward"):
+ out = self.forward_grounding(
+ backbone_out=backbone_out,
+ find_input=batched_find_input,
+ find_target=None,
+ geometric_prompt=batched_geometric_prompt,
+ feature_cache=feature_cache,
+ )
+
+ # Apply NMS per frame in the batch
+ if run_nms:
+ with torch.profiler.record_function("forward_grounding_batched.nms"):
+ assert nms_prob_thresh is not None and nms_iou_thresh is not None
+ pred_probs = out["pred_logits"].squeeze(-1).sigmoid()
+ pred_masks = out["pred_masks"]
+ # pred_probs shape: [batch_size, num_queries]
+ # pred_masks shape: [batch_size, num_queries, H, W]
+ # Use batched NMS to process all frames at once
+ keep = nms_masks(
+ pred_probs=pred_probs,
+ pred_masks=pred_masks,
+ prob_threshold=nms_prob_thresh,
+ iou_threshold=nms_iou_thresh,
+ nms_use_iom=nms_use_iom,
+ do_compile=getattr(self, "compile_model", False),
+ running_in_prod=getattr(self, "running_in_prod", False),
+ )
+ # Push the logits of detections removed by NMS far below any score threshold
+ # keep shape: [batch_size, num_queries]
+ out["pred_logits"][:, :, 0] -= 1e4 * (~keep).float()
+
+ # Extract SAM2 backbone features if requested
+ if return_sam2_backbone_feats and "prev_encoder_out" in out:
+ backbone_data = out["prev_encoder_out"]["backbone_out"]
+ if self.is_multiplex and "interactive" in backbone_data:
+ out["_interactive_backbone"] = backbone_data["interactive"]
+ if "sam2_backbone_out" in backbone_data:
+ out["_sam2_backbone"] = backbone_data["sam2_backbone_out"]
+
+ out["_chunk_size"] = chunk_size
+ return out
+
+ def _slice_batched_output(
+ self,
+ chunk_outputs,
+ local_idx: int,
+ return_sam2_backbone_feats: bool,
+ ):
+ """
+ Slice a single frame's output from the batched chunk outputs.
+ """
+ out = {}
+
+ # Keys to slice at batch dimension
+ batch_dim_keys = {
+ "pred_logits",
+ "pred_boxes",
+ "pred_boxes_xyxy",
+ "pred_masks",
+ "pred_logits_o2m",
+ "pred_boxes_o2m",
+ "pred_boxes_xyxy_o2m",
+ "pred_masks_o2m",
+ "queries",
+ "presence_logit_dec",
+ }
+
+ # Keys to skip
+ skip_keys = {
+ "_chunk_size",
+ "_interactive_backbone",
+ "_sam2_backbone",
+ "prev_encoder_out",
+ "encoder_hidden_states",
+ "aux_outputs",
+ }
+
+ for key, value in chunk_outputs.items():
+ if key in skip_keys:
+ continue
+ if key in batch_dim_keys and isinstance(value, torch.Tensor):
+ out[key] = value[local_idx : local_idx + 1]
+ elif isinstance(value, torch.Tensor):
+ try:
+ out[key] = value[local_idx : local_idx + 1]
+ except (IndexError, RuntimeError):
+ out[key] = value
+
+ # Add object IDs
+ if "pred_logits" in out:
+ out["pred_object_ids"] = self._get_dummy_object_ids(out["pred_logits"])
+
+ # Add SAM2 backbone features if requested
+ if return_sam2_backbone_feats:
+ if "_sam2_backbone" in chunk_outputs:
+ sam2_bb = chunk_outputs["_sam2_backbone"]
+ out["sam2_backbone_fpn_0"] = sam2_bb["backbone_fpn"][0].tensors[
+ local_idx : local_idx + 1
+ ]
+ out["sam2_backbone_fpn_1"] = sam2_bb["backbone_fpn"][1].tensors[
+ local_idx : local_idx + 1
+ ]
+ out["sam2_backbone_fpn_2"] = sam2_bb["backbone_fpn"][2].tensors[
+ local_idx : local_idx + 1
+ ]
+ out["sam2_backbone_pos_enc"] = [
+ x[local_idx : local_idx + 1] for x in sam2_bb["vision_pos_enc"]
+ ]
+
+ if self.is_multiplex and "_interactive_backbone" in chunk_outputs:
+ inte_bb = chunk_outputs["_interactive_backbone"]
+ out["interactive_backbone_fpn_0"] = inte_bb["backbone_fpn"][0].tensors[
+ local_idx : local_idx + 1
+ ]
+ out["interactive_backbone_fpn_1"] = inte_bb["backbone_fpn"][1].tensors[
+ local_idx : local_idx + 1
+ ]
+ out["interactive_backbone_fpn_2"] = inte_bb["backbone_fpn"][2].tensors[
+ local_idx : local_idx + 1
+ ]
+ out["interactive_backbone_pos_enc"] = [
+ x[local_idx : local_idx + 1] for x in inte_bb["vision_pos_enc"]
+ ]
+
+ return out
+
+ def _cleanup_previous_chunks_multigpu(
+ self,
+ grounding_cache,
+ current_chunk_key,
+ batch_size: int,
+ num_frames: int,
+ track_in_reverse: bool,
+ ):
+ """Remove previous chunks from cache to save GPU memory."""
+ chunk_start, chunk_end = current_chunk_key
+
+ if not track_in_reverse:
+ prev_chunk_start = chunk_start - batch_size
+ if prev_chunk_start >= 0:
+ prev_chunk_end = chunk_start
+ prev_chunk_key = (prev_chunk_start, prev_chunk_end)
+
+ # Cleanup grounding_buffer entry
+ chunk = grounding_cache["grounding_buffer"].pop(prev_chunk_key, None)
+ if chunk is not None:
+ del chunk
+ else:
+ next_chunk_start = chunk_end
+ if next_chunk_start < num_frames:
+ next_chunk_end = min(next_chunk_start + batch_size, num_frames)
+ next_chunk_key = (next_chunk_start, next_chunk_end)
+ grounding_cache["grounding_buffer"].pop(next_chunk_key, None)
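The chunk bookkeeping in `forward_video_grounding_batched_multigpu` and the round-robin frame assignment in `_build_multigpu_buffer_next_chunk` are plain integer arithmetic. The following is a minimal sketch that re-derives them outside the model so the boundary cases can be checked in isolation. `valid_frame_bounds`, `chunk_for_frame`, and `frame_for_rank` are hypothetical helpers (not part of the repo), and the sketch assumes half-open `[begin, end)` frame bounds throughout.

```python
from typing import Tuple


def valid_frame_bounds(
    start: int, max_frames: int, num_frames: int, reverse: bool
) -> Tuple[int, int]:
    """Half-open [begin, end) range of frames a propagation from `start` may visit."""
    if reverse:
        # frames start, start - 1, ..., start - max_frames + 1
        begin, end = start - max_frames + 1, start + 1
    else:
        # frames start, start + 1, ..., start + max_frames - 1
        begin, end = start, start + max_frames
    return max(begin, 0), min(end, num_frames)


def chunk_for_frame(
    frame_idx: int, batch_size: int, valid_frame_end: int
) -> Tuple[Tuple[int, int], int]:
    """Chunk key and position of `frame_idx` inside it; `valid_frame_end` is exclusive."""
    chunk_start = (frame_idx // batch_size) * batch_size
    chunk_end = min(chunk_start + batch_size, valid_frame_end)
    local_idx = frame_idx - chunk_start
    if local_idx >= chunk_end - chunk_start:
        raise IndexError(f"frame {frame_idx} is outside chunk {(chunk_start, chunk_end)}")
    return (chunk_start, chunk_end), local_idx


def frame_for_rank(frame_idx_begin: int, frame_idx_end: int, rank: int) -> int:
    """Round-robin: rank r runs frame begin + r; surplus ranks repeat the last frame."""
    return min(frame_idx_begin + rank, frame_idx_end - 1)


if __name__ == "__main__":
    begin, end = valid_frame_bounds(start=5, max_frames=10, num_frames=100, reverse=True)
    print(begin, end)  # 0 6
    print(chunk_for_frame(5, batch_size=16, valid_frame_end=end))  # ((0, 6), 5)
```

Under the half-open convention, reverse propagation from frame `s` needs the exclusive end `s + 1`; with an end of `s`, `chunk_for_frame(s, ...)` raises, since frame `s` would fall outside its own chunk.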
diff --git a/sam3/model/sam3_multiplex_detector_utils.py b/sam3/model/sam3_multiplex_detector_utils.py
new file mode 100644
index 0000000..26eb9a9
--- /dev/null
+++ b/sam3/model/sam3_multiplex_detector_utils.py
@@ -0,0 +1,369 @@
+import logging
+
+import numpy as np
+import torch
+from sam3 import perflib
+
+try:
+ # Ronghang's generic GPU NMS implementation; install via
+ # pip uninstall -y torch_generic_nms; TORCH_CUDA_ARCH_LIST="8.0 9.0" pip install git+https://github.com/ronghanghu/torch_generic_nms
+ from torch_generic_nms import generic_nms
+
+ GENERIC_NMS_AVAILABLE = True
+except ImportError:
+ GENERIC_NMS_AVAILABLE = False
+
+from sam3.perflib.masks_ops import mask_iou
+from sam3.train.masks_ops import mask_iom
+
+
+def nms_masks(
+ pred_probs: torch.Tensor,
+ pred_masks: torch.Tensor,
+ prob_threshold: float,
+ iou_threshold: float,
+ nms_use_iom: bool = False,
+ do_compile: bool = False,
+ running_in_prod: bool = False,
+) -> torch.Tensor:
+ """
+ Args:
+ - pred_probs: (num_det,) or (B, num_det) float Tensor, containing the score (probability) of each detection
+ - pred_masks: (num_det, H_mask, W_mask) or (B, num_det, H_mask, W_mask) float Tensor, containing the binary segmentation mask of each detection
+ - prob_threshold: float, score threshold to prefilter detections (NMS is performed on detections above threshold)
+ - iou_threshold: float, mask IoU threshold for NMS (it would also be used as IoM threshold if `nms_use_iom` is True)
+ - nms_use_iom: bool, if True, use IoM instead of IoU for NMS
+ - do_compile: bool, whether to compile the function for optimization
+ - running_in_prod: bool, whether the function is running in production (i.e., in Instagram)
+
+ Returns:
+ - keep: (num_det,) or (B, num_det) bool Tensor, indicating whether each detection is kept after score thresholding + NMS
+ """
+ if do_compile and perflib.is_enabled:
+ # compile the core NMS with max-autotune and fullgraph=True (no graph breaks allowed)
+ compiled_fn = torch.compile(
+ _nms_masks_core,
+ mode="max-autotune",
+ fullgraph=True,
+ # dynamic=False,
+ )
+ return compiled_fn(
+ pred_probs, pred_masks, prob_threshold, iou_threshold, nms_use_iom
+ )
+ else:
+ return _nms_masks_core(
+ pred_probs, pred_masks, prob_threshold, iou_threshold, nms_use_iom
+ )
+
+
+def _nms_masks_core(
+ pred_probs: torch.Tensor,
+ pred_masks: torch.Tensor,
+ prob_threshold: float,
+ iou_threshold: float,
+ nms_use_iom: bool = False,
+) -> torch.Tensor:
+ """Core NMS implementation without compilation.
+
+ Supports both single-frame and batched inputs:
+ - Single-frame: pred_probs (num_det,), pred_masks (num_det, H, W)
+ - Batched: pred_probs (B, num_det), pred_masks (B, num_det, H, W)
+
+ Returns:
+ - keep: bool Tensor with same leading dimensions as input, indicating kept detections
+ """
+ # Check if input is batched (has batch dimension)
+ is_batched = pred_probs.dim() == 2
+
+ if is_batched:
+ return _nms_masks_core_batched(
+ pred_probs, pred_masks, prob_threshold, iou_threshold, nms_use_iom
+ )
+ else:
+ # Single-frame input: use original logic
+ return _nms_masks_core_single(
+ pred_probs, pred_masks, prob_threshold, iou_threshold, nms_use_iom
+ )
+
+
+def _nms_masks_core_batched(
+ pred_probs: torch.Tensor,
+ pred_masks: torch.Tensor,
+ prob_threshold: float,
+ iou_threshold: float,
+ nms_use_iom: bool = False,
+) -> torch.Tensor:
+ """Core NMS implementation for batched inputs using vectorized operations.
+
+ Args:
+ - pred_probs: (B, num_det) float Tensor
+ - pred_masks: (B, num_det, H_mask, W_mask) float Tensor
+ - prob_threshold: float, score threshold to prefilter detections
+ - iou_threshold: float, mask IoU/IoM threshold for NMS
+ - nms_use_iom: bool, if True, use IoM instead of IoU for NMS
+
+ Returns:
+ - keep: (B, num_det) bool Tensor
+ """
+ B, num_det, H, W = pred_masks.shape
+
+ is_valid = pred_probs > prob_threshold # (B, num_det)
+ masks_binary = pred_masks > 0 # (B, num_det, H, W)
+
+ # Both perflib and non-perflib runs use the same bmm-based batched IoU/IoM
+ if nms_use_iom:
+ overlaps = _batched_mask_iom(masks_binary) # (B, num_det, num_det)
+ else:
+ overlaps = _batched_mask_iou(masks_binary) # (B, num_det, num_det)
+
+ # Apply batched greedy NMS
+ keep = _batched_generic_nms_mask(overlaps, pred_probs, is_valid, iou_threshold)
+ return keep
+
+
+def _batched_mask_iou(masks: torch.Tensor) -> torch.Tensor:
+ """Compute batched pairwise IoU for masks.
+
+ Args:
+ - masks: (B, N, H, W) bool Tensor
+
+ Returns:
+ - ious: (B, N, N) float Tensor
+ """
+ B, N, H, W = masks.shape
+ # Flatten spatial dims: (B, N, H*W)
+ masks_flat = masks.reshape(B, N, -1).float()
+
+ # Compute intersection via batched matrix multiplication: (B, N, N)
+ intersection = torch.bmm(masks_flat, masks_flat.transpose(1, 2))
+
+ # Compute areas: (B, N)
+ areas = masks_flat.sum(dim=-1)
+
+ # Compute union: (B, N, N)
+ union = areas.unsqueeze(2) + areas.unsqueeze(1) - intersection
+
+ return intersection / (union + 1e-8)
+
+
+def _batched_mask_iom(masks: torch.Tensor) -> torch.Tensor:
+ """Compute batched pairwise IoM (Intersection over Minimum) for masks.
+
+ Args:
+ - masks: (B, N, H, W) bool Tensor
+
+ Returns:
+ - ioms: (B, N, N) float Tensor
+ """
+ B, N, H, W = masks.shape
+ # Flatten spatial dims: (B, N, H*W)
+ masks_flat = masks.reshape(B, N, -1).float()
+
+ # Compute intersection via batched matrix multiplication: (B, N, N)
+ intersection = torch.bmm(masks_flat, masks_flat.transpose(1, 2))
+
+ # Compute areas: (B, N)
+ areas = masks_flat.sum(dim=-1)
+
+ # Compute min area: (B, N, N)
+ min_area = torch.minimum(areas.unsqueeze(2), areas.unsqueeze(1))
+
+ return intersection / (min_area + 1e-8)
+
+
+def _batched_generic_nms_mask(
+ ious: torch.Tensor,
+ scores: torch.Tensor,
+ is_valid: torch.Tensor,
+ iou_threshold: float,
+) -> torch.Tensor:
+ """Batched NMS using vectorized operations.
+
+ Args:
+ - ious: (B, N, N) float Tensor, pairwise IoU/IoM matrix
+ - scores: (B, N) float Tensor, detection scores
+ - is_valid: (B, N) bool Tensor, valid detections mask
+ - iou_threshold: float, threshold for suppression
+
+ Returns:
+ - keep: (B, N) bool Tensor
+ """
+ B, N = scores.shape
+ device = scores.device
+
+ # Sort by score descending for each batch: (B, N)
+ order = scores.argsort(dim=-1, descending=True)
+
+ # Create batch indices for advanced indexing
+ batch_idx = torch.arange(B, device=device).unsqueeze(1).expand(B, N)
+
+ # Reorder IoU matrix according to sorted scores: (B, N, N)
+ # ious_sorted[b, i, j] = ious[b, order[b, i], order[b, j]]
+ ious_sorted = ious[batch_idx.unsqueeze(2), order.unsqueeze(2), order.unsqueeze(1)]
+
+ # Create threshold mask: (B, N, N)
+ threshold_mask = ious_sorted > iou_threshold
+
+ # Initialize keep mask with valid detections in sorted order: (B, N)
+ keep = is_valid[batch_idx, order]
+
+ # Strictly upper-triangular mask: detection i may only suppress lower-scoring detections j > i: (N, N)
+ triu = torch.triu(torch.ones(N, N, device=device, dtype=torch.bool), diagonal=1)
+
+ # Greedy NMS: vectorized over the batch, sequential over detections in score order
+ for i in range(N):
+ # For each position i, suppress later detections with high overlap
+ # Only suppress if current detection is kept
+ suppress = (
+ threshold_mask[:, i, :] & triu[i].unsqueeze(0) & keep[:, i].unsqueeze(1)
+ )
+ keep = keep & ~suppress
+
+ # Return keep mask in original order: (B, N)
+ original_keep = torch.zeros_like(keep)
+ original_keep[batch_idx, order] = keep
+ return original_keep
+
+
+def _nms_masks_core_single(
+ pred_probs: torch.Tensor,
+ pred_masks: torch.Tensor,
+ prob_threshold: float,
+ iou_threshold: float,
+ nms_use_iom: bool = False,
+) -> torch.Tensor:
+ """Core NMS implementation for a single frame (no batch dimension).
+
+ Args:
+ - pred_probs: (num_det,) float Tensor
+ - pred_masks: (num_det, H_mask, W_mask) float Tensor
+ - prob_threshold: float, score threshold to prefilter detections
+ - iou_threshold: float, mask IoU/IoM threshold for NMS
+ - nms_use_iom: bool, if True, use IoM instead of IoU for NMS
+
+ Returns:
+ - keep: (num_det,) bool Tensor
+ """
+ is_valid = pred_probs > prob_threshold # (num_det,)
+
+ if perflib.is_enabled:
+ masks_binary = pred_masks > 0 # (num_det, H_mask, W_mask)
+ if nms_use_iom:
+ ious = perf_mask_iom(masks_binary, masks_binary) # (num_det, num_det)
+ else:
+ ious = perf_mask_iou(masks_binary, masks_binary) # (num_det, num_det)
+ kept_mask = generic_nms_mask(ious, pred_probs, is_valid, iou_threshold)
+ return kept_mask
+ # prefilter the detections with prob_threshold ("valid" are those above prob_threshold)
+ probs = pred_probs[is_valid] # (num_valid,)
+ masks_binary = pred_masks[is_valid] > 0 # (num_valid, H_mask, W_mask)
+ if probs.numel() == 0:
+ return is_valid # no valid detection, so this is an all-False keep mask
+
+ if nms_use_iom:
+ overlaps = mask_iom(masks_binary, masks_binary) # (num_valid, num_valid)
+ else:
+ overlaps = mask_iou(masks_binary, masks_binary) # (num_valid, num_valid)
+ # kept_inds are the indices among `probs` of those kept detections after NMS
+ if GENERIC_NMS_AVAILABLE:
+ kept_inds = generic_nms(overlaps, probs, iou_threshold, use_iou_matrix=True)
+ else:
+ logging.warning(
+ "Falling back to CPU mask NMS implementation -- please install `torch_generic_nms` via\n\t"
+ 'pip uninstall -y torch_generic_nms; TORCH_CUDA_ARCH_LIST="8.0 9.0" pip install git+https://github.com/ronghanghu/torch_generic_nms'
+ )
+ kept_inds = generic_nms_cpu(overlaps, probs, iou_threshold)
+
+ # valid_inds are the indices among `probs` of valid detections before NMS (or -1 for invalid)
+ valid_inds = torch.where(is_valid, is_valid.cumsum(dim=0) - 1, -1) # (num_det,)
+ keep = torch.isin(valid_inds, kept_inds) # (num_det,)
+ return keep
+
+
+def generic_nms_cpu(
+ ious: torch.Tensor, scores: torch.Tensor, iou_threshold=0.5
+) -> torch.Tensor:
+ """
+ A generic version of `torchvision.ops.nms` that takes a pairwise IoU matrix. (CPU implementation
+ based on https://github.com/jwyang/faster-rcnn.pytorch/blob/master/lib/model/nms/nms_cpu.py)
+ """
+ ious_np = ious.float().detach().cpu().numpy()
+ scores_np = scores.float().detach().cpu().numpy()
+ order = scores_np.argsort()[::-1]
+ kept_inds = []
+ while order.size > 0:
+ i = order.item(0)
+ kept_inds.append(i)
+ inds = np.where(ious_np[i, order[1:]] <= iou_threshold)[0]
+ order = order[inds + 1]
+
+ return torch.tensor(kept_inds, dtype=torch.int64, device=scores.device)
+
+
+def generic_nms_mask(
+ ious: torch.Tensor, scores: torch.Tensor, is_valid: torch.Tensor, iou_threshold=0.5
+) -> torch.Tensor:
+ """
+ A generic version of `torchvision.ops.nms` that takes a pairwise IoU matrix and a validity
+ mask, and returns a (num_det,) bool keep mask in the original order. It runs on the device
+ of `ious`, using vectorized operations similar to nms_masks_kernel.
+ """
+ # Sort by score descending
+ order = scores.argsort(descending=True)
+
+ # Reorder IoU matrix according to sorted scores
+ ious_sorted = ious[order][:, order]
+
+ # Create threshold mask
+ threshold_mask = ious_sorted > iou_threshold
+
+ # Initialize keep mask with the valid detections, in sorted order
+ keep = is_valid[order]
+
+ # Strictly upper-triangular mask: detection i may only suppress lower-scoring detections j > i
+ tr = torch.triu(torch.ones_like(threshold_mask), diagonal=1)
+
+ # Greedy NMS, sequential over detections in score order
+ for i in range(len(scores)):
+ # only a detection that is still kept suppresses the later ones it overlaps too much
+ # (same gating on keep[i] as `_batched_generic_nms_mask`)
+ keep = keep & ~(threshold_mask[i] & tr[i] & keep[i])
+
+ # Return keep mask in original order
+ original_keep = torch.zeros_like(keep)
+ original_keep[order] = keep
+ return original_keep
+
+
+def perf_mask_iou(pred_masks: torch.Tensor, gt_masks: torch.Tensor) -> torch.Tensor:
+ """
+ Compute the IoU (Intersection over Union) between predicted masks and ground truth masks.
+
+ Args:
+ - pred_masks: (N, H, W) bool Tensor, containing binary predicted segmentation masks
+ - gt_masks: (M, H, W) bool Tensor, containing binary ground truth segmentation masks
+
+ Returns:
+ - ious: (N, M) float Tensor, containing IoUs for each pair of predicted and ground truth masks
+ """
+ assert pred_masks.dtype == gt_masks.dtype == torch.bool
+ from sam3.perflib.iou import pairwise_iou
+
+ return pairwise_iou(pred_masks, gt_masks, eps=None)
+
+
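+def perf_mask_iom(pred_masks: torch.Tensor, gt_masks: torch.Tensor) -> torch.Tensor:
+ """
+ Compute the IoM (Intersection over Minimum area) between predicted masks and ground truth masks.
+
+ Args:
+ - pred_masks: (N, H, W) bool Tensor, containing binary predicted segmentation masks
+ - gt_masks: (M, H, W) bool Tensor, containing binary ground truth segmentation masks
+
+ Returns:
+ - ioms: (N, M) float Tensor, containing IoMs for each pair of predicted and ground truth masks
+ """
+ assert pred_masks.dtype == gt_masks.dtype == torch.bool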
+ from sam3.perflib.iou import pairwise_iom
+
+ return pairwise_iom(pred_masks, gt_masks)
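To cross-check `_batched_generic_nms_mask` (and the single-frame `generic_nms_mask`), here is a self-contained sketch of greedy mask NMS vectorized over the batch dimension. It uses plain PyTorch only; `batched_mask_iou` and `batched_greedy_mask_nms` are hypothetical names, and the sketch assumes, like `_nms_masks_core_batched`, that the masks are logits whose positive entries are foreground.

```python
import torch


def batched_mask_iou(masks: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU of (B, N, H, W) bool masks, returned as a (B, N, N) float tensor."""
    B, N = masks.shape[:2]
    flat = masks.reshape(B, N, -1).float()
    inter = torch.bmm(flat, flat.transpose(1, 2))  # (B, N, N) intersection areas
    areas = flat.sum(dim=-1)  # (B, N)
    union = areas.unsqueeze(2) + areas.unsqueeze(1) - inter
    return inter / (union + 1e-8)


def batched_greedy_mask_nms(
    scores: torch.Tensor,
    mask_logits: torch.Tensor,
    prob_threshold: float,
    iou_threshold: float,
) -> torch.Tensor:
    """Greedy NMS: vectorized over the batch, sequential over detections in score order."""
    B, N = scores.shape
    ious = batched_mask_iou(mask_logits > 0)
    order = scores.argsort(dim=-1, descending=True)  # (B, N)
    b = torch.arange(B, device=scores.device).unsqueeze(1).expand(B, N)
    # ious_sorted[b, i, j] = ious[b, order[b, i], order[b, j]]
    ious_sorted = ious[b.unsqueeze(2), order.unsqueeze(2), order.unsqueeze(1)]
    keep = (scores > prob_threshold)[b, order]  # valid detections, in sorted order
    later = torch.triu(
        torch.ones(N, N, dtype=torch.bool, device=scores.device), diagonal=1
    )
    for i in range(N):
        # only a detection that is still kept may suppress lower-scoring overlapping ones
        suppress = (ious_sorted[:, i, :] > iou_threshold) & later[i] & keep[:, i : i + 1]
        keep = keep & ~suppress
    out = torch.zeros_like(keep)
    out[b, order] = keep  # scatter back to the original detection order
    return out
```

Gating the suppression on `keep[:, i]` is what makes this greedy NMS: if A overlaps B and B overlaps C (but A and C are disjoint), A suppresses B, and because B is no longer kept it cannot suppress C, so A and C both survive.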
diff --git a/sam3/model/sam3_multiplex_tracking.py b/sam3/model/sam3_multiplex_tracking.py
new file mode 100644
index 0000000..35183a5
--- /dev/null
+++ b/sam3/model/sam3_multiplex_tracking.py
@@ -0,0 +1,3431 @@
+from collections import defaultdict
+from functools import reduce
+from typing import Dict
+
+import numpy as np
+import sam3.model.sam3_multiplex_base
+import sam3.model.sam3_video_base
+import torch
+import torch.distributed as dist
+import torch.nn.functional as F
+from sam3 import perflib
+from sam3.logger import get_logger
+from sam3.model.box_ops import box_xywh_to_cxcywh, box_xyxy_to_xywh
+from sam3.model.data_misc import BatchedDatapoint
+from sam3.model.sam3_multiplex_base import MaskletConfirmationStatus, Sam3MultiplexBase
+from sam3.model.sam3_tracker_utils import fill_holes_in_mask_scores
+from sam3.model.sam3_video_inference import is_image_type
+from sam3.perflib.compile import (
+ clone_output_wrapper,
+ compile_wrapper,
+ shape_logging_wrapper,
+)
+from sam3.perflib.masks_ops import mask_iou, masks_to_boxes as perf_masks_to_boxes
+from torch import Tensor
+from torchvision.ops import masks_to_boxes
+from tqdm.auto import tqdm
+
+import gc
+from collections.abc import Mapping, Sequence
+from dataclasses import fields, is_dataclass
+from typing import List
+
+from sam3.model.data_misc import (
+ BatchedPointer,
+ convert_my_tensors,
+ FindStage,
+ NestedTensor,
+)
+from sam3.model.geometry_encoders import Prompt
+from sam3.model.io_utils import load_resource_as_video_frames
+
+logger = get_logger(__name__)
+
+
+def recursive_to(data, *args, **kwargs):
+ if isinstance(data, torch.Tensor):
+ ret = data.to(*args, **kwargs)
+ elif isinstance(data, np.ndarray):
+ ret = data
+ elif isinstance(data, Mapping):
+ ret = type(data)()
+ for key in data:
+ ret[key] = recursive_to(data[key], *args, **kwargs)
+ elif isinstance(data, tuple):
+ ret = ()
+ for value in data:
+ ret += (recursive_to(value, *args, **kwargs),)
+ elif isinstance(data, Sequence) and not isinstance(data, str):
+ ret = type(data)()
+ for value in data:
+ ret.append(recursive_to(value, *args, **kwargs))
+ elif is_dataclass(data):
+ ret_cls = type(data)
+ ret_fields = {
+ field.name: recursive_to(getattr(data, field.name), *args, **kwargs)
+ for field in fields(data)
+ }
+ ret = ret_cls(**ret_fields)
+ else:
+ ret = data
+ return ret
+
+
+DUMMY_OUTPUT = "DUMMY_OUTPUT"
+
+
+class Sam3MultiplexTracking(Sam3MultiplexBase):
+ def __init__(
+ self,
+ image_size=1008,
+ image_mean=(0.5, 0.5, 0.5),
+ image_std=(0.5, 0.5, 0.5),
+ compile_model=False,
+ postprocess_batch_size=1,
+ **kwargs,
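+ ):
+ """
+ Args:
+ image_size: int, input image size (default 1008).
+ image_mean, image_std: per-channel mean and std used to normalize the input images.
+ compile_model: bool, whether to compile the model; also propagated to the detector.
+ postprocess_batch_size: int, the number of frames to accumulate before running postprocessing. Set to 1 to disable batching.
+ **kwargs: forwarded to `Sam3MultiplexBase.__init__`, including the hotstart options:
+ hotstart_delay: int, the delay (in #frames) before the model starts to yield output, 0 to disable hotstart delay.
+ hotstart_unmatch_thresh: int, remove the object if it has this many unmatched frames within its hotstart_delay period.
+ If `hotstart_delay` is set to 0, this parameter is ignored.
+ hotstart_dup_thresh: int, remove the object if it has overlapped with another object in this many frames within its hotstart_delay period.
+ """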
+ super().__init__(**kwargs)
+ self.image_size = image_size
+ self.image_mean = image_mean
+ self.image_std = image_std
+ self.compile_model = compile_model
+ self.detector.compile_model = self.compile_model
+ self.postprocess_batch_size = postprocess_batch_size
+
+ TEXT_ID_FOR_TEXT = 0
+ TEXT_ID_FOR_VISUAL = 1
+ TEXT_ID_FOR_GEOMETRIC = 2
+
+ def _construct_initial_input_batch(self, inference_state, images):
+ """Construct an initial `BatchedDatapoint` instance as input."""
+ # 1) img_batch
+ num_frames = len(images)
+ device = inference_state["device"]
+ img_batch = NestedTensor(tensors=images, mask=None)
+
+ # 2) find_text_batch
+ # "" will be replaced by the actual text prompt when adding prompts
+ find_text_batch = ["", "visual", "geometric"]
+
+ # 3) find_inputs
+ input_box_embedding_dim = 258 # historical default
+ input_points_embedding_dim = 257 # historical default
+ dummy_ptrs = BatchedPointer(
+ stage_ids=[], query_ids=[], object_ids=[], ptr_mask=[], ptr_types=[]
+ )
+ stages = [
+ FindStage(
+ img_ids=[stage_id],
+ img_ids_np=np.array([stage_id]),
+ text_ids=[0],
+ input_boxes=[torch.zeros(input_box_embedding_dim)],
+ input_boxes_before_embed=[torch.empty(0, 4)],
+ input_boxes_mask=[torch.empty(0, dtype=torch.bool)],
+ input_boxes_label=[torch.empty(0, dtype=torch.long)],
+ input_points=[torch.empty(0, input_points_embedding_dim)],
+ input_points_before_embed=[torch.empty(0, 3)],
+ input_points_mask=[torch.empty(0)],
+ ptrs=dummy_ptrs,
+ ptrs_seg=dummy_ptrs,
+ object_ids=[],
+ )
+ for stage_id in range(num_frames)
+ ]
+ with torch.profiler.record_function(
+ "Sam3MultiplexTracking._construct_initial_input_batch"
+ ):
+ for i in range(len(stages)):
+ stages[i] = convert_my_tensors(stages[i])
+
+ # construct the final `BatchedDatapoint` and cast to GPU
+ input_batch = BatchedDatapoint(
+ img_batch=img_batch,
+ find_text_batch=find_text_batch,
+ find_inputs=stages,
+ find_targets=[None] * num_frames,
+ get_queries=None,
+ find_metadatas=[None] * num_frames,
+ )
+ with torch.profiler.record_function("Sam3MultiplexTracking.recursive_to"):
+ input_batch = recursive_to(input_batch, device, non_blocking=True)
+ inference_state["input_batch"] = input_batch
+
+ # construct the placeholder interactive prompts and tracking queries
+ bs = 1
+ inference_state["constants"]["empty_geometric_prompt"] = Prompt(
+ box_embeddings=torch.zeros(0, bs, 4, device=device),
+ box_mask=torch.zeros(bs, 0, device=device, dtype=torch.bool),
+ box_labels=torch.zeros(0, bs, device=device, dtype=torch.long),
+ point_embeddings=torch.zeros(0, bs, 2, device=device),
+ point_mask=torch.zeros(bs, 0, device=device, dtype=torch.bool),
+ point_labels=torch.zeros(0, bs, device=device, dtype=torch.long),
+ )
+
+ # construct the per-frame output lists in the inference state (every entry starts as None)
+ inference_state["previous_stages_out"] = [None] * num_frames
+ inference_state["text_prompt"] = None
+ inference_state["per_frame_raw_point_input"] = [None] * num_frames
+ inference_state["per_frame_raw_box_input"] = [None] * num_frames
+ inference_state["per_frame_visual_prompt"] = [None] * num_frames
+ inference_state["per_frame_geometric_prompt"] = [None] * num_frames
+ inference_state["per_frame_cur_step"] = [0] * num_frames
+
+ # placeholders for cached outputs
+ # (note: currently, a single visual prompt embedding is shared for all frames)
+ inference_state["backbone_out"] = None
+ inference_state["visual_prompt_embed"] = None
+ inference_state["visual_prompt_mask"] = None
+
+ def _get_visual_prompt(self, inference_state, frame_idx, boxes_cxcywh, box_labels):
+ batch_size = 1
+ geometric_prompt = Prompt(
+ box_embeddings=torch.zeros(
+ 0, batch_size, 4, device=inference_state["device"]
+ ),
+ box_mask=torch.zeros(
+ batch_size, 0, device=inference_state["device"], dtype=torch.bool
+ ),
+ point_embeddings=None,
+ point_mask=None,
+ )
+
+ geometric_prompt.append_boxes(
+ boxes=boxes_cxcywh.view(-1, batch_size, 4).to(inference_state["device"]),
+ labels=box_labels.view(-1, batch_size).to(inference_state["device"]),
+ )
+
+ return boxes_cxcywh, box_labels, geometric_prompt
+
+ @torch.inference_mode()
+ def init_state(
+ self,
+ resource_path,
+ offload_video_to_cpu=False,
+ async_loading_frames=False,
+ use_torchcodec=False,
+ use_cv2=False,
+ input_is_mp4=False,
+ ):
+ # Initialize inference state (inlined from Sam3DemoMixin.init_state)
+ if use_torchcodec:
+ video_loader_type = "torchcodec"
+ elif use_cv2:
+ video_loader_type = "cv2"
+ else:
+ video_loader_type = "cv2"
+ images, orig_height, orig_width = load_resource_as_video_frames(
+ resource_path=resource_path,
+ image_size=self.image_size,
+ offload_video_to_cpu=offload_video_to_cpu,
+ img_mean=self.image_mean,
+ img_std=self.image_std,
+ async_loading_frames=async_loading_frames,
+ video_loader_type=video_loader_type,
+ )
+ inference_state = {}
+ inference_state["image_size"] = self.image_size
+ inference_state["num_frames"] = len(images)
+ inference_state["device"] = torch.device("cuda")
+ inference_state["orig_height"] = orig_height
+ inference_state["orig_width"] = orig_width
+ inference_state["constants"] = {}
+ self._construct_initial_input_batch(inference_state, images)
+ # initialize extra states
+ # sam2_inference_states will contain separate inference_states for each frame having new objects if
+ # self.tracker.per_obj_inference is False (bucketized batching), or a single inference_state
+ # containing all objects if self.tracker.per_obj_inference is True (no batching at all).
+ inference_state["sam2_inference_states"] = []
+ inference_state["tracker_metadata"] = {}
+ inference_state["feature_cache"] = {}
+ inference_state["cached_frame_outputs"] = {}
+ inference_state["is_image_only"] = is_image_type(resource_path)
+ return inference_state
+
+ def reset_state(self, inference_state):
+ # Inlined from Sam3DemoMixin.reset_state
+ inference_state["input_batch"].find_text_batch[0] = ""
+ inference_state["text_prompt"] = None
+ for t in range(inference_state["num_frames"]):
+ inference_state["input_batch"].find_inputs[t].text_ids[...] = 0
+ inference_state["previous_stages_out"][t] = None
+ inference_state["per_frame_raw_point_input"][t] = None
+ inference_state["per_frame_raw_box_input"][t] = None
+ inference_state["per_frame_visual_prompt"][t] = None
+ inference_state["per_frame_geometric_prompt"][t] = None
+ inference_state["per_frame_cur_step"][t] = 0
+ inference_state["backbone_out"] = None
+ inference_state["visual_prompt_embed"] = None
+ inference_state["visual_prompt_mask"] = None
+ # reset extra states
+ inference_state["sam2_inference_states"].clear()
+ inference_state["tracker_metadata"].clear()
+ inference_state["feature_cache"].clear()
+ inference_state["cached_frame_outputs"] = {}
+
+ def _get_processing_order(
+ self, inference_state, start_frame_idx, max_frame_num_to_track, reverse
+ ):
+ num_frames = inference_state["num_frames"]
+ previous_stages_out = inference_state["previous_stages_out"]
+ if all(out is None for out in previous_stages_out) and start_frame_idx is None:
+ raise RuntimeError(
+ "No prompts are received on any frames. Please add prompt on at least one frame before propagation."
+ )
+ # set start index, end index, and processing order
+ if start_frame_idx is None:
+ # default: start from the earliest frame that already has outputs (i.e., has received prompts)
+ start_frame_idx = min(
+ t for t, out in enumerate(previous_stages_out) if out is not None
+ )
+ if max_frame_num_to_track is None:
+ # default: track all the frames in the video
+ max_frame_num_to_track = num_frames
+ if reverse:
+ end_frame_idx = start_frame_idx - max_frame_num_to_track
+ end_frame_idx = max(end_frame_idx, 0)
+ processing_order = range(start_frame_idx - 1, end_frame_idx - 1, -1)
+ else:
+ end_frame_idx = start_frame_idx + max_frame_num_to_track
+ end_frame_idx = min(end_frame_idx, num_frames - 1)
+ processing_order = range(start_frame_idx, end_frame_idx + 1)
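+ # Illustrative example (values chosen for illustration only): with num_frames=10,
+ # start_frame_idx=3 and max_frame_num_to_track=4, forward propagation gives
+ # end_frame_idx=7 and processing_order=range(3, 8), i.e. frames 3..7; reverse
+ # propagation gives end_frame_idx=0 and processing_order=range(2, -1, -1), i.e.
+ # frames 2, 1, 0 (the start frame itself is not revisited in reverse mode).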
+ return processing_order, end_frame_idx
+
+ @torch.inference_mode()
+ def propagate_in_video(
+ self,
+ inference_state,
+ start_frame_idx=None,
+ max_frame_num_to_track=None,
+ reverse=False,
+ output_prob_thresh=0.5,
+ compute_stability_score=False,
+ is_instance_processing=False,
+ **kwargs, # To support passing extra args to child classes
+ ):
+ """
+ Propagate the prompts to get grounding results for the entire video. This method
+ is a generator and yields inference outputs for all frames in the range specified
+ by `start_frame_idx`, `max_frame_num_to_track`, and `reverse`.
+ """
+ # compile the model (it's a no-op if the model is already compiled)
+ # note that it's intentionally added to `self.propagate_in_video`, so that the first
+ # `self.add_prompt` call will be done in eager mode to fill in the decoder buffers
+ # (such as the positional encoding cache)
+ self._compile_model()
+
+ processing_order, end_frame_idx = self._get_processing_order(
+ inference_state,
+ start_frame_idx,
+ max_frame_num_to_track,
+ reverse=reverse,
+ )
+
+ # Store the caller-supplied tracking bounds (max_frame_num_to_track and start_frame_idx, possibly None) in feature_cache for downstream methods
+ inference_state["feature_cache"]["tracking_bounds"] = {
+ "max_frame_num_to_track": max_frame_num_to_track,
+ "propagate_in_video_start_frame_idx": start_frame_idx,
+ }
+
+ hotstart_buffer = []
+ hotstart_removed_obj_ids = set()
+ # when deciding whether to output a masklet on `yield_frame_idx`, we check whether the object is confirmed
+ # in a future frame (`unconfirmed_status_delay` frames later in propagation order). For example, if we require
+ # an object to be detected in 3 consecutive frames to be confirmed, then we look 2 frames in the future --
+ # e.g., we output an object on frame 4 only if it becomes confirmed on frame 6.
+ unconfirmed_status_delay = self.masklet_confirmation_consecutive_det_thresh - 1
+ unconfirmed_obj_ids_per_frame = {} # frame_idx -> hidden_obj_ids
+
+ # Batch postprocessing: accumulate yield_list entries and process every postprocess_batch_size frames
+ postprocess_yield_list = []
+
+ for frame_idx in tqdm(
+ processing_order, desc="propagate_in_video", disable=self.rank > 0
+ ):
+ out = self._run_single_frame_inference(
+ inference_state,
+ frame_idx,
+ reverse,
+ is_instance_processing=is_instance_processing,
+ )
+
+ if self.hotstart_delay > 0:
+ # accumulate the outputs for the first `hotstart_delay` frames
+ hotstart_buffer.append([frame_idx, out])
+ # update the object IDs removed by hotstart so that we don't output them
+ if self.rank == 0:
+ hotstart_removed_obj_ids.update(out["removed_obj_ids"])
+ unconfirmed_obj_ids = out.get("unconfirmed_obj_ids", None)
+ if unconfirmed_obj_ids is not None:
+ unconfirmed_obj_ids_per_frame[frame_idx] = unconfirmed_obj_ids
+
+ if frame_idx == end_frame_idx:
+ # we reached the end of propagation -- yield all frames in the buffer
+ yield_list = hotstart_buffer
+ hotstart_buffer = []
+ elif len(hotstart_buffer) >= self.hotstart_delay:
+ # we have enough frames -- yield and remove the first (oldest) frame from the buffer
+ yield_list = hotstart_buffer[:1]
+ hotstart_buffer = hotstart_buffer[1:]
+ else:
+ # not enough frames yet -- skip yielding
+ yield_list = []
+ else:
+ yield_list = [(frame_idx, out)] # output the current frame
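+ # Illustrative trace (assuming hotstart_delay=3 and forward propagation): nothing is
+ # yielded after frames 0 and 1; after frame 2 the buffer holds 3 entries and frame 0
+ # is released; after frame 3, frame 1 is released, and so on. Each frame is therefore
+ # emitted hotstart_delay - 1 frames late, and on end_frame_idx the remaining buffer
+ # is flushed all at once.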
+
+ # Accumulate yield_list into postprocess_yield_list
+ # Snapshot hotstart_removed_obj_ids at the time of accumulation to preserve
+ # the correct state for each frame (important: this set is mutated over time)
+ for yield_frame_idx, yield_out in yield_list:
+ postprocess_yield_list.append(
+ (yield_frame_idx, yield_out, set(hotstart_removed_obj_ids))
+ )
+
+ # Process batch when we have enough frames
+ while len(postprocess_yield_list) >= self.postprocess_batch_size:
+ batch_to_process = postprocess_yield_list[: self.postprocess_batch_size]
+ postprocess_yield_list = postprocess_yield_list[
+ self.postprocess_batch_size :
+ ]
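+ # Illustrative example (assuming postprocess_batch_size=4): if 10 frames become ready
+ # for output over the whole run, this loop postprocesses them in two batches of 4, and
+ # the remaining 2 frames are handled by the flush step after the main loop.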
+
+ with torch.profiler.record_function(
+ "Sam3MultiplexTracking.postprocess_output_batched"
+ ):
+ if self.rank == 0:
+ # Prepare batched inputs for postprocessing
+ H_video, W_video = (
+ inference_state["orig_height"],
+ inference_state["orig_width"],
+ )
+ num_frames = inference_state["num_frames"]
+
+ batched_outs = []
+ frame_indices = []
+ for (
+ yield_frame_idx,
+ yield_out,
+ removed_obj_ids_snapshot,
+ ) in batch_to_process:
+ suppressed_obj_ids = yield_out["suppressed_obj_ids"]
+ unconfirmed_status_frame_idx = (
+ yield_frame_idx + unconfirmed_status_delay
+ if not reverse
+ else yield_frame_idx - unconfirmed_status_delay
+ )
+ unconfirmed_status_frame_idx = max(
+ 0, min(unconfirmed_status_frame_idx, num_frames - 1)
+ )
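+ # e.g. (illustrative values) with masklet_confirmation_consecutive_det_thresh=3
+ # (so unconfirmed_status_delay=2) and num_frames=10: for yield_frame_idx=8 in a
+ # forward pass, the status is read from frame min(8 + 2, 9) = 9; for
+ # yield_frame_idx=3 in a reverse pass, it is read from frame 3 - 2 = 1.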
+ unconfirmed_obj_ids = unconfirmed_obj_ids_per_frame.get(
+ unconfirmed_status_frame_idx, None
+ )
+
+ batched_outs.append(
+ (
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ )
+ )
+ frame_indices.append(yield_frame_idx)
+
+ # Cache frame outputs
+ self._cache_frame_outputs(
+ inference_state,
+ yield_frame_idx,
+ yield_out["obj_id_to_mask"],
+ suppressed_obj_ids=suppressed_obj_ids,
+ removed_obj_ids=removed_obj_ids_snapshot,
+ unconfirmed_obj_ids=unconfirmed_obj_ids,
+ )
+
+ if self.postprocess_batch_size > 1:
+ # Process all frames in batch
+ postprocessed_outs = self._postprocess_output_batched(
+ H_video, W_video, batched_outs
+ )
+ else:
+ # Process each frame individually but output together
+ postprocessed_outs = []
+ for (
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ ) in batched_outs:
+ postprocessed_out = self._postprocess_output(
+ inference_state,
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ )
+ postprocessed_outs.append(postprocessed_out)
+
+ # Yield results
+ for yield_frame_idx, postprocessed_out in zip(
+ frame_indices, postprocessed_outs
+ ):
+ yield yield_frame_idx, postprocessed_out
+ else:
+ # No output on other GPUs
+ for yield_frame_idx, _, _ in batch_to_process:
+ yield yield_frame_idx, DUMMY_OUTPUT
+
+ # Flush any remaining frames in the postprocess buffer
+ if len(postprocess_yield_list) > 0:
+ with torch.profiler.record_function(
+ "Sam3MultiplexTracking.postprocess_output_batched"
+ ):
+ if self.rank == 0:
+ H_video, W_video = (
+ inference_state["orig_height"],
+ inference_state["orig_width"],
+ )
+ num_frames = inference_state["num_frames"]
+
+ batched_outs = []
+ frame_indices = []
+ for (
+ yield_frame_idx,
+ yield_out,
+ removed_obj_ids_snapshot,
+ ) in postprocess_yield_list:
+ suppressed_obj_ids = yield_out["suppressed_obj_ids"]
+ unconfirmed_status_frame_idx = (
+ yield_frame_idx + unconfirmed_status_delay
+ if not reverse
+ else yield_frame_idx - unconfirmed_status_delay
+ )
+ unconfirmed_status_frame_idx = max(
+ 0, min(unconfirmed_status_frame_idx, num_frames - 1)
+ )
+ unconfirmed_obj_ids = unconfirmed_obj_ids_per_frame.get(
+ unconfirmed_status_frame_idx, None
+ )
+
+ batched_outs.append(
+ (
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ )
+ )
+ frame_indices.append(yield_frame_idx)
+
+ self._cache_frame_outputs(
+ inference_state,
+ yield_frame_idx,
+ yield_out["obj_id_to_mask"],
+ suppressed_obj_ids=suppressed_obj_ids,
+ removed_obj_ids=removed_obj_ids_snapshot,
+ unconfirmed_obj_ids=unconfirmed_obj_ids,
+ )
+
+ if self.postprocess_batch_size > 1:
+ postprocessed_outs = self._postprocess_output_batched(
+ H_video, W_video, batched_outs
+ )
+ else:
+ # Process each frame individually but output together
+ postprocessed_outs = []
+ for (
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ ) in batched_outs:
+ postprocessed_out = self._postprocess_output(
+ inference_state,
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ )
+ postprocessed_outs.append(postprocessed_out)
+
+ for yield_frame_idx, postprocessed_out in zip(
+ frame_indices, postprocessed_outs
+ ):
+ yield yield_frame_idx, postprocessed_out
+ else:
+ for yield_frame_idx, _, _ in postprocess_yield_list:
+ yield yield_frame_idx, DUMMY_OUTPUT
+
+ if self.is_multiplex:
+ # log the bucket utilization stats
+ # bucket utilization rate is total valid objects / total capacity -> indicates room for improvement
+ # subscription rate is total valid objects / total number of buckets -> a proxy for the speedup
+ total_valid_objects = 0
+ total_num_buckets = 0
+ for state in inference_state["sam2_inference_states"]:
+ assert (
+ len(state["obj_ids"])
+ == state["multiplex_state"].total_valid_entries
+ )
+ total_valid_objects += len(state["obj_ids"])
+ total_num_buckets += state["multiplex_state"].num_buckets
+ if total_num_buckets > 0:
+ bucket_utilization_rate = (
+ total_valid_objects / (total_num_buckets * self.bucket_capacity)
+ ) * 100
+ subscription_rate = (total_valid_objects / total_num_buckets) * 100
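+ # e.g. (illustrative values) 10 valid objects packed into 4 buckets with
+ # bucket_capacity=4 give a utilization rate of 10 / 16 = 62.50% and a
+ # subscription rate of 10 / 4 = 250.00%.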
+ logger.info(
+ f"Bucket utilization rate: {bucket_utilization_rate:.2f}%, subscription rate: {subscription_rate:.2f}%"
+ )
+
+ def _run_single_frame_inference(
+ self,
+ inference_state,
+ frame_idx,
+ reverse,
+ is_instance_processing=False,
+ ):
+ """
+ Perform inference on a single frame and get its inference results. This would
+ also update `inference_state`.
+ """
+ # prepare inputs
+ input_batch = inference_state["input_batch"]
+ tracker_states_local = inference_state["sam2_inference_states"]
+ geometric_prompt = (
+ inference_state["constants"]["empty_geometric_prompt"]
+ if inference_state["per_frame_geometric_prompt"][frame_idx] is None
+ else inference_state["per_frame_geometric_prompt"][frame_idx]
+ )
+ text_batch_key = tuple(input_batch.find_text_batch)
+ inference_state["feature_cache"]["text"] = {
+ text_batch_key: {
+ "language_features": inference_state["backbone_out"][
+ "language_features"
+ ],
+ "language_mask": inference_state["backbone_out"]["language_mask"],
+ }
+ }
+ # run inference for the current frame
+ (
+ obj_id_to_mask,
+ obj_id_to_score,
+ tracker_states_local_new,
+ tracker_metadata_new,
+ frame_stats,
+ _,
+ ) = self._det_track_one_frame(
+ frame_idx=frame_idx,
+ num_frames=inference_state["num_frames"],
+ reverse=reverse,
+ input_batch=input_batch,
+ geometric_prompt=geometric_prompt,
+ tracker_states_local=tracker_states_local,
+ tracker_metadata_prev=inference_state["tracker_metadata"],
+ feature_cache=inference_state["feature_cache"],
+ orig_vid_height=inference_state["orig_height"],
+ orig_vid_width=inference_state["orig_width"],
+ is_image_only=inference_state["is_image_only"],
+ )
+ # update inference state
+ inference_state["sam2_inference_states"] = tracker_states_local_new
+ inference_state["tracker_metadata"] = tracker_metadata_new
+ # use a dummy string in "previous_stages_out" to indicate this frame has outputs
+ inference_state["previous_stages_out"][frame_idx] = "_THIS_FRAME_HAS_OUTPUTS_"
+
+ if self.rank == 0:
+ self._cache_frame_outputs(inference_state, frame_idx, obj_id_to_mask)
+
+ out = {
+ "obj_id_to_mask": obj_id_to_mask,
+ "obj_id_to_score": obj_id_to_score, # first frame detection score
+ "obj_id_to_sam2_score": tracker_metadata_new[
+ "obj_id_to_sam2_score_frame_wise"
+ ][frame_idx],
+ }
+ # removed_obj_ids is only needed on rank 0 to handle hotstart delay buffer
+ if self.rank == 0:
+ rank0_metadata = tracker_metadata_new["rank0_metadata"]
+ removed_obj_ids = rank0_metadata["removed_obj_ids"]
+ out["removed_obj_ids"] = removed_obj_ids
+ out["suppressed_obj_ids"] = rank0_metadata["suppressed_obj_ids"][frame_idx]
+ out["frame_stats"] = frame_stats
+ if self.masklet_confirmation_enable:
+ status = rank0_metadata["masklet_confirmation"]["status"]
+ is_unconfirmed = status == MaskletConfirmationStatus.UNCONFIRMED.value
+ out["unconfirmed_obj_ids"] = tracker_metadata_new["obj_ids_all_gpu"][
+ is_unconfirmed
+ ].tolist()
+ else:
+ out["unconfirmed_obj_ids"] = []
+
+ return out
+
+ def _postprocess_output(
+ self,
+ inference_state,
+ out,
+ removed_obj_ids=None,
+ suppressed_obj_ids=None,
+ unconfirmed_obj_ids=None,
+ ):
+ obj_id_to_mask = out["obj_id_to_mask"] # low res masks
+ curr_obj_ids = sorted(obj_id_to_mask.keys())
+ H_video, W_video = inference_state["orig_height"], inference_state["orig_width"]
+ if len(curr_obj_ids) == 0:
+ out_obj_ids = torch.zeros(0, dtype=torch.int64)
+ out_probs = torch.zeros(0, dtype=torch.float32)
+ out_sam2_probs = torch.zeros(0, dtype=torch.float32) # needed by index_select below
+ out_binary_masks = torch.zeros(0, H_video, W_video, dtype=torch.bool)
+ out_boxes_xywh = torch.zeros(0, 4, dtype=torch.float32)
+ else:
+ out_obj_ids = torch.tensor(curr_obj_ids, dtype=torch.int64)
+ out_probs = torch.tensor(
+ [out["obj_id_to_score"][obj_id] for obj_id in curr_obj_ids]
+ )
+ out_sam2_probs = torch.tensor(
+ [
+ (
+ out["obj_id_to_sam2_score"][obj_id]
+ if obj_id in out["obj_id_to_sam2_score"]
+ else 0.0
+ )
+ for obj_id in curr_obj_ids
+ ]
+ )
+ out_binary_masks = torch.cat(
+ [obj_id_to_mask[obj_id] for obj_id in curr_obj_ids], dim=0
+ )
+
+ assert out_binary_masks.dtype == torch.bool
+ keep = out_binary_masks.any(dim=(1, 2)).cpu() # drop masks with zero area
+ # hide outputs for those object IDs in `obj_ids_to_hide`
+ obj_ids_to_hide = []
+ if suppressed_obj_ids is not None:
+ obj_ids_to_hide.extend(suppressed_obj_ids)
+ if removed_obj_ids is not None:
+ obj_ids_to_hide.extend(removed_obj_ids)
+ if unconfirmed_obj_ids is not None:
+ obj_ids_to_hide.extend(unconfirmed_obj_ids)
+ if len(obj_ids_to_hide) > 0:
+ obj_ids_to_hide_t = torch.tensor(obj_ids_to_hide, dtype=torch.int64)
+ keep &= ~torch.isin(out_obj_ids, obj_ids_to_hide_t)
+
+ # slice those valid entries from the original outputs
+ keep_idx = torch.nonzero(keep, as_tuple=True)[0]
+ keep_idx_gpu = keep_idx.pin_memory().to(
+ device=out_binary_masks.device, non_blocking=True
+ )
+
+ out_obj_ids = torch.index_select(out_obj_ids, 0, keep_idx)
+ out_probs = torch.index_select(out_probs, 0, keep_idx)
+ out_sam2_probs = torch.index_select(out_sam2_probs, 0, keep_idx)
+ out_binary_masks = torch.index_select(out_binary_masks, 0, keep_idx_gpu)
+
+ if perflib.is_enabled:
+ out_boxes_xyxy = perf_masks_to_boxes(
+ out_binary_masks, out_obj_ids.tolist()
+ )
+ else:
+ out_boxes_xyxy = masks_to_boxes(out_binary_masks)
+
+ out_boxes_xywh = box_xyxy_to_xywh(out_boxes_xyxy) # convert to xywh format
+ # normalize boxes
+ out_boxes_xywh[..., 0] /= W_video
+ out_boxes_xywh[..., 1] /= H_video
+ out_boxes_xywh[..., 2] /= W_video
+ out_boxes_xywh[..., 3] /= H_video
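+ # e.g. (illustrative values, assuming box_xyxy_to_xywh returns (x0, y0, x1 - x0, y1 - y0))
+ # on a 640x480 video, an xyxy box (64, 48, 192, 144) becomes xywh (64, 48, 128, 96)
+ # and normalizes to (0.1, 0.1, 0.2, 0.2).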
+
+ # apply non-overlapping constraints on the existing masklets
+ if out_binary_masks.shape[0] > 1:
+ assert len(out_binary_masks) == len(out_sam2_probs)
+ out_binary_masks = (
+ self.tracker._apply_object_wise_non_overlapping_constraints(
+ out_binary_masks.unsqueeze(1),
+ out_sam2_probs.unsqueeze(1).to(out_binary_masks.device),
+ background_value=0,
+ ).squeeze(1)
+ ) > 0
+
+ prod_outputs = {}
+ if self.running_in_prod:
+ with torch.profiler.record_function(
+ "Sam3MultiplexTracking._postprocess_output.prod_outputs"
+ ):
+ out_centers = torch.zeros(
+ out_binary_masks.shape[0],
+ 2,
+ dtype=torch.float32,
+ device=out_binary_masks.device,
+ )
+
+ y_coords = torch.arange(
+ H_video, device=out_binary_masks.device, dtype=torch.float32
+ )
+ x_coords = torch.arange(
+ W_video, device=out_binary_masks.device, dtype=torch.float32
+ )
+ y_grid = y_coords.view(1, H_video, 1)
+ x_grid = x_coords.view(1, 1, W_video)
+ with torch.profiler.record_function(
+ "Sam3MultiplexTracking._postprocess_output.prod_outputs.center"
+ ):
+ weighted_y_sum = (out_binary_masks * y_grid).sum(dim=(1, 2))
+ weighted_x_sum = (out_binary_masks * x_grid).sum(dim=(1, 2))
+ total_mass = out_binary_masks.sum(dim=(1, 2)).clamp_min(1e-6)
+ center_y = weighted_y_sum / total_mass / H_video
+ center_x = weighted_x_sum / total_mass / W_video
+ out_centers[:, 0] = center_x
+ out_centers[:, 1] = center_y
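+ # Each center is the mask centroid in normalized (x, y) coordinates, e.g. (illustrative
+ # values) a mask covering columns 2..3 of every row in a 4-pixel-wide frame has a mean
+ # column index of 2.5, so center_x = 2.5 / 4 = 0.625.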
+
+ with torch.profiler.record_function(
+ "Sam3MultiplexTracking._postprocess_output.prod_outputs.to_cpu"
+ ):
+ prod_outputs["out_centers"] = out_centers.cpu().numpy()
+
+ outputs = {
+ "out_obj_ids": out_obj_ids.cpu().numpy(),
+ "out_probs": out_probs.cpu().numpy(),
+ "out_boxes_xywh": out_boxes_xywh.cpu().numpy(),
+ "out_binary_masks": out_binary_masks.cpu().numpy(),
+ "frame_stats": out.get("frame_stats", None),
+ } | prod_outputs
+
+ return outputs
+
+ def _postprocess_output_batched(
+ self,
+ H_video,
+ W_video,
+ batched_outs,
+ ):
+ """
+ Batched version of _postprocess_output that batches GPU computations
+ (keep filtering, box computation) across frames for efficiency.
+
+ Args:
+ H_video: Video height
+ W_video: Video width
+ batched_outs: List of tuples, each containing:
+ (out, removed_obj_ids, suppressed_obj_ids, unconfirmed_obj_ids)
+ where out is the output dict from _run_single_frame_inference
+
+ Returns:
+ List of output dicts, one per frame in batched_outs
+ """
+ batch_size = len(batched_outs)
+ if batch_size == 0:
+ return []
+
+ # ========== Phase 1: Collect per-frame data ==========
+ # We'll track: frame_data[i] = (obj_ids, probs, sam2_probs, masks, hide_mask, frame_stats),
+ # or (None, None, None, None, None, frame_stats) if the frame has no objects
+ frame_data = []
+ device = None
+
+ for (
+ out,
+ removed_obj_ids,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ ) in batched_outs:
+ obj_id_to_mask = out["obj_id_to_mask"]
+ curr_obj_ids = sorted(obj_id_to_mask.keys())
+ frame_stats = out.get("frame_stats", None)
+
+ if len(curr_obj_ids) == 0:
+ frame_data.append((None, None, None, None, None, frame_stats))
+ continue
+
+ out_obj_ids = torch.tensor(curr_obj_ids, dtype=torch.int64)
+ obj_id_to_score_dict = out["obj_id_to_score"]
+ obj_id_to_sam2_score = out["obj_id_to_sam2_score"]
+
+ if device is None:
+ device = obj_id_to_mask[curr_obj_ids[0]].device
+ default_sam2_score = torch.zeros((), dtype=torch.float32, device=device)
+
+ probs_list = []
+ sam2_probs_list = []
+ binary_masks_list = []
+
+ for obj_id in curr_obj_ids:
+ probs_list.append(obj_id_to_score_dict[obj_id])
+ sam2_probs_list.append(
+ obj_id_to_sam2_score.get(obj_id, default_sam2_score)
+ )
+ binary_masks_list.append(obj_id_to_mask[obj_id])
+
+ out_probs = torch.tensor(probs_list, dtype=torch.float32)
+ out_sam2_probs_gpu = torch.stack(sam2_probs_list)
+ out_binary_masks = torch.cat(binary_masks_list, dim=0)
+
+ # Compute keep mask (which objects to hide)
+ obj_ids_to_hide = []
+ if suppressed_obj_ids is not None:
+ obj_ids_to_hide.extend(suppressed_obj_ids)
+ if removed_obj_ids is not None:
+ obj_ids_to_hide.extend(removed_obj_ids)
+ if unconfirmed_obj_ids is not None:
+ obj_ids_to_hide.extend(unconfirmed_obj_ids)
+
+ if len(obj_ids_to_hide) > 0:
+ obj_ids_to_hide_t = torch.tensor(obj_ids_to_hide, dtype=torch.int64)
+ hide_mask = torch.isin(out_obj_ids, obj_ids_to_hide_t)
+ else:
+ hide_mask = torch.zeros(len(out_obj_ids), dtype=torch.bool)
+
+ frame_data.append(
+ (
+ out_obj_ids,
+ out_probs,
+ out_sam2_probs_gpu,
+ out_binary_masks,
+ hide_mask,
+ frame_stats,
+ )
+ )
+
+ # ========== Phase 2: Batch concatenate masks for GPU operations ==========
+ # Collect frames with objects
+ frames_with_objects = []
+ frame_obj_counts = [] # Number of objects per frame (for frames with objects only)
+ all_masks_list = []
+ all_hide_masks_list = []
+
+ for i, data in enumerate(frame_data):
+ if data[0] is not None:
+ frames_with_objects.append(i)
+ frame_obj_counts.append(data[0].shape[0])
+ all_masks_list.append(data[3]) # binary_masks
+ all_hide_masks_list.append(data[4]) # hide_mask
+
+ # Handle case where all frames have 0 objects
+ if len(frames_with_objects) == 0:
+ outputs = []
+ for data in frame_data:
+ output_dict = {
+ "out_obj_ids": np.zeros(0, dtype=np.int64),
+ "out_probs": np.zeros(0, dtype=np.float32),
+ "out_boxes_xywh": np.zeros((0, 4), dtype=np.float32),
+ "out_binary_masks": np.zeros((0, H_video, W_video), dtype=bool),
+ "frame_stats": data[5],
+ }
+ if self.running_in_prod:
+ output_dict["out_centers"] = np.zeros((0, 2), dtype=np.float32)
+ outputs.append(output_dict)
+ return outputs
+
+ # Concatenate all masks for batched GPU operations
+ all_masks = torch.cat(all_masks_list, dim=0)
+ all_hide_masks = torch.cat(all_hide_masks_list, dim=0)
+
+ # ========== Phase 3: Batched keep mask computation on GPU ==========
+ # Compute which masks have non-zero area (batched on GPU)
+ has_area = all_masks.any(dim=(1, 2)) # GPU operation
+
+ # Combine with hide mask (move hide_mask to GPU for the operation)
+ all_hide_masks_gpu = all_hide_masks.to(device=all_masks.device)
+ keep_mask_gpu = has_area & ~all_hide_masks_gpu
+
+ # Get keep indices
+ keep_indices = torch.nonzero(keep_mask_gpu, as_tuple=True)[0]
+
+ if len(keep_indices) == 0:
+ # All objects filtered out
+ outputs = []
+ for data in frame_data:
+ output_dict = {
+ "out_obj_ids": np.zeros(0, dtype=np.int64),
+ "out_probs": np.zeros(0, dtype=np.float32),
+ "out_boxes_xywh": np.zeros((0, 4), dtype=np.float32),
+ "out_binary_masks": np.zeros((0, H_video, W_video), dtype=bool),
+ "frame_stats": data[5],
+ }
+ if self.running_in_prod:
+ output_dict["out_centers"] = np.zeros((0, 2), dtype=np.float32)
+ outputs.append(output_dict)
+ return outputs
+
+ # ========== Phase 4: Batched filtering and box computation ==========
+ # Filter masks on GPU
+ kept_masks = torch.index_select(all_masks, 0, keep_indices)
+
+ # Compute bounding boxes in batch on GPU
+ if perflib.is_enabled:
+ # Need to gather obj_ids for perflib
+ all_obj_ids_list = [frame_data[i][0] for i in frames_with_objects]
+ all_obj_ids_cat = torch.cat(all_obj_ids_list, dim=0)
+ kept_obj_ids_for_perf = torch.index_select(
+ all_obj_ids_cat, 0, keep_indices.cpu()
+ )
+ kept_boxes_xyxy = perf_masks_to_boxes(
+ kept_masks, kept_obj_ids_for_perf.tolist()
+ )
+ else:
+ kept_boxes_xyxy = masks_to_boxes(kept_masks)
+
+ kept_boxes_xywh = box_xyxy_to_xywh(kept_boxes_xyxy)
+ kept_boxes_xywh[..., 0] /= W_video
+ kept_boxes_xywh[..., 1] /= H_video
+ kept_boxes_xywh[..., 2] /= W_video
+ kept_boxes_xywh[..., 3] /= H_video
+
+ # ========== Phase 5: Split back to per-frame for non-overlapping ==========
+ # Compute how many objects were kept per frame
+ keep_indices_cpu = keep_indices.cpu()
+ keep_set = set(keep_indices_cpu.tolist())
+
+ kept_counts = []
+ offset = 0
+ for count in frame_obj_counts:
+ kept_in_frame = sum(
+ 1 for j in range(offset, offset + count) if j in keep_set
+ )
+ kept_counts.append(kept_in_frame)
+ offset += count
+
+ # Split the kept tensors back to per-frame
+ split_masks = torch.split(kept_masks, kept_counts)
+ split_boxes = torch.split(kept_boxes_xywh, kept_counts)
+
+ # Also need to split obj_ids, probs, sam2_probs (filtering from original frame_data)
+ # We need to track which original indices were kept per frame
+ frame_kept_indices = [] # List of (local_kept_indices) per frame
+ offset = 0
+ for count in frame_obj_counts:
+ local_kept = []
+ for j in range(offset, offset + count):
+ if j in keep_set:
+ local_kept.append(j - offset) # Local index within frame
+ frame_kept_indices.append(local_kept)
+ offset += count
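+ # e.g. (illustrative values) with frame_obj_counts=[2, 3] and keep_set={1, 2, 4},
+ # kept_counts=[1, 2] and frame_kept_indices=[[1], [0, 2]].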
+
+ # ========== Phase 6: Apply non-overlapping per frame, collect final results ==========
+ final_results = [] # List of (batch index, obj_ids, probs, boxes, masks) for frames with kept objects
+
+ for idx, frame_i in enumerate(frames_with_objects):
+ data = frame_data[frame_i]
+ local_kept = frame_kept_indices[idx]
+
+ if len(local_kept) == 0:
+ continue
+
+ # Get the filtered data for this frame
+ local_kept_t = torch.tensor(local_kept, dtype=torch.int64)
+ out_obj_ids = torch.index_select(data[0], 0, local_kept_t)
+ out_probs = torch.index_select(data[1], 0, local_kept_t)
+ out_sam2_probs = torch.index_select(
+ data[2], 0, local_kept_t.to(data[2].device)
+ )
+ out_masks = split_masks[idx]
+ out_boxes = split_boxes[idx]
+
+ # Apply non-overlapping constraints (per-frame operation)
+ if out_masks.shape[0] > 1:
+ # Copy sam2_probs to CPU pinned memory then back to GPU for the operation
+ out_sam2_probs_cpu = torch.empty(
+ out_sam2_probs.shape, dtype=out_sam2_probs.dtype, pin_memory=True
+ )
+ out_sam2_probs_cpu.copy_(out_sam2_probs, non_blocking=True)
+ out_masks = (
+ self.tracker._apply_object_wise_non_overlapping_constraints(
+ out_masks.unsqueeze(1),
+ out_sam2_probs_cpu.unsqueeze(1).to(out_masks.device),
+ background_value=0,
+ ).squeeze(1)
+ ) > 0
+
+ final_results.append(
+ (frame_i, out_obj_ids, out_probs, out_boxes, out_masks)
+ )
+
+ # ========== Phase 6.5: Compute centers for prod ==========
+ all_centers = None
+ if self.running_in_prod and len(final_results) > 0:
+ with torch.profiler.record_function(
+ "Sam3MultiplexTracking._postprocess_output_batched.prod_outputs"
+ ):
+ # Concatenate all masks for batched center computation
+ all_masks = torch.cat([r[4] for r in final_results], dim=0)
+ if all_masks.shape[0] > 0:
+ y_coords = torch.arange(
+ H_video, device=all_masks.device, dtype=torch.float32
+ )
+ x_coords = torch.arange(
+ W_video, device=all_masks.device, dtype=torch.float32
+ )
+ y_grid = y_coords.view(1, H_video, 1)
+ x_grid = x_coords.view(1, 1, W_video)
+
+ weighted_y_sum = (all_masks * y_grid).sum(dim=(1, 2))
+ weighted_x_sum = (all_masks * x_grid).sum(dim=(1, 2))
+ total_mass = all_masks.sum(dim=(1, 2)).clamp_min(1e-6)
+ center_y = weighted_y_sum / total_mass / H_video
+ center_x = weighted_x_sum / total_mass / W_video
+ all_centers = torch.stack([center_x, center_y], dim=1)
+
+ # Handle case where all filtered out
+ if len(final_results) == 0:
+ outputs = []
+ for data in frame_data:
+ output_dict = {
+ "out_obj_ids": np.zeros(0, dtype=np.int64),
+ "out_probs": np.zeros(0, dtype=np.float32),
+ "out_boxes_xywh": np.zeros((0, 4), dtype=np.float32),
+ "out_binary_masks": np.zeros((0, H_video, W_video), dtype=bool),
+ "frame_stats": data[5],
+ }
+ if self.running_in_prod:
+ output_dict["out_centers"] = np.zeros((0, 2), dtype=np.float32)
+ outputs.append(output_dict)
+ return outputs
+
+ # ========== Phase 7: Concatenate for batched GPU→CPU copy ==========
+ final_obj_ids = torch.cat([r[1] for r in final_results], dim=0)
+ final_probs = torch.cat([r[2] for r in final_results], dim=0)
+ final_boxes = torch.cat([r[3] for r in final_results], dim=0)
+ final_masks = torch.cat([r[4] for r in final_results], dim=0)
+
+ total_objects = final_obj_ids.shape[0]
+
+ # Initialize or resize batched CPU buffer
+ batched_buffer_size = self.postprocess_batch_size * self.max_num_objects
+ needs_buffer_init = not hasattr(self, "buffer_cpu_batched")
+ needs_buffer_resize = not needs_buffer_init and (
+ self.buffer_cpu_batched["out_binary_masks"].shape[0] != batched_buffer_size
+ or self.buffer_cpu_batched["out_binary_masks"].shape[1] != H_video
+ or self.buffer_cpu_batched["out_binary_masks"].shape[2] != W_video
+ )
+
+ if needs_buffer_init or needs_buffer_resize:
+ self.buffer_cpu_batched = {
+ "out_obj_ids": torch.zeros(
+ batched_buffer_size,
+ dtype=torch.int64,
+ device="cpu",
+ pin_memory=True,
+ ),
+ "out_probs": torch.zeros(
+ batched_buffer_size,
+ dtype=torch.float32,
+ device="cpu",
+ pin_memory=True,
+ ),
+ "out_boxes_xywh": torch.zeros(
+ batched_buffer_size,
+ 4,
+ dtype=torch.float32,
+ device="cpu",
+ pin_memory=True,
+ ),
+ "out_binary_masks": torch.zeros(
+ batched_buffer_size,
+ H_video,
+ W_video,
+ dtype=bool,
+ device="cpu",
+ pin_memory=True,
+ ),
+ }
+ if self.running_in_prod:
+ self.buffer_cpu_batched["out_centers"] = torch.zeros(
+ batched_buffer_size,
+ 2,
+ dtype=torch.float32,
+ device="cpu",
+ pin_memory=True,
+ )
+
+ self.buffer_cpu_batched["out_obj_ids"][:total_objects].copy_(final_obj_ids)
+ self.buffer_cpu_batched["out_probs"][:total_objects].copy_(final_probs)
+ self.buffer_cpu_batched["out_boxes_xywh"][:total_objects].copy_(final_boxes)
+ self.buffer_cpu_batched["out_binary_masks"][:total_objects].copy_(final_masks)
+
+ if all_centers is not None:
+ self.buffer_cpu_batched["out_centers"][:total_objects].copy_(all_centers)
+
+ # ========== Phase 8: Build output list ==========
+ # Create mapping from frame index to (offset, count) in the buffer
+ frame_to_offset_count = {}
+ offset = 0
+ for frame_i, obj_ids, _, _, _ in final_results:
+ count = obj_ids.shape[0]
+ frame_to_offset_count[frame_i] = (offset, count)
+ offset += count
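+ # Illustrative example (hypothetical counts): if final_results holds frame 0 with
+ # 3 objects and frame 2 with 1 object, this builds {0: (0, 3), 2: (3, 1)}, and
+ # frame 1 falls through to the empty-output branch below.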
+
+ outputs = []
+ for i, data in enumerate(frame_data):
+ frame_stats = data[5]
+ if i not in frame_to_offset_count:
+ # Frame has no objects (either originally or after filtering)
+ output_dict = {
+ "out_obj_ids": np.zeros(0, dtype=np.int64),
+ "out_probs": np.zeros(0, dtype=np.float32),
+ "out_boxes_xywh": np.zeros((0, 4), dtype=np.float32),
+ "out_binary_masks": np.zeros((0, H_video, W_video), dtype=bool),
+ "frame_stats": frame_stats,
+ }
+ if all_centers is not None:
+ output_dict["out_centers"] = np.zeros((0, 2), dtype=np.float32)
+ outputs.append(output_dict)
+ else:
+ buf_offset, num_objects = frame_to_offset_count[i]
+ output_dict = {
+ "out_obj_ids": self.buffer_cpu_batched["out_obj_ids"][
+ buf_offset : buf_offset + num_objects
+ ]
+ .numpy()
+ .copy(),
+ "out_probs": self.buffer_cpu_batched["out_probs"][
+ buf_offset : buf_offset + num_objects
+ ]
+ .numpy()
+ .copy(),
+ "out_boxes_xywh": self.buffer_cpu_batched["out_boxes_xywh"][
+ buf_offset : buf_offset + num_objects
+ ]
+ .numpy()
+ .copy(),
+ "out_binary_masks": self.buffer_cpu_batched["out_binary_masks"][
+ buf_offset : buf_offset + num_objects
+ ]
+ .numpy()
+ .copy(),
+ "frame_stats": frame_stats,
+ }
+ if all_centers is not None:
+ output_dict["out_centers"] = (
+ self.buffer_cpu_batched["out_centers"][
+ buf_offset : buf_offset + num_objects
+ ]
+ .numpy()
+ .copy()
+ )
+ outputs.append(output_dict)
+
+ return outputs
+
+ def _cache_frame_outputs(
+ self,
+ inference_state,
+ frame_idx,
+ obj_id_to_mask,
+ suppressed_obj_ids=None,
+ removed_obj_ids=None,
+ unconfirmed_obj_ids=None,
+ ):
+ if "cached_frame_outputs" not in inference_state:
+ inference_state["cached_frame_outputs"] = {}
+
+ # Filter out suppressed, removed, and unconfirmed objects from the cache
+ filtered_obj_id_to_mask = obj_id_to_mask.copy()
+
+ objects_to_exclude = set()
+ if suppressed_obj_ids is not None:
+ objects_to_exclude.update(suppressed_obj_ids)
+ if removed_obj_ids is not None:
+ objects_to_exclude.update(removed_obj_ids)
+ if unconfirmed_obj_ids is not None:
+ objects_to_exclude.update(unconfirmed_obj_ids)
+
+ if objects_to_exclude:
+ for obj_id in objects_to_exclude:
+ if obj_id in filtered_obj_id_to_mask:
+ del filtered_obj_id_to_mask[obj_id]
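+ # Illustrative example: obj_id_to_mask {1: m1, 2: m2, 3: m3} with
+ # suppressed_obj_ids={2} and removed_obj_ids={3} caches only {1: m1}.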
+
+ inference_state["cached_frame_outputs"][frame_idx] = filtered_obj_id_to_mask
+
+ def _build_sam2_output(
+ self, inference_state, frame_idx, refined_obj_id_to_mask=None
+ ):
+ if frame_idx not in inference_state.get("cached_frame_outputs", {}):
+ return {}
+
+ cached_outputs = inference_state["cached_frame_outputs"][frame_idx]
+ obj_id_to_mask = cached_outputs.copy()
+
+ # Update with refined masks if provided
+ if refined_obj_id_to_mask is not None:
+ for obj_id, refined_mask in refined_obj_id_to_mask.items():
+ assert refined_mask is not None, (
+ f"Refined mask data must be provided for obj_id {obj_id}"
+ )
+ obj_id_to_mask[obj_id] = refined_mask
+
+ return obj_id_to_mask
+
+ def _compile_model(self):
+ """Compile the SAM model with torch.compile for speedup."""
+ # TODO: compile SAM2 model components
+ is_compiled = getattr(self, "_model_is_compiled", False)
+ if is_compiled or not self.compile_model:
+ return
+
+ import torch._dynamo
+
+ # a larger cache size to hold varying number of shapes for torch.compile
+ # see https://github.com/pytorch/pytorch/blob/v2.5.1/torch/_dynamo/config.py#L42-L49
+ torch._dynamo.config.cache_size_limit = 128
+ torch._dynamo.config.accumulated_cache_size_limit = 2048
+ torch._dynamo.config.capture_scalar_outputs = True
+ torch._dynamo.config.suppress_errors = True
+
+ # Compile module components following https://www.internalfb.com/diff/D70935785
+ # Skip compilation of `_encode_prompt` since it sometimes triggers SymInt errors
+ # self._encode_prompt = clone_output_wrapper(
+ # torch.compile(self._encode_prompt, fullgraph=True, mode="max-autotune")
+ # )
+
+ ## Compile SAM3 model components (matching OV: clone_output_wrapper(torch.compile(fn)))
+ self.detector.backbone.language_backbone.encoder.forward = clone_output_wrapper(
+ torch.compile(
+ self.detector.backbone.language_backbone.encoder.forward,
+ fullgraph=True,
+ mode="max-autotune",
+ )
+ )
+
+ self.detector.backbone.vision_backbone.forward = clone_output_wrapper(
+ torch.compile(
+ self.detector.backbone.vision_backbone.forward,
+ fullgraph=True,
+ mode="max-autotune",
+ )
+ )
+ self.detector.transformer.encoder.forward = clone_output_wrapper(
+ torch.compile(
+ self.detector.transformer.encoder.forward,
+ fullgraph=True,
+ mode="max-autotune",
+ )
+ )
+ self.detector.transformer.decoder.forward = clone_output_wrapper(
+ torch.compile(
+ self.detector.transformer.decoder.forward,
+ fullgraph=True,
+ mode="max-autotune",
+ dynamic=False, # note: FA decoder uses static shapes
+ )
+ )
+
+ self.detector.segmentation_head.forward = clone_output_wrapper(
+ torch.compile(
+ self.detector.segmentation_head.forward,
+ fullgraph=True,
+ mode="max-autotune",
+ )
+ )
+
+ ## Compile SAM2 model components
+ self.tracker.maskmem_backbone.forward = compile_wrapper(
+ self.tracker.maskmem_backbone.forward,
+ mode="max-autotune",
+ fullgraph=True,
+ dynamic=False,
+ )
+
+ self.tracker.transformer.encoder.forward = shape_logging_wrapper(
+ compile_wrapper(
+ self.tracker.transformer.encoder.forward,
+ mode="max-autotune-no-cudagraphs",
+ fullgraph=True,
+ dynamic=True,
+ ),
+ keep_kwargs=["src", "src_pos", "prompt", "prompt_pos"],
+ )
+
+ self.tracker.sam_mask_decoder.forward = compile_wrapper(
+ self.tracker.sam_mask_decoder.forward,
+ mode="max-autotune",
+ fullgraph=True,
+ dynamic=False, # Accuracy regression on True
+ )
+
+ sam3.model.sam3_video_base._associate_det_trk_compilable = compile_wrapper(
+ sam3.model.sam3_video_base._associate_det_trk_compilable,
+ mode="max-autotune-no-cudagraphs",
+ fullgraph=True,
+ dynamic=False,
+ )
+
+ self.tracker._suppress_object_pw_area_shrinkage = compile_wrapper(
+ self.tracker._suppress_object_pw_area_shrinkage,
+ mode="max-autotune-no-cudagraphs",
+ fullgraph=True,
+ dynamic=False,
+ )
+
+ self._model_is_compiled = True
+
+ def _warm_up_vg_propagation(self, inference_state, start_frame_idx=0):
+ # Simulate different numbers of output objects (0 to num_obj_for_compile) by
+ # injecting fake objects, repeated over several rounds
+ num_objects_list = range(self.num_obj_for_compile + 1)
+ num_rounds = 3
+ orig_new_det_thresh = self.new_det_thresh
+ for i in range(num_rounds):
+ for num_objects in num_objects_list:
+ logger.info(
+ f"round {i + 1}/{num_rounds} warming up model compilation -- simulating {num_objects}/{self.num_obj_for_compile} objects"
+ )
+ # Initialize text prompt and cache image features
+ self.add_prompt(
+ inference_state, frame_idx=start_frame_idx, text_str="cat"
+ )
+ if num_objects > 0:
+ inference_state = self.add_fake_objects_to_inference_state(
+ inference_state, num_objects, frame_idx=start_frame_idx
+ )
+ inference_state["tracker_metadata"]["rank0_metadata"].update(
+ {
+ "masklet_confirmation": {
+ "status": np.zeros(num_objects, dtype=np.int64),
+ "consecutive_det_num": np.zeros(
+ num_objects, dtype=np.int64
+ ),
+ }
+ }
+ )
+ for _ in self.propagate_in_video(
+ inference_state, start_frame_idx, reverse=False
+ ):
+ pass
+ for _ in self.propagate_in_video(
+ inference_state, start_frame_idx, reverse=True
+ ):
+ pass
+ self.reset_state(inference_state)
+ logger.info(
+ f"{i + 1}/{num_rounds} warming up model compilation -- completed round {i + 1} out of {num_rounds}"
+ )
+
+ # Warm up the SAM2 tracker's transformer encoder (compiled with dynamic=True)
+ # with varying input shapes
+ num_iters = 3
+ feat_size = self.tracker.sam_image_embedding_size**2 # 72 * 72 = 5184
+ hidden_dim = self.tracker.hidden_dim # 256
+ mem_dim = self.tracker.mem_dim # 64 for non-multiplex, 256 for multiplex
+ is_multiplex = self.tracker.is_multiplex
+
+ for _ in tqdm(range(num_iters)):
+ for b in range(1, self.num_obj_for_compile + 1):
+ for i in range(
+ 1,
+ self.tracker.max_cond_frames_in_attn + self.tracker.num_maskmem,
+ ):
+ for j in range(
+ self.tracker.max_cond_frames_in_attn
+ + self.tracker.max_obj_ptrs_in_encoder
+ ):
+ if is_multiplex:
+ # Multiplex encoder: mem_dim == hidden_dim, uses decoupled cross-attention
+ # num_obj_ptr_tokens = j (since hidden_dim // mem_dim = 1)
+ num_obj_ptr_tokens = j
+ memory_seq_len = feat_size * i + num_obj_ptr_tokens
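+ # Illustrative example: if feat_size is 5184 (a 72x72 feature map, as noted
+ # above), then i = 2 memory frames and j = 3 object pointers give
+ # memory_seq_len = 2 * 5184 + 3 = 10371.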
+
+ # src and memory have batch=num_buckets (b)
+ src = torch.randn(
+ feat_size, b, hidden_dim, device=self.device
+ )
+ src_pos = torch.randn(
+ feat_size, b, hidden_dim, device=self.device
+ )
+ memory = torch.randn(
+ memory_seq_len, b, hidden_dim, device=self.device
+ )
+ memory_pos = torch.randn(
+ memory_seq_len, b, hidden_dim, device=self.device
+ )
+
+ # image and memory_image always have batch=1 (shared image features)
+ image = torch.randn(
+ feat_size, 1, hidden_dim, device=self.device
+ )
+ image_pos = torch.randn(
+ feat_size, 1, hidden_dim, device=self.device
+ )
+ memory_image = torch.randn(
+ feat_size * i, 1, hidden_dim, device=self.device
+ )
+ memory_image_pos = torch.randn(
+ feat_size * i, 1, hidden_dim, device=self.device
+ )
+
+ self.tracker.transformer.encoder.forward(
+ image=image,
+ src=src,
+ memory_image=memory_image,
+ memory=memory,
+ image_pos=image_pos,
+ src_pos=src_pos,
+ memory_image_pos=memory_image_pos,
+ memory_pos=memory_pos,
+ num_obj_ptr_tokens=num_obj_ptr_tokens,
+ )
+ else:
+ # Non-multiplex encoder: mem_dim = 64, uses standard cross-attention
+ # num_obj_ptr_tokens = (hidden_dim // mem_dim) * j = 4 * j
+ num_obj_ptr_tokens = (hidden_dim // mem_dim) * j
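+ # Illustrative example: with hidden_dim 256 and mem_dim 64, each object
+ # pointer contributes 256 // 64 = 4 tokens, so j = 3 pointers give
+ # num_obj_ptr_tokens = 12.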
+ src = torch.randn(
+ feat_size, b, hidden_dim, device=self.device
+ )
+ src_pos = torch.randn(
+ feat_size, b, hidden_dim, device=self.device
+ )
+ prompt = torch.randn(
+ feat_size * i + num_obj_ptr_tokens,
+ b,
+ mem_dim,
+ device=self.device,
+ )
+ prompt_pos = torch.randn(
+ feat_size * i + num_obj_ptr_tokens,
+ b,
+ mem_dim,
+ device=self.device,
+ )
+
+ self.tracker.transformer.encoder.forward(
+ src=src,
+ src_pos=src_pos,
+ prompt=prompt,
+ prompt_pos=prompt_pos,
+ num_obj_ptr_tokens=num_obj_ptr_tokens,
+ )
+
+ # Warm up with different numbers of box prompts (kboxes)
+ for _ in tqdm(range(num_iters)):
+ for i in range(1, self.max_num_kboxes + 1):
+ # Values in [0, 0.5) keep x + w and y + h within the normalized [0, 1] range
+ kboxes = torch.rand(i, 4, dtype=torch.float32) * 0.5
+ logger.info(
+ "Warming up masks_to_boxes with %d kboxes, kboxes.shape=%s",
+ i,
+ tuple(kboxes.shape),
+ )
+ self.add_prompt(
+ inference_state,
+ frame_idx=start_frame_idx,
+ text_str="cat",
+ boxes_xywh=kboxes,
+ box_labels=[1] * len(kboxes),
+ )
+
+ for _ in self.propagate_in_video(
+ inference_state, start_frame_idx, reverse=False
+ ):
+ pass
+
+ self.new_det_thresh = orig_new_det_thresh
+ return inference_state
+
+ def add_fake_objects_to_inference_state(
+ self, inference_state, num_objects, frame_idx
+ ):
+ new_det_obj_ids_local = np.arange(num_objects)
+ high_res_H, high_res_W = (
+ self.tracker.maskmem_backbone.mask_downsampler.interpol_size
+ )
+ new_det_masks = torch.ones(
+ len(new_det_obj_ids_local), high_res_H, high_res_W
+ ).to(self.device)
+
+ inference_state["sam2_inference_states"] = self._tracker_add_new_objects(
+ frame_idx=frame_idx,
+ num_frames=inference_state["num_frames"],
+ new_obj_ids=new_det_obj_ids_local,
+ new_obj_masks=new_det_masks,
+ tracker_states_local=inference_state["sam2_inference_states"],
+ orig_vid_height=inference_state["orig_height"],
+ orig_vid_width=inference_state["orig_width"],
+ feature_cache=inference_state["feature_cache"],
+ )
+
+ # Synthesize obj_id_to_mask data for cached_frame_outputs to support _build_sam2_output during warmup
+ obj_id_to_mask = {}
+ if num_objects > 0:
+ H_video = inference_state["orig_height"]
+ W_video = inference_state["orig_width"]
+
+ video_res_masks = F.interpolate(
+ new_det_masks.unsqueeze(1), # Add channel dimension for interpolation
+ size=(H_video, W_video),
+ mode="bilinear",
+ align_corners=False,
+ ) # (num_objects, 1, H_video, W_video)
+ for i, obj_id in enumerate(new_det_obj_ids_local):
+ obj_id_to_mask[obj_id] = (video_res_masks[i] > 0.0).to(torch.bool)
+ if self.rank == 0:
+ for fidx in range(inference_state["num_frames"]):
+ self._cache_frame_outputs(inference_state, fidx, obj_id_to_mask)
+
+ inference_state["tracker_metadata"] = {
+ "obj_ids_per_gpu": [np.arange(num_objects)],
+ "obj_ids_all_gpu": np.arange(num_objects), # Same as 1 GPU
+ "num_obj_per_gpu": [num_objects],
+ "obj_id_to_score": {i: 1.0 for i in range(num_objects)},
+ "obj_id_to_sam2_score_frame_wise": defaultdict(dict),
+ "obj_id_to_last_occluded": {},
+ "max_obj_id": num_objects,
+ "rank0_metadata": {
+ "masklet_confirmation": {
+ "status": np.zeros(num_objects, dtype=np.int64),
+ "consecutive_det_num": np.zeros(num_objects, dtype=np.int64),
+ },
+ "removed_obj_ids": set(),
+ "suppressed_obj_ids": defaultdict(set),
+ },
+ # gpu_metadata for hotstart tracking on GPU
+ "gpu_metadata": {
+ "N_obj": num_objects,
+ "obj_first_frame": torch.zeros(
+ num_objects, dtype=torch.long, device=self.device
+ ),
+ "consecutive_unmatch_count": torch.zeros(
+ num_objects, dtype=torch.long, device=self.device
+ ),
+ "trk_keep_alive": torch.ones(
+ num_objects, dtype=torch.bool, device=self.device
+ ),
+ "removed_mask": torch.zeros(
+ num_objects, dtype=torch.bool, device=self.device
+ ),
+ "overlap_pair_counts": torch.zeros(
+ (num_objects, num_objects), dtype=torch.long, device=self.device
+ ),
+ "last_occluded_tensor": torch.zeros(
+ num_objects, dtype=torch.long, device=self.device
+ ),
+ },
+ }
+ # Add num_buc_per_gpu for multiplex mode
+ if self.is_multiplex:
+ # Count actual buckets from the inference states
+ num_buc = self._count_buckets_in_states(
+ inference_state["sam2_inference_states"]
+ )
+ inference_state["tracker_metadata"]["num_buc_per_gpu"] = np.array(
+ [num_buc], dtype=np.int64
+ )
+
+ return inference_state
+
+ @torch.inference_mode()
+ @torch.autocast(device_type="cuda", dtype=torch.bfloat16)
+ def warm_up_compilation(self):
+ """
+ Warm up the model by running a dummy inference to compile the model. This is
+ useful to avoid the compilation overhead in the first inference call.
+ """
+ if not self.compile_model:
+ return
+ self._warm_up_complete = False
+ if self.device.type != "cuda":
+ raise RuntimeError(
+ f"The model must be on CUDA for warm-up compilation, got {self.device=}."
+ )
+
+ # Temporarily switch to single-GPU mode for warm-up compilation
+ orig_rank = self.rank
+ orig_world_size = self.world_size
+ self.rank = self.detector.rank = 0
+ self.world_size = self.detector.world_size = 1
+ orig_recondition_every_nth_frame = self.recondition_every_nth_frame
+ # self.recondition_every_nth_frame = 2
+
+ # Get a random video
+ inference_state = self.init_state(resource_path="")
+ start_frame_idx = 0
+
+ # Run basic propagation warm-up
+ inference_state = self._warm_up_vg_propagation(inference_state, start_frame_idx)
+
+ logger.info("Warm-up compilation completed.")
+
+ # revert to the original GPU and rank
+ self.rank = self.detector.rank = orig_rank
+ self.world_size = self.detector.world_size = orig_world_size
+ self.recondition_every_nth_frame = orig_recondition_every_nth_frame
+ self._warm_up_complete = True
+ self.tracker.transformer.encoder.forward.set_logging(True)
+
+ @torch.inference_mode()
+ def add_prompt(
+ self,
+ inference_state,
+ frame_idx,
+ text_str=None,
+ clear_old_points=True,
+ points=None,
+ point_labels=None,
+ boxes_xywh=None,
+ box_labels=None,
+ clear_old_boxes=True,
+ output_prob_thresh=0.5,
+ ):
+ """
+ Add a text and/or box prompt on a single frame. This method returns the inference
+ outputs only on the prompted frame.
+
+ Note that text prompts are NOT associated with a particular frame (i.e. they apply
+ to all frames). However, we only run inference on the frame specified in `frame_idx`.
+
+ Adapted from sam3_demo.Sam3DemoMixin.add_prompt, simplified to support only text and
+ box prompts (point prompts are rejected).
+ """
+ logger.info("Running add_prompt on frame %d", frame_idx)
+
+ device = inference_state["device"]
+ num_frames = inference_state["num_frames"]
+ assert text_str is not None or points is not None or boxes_xywh is not None, (
+ "at least one type of prompt (text, points, boxes) must be provided"
+ )
+ assert 0 <= frame_idx < num_frames, (
+ f"{frame_idx=} is out of range for a total of {num_frames} frames"
+ )
+
+ assert clear_old_boxes, "clear old boxes must be True"
+
+ assert points is None and clear_old_points is True and point_labels is None, (
+ "Point prompts not accepted"
+ )
+
+ # since it's a semantic prompt, we start over
+ self.reset_state(inference_state)
+
+ # 1) add text prompt
+ if text_str is not None:
+ inference_state["text_prompt"] = text_str
+ # add the text prompt into the input batch (to be applied to *all* frames)
+ inference_state["input_batch"].find_text_batch[0] = text_str
+ for t in range(inference_state["num_frames"]):
+ text_id = self.TEXT_ID_FOR_TEXT
+ inference_state["input_batch"].find_inputs[t].text_ids[...] = text_id
+
+ # 2) handle box prompt
+ assert (boxes_xywh is not None) == (box_labels is not None)
+ if boxes_xywh is not None:
+ boxes_xywh = torch.as_tensor(boxes_xywh, dtype=torch.float32)
+ box_labels = torch.as_tensor(box_labels, dtype=torch.long)
+ # input boxes are expected to be [xmin, ymin, width, height] format
+ # in normalized coordinates of range 0~1, similar to FA
+ assert boxes_xywh.dim() == 2
+ assert boxes_xywh.size(0) > 0 and boxes_xywh.size(-1) == 4
+ assert box_labels.dim() == 1 and box_labels.size(0) == boxes_xywh.size(0)
+ boxes_cxcywh = box_xywh_to_cxcywh(boxes_xywh)
+ assert (boxes_xywh >= 0).all().item() and (boxes_xywh <= 1).all().item()
+ assert (boxes_cxcywh >= 0).all().item() and (boxes_cxcywh <= 1).all().item()
+
+ new_box_input = boxes_cxcywh, box_labels
+ inference_state["per_frame_raw_box_input"][frame_idx] = new_box_input
+
+ # handle the case of visual prompt (also added as an input box from the UI)
+ boxes_cxcywh, box_labels, geometric_prompt = self._get_visual_prompt(
+ inference_state, frame_idx, boxes_cxcywh, box_labels
+ )
+
+ inference_state["per_frame_geometric_prompt"][frame_idx] = geometric_prompt
+
+ with torch.profiler.record_function("add_prompt._init_backbone_out"):
+ inference_state["backbone_out"] = self._init_backbone_out(inference_state)
+ out = self._run_single_frame_inference(
+ inference_state,
+ frame_idx,
+ reverse=False,
+ )
+ return frame_idx, self._postprocess_output(inference_state, out)
+
+ def _init_backbone_out(self, inference_state):
+ """
+ Initialize a backbone_out dictionary and extract the text features.
+
+ Note that the visual features of each frame are not extracted here. They will be
+ extracted on the fly when running inference on each frame.
+ """
+ input = inference_state["input_batch"]
+ device = self.device
+ backbone_out = {"img_batch_all_stages": input.img_batch}
+ text_outputs = self.detector.backbone.forward_text(
+ input.find_text_batch, device=device
+ )
+ backbone_out.update(text_outputs)
+ return backbone_out
+
+ @torch.autocast(device_type="cuda", dtype=torch.bfloat16)
+ def forward(self, input: BatchedDatapoint, is_inference: bool = False):
+ """This method is only used for benchmark eval (not used in the demo)."""
+ # set the model to single GPU for benchmark evaluation (to be compatible with trainer)
+ orig_rank = self.rank
+ orig_world_size = self.world_size
+ self.rank = self.detector.rank = 0
+ self.world_size = self.detector.world_size = 1
+
+ # get data
+ text_prompt_ids = input.find_metadatas[0].original_category_id
+ text_prompt_list = input.find_text_batch
+
+ # loop over txt prompts
+ tracking_res = defaultdict(dict) # frame_idx --> {obj_id: mask}
+ scores_labels = defaultdict(tuple) # obj_id --> (score, text_prompt_id)
+ inference_state = self.init_state(resource_path=input.raw_images)
+ for prompt_id, prompt in zip(text_prompt_ids, text_prompt_list):
+ self.add_prompt(inference_state, frame_idx=0, text_str=prompt)
+ start_obj_id = max(scores_labels.keys(), default=-1) + 1 # prev max + 1
+
+ # propagate the prompts
+ obj_ids_this_prompt = set()
+ for frame_idx, out in self.propagate_in_video(
+ inference_state,
+ start_frame_idx=0,
+ max_frame_num_to_track=inference_state["num_frames"],
+ reverse=False,
+ ):
+ out_obj_ids = (
+ out["out_obj_ids"].numpy()
+ if isinstance(out["out_obj_ids"], torch.Tensor)
+ else out["out_obj_ids"]
+ )
+ out_binary_masks = (
+ out["out_binary_masks"].numpy()
+ if isinstance(out["out_binary_masks"], torch.Tensor)
+ else out["out_binary_masks"]
+ )
+
+ current_frame_res = tracking_res[frame_idx]
+ for obj_id, mask in zip(out_obj_ids, out_binary_masks):
+ mask_tensor = torch.tensor(mask[None], dtype=torch.bool)
+ current_frame_res[obj_id + start_obj_id] = mask_tensor
+ obj_ids_this_prompt.update(current_frame_res.keys())
+
+ obj_id_to_score = inference_state["tracker_metadata"]["obj_id_to_score"]
+ for obj_id, score in obj_id_to_score.items():
+ if obj_id + start_obj_id in obj_ids_this_prompt:
+ score_tensor = torch.tensor(score, dtype=torch.float32)
+ scores_labels[obj_id + start_obj_id] = (score_tensor, prompt_id)
+
+ self.reset_state(inference_state)
+
+ video_id = input.find_metadatas[0].original_image_id[0].cpu().item()
+ preds = self.prep_for_evaluator(input.raw_images, tracking_res, scores_labels)
+
+ # revert the model to the original GPU and rank
+ self.rank = self.detector.rank = orig_rank
+ self.world_size = self.detector.world_size = orig_world_size
+ return {video_id: preds}
+
+
+class Sam3MultiplexTrackingProd(Sam3MultiplexTracking):
+ """
+ Subclass of Sam3MultiplexTracking with support for batched processing.
+
+ This class enables processing videos in batches rather than all at once by:
+ 1. Adding an `is_last_batch` parameter to control buffer flushing
+ 2. Persisting generator state (hotstart_buffer, hotstart_removed_obj_ids,
+ unconfirmed_obj_ids_per_frame) in inference_state across generator instantiations
+
+ This is useful for processing large videos in smaller chunks to manage memory
+ or distribute processing across multiple calls.
+ """
+
+ @torch.inference_mode()
+ def init_state(
+ self,
+ resource_path,
+ offload_video_to_cpu=False,
+ async_loading_frames=False,
+ use_torchcodec=False,
+ use_cv2=False,
+ input_is_mp4=False,
+ ):
+ inference_state = super().init_state(
+ resource_path=resource_path,
+ offload_video_to_cpu=offload_video_to_cpu,
+ async_loading_frames=async_loading_frames,
+ use_torchcodec=use_torchcodec,
+ use_cv2=use_cv2,
+ input_is_mp4=input_is_mp4,
+ )
+ # Initialize generator state for batched processing
+ inference_state["generator_state"] = {
+ "hotstart_buffer": [],
+ "hotstart_removed_obj_ids": set(),
+ "unconfirmed_obj_ids_per_frame": {},
+ "postprocess_yield_list": [],
+ }
+ return inference_state
+
+ def reset_state(self, inference_state):
+ super().reset_state(inference_state)
+ # Reset generator state for batched processing
+ inference_state["generator_state"] = {
+ "hotstart_buffer": [],
+ "hotstart_removed_obj_ids": set(),
+ "unconfirmed_obj_ids_per_frame": {},
+ "postprocess_yield_list": [],
+ }
+
+ @torch.inference_mode()
+ def propagate_in_video(
+ self,
+ inference_state,
+ start_frame_idx=None,
+ max_frame_num_to_track=None,
+ reverse=False,
+ output_prob_thresh=0.5,
+ compute_stability_score=False,
+ is_instance_processing=False,
+ is_last_batch=True,
+ ):
+ """
+ Propagate the prompts to get grounding results for the entire video. This method
+ is a generator and yields inference outputs for all frames in the range specified
+ by `start_frame_idx`, `max_frame_num_to_track`, and `reverse`.
+
+ Args:
+ is_last_batch: Whether this is the last batch in a batched processing scenario.
+ When True (default), the hotstart buffer will be flushed at end_frame_idx.
+ When False, the buffer is preserved in inference_state for the next batch.
+ This flag should be set to False for all batches except the last one when
+ processing a video in multiple batches.
+ """
+ # compile the model (it's a no-op if the model is already compiled)
+ # note that it's intentionally added to `self.propagate_in_video`, so that the first
+ # `self.add_prompt` call will be done in eager mode to fill in the decoder buffers
+ # (such as the positional encoding cache).
+ self._compile_model()
+
+ processing_order, end_frame_idx = self._get_processing_order(
+ inference_state,
+ start_frame_idx,
+ max_frame_num_to_track,
+ reverse=reverse,
+ )
+
+ # Store the tracking bounds (max_frame_num_to_track and start_frame_idx) in feature_cache for downstream methods
+ inference_state["feature_cache"]["tracking_bounds"] = {
+ "max_frame_num_to_track": max_frame_num_to_track,
+ "propagate_in_video_start_frame_idx": start_frame_idx,
+ }
+
+ # Initialize or retrieve generator state from inference_state to persist across batches
+ if "generator_state" not in inference_state:
+ inference_state["generator_state"] = {
+ "hotstart_buffer": [],
+ "hotstart_removed_obj_ids": set(),
+ "unconfirmed_obj_ids_per_frame": {},
+ "postprocess_yield_list": [],
+ }
+
+ generator_state = inference_state["generator_state"]
+ hotstart_buffer = generator_state["hotstart_buffer"]
+ hotstart_removed_obj_ids = generator_state["hotstart_removed_obj_ids"]
+ unconfirmed_obj_ids_per_frame = generator_state["unconfirmed_obj_ids_per_frame"]
+ postprocess_yield_list = generator_state.get("postprocess_yield_list", [])
+
+ # When deciding whether to output a masklet on `yield_frame_idx`, we check whether the object is confirmed
+ # in a future frame (`unconfirmed_status_delay` frames after the current frame, or before it when
+ # propagating in reverse). For example, if we require an object to be detected in 3 consecutive frames
+ # to be confirmed, then we look 2 frames ahead, e.g., we output an object on frame 4 only if it becomes
+ # confirmed on frame 6.
+ unconfirmed_status_delay = self.masklet_confirmation_consecutive_det_thresh - 1
+
+ for frame_idx in tqdm(
+ processing_order, desc="propagate_in_video", disable=self.rank > 0
+ ):
+ out = self._run_single_frame_inference(
+ inference_state,
+ frame_idx,
+ reverse,
+ is_instance_processing=is_instance_processing,
+ )
+
+ if self.hotstart_delay > 0:
+ # accumulate the outputs for the first `hotstart_delay` frames
+ hotstart_buffer.append([frame_idx, out])
+ # update the object IDs removed by hotstart so that we don't output them
+ if self.rank == 0:
+ hotstart_removed_obj_ids.update(out["removed_obj_ids"])
+ unconfirmed_obj_ids = out.get("unconfirmed_obj_ids", None)
+ if unconfirmed_obj_ids is not None:
+ unconfirmed_obj_ids_per_frame[frame_idx] = unconfirmed_obj_ids
+
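+ # Illustrative example (hypothetical hotstart_delay = 3, forward propagation from
+ # frame 0): frames 0 and 1 are only buffered; processing frame 2 yields frame 0,
+ # processing frame 3 yields frame 1, and so on. Frame k therefore leaves the buffer
+ # once frame k + 2 has been processed, until the last batch reaches end_frame_idx
+ # and the whole buffer is flushed.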
+ if frame_idx == end_frame_idx and is_last_batch:
+ # we reached the end of propagation -- yield all frames in the buffer
+ yield_list = hotstart_buffer
+ hotstart_buffer = []
+ elif len(hotstart_buffer) >= self.hotstart_delay:
+ # we have enough frames -- yield and remove the first (oldest) frame from the buffer
+ yield_list = hotstart_buffer[:1]
+ hotstart_buffer = hotstart_buffer[1:]
+ else:
+ # not enough frames yet -- skip yielding
+ yield_list = []
+ else:
+ yield_list = [(frame_idx, out)] # output the current frame
+
+ # Accumulate yield_list into postprocess_yield_list
+ # Snapshot hotstart_removed_obj_ids at the time of accumulation to preserve
+ # the correct state for each frame (important: this set is mutated over time)
+ for yield_frame_idx, yield_out in yield_list:
+ postprocess_yield_list.append(
+ (yield_frame_idx, yield_out, set(hotstart_removed_obj_ids))
+ )
+
+ # Process batch when we have enough frames
+ while len(postprocess_yield_list) >= self.postprocess_batch_size:
+ batch_to_process = postprocess_yield_list[: self.postprocess_batch_size]
+ postprocess_yield_list = postprocess_yield_list[
+ self.postprocess_batch_size :
+ ]
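+ # Illustrative example (hypothetical postprocess_batch_size = 4): if 6 frames are
+ # pending, one batch of 4 is postprocessed here. The remaining 2 wait either for
+ # more frames or for the partial-batch flush after the loop.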
+
+ with torch.profiler.record_function(
+ "Sam3MultiplexTrackingProd.postprocess_output_batched"
+ ):
+ if self.rank == 0:
+ # Prepare batched inputs for postprocessing
+ H_video, W_video = (
+ inference_state["orig_height"],
+ inference_state["orig_width"],
+ )
+ num_frames = inference_state["num_frames"]
+
+ batched_outs = []
+ frame_indices = []
+ for (
+ yield_frame_idx,
+ yield_out,
+ removed_obj_ids_snapshot,
+ ) in batch_to_process:
+ suppressed_obj_ids = yield_out["suppressed_obj_ids"]
+ unconfirmed_status_frame_idx = (
+ yield_frame_idx + unconfirmed_status_delay
+ if not reverse
+ else yield_frame_idx - unconfirmed_status_delay
+ )
+ unconfirmed_status_frame_idx = max(
+ 0, min(unconfirmed_status_frame_idx, num_frames - 1)
+ )
+ unconfirmed_obj_ids = unconfirmed_obj_ids_per_frame.get(
+ unconfirmed_status_frame_idx, None
+ )
+
+ batched_outs.append(
+ (
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ )
+ )
+ frame_indices.append(yield_frame_idx)
+
+ # Cache frame outputs
+ self._cache_frame_outputs(
+ inference_state,
+ yield_frame_idx,
+ yield_out["obj_id_to_mask"],
+ suppressed_obj_ids=suppressed_obj_ids,
+ removed_obj_ids=removed_obj_ids_snapshot,
+ unconfirmed_obj_ids=unconfirmed_obj_ids,
+ )
+
+ # Process all frames in batch
+ if self.postprocess_batch_size > 1:
+ postprocessed_outs = self._postprocess_output_batched(
+ H_video, W_video, batched_outs
+ )
+ else:
+ # Process each frame individually but output together
+ postprocessed_outs = []
+ for (
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ ) in batched_outs:
+ postprocessed_out = self._postprocess_output(
+ inference_state,
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ )
+ postprocessed_outs.append(postprocessed_out)
+
+ # Yield results
+ for yield_frame_idx, postprocessed_out in zip(
+ frame_indices, postprocessed_outs
+ ):
+ yield yield_frame_idx, postprocessed_out
+ else:
+ # No output on other GPUs
+ for yield_frame_idx, _, _ in batch_to_process:
+ yield yield_frame_idx, DUMMY_OUTPUT
+
+ # Handle remaining frames in hotstart buffer at end of last batch
+ if is_last_batch and len(hotstart_buffer) > 0:
+ for yield_frame_idx, yield_out in hotstart_buffer:
+ postprocess_yield_list.append(
+ (yield_frame_idx, yield_out, set(hotstart_removed_obj_ids))
+ )
+ hotstart_buffer = []
+
+ # Flush any remaining frames in the postprocess buffer (even partial
+ # batches) so that the caller gets results as soon as possible. This is
+ # especially important for the first batch where hotstart_delay causes
+ # only a few frames to exit the hotstart buffer — without this flush
+ # the client would have to wait for the next batch before receiving any
+ # output, hurting time-to-first-frame.
+ if len(postprocess_yield_list) > 0:
+ with torch.profiler.record_function(
+ "Sam3MultiplexTrackingProd.postprocess_output_batched"
+ ):
+ if self.rank == 0:
+ H_video, W_video = (
+ inference_state["orig_height"],
+ inference_state["orig_width"],
+ )
+ num_frames = inference_state["num_frames"]
+
+ batched_outs = []
+ frame_indices = []
+ for (
+ yield_frame_idx,
+ yield_out,
+ removed_obj_ids_snapshot,
+ ) in postprocess_yield_list:
+ suppressed_obj_ids = yield_out["suppressed_obj_ids"]
+ unconfirmed_status_frame_idx = (
+ yield_frame_idx + unconfirmed_status_delay
+ if not reverse
+ else yield_frame_idx - unconfirmed_status_delay
+ )
+ unconfirmed_status_frame_idx = max(
+ 0, min(unconfirmed_status_frame_idx, num_frames - 1)
+ )
+ unconfirmed_obj_ids = unconfirmed_obj_ids_per_frame.get(
+ unconfirmed_status_frame_idx, None
+ )
+
+ batched_outs.append(
+ (
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ )
+ )
+ frame_indices.append(yield_frame_idx)
+
+ self._cache_frame_outputs(
+ inference_state,
+ yield_frame_idx,
+ yield_out["obj_id_to_mask"],
+ suppressed_obj_ids=suppressed_obj_ids,
+ removed_obj_ids=removed_obj_ids_snapshot,
+ unconfirmed_obj_ids=unconfirmed_obj_ids,
+ )
+
+ if self.postprocess_batch_size > 1:
+ postprocessed_outs = self._postprocess_output_batched(
+ H_video, W_video, batched_outs
+ )
+ else:
+ # Process each frame individually but output together
+ postprocessed_outs = []
+ for (
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ ) in batched_outs:
+ postprocessed_out = self._postprocess_output(
+ inference_state,
+ yield_out,
+ removed_obj_ids_snapshot,
+ suppressed_obj_ids,
+ unconfirmed_obj_ids,
+ )
+ postprocessed_outs.append(postprocessed_out)
+
+ for yield_frame_idx, postprocessed_out in zip(
+ frame_indices, postprocessed_outs
+ ):
+ yield yield_frame_idx, postprocessed_out
+ else:
+ for yield_frame_idx, _, _ in postprocess_yield_list:
+ yield yield_frame_idx, DUMMY_OUTPUT
+
+ postprocess_yield_list = []
+
+ # Store the generator state back to inference_state for persistence across batches
+ generator_state["postprocess_yield_list"] = postprocess_yield_list
+ generator_state["hotstart_buffer"] = hotstart_buffer
+ generator_state["hotstart_removed_obj_ids"] = hotstart_removed_obj_ids
+ generator_state["unconfirmed_obj_ids_per_frame"] = unconfirmed_obj_ids_per_frame
+
+ if self.is_multiplex:
+ # log the bucket utilization stats
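+ # bucket utilization rate = total valid objects / total bucket capacity -> represents remaining room for improvement
+ # subscription rate = total valid objects / total number of buckets -> represents the speedup (average objects per bucket)
+ # e.g. 10 valid objects in 4 buckets with bucket_capacity=16: utilization ~15.6%, subscription 250%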
+ total_valid_objects = 0
+ total_num_buckets = 0
+ for state in inference_state["sam2_inference_states"]:
+ assert (
+ len(state["obj_ids"])
+ == state["multiplex_state"].total_valid_entries
+ )
+ total_valid_objects += len(state["obj_ids"])
+ total_num_buckets += state["multiplex_state"].num_buckets
+ if total_num_buckets > 0:
+ bucket_utilization_rate = (
+ total_valid_objects / (total_num_buckets * self.bucket_capacity)
+ ) * 100
+ subscription_rate = (total_valid_objects / total_num_buckets) * 100
+ logger.info(
+ f"Bucket utilization rate: {bucket_utilization_rate:.2f}%, subscription rate: {subscription_rate:.2f}%"
+ )
+
+
+class Sam3MultiplexTrackingWithInteractivity(Sam3MultiplexTracking):
+ def __init__(
+ self,
+ use_prev_mem_frame=False,
+ use_stateless_refinement=False,
+ refinement_detector_cond_frame_removal_window=30 * 4,
+ **kwargs,
+ ):
+ """
+ use_prev_mem_frame: bool, whether to condition on previous memory frames for adding points
+ use_stateless_refinement: bool, whether to enable stateless refinement behavior
+ refinement_detector_cond_frame_removal_window: int, we remove a detector conditioning frame if it
+ is within this many frames of a user refined frame. Set to a large value (e.g. 10000) to
+ always remove detector conditioning frames if there is any user refinement in the video.
+ """
+ super().__init__(**kwargs)
+ self.use_prev_mem_frame = use_prev_mem_frame
+ self.use_stateless_refinement = use_stateless_refinement
+ self.refinement_detector_cond_frame_removal_window = (
+ refinement_detector_cond_frame_removal_window
+ )
+
+ @torch.inference_mode()
+ def init_state(
+ self,
+ resource_path,
+ offload_video_to_cpu=False,
+ async_loading_frames=False,
+ use_torchcodec=False,
+ use_cv2=False,
+ input_is_mp4=False,
+ ):
+ inference_state = super().init_state(
+ resource_path=resource_path,
+ offload_video_to_cpu=offload_video_to_cpu,
+ async_loading_frames=async_loading_frames,
+ use_torchcodec=use_torchcodec,
+ use_cv2=use_cv2,
+ input_is_mp4=input_is_mp4,
+ )
+ # initialize extra states
+ inference_state["action_history"] = [] # for logging user actions
+ if self.tracker.per_obj_inference:
+ # in per-object mode only one inference state is needed, so we initialize it here.
+ inference_state["sam2_inference_states"] = [
+ self._init_new_sam2_state(inference_state)
+ ]
+ return inference_state
+
+ def reset_state(self, inference_state):
+ super().reset_state(inference_state)
+ # reset extra states
+ inference_state["action_history"].clear()
+ if self.tracker.per_obj_inference:
+ inference_state["sam2_inference_states"] = [
+ self._init_new_sam2_state(inference_state)
+ ]
+
+ def _init_new_sam2_state(self, inference_state):
+ return self.tracker.init_state(
+ cached_features=inference_state["feature_cache"],
+ video_height=inference_state["orig_height"],
+ video_width=inference_state["orig_width"],
+ num_frames=inference_state["num_frames"],
+ )
+
+ def cancel_propagation(self, inference_state):
+ """
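+ Cancel any ongoing propagation by recording a "propagation_cancel" action; the next
+ propagate_in_video call uses it to resume the propagation that was cancelled.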
+ """
+ logger.info("Cancelling ongoing propagation.")
+ self.add_action_history(
+ inference_state,
+ action_type="propagation_cancel",
+ obj_ids=None,
+ frame_idx=None,
+ )
+
+ def fetch_and_process_single_frame_results(self, inference_state, frame_idx):
+ tracker_metadata = inference_state["tracker_metadata"]
+ obj_id_to_mask = inference_state["cached_frame_outputs"][frame_idx]
+ # post processing - remove suppressed obj_ids
+ obj_id_to_score = tracker_metadata["obj_id_to_score"]
+ suppressed_obj_ids = tracker_metadata["rank0_metadata"]["suppressed_obj_ids"][
+ frame_idx
+ ]
+ obj_id_to_sam2_score = tracker_metadata["obj_id_to_sam2_score_frame_wise"][
+ frame_idx
+ ]
+
+ out = {
+ "obj_id_to_mask": obj_id_to_mask,
+ "obj_id_to_score": obj_id_to_score,
+ "obj_id_to_sam2_score": obj_id_to_sam2_score,
+ }
+ return frame_idx, self._postprocess_output(
+ inference_state, out, suppressed_obj_ids=suppressed_obj_ids
+ )
+
+ @torch.inference_mode()
+ def propagate_in_video(
+ self,
+ inference_state,
+ start_frame_idx=None,
+ max_frame_num_to_track=None,
+ reverse=False,
+ output_prob_thresh=0.5,
+ compute_stability_score=False,
+ is_instance_processing=False,
+ is_last_batch: bool = False,
+ ):
+ # step 1: check which type of propagation to run, should be the same for all GPUs.
+ propagation_type, obj_ids = self.parse_action_history_for_propagation(
+ inference_state
+ )
+ self.add_action_history(
+ inference_state,
+ action_type=propagation_type,
+ obj_ids=obj_ids,
+ frame_idx=start_frame_idx,
+ )
+
+ # step 2: run full VG propagation
+ if propagation_type == "propagation_full":
+ logger.info(f"Running full VG propagation (reverse={reverse}).")
+ yield from super().propagate_in_video(
+ inference_state,
+ start_frame_idx=start_frame_idx,
+ max_frame_num_to_track=max_frame_num_to_track,
+ reverse=reverse,
+ is_last_batch=is_last_batch,
+ )
+ return
+
+ # step 3: run SAM2 partial propagation or directly fetch existing predictions
+ assert propagation_type in ["propagation_partial", "propagation_fetch"]
+ logger.info(
+ f"Running SAM2 propagation for objects {obj_ids} and merging it with existing VG predictions (reverse={reverse})."
+ if propagation_type == "propagation_partial"
+ else f"Fetching existing VG predictions without running any propagation (reverse={reverse})."
+ )
+ processing_order, _end_frame_idx = self._get_processing_order(
+ inference_state,
+ start_frame_idx=start_frame_idx,
+ max_frame_num_to_track=max_frame_num_to_track,
+ reverse=reverse,
+ )
+
+ tracker_metadata = inference_state["tracker_metadata"]
+
+ # for propagation_fetch, return the cached outputs without running any propagation
+ if propagation_type == "propagation_fetch":
+ for frame_idx in tqdm(processing_order):
+ if self.rank == 0:
+ frame_idx, out = self.fetch_and_process_single_frame_results(
+ inference_state, frame_idx
+ )
+ yield frame_idx, out
+ else:
+ yield frame_idx, DUMMY_OUTPUT # no output for other GPUs
+
+ return
+
+ # get SAM2 inference states containing selected obj_ids
+ if propagation_type == "propagation_partial":
+ # can be empty for GPUs where objects are not in their inference states
+ tracker_states_local = self._get_sam2_inference_states_by_obj_ids(
+ inference_state, obj_ids
+ )
+ for sam2_state in tracker_states_local:
+ self.tracker.propagate_in_video_preflight(
+ sam2_state, run_mem_encoder=True
+ )
+
+ for frame_idx in tqdm(processing_order):
+ # run SAM2 propagation
+ if propagation_type == "propagation_partial":
+ self._prepare_backbone_feats(inference_state, frame_idx, reverse)
+ obj_ids_local, low_res_masks_local, sam2_scores_local = (
+ self._propogate_tracker_one_frame_local_gpu(
+ tracker_states_local,
+ frame_idx=frame_idx,
+ reverse=reverse,
+ run_mem_encoder=True,
+ )
+ )
+
+ # broadcast refined object sam2 scores and masks to all GPUs
+ # handle multiple objects that can be located on different GPUs
+ refined_obj_data = {} # obj_id -> (score, mask_low_res); upscaled to video res on rank 0
+
+ # Collect data for objects on this GPU
+ local_obj_data = {}
+ for obj_id in obj_ids:
+ obj_rank = self._get_gpu_id_by_obj_id(inference_state, obj_id)
+ if self.rank == obj_rank and obj_id in obj_ids_local:
+ refined_obj_idx = obj_ids_local.index(obj_id)
+ refined_mask_low_res = low_res_masks_local[
+ refined_obj_idx
+ ] # (H_low_res, W_low_res)
+ refined_score = sam2_scores_local[refined_obj_idx]
+
+ # Keep low resolution for broadcasting to reduce communication cost
+ local_obj_data[obj_id] = (refined_score, refined_mask_low_res)
+
+ # Broadcast data from each GPU that has refined objects
+ if self.world_size > 1:
+ for obj_id in obj_ids:
+ obj_rank = self._get_gpu_id_by_obj_id(inference_state, obj_id)
+ if self.rank == obj_rank:
+ # This GPU has the object, broadcast its data
+ data_to_broadcast = local_obj_data.get(obj_id, None)
+ data_list = [data_to_broadcast]
+ self.broadcast_python_obj_cpu(data_list, src=obj_rank)
+ if data_to_broadcast is not None:
+ refined_obj_data[obj_id] = data_to_broadcast
+ else:
+ # This GPU doesn't have the object, receive data
+ data_list = [None]
+ self.broadcast_python_obj_cpu(data_list, src=obj_rank)
+ if data_list[0] is not None:
+ refined_obj_data[obj_id] = data_list[0]
+ else:
+ # Single GPU case
+ refined_obj_data = local_obj_data
+
+ # Update SAM2 scores for all refined objects
+ for obj_id, (refined_score, _) in refined_obj_data.items():
+ # After broadcast_python_obj_cpu in multi-GPU, tensors may become numpy scalars
+ # Ensure it's a GPU tensor for consistency with base class behavior
+ if not isinstance(refined_score, torch.Tensor):
+ refined_score = torch.tensor(
+ refined_score, dtype=torch.float32, device=self.device
+ )
+ tracker_metadata["obj_id_to_sam2_score_frame_wise"][
+ frame_idx
+ ].update({obj_id: refined_score})
+
+ if self.rank == 0:
+ # build predictions from the SAM2 inference states; they include both the original
+ # VG predictions and the refined predictions from interactivity.
+
+ # Prepare refined masks dictionary - upscale to video resolution after broadcast
+ refined_obj_id_to_mask = {}
+ for obj_id, (_, refined_mask_low_res) in refined_obj_data.items():
+ refined_mask_video_res = (
+ self._convert_low_res_mask_to_video_res(
+ refined_mask_low_res, inference_state
+ )
+ ) # (1, H_video, W_video) bool
+ refined_obj_id_to_mask[obj_id] = refined_mask_video_res
+
+ obj_id_to_mask = self._build_sam2_output(
+ inference_state, frame_idx, refined_obj_id_to_mask
+ )
+ out = {
+ "obj_id_to_mask": obj_id_to_mask,
+ "obj_id_to_score": tracker_metadata["obj_id_to_score"],
+ "obj_id_to_sam2_score": tracker_metadata[
+ "obj_id_to_sam2_score_frame_wise"
+ ][frame_idx],
+ }
+ suppressed_obj_ids = tracker_metadata["rank0_metadata"][
+ "suppressed_obj_ids"
+ ][frame_idx]
+ self._cache_frame_outputs(
+ inference_state,
+ frame_idx,
+ obj_id_to_mask,
+ suppressed_obj_ids=suppressed_obj_ids,
+ )
+ yield (
+ frame_idx,
+ self._postprocess_output(
+ inference_state, out, suppressed_obj_ids=suppressed_obj_ids
+ ),
+ )
+ else:
+ yield frame_idx, DUMMY_OUTPUT # no output for other GPUs
+
+ def add_action_history(
+ self, inference_state, action_type, frame_idx=None, obj_ids=None
+ ):
+ """
+ action_history is used to automatically decide what to do during propagation.
+ action_type: one of ["add", "remove", "refine"] + ["propagation_full", "propagation_partial", "propagation_fetch", "propagation_cancel"]
+ """
+ instance_actions = ["add", "remove", "refine"]
+ propagation_actions = [
+ "propagation_full",
+ "propagation_partial",
+ "propagation_fetch",
+ "propagation_cancel",
+ ]
+ assert action_type in instance_actions + propagation_actions, (
+ f"Invalid action type: {action_type}, must be one of {instance_actions + propagation_actions}"
+ )
+ action = {
+ "type": action_type,
+ "frame_idx": frame_idx,
+ "obj_ids": obj_ids,
+ }
+ inference_state["action_history"].append(action)
+
+ def _has_object_been_refined(self, inference_state, obj_id):
+ if "action_history" not in inference_state:
+ return False
+ action_history = inference_state["action_history"]
+ for action in action_history:
+ if action["type"] in ["add", "refine"] and action.get("obj_ids"):
+ if obj_id in action["obj_ids"]:
+ return True
+ return False
+
+ def parse_action_history_for_propagation(self, inference_state):
+ action_history = inference_state["action_history"]
+ if (
+ len(action_history) == 1
+ and action_history[0]["type"] == "propagation_cancel"
+ ):
+ # the only action is a cancel, so we run full propagation
+ return "propagation_full", None
+ elif (
+ len(action_history) >= 2
+ and action_history[-1]["type"] == "propagation_cancel"
+ ):
+ # last action is cancel, we go back to the action before cancel
+ action_before_cancelation = inference_state["action_history"][-2]
+ # the action before cancellation can be a propagation_fetch issued after running both
+ # forward and backward propagation (as in the web demo interface); in that case we go back one more step
+ if action_before_cancelation["type"] == "propagation_fetch":
+ action_before_cancelation = inference_state["action_history"][-3]
+ return action_before_cancelation["type"], action_before_cancelation.get(
+ "obj_ids", None
+ )
+ return self._parse_action_history_for_propagation(
+ inference_state["action_history"], inference_state["num_frames"]
+ )
+
+ def _parse_action_history_for_propagation(self, action_history, num_frames):
+ """
+ Parse the actions in history before the last propagation and prepare for the next propagation.
+ We support multiple actions (add/remove/refine) between two propagations. If we had an action
+ history similar to this ["propagate", "add", "refine", "remove", "add"], the next propagation
+ would remove the removed object, and also propagate the two added/refined objects.
+
+ Returns:
+ propagation_type: one of ["propagation_full", "propagation_partial", "propagation_fetch"]
+ - "propagation_full": run VG propagation for all objects
+ - "propagation_partial": run SAM2 propagation for selected objects, useful for add/refine actions
+ - "propagation_fetch": fetch existing VG predictions without running any propagation
+ - "propagation_cancel": handled in parse_action_history_for_propagation(), not in this function.
+ obj_ids: list of object ids to run SAM2 propagation on if propagation_type is "propagation_partial".
+
+ TODO: (Jie) this function works for our current workflows, but may need more tests to ensure it works
+ correctly with different action histories for future workflows.
+ """
+ if len(action_history) == 0:
+ # we run propagation for the first time
+ return "propagation_full", None
+
+ if "propagation" in action_history[-1]["type"]:
+ if action_history[-1]["type"] in ["propagation_fetch"]:
+ # last propagation is direct fetch, we fetch existing predictions
+ return "propagation_fetch", None
+ elif action_history[-1]["type"] in [
+ "propagation_partial",
+ "propagation_full",
+ ]:
+ # we fetch predictions if we have already run propagation twice, or if we have run
+ # propagation once starting from the first or last frame.
+ if (
+ len(action_history) > 1
+ and action_history[-2]["type"]
+ in ["propagation_partial", "propagation_full"]
+ ) or action_history[-1]["frame_idx"] in [
+ 0,
+ num_frames - 1,
+ ]:
+ # we have run both forward and backward partial/full propagation
+ return "propagation_fetch", None
+ else:
+ # we have run partial/full propagation once in one direction; run it again for the remaining frames
+ return action_history[-1]["type"], action_history[-1]["obj_ids"]
+
+ # parse actions since last propagation
+ obj_ids = []
+ for action in action_history[::-1]:
+ if "propagation" in action["type"]:
+ # we reached the last propagation action, stop parsing
+ break
+ if action["type"] in ["add", "refine"]:
+ obj_ids.extend(action["obj_ids"])
+ # else action["type"] == "remove": noop
+ obj_ids = list(set(obj_ids)) if len(obj_ids) > 0 else None
+ propagation_type = (
+ "propagation_partial" if obj_ids is not None else "propagation_fetch"
+ )
+ return propagation_type, obj_ids
+
+ def remove_object(self, inference_state, obj_id, frame_idx, is_user_action=False):
+ """
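+ Remove the object from the SAM2 inference states on the GPU that owns it and update
+ the tracker metadata on every rank. If the object was already removed, only the action
+ history (for user actions) and the cached frame outputs are updated.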
+ """
+ obj_rank = self._get_gpu_id_by_obj_id(inference_state, obj_id)
+ if obj_rank is None:
+ # Object was already removed (e.g., by hotstart heuristics during
+ # propagation). Log a warning and skip SAM2 state and metadata
+ # removal, but still record action history and clean up cached outputs.
+ logger.warning(
+ f"Object {obj_id} not found in any GPU (already removed). "
+ f"Skipping SAM2 state and metadata removal."
+ )
+ else:
+ tracker_states_local = inference_state["sam2_inference_states"]
+ if self.rank == obj_rank:
+ self._tracker_remove_objects(tracker_states_local, [obj_id])
+
+ # update metadata
+ tracker_metadata = inference_state["tracker_metadata"]
+ _obj_ids = tracker_metadata["obj_ids_per_gpu"][obj_rank]
+ tracker_metadata["obj_ids_per_gpu"][obj_rank] = _obj_ids[_obj_ids != obj_id]
+ tracker_metadata["num_obj_per_gpu"][obj_rank] = len(
+ tracker_metadata["obj_ids_per_gpu"][obj_rank]
+ )
+ tracker_metadata["obj_ids_all_gpu"] = np.concatenate(
+ tracker_metadata["obj_ids_per_gpu"]
+ )
+ tracker_metadata["obj_id_to_score"].pop(obj_id, None)
+ # tracker_metadata["max_obj_id"] # we do not reuse the object id, so we do not update it here
+
+ if is_user_action:
+ self.add_action_history(
+ inference_state, action_type="remove", obj_ids=[obj_id]
+ )
+
+ # Clean up cached frame outputs to remove references to the deleted object
+ if "cached_frame_outputs" in inference_state:
+ for _frame_idx in inference_state["cached_frame_outputs"]:
+ frame_cache = inference_state["cached_frame_outputs"][_frame_idx]
+ if obj_id in frame_cache:
+ del frame_cache[obj_id]
+
+ out = None
+ if frame_idx is not None and self.rank == 0:
+ frame_idx, out = self.fetch_and_process_single_frame_results(
+ inference_state, frame_idx
+ )
+ return frame_idx, out
+
+ def _get_gpu_id_by_obj_id(self, inference_state, obj_id):
+ """
+ Locate GPU ID for a given object.
+ """
+ obj_ids_per_gpu = inference_state["tracker_metadata"]["obj_ids_per_gpu"]
+ for rank, obj_ids in enumerate(obj_ids_per_gpu):
+ if obj_id in obj_ids:
+ return rank
+ return None # object not found in any GPU
+
+ def _get_sam2_inference_states_by_obj_ids(self, inference_state, obj_ids):
+ """
+ Get the SAM2 inference states that contain the given object ids.
+ This is used to run partial SAM2 propagation on a single object/bucket.
+ Zero, one, or multiple states may be returned.
+ """
+ states = [
+ state
+ for state in inference_state["sam2_inference_states"]
+ if set(obj_ids) & set(state["obj_ids"])
+ ]
+ return states
+
+ def _prepare_backbone_feats(self, inference_state, frame_idx, reverse):
+ input_batch = inference_state["input_batch"]
+ feature_cache = inference_state["feature_cache"]
+ num_frames = inference_state["num_frames"]
+ geometric_prompt = (
+ inference_state["constants"]["empty_geometric_prompt"]
+ if inference_state["per_frame_geometric_prompt"][frame_idx] is None
+ else inference_state["per_frame_geometric_prompt"][frame_idx]
+ )
+ _ = self.run_backbone_and_detection(
+ frame_idx=frame_idx,
+ num_frames=num_frames,
+ reverse=reverse,
+ input_batch=input_batch,
+ geometric_prompt=geometric_prompt,
+ feature_cache=feature_cache,
+ )
+
+ @torch.inference_mode()
+ def add_prompt(
+ self,
+ inference_state,
+ frame_idx,
+ text_str=None,
+ clear_old_points=True,
+ points=None,
+ point_labels=None,
+ boxes_xywh=None,
+ box_labels=None,
+ clear_old_boxes=True,
+ output_prob_thresh=0.5,
+ obj_id=None,
+ rel_coordinates=True,
+ ):
+ if points is not None:
+ # SAM2 instance prompts
+ assert text_str is None and boxes_xywh is None, (
+ "When points are provided, text_str and boxes_xywh must be None."
+ )
+ assert obj_id is not None, (
+ "When points are provided, obj_id must be provided."
+ )
+ return self.add_sam2_new_points(
+ inference_state,
+ frame_idx,
+ obj_id=obj_id,
+ points=points,
+ labels=point_labels,
+ clear_old_points=clear_old_points,
+ rel_coordinates=rel_coordinates,
+ use_prev_mem_frame=self.use_prev_mem_frame,
+ )
+ else:
+ # SAM3 prompts — disable batched grounding for single-frame add_prompt
+ _orig_batched = self.use_batched_grounding
+ self.use_batched_grounding = False
+ try:
+ return super().add_prompt(
+ inference_state,
+ frame_idx,
+ text_str=text_str,
+ clear_old_points=clear_old_points,
+ points=points,
+ point_labels=point_labels,
+ boxes_xywh=boxes_xywh,
+ box_labels=box_labels,
+ clear_old_boxes=clear_old_boxes,
+ output_prob_thresh=output_prob_thresh,
+ )
+ finally:
+ self.use_batched_grounding = _orig_batched
+
+ @torch.inference_mode()
+ def add_sam2_new_points(
+ self,
+ inference_state,
+ frame_idx,
+ obj_id,
+ points,
+ labels,
+ clear_old_points,
+ rel_coordinates=True,
+ use_prev_mem_frame=False,
+ ):
+ """Add a new point prompt to SAM2. Supports refining an existing object (pass its
+ existing obj_id) or adding a new object (pass a new obj_id).
+ Set use_prev_mem_frame=False to disable cross-attention to previous memory frames.
+ Every GPU returns the same results, and the results should contain all masks, including
+ those not refined or added by the current user points.
+ """
+ assert obj_id is not None, "obj_id must be provided to add new points"
+ tracker_metadata = inference_state["tracker_metadata"]
+ if tracker_metadata == {}:
+ # initialize masklet metadata if it's uninitialized (empty dict)
+ tracker_metadata.update(self._initialize_metadata())
+
+ obj_rank = self._get_gpu_id_by_obj_id(inference_state, obj_id)
+
+ # prepare feature
+ self._prepare_backbone_feats(inference_state, frame_idx, reverse=False)
+
+ object_has_been_refined = self._has_object_been_refined(inference_state, obj_id)
+ if (
+ obj_rank is not None
+ and self.use_stateless_refinement
+ and not object_has_been_refined
+ ):
+ # The first time we start refinement on the object, we remove it.
+ logger.info(
+ f"[rank={self.rank}] Removing object {obj_id} before refinement."
+ )
+ self.remove_object(inference_state, obj_id, frame_idx=None, is_user_action=False)
+ obj_rank = None
+ elif obj_rank is not None and not object_has_been_refined:
+ # Extract the object into its own singleton inference state if it belongs to a batch
+ if self.rank == obj_rank and not self.tracker.per_obj_inference:
+ tracker_states = self._get_sam2_inference_states_by_obj_ids(
+ inference_state, [obj_id]
+ )
+ assert len(tracker_states) == 1
+ # Check if this is a batched state (contains multiple objects)
+ sam2_state = tracker_states[0]
+ if len(sam2_state["obj_ids"]) > 1:
+ logger.info(
+ f"[rank={self.rank}] Extracting object {obj_id} into singleton inference state."
+ )
+ self._extract_object_to_singleton_state(
+ inference_state, obj_id, obj_rank
+ )
+
+ if obj_rank is None:
+ # new object, we assign it a GPU and create a new inference state if limit allows
+ num_prev_obj = np.sum(tracker_metadata["num_obj_per_gpu"])
+ if num_prev_obj >= self.max_num_objects:
+ logger.warning(
+ f"add_sam2_new_points: cannot add a new object as we are already tracking {num_prev_obj=} "
+ f"masklets (under {self.max_num_objects=})"
+ )
+ return frame_idx, None
+
+ new_det_gpu_ids = self._assign_new_det_to_gpus(
+ new_det_num=1,
+ prev_workload_per_gpu=tracker_metadata["num_obj_per_gpu"],
+ )
+ obj_rank = new_det_gpu_ids[0]
+
+ # get sam2 inference state for the new object
+ if self.rank == obj_rank:
+ if self.tracker.per_obj_inference:
+ sam2_state = inference_state["sam2_inference_states"][0]
+ else:
+ # for batched inference, we create a new inference state
+ sam2_state = self._init_new_sam2_state(inference_state)
+ inference_state["sam2_inference_states"].append(sam2_state)
+
+ # update metadata
+ tracker_metadata["obj_ids_per_gpu"][obj_rank] = np.concatenate(
+ [
+ tracker_metadata["obj_ids_per_gpu"][obj_rank],
+ np.array([obj_id], dtype=np.int64),
+ ]
+ )
+ tracker_metadata["num_obj_per_gpu"][obj_rank] = len(
+ tracker_metadata["obj_ids_per_gpu"][obj_rank]
+ )
+ tracker_metadata["obj_ids_all_gpu"] = np.concatenate(
+ tracker_metadata["obj_ids_per_gpu"]
+ )
+ tracker_metadata["max_obj_id"] = max(tracker_metadata["max_obj_id"], obj_id)
+
+ logger.info(
+ f"[rank={self.rank}] Adding new object with id {obj_id} at frame {frame_idx}."
+ )
+ self.add_action_history(
+ inference_state, "add", frame_idx=frame_idx, obj_ids=[obj_id]
+ )
+ else:
+ # existing object, for refinement
+ if self.rank == obj_rank:
+ tracker_states = self._get_sam2_inference_states_by_obj_ids(
+ inference_state, [obj_id]
+ )
+ assert len(tracker_states) == 1, (
+ f"[rank={self.rank}] Expected exactly one SAM2 inference state for object {obj_id}, found {len(tracker_states)}."
+ )
+ sam2_state = tracker_states[0]
+
+ # log
+ logger.info(
+ f"[rank={self.rank}] Refining existing object with id {obj_id} at frame {frame_idx}."
+ )
+ self.add_action_history(
+ inference_state, "refine", frame_idx=frame_idx, obj_ids=[obj_id]
+ )
+
+ # assign higher score to added/refined object
+ tracker_metadata["obj_id_to_score"][obj_id] = 1.0
+ tracker_metadata["obj_id_to_sam2_score_frame_wise"][frame_idx][obj_id] = (
+ torch.tensor(1.0, dtype=torch.float32, device=self.device)
+ )
+
+ if self.rank == 0:
+ rank0_metadata = tracker_metadata.get("rank0_metadata", {})
+
+ if "removed_obj_ids" in rank0_metadata:
+ rank0_metadata["removed_obj_ids"].discard(obj_id)
+
+ if "suppressed_obj_ids" in rank0_metadata:
+ for frame_id in rank0_metadata["suppressed_obj_ids"]:
+ rank0_metadata["suppressed_obj_ids"][frame_id].discard(obj_id)
+
+ if "masklet_confirmation" in rank0_metadata:
+ obj_ids_all_gpu = tracker_metadata["obj_ids_all_gpu"]
+ obj_indices = np.where(obj_ids_all_gpu == obj_id)[0]
+ if len(obj_indices) > 0:
+ obj_idx = obj_indices[0]
+ if obj_idx < len(rank0_metadata["masklet_confirmation"]["status"]):
+ rank0_metadata["masklet_confirmation"]["status"][obj_idx] = 1
+ rank0_metadata["masklet_confirmation"]["consecutive_det_num"][
+ obj_idx
+ ] = self.masklet_confirmation_consecutive_det_thresh
+
+ if self.rank == obj_rank:
+ should_fallback_to_original_mask = (
+ len(points) == 0 and inference_state["is_image_only"]
+ )
+ if should_fallback_to_original_mask:
+ mask_input = self._get_mask_input(sam2_state, frame_idx, obj_id)
+ if mask_input is None or 0 in mask_input.shape:
+ logger.warning(
+ f"Cannot retrieve original mask input for obj_id {obj_id} at frame {frame_idx} to fallback."
+ )
+ should_fallback_to_original_mask = False
+ if should_fallback_to_original_mask:
+ # When the user removes all points on an image, we recover the original mask
+ # by re-feeding the detector mask (mask_input, retrieved above) to SAM2.
+ # clear out states related to this object to have a fresh start
+ self.tracker.clear_all_points_in_frame(
+ sam2_state, frame_idx, obj_id, need_output=False
+ )
+ frame_idx, obj_ids, low_res_masks, video_res_masks = (
+ self.tracker.add_new_mask(
+ sam2_state,
+ frame_idx,
+ obj_id,
+ mask_input,
+ )
+ )
+ else:
+ frame_idx, obj_ids, low_res_masks, video_res_masks = (
+ self.tracker.add_new_points(
+ inference_state=sam2_state,
+ frame_idx=frame_idx,
+ obj_id=obj_id,
+ points=points,
+ labels=labels,
+ clear_old_points=clear_old_points,
+ rel_coordinates=rel_coordinates,
+ use_prev_mem_frame=use_prev_mem_frame,
+ )
+ )
+
+ if video_res_masks is not None and len(video_res_masks) > 0:
+ video_res_masks = fill_holes_in_mask_scores(
+ video_res_masks, # shape (N, 1, H_video, W_video)
+ fill_hole_area=self.fill_hole_area,
+ sprinkle_removal_area=self.sprinkle_removal_area,
+ fill_holes=True,
+ remove_sprinkles=True,
+ )
+
+ # TODO: will this cause issues when the user switches to refining another object,
+ # since the memory encoder has already run for the current input points?
+ # FIX: Synchronize consolidated_frame_inds with actual point/mask
+ # inputs before propagate_in_video_preflight. Two issues can cause
+ # the `all_consolidated_frame_inds == input_frames_inds` assertion
+ # to fail:
+ # 1) VG detector conditioning frames in mask_inputs_per_obj without
+ # corresponding point inputs (stale VG entries).
+ # 2) Previously consolidated point-input frames (from earlier
+ # add_points) whose consolidated_frame_inds entries were lost
+ # during subsequent propagation.
+ # We fix both by: (a) clearing mask-only inputs, (b) rebuilding
+ # consolidated_frame_inds from the remaining inputs, excluding
+ # temp output frames (which preflight will add itself).
+
+ # (a) Clear detector-only mask inputs
+ for _obj_idx in list(sam2_state["mask_inputs_per_obj"].keys()):
+ _point_frames = set(
+ sam2_state["point_inputs_per_obj"].get(_obj_idx, {}).keys()
+ )
+ _mask_only_frames = [
+ f
+ for f in list(sam2_state["mask_inputs_per_obj"][_obj_idx].keys())
+ if f not in _point_frames
+ ]
+ for f in _mask_only_frames:
+ sam2_state["mask_inputs_per_obj"][_obj_idx].pop(f, None)
+
+ # (b) Rebuild consolidated_frame_inds from remaining inputs
+ _input_frames = set()
+ for _oi in sam2_state["point_inputs_per_obj"]:
+ _input_frames.update(sam2_state["point_inputs_per_obj"][_oi].keys())
+ for _oi in sam2_state["mask_inputs_per_obj"]:
+ _input_frames.update(sam2_state["mask_inputs_per_obj"][_oi].keys())
+ # Exclude temp output frames — preflight will consolidate those
+ _temp_frames = set()
+ for _obj_temp in sam2_state["temp_output_dict_per_obj"].values():
+ _temp_frames.update(_obj_temp["cond_frame_outputs"].keys())
+ _temp_frames.update(_obj_temp["non_cond_frame_outputs"].keys())
+ _prev_frames = _input_frames - _temp_frames
+ _cond = set()
+ _non_cond = set()
+ for f in _prev_frames:
+ if f in sam2_state["output_dict"].get("cond_frame_outputs", {}):
+ _cond.add(f)
+ else:
+ _non_cond.add(f)
+ sam2_state["consolidated_frame_inds"] = {
+ "cond_frame_outputs": _cond,
+ "non_cond_frame_outputs": _non_cond,
+ }
+ self.tracker.propagate_in_video_preflight(sam2_state, run_mem_encoder=True)
+ if not inference_state["is_image_only"]:
+ # Clear detector conditioning frames when user clicks are received so that the
+ # model can update masks on these frames. This is a no-op if the user is refining
+ # the detector conditioning frames themselves or adding new objects.
+ self.clear_detector_added_cond_frame_in_sam2(
+ sam2_state, obj_id, frame_idx
+ )
+
+ # fetch results from states and gather across GPUs
+ # Use optimized caching approach to avoid reprocessing unmodified objects
+ if self.rank == obj_rank and len(obj_ids) > 0:
+ new_mask_data = (video_res_masks[obj_ids.index(obj_id)] > 0.0).to(
+ torch.bool
+ )
+ else:
+ new_mask_data = None
+
+ # Broadcast the new mask data across all ranks for consistency
+ if self.world_size > 1:
+ data_list = [new_mask_data]
+ self.broadcast_python_obj_cpu(data_list, src=obj_rank)
+ new_mask_data = data_list[0]
+
+ if self.rank == 0:
+ obj_id_to_mask = self._build_sam2_output(
+ inference_state,
+ frame_idx,
+ {obj_id: new_mask_data} if new_mask_data is not None else None,
+ )
+ # post processing - remove suppressed obj_ids
+ obj_id_to_score = tracker_metadata["obj_id_to_score"]
+ suppressed_obj_ids = tracker_metadata["rank0_metadata"][
+ "suppressed_obj_ids"
+ ][frame_idx]
+ obj_id_to_sam2_score = tracker_metadata["obj_id_to_sam2_score_frame_wise"][
+ frame_idx
+ ]
+
+ out = {
+ "obj_id_to_mask": obj_id_to_mask,
+ "obj_id_to_score": obj_id_to_score,
+ "obj_id_to_sam2_score": obj_id_to_sam2_score,
+ }
+ self._cache_frame_outputs(
+ inference_state,
+ frame_idx,
+ obj_id_to_mask,
+ suppressed_obj_ids=suppressed_obj_ids,
+ )
+ return frame_idx, self._postprocess_output(
+ inference_state, out, suppressed_obj_ids=suppressed_obj_ids
+ )
+ else:
+ return frame_idx, None # no output on other GPUs
+
+ def _get_mask_input(self, inference_state, frame_idx, obj_id):
+ """Get the mask input for a specific object on a specific frame."""
+ obj_idx = self.tracker._obj_id_to_idx(inference_state, obj_id)
+ mask_inputs_per_frame = inference_state["mask_inputs_per_obj"][obj_idx]
+ if frame_idx not in mask_inputs_per_frame:
+ logger.info(
+ f"frame {frame_idx} not in mask_inputs_per_frame for obj_id {obj_id}"
+ )
+ return None
+
+ mask_inputs_orig = mask_inputs_per_frame[frame_idx].squeeze(0, 1) # (H, W)
+ return mask_inputs_orig
+
+ def _gather_obj_id_to_mask_across_gpus(self, inference_state, obj_id_to_mask_local):
+ """Gather low-res masks from obj_id_to_mask_local on all GPUs into one (num_global_obj, H_mask, W_mask) tensor. Masks are not resized to video resolution."""
+ tracker_metadata = inference_state["tracker_metadata"]
+
+ # concatenate the output masklets from all local inference states
+ H_mask = W_mask = self.tracker.low_res_mask_size
+ obj_ids_local = tracker_metadata["obj_ids_per_gpu"][self.rank]
+ low_res_masks_local = []
+ for obj_id in obj_ids_local:
+ if obj_id in obj_id_to_mask_local:
+ low_res_masks_local.append(obj_id_to_mask_local[obj_id])
+ else:
+ low_res_masks_local.append(
+ torch.full((H_mask, W_mask), -1024.0, device=self.device)
+ )
+ if len(low_res_masks_local) > 0:
+ low_res_masks_local = torch.stack(low_res_masks_local, dim=0) # (N, H, W)
+ assert low_res_masks_local.shape[1:] == (H_mask, W_mask)
+ else:
+ low_res_masks_local = torch.zeros(0, H_mask, W_mask, device=self.device)
+
+ # all-gather `low_res_masks_local` into `low_res_masks_global`
+ # - low_res_masks_global: Tensor -- (num_global_obj, H_mask, W_mask)
+ if self.world_size > 1:
+ low_res_masks_local = low_res_masks_local.float().contiguous()
+ low_res_masks_peers = [
+ low_res_masks_local.new_empty(num_obj, H_mask, W_mask)
+ for num_obj in tracker_metadata["num_obj_per_gpu"]
+ ]
+ dist.all_gather(low_res_masks_peers, low_res_masks_local)
+ low_res_masks_global = torch.cat(low_res_masks_peers, dim=0)
+ else:
+ low_res_masks_global = low_res_masks_local
+ return low_res_masks_global
+
+ def _convert_low_res_mask_to_video_res(self, low_res_mask, inference_state):
+ """
+ Convert a low-res mask to video resolution, matching the format expected by _build_sam2_output.
+
+ Args:
+ low_res_mask: Tensor of shape (H_low_res, W_low_res)
+ inference_state: Contains video dimensions
+
+ Returns:
+ video_res_mask: Tensor of shape (1, H_video, W_video) bool
+ """
+ if low_res_mask is None:
+ return None
+
+ # Add batch and channel dims for interpolation: (H_low_res, W_low_res) -> (1, 1, H_low_res, W_low_res)
+ low_res_mask_3d = low_res_mask.unsqueeze(0).unsqueeze(0)
+
+ # Get video dimensions
+ H_video = inference_state["orig_height"]
+ W_video = inference_state["orig_width"]
+
+ video_res_mask = F.interpolate(
+ low_res_mask_3d.float(),
+ size=(H_video, W_video),
+ mode="bilinear",
+ align_corners=False,
+ ) # (1, 1, H_video, W_video)
+
+ # Drop the leading batch dim and threshold logits at 0: (1, H_video, W_video) bool
+ return (video_res_mask.squeeze(0) > 0.0).to(torch.bool)
+
+ def clear_detector_added_cond_frame_in_sam2(
+ self, sam2_state, obj_id, refined_frame_idx
+ ):
+ """Clear detector-added (mask-only) conditioning frames within a predefined window
+ of the refined frame. This allows the model to update masks on these frames."""
+ obj_idx = self.tracker._obj_id_to_idx(sam2_state, obj_id)
+
+ mask_only_cond_frame_indices = []
+ window = self.refinement_detector_cond_frame_removal_window
+ for frame_idx in sam2_state["mask_inputs_per_obj"][obj_idx]:
+ if frame_idx not in sam2_state["point_inputs_per_obj"][obj_idx]:
+ # clear conditioning frames within a window of the refined frame
+ if abs(frame_idx - refined_frame_idx) <= window:
+ mask_only_cond_frame_indices.append(frame_idx)
+
+ # clear
+ if len(mask_only_cond_frame_indices) > 0:
+ for frame_idx in mask_only_cond_frame_indices:
+ # obj_ids_on_this_frame is essentially all obj_ids in the state
+ # since they are bucket batched
+ obj_ids_on_this_frame = sam2_state["obj_id_to_idx"].keys()
+ for obj_id2 in obj_ids_on_this_frame:
+ self.tracker.clear_all_points_in_frame(
+ sam2_state, frame_idx, obj_id2, need_output=False
+ )
+ logger.info(
+ f"Cleared detector mask only conditioning frames ({mask_only_cond_frame_indices}) in SAM2."
+ )
+ return
+
+ def _extract_object_to_singleton_state(self, inference_state, obj_id, obj_rank):
+ """
+ Extract an object from a batched inference state into its own singleton state.
+ """
+ if self.rank != obj_rank:
+ return
+
+ tracker_states_local = inference_state["sam2_inference_states"]
+
+ # Find the inference state containing this object
+ source_state = None
+ source_state_idx = None
+ for idx, state in enumerate(tracker_states_local):
+ if obj_id in state["obj_ids"]:
+ source_state = state
+ source_state_idx = idx
+ break
+
+ assert source_state is not None
+
+ if len(source_state["obj_ids"]) <= 1:
+ # Already a singleton state; nothing to extract
+ return
+
+ # Step 1: Extract all the object's state data before removing it
+ obj_idx_in_source = source_state["obj_id_to_idx"][obj_id]
+ multiplex_state = source_state.get("multiplex_state")
+
+ # Extract consolidated outputs (obj_ptr, maskmem_features, etc.) BEFORE
+ # remove_object modifies the source tensors.
+ singleton_consolidated_outputs = {
+ "cond_frame_outputs": {},
+ "non_cond_frame_outputs": {},
+ }
+ if "output_dict" in source_state:
+ for storage_key in ["cond_frame_outputs", "non_cond_frame_outputs"]:
+ source_outputs = source_state["output_dict"].get(storage_key, {})
+ for f_idx, source_frame_out in source_outputs.items():
+ if source_frame_out["pred_masks"].shape[0] < obj_idx_in_source + 1:
+ continue
+ singleton_frame_out = {
+ "pred_masks": source_frame_out["pred_masks"][
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone(),
+ "object_score_logits": source_frame_out["object_score_logits"][
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone(),
+ "image_features": source_frame_out.get("image_features"),
+ "image_pos_enc": source_frame_out.get("image_pos_enc"),
+ "local_obj_id_to_idx": {obj_id: 0},
+ }
+ # Extract maskmem_features (demux from multiplex space)
+ maskmem_features = source_frame_out.get("maskmem_features")
+ if maskmem_features is not None and multiplex_state is not None:
+ try:
+ demuxed = multiplex_state.demux(maskmem_features)
+ maskmem_features = demuxed[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ except (AssertionError, IndexError):
+ maskmem_features = None
+ elif maskmem_features is not None:
+ maskmem_features = maskmem_features[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ singleton_frame_out["maskmem_features"] = maskmem_features
+ # Extract maskmem_pos_enc (demux level by level)
+ maskmem_pos_enc = source_frame_out.get("maskmem_pos_enc")
+ if maskmem_pos_enc is not None:
+ remapped = []
+ for level_enc in maskmem_pos_enc:
+ if level_enc is None:
+ remapped.append(None)
+ continue
+ if multiplex_state is not None:
+ try:
+ demuxed = multiplex_state.demux(level_enc)
+ remapped.append(
+ demuxed[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ )
+ except (AssertionError, IndexError):
+ remapped.append(None)
+ else:
+ remapped.append(
+ level_enc[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ )
+ maskmem_pos_enc = remapped
+ singleton_frame_out["maskmem_pos_enc"] = maskmem_pos_enc
+ # Extract obj_ptr (demux from multiplex space)
+ if (
+ "obj_ptr" in source_frame_out
+ and self.tracker.use_obj_ptrs_in_encoder
+ ):
+ source_obj_ptr = source_frame_out["obj_ptr"]
+ if multiplex_state is not None:
+ obj_ptr_data = multiplex_state.demux(source_obj_ptr)
+ singleton_frame_out["obj_ptr"] = obj_ptr_data[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ else:
+ singleton_frame_out["obj_ptr"] = source_obj_ptr[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ # Extract conditioning_objects
+ if "conditioning_objects" in source_frame_out:
+ if (
+ obj_idx_in_source
+ in source_frame_out["conditioning_objects"]
+ ):
+ singleton_frame_out["conditioning_objects"] = {0}
+ else:
+ singleton_frame_out["conditioning_objects"] = set()
+ singleton_consolidated_outputs[storage_key][f_idx] = (
+ singleton_frame_out
+ )
+
+ # Extract point and mask inputs for this object
+ extracted_point_inputs = {}
+ extracted_mask_inputs = {}
+
+ if (
+ "point_inputs_per_obj" in source_state
+ and obj_idx_in_source in source_state["point_inputs_per_obj"]
+ ):
+ extracted_point_inputs = source_state["point_inputs_per_obj"][
+ obj_idx_in_source
+ ].copy()
+
+ if (
+ "mask_inputs_per_obj" in source_state
+ and obj_idx_in_source in source_state["mask_inputs_per_obj"]
+ ):
+ extracted_mask_inputs = source_state["mask_inputs_per_obj"][
+ obj_idx_in_source
+ ].copy()
+
+ # Extract per-object outputs - these are already properly sliced for the object
+ extracted_obj_cond_outputs = {}
+ extracted_obj_non_cond_outputs = {}
+ extracted_temp_cond_outputs = {}
+ extracted_temp_non_cond_outputs = {}
+
+ if (
+ "output_dict_per_obj" in source_state
+ and obj_idx_in_source in source_state["output_dict_per_obj"]
+ ):
+ obj_output_dict = source_state["output_dict_per_obj"][obj_idx_in_source]
+ extracted_obj_cond_outputs = obj_output_dict.get(
+ "cond_frame_outputs", {}
+ ).copy()
+ cond_input_keys = (
+ extracted_point_inputs.keys() | extracted_mask_inputs.keys()
+ )
+ # output_dict_per_obj may hold cond outputs for frames conditioned by other objects in the batch, so keep only frames with this object's own inputs
+ extracted_obj_cond_outputs = {
+ k: v
+ for k, v in extracted_obj_cond_outputs.items()
+ if k in cond_input_keys
+ }
+
+ extracted_obj_non_cond_outputs = obj_output_dict.get(
+ "non_cond_frame_outputs", {}
+ ).copy()
+
+ if (
+ "temp_output_dict_per_obj" in source_state
+ and obj_idx_in_source in source_state["temp_output_dict_per_obj"]
+ ):
+ temp_obj_output_dict = source_state["temp_output_dict_per_obj"][
+ obj_idx_in_source
+ ]
+ extracted_temp_cond_outputs = temp_obj_output_dict.get(
+ "cond_frame_outputs", {}
+ ).copy()
+ extracted_temp_non_cond_outputs = temp_obj_output_dict.get(
+ "non_cond_frame_outputs", {}
+ ).copy()
+
+ # Step 2: Remove the object from the source state
+ remaining_obj_ids, _ = self.tracker.remove_object(
+ source_state, obj_id, strict=False, need_output=False
+ )
+
+ # Step 3: Create a new singleton inference state
+ new_sam2_state = self.tracker.init_state(
+ cached_features=inference_state["feature_cache"],
+ video_height=inference_state["orig_height"],
+ video_width=inference_state["orig_width"],
+ num_frames=inference_state["num_frames"],
+ )
+
+ # Step 4: Set up the singleton state structure for the extracted object
+ # Map the object to index 0 in the new singleton state
+ new_sam2_state["obj_id_to_idx"] = {obj_id: 0}
+ new_sam2_state["obj_idx_to_id"] = {0: obj_id}
+ new_sam2_state["obj_ids"] = [obj_id]
+
+ # Step 5: Restore all the extracted state
+ # Restore point and mask inputs
+ new_sam2_state["point_inputs_per_obj"] = {0: extracted_point_inputs}
+ new_sam2_state["mask_inputs_per_obj"] = {0: extracted_mask_inputs}
+
+ # Restore per-object output dictionaries (already properly sliced)
+ new_sam2_state["output_dict_per_obj"] = {
+ 0: {
+ "cond_frame_outputs": extracted_obj_cond_outputs,
+ "non_cond_frame_outputs": extracted_obj_non_cond_outputs,
+ }
+ }
+
+ # Restore temporary outputs
+ new_sam2_state["temp_output_dict_per_obj"] = {
+ 0: {
+ "cond_frame_outputs": extracted_temp_cond_outputs,
+ "non_cond_frame_outputs": extracted_temp_non_cond_outputs,
+ }
+ }
+
+ # Step 6: Rebuild the consolidated output_dict for the singleton state
+ # Use the extracted consolidated outputs which include obj_ptr,
+ # maskmem_features, maskmem_pos_enc (not just pred_masks/object_score_logits)
+
+ # Create a singleton multiplex state and remux the extracted obj_ptr tensors into it
+ new_multiplex_state = self.tracker.multiplex_controller.get_state(
+ num_valid_entries=1,
+ device=source_state.get("device", "cuda"),
+ dtype=torch.float32,
+ random=False,
+ object_ids=[obj_id],
+ )
+ new_sam2_state["multiplex_state"] = new_multiplex_state
+
+ for storage_key in ["cond_frame_outputs", "non_cond_frame_outputs"]:
+ for f_idx, frame_out in singleton_consolidated_outputs[storage_key].items():
+ if frame_out.get("maskmem_features") is not None:
+ frame_out["maskmem_features"] = frame_out[
+ "maskmem_features"
+ ].clone()
+ if frame_out.get("maskmem_pos_enc") is not None:
+ frame_out["maskmem_pos_enc"] = [
+ level.clone() if level is not None else None
+ for level in frame_out["maskmem_pos_enc"]
+ ]
+ if "obj_ptr" in frame_out and self.tracker.use_obj_ptrs_in_encoder:
+ frame_out["obj_ptr"] = new_multiplex_state.mux(frame_out["obj_ptr"])
+
+ new_sam2_state["output_dict"] = singleton_consolidated_outputs
+
+ # Step 7: Copy other important state if it exists
+ for key in [
+ "first_ann_frame_idx",
+ "tracking_has_started",
+ ]:
+ if key in source_state:
+ new_sam2_state[key] = source_state[key]
+
+ # Leave consolidated_frame_inds empty so preflight reconstructs from per-obj data
+ new_sam2_state["consolidated_frame_inds"] = {
+ "cond_frame_outputs": set(),
+ "non_cond_frame_outputs": set(),
+ }
+
+ # Step 8: Add the new singleton state to the list
+ tracker_states_local.append(new_sam2_state)
+
+ # Step 9: If the source state is now empty, remove it
+ if len(remaining_obj_ids) == 0:
+ tracker_states_local.pop(source_state_idx)
+ logger.info(
+ f"Removed empty inference state after extracting object {obj_id}"
+ )
+
+ logger.info(f"Object {obj_id} successfully extracted to singleton state")
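Steps (a) and (b) of the `consolidated_frame_inds` fix in the refinement method above can be sketched on plain dictionaries. This is a minimal sketch, assuming the tracker state is shaped the way the code indexes it (`point_inputs_per_obj`, `mask_inputs_per_obj`, `temp_output_dict_per_obj`, `output_dict`), with string placeholders standing in for the real point and mask tensors. `rebuild_consolidated_frame_inds` is a hypothetical helper for illustration, not a function in this patch.

```python
from typing import Dict, Set


def rebuild_consolidated_frame_inds(state: Dict) -> Dict[str, Set[int]]:
    """Hypothetical sketch: drop mask-only inputs, then rebuild
    consolidated_frame_inds from the inputs that remain."""
    point_inputs = state["point_inputs_per_obj"]
    mask_inputs = state["mask_inputs_per_obj"]

    # (a) Drop mask inputs on frames where the object has no point inputs
    # (these are the detector-only conditioning entries).
    for obj_idx, frames in mask_inputs.items():
        point_frames = set(point_inputs.get(obj_idx, {}))
        for f in [f for f in frames if f not in point_frames]:
            frames.pop(f)

    # (b) Collect every frame that still carries a point or mask input.
    input_frames: Set[int] = set()
    for per_obj in (point_inputs, mask_inputs):
        for frames in per_obj.values():
            input_frames.update(frames)

    # Exclude frames still held in temp outputs; preflight consolidates those.
    temp_frames: Set[int] = set()
    for obj_temp in state["temp_output_dict_per_obj"].values():
        temp_frames.update(obj_temp["cond_frame_outputs"])
        temp_frames.update(obj_temp["non_cond_frame_outputs"])

    # Split the remaining frames by whether a consolidated cond output exists.
    cond_outputs = state["output_dict"].get("cond_frame_outputs", {})
    prev_frames = input_frames - temp_frames
    return {
        "cond_frame_outputs": {f for f in prev_frames if f in cond_outputs},
        "non_cond_frame_outputs": {f for f in prev_frames if f not in cond_outputs},
    }


if __name__ == "__main__":
    demo = {
        "point_inputs_per_obj": {0: {3: "pts"}},
        "mask_inputs_per_obj": {0: {3: "mask", 7: "detector_mask"}},
        "temp_output_dict_per_obj": {
            0: {"cond_frame_outputs": {}, "non_cond_frame_outputs": {}}
        },
        "output_dict": {"cond_frame_outputs": {3: "out"}},
    }
    # Frame 7 is mask-only, so it is dropped; frame 3 stays a cond frame.
    print(rebuild_consolidated_frame_inds(demo))
```

Excluding temp-output frames mirrors the original comment that `propagate_in_video_preflight` consolidates those frames itself, so rebuilding them here would double-count them.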
diff --git a/sam3/model/sam3_multiplex_video_predictor.py b/sam3/model/sam3_multiplex_video_predictor.py
new file mode 100644
index 0000000..b86c7f1
--- /dev/null
+++ b/sam3/model/sam3_multiplex_video_predictor.py
@@ -0,0 +1,63 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
+
+# pyre-unsafe
+
+"""
+Sam3MultiplexVideoPredictor — user-facing entry point for SAM 3.1 multiplex.
+
+Ported from onevision Sam3Model (webdemo/ta/models/sam3_model.py).
+Handles warm-up compilation, bf16 autocast, and session management
+via the shared Sam3BasePredictor handle_request/handle_stream_request API.
+"""
+
+from typing import Dict, Optional
+
+import torch
+from sam3.logger import get_logger
+from sam3.model.sam3_base_predictor import Sam3BasePredictor
+
+logger = get_logger(__name__)
+
+
+class Sam3MultiplexVideoPredictor(Sam3BasePredictor):
+ """
+ User-facing predictor for SAM 3.1 multiplex video tracking.
+
+ Wraps Sam3MultiplexTrackingWithInteractivity with:
+ - bf16 autocast
+ - Warm-up compilation (when compile=True)
+ - Session expiration management
+ - handle_request / handle_stream_request dispatch API (from Sam3BasePredictor)
+ """
+
+ def __init__(
+ self,
+ model,
+ session_expiration_sec=1200,
+ default_output_prob_thresh=0.5,
+ async_loading_frames=True,
+ warm_up=False,
+ ):
+ super().__init__()
+ self.model = model
+ self.session_expiration_sec = session_expiration_sec
+ self.default_output_prob_thresh = default_output_prob_thresh
+ self.async_loading_frames = async_loading_frames
+
+ # turn on TensorFloat-32 (TF32) for Ampere GPUs
+ torch.backends.cuda.matmul.allow_tf32 = True
+ torch.backends.cudnn.allow_tf32 = True
+ # use bfloat16 inference for Flash Attention kernel
+ self.bf16_context = torch.autocast(device_type="cuda", dtype=torch.bfloat16)
+ self.bf16_context.__enter__()
+
+ if warm_up:
+ self.model._warm_up_complete = False
+ self.model.warm_up_compilation()
+ self.model._warm_up_complete = True
+
+ def _extend_expiration_time(self, session):
+ """Update last-use time and store session expiration timeout."""
+ super()._extend_expiration_time(session)
+ if self.session_expiration_sec:
+ session["expiration_sec"] = self.session_expiration_sec
diff --git a/sam3/model/sam3_tracker_utils.py b/sam3/model/sam3_tracker_utils.py
index e971dac..74e9e12 100644
--- a/sam3/model/sam3_tracker_utils.py
+++ b/sam3/model/sam3_tracker_utils.py
@@ -367,7 +367,17 @@ def get_best_gt_match_from_multimasks(pred_multimasks, gt_masks, pred_scores=Non
return best_pred_mask
-def fill_holes_in_mask_scores(mask, max_area, fill_holes=True, remove_sprinkles=True):
+def fill_holes_in_mask_scores(
+ mask,
+ max_area=None,
+ fill_holes=True,
+ remove_sprinkles=True,
+ fill_hole_area=None,
+ sprinkle_removal_area=None,
+):
+ # Support onevision-style keyword args: fill_hole_area is an alias for max_area
+ if fill_hole_area is not None and max_area is None:
+ max_area = fill_hole_area
"""
A post processor to fill small holes in mask scores with area under `max_area`.
Holes are those small connected components in either background or foreground.
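Python treats a string literal as a docstring only when it is the first statement of the function body. In the `fill_holes_in_mask_scores` hunk above, the alias-handling lines are inserted ahead of the existing `"""..."""` block, so that block becomes a plain expression statement and `fill_holes_in_mask_scores.__doc__` would be `None`. Below is a minimal sketch of the same alias rule (an explicit `max_area` wins; otherwise `fill_hole_area` is used) with the docstring kept first. `resolve_max_area` and `displaced_docstring` are hypothetical helpers for illustration, not part of `sam3`.

```python
from typing import Optional


def resolve_max_area(
    max_area: Optional[int] = None, fill_hole_area: Optional[int] = None
) -> Optional[int]:
    """Return the hole-area threshold, accepting the onevision-style alias."""
    # The alias handling comes after the docstring, so __doc__ is preserved.
    if fill_hole_area is not None and max_area is None:
        max_area = fill_hole_area
    return max_area


def displaced_docstring(max_area=None):
    max_area = max_area or 0
    """A string literal after another statement is not a docstring."""
    return max_area


if __name__ == "__main__":
    print(resolve_max_area(fill_hole_area=8))              # alias is used
    print(resolve_max_area(max_area=4, fill_hole_area=8))  # max_area wins
    print(displaced_docstring.__doc__)                     # docstring was lost
```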
diff --git a/sam3/model/sam3_video_base.py b/sam3/model/sam3_video_base.py
index 8780f1a..e3b40ca 100644
--- a/sam3/model/sam3_video_base.py
+++ b/sam3/model/sam3_video_base.py
@@ -8,8 +8,9 @@
import os
from collections import defaultdict
from copy import deepcopy
+from dataclasses import dataclass
from enum import Enum
-from typing import Any, Dict, List, Set
+from typing import Any, Dict, List, Optional, Set, Tuple
import numpy as np
import numpy.typing as npt
@@ -22,7 +23,7 @@
from sam3.model.data_misc import BatchedDatapoint
from sam3.model.sam3_tracker_utils import fill_holes_in_mask_scores, mask_to_box
from sam3.perflib.masks_ops import mask_iou
-from sam3.train.masks_ops import rle_encode
+from sam3.train.masks_ops import mask_iom, rle_encode
from torch import nn, Tensor
logger = get_logger(__name__)
@@ -33,6 +34,228 @@ class MaskletConfirmationStatus(Enum):
CONFIRMED = 2 # confirmed by at least one detection
+@dataclass
+class RealizedAssociateDetTrkresult:
+ new_det_fa_inds: np.ndarray
+ unmatched_trk_obj_ids: np.ndarray
+ det_to_matched_trk_obj_ids: Dict[int, np.ndarray]
+ trk_id_to_max_iou_high_conf_det: Dict[int, int]
+ empty_trk_obj_ids: np.ndarray
+ new_det_obj_ids: Optional[np.ndarray] = None
+ new_det_gpu_ids: Optional[np.ndarray] = None
+ num_obj_dropped_due_to_limit: Optional[int] = None
+
+ def get_new_det_gpu_ids(
+ self, tracker_metadata_prev, is_image_only, det_scores, tracking_obj
+ ):
+ with torch.profiler.record_function("get_new_det_gpu_ids"):
+ if self.new_det_obj_ids is None:
+ det_scores_np: np.ndarray = det_scores.cpu().numpy()
+ prev_obj_num = np.sum(tracker_metadata_prev["num_obj_per_gpu"])
+ new_det_num = len(self.new_det_fa_inds)
+ num_obj_dropped_due_to_limit = 0
+ if (
+ not is_image_only
+ and prev_obj_num + new_det_num > tracking_obj.max_num_objects
+ ):
+ logger.warning(
+ f"hitting {tracking_obj.max_num_objects=} with {new_det_num=} and {prev_obj_num=}"
+ )
+ new_det_num_to_keep = tracking_obj.max_num_objects - prev_obj_num
+ num_obj_dropped_due_to_limit = new_det_num - new_det_num_to_keep
+ self.new_det_fa_inds = tracking_obj._drop_new_det_with_obj_limit(
+ self.new_det_fa_inds, det_scores_np, new_det_num_to_keep
+ )
+ assert len(self.new_det_fa_inds) == new_det_num_to_keep
+ new_det_num = len(self.new_det_fa_inds)
+ new_det_start_obj_id = tracker_metadata_prev["max_obj_id"] + 1
+ new_det_obj_ids = new_det_start_obj_id + np.arange(new_det_num)
+ if tracking_obj.is_multiplex:
+ prev_workload_per_gpu = tracker_metadata_prev["num_buc_per_gpu"]
+ else:
+ prev_workload_per_gpu = tracker_metadata_prev["num_obj_per_gpu"]
+ new_det_gpu_ids = tracking_obj._assign_new_det_to_gpus(
+ new_det_num=new_det_num,
+ prev_workload_per_gpu=prev_workload_per_gpu,
+ )
+ self.new_det_obj_ids = new_det_obj_ids
+ self.new_det_gpu_ids = new_det_gpu_ids
+ self.num_obj_dropped_due_to_limit = num_obj_dropped_due_to_limit
+ return (
+ self.new_det_obj_ids,
+ self.new_det_gpu_ids,
+ self.num_obj_dropped_due_to_limit,
+ )
+
+
+def realize_adt_result(adt_lazy_result, tracker_metadata_prev, det_mask_preds):
+ if isinstance(adt_lazy_result, LazyAssociateDetTrkResult):
+ adt_lazy_result._convert_to_numpy()
+ return adt_lazy_result._create_cpu_metadata(
+ tracker_metadata_prev["obj_ids_all_gpu"], det_mask_preds
+ )
+ return adt_lazy_result
+
+
+class LazyAssociateDetTrkResult:
+ def __init__(
+ self,
+ trk_is_unmatched: Tensor,
+ trk_is_nonempty: Tensor,
+ is_new_det: Tensor,
+ det_to_max_iou_trk_idx: Tensor,
+ det_is_high_conf: Tensor,
+ det_is_high_iou: Tensor,
+ det_keep: Tensor,
+ im_mask: Tensor,
+ ):
+ self.trk_is_unmatched = trk_is_unmatched
+ self.trk_is_nonempty = trk_is_nonempty
+ self.is_new_det = is_new_det
+ self.det_to_max_iou_trk_idx = det_to_max_iou_trk_idx
+ self.det_is_high_conf = det_is_high_conf
+ self.det_is_high_iou = det_is_high_iou
+ self.det_keep = det_keep
+ self.im_mask = im_mask
+
+ def _convert_to_numpy(self):
+ with torch.profiler.record_function("Convert to numpy"):
+ self.trk_is_unmatched = self.trk_is_unmatched.cpu().numpy()
+ self.trk_is_nonempty = self.trk_is_nonempty.cpu().numpy()
+ self.is_new_det = self.is_new_det.cpu().numpy()
+ self.det_to_max_iou_trk_idx = self.det_to_max_iou_trk_idx.cpu().numpy()
+ self.det_is_high_conf = self.det_is_high_conf.cpu().numpy()
+ self.det_is_high_iou = self.det_is_high_iou.cpu().numpy()
+ self.det_keep = self.det_keep.cpu().numpy().tolist()
+ self.im_mask = self.im_mask.cpu().numpy()
+
+ def _create_cpu_metadata(self, trk_obj_ids, det_masks):
+ with torch.profiler.record_function("_create_cpu_metadata"):
+ unmatched_trk_obj_ids = trk_obj_ids[self.trk_is_unmatched]
+ empty_trk_obj_ids = trk_obj_ids[~self.trk_is_nonempty]
+ new_det_fa_inds = np.nonzero(self.is_new_det)[0]
+ det_is_high_conf_and_iou = set(
+ np.nonzero(self.det_is_high_conf & self.det_is_high_iou)[0]
+ )
+ det_to_matched_trk_obj_ids = {}
+ trk_id_to_max_iou_high_conf_det = {}
+ for d in range(det_masks.size(0)):
+ if self.det_keep[d]:
+ det_to_matched_trk_obj_ids[d] = trk_obj_ids[self.im_mask[d, :]]
+ if d in det_is_high_conf_and_iou:
+ trk_obj_id = trk_obj_ids[self.det_to_max_iou_trk_idx[d]].item()
+ trk_id_to_max_iou_high_conf_det[trk_obj_id] = d
+ return RealizedAssociateDetTrkresult(
+ new_det_fa_inds=new_det_fa_inds,
+ unmatched_trk_obj_ids=unmatched_trk_obj_ids,
+ det_to_matched_trk_obj_ids=det_to_matched_trk_obj_ids,
+ trk_id_to_max_iou_high_conf_det=trk_id_to_max_iou_high_conf_det,
+ empty_trk_obj_ids=empty_trk_obj_ids,
+ )
+
+
+def _associate_det_trk_compilable(
+ det_masks,
+ det_scores,
+ det_keep,
+ trk_masks,
+ new_det_thresh,
+ iou_threshold_trk,
+ iou_threshold,
+ HIGH_CONF_THRESH,
+ use_iom_recondition,
+ o2o_matching_masklets_enable,
+ iom_thresh_recondition,
+ iou_thresh_recondition,
+):
+ det_masks_binary = det_masks > 0
+ det_masks_binary[~det_keep] = 0
+ trk_masks_binary = trk_masks > 0
+ intersection_metric = None
+ if use_iom_recondition:
+ intersection_metric = mask_iom(det_masks_binary, trk_masks_binary) # (N, M)
+ else:
+ intersection_metric = mask_iou(det_masks_binary, trk_masks_binary) # (N, M)
+
+ assert not o2o_matching_masklets_enable, (
+ "Temporarily disabled support for o2o_matching_masklets_enable, due to optimizations."
+ )
+
+ if o2o_matching_masklets_enable:
+ intersection_metric_np = intersection_metric.cpu().numpy()
+ from scipy.optimize import linear_sum_assignment
+
+ cost_matrix = 1 - intersection_metric_np
+ row_ind, col_ind = linear_sum_assignment(cost_matrix)
+ trk_is_matched = np.zeros(trk_masks.size(0), dtype=bool)
+ for d, t in zip(row_ind, col_ind):
+ if intersection_metric_np[d, t] >= iou_threshold_trk:
+ trk_is_matched[t] = True
+ trk_is_matched = torch.from_numpy(trk_is_matched)
+ trk_is_matched = trk_is_matched.to(device=intersection_metric.device)
+ else:
+ trk_is_matched = (intersection_metric >= iou_threshold_trk).any(dim=0)
+ # Non-empty tracks with no detection match at or above iou_threshold_trk are unmatched
+ trk_is_nonempty = trk_masks_binary.any(dim=(1, 2))
+ trk_is_unmatched = torch.logical_and(trk_is_nonempty, ~trk_is_matched)
+
+ # For detections: allow many tracks to match to the same detection (many-to-one)
+ # So, a detection is 'new' if it does not match any track above threshold
+ is_new_det = torch.logical_and(
+ torch.logical_and((det_scores >= new_det_thresh), (det_keep)),
+ torch.logical_not(torch.any(intersection_metric >= iou_threshold, dim=1)),
+ )
+
+ intersection_thresh_recond = (
+ iom_thresh_recondition if use_iom_recondition else iou_thresh_recondition
+ )
+ # if a detection matches to many tracks with high IoU or vice versa, we do not consider it for reconditioning as it might be ambiguous
+ det_match_to_many_trk = (intersection_metric >= intersection_thresh_recond).sum(
+ dim=1
+ ) > 1
+ trk_match_to_many_det = (intersection_metric >= intersection_thresh_recond).sum(
+ dim=0
+ ) > 1
+ # Zero out these ambiguous matches so that only unique matches are considered
+
+ intersection_metric = torch.where(
+ trk_match_to_many_det.unsqueeze(0),
+ torch.zeros_like(intersection_metric),
+ intersection_metric,
+ )
+
+ intersection_metric = torch.where(
+ det_match_to_many_trk.unsqueeze(1),
+ torch.zeros_like(intersection_metric),
+ intersection_metric,
+ )
+
+ det_to_max_iou_trk_idx = torch.argmax(intersection_metric, dim=1)
+ det_is_high_conf = ((det_scores >= HIGH_CONF_THRESH) & det_keep) & ~is_new_det
+ det_is_high_iou = (
+ torch.amax(intersection_metric, dim=1) >= intersection_thresh_recond
+ )
+ im_mask = intersection_metric >= iou_threshold
+
+ return (
+ trk_is_unmatched,
+ trk_is_nonempty,
+ is_new_det,
+ det_to_max_iou_trk_idx,
+ det_is_high_conf,
+ det_is_high_iou,
+ det_keep,
+ im_mask,
+ )
+
+
class Sam3VideoBase(nn.Module):
def __init__(
self,
@@ -516,17 +739,7 @@ def run_tracker_update_planning_phase(
is_image_only: bool = False,
):
# initialize new metadata from previous metadata (its values will be updated later)
- tracker_metadata_new = {
- "obj_ids_per_gpu": deepcopy(tracker_metadata_prev["obj_ids_per_gpu"]),
- "obj_ids_all_gpu": None, # will be filled later
- "num_obj_per_gpu": deepcopy(tracker_metadata_prev["num_obj_per_gpu"]),
- "obj_id_to_score": deepcopy(tracker_metadata_prev["obj_id_to_score"]),
- "obj_id_to_tracker_score_frame_wise": deepcopy(
- tracker_metadata_prev["obj_id_to_tracker_score_frame_wise"]
- ),
- "obj_id_to_last_occluded": {}, # will be filled later
- "max_obj_id": deepcopy(tracker_metadata_prev["max_obj_id"]),
- }
+ tracker_metadata_new = self._create_planning_metadata(tracker_metadata_prev)
# Initialize reconditioned_obj_ids early to avoid UnboundLocalError
reconditioned_obj_ids = set()
@@ -901,6 +1114,7 @@ def run_tracker_update_execution_phase(
orig_vid_height: int,
orig_vid_width: int,
feature_cache: Dict,
+ tracker_metadata_new=None,
):
# initialize tracking scores with detection scores
new_det_fa_inds: npt.NDArray = tracker_update_plan["new_det_fa_inds"]
@@ -931,8 +1145,31 @@ def run_tracker_update_execution_phase(
if len(obj_ids_newly_removed) > 0:
self._tracker_remove_objects(tracker_states_local, obj_ids_newly_removed)
+ self._post_execution_phase_hook(tracker_states_local, tracker_metadata_new)
return tracker_states_local
+ def _create_planning_metadata(self, tracker_metadata_prev):
+ """Create the metadata dict for the planning phase from previous metadata."""
+ from copy import deepcopy
+
+ score_key = "obj_id_to_tracker_score_frame_wise"
+ if score_key not in tracker_metadata_prev:
+ score_key = "obj_id_to_sam2_score_frame_wise"
+ metadata = {
+ "obj_ids_per_gpu": deepcopy(tracker_metadata_prev["obj_ids_per_gpu"]),
+ "obj_ids_all_gpu": None,
+ "num_obj_per_gpu": deepcopy(tracker_metadata_prev["num_obj_per_gpu"]),
+ "obj_id_to_score": deepcopy(tracker_metadata_prev["obj_id_to_score"]),
+ score_key: deepcopy(tracker_metadata_prev[score_key]),
+ "obj_id_to_last_occluded": {},
+ "max_obj_id": deepcopy(tracker_metadata_prev["max_obj_id"]),
+ }
+ return metadata
+
+ def _post_execution_phase_hook(self, tracker_states_local, tracker_metadata_new):
+ """Hook for subclasses to add post-execution logic. Default: no-op."""
+ pass
+
def build_outputs(
self,
frame_idx: int,
@@ -1180,6 +1417,7 @@ def _associate_det_trk(
to any detections on this frame (for unmatched, we only count masklets with >0 area)
- det_to_matched_trk_obj_ids: dict[int, npt.NDArray]: mapping from detector's detection indices
to the list of matched tracklet object IDs
+ - trk_id_to_max_iou_high_conf_det: dict mapping track obj_id to the highest-IoU high-conf detection idx
- empty_trk_obj_ids: array of existing masklet object IDs with zero area in SAM2 prediction
"""
iou_threshold = self.assoc_iou_thresh
@@ -1239,61 +1477,40 @@ def _associate_det_trk(
align_corners=False,
).squeeze(1)
- det_masks_binary = det_masks > 0
- trk_masks_binary = trk_masks > 0
- ious = mask_iou(det_masks_binary, trk_masks_binary) # (N, M)
-
- ious_np = ious.cpu().numpy()
- if self.o2o_matching_masklets_enable:
- from scipy.optimize import linear_sum_assignment
-
- # Hungarian matching for tracks (one-to-one: each track matches at most one detection)
- cost_matrix = 1 - ious_np # Hungarian solves for minimum cost
- row_ind, col_ind = linear_sum_assignment(cost_matrix)
- trk_is_matched = np.zeros(trk_masks.size(0), dtype=bool)
- for d, t in zip(row_ind, col_ind):
- if ious_np[d, t] >= iou_threshold_trk:
- trk_is_matched[t] = True
- else:
- trk_is_matched = (ious_np >= iou_threshold_trk).any(axis=0)
- # Non-empty tracks not matched by Hungarian assignment above threshold are unmatched
- trk_is_nonempty = trk_masks_binary.any(dim=(1, 2)).cpu().numpy()
- trk_is_unmatched = np.logical_and(trk_is_nonempty, ~trk_is_matched)
- unmatched_trk_obj_ids = trk_obj_ids[trk_is_unmatched]
- # also record masklets that have zero area in SAM 2 prediction
- empty_trk_obj_ids = trk_obj_ids[~trk_is_nonempty]
-
- # For detections: allow many tracks to match to the same detection (many-to-one)
- # So, a detection is 'new' if it does not match any track above threshold
- is_new_det = np.logical_and(
- det_scores_np >= new_det_thresh,
- np.logical_not(np.any(ious_np >= iou_threshold, axis=1)),
+ # Convert numpy scores to tensor for the compilable function
+ det_scores = torch.from_numpy(det_scores_np).to(det_masks.device)
+ det_keep = torch.ones(
+ det_masks.size(0), dtype=torch.bool, device=det_masks.device
)
- new_det_fa_inds = np.nonzero(is_new_det)[0]
-
- # for each detection, which tracks it matched to (above threshold)
- det_to_matched_trk_obj_ids = {}
- trk_id_to_max_iou_high_conf_det = {} # trk id --> exactly one detection idx
- HIGH_CONF_THRESH = 0.8
- HIGH_IOU_THRESH = 0.8
- det_to_max_iou_trk_idx = np.argmax(ious_np, axis=1)
- det_is_high_conf = (det_scores_np >= HIGH_CONF_THRESH) & ~is_new_det
- det_is_high_iou = np.max(ious_np, axis=1) >= HIGH_IOU_THRESH
- det_is_high_conf_and_iou = set(
- np.nonzero(det_is_high_conf & det_is_high_iou)[0]
+
+ # Call the GPU-native compilable function
+ adt_result_tensors = _associate_det_trk_compilable(
+ det_masks=det_masks,
+ det_scores=det_scores,
+ det_keep=det_keep,
+ trk_masks=trk_masks,
+ new_det_thresh=new_det_thresh,
+ iou_threshold_trk=iou_threshold_trk,
+ iou_threshold=iou_threshold,
+ HIGH_CONF_THRESH=0.8,
+ use_iom_recondition=getattr(self, "use_iom_recondition", False),
+ o2o_matching_masklets_enable=self.o2o_matching_masklets_enable,
+ iom_thresh_recondition=getattr(self, "iom_thresh_recondition", 0.8),
+ iou_thresh_recondition=getattr(self, "iou_thresh_recondition", 0.8),
)
- for d in range(det_masks.size(0)):
- det_to_matched_trk_obj_ids[d] = trk_obj_ids[ious_np[d, :] >= iou_threshold]
- if d in det_is_high_conf_and_iou:
- trk_obj_id = trk_obj_ids[det_to_max_iou_trk_idx[d]].item()
- trk_id_to_max_iou_high_conf_det[trk_obj_id] = d
+
+ # Wrap in LazyAssociateDetTrkResult and immediately realize to numpy
+ # for backward compatibility with existing callers
+ lazy_result = LazyAssociateDetTrkResult(*adt_result_tensors)
+ lazy_result._convert_to_numpy()
+ realized = lazy_result._create_cpu_metadata(trk_obj_ids, det_masks)
return (
- new_det_fa_inds,
- unmatched_trk_obj_ids,
- det_to_matched_trk_obj_ids,
- trk_id_to_max_iou_high_conf_det,
- empty_trk_obj_ids,
+ realized.new_det_fa_inds,
+ realized.unmatched_trk_obj_ids,
+ realized.det_to_matched_trk_obj_ids,
+ realized.trk_id_to_max_iou_high_conf_det,
+ realized.empty_trk_obj_ids,
)
def _assign_new_det_to_gpus(self, new_det_num, prev_workload_per_gpu):
@@ -1601,21 +1818,35 @@ def _tracker_remove_objects(
def _initialize_metadata(self):
"""Initialize metadata for the masklets."""
+ is_multiplex = getattr(self, "is_multiplex", False)
+ score_key = (
+ "obj_id_to_sam2_score_frame_wise"
+ if is_multiplex
+ else "obj_id_to_tracker_score_frame_wise"
+ )
tracker_metadata = {
"obj_ids_per_gpu": [np.array([], np.int64) for _ in range(self.world_size)],
"obj_ids_all_gpu": np.array([], np.int64),
"num_obj_per_gpu": np.zeros(self.world_size, np.int64),
"max_obj_id": -1,
"obj_id_to_score": {},
- "obj_id_to_tracker_score_frame_wise": defaultdict(dict),
+ score_key: defaultdict(dict),
"obj_id_to_last_occluded": {},
}
- if self.rank == 0:
- # "rank0_metadata" contains metadata that is only stored on (and accessible to) GPU 0
- # - obj_first_frame_idx: obj_id --> first frame index where the object was detected
- # - unmatched_frame_inds: obj_id --> [mismatched frame indices]
- # - overlap_pair_to_frame_inds: (first_appear_obj_id, obj_id) --> [overlap frame indices]
- # - removed_obj_ids: object IDs that are suppressed via hot-start
+ if is_multiplex:
+ tracker_metadata["gpu_metadata"] = {
+ "N_obj": 0
+ } # GPU-side metadata for sync-free hotstart
+ tracker_metadata["num_buc_per_gpu"] = np.zeros(self.world_size, np.int64)
+
+ # "rank0_metadata" contains metadata that is only stored on (and accessible to) GPU 0
+ # - obj_first_frame_idx: obj_id --> first frame index where the object was detected
+ # - unmatched_frame_inds: obj_id --> [mismatched frame indices]
+ # - overlap_pair_to_frame_inds: (first_appear_obj_id, obj_id) --> [overlap frame indices]
+ # - removed_obj_ids: object IDs that are suppressed via hot-start
+ # In multiplex mode, rank0_metadata is always included (all GPUs need it).
+ # In non-multiplex mode, only rank 0 stores it.
+ if is_multiplex or self.rank == 0:
rank0_metadata = {
"obj_first_frame_idx": {},
"unmatched_frame_inds": defaultdict(list),
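The NumPy path removed from `_associate_det_trk` above spells out the association rule that the new GPU-side `_associate_det_trk_compilable` call presumably reproduces. Below is a minimal CPU sketch of the threshold-based (non-Hungarian) branch only, assuming boolean masks already resized to a common shape. The function names and default thresholds are illustrative assumptions, not part of the repo, and the high-confidence `trk_id_to_max_iou_high_conf_det` mapping is omitted.

```python
import numpy as np


def mask_iou(det_masks: np.ndarray, trk_masks: np.ndarray) -> np.ndarray:
    """Pairwise IoU between boolean masks: (N, H, W) x (M, H, W) -> (N, M)."""
    det = det_masks.reshape(det_masks.shape[0], -1).astype(np.float32)
    trk = trk_masks.reshape(trk_masks.shape[0], -1).astype(np.float32)
    inter = det @ trk.T  # (N, M) intersection areas
    union = det.sum(1)[:, None] + trk.sum(1)[None, :] - inter
    # Pairs where both masks are empty get IoU 0 instead of dividing by zero.
    return np.where(union > 0, inter / np.maximum(union, 1.0), 0.0)


def associate_det_trk(
    det_masks, det_scores, trk_masks, trk_obj_ids,
    new_det_thresh=0.5, iou_threshold=0.5, iou_threshold_trk=0.5,  # illustrative defaults
):
    det_bin = det_masks > 0
    trk_bin = trk_masks > 0
    ious = mask_iou(det_bin, trk_bin)
    # A track counts as matched if any detection overlaps it above threshold.
    trk_is_matched = (ious >= iou_threshold_trk).any(axis=0)
    trk_is_nonempty = trk_bin.reshape(trk_bin.shape[0], -1).any(axis=1)
    unmatched_trk_obj_ids = trk_obj_ids[trk_is_nonempty & ~trk_is_matched]
    empty_trk_obj_ids = trk_obj_ids[~trk_is_nonempty]
    # Many tracks may match one detection; a detection is "new" only if it is
    # confident enough and matches no track above threshold.
    is_new_det = (det_scores >= new_det_thresh) & ~(ious >= iou_threshold).any(axis=1)
    new_det_inds = np.nonzero(is_new_det)[0]
    det_to_matched = {
        d: trk_obj_ids[ious[d] >= iou_threshold] for d in range(det_bin.shape[0])
    }
    return new_det_inds, unmatched_trk_obj_ids, det_to_matched, empty_trk_obj_ids
```

The removed code optionally swapped the per-track `any` test for a one-to-one Hungarian assignment (`scipy.optimize.linear_sum_assignment` on `1 - ious`) when `o2o_matching_masklets_enable` is set; the detection side stays many-to-one either way.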
diff --git a/sam3/model/sam3_video_inference.py b/sam3/model/sam3_video_inference.py
index 6f031be..375d9ec 100644
--- a/sam3/model/sam3_video_inference.py
+++ b/sam3/model/sam3_video_inference.py
@@ -56,6 +56,7 @@ def init_state(
self,
resource_path,
offload_video_to_cpu=False,
+ offload_state_to_cpu=False,
async_loading_frames=False,
video_loader_type="cv2",
):
@@ -71,6 +72,7 @@ def init_state(
)
inference_state = {}
inference_state["image_size"] = self.image_size
+ inference_state["offload_state_to_cpu"] = offload_state_to_cpu
inference_state["num_frames"] = len(images)
# the original video height and width, used for resizing final output scores
inference_state["orig_height"] = orig_height
@@ -551,17 +553,14 @@ def _cache_frame_outputs(
def _build_tracker_output(
self, inference_state, frame_idx, refined_obj_id_to_mask=None
):
- assert (
+ if (
"cached_frame_outputs" in inference_state
and frame_idx in inference_state["cached_frame_outputs"]
- ), (
- "No cached outputs found. Ensure normal propagation has run first to populate the cache."
- )
- cached_outputs = inference_state["cached_frame_outputs"][frame_idx]
-
- obj_id_to_mask = cached_outputs.copy()
+ ):
+ obj_id_to_mask = inference_state["cached_frame_outputs"][frame_idx].copy()
+ else:
+ obj_id_to_mask = {}
- # Update with refined masks if provided
if refined_obj_id_to_mask is not None:
for obj_id, refined_mask in refined_obj_id_to_mask.items():
assert refined_mask is not None, (
@@ -627,7 +626,7 @@ def _compile_model(self):
## Compile Tracker model components
self.tracker.maskmem_backbone.forward = compile_wrapper(
self.tracker.maskmem_backbone.forward,
- mode="max-autotune",
+ mode="max-autotune-no-cudagraphs",
fullgraph=True,
dynamic=False,
)
@@ -990,6 +989,7 @@ def _init_new_tracker_state(self, inference_state):
video_height=inference_state["orig_height"],
video_width=inference_state["orig_width"],
num_frames=inference_state["num_frames"],
+ offload_state_to_cpu=inference_state.get("offload_state_to_cpu", False),
)
@torch.inference_mode()
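With the assertion removed, `_build_tracker_output` no longer requires a prior propagation pass: it starts from a copy of the cached masks when they exist and from an empty dict otherwise. A minimal standalone sketch of that pattern follows, assuming refined masks simply overwrite the cached entry for the same object id (the remainder of the method is not shown in this hunk). The free function `build_tracker_output` is hypothetical.

```python
from typing import Any, Dict, Optional


def build_tracker_output(
    inference_state: dict,
    frame_idx: int,
    refined_obj_id_to_mask: Optional[Dict[int, Any]] = None,
) -> Dict[int, Any]:
    cached = inference_state.get("cached_frame_outputs", {})
    # Copy so the cache itself is never mutated; fall back to {} if uncached.
    obj_id_to_mask = dict(cached[frame_idx]) if frame_idx in cached else {}
    if refined_obj_id_to_mask is not None:
        for obj_id, refined_mask in refined_obj_id_to_mask.items():
            assert refined_mask is not None, f"refined mask for object {obj_id} is None"
            obj_id_to_mask[obj_id] = refined_mask  # refined mask takes precedence
    return obj_id_to_mask
```

The shallow copy matters: callers get a fresh dict per frame, so overlaying refinements never leaks back into `cached_frame_outputs`.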
diff --git a/sam3/model/sam3_video_predictor.py b/sam3/model/sam3_video_predictor.py
index 13b1448..a7660af 100644
--- a/sam3/model/sam3_video_predictor.py
+++ b/sam3/model/sam3_video_predictor.py
@@ -17,14 +17,12 @@
import psutil
import torch
from sam3.logger import get_logger
+from sam3.model.sam3_base_predictor import Sam3BasePredictor
logger = get_logger(__name__)
-class Sam3VideoPredictor:
- # a global dictionary that holds all inference states for this model (key is session_id)
- _ALL_INFERENCE_STATES = {}
-
+class Sam3VideoPredictor(Sam3BasePredictor):
def __init__(
self,
checkpoint_path=None,
@@ -35,7 +33,9 @@ def __init__(
async_loading_frames=False,
video_loader_type="cv2",
apply_temporal_disambiguation: bool = True,
+ compile: bool = False,
):
+ super().__init__()
self.async_loading_frames = async_loading_frames
self.video_loader_type = video_loader_type
from sam3.model_builder import build_sam3_video_model
@@ -48,129 +48,20 @@ def __init__(
geo_encoder_use_img_cross_attn=geo_encoder_use_img_cross_attn,
strict_state_dict_loading=strict_state_dict_loading,
apply_temporal_disambiguation=apply_temporal_disambiguation,
+ compile=compile,
)
.cuda()
.eval()
)
- @torch.inference_mode()
- def handle_request(self, request):
- """Dispatch a request based on its type."""
- request_type = request["type"]
- if request_type == "start_session":
- return self.start_session(
- resource_path=request["resource_path"],
- session_id=request.get("session_id", None),
- )
- elif request_type == "add_prompt":
- return self.add_prompt(
- session_id=request["session_id"],
- frame_idx=request["frame_index"],
- text=request.get("text", None),
- points=request.get("points", None),
- point_labels=request.get("point_labels", None),
- bounding_boxes=request.get("bounding_boxes", None),
- bounding_box_labels=request.get("bounding_box_labels", None),
- obj_id=request.get("obj_id", None),
- )
- elif request_type == "remove_object":
- return self.remove_object(
- session_id=request["session_id"],
- obj_id=request["obj_id"],
- is_user_action=request.get("is_user_action", True),
- )
- elif request_type == "reset_session":
- return self.reset_session(session_id=request["session_id"])
- elif request_type == "close_session":
- return self.close_session(session_id=request["session_id"])
- else:
- raise RuntimeError(f"invalid request type: {request_type}")
-
- @torch.inference_mode()
- def handle_stream_request(self, request):
- """Dispatch a stream request based on its type."""
- request_type = request["type"]
- if request_type == "propagate_in_video":
- yield from self.propagate_in_video(
- session_id=request["session_id"],
- propagation_direction=request.get("propagation_direction", "both"),
- start_frame_idx=request.get("start_frame_index", None),
- max_frame_num_to_track=request.get("max_frame_num_to_track", None),
- )
- else:
- raise RuntimeError(f"invalid request type: {request_type}")
-
- def start_session(self, resource_path, session_id=None):
- """
- Start a new inference session on an image or a video. Here `resource_path`
- can be either a path to an image file (for image inference) or an MP4 file
- or directory with JPEG video frames (for video inference).
-
- If `session_id` is defined, it will be used as identifier for the
- session. If it is not defined, the start_session function will create
- a session id and return it.
- """
- # get an initial inference_state from the model
- inference_state = self.model.init_state(
- resource_path=resource_path,
- async_loading_frames=self.async_loading_frames,
- video_loader_type=self.video_loader_type,
- )
- if not session_id:
- session_id = str(uuid.uuid4())
- self._ALL_INFERENCE_STATES[session_id] = {
- "state": inference_state,
- "session_id": session_id,
- "start_time": time.time(),
- }
- logger.debug(
- f"started new session {session_id}; {self._get_session_stats()}; "
- f"{self._get_torch_and_gpu_properties()}"
- )
- return {"session_id": session_id}
-
- def add_prompt(
- self,
- session_id: str,
- frame_idx: int,
- text: Optional[str] = None,
- points: Optional[List[List[float]]] = None,
- point_labels: Optional[List[int]] = None,
- bounding_boxes: Optional[List[List[float]]] = None,
- bounding_box_labels: Optional[List[int]] = None,
- obj_id: Optional[int] = None,
- ):
- """Add text, box and/or point prompt on a specific video frame."""
- logger.debug(
- f"add prompt on frame {frame_idx} in session {session_id}: "
- f"{text=}, {points=}, {point_labels=}, "
- f"{bounding_boxes=}, {bounding_box_labels=}"
- )
- session = self._get_session(session_id)
- inference_state = session["state"]
-
- frame_idx, outputs = self.model.add_prompt(
- inference_state=inference_state,
- frame_idx=frame_idx,
- text_str=text,
- points=points,
- point_labels=point_labels,
- boxes_xywh=bounding_boxes,
- box_labels=bounding_box_labels,
- obj_id=obj_id,
- )
- return {"frame_index": frame_idx, "outputs": outputs}
-
def remove_object(
self,
session_id: str,
- obj_id: int,
+ frame_idx: int = 0,
+ obj_id: int = 0,
is_user_action: bool = True,
):
- """Remove an object from tracking."""
- logger.debug(
- f"remove object {obj_id} in session {session_id}: {is_user_action=}"
- )
+ """Remove an object from tracking (SAM3 uses a simpler remove_object API)."""
session = self._get_session(session_id)
inference_state = session["state"]
@@ -181,111 +72,29 @@ def remove_object(
)
return {"is_success": True}
- def propagate_in_video(
- self,
- session_id,
- propagation_direction,
- start_frame_idx,
- max_frame_num_to_track,
- ):
- """Propagate the added prompts to get grounding results on all video frames."""
- logger.debug(
- f"propagate in video in session {session_id}: "
- f"{propagation_direction=}, {start_frame_idx=}, {max_frame_num_to_track=}"
- )
- try:
- session = self._get_session(session_id)
- inference_state = session["state"]
- if propagation_direction not in ["both", "forward", "backward"]:
- raise ValueError(
- f"invalid propagation direction: {propagation_direction}"
- )
-
- # First doing the forward propagation
- if propagation_direction in ["both", "forward"]:
- for frame_idx, outputs in self.model.propagate_in_video(
- inference_state=inference_state,
- start_frame_idx=start_frame_idx,
- max_frame_num_to_track=max_frame_num_to_track,
- reverse=False,
- ):
- yield {"frame_index": frame_idx, "outputs": outputs}
- # Then doing the backward propagation (reverse in time)
- if propagation_direction in ["both", "backward"]:
- for frame_idx, outputs in self.model.propagate_in_video(
- inference_state=inference_state,
- start_frame_idx=start_frame_idx,
- max_frame_num_to_track=max_frame_num_to_track,
- reverse=True,
- ):
- yield {"frame_index": frame_idx, "outputs": outputs}
- finally:
- # Log upon completion (so that e.g. we can see if two propagations happen in parallel).
- # Using `finally` here to log even when the tracking is aborted with GeneratorExit.
- logger.debug(
- f"propagation ended in session {session_id}; {self._get_session_stats()}"
- )
-
- def reset_session(self, session_id):
- """Reset the session to its initial state (as when it's initial opened)."""
- logger.debug(f"reset session {session_id}")
- session = self._get_session(session_id)
- inference_state = session["state"]
- self.model.reset_state(inference_state)
- return {"is_success": True}
-
- def close_session(self, session_id):
- """
- Close a session. This method is idempotent and can be called multiple
- times on the same "session_id".
- """
- session = self._ALL_INFERENCE_STATES.pop(session_id, None)
- if session is None:
- logger.warning(
- f"cannot close session {session_id} as it does not exist (it might have expired); "
- f"{self._get_session_stats()}"
- )
- else:
- del session
- gc.collect()
- logger.info(f"removed session {session_id}; {self._get_session_stats()}")
- return {"is_success": True}
-
- def _get_session(self, session_id):
- session = self._ALL_INFERENCE_STATES.get(session_id, None)
- if session is None:
- raise RuntimeError(
- f"Cannot find session {session_id}; it might have expired"
- )
- return session
-
def _get_session_stats(self):
"""Get a statistics string for live sessions and their GPU usage."""
- # print both the session ids and their video frame numbers
- live_session_strs = [
- f"'{session_id}' ({session['state']['num_frames']} frames)"
- for session_id, session in self._ALL_INFERENCE_STATES.items()
- ]
- session_stats_str = (
- f"live sessions: [{', '.join(live_session_strs)}], GPU memory: "
- f"{torch.cuda.memory_allocated() // 1024**2} MiB used and "
- f"{torch.cuda.memory_reserved() // 1024**2} MiB reserved"
- f" (max over time: {torch.cuda.max_memory_allocated() // 1024**2} MiB used "
- f"and {torch.cuda.max_memory_reserved() // 1024**2} MiB reserved)"
+ live_session_strs = []
+ for sid, s in self._all_inference_states.items():
+ nf = s["state"]["num_frames"]
+ live_session_strs.append(f"'{sid}' ({nf} frames)")
+ joined = ", ".join(live_session_strs)
+ mem_alloc = torch.cuda.memory_allocated() // 1024**2
+ mem_res = torch.cuda.memory_reserved() // 1024**2
+ max_alloc = torch.cuda.max_memory_allocated() // 1024**2
+ max_res = torch.cuda.max_memory_reserved() // 1024**2
+ return (
+ f"live sessions: [{joined}], GPU memory: "
+ f"{mem_alloc} MiB used and {mem_res} MiB reserved"
+ f" (max over time: {max_alloc} MiB used and {max_res} MiB reserved)"
)
- return session_stats_str
def _get_torch_and_gpu_properties(self):
- """Get a string for PyTorch and GPU properties (for logging and debugging)."""
- torch_and_gpu_str = (
+ """Get a string for PyTorch and GPU properties."""
+ return (
f"torch: {torch.__version__} with CUDA arch {torch.cuda.get_arch_list()}, "
f"GPU device: {torch.cuda.get_device_properties(torch.cuda.current_device())}"
)
- return torch_and_gpu_str
-
- def shutdown(self):
- """Shutdown the predictor and clear all sessions."""
- self._ALL_INFERENCE_STATES.clear()
class Sam3VideoPredictorMultiGPU(Sam3VideoPredictor):
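This hunk moves request dispatch and session bookkeeping out of `Sam3VideoPredictor` into `Sam3BasePredictor`, whose source is not shown here; `_get_session_stats` now reads `self._all_inference_states`. The removed code kept sessions in a class-level `_ALL_INFERENCE_STATES` dict, which every instance shares. Below is a hedged sketch of how a base class with an instance-level registry could look. `MinimalBasePredictor` and its `init_state` stub are hypothetical, and the real base class may differ.

```python
import uuid
from typing import Any, Dict, Optional


class MinimalBasePredictor:
    """Hypothetical stand-in for a shared predictor base class."""

    def __init__(self) -> None:
        # Per-instance registry; a class attribute would be shared by all instances.
        self._all_inference_states: Dict[str, Dict[str, Any]] = {}

    def init_state(self, resource_path: str) -> Dict[str, Any]:
        # A subclass would call into its model here; this stub records the path only.
        return {"resource_path": resource_path, "num_frames": 0}

    def handle_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Dispatch a request based on its type, as the removed method did."""
        request_type = request["type"]
        if request_type == "start_session":
            return self.start_session(request["resource_path"], request.get("session_id"))
        elif request_type == "close_session":
            return self.close_session(request["session_id"])
        raise RuntimeError(f"invalid request type: {request_type}")

    def start_session(self, resource_path: str, session_id: Optional[str] = None):
        session_id = session_id or str(uuid.uuid4())  # generate an id if none given
        self._all_inference_states[session_id] = {
            "state": self.init_state(resource_path),
            "session_id": session_id,
        }
        return {"session_id": session_id}

    def close_session(self, session_id: str):
        # Idempotent: closing an unknown or already-closed session still succeeds.
        self._all_inference_states.pop(session_id, None)
        return {"is_success": True}

    def _get_session(self, session_id: str) -> Dict[str, Any]:
        session = self._all_inference_states.get(session_id)
        if session is None:
            raise RuntimeError(f"Cannot find session {session_id}; it might have expired")
        return session
```

Subclasses then override only model-specific methods, as `Sam3VideoPredictor` does above with `remove_object`.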
diff --git a/sam3/model/video_tracking_multiplex.py b/sam3/model/video_tracking_multiplex.py
new file mode 100644
index 0000000..8706d79
--- /dev/null
+++ b/sam3/model/video_tracking_multiplex.py
@@ -0,0 +1,3655 @@
+"""
+Video tracking model with multiplexing support.
+
+This file extends the base video tracking with prompt functionality to add:
+ - Multiplexing: Support for processing multiple objects simultaneously
+ - Recording image features in memory to support the decoupled transformer for memory reading
+"""
+
+import logging
+from collections import defaultdict
+from copy import deepcopy
+
+
+try:
+ from typing import Iterable, Literal, NotRequired, Optional, Required, TypedDict
+except ImportError:
+ from typing_extensions import (
+ Iterable,
+ Literal,
+ NotRequired, # not available in Python 3.10
+ Optional,
+ Required, # not available in Python 3.10
+ TypedDict,
+ )
+
+import numpy as np
+import torch
+import torch.distributed
+import torch.nn as nn
+import torch.nn.functional as F
+from sam3.model.data_misc import BatchedDatapoint, NestedTensor
+from sam3.model.memory import SimpleMaskEncoder
+from sam3.model.multiplex_mask_decoder import MLP, MultiplexMaskDecoder
+from sam3.model.multiplex_utils import MultiplexController, MultiplexState
+from sam3.model.sam3_tracker_utils import (
+ get_1d_sine_pe,
+ get_next_point,
+ sample_box_points,
+ select_closest_cond_frames,
+)
+from sam3.sam.mask_decoder import MaskDecoder
+from sam3.sam.prompt_encoder import PositionEmbeddingRandom, PromptEncoder
+from sam3.sam.transformer import TwoWayTransformer
+from timm.models.layers import trunc_normal_
+
+
+# a large negative value as a placeholder score for missing objects
+NO_OBJ_SCORE = -1024.0
+
+neck_outs = ["interactive", "sam2_backbone_out"]
+
+
+class SAMOutput(TypedDict, total=True):
+ # Outputs from a single SAM head forward
+ low_res_multimasks: torch.Tensor
+ high_res_multimasks: torch.Tensor
+ ious: torch.Tensor
+ low_res_masks: torch.Tensor
+ high_res_masks: torch.Tensor
+ object_score_logits: torch.Tensor
+ obj_ptr: NotRequired[torch.Tensor] # [num_objects, C], in data space
+
+
+class StageOutput(TypedDict, total=False):
+ # metadata
+ conditioning_objects: Required[set[int]]
+
+ # The outputs from a single stage; could be used as memory
+ pred_masks: torch.Tensor
+ pred_masks_high_res: torch.Tensor
+ point_inputs: dict[str, torch.Tensor]
+ mask_inputs: torch.Tensor
+ object_score_logits: torch.Tensor
+ obj_ptr: torch.Tensor # [num_buckets, multiplex_count, C], in mux space
+ maskmem_features: torch.Tensor
+ maskmem_pos_enc: list[torch.Tensor]
+ image_features: torch.Tensor
+ image_pos_enc: torch.Tensor
+
+ # for memory filtering
+ iou_score: torch.Tensor
+ eff_iou_score: torch.Tensor
+
+ # Multi-step prediction fields for state tracking or training
+ multistep_pred_masks: torch.Tensor
+ multistep_pred_masks_high_res: torch.Tensor
+ multistep_pred_multimasks: list[torch.Tensor]
+ multistep_pred_multimasks_high_res: list[torch.Tensor]
+ multistep_pred_ious: list[torch.Tensor]
+ multistep_point_inputs: list[dict]
+ multistep_object_score_logits: list[torch.Tensor]
+
+
+class VideoTrackingMultiplex(nn.Module):
+ def __init__(
+ self,
+ backbone: nn.Module,
+ transformer: nn.Module,
+ maskmem_backbone: nn.Module,
+ multiplex_controller: MultiplexController,
+ num_maskmem: int = 7, # default 1 input frame + 6 previous frames as in CAE
+ image_size: int = 512,
+ backbone_stride: int = 16, # default to 16 as in CAE (truncated Hiera backbone)
+ prob_to_use_pt_input_for_train: float = 0.0,
+ prob_to_use_pt_input_for_eval: float = 0.0,
+ prob_to_use_box_input_for_train: float = 0.0,
+ prob_to_use_box_input_for_eval: float = 0.0,
+ # always_keep_first_frame_mem=True, # this option is removed (we've always set it to True)
+ apply_sigmoid_to_mask_logits_for_mem_enc: bool = False,
+ sigmoid_scale_for_mem_enc: float = 1.0, # scale factor for mask sigmoid prob, only effective when `apply_sigmoid_to_mask_logits_for_mem_enc` is True
+ sigmoid_bias_for_mem_enc: float = 0.0, # bias factor for mask sigmoid prob, only effective when `apply_sigmoid_to_mask_logits_for_mem_enc` is True
+ # During evaluation, whether to binarize the sigmoid mask logits on interacted frames with clicks, only effective when `apply_sigmoid_to_mask_logits_for_mem_enc` is True
+ binarize_mask_from_pts_for_mem_enc: bool = False,
+ use_mask_input_as_output_without_sam: bool = False, # on frames with mask input, whether to directly output the input mask without using a SAM prompt encoder + mask decoder
+ # how many frames for interactive point sampling (only effective when using point inputs per video; the first frame is always used)
+ # - if `rand_frames_to_correct` below is True, we randomly sample 1~num_frames_to_correct frames for interactive point sampling
+ # - otherwise we use a fixed number of num_frames_to_correct frames for interactive point sampling
+ # if it is 1, we do interactive point sampling only on the 1st frame
+ # if it is greater than 1, we do interactive point sampling on the 1st frame and other randomly selected frames
+ num_frames_to_correct_for_train: int = 1, # default: only iteratively sample on first frame
+ num_frames_to_correct_for_eval: int = 1, # default: only iteratively sample on first frame
+ rand_frames_to_correct_for_train: bool = False,
+ rand_frames_to_correct_for_eval: bool = False,
+ prob_correct_all_objects_for_train: float = 0.0,
+ ratio_of_objects_to_correct_for_train: float = 1.0,
+ force_correct_all_for_conditional_inputs: bool = False,
+ rand_objects_to_correct_for_train: bool = True,
+ # how many frames to use as initial conditioning frames (for both point input and mask input; the first frame is always used as an initial conditioning frame)
+ # - if `rand_init_cond_frames` below is True, we randomly sample 1~num_init_cond_frames initial conditioning frames
+ # - otherwise we sample a fixed number of num_init_cond_frames initial conditioning frames
+ # note: for point input, we sample correction points on all such initial conditioning frames, and we require that `num_frames_to_correct` >= `num_init_cond_frames`;
+ # these are initial conditioning frames because as we track the video, more conditioning frames might be added
+ # when a frame receives correction clicks under point input if `add_all_frames_to_correct_as_cond=True`
+ num_init_cond_frames_for_train: int = 1, # default: only use the first frame as initial conditioning frame
+ num_init_cond_frames_for_eval: int = 1, # default: only use the first frame as initial conditioning frame
+ rand_init_cond_frames_for_train: bool = True, # default: random 1~num_init_cond_frames_for_train cond frames (to be consistent w/ previous TA data loader)
+ rand_init_cond_frames_for_eval: bool = False,
+ # The maximum number of conditioning frames to participate in the memory attention (-1 means no limit; if there are more conditioning frames than this limit,
+ # we only cross-attend to the temporally closest `max_cond_frames_in_attn` conditioning frames in the encoder when tracking each frame). This gives the model
+ # a temporal locality when handling a large number of annotated frames (since closer frames should be more important) and also avoids GPU OOM.
+ max_cond_frames_in_attn: int = -1,
+ # Whether to always keep the first conditioning frame in case we exceed the maximum number of conditioning frames allowed
+ keep_first_cond_frame=False,
+ # if `add_all_frames_to_correct_as_cond` is True, we also append to the conditioning frame list any frame that receives a later correction click
+ # if `add_all_frames_to_correct_as_cond` is False, we restrict the conditioning frame list to only those initial conditioning frames
+ add_all_frames_to_correct_as_cond: bool = False,
+ # how many additional correction points to sample (on each frame selected to be corrected)
+ # note that the first frame receives an initial input click (in addition to any correction clicks)
+ num_correction_pt_per_frame: int = 7,
+ # method for point sampling during evaluation
+ # "uniform" (sample uniformly from error region) or "center" (use the point with the largest distance to error region boundary)
+ # default to "center" to be consistent with evaluation in the SAM paper
+ pt_sampling_for_eval: Literal["uniform", "center"] = "center",
+ # During training, we optionally allow sampling the correction points from GT regions
+ # instead of the prediction error regions with a small probability. This might allow the
+ # model to overfit less to the error regions in training datasets
+ prob_to_sample_from_gt_for_train: float = 0.0,
+ # on the first frame, whether to directly add the no-memory embedding to the image feature
+ # (instead of using the transformer encoder)
+ directly_add_no_mem_embed: bool = False,
+ # whether to use high-resolution feature maps in the SAM mask decoder
+ use_high_res_features_in_sam: bool = False,
+ # whether to output multiple (3) masks for the first click on initial conditioning frames
+ multimask_output_in_sam: bool = False,
+ # the minimum and maximum number of clicks to use multimask_output_in_sam (only relevant when `multimask_output_in_sam=True`;
+ # default is 1 for both, meaning that only the first click gives multimask output; also note that a box counts as two points)
+ multimask_min_pt_num: int = 1,
+ multimask_max_pt_num: int = 1,
+ # whether to also use multimask output for tracking (not just for the first click on initial conditioning frames; only relevant when `multimask_output_in_sam=True`)
+ multimask_output_for_tracking: bool = False,
+ # Whether to use multimask tokens for obj ptr; Only relevant when both
+ # use_obj_ptrs_in_encoder=True and multimask_output_for_tracking=True
+ use_multimask_token_for_obj_ptr: bool = False,
+ # if the last output is multimask during training, whether to select the mask w/ highest IoU to the ground-truth for memory encoder
+ # (instead of the mask with the highest prediction score; this resembles teacher-forcing for multi-mask prediction in tracking)
+ use_best_iou_mask_for_mem_enc: bool = False,
+ # whether to use sigmoid to restrict ious prediction to [0-1]
+ iou_prediction_use_sigmoid: bool = False,
+ # whether to feed the previously predicted low-res mask logits as a mask prompt into the SAM mask decoder during iterative point sampling
+ iter_use_prev_mask_pred: bool = False,
+ # whether to forward image features per frame (as it's being tracked) during evaluation, instead of forwarding image features
+ # of all frames at once. This avoids backbone OOM errors on very long videos in evaluation, but could be slightly slower.
+ forward_backbone_per_frame_for_eval: bool = False,
+ # The memory bank's temporal stride during evaluation (i.e. the `r` parameter in XMem and Cutie; XMem and Cutie use r=5).
+ # For r>1, the (self.num_maskmem - 1) non-conditioning memory frames consist of
+ # (self.num_maskmem - 2) nearest frames from every r-th frames, plus the last frame.
+ memory_temporal_stride_for_eval: int = 1,
+ # whether to offload outputs to CPU memory during evaluation, to avoid GPU OOM on very long videos or very large resolutions or too many objects
+ # (it's recommended to use `forward_backbone_per_frame_for_eval=True` first before setting this option to True)
+ offload_output_to_cpu_for_eval: bool = False,
+ # whether to trim the output of past non-conditioning frames (num_maskmem frames before the current frame) during evaluation
+ # (this helps save GPU or CPU memory on very long videos for semi-supervised VOS eval, where only the first frame receives prompts)
+ trim_past_non_cond_mem_for_eval: bool = False,
+ # whether to apply non-overlapping constraints on the object masks in the memory encoder during evaluation (to avoid/alleviate superposing masks)
+ non_overlap_masks_for_mem_enc: bool = False,
+ # whether to cross-attend to object pointers from other frames (based on SAM output tokens) in the encoder
+ use_obj_ptrs_in_encoder: bool = False,
+ # the maximum number of object pointers from other frames in encoder cross attention (only relevant when `use_obj_ptrs_in_encoder=True`)
+ max_obj_ptrs_in_encoder: int = 16,
+ # whether to add temporal positional encoding to the object pointers in the encoder (only relevant when `use_obj_ptrs_in_encoder=True`)
+ add_tpos_enc_to_obj_ptrs: bool = True,
+ # whether to add an extra linear projection layer for the temporal positional encoding in the object pointers to avoid potential interference
+ # with spatial positional encoding (only relevant when both `use_obj_ptrs_in_encoder=True` and `add_tpos_enc_to_obj_ptrs=True`)
+ proj_tpos_enc_in_obj_ptrs: bool = False,
+ # whether to use signed distance (instead of unsigned absolute distance) in the temporal positional encoding in the object pointers
+ # (only relevant when both `use_obj_ptrs_in_encoder=True` and `add_tpos_enc_to_obj_ptrs=True`)
+ use_signed_tpos_enc_to_obj_ptrs: bool = False,
+ # whether to only attend to object pointers in the past (before the current frame) in the encoder during evaluation
+ # (only relevant when `use_obj_ptrs_in_encoder=True`; this might avoid pointer information too far in the future to distract the initial tracking)
+ only_obj_ptrs_in_the_past_for_eval: bool = False,
+ # Whether to predict if there is an object in the frame
+ pred_obj_scores: bool = False,
+ # Whether to use an MLP to predict object scores
+ pred_obj_scores_mlp: bool = False,
+ # Only relevant if pred_obj_scores=True and use_obj_ptrs_in_encoder=True;
+ # Whether to have a fixed no obj pointer when there is no object present
+ # or to use it as an additive embedding with obj_ptr produced by decoder
+ fixed_no_obj_ptr: bool = False,
+ use_no_obj_ptr: bool = True,
+ use_mlp_for_obj_ptr_proj: bool = False,
+ # replace per-slot static no-obj embeddings with linear projections of object embeddings
+ use_linear_no_obj_ptr: bool = False,
+ # add no obj embedding to spatial frames
+ no_obj_embed_spatial: bool = False,
+ # does not apply to spatial memories (only to obj ptrs), unless unified_tpos_enc=True
+ sincos_tpos_enc: bool = True,
+ # extra arguments used to construct the SAM mask decoder; if not None, it should be a dict of kwargs to be passed into `MaskDecoder` class.
+ sam_mask_decoder_extra_args: Optional[dict] = None,
+ # whether to compile all the model components
+ compile_all_components: bool = False,
+ # save and use image features in the memory
+ save_image_features: bool = False,
+ # number of multimask outputs in the SAM mask decoder
+ num_multimask_outputs: int = 3,
+ # use a single mask token to predict all masks
+ decode_mask_with_shared_tokens: bool = False,
+ # use the mask token for predicting ious and object scores
+ decode_mask_attribute_with_shared_tokens: bool = False,
+ share_necks: bool = False, # share the interactive and sam2_backbone necks
+ # if enabled, use a different rng generator for operations that differ between GPUs,
+ # such that the base rng that controls flow does not go out-of-sync among GPUs
+ # There will be a slight performance penalty when turned off due to uneven workload but it's minor
+ randomness_fix: bool = False,
+ # add learnable embeddings to the object queries that correspond to paddings/removed objects
+ add_output_suppression_embeddings: bool = False,
+ # add a per-object embedding to the spatial memory features if that object is a conditioning input
+ add_object_conditional_embeddings: bool = False,
+ # if None, follow add_object_conditional_embeddings
+ add_object_unconditional_embeddings: Optional[bool] = None,
+ # for each object, add an additional channel in the mask encoder to indicate conditional/unconditional objects
+ condition_as_mask_input: bool = False,
+ condition_as_mask_input_fg: float = 1.0,
+ condition_as_mask_input_bg: float = 0.0,
+ # use v2 memory positional encodings
+ # in v2, the last slot in the positional encoding no longer refers to the conditional frame
+ # it now refers to "out-of-bound" frames.
+ # The motivation is to shift all encodings of "conditioning" to the object_conditional embeddings
+ use_maskmem_tpos_v2: bool = False,
+ # select memory frames in which the object exists
+ use_memory_selection: bool = False,
+ # when using memory selection, the threshold to determine if the frame is good
+ mf_threshold: float = 0.01,
+ # this is a flag for demo purposes; it does not need to be explicitly set
+ is_dynamic_model: bool = False,
+ object_score_logit_threshold: float = 0.0,
+ stability_score_attentuation: bool = False, # select from multimask based on iou*stability_score
+ ):
+ super().__init__()
+
+ # the interactive sam mask decoder can use dynamic_multimask_via_stability
+ interactive_sam_mask_decoder_extra_args = deepcopy(sam_mask_decoder_extra_args)
+ if sam_mask_decoder_extra_args is not None:
+ dynamic_multimask_via_stability = sam_mask_decoder_extra_args.get(
+ "dynamic_multimask_via_stability", False
+ )
+ if dynamic_multimask_via_stability:
+ sam_mask_decoder_extra_args["dynamic_multimask_via_stability"] = False
+ print(
+ "dynamic_multimask_via_stability is reset to False in the multiplex model"
+ )
+
+ # Part 1: the image backbone
+ self.backbone = backbone
+ # Use level 0, 1, 2 for high-res setting, or just level 2 for the default setting
+ self.use_high_res_features_in_sam = use_high_res_features_in_sam
+ self.num_feature_levels = 3 if use_high_res_features_in_sam else 1
+ self.use_obj_ptrs_in_encoder = use_obj_ptrs_in_encoder
+ self.max_obj_ptrs_in_encoder = max_obj_ptrs_in_encoder
+ if use_obj_ptrs_in_encoder:
+ # A conv layer to downsample the GT mask prompt to stride 4 (the same stride as
+ # low-res SAM mask logits) and to change its scales from 0~1 to SAM logit scale,
+ # so that it can be fed into the SAM mask decoder to generate a pointer.
+ self.interactive_mask_downsample = torch.nn.Conv2d(
+ 1, 1, kernel_size=4, stride=4
+ )
+
+ self.add_tpos_enc_to_obj_ptrs = add_tpos_enc_to_obj_ptrs
+ if proj_tpos_enc_in_obj_ptrs:
+ assert add_tpos_enc_to_obj_ptrs # these options need to be used together
+ self.proj_tpos_enc_in_obj_ptrs = proj_tpos_enc_in_obj_ptrs
+ self.use_signed_tpos_enc_to_obj_ptrs = use_signed_tpos_enc_to_obj_ptrs
+ self.only_obj_ptrs_in_the_past_for_eval = only_obj_ptrs_in_the_past_for_eval
+ self.multiplex_controller = multiplex_controller
+ self.save_image_features = save_image_features
+ self.multiplex_count = self.multiplex_controller.multiplex_count
+
+ # Part 2: encoder-only transformer to fuse current frame's visual features
+ # with memories from past frames
+ assert transformer.decoder is None, "transformer should be encoder-only"
+ self.transformer = transformer
+ self.hidden_dim: int = transformer.d_model
+
+ # Part 3: memory encoder for the previous frame's outputs
+ self.maskmem_backbone = maskmem_backbone
+ self.mem_dim = self.hidden_dim
+ if hasattr(self.maskmem_backbone, "out_proj") and hasattr(
+ self.maskmem_backbone.out_proj, "weight"
+ ):
+ # if there is compression of memories along channel dim
+ mem_dim = self.maskmem_backbone.out_proj.weight.shape[0]
+ assert mem_dim == self.hidden_dim, (
+ "there should be no compression of memory embeddings"
+ )
+ self.num_maskmem = num_maskmem # Number of memories accessible
+ # Temporal encoding of the memories
+ self.sincos_tpos_enc = sincos_tpos_enc
+ self.use_maskmem_tpos_v2 = use_maskmem_tpos_v2
+ # temporal positional encodings specific to spatial memories only;
+ # the last slot corresponds to the conditioning-frame embedding, independent of
+ # temporal position (with use_maskmem_tpos_v2, it denotes out-of-bound frames instead)
+ self.maskmem_tpos_enc = torch.nn.Parameter(
+ torch.zeros(num_maskmem, 1, 1, self.mem_dim)
+ )
+ trunc_normal_(self.maskmem_tpos_enc, std=0.02)
+
+ # a single token to indicate no memory embedding from previous frames
+ self.interactivity_no_mem_embed = torch.nn.Parameter(
+ torch.zeros(1, 1, self.hidden_dim)
+ )
+ trunc_normal_(self.interactivity_no_mem_embed, std=0.02)
+ self.directly_add_no_mem_embed = directly_add_no_mem_embed
+
+ # Whether to apply sigmoid to the output raw mask logits (to turn them from
+ # range (-inf, +inf) to range (0, 1)) before feeding them into the memory encoder
+ self.apply_sigmoid_to_mask_logits_for_mem_enc = (
+ apply_sigmoid_to_mask_logits_for_mem_enc
+ )
+ if apply_sigmoid_to_mask_logits_for_mem_enc:
+ self.sigmoid_scale_for_mem_enc = sigmoid_scale_for_mem_enc
+ self.sigmoid_bias_for_mem_enc = sigmoid_bias_for_mem_enc
+
+ if binarize_mask_from_pts_for_mem_enc:
+ logging.warning(
+ """
+ The current model was not trained with binarize_mask_from_pts_for_mem_enc;
+ we force it to False here because external callers often hardcode it
+ to True, ignoring the config.
+ Re-training with this option enabled should be possible.
+ """
+ )
+ binarize_mask_from_pts_for_mem_enc = False
+
+ self.binarize_mask_from_pts_for_mem_enc = binarize_mask_from_pts_for_mem_enc
+ self.non_overlap_masks_for_mem_enc = non_overlap_masks_for_mem_enc
+ self.memory_temporal_stride_for_eval = memory_temporal_stride_for_eval
+ # On frames with mask input, whether to directly output the input mask without
+ # using a SAM prompt encoder + mask decoder
+ self.use_mask_input_as_output_without_sam = use_mask_input_as_output_without_sam
+ self.multimask_output_in_sam = multimask_output_in_sam
+ self.multimask_min_pt_num = multimask_min_pt_num
+ self.multimask_max_pt_num = multimask_max_pt_num
+ self.multimask_output_for_tracking = multimask_output_for_tracking
+ self.use_multimask_token_for_obj_ptr = use_multimask_token_for_obj_ptr
+ self.use_best_iou_mask_for_mem_enc = use_best_iou_mask_for_mem_enc
+ self.iou_prediction_use_sigmoid = iou_prediction_use_sigmoid
+ self.object_score_logit_threshold = object_score_logit_threshold
+ self.stability_score_attentuation = stability_score_attentuation
+ if iter_use_prev_mask_pred:
+ # In this case, we are feeding the previously predicted SAM mask logits
+ # as mask prompt into the SAM mask decoder, which has a different format
+ # and magnitude from GT mask input in VOS. Therefore in this case, the GT
+ # mask input must be encoded directly (not through the SAM mask decoder).
+ if min(prob_to_use_pt_input_for_train, prob_to_use_pt_input_for_eval) < 1:
+ assert use_mask_input_as_output_without_sam
+ self.iter_use_prev_mask_pred = iter_use_prev_mask_pred
+
+ # Part 4: SAM-style prompt encoder (for both mask and point inputs)
+ # and SAM-style mask decoder for the final mask output
+ self.image_size = image_size
+ self.backbone_stride = backbone_stride
+ self.low_res_mask_size = self.image_size // self.backbone_stride * 4
+ # we resize the mask if it doesn't match `self.input_mask_size` (which is always 4x
+ # the low-res mask size, regardless of the actual input image size); this is because
+ # `_use_mask_as_output` always downsamples the input masks by 4x
+ self.input_mask_size = self.low_res_mask_size * 4
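+ # Worked example of the size arithmetic above (hypothetical values, for illustration only):
+ # with image_size=1024 and backbone_stride=16, image_size // backbone_stride = 64,
+ # so low_res_mask_size = 64 * 4 = 256 and input_mask_size = 256 * 4 = 1024 (the image size).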
+ self.forward_backbone_per_frame_for_eval = forward_backbone_per_frame_for_eval
+ self.offload_output_to_cpu_for_eval = offload_output_to_cpu_for_eval
+ if trim_past_non_cond_mem_for_eval:
+ assert num_frames_to_correct_for_eval <= 1, (
+ "trim_past_non_cond_mem_for_eval=True requires that only the first frame receives prompts"
+ )
+ self.trim_past_non_cond_mem_for_eval = trim_past_non_cond_mem_for_eval
+ self.sam_mask_decoder_extra_args = sam_mask_decoder_extra_args
+ self.interactive_sam_mask_decoder_extra_args = (
+ interactive_sam_mask_decoder_extra_args
+ )
+ self.pred_obj_scores = pred_obj_scores
+ self.pred_obj_scores_mlp = pred_obj_scores_mlp
+ self.fixed_no_obj_ptr = fixed_no_obj_ptr
+ self.use_no_obj_ptr = use_no_obj_ptr
+ self.use_linear_no_obj_ptr = use_linear_no_obj_ptr
+
+ if self.fixed_no_obj_ptr:
+ assert self.pred_obj_scores
+ assert self.use_obj_ptrs_in_encoder
+ if (
+ self.pred_obj_scores
+ and self.use_obj_ptrs_in_encoder
+ and self.use_no_obj_ptr
+ ):
+ if self.use_linear_no_obj_ptr:
+ self.no_obj_ptr_linear = nn.Linear(self.hidden_dim, self.hidden_dim)
+ else:
+ self.no_obj_ptr = torch.nn.Parameter(
+ torch.zeros(self.multiplex_count, self.hidden_dim)
+ )
+ trunc_normal_(self.no_obj_ptr, std=0.02)
+
+ self.use_mlp_for_obj_ptr_proj = use_mlp_for_obj_ptr_proj
+ self.no_obj_embed_spatial = None
+ if no_obj_embed_spatial:
+ self.no_obj_embed_spatial = torch.nn.Parameter(
+ torch.zeros(self.multiplex_count, self.hidden_dim)
+ )
+ trunc_normal_(self.no_obj_embed_spatial, std=0.02)
+ self.num_multimask_outputs = num_multimask_outputs
+ self.decode_mask_with_shared_tokens = decode_mask_with_shared_tokens
+ self.decode_mask_attribute_with_shared_tokens = (
+ decode_mask_attribute_with_shared_tokens
+ )
+ self.share_necks = share_necks
+
+ self.add_output_suppression_embeddings = add_output_suppression_embeddings
+ if self.add_output_suppression_embeddings:
+ self.output_valid_embed = torch.nn.Parameter(
+ torch.zeros(self.multiplex_count, self.hidden_dim)
+ )
+ self.output_invalid_embed = torch.nn.Parameter(
+ torch.zeros(self.multiplex_count, self.hidden_dim)
+ )
+ trunc_normal_(self.output_valid_embed, std=0.02)
+ trunc_normal_(self.output_invalid_embed, std=0.02)
+ self.add_object_conditional_embeddings = add_object_conditional_embeddings
+ if add_object_unconditional_embeddings is None:
+ add_object_unconditional_embeddings = add_object_conditional_embeddings
+ self.add_object_unconditional_embeddings = add_object_unconditional_embeddings
+ if add_object_unconditional_embeddings:
+ assert add_object_conditional_embeddings
+ if self.add_object_conditional_embeddings:
+ # have embeddings for both conditional and non-conditional objects
+ # such that the features are more "balanced"
+ # these three sets should be disjoint and their union should cover all objects
+ # for conditioning objects
+ self.obj_cond_embed = torch.nn.Parameter(
+ torch.zeros(self.multiplex_count, self.hidden_dim)
+ )
+ trunc_normal_(self.obj_cond_embed, std=0.02)
+ if self.add_object_unconditional_embeddings:
+ # for non-conditioning objects
+ self.obj_non_cond_embed = torch.nn.Parameter(
+ torch.zeros(self.multiplex_count, self.hidden_dim)
+ )
+ trunc_normal_(self.obj_non_cond_embed, std=0.02)
+
+ self.condition_as_mask_input = condition_as_mask_input
+ self.condition_as_mask_input_fg = condition_as_mask_input_fg
+ self.condition_as_mask_input_bg = condition_as_mask_input_bg
+
+ self.is_dynamic_model = is_dynamic_model
+
+ self._build_sam_heads()
+
+ # Point sampler and conditioning frames
+ self.prob_to_use_pt_input_for_train = prob_to_use_pt_input_for_train
+ self.prob_to_use_box_input_for_train = prob_to_use_box_input_for_train
+ self.prob_to_use_pt_input_for_eval = prob_to_use_pt_input_for_eval
+ self.prob_to_use_box_input_for_eval = prob_to_use_box_input_for_eval
+ if prob_to_use_pt_input_for_train > 0 or prob_to_use_pt_input_for_eval > 0:
+ logging.info("Using points (sampled from masks) as inputs")
+ assert num_frames_to_correct_for_train >= num_init_cond_frames_for_train
+ assert num_frames_to_correct_for_eval >= num_init_cond_frames_for_eval
+ self.num_frames_to_correct_for_train = num_frames_to_correct_for_train
+ self.num_frames_to_correct_for_eval = num_frames_to_correct_for_eval
+ self.rand_frames_to_correct_for_train = rand_frames_to_correct_for_train
+ self.rand_frames_to_correct_for_eval = rand_frames_to_correct_for_eval
+ self.prob_correct_all_objects_for_train = prob_correct_all_objects_for_train
+ self.ratio_of_objects_to_correct_for_train = (
+ ratio_of_objects_to_correct_for_train
+ )
+ self.rand_objects_to_correct_for_train = rand_objects_to_correct_for_train
+ self.force_correct_all_for_conditional_inputs = (
+ force_correct_all_for_conditional_inputs
+ )
+ # Initial multi-conditioning frames
+ self.num_init_cond_frames_for_train = num_init_cond_frames_for_train
+ self.num_init_cond_frames_for_eval = num_init_cond_frames_for_eval
+ self.rand_init_cond_frames_for_train = rand_init_cond_frames_for_train
+ self.rand_init_cond_frames_for_eval = rand_init_cond_frames_for_eval
+ self.max_cond_frames_in_attn = max_cond_frames_in_attn
+ self.keep_first_cond_frame = keep_first_cond_frame
+ self.add_all_frames_to_correct_as_cond = add_all_frames_to_correct_as_cond
+ self.num_correction_pt_per_frame = num_correction_pt_per_frame
+ self.pt_sampling_for_eval = pt_sampling_for_eval
+ self.prob_to_sample_from_gt_for_train = prob_to_sample_from_gt_for_train
+ # A random number generator with a fixed initial seed across GPUs
+ self.rng = np.random.default_rng(seed=42)
+ if randomness_fix:
+ self.rng2 = np.random.default_rng(seed=42)
+ else:
+ self.rng2 = self.rng
+
+ # Use frame filtering according to SAM2Long
+ self.use_memory_selection = use_memory_selection
+ self.mf_threshold = mf_threshold
+
+ # Compile all components of the model
+ self.compile_all_components = compile_all_components
+ if self.compile_all_components:
+ self._compile_all_components()
+
+ def _get_tpos_enc(self, rel_pos_list, device, max_abs_pos=None, dummy=False):
+ if dummy:
+ return torch.zeros(len(rel_pos_list), self.mem_dim, device=device)
+
+ t_diff_max = max_abs_pos - 1 if max_abs_pos is not None else 1
+ pos_enc = (
+ torch.tensor(rel_pos_list).pin_memory().to(device=device, non_blocking=True)
+ / t_diff_max
+ )
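+ # e.g. (hypothetical values): rel_pos_list=[1, 2, 3] with max_abs_pos=4 gives t_diff_max=3,
+ # so pos_enc = [1/3, 2/3, 1.0] before the sine embedding below; with max_abs_pos=None,
+ # the relative positions are divided by 1, i.e., left unnormalized.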
+ if self.sincos_tpos_enc:
+ tpos_dim = (
+ self.hidden_dim if self.proj_tpos_enc_in_obj_ptrs else self.mem_dim
+ )
+ pos_enc = get_1d_sine_pe(pos_enc, dim=tpos_dim)
+ else:
+ raise NotImplementedError
+ pos_enc = self.obj_ptr_tpos_proj(pos_enc)
+
+ return pos_enc
+
+ def _build_sam_heads(self):
+ """Build SAM-style prompt encoder and mask decoder."""
+ self.sam_prompt_embed_dim = self.hidden_dim
+ self.sam_image_embedding_size = self.image_size // self.backbone_stride
+
+ self.image_pe_layer = PositionEmbeddingRandom(self.hidden_dim // 2)
+
+ # build PromptEncoder and MaskDecoder from SAM
+ # (their hyperparameters like `mask_in_chans=16` are from SAM code)
+ self.interactive_sam_prompt_encoder = PromptEncoder(
+ embed_dim=self.sam_prompt_embed_dim,
+ image_embedding_size=(
+ self.sam_image_embedding_size,
+ self.sam_image_embedding_size,
+ ),
+ input_image_size=(self.image_size, self.image_size),
+ mask_in_chans=16,
+ )
+
+ self.interactive_sam_mask_decoder = MaskDecoder(
+ num_multimask_outputs=3,
+ transformer=TwoWayTransformer(
+ depth=2,
+ embedding_dim=self.sam_prompt_embed_dim,
+ mlp_dim=2048,
+ num_heads=8,
+ ),
+ transformer_dim=self.sam_prompt_embed_dim,
+ iou_head_depth=3,
+ iou_head_hidden_dim=256,
+ use_high_res_features=self.use_high_res_features_in_sam,
+ iou_prediction_use_sigmoid=self.iou_prediction_use_sigmoid,
+ pred_obj_scores=self.pred_obj_scores,
+ pred_obj_scores_mlp=self.pred_obj_scores_mlp,
+ use_multimask_token_for_obj_ptr=self.use_multimask_token_for_obj_ptr,
+ **(self.interactive_sam_mask_decoder_extra_args or {}),
+ )
+ if self.share_necks:
+ # we will use self.sam_mask_decoder's convs
+ del self.interactive_sam_mask_decoder.conv_s0
+ del self.interactive_sam_mask_decoder.conv_s1
+
+ self.sam_mask_decoder = MultiplexMaskDecoder(
+ multiplex_count=self.multiplex_count,
+ num_multimask_outputs=self.num_multimask_outputs,
+ transformer=TwoWayTransformer(
+ depth=2,
+ embedding_dim=self.hidden_dim,
+ mlp_dim=2048,
+ num_heads=8,
+ ),
+ transformer_dim=self.hidden_dim,
+ iou_head_depth=3,
+ iou_head_hidden_dim=256,
+ use_high_res_features=self.use_high_res_features_in_sam,
+ iou_prediction_use_sigmoid=self.iou_prediction_use_sigmoid,
+ pred_obj_scores=self.pred_obj_scores,
+ pred_obj_scores_mlp=self.pred_obj_scores_mlp,
+ use_multimask_token_for_obj_ptr=self.use_multimask_token_for_obj_ptr,
+ decode_mask_with_shared_tokens=self.decode_mask_with_shared_tokens,
+ decode_mask_attribute_with_shared_tokens=self.decode_mask_attribute_with_shared_tokens,
+ multimask_outputs_only=self.num_multimask_outputs > 0
+ and self.multimask_output_in_sam,
+ **(self.sam_mask_decoder_extra_args or {}),
+ )
+
+ if self.use_obj_ptrs_in_encoder:
+ # a linear projection on SAM output tokens to turn them into object pointers
+ self.obj_ptr_proj = torch.nn.Linear(self.hidden_dim, self.hidden_dim)
+ self.interactive_obj_ptr_proj = torch.nn.Linear(
+ self.hidden_dim, self.hidden_dim
+ )
+ if self.use_mlp_for_obj_ptr_proj:
+ self.obj_ptr_proj = MLP(
+ self.hidden_dim, self.hidden_dim, self.hidden_dim, 3
+ )
+ self.interactive_obj_ptr_proj = MLP(
+ self.hidden_dim, self.hidden_dim, self.hidden_dim, 3
+ )
+ else:
+ self.obj_ptr_proj = torch.nn.Identity()
+ self.interactive_obj_ptr_proj = torch.nn.Identity()
+ if self.proj_tpos_enc_in_obj_ptrs:
+ # a linear projection on temporal positional encoding in object pointers to
+ # avoid potential interference with spatial positional encoding
+ self.obj_ptr_tpos_proj = torch.nn.Linear(self.hidden_dim, self.mem_dim)
+ else:
+ self.obj_ptr_tpos_proj = torch.nn.Identity()
+
+ def _get_interactive_pix_mem(
+ self, features: torch.Tensor, feat_sizes: list[tuple]
+ ) -> torch.Tensor:
+ assert self.directly_add_no_mem_embed
+ pix_feat_with_mem = features[-1] + self.interactivity_no_mem_embed
+ B = features[-1].size(1) # batch size on this frame
+ C = self.hidden_dim
+ H, W = feat_sizes[-1] # top-level (lowest-resolution) feature size
+ pix_feat_with_mem = pix_feat_with_mem.permute(1, 2, 0).view(B, C, H, W)
+ return pix_feat_with_mem
+
+ def _forward_sam_heads(
+ self,
+ backbone_features: torch.Tensor,
+ *,
+ point_inputs: Optional[dict[str, torch.Tensor]] = None,
+ mask_inputs: Optional[torch.Tensor] = None,
+ interactive_high_res_features: Optional[list[torch.Tensor]] = None,
+ propagation_high_res_features: Optional[list[torch.Tensor]] = None,
+ multimask_output: bool = False,
+ gt_masks=None,
+ multiplex_state: MultiplexState,
+ objects_to_interact: Optional[list[int]] = None,
+ ) -> SAMOutput:
+ """
+ Forward SAM prompt encoders and mask heads.
+ Based on the inputs, we run either the interactive head (if point or mask prompts
+ are provided) or the multiplexed propagation head.
+
+ Inputs:
+ - backbone_features: image features of [B, C, H, W] shape
+ - point_inputs: a dictionary with "point_coords" and "point_labels", where
+ 1) "point_coords" has [B, P, 2] shape and float32 dtype and contains the
+ absolute pixel-unit coordinate in (x, y) format of the P input points
+ 2) "point_labels" has shape [B, P] and int32 dtype, where 1 means
+ positive clicks, 0 means negative clicks, and -1 means padding
+ - mask_inputs: a mask of [B, 1, H*16, W*16] shape, float or bool, with the
+ same spatial size as the image.
+ - interactive_high_res_features / propagation_high_res_features: either 1) None
+ or 2) a list of length 2 containing two feature maps of [B, C, 4*H, 4*W] and
+ [B, C, 2*H, 2*W] shapes respectively, which will be used as high-resolution
+ feature maps for the interactive / propagation SAM decoder. The one for the
+ path being run must not be None.
+ - multimask_output: if it's True, we output 3 candidate masks and their 3
+ corresponding IoU estimates, and if it's False, we output only 1 mask and
+ its corresponding IoU estimate.
+ - multiplex_state: the multiplexing state, used to demux the propagation outputs
+ into per-object outputs and to locate the corresponding per-slot no_obj_ptr entries.
+ - objects_to_interact: indices of the prompted objects (in data space);
+ required on the interactive path.
+
+ Outputs (all per object, in data space, so B below is the number of objects):
+ - low_res_multimasks: [B, M, H*4, W*4] shape (where M = 3 if
+ `multimask_output=True` and M = 1 if `multimask_output=False`), the SAM
+ output mask logits (before sigmoid) for the low-resolution masks, with 4x
+ the resolution (1/4 stride) of the input backbone_features.
+ - high_res_multimasks: [B, M, H*16, W*16] shape (where M = 3
+ if `multimask_output=True` and M = 1 if `multimask_output=False`),
+ upsampled from the low-resolution masks, with the same size as the image
+ (stride is 1 pixel).
+ - ious: [B, M] shape (where M = 3 if `multimask_output=True` and M = 1
+ if `multimask_output=False`), the estimated IoU of each output mask.
+ - low_res_masks: [B, 1, H*4, W*4] shape, the best mask in `low_res_multimasks`.
+ If `multimask_output=True`, it's the mask with the highest IoU estimate.
+ If `multimask_output=False`, it's the same as `low_res_multimasks`.
+ - high_res_masks: [B, 1, H*16, W*16] shape, the best mask in `high_res_multimasks`.
+ If `multimask_output=True`, it's the mask with the highest IoU estimate.
+ If `multimask_output=False`, it's the same as `high_res_multimasks`.
+ - object_score_logits: the logits of whether each object appears in the frame.
+ - obj_ptr: [B, C] shape (one pointer per object), the object pointer vector for
+ the output mask, extracted based on the output token from the SAM mask decoder.
+ """
+
+ device = backbone_features.device
+ assert backbone_features.size(1) == self.hidden_dim
+ assert backbone_features.size(2) == self.sam_image_embedding_size
+ assert backbone_features.size(3) == self.sam_image_embedding_size
+
+ is_interactive = point_inputs is not None or mask_inputs is not None
+
+ if is_interactive:
+ """
+ Image-level, per-object interactive path
+ """
+ assert interactive_high_res_features is not None
+ assert objects_to_interact is not None
+
+ # a) Handle point prompts
+ if point_inputs is not None:
+ sam_point_coords = point_inputs["point_coords"]
+ sam_point_labels = point_inputs["point_labels"]
+ else:
+ assert mask_inputs is not None
+ # If no points are provided, pad with an empty point (with label -1)
+ sam_point_coords = torch.zeros(
+ mask_inputs.shape[0], 1, 2, device=device
+ )
+ sam_point_labels = -torch.ones(
+ mask_inputs.shape[0], 1, dtype=torch.int32, device=device
+ )
+
+ # b) Handle mask prompts
+ if mask_inputs is not None:
+ # If mask_inputs is provided, downsize it into low-res mask input if needed
+ # and feed it as a dense mask prompt into the SAM mask encoder
+ assert len(mask_inputs.shape) == 4
+ if (
+ mask_inputs.shape[-2:]
+ != self.interactive_sam_prompt_encoder.mask_input_size
+ ):
+ sam_mask_prompt = F.interpolate(
+ mask_inputs.float(),
+ size=self.interactive_sam_prompt_encoder.mask_input_size,
+ align_corners=False,
+ mode="bilinear",
+ antialias=True, # use antialias for downsampling
+ )
+ else:
+ sam_mask_prompt = mask_inputs
+ else:
+ # Otherwise, simply feed None (and SAM's prompt encoder will add
+ # a learned `no_mask_embed` to indicate no mask input in this case).
+ sam_mask_prompt = None
+
+ sparse_embeddings, dense_embeddings = self.interactive_sam_prompt_encoder(
+ points=(sam_point_coords, sam_point_labels),
+ boxes=None,
+ masks=sam_mask_prompt,
+ )
+
+ # Clone image_pe and the outputs of sam_prompt_encoder
+ # to enable compilation
+ sparse_embeddings = self._maybe_clone(sparse_embeddings)
+ dense_embeddings = self._maybe_clone(dense_embeddings)
+ image_pe = self._maybe_clone(
+ self.interactive_sam_prompt_encoder.get_dense_pe()
+ )
+ (
+ low_res_multimasks,
+ ious,
+ sam_output_tokens,
+ object_score_logits,
+ ) = self.interactive_sam_mask_decoder(
+ image_embeddings=backbone_features,
+ image_pe=image_pe,
+ sparse_prompt_embeddings=sparse_embeddings,
+ dense_prompt_embeddings=dense_embeddings,
+ multimask_output=multimask_output,
+ repeat_image=True,
+ high_res_features=interactive_high_res_features,
+ )
+
+ else:
+ """
+ Multiplexed propagation path
+ """
+ assert propagation_high_res_features is not None
+ assert multiplex_state is not None
+
+ if self.add_output_suppression_embeddings:
+ # the suppression embeddings inform the mask decoder the objects that should be decoded
+ output_valid_embed = self.output_valid_embed.unsqueeze(0)
+ output_invalid_embed = self.output_invalid_embed.unsqueeze(0)
+ valid_object_mask = (
+ multiplex_state.get_valid_object_mask().unsqueeze(-1).float()
+ )
+ output_merged_embed = (
+ valid_object_mask * output_valid_embed
+ + (1 - valid_object_mask) * output_invalid_embed
+ )
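+ # Since valid_object_mask is 0/1, each slot receives exactly one of the two embeddings.
+ # e.g. (hypothetical, assuming get_valid_object_mask() returns a [num_buckets, multiplex_count]
+ # mask): with one bucket and mask [[1, 0]], slot 0 gets self.output_valid_embed[0] and
+ # slot 1 gets self.output_invalid_embed[1]; output_merged_embed is [num_buckets, multiplex_count, C].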
+ else:
+ output_merged_embed = None
+
+ # Clone image_pe to enable compilation
+ image_pe = self._maybe_clone(self.get_propagation_dense_pe())
+ out = self.sam_mask_decoder(
+ image_embeddings=backbone_features,
+ image_pe=image_pe,
+ high_res_features=propagation_high_res_features,
+ multimask_output=multimask_output,
+ extra_per_object_embeddings=output_merged_embed,
+ )
+ low_res_multimasks = out["masks"] # [B, M, 3/1, H*4, W*4]
+ ious = out["iou_pred"] # [B, M, 3/1]
+ sam_output_tokens = out["sam_tokens_out"] # [B, M, 3/1, C]
+ object_score_logits = out["object_score_logits"]
+
+ low_res_multimasks = multiplex_state.demux(low_res_multimasks)
+ ious = multiplex_state.demux(ious)
+ object_score_logits = multiplex_state.demux(object_score_logits)
+ sam_output_tokens = multiplex_state.demux(sam_output_tokens)
+
+ """
+ The interactive and the propagation paths converge here
+ """
+ # Clone the output of sam_mask_decoder
+ # to enable compilation
+ low_res_multimasks = self._maybe_clone(low_res_multimasks)
+ ious = self._maybe_clone(ious)
+ object_score_logits = self._maybe_clone(object_score_logits)
+ sam_output_tokens = self._maybe_clone(sam_output_tokens)
+
+ if self.pred_obj_scores:
+ is_obj_appearing = object_score_logits > self.object_score_logit_threshold
+
+ # Mask used for spatial memories is always a *hard* choice between obj and no obj,
+ # consistent with the actual mask prediction
+ low_res_multimasks = torch.where(
+ is_obj_appearing[:, None, None],
+ low_res_multimasks,
+ NO_OBJ_SCORE,
+ )
+
+ # convert masks from possibly bfloat16 (or float16) to float32
+ # (older PyTorch versions before 2.1 don't support `interpolate` on bf16)
+ low_res_multimasks = low_res_multimasks.float()
+ high_res_multimasks = F.interpolate(
+ low_res_multimasks,
+ size=(self.image_size, self.image_size),
+ mode="bilinear",
+ align_corners=False,
+ )
+
+ sam_output_token = sam_output_tokens[:, 0]
+ if multimask_output and (
+ not self.decode_mask_with_shared_tokens or is_interactive
+ ):
+ # take the best mask prediction (with the highest IoU estimate)
+ if self.stability_score_attentuation:
+ # prefer selecting masks with high stability score
+ stability_score = self.sam_mask_decoder._get_stability_scores(
+ low_res_multimasks
+ )
+ ious = ious * stability_score
+
+ best_iou_inds = torch.argmax(ious, dim=-1)
+ batch_inds = torch.arange(ious.shape[0], device=device)
+
+ low_res_masks = low_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1)
+ high_res_masks = high_res_multimasks[batch_inds, best_iou_inds].unsqueeze(1)
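+ # Advanced indexing pairs batch_inds with best_iou_inds element-wise, e.g. (hypothetical)
+ # best_iou_inds=[2, 0] selects low_res_multimasks[0, 2] and low_res_multimasks[1, 0];
+ # unsqueeze(1) then restores the mask dim, giving [B, 1, H, W].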
+ if sam_output_tokens.size(1) > 1:
+ sam_output_token = sam_output_tokens[batch_inds, best_iou_inds]
+ else:
+ if multimask_output and not is_interactive:
+ assert self.decode_mask_with_shared_tokens
+ low_res_masks = low_res_multimasks[:, 0:1]
+ high_res_masks = high_res_multimasks[:, 0:1]
+ else:
+ low_res_masks = low_res_multimasks
+ high_res_masks = high_res_multimasks
+
+ # Extract object pointer from the SAM output token
+ if self.use_obj_ptrs_in_encoder:
+ if is_interactive:
+ obj_ptr = self.interactive_obj_ptr_proj(sam_output_token)
+ else:
+ obj_ptr = self.obj_ptr_proj(sam_output_token)
+
+ if self.pred_obj_scores and self.use_no_obj_ptr:
+ lambda_is_obj_appearing = is_obj_appearing.float()
+ if self.use_linear_no_obj_ptr:
+ obj_ptr = lambda_is_obj_appearing * obj_ptr + (
+ 1 - lambda_is_obj_appearing
+ ) * self.no_obj_ptr_linear(obj_ptr)
+ else:
+ if self.fixed_no_obj_ptr:
+ obj_ptr = lambda_is_obj_appearing * obj_ptr
+
+ # use demux to locate the corresponding no_obj_ptr entries
+ selected_no_obj_ptr = self.no_obj_ptr.unsqueeze(0).repeat(
+ multiplex_state.num_buckets, 1, 1
+ )
+ selected_no_obj_ptr = multiplex_state.demux(selected_no_obj_ptr)
+ if is_interactive:
+ # if is_interactive, the object pointers are in the data space
+ selected_no_obj_ptr = selected_no_obj_ptr[objects_to_interact]
+
+ obj_ptr = (
+ obj_ptr + (1 - lambda_is_obj_appearing) * selected_no_obj_ptr
+ )
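+ # e.g.: for an object predicted absent (lambda_is_obj_appearing=0) with fixed_no_obj_ptr=True,
+ # obj_ptr becomes 0 + 1 * selected_no_obj_ptr, i.e., the no_obj_ptr of that object's slot;
+ # for an object predicted present (lambda_is_obj_appearing=1), obj_ptr is left unchanged.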
+
+ outputs: SAMOutput = {
+ "low_res_multimasks": low_res_multimasks,
+ "high_res_multimasks": high_res_multimasks,
+ "ious": ious,
+ "low_res_masks": low_res_masks,
+ "high_res_masks": high_res_masks,
+ "object_score_logits": object_score_logits,
+ }
+ if self.use_obj_ptrs_in_encoder:
+ outputs["obj_ptr"] = obj_ptr # [num_objects, C], in data space
+ return outputs
+
+ def _use_mask_as_output(
+ self,
+ backbone_features: torch.Tensor,
+ high_res_features: list[torch.Tensor],
+ mask_inputs: torch.Tensor,
+ multiplex_state: MultiplexState,
+ objects_in_mask: Optional[list[int]] = None,
+ ) -> SAMOutput:
+ """
+ Directly turn binary `mask_inputs` into output mask logits without using SAM.
+ (same input and output shapes as in _forward_sam_heads above).
+ """
+ if objects_in_mask is None:
+ objects_in_mask = list(range(multiplex_state.total_valid_entries))
+
+ # Use -10/+10 as logits for neg/pos pixels (very close to 0/1 in prob after sigmoid).
+ out_scale, out_bias = 20.0, -10.0 # sigmoid(-10.0)=4.5398e-05
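+ # e.g.: a foreground pixel (1.0) maps to 1.0 * 20.0 - 10.0 = +10.0 (sigmoid ~= 0.99995),
+ # and a background pixel (0.0) maps to -10.0 (sigmoid ~= 4.54e-05).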
+ mask_inputs_float = mask_inputs.to(backbone_features.dtype)
+ assert mask_inputs.shape[0] == len(objects_in_mask), (
+ f"{mask_inputs.shape[0]} != {len(objects_in_mask)}"
+ )
+ high_res_masks = mask_inputs_float * out_scale + out_bias
+ low_res_masks = F.interpolate(
+ high_res_masks,
+ size=(high_res_masks.size(-2) // 4, high_res_masks.size(-1) // 4),
+ align_corners=False,
+ mode="bilinear",
+ antialias=True, # use antialias for downsampling
+ )
+ # a dummy IoU prediction of all 1's under mask input
+ ious = mask_inputs.new_ones(
+ mask_inputs.size(0), 1, dtype=backbone_features.dtype
+ )
+
+ if self.use_obj_ptrs_in_encoder:
+ # produce an object pointer using the SAM decoder from the mask input
+ sam_outputs = self._forward_sam_heads(
+ backbone_features=backbone_features,
+ mask_inputs=self.interactive_mask_downsample(mask_inputs_float),
+ interactive_high_res_features=high_res_features,
+ gt_masks=mask_inputs,
+ objects_to_interact=objects_in_mask,
+ multiplex_state=multiplex_state,
+ )
+ obj_ptr = sam_outputs["obj_ptr"]
+
+ # In this method, we treat mask_inputs as the output, i.e., we use it directly to create the spatial memory.
+ # Following the same principle, we use mask_inputs (rather than the object scores from the SAM decoder)
+ # to decide whether the object appears.
+ is_obj_appearing = torch.any(mask_inputs.flatten(1).float() > 0.0, dim=1)
+ is_obj_appearing = is_obj_appearing[..., None]
+ lambda_is_obj_appearing = is_obj_appearing.float()
+ object_score_logits = out_scale * lambda_is_obj_appearing + out_bias
+ # Note that this logic has already been applied in _forward_sam_heads; applying it
+ # again is OK because lambda_is_obj_appearing is binary: when it is zero, it forces
+ # no_obj_ptr, and when it is one, it keeps the output from _forward_sam_heads.
+ if self.pred_obj_scores and self.use_no_obj_ptr:
+ if self.use_linear_no_obj_ptr:
+ obj_ptr = lambda_is_obj_appearing * obj_ptr + (
+ 1 - lambda_is_obj_appearing
+ ) * self.no_obj_ptr_linear(obj_ptr)
+ else:
+ if self.fixed_no_obj_ptr:
+ obj_ptr = lambda_is_obj_appearing * obj_ptr
+ # use demux to locate the corresponding no_obj_ptr entries
+ selected_no_obj_ptr = self.no_obj_ptr.unsqueeze(0).repeat(
+ multiplex_state.num_buckets, 1, 1
+ )
+ selected_no_obj_ptr = multiplex_state.demux(selected_no_obj_ptr)
+ selected_no_obj_ptr = selected_no_obj_ptr[objects_in_mask]
+ obj_ptr = (
+ obj_ptr + (1 - lambda_is_obj_appearing) * selected_no_obj_ptr
+ )
+
+ outputs: SAMOutput = {
+ "low_res_multimasks": low_res_masks,
+ "high_res_multimasks": high_res_masks,
+ "ious": ious,
+ "low_res_masks": low_res_masks,
+ "high_res_masks": high_res_masks,
+ "object_score_logits": object_score_logits,
+ }
+ if self.use_obj_ptrs_in_encoder:
+ outputs["obj_ptr"] = obj_ptr # [num_objects, C], in data space
+ return outputs
+
+ def forward(self, input: BatchedDatapoint, is_inference=False):
+ if self.training or not self.forward_backbone_per_frame_for_eval:
+ # precompute image features on all frames before tracking
+ backbone_out = self.forward_image(
+ input.img_batch, need_interactive_out=True, need_propagation_out=True
+ )
+ else:
+ # defer image feature computation on a frame until it's being tracked
+ backbone_out = {}
+ backbone_out = self.prepare_prompt_inputs(backbone_out, input)
+ previous_stages_out = self.forward_tracking(backbone_out, input)
+
+ # "None" for get_queries to be compatible with the trainer
+ return previous_stages_out, None
+
+ def forward_image(
+ self,
+ img_batch,
+ *,
+ need_sam3_out: bool = False,
+ need_interactive_out: bool = False,
+ need_propagation_out: bool = False,
+ ):
+ """Compute image features for the input batch."""
+ if self.share_necks:
+ need_propagation_out = need_interactive_out or need_propagation_out
+ need_interactive_out = False
+ # this also means that convs for backbone_fpn are shared
+ backbone_out = self.backbone.forward_image(
+ img_batch,
+ need_sam3_out=need_sam3_out,
+ need_sam2_out=need_propagation_out,
+ )
+ backbone_out["interactive"] = backbone_out["sam2_backbone_out"]
+ else:
+ backbone_out = self.backbone.forward_image(
+ img_batch,
+ need_sam3_out=need_sam3_out,
+ need_interactive_out=need_interactive_out,
+ need_propagation_out=need_propagation_out,
+ )
+ if self.use_high_res_features_in_sam:
+ # precompute projected level 0 and level 1 features in SAM decoder
+ # to avoid running it again on every SAM click
+ if need_interactive_out:
+ backbone_out["interactive"]["backbone_fpn"][
+ 0
+ ].tensors = self.interactive_sam_mask_decoder.conv_s0(
+ backbone_out["interactive"]["backbone_fpn"][0].tensors
+ )
+ backbone_out["interactive"]["backbone_fpn"][
+ 1
+ ].tensors = self.interactive_sam_mask_decoder.conv_s1(
+ backbone_out["interactive"]["backbone_fpn"][1].tensors
+ )
+ if need_propagation_out:
+ backbone_out["sam2_backbone_out"]["backbone_fpn"][
+ 0
+ ].tensors = self.sam_mask_decoder.conv_s0(
+ backbone_out["sam2_backbone_out"]["backbone_fpn"][0].tensors
+ )
+ backbone_out["sam2_backbone_out"]["backbone_fpn"][
+ 1
+ ].tensors = self.sam_mask_decoder.conv_s1(
+ backbone_out["sam2_backbone_out"]["backbone_fpn"][1].tensors
+ )
+ # Clone to help torch.compile
+ for out_type in backbone_out.keys():
+ for i in range(len(backbone_out[out_type]["backbone_fpn"])):
+ backbone_out[out_type]["backbone_fpn"][i].tensors = self._maybe_clone(
+ backbone_out[out_type]["backbone_fpn"][i].tensors
+ )
+ backbone_out[out_type]["vision_pos_enc"][i] = self._maybe_clone(
+ backbone_out[out_type]["vision_pos_enc"][i]
+ )
+ return backbone_out
+
+ def _prepare_prompt_inputs_meta(self, backbone_out, input, start_frame_idx=0):
+ # Load the ground-truth masks on all frames (so that we can later
+ # sample correction points from them)
+ gt_masks_per_frame = {
+ stage_id: targets.segments.unsqueeze(1) # [B, 1, H_im, W_im]
+ for stage_id, targets in enumerate(input.find_targets)
+ }
+ backbone_out["gt_masks_per_frame"] = gt_masks_per_frame
+ num_frames = len(input.find_targets)
+ backbone_out["num_frames"] = num_frames
+
+ # Randomly decide whether to use point inputs or mask inputs
+ if self.training:
+ prob_to_use_pt_input = self.prob_to_use_pt_input_for_train
+ num_frames_to_correct = self.num_frames_to_correct_for_train
+ rand_frames_to_correct = self.rand_frames_to_correct_for_train
+ num_init_cond_frames = self.num_init_cond_frames_for_train
+ rand_init_cond_frames = self.rand_init_cond_frames_for_train
+ else:
+ prob_to_use_pt_input = self.prob_to_use_pt_input_for_eval
+ num_frames_to_correct = self.num_frames_to_correct_for_eval
+ rand_frames_to_correct = self.rand_frames_to_correct_for_eval
+ num_init_cond_frames = self.num_init_cond_frames_for_eval
+ rand_init_cond_frames = self.rand_init_cond_frames_for_eval
+ if num_frames == 1:
+ # here we handle a special case for mixing video + SAM on image training,
+ # where we force using point input for the SAM task on static images
+ prob_to_use_pt_input = 1.0
+ num_frames_to_correct = 1
+ num_init_cond_frames = 1
+ assert num_init_cond_frames >= 1
+ # (here `self.rng.random()` returns value in range 0.0 <= X < 1.0)
+ use_pt_input = self.rng.random() < prob_to_use_pt_input
+ if rand_init_cond_frames and num_init_cond_frames > 1:
+ # randomly select 1 to `num_init_cond_frames` frames as initial conditioning frames
+ num_init_cond_frames = self.rng.integers(
+ 1, num_init_cond_frames, endpoint=True
+ )
+ if (
+ use_pt_input
+ and rand_frames_to_correct
+ and num_frames_to_correct > num_init_cond_frames
+ ):
+ # randomly select `num_init_cond_frames` to `num_frames_to_correct` frames to sample
+ # correction clicks (only for the case of point input)
+ num_frames_to_correct = self.rng.integers(
+ num_init_cond_frames, num_frames_to_correct, endpoint=True
+ )
+ backbone_out["use_pt_input"] = use_pt_input
+
+ # Sample initial conditioning frames
+ if num_init_cond_frames == 1:
+ init_cond_frames = [start_frame_idx] # starting frame
+ else:
+ # starting frame + randomly selected remaining frames (without replacement)
+ init_cond_frames = [start_frame_idx] + self.rng.choice(
+ range(start_frame_idx + 1, num_frames),
+ num_init_cond_frames - 1,
+ replace=False,
+ ).tolist()
+ backbone_out["init_cond_frames"] = init_cond_frames
+ backbone_out["frames_not_in_init_cond"] = [
+ t for t in range(start_frame_idx, num_frames) if t not in init_cond_frames
+ ]
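+ # Illustrative example (values assumed, not from a real run): with
+ # start_frame_idx=0, num_frames=8 and num_init_cond_frames=3, frame 0 is always
+ # kept and two more frames are drawn without replacement from 1..7, e.g.
+ # init_cond_frames=[0, 5, 2] and frames_not_in_init_cond=[1, 3, 4, 6, 7].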
+
+ # Sample frames where we will add correction clicks on the fly
+ # based on the error between prediction and ground-truth masks
+ if not use_pt_input:
+ # no correction points will be sampled when using mask inputs
+ frames_to_add_correction_pt = []
+ elif num_frames_to_correct == num_init_cond_frames:
+ frames_to_add_correction_pt = init_cond_frames
+ else:
+ assert num_frames_to_correct > num_init_cond_frames
+ # initial cond frame + randomly selected remaining frames (without replacement)
+ extra_num = num_frames_to_correct - num_init_cond_frames
+ frames_to_add_correction_pt = (
+ init_cond_frames
+ + self.rng.choice(
+ backbone_out["frames_not_in_init_cond"], extra_num, replace=False
+ ).tolist()
+ )
+ backbone_out["frames_to_add_correction_pt"] = frames_to_add_correction_pt
+
+ return backbone_out
+
+ def _prepare_conditional_frames(self, backbone_out):
+ init_cond_frames = backbone_out["init_cond_frames"]
+ gt_masks_per_frame = backbone_out["gt_masks_per_frame"]
+ use_pt_input = backbone_out["use_pt_input"]
+
+ if self.training:
+ prob_to_use_box_input = self.prob_to_use_box_input_for_train
+ else:
+ prob_to_use_box_input = self.prob_to_use_box_input_for_eval
+
+ # Prepare mask or point inputs on initial conditioning frames
+ backbone_out["mask_inputs_per_frame"] = {} # {frame_idx: [B, 1, H_im, W_im] mask}
+ backbone_out["point_inputs_per_frame"] = {} # {frame_idx: {"point_coords", "point_labels"}}
+ for t in init_cond_frames:
+ if not use_pt_input:
+ backbone_out["mask_inputs_per_frame"][t] = gt_masks_per_frame[t]
+ else:
+ # Overall, P(box) = prob_to_use_pt_input * prob_to_use_box_input
+ use_box_input = self.rng.random() < prob_to_use_box_input
+ if use_box_input:
+ points, labels = sample_box_points(
+ gt_masks_per_frame[t],
+ )
+ else:
+ # (here we only sample **one initial point** on initial conditioning frames from the
+ # ground-truth mask; we may sample more correction points on the fly)
+ points, labels = get_next_point(
+ gt_masks=gt_masks_per_frame[t],
+ pred_masks=None,
+ method=(
+ "uniform" if self.training else self.pt_sampling_for_eval
+ ),
+ )
+
+ point_inputs = {"point_coords": points, "point_labels": labels}
+ backbone_out["point_inputs_per_frame"][t] = point_inputs
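+ # Illustrative example (probabilities assumed): with prob_to_use_pt_input=0.5
+ # and prob_to_use_box_input=0.5, a video is prompted with GT masks w.p. 0.5;
+ # otherwise each initial conditioning frame gets a box w.p. 0.5 or a single
+ # sampled click w.p. 0.5, i.e. P(box) = P(click) = 0.25 per frame.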
+
+ return backbone_out
+
+ def prepare_prompt_inputs(self, backbone_out, input, start_frame_idx=0):
+ """
+ Prepare input mask, point or box prompts. Optionally, we allow tracking from
+ a custom `start_frame_idx` to the end of the video (for evaluation purposes).
+ """
+ backbone_out = self._prepare_prompt_inputs_meta(
+ backbone_out, input, start_frame_idx
+ )
+ backbone_out = self._prepare_conditional_frames(backbone_out)
+ return backbone_out
+
+ def _prepare_backbone_features(self, backbone_out):
+ """Prepare and flatten visual features (same as in MDETR_API model)."""
+
+ backbone_features = {}
+
+ for neck_k in neck_outs:
+ if neck_k not in backbone_out:
+ continue
+ neck_out = backbone_out[neck_k]
+ assert len(neck_out["backbone_fpn"]) == len(neck_out["vision_pos_enc"])
+ assert len(neck_out["backbone_fpn"]) >= self.num_feature_levels
+
+ feature_maps = neck_out["backbone_fpn"][-self.num_feature_levels :]
+ vision_pos_embeds = neck_out["vision_pos_enc"][-self.num_feature_levels :]
+
+ feat_sizes = [(x.shape[-2], x.shape[-1]) for x in vision_pos_embeds]
+ # flatten NxCxHxW to HWxNxC
+ vision_feats = [x.tensors.flatten(2).permute(2, 0, 1) for x in feature_maps]
+ vision_pos_embeds = [
+ x.flatten(2).permute(2, 0, 1) for x in vision_pos_embeds
+ ]
+ vision_masks = [x.mask for x in feature_maps]
+
+ for i, vision_mask in enumerate(vision_masks):
+ if vision_mask is not None:
+ vision_masks[i] = vision_mask.flatten(1)
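+ # Shape example (sizes assumed for illustration): a level with N=1, C=256 and
+ # H=W=72 goes from [1, 256, 72, 72] to vision_feats / vision_pos_embeds of shape
+ # [5184, 1, 256] ((HW)NC), and its padding mask from [1, 72, 72] to [1, 5184].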
+
+ backbone_features[neck_k] = {
+ "vision_feats": vision_feats,
+ "vision_pos_embeds": vision_pos_embeds,
+ "vision_masks": vision_masks,
+ "feat_sizes": feat_sizes,
+ }
+
+ return backbone_features
+
+ def _prepare_backbone_features_per_frame(
+ self,
+ img_batch,
+ img_ids,
+ *,
+ need_interactive_out: bool = False,
+ need_propagation_out: bool = False,
+ ):
+ """Compute the image backbone features on the fly for the given img_ids."""
+ # exactly one image id is expected (each frame is processed with batch size 1)
+ assert img_ids.numel() == 1
+ unique_img_ids = img_ids
+
+ # Compute the image features on those unique image ids
+ image = img_batch.tensors[unique_img_ids]
+ image_mask = (
+ img_batch.mask[unique_img_ids] if img_batch.mask is not None else None
+ )
+
+ backbone_out = self.forward_image(
+ NestedTensor(tensors=image, mask=image_mask),
+ need_interactive_out=need_interactive_out,
+ need_propagation_out=need_propagation_out,
+ )
+
+ backbone_features = self._prepare_backbone_features(backbone_out)
+ return image, backbone_features
+
+ def _prepare_memory_conditioned_features(
+ self,
+ *,
+ frame_idx,
+ is_init_cond_frame,
+ current_vision_feats,
+ current_vision_masks,
+ current_vision_pos_embeds,
+ feat_sizes,
+ output_dict,
+ num_frames,
+ track_in_reverse=False, # tracking in reverse time order (for demo usage)
+ use_prev_mem_frame=True, # whether to condition on previous memory frames
+ multiplex_state: MultiplexState,
+ ):
+ """Fuse the current frame's visual feature map with previous memory."""
+ B = multiplex_state.num_buckets
+ # B = current_vision_feats[-1].size(1) # batch size on this frame
+ vision_feat = current_vision_feats[-1].expand(-1, B, -1)
+ vision_mask = (
+ current_vision_masks[-1].expand(-1, B, -1)
+ if current_vision_masks[-1] is not None
+ else None
+ )
+ vision_pos_embed = current_vision_pos_embeds[-1].expand(-1, B, -1)
+
+ C = self.hidden_dim
+ H, W = feat_sizes[-1] # top-level (lowest-resolution) feature size
+ device = current_vision_feats[-1].device
+ # The case of `self.num_maskmem == 0` below is primarily used for reproducing SAM on images.
+ # In this case, we skip the fusion with any memory.
+ if self.num_maskmem == 0: # Disable memory and skip fusion
+ pix_feat = vision_feat.permute(1, 2, 0).view(B, C, H, W)
+ return pix_feat
+
+ num_obj_ptr_tokens = 0
+ tpos_sign_mul = -1 if track_in_reverse else 1
+ # Step 1: condition the visual features of the current frame on previous memories
+ if not is_init_cond_frame and use_prev_mem_frame:
+ # Retrieve the memories encoded with the maskmem backbone
+ # to_cat_prompt, to_cat_prompt_mask, to_cat_prompt_pos_embed = [], [], []
+ to_cat_prompt, to_cat_prompt_pos_embed = [], []
+ if self.save_image_features:
+ to_cat_image_feat, to_cat_image_pos_embed = [], []
+ # Add the conditioning frames' outputs first (their t_pos is the signed distance
+ # to the current frame; without use_maskmem_tpos_v2 they are all encoded as t=0 below)
+ assert len(output_dict["cond_frame_outputs"]) > 0
+ # Select a maximum number of temporally closest cond frames for cross attention
+ cond_outputs = output_dict["cond_frame_outputs"]
+ selected_cond_outputs, unselected_cond_outputs = select_closest_cond_frames(
+ frame_idx,
+ cond_outputs,
+ self.max_cond_frames_in_attn,
+ keep_first_cond_frame=self.keep_first_cond_frame,
+ )
+
+ t_pos_and_prevs = [
+ ((frame_idx - t) * tpos_sign_mul, out, True)
+ for t, out in selected_cond_outputs.items()
+ ]
+ # Add last (self.num_maskmem - 1) frames before current frame for non-conditioning memory
+ # the earliest one has t_pos=1 and the latest one has t_pos=self.num_maskmem-1
+ # We also allow taking the memory frame non-consecutively (with r>1), in which case
+ # we take (self.num_maskmem - 2) frames among every r-th frames plus the last frame.
+ r = 1 if self.training else self.memory_temporal_stride_for_eval
+
+ if self.use_memory_selection:
+ valid_indices = self.frame_filter(
+ output_dict, track_in_reverse, frame_idx, num_frames, r
+ )
+
+ for t_pos in range(1, self.num_maskmem):
+ t_rel = self.num_maskmem - t_pos # how many frames before current frame
+ if self.use_memory_selection:
+ if t_rel > len(valid_indices):
+ continue
+ prev_frame_idx = valid_indices[-t_rel]
+ else:
+ if t_rel == 1:
+ # for t_rel == 1, we take the last frame (regardless of r)
+ if not track_in_reverse:
+ # the frame immediately before this frame (i.e. frame_idx - 1)
+ prev_frame_idx = frame_idx - t_rel
+ else:
+ # the frame immediately after this frame (i.e. frame_idx + 1)
+ prev_frame_idx = frame_idx + t_rel
+ else:
+ # for t_rel >= 2, we take the memory frame from every r-th frames
+ if not track_in_reverse:
+ # first find the nearest frame among every r-th frames before this frame
+ # for r=1, this would be (frame_idx - 2)
+ prev_frame_idx = ((frame_idx - 2) // r) * r
+ # then seek further among every r-th frames
+ prev_frame_idx = prev_frame_idx - (t_rel - 2) * r
+ else:
+ # first find the nearest frame among every r-th frames after this frame
+ # for r=1, this would be (frame_idx + 2)
+ prev_frame_idx = -(-(frame_idx + 2) // r) * r
+ # then seek further among every r-th frames
+ prev_frame_idx = prev_frame_idx + (t_rel - 2) * r
+ out = output_dict["non_cond_frame_outputs"].get(prev_frame_idx, None)
+ if out is None:
+ # If an unselected conditioning frame is among the last (self.num_maskmem - 1)
+ # frames, we still attend to it as if it's a non-conditioning frame.
+ out = unselected_cond_outputs.get(prev_frame_idx, None)
+ t_pos_and_prevs.append((t_pos, out, False))
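+ # Worked example (values assumed): with num_maskmem=7, r=4, forward tracking at
+ # frame_idx=20 and use_memory_selection off, t_pos=1..6 select
+ # prev_frame_idx = 0, 4, 8, 12, 16, 19: the base is ((20 - 2) // 4) * 4 = 16,
+ # each larger t_rel steps back by r, and t_rel=1 always takes frame 19.
+ # In reverse, the base is ceil(22 / 4) * 4 = 24, giving 40, 36, 32, 28, 24, 21.
+ # Indices with no stored output (e.g. negative ones) yield None and are skipped.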
+
+ for t_pos, prev, is_selected_cond_frame in t_pos_and_prevs:
+ if prev is None:
+ continue # skip padding frames
+
+ feats = prev.get("maskmem_features")
+ if feats is None:
+ continue
+ # "maskmem_features" might have been offloaded to CPU in demo use cases,
+ # so we load it back to GPU (it's a no-op if it's already on GPU).
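+ feats_offloaded = not feats.is_cuda
+ feats = feats.cuda(non_blocking=True)
+ if feats.dim() == 5:
+ feats = multiplex_state.demux(feats).contiguous()
+ # cache the demuxed features, keeping them on CPU if they were offloaded
+ prev["maskmem_features"] = (
+ feats.cpu() if feats_offloaded else feats
+ )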
+
+ if feats.shape[0] == 0:
+ continue
+
+ # Spatial positional encoding (it might have been offloaded to CPU in eval).
+ # Fetch it before appending the features so that `to_cat_prompt` and
+ # `to_cat_prompt_pos_embed` always receive matching entries.
+ maskmem_pos_list = prev.get("maskmem_pos_enc")
+ if not maskmem_pos_list:
+ continue
+ maskmem_enc = maskmem_pos_list[-1]
+ if maskmem_enc is None:
+ continue
+ maskmem_enc_offloaded = not maskmem_enc.is_cuda
+ maskmem_enc = maskmem_enc.cuda(non_blocking=True)
+ if maskmem_enc.dim() == 5:
+ maskmem_enc = multiplex_state.demux(maskmem_enc).contiguous()
+ prev["maskmem_pos_enc"][-1] = (
+ maskmem_enc.cpu() if maskmem_enc_offloaded else maskmem_enc
+ )
+ maskmem_enc = maskmem_enc.flatten(2).permute(2, 0, 1)
+ to_cat_prompt.append(feats.flatten(2).permute(2, 0, 1))
+ # to_cat_prompt_mask.append(None)
+
+ if self.use_maskmem_tpos_v2:
+ # the last of maskmem_tpos_enc is an "out-of-range" embedding
+ if t_pos <= 0 or t_pos >= self.num_maskmem:
+ tpos_enc = self.maskmem_tpos_enc[self.num_maskmem - 1]
+ else:
+ tpos_enc = self.maskmem_tpos_enc[self.num_maskmem - t_pos - 1]
+ else:
+ # cond_frame NOT temporally encoded in this setting
+ # and last of the maskmem_tpos_enc is actually an
+ # indicator for being a cond_frame
+ t = t_pos if not is_selected_cond_frame else 0
+ tpos_enc = self.maskmem_tpos_enc[self.num_maskmem - t - 1]
+
+ maskmem_enc = maskmem_enc + tpos_enc
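+ # Index example (num_maskmem=7 assumed): non-conditioning memories with
+ # t_pos=1..6 read maskmem_tpos_enc[5..0]. With use_maskmem_tpos_v2, a selected
+ # cond frame 20 frames away (t_pos=20) reads the out-of-range slot
+ # maskmem_tpos_enc[6]; without it, every selected cond frame uses t=0, which
+ # also maps to maskmem_tpos_enc[6].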
+
+ if self.save_image_features:
+ # image features are in (HW)BC
+ image_feat = prev["image_features"].cuda()
+ image_pos_embed = prev["image_pos_enc"].cuda() + tpos_enc
+ to_cat_image_feat.append(image_feat)
+ to_cat_image_pos_embed.append(image_pos_embed)
+
+ to_cat_prompt_pos_embed.append(maskmem_enc)
+
+ # Construct the list of past object pointers
+ if self.use_obj_ptrs_in_encoder:
+ max_obj_ptrs_in_encoder = min(num_frames, self.max_obj_ptrs_in_encoder)
+ # First add those object pointers from selected conditioning frames
+ # (optionally, only include object pointers in the past during evaluation)
+ if not self.training and self.only_obj_ptrs_in_the_past_for_eval:
+ ptr_cond_outputs = {
+ t: out
+ for t, out in selected_cond_outputs.items()
+ if (t >= frame_idx if track_in_reverse else t <= frame_idx)
+ }
+ else:
+ ptr_cond_outputs = selected_cond_outputs
+ pos_and_outs_for_ptr = [
+ # Temporal pos encoding contains how far away each pointer is from current frame
+ (
+ (
+ (frame_idx - t) * tpos_sign_mul
+ if self.use_signed_tpos_enc_to_obj_ptrs
+ else abs(frame_idx - t)
+ ),
+ out,
+ True, # is_selected_cond_frame
+ )
+ for t, out in ptr_cond_outputs.items()
+ ]
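+ # Illustrative example (frame indices assumed): tracking forward at frame_idx=10
+ # with selected cond frames {0, 15}, only_obj_ptrs_in_the_past_for_eval keeps only
+ # frame 0 at eval time. Its pointer position is 10 either way; frame 15 (when kept)
+ # would get -5 with use_signed_tpos_enc_to_obj_ptrs and 5 otherwise.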
+
+ # Add up to (max_obj_ptrs_in_encoder - 1) non-conditioning frames before current frame
+ for t_diff in range(1, max_obj_ptrs_in_encoder):
+ if not self.use_memory_selection:
+ t = (
+ frame_idx + t_diff
+ if track_in_reverse
+ else frame_idx - t_diff
+ )
+ if t < 0 or (num_frames is not None and t >= num_frames):
+ break
+ else:
+ if -t_diff <= -len(valid_indices):
+ break
+ t = valid_indices[-t_diff]
+
+ out = output_dict["non_cond_frame_outputs"].get(
+ t, unselected_cond_outputs.get(t, None)
+ )
+ if out is not None:
+ pos_and_outs_for_ptr.append((t_diff, out, False))
+
+ # If we have at least one object pointer, add them to the cross-attention
+ if len(pos_and_outs_for_ptr) > 0:
+ pos_list, out_list, is_selected_cond_frame_list = zip(
+ *pos_and_outs_for_ptr
+ )
+ # Filter out outputs that don't have obj_ptr (e.g., when object has empty mask)
+ filtered_data = [
+ (pos, out, is_cond)
+ for pos, out, is_cond in zip(
+ pos_list, out_list, is_selected_cond_frame_list
+ )
+ if "obj_ptr" in out
+ ]
+
+ # Only proceed if we have at least one valid obj_ptr
+ if len(filtered_data) > 0:
+ pos_list, out_list, is_selected_cond_frame_list = zip(
+ *filtered_data
+ )
+ # each out["obj_ptr"] is a tensor of shape (num_buckets, seq_len, C);
+ # concatenate along dim=1 (the pointer axis) and transpose into [ptr_seq_len, B, C]
+ obj_ptrs = torch.cat(
+ [out["obj_ptr"] for out in out_list], dim=1
+ ).transpose(0, 1)
+
+ # a temporal positional embedding based on how far each object pointer is from
+ # the current frame (sine embedding normalized by the max pointer num).
+ if self.add_tpos_enc_to_obj_ptrs:
+ obj_pos = self._get_tpos_enc(
+ pos_list,
+ max_abs_pos=max_obj_ptrs_in_encoder,
+ device=device,
+ )
+ else:
+ obj_pos = self._get_tpos_enc(
+ pos_list, device=device, dummy=True
+ )
+ # expand to batch size
+ obj_pos = obj_pos.unsqueeze(1).expand(-1, B, -1)
+
+ assert self.mem_dim == C, (
+ f"obj_ptrs.shape = {obj_ptrs.shape}, C = {C}"
+ )
+
+ # each frame has [bucket_size] pointers, except the first frame
+ obj_pos = obj_pos.repeat_interleave(
+ multiplex_state.multiplex_count, dim=0
+ )
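+ # Illustrative example (sizes assumed; each obj_ptr has seq_len=multiplex_count):
+ # with pointers from 3 frames and multiplex_count=4, obj_ptrs has 3 * 4 = 12 tokens
+ # ordered frame by frame, and obj_pos goes from [3, B, C] to [12, B, C] laid out
+ # as [p0, p0, p0, p0, p1, ..., p2], matching that order.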
+
+ to_cat_prompt.append(obj_ptrs)
+ to_cat_prompt_pos_embed.append(obj_pos)
+ # number of object pointer tokens for the encoder
+ num_obj_ptr_tokens = obj_ptrs.shape[0]
+ else:
+ # All outputs were filtered out (empty masks), no obj_ptrs available
+ num_obj_ptr_tokens = 0
+ else:
+ num_obj_ptr_tokens = 0
+ else:
+ # for initial conditioning frames, encode them without using any previous memory
+ raise NotImplementedError(
+ "Any init cond frame should have gone to _use_mask_as_output instead"
+ )
+
+ # Step 2: Concatenate the memories and forward through the transformer encoder
+ if len(to_cat_prompt) == 0:
+ # No available memory features (e.g. mask was cleared). Skip fusion and
+ # fall back to the current frame features so the object can continue to
+ # propagate as empty without raising errors.
+ pix_feat = vision_feat.permute(1, 2, 0).view(B, C, H, W)
+ return pix_feat
+
+ prompt = torch.cat(to_cat_prompt, dim=0)
+ prompt_mask = None # prompt masks are currently always all-zero (no padding), so pass None
+ prompt_pos_embed = torch.cat(to_cat_prompt_pos_embed, dim=0)
+
+ if self.save_image_features:
+ assert prompt_mask is None
+ assert vision_mask is None
+ if len(to_cat_image_feat) == 0 or len(to_cat_image_pos_embed) == 0:
+ # Memory image features were cleared; fall back to current-frame features.
+ pix_feat = vision_feat.permute(1, 2, 0).view(B, C, H, W)
+ return pix_feat
+ image_feat = torch.cat(to_cat_image_feat, dim=0)
+ image_pos_embed = torch.cat(to_cat_image_pos_embed, dim=0)
+
+ encoder_out = self.transformer.encoder(
+ image=current_vision_feats[-1],
+ src=vision_feat,
+ memory_image=image_feat,
+ memory=prompt,
+ image_pos=current_vision_pos_embeds[-1],
+ src_pos=vision_pos_embed,
+ memory_image_pos=image_pos_embed,
+ memory_pos=prompt_pos_embed,
+ num_obj_ptr_tokens=num_obj_ptr_tokens,
+ )
+ else:
+ encoder_out = self.transformer.encoder(
+ src=vision_feat,
+ src_key_padding_mask=vision_mask,
+ src_pos=vision_pos_embed,
+ prompt=prompt,
+ prompt_pos=prompt_pos_embed,
+ prompt_key_padding_mask=prompt_mask,
+ feat_sizes=feat_sizes,
+ num_obj_ptr_tokens=num_obj_ptr_tokens,
+ )
+ # reshape the output (HW)BC => BCHW
+ pix_feat_with_mem = encoder_out["memory"].permute(1, 2, 0).view(B, C, H, W)
+ return pix_feat_with_mem
+
+ def _encode_new_memory(
+ self,
+ image,
+ current_vision_feats,
+ feat_sizes,
+ pred_masks_high_res,
+ object_score_logits,
+ is_mask_from_pts,
+ *,
+ conditioning_objects: Optional[Iterable[int]] = None,
+ multiplex_state: MultiplexState,
+ ):
+ """Encode the current image and its prediction into a memory feature."""
+ B = current_vision_feats[-1].size(1) # batch size on this frame
+ C = self.hidden_dim
+ H, W = feat_sizes[-1] # top-level (lowest-resolution) feature size
+ # top-level feature, (HW)BC => BCHW
+ pix_feat = current_vision_feats[-1].permute(1, 2, 0).view(B, C, H, W)
+ if self.non_overlap_masks_for_mem_enc and not self.training:
+ # optionally, apply non-overlapping constraints to the masks (it's applied
+ # in the batch dimension and should only be used during eval, where all
+ # the objects come from the same video under batch size 1).
+ pred_masks_high_res = self._apply_non_overlapping_constraints(
+ pred_masks_high_res
+ )
+ if self.apply_sigmoid_to_mask_logits_for_mem_enc:
+ # scale the raw mask logits with a temperature before applying sigmoid
+ assert not self.binarize_mask_from_pts_for_mem_enc, (
+ "haven't been trained this way; beware of hardcoded config override"
+ )
+ binarize = self.binarize_mask_from_pts_for_mem_enc and is_mask_from_pts
+ if binarize and not self.training:
+ mask_for_mem = (pred_masks_high_res > 0).float()
+ else:
+ # apply sigmoid on the raw mask logits to turn them into range (0, 1)
+ mask_for_mem = torch.sigmoid(pred_masks_high_res)
+ # apply scale and bias terms to the sigmoid probabilities
+ if self.sigmoid_scale_for_mem_enc != 1.0:
+ mask_for_mem = mask_for_mem * self.sigmoid_scale_for_mem_enc
+ if self.sigmoid_bias_for_mem_enc != 0.0:
+ mask_for_mem = mask_for_mem + self.sigmoid_bias_for_mem_enc
+ else:
+ mask_for_mem = pred_masks_high_res
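+ # Illustrative example (scale/bias values assumed): with
+ # sigmoid_scale_for_mem_enc=20.0 and sigmoid_bias_for_mem_enc=-10.0, a logit of 0
+ # maps to sigmoid(0) * 20 - 10 = 0, while confident foreground/background logits
+ # approach +10 / -10 respectively.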
+
+ if self.add_object_conditional_embeddings or self.condition_as_mask_input:
+ # figure out the set of objects that are "conditional" on this frame
+ if conditioning_objects is None:
+ conditioning_objects = []
+ unconditioning_objects = sorted(
+ list(multiplex_state.get_all_valid_object_idx())
+ )
+ else:
+ conditioning_objects = sorted(list(conditioning_objects))
+ all_objects_idx = multiplex_state.get_all_valid_object_idx()
+ unconditioning_objects = sorted(
+ [i for i in all_objects_idx if i not in conditioning_objects]
+ )
+
+ mux_mask_for_mem = multiplex_state.mux(mask_for_mem).squeeze(2)
+
+ if self.condition_as_mask_input:
+ # create one spatial map per object that marks whether the object is
+ # conditional on this frame (fg value) or not (bg value)
+ num_objects = mask_for_mem.shape[0]
+ # Create a 1D conditioning mask on GPU and broadcast it
+ cond_values = torch.full(
+ (num_objects,),
+ self.condition_as_mask_input_bg,
+ device=mask_for_mem.device,
+ dtype=mask_for_mem.dtype,
+ )
+ if len(conditioning_objects) > 0:
+ cond_values[conditioning_objects] = self.condition_as_mask_input_fg
+ # Broadcast to full spatial dimensions: [N] -> [N, 1, H, W]
+ embedded_conditions = cond_values.view(-1, 1, 1, 1).expand_as(mask_for_mem)
+ embedded_conditions = multiplex_state.mux(embedded_conditions).squeeze(2)
+
+ mux_mask_for_mem = torch.cat([mux_mask_for_mem, embedded_conditions], dim=1)
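+ # Illustrative example (object ids assumed): with 5 valid objects and
+ # conditioning_objects=[1, 3], cond_values = [bg, fg, bg, fg, bg]; each value is
+ # broadcast to a full-resolution map, muxed like the masks, and appended as extra
+ # input channels for the memory encoder.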
+
+ if isinstance(self.maskmem_backbone, SimpleMaskEncoder):
+ maskmem_out = self.maskmem_backbone(
+ pix_feat,
+ mux_mask_for_mem,
+ skip_mask_sigmoid=True,
+ )
+ else:
+ maskmem_out = self.maskmem_backbone(image, pix_feat, mux_mask_for_mem)
+ # Clone the feats and pos_enc to enable compilation
+ maskmem_features = self._maybe_clone(maskmem_out["vision_features"])
+ maskmem_pos_enc = [self._maybe_clone(m) for m in maskmem_out["vision_pos_enc"]]
+
+ if self.no_obj_embed_spatial is not None:
+ # since maskmem_features mix all objects of a bucket, we add (per bucket) the
+ # sum of the projected no-object embeddings of the objects that are not appearing
+ # num_buckets * multiplex_count * C
+ no_obj_embed_spatial = self.no_obj_embed_spatial.unsqueeze(0).repeat(
+ multiplex_state.num_buckets, 1, 1
+ )
+ # Align object_score_logits length to multiplex expectations before mux
+ if object_score_logits is not None:
+ obj_expected = multiplex_state.total_valid_entries
+ obj_current = object_score_logits.shape[0]
+ if obj_current != obj_expected:
+ if obj_current < obj_expected:
+ pad_shape = (obj_expected - obj_current,) + tuple(
+ object_score_logits.shape[1:]
+ )
+ obj_pad = object_score_logits.new_zeros(pad_shape)
+ object_score_logits = torch.cat(
+ [object_score_logits, obj_pad], dim=0
+ )
+ else:
+ object_score_logits = object_score_logits[:obj_expected]
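+ # Illustrative example (counts assumed): if total_valid_entries=8 but only 5
+ # logits are present, 3 zero rows are appended; with 10 logits, the extra 2 are
+ # dropped. Padded rows count as appearing only if 0 > object_score_logit_threshold.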
+ object_score_logits = multiplex_state.mux(object_score_logits)
+ is_obj_appearing = (
+ object_score_logits > self.object_score_logit_threshold
+ ).float()
+
+ no_obj_embed = ((1 - is_obj_appearing) * no_obj_embed_spatial).sum(dim=1)
+ maskmem_features += no_obj_embed[..., None, None].expand_as(
+ maskmem_features
+ )
+
+ if self.add_object_conditional_embeddings:
+ # add object conditional embeddings to the maskmem_features
+ # num_buckets * multiplex_count * C
+ obj_cond_embed = self.obj_cond_embed.unsqueeze(0).repeat(
+ multiplex_state.num_buckets, 1, 1
+ )
+ obj_cond_embed = multiplex_state.demux(obj_cond_embed)
+ obj_merged_embed = obj_cond_embed
+
+ if self.add_object_unconditional_embeddings:
+ obj_non_cond_embed = self.obj_non_cond_embed.unsqueeze(0).repeat(
+ multiplex_state.num_buckets, 1, 1
+ )
+ obj_non_cond_embed = multiplex_state.demux(obj_non_cond_embed)
+ if self.training:
+ obj_merged_embed = obj_merged_embed.clone()
+ obj_merged_embed[unconditioning_objects] = obj_non_cond_embed[
+ unconditioning_objects
+ ]
+
+ obj_merged_embed = multiplex_state.mux(obj_merged_embed).sum(dim=1)
+ maskmem_features = maskmem_features + obj_merged_embed[
+ ..., None, None
+ ].expand_as(maskmem_features)
+
+ if maskmem_features.dim() == 5:
+ maskmem_features = multiplex_state.demux(maskmem_features).contiguous()
+
+ demuxed_pos_enc = []
+ for pos_enc in maskmem_pos_enc:
+ pos_enc_clone = pos_enc
+ if pos_enc_clone is not None and pos_enc_clone.dim() == 5:
+ pos_enc_clone = multiplex_state.demux(pos_enc_clone).contiguous()
+ demuxed_pos_enc.append(pos_enc_clone)
+ maskmem_pos_enc = demuxed_pos_enc
+
+ return maskmem_features, maskmem_pos_enc
+
+ def forward_tracking(
+ self,
+ backbone_out,
+ input,
+ return_dict=False,
+ objects_to_interact: Optional[list[int]] = None,
+ ):
+ """Forward video tracking on each frame (and sample correction clicks)."""
+ img_feats_already_computed = (
+ "interactive" in backbone_out or "sam2_backbone_out" in backbone_out
+ )
+ if img_feats_already_computed:
+ # Prepare the backbone features
+ # - vision_feats and vision_pos_embeds are in (HW)BC format
+ # - vision_masks are in B(HW) format, dtype=bool (False is valid, True is padding)
+ backbone_features = self._prepare_backbone_features(backbone_out)
+
+ # Starting the stage loop
+ num_frames = backbone_out["num_frames"]
+ init_cond_frames = backbone_out["init_cond_frames"]
+ frames_to_add_correction_pt = backbone_out["frames_to_add_correction_pt"]
+ # first process all the initial conditioning frames to encode them as memory,
+ # and then conditioning on them to track the remaining frames
+ processing_order = init_cond_frames + backbone_out["frames_not_in_init_cond"]
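+ # Illustrative example (frame indices assumed): with init_cond_frames=[0, 5] in an
+ # 8-frame clip, processing_order = [0, 5, 1, 2, 3, 4, 6, 7].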
+
+ cond_frame_outputs: dict[int, StageOutput] = {}
+ non_cond_frame_outputs: dict[int, StageOutput] = {}
+ output_dict = {
+ "cond_frame_outputs": cond_frame_outputs,
+ "non_cond_frame_outputs": non_cond_frame_outputs,
+ }
+
+ multiplex_state = self.multiplex_controller.get_state(
+ backbone_out["gt_masks_per_frame"][0].shape[0],
+ device=backbone_out["gt_masks_per_frame"][0].device,
+ dtype=torch.float,
+ random=self.training,
+ )
+
+ for stage_id in processing_order:
+ # Get the image features for the current frames
+ img_ids = input.find_inputs[stage_id].img_ids
+ # the image ids are for the entire batch
+ assert all(
+ [img_id == img_ids[0] for img_id in img_ids]
+ ) # should be all the same
+ # force this to have a batch size of 1
+ img_ids = torch.tensor(
+ [img_ids[0]], device=img_ids.device, dtype=img_ids.dtype
+ )
+
+ if img_feats_already_computed:
+ # Retrieve image features according to img_ids (if they are already computed).
+ current_image = input.img_batch.tensors[img_ids]
+ current_backbone_features = {}
+ for neck_k, neck_out in backbone_features.items():
+ current_backbone_features[neck_k] = {
+ "vision_feats": [
+ x[:, img_ids] for x in neck_out["vision_feats"]
+ ],
+ "vision_masks": [
+ x[img_ids] if x is not None else None
+ for x in neck_out["vision_masks"]
+ ],
+ "vision_pos_embeds": [
+ x[:, img_ids] for x in neck_out["vision_pos_embeds"]
+ ],
+ "feat_sizes": neck_out["feat_sizes"],
+ }
+ else:
+ # Otherwise, compute the image features on the fly for the given img_ids
+ # (this might be used for evaluation on long videos to avoid backbone OOM).
+ need_interactive_out = (stage_id in frames_to_add_correction_pt) or (
+ stage_id in init_cond_frames
+ )
+ (current_image, current_backbone_features) = (
+ self._prepare_backbone_features_per_frame(
+ input.img_batch,
+ img_ids,
+ need_interactive_out=need_interactive_out,
+ need_propagation_out=True,
+ )
+ )
+
+ # Get output masks based on this frame's prompts and previous memory
+ current_out = self.track_step(
+ frame_idx=stage_id,
+ is_init_cond_frame=stage_id in init_cond_frames,
+ backbone_features_interactive=current_backbone_features.get(
+ "interactive"
+ ),
+ backbone_features_propagation=current_backbone_features.get(
+ "sam2_backbone_out"
+ ),
+ image=current_image,
+ point_inputs=backbone_out["point_inputs_per_frame"].get(stage_id, None),
+ mask_inputs=backbone_out["mask_inputs_per_frame"].get(stage_id, None),
+ gt_masks=backbone_out["gt_masks_per_frame"].get(stage_id, None),
+ frames_to_add_correction_pt=frames_to_add_correction_pt,
+ output_dict=output_dict,
+ num_frames=num_frames,
+ multiplex_state=multiplex_state,
+ objects_to_interact=objects_to_interact,
+ )
+ # Append the output, depending on whether it's a conditioning frame
+ add_output_as_cond_frame = stage_id in init_cond_frames or (
+ self.add_all_frames_to_correct_as_cond
+ and stage_id in frames_to_add_correction_pt
+ )
+ if add_output_as_cond_frame:
+ output_dict["cond_frame_outputs"][stage_id] = current_out
+ else:
+ output_dict["non_cond_frame_outputs"][stage_id] = current_out
+
+ output_dict["multiplex_state"] = multiplex_state
+
+ if return_dict:
+ return output_dict
+ # turn `output_dict` into a list for loss function
+ all_frame_outputs = {}
+ all_frame_outputs.update(output_dict["cond_frame_outputs"])
+ all_frame_outputs.update(output_dict["non_cond_frame_outputs"])
+ all_frame_outputs = [all_frame_outputs[t] for t in range(num_frames)]
+ # Make DDP happy with activation checkpointing by removing unused keys
+ all_frame_outputs = [
+ {k: v for k, v in d.items() if k != "obj_ptr"} for d in all_frame_outputs
+ ]
+
+ return all_frame_outputs
+
+ def _track_step_aux(
+ self,
+ *,
+ frame_idx,
+ is_init_cond_frame,
+ backbone_features_interactive,
+ backbone_features_propagation,
+ image,
+ point_inputs,
+ mask_inputs,
+ gt_masks,
+ frames_to_add_correction_pt,
+ output_dict,
+ num_frames,
+ track_in_reverse=False, # tracking in reverse time order (for demo usage)
+ run_mem_encoder=True,
+ prev_sam_mask_logits=None,
+ multiplex_state: MultiplexState,
+ objects_to_interact: Optional[list[int]] = None,
+ need_aux_output: bool = False,
+ ) -> tuple[StageOutput, dict]:
+ """
+        There are four different modes that track_step might enter, depending on the inputs:
+ 1. Mask-as-output. This is when mask_inputs is not None.
+ The input mask is returned directly. This case is for FA/VOS initialization.
+ 2. Propagation-only. This is when mask_inputs and point_inputs are empty.
+ We propagate masks using the memory only. This case is for VOS propagation.
+ 3. Interaction-only. This is when mask_inputs is None, point_inputs is not None,
+           and one of the following is satisfied:
+ a) prev_sam_mask_logits is not None. In this case, we refine prev_sam_mask_logits
+ with additional interactions, updating only the objects specified in objects_to_interact.
+ objects_to_interact must not be None.
+ This occurs when we refine the same frame with multiple point inputs iteratively.
+ b) prev_sam_mask_logits is None, and is_init_cond_frame is True.
+ This case is for initializing the first frame. All objects will have point inputs.
+ This mostly happens during training/interactive eval.
+ 4. Propagation-and-interaction. This is when mask_inputs is None, point_inputs is not None,
+ prev_sam_mask_logits is None, and objects_to_interact is not None.
+ This is when we are propagating to a new frame that has point inputs (from previous interactions).
+ This is more of an edge case that could happen in offline interactive eval.
+ We first propagate the mask to the current frame, and then perform interaction on the selected
+ objects. Finally, we replace the masks of the interacted objects in the propagated output
+ with the masks from the interaction output.
+ """
+ current_out: StageOutput = {
+ "conditioning_objects": set(),
+ "point_inputs": point_inputs,
+ "mask_inputs": mask_inputs,
+ }
+
+ mode = None
+ if mask_inputs is not None:
+ mode = "mask_as_output"
+ elif point_inputs is None:
+ mode = "propagation_only"
+ elif point_inputs is not None:
+ # Case 3a: Refining existing predictions
+ if prev_sam_mask_logits is not None:
+ assert objects_to_interact is not None, (
+ "objects_to_interact must be specified when refining with prev_sam_mask_logits"
+ )
+ mode = "interaction_only"
+ # Case 3b: Initial conditioning frame
+ elif is_init_cond_frame:
+ mode = "interaction_only"
+ # Case 4: Propagation then interaction
+ elif objects_to_interact is not None and prev_sam_mask_logits is None:
+ assert not self.training
+ mode = "propagation_and_interaction"
+
+ if mode is None:
+ raise ValueError(
+ f"Unable to determine tracking case. "
+ f"mask_inputs={mask_inputs is not None}, "
+ f"point_inputs={point_inputs is not None}, "
+ f"prev_sam_mask_logits={prev_sam_mask_logits is not None}, "
+ f"objects_to_interact={objects_to_interact}, "
+ f"is_init_cond_frame={is_init_cond_frame}"
+ )
+ # partition the backbone features
+ interactive_high_res_features = interactive_vision_feats = None
+ interactive_feat_sizes = None
+ if backbone_features_interactive is not None:
+ interactive_vision_feats = backbone_features_interactive["vision_feats"]
+ interactive_feat_sizes = backbone_features_interactive["feat_sizes"]
+
+ # High-resolution feature maps for the SAM head, reshape (HW)BC => BCHW
+ if len(interactive_vision_feats) > 1:
+ interactive_high_res_features = [
+ x.permute(1, 2, 0).view(x.size(1), x.size(2), *s)
+ for x, s in zip(
+ interactive_vision_feats[:-1], interactive_feat_sizes[:-1]
+ )
+ ]
+ else:
+ # cannot do point interaction without interactive features
+ assert mode not in ["interaction_only", "propagation_and_interaction"]
+
+ propagation_high_res_features = propagation_vision_feats = None
+ propagation_vision_masks = None
+ propagation_vision_pos_embeds = propagation_feat_sizes = None
+ if backbone_features_propagation is not None:
+ propagation_vision_feats = backbone_features_propagation["vision_feats"]
+ propagation_vision_masks = backbone_features_propagation["vision_masks"]
+ propagation_vision_pos_embeds = backbone_features_propagation[
+ "vision_pos_embeds"
+ ]
+ propagation_feat_sizes = backbone_features_propagation["feat_sizes"]
+
+ # High-resolution feature maps for the SAM head, reshape (HW)BC => BCHW
+ if len(propagation_vision_feats) > 1:
+ propagation_high_res_features = [
+ x.permute(1, 2, 0).view(x.size(1), x.size(2), *s)
+ for x, s in zip(
+ propagation_vision_feats[:-1], propagation_feat_sizes[:-1]
+ )
+ ]
+ else:
+ # we can get away without propagation features if we are interacting and not encoding new memory
+ assert mode not in ["propagation_only", "propagation_and_interaction"]
+ assert not run_mem_encoder
+
+ interactive_pix_feat = None
+ if mode == "mask_as_output":
+ # simple encoding
+ assert self.use_mask_input_as_output_without_sam
+ # pix_feat = interactive_vision_feats[-1].permute(1, 2, 0)
+ # pix_feat = pix_feat.view(-1, self.hidden_dim, *interactive_feat_sizes[-1])
+ # use no_mem_embed here as well to better align first-frame mask input vs point input
+ interactive_pix_feat = self._get_interactive_pix_mem(
+ interactive_vision_feats, interactive_feat_sizes
+ )
+ sam_outputs = self._use_mask_as_output(
+ backbone_features=interactive_pix_feat,
+ high_res_features=interactive_high_res_features,
+ mask_inputs=mask_inputs,
+ multiplex_state=multiplex_state,
+ )
+ # all the objects are conditional here
+ current_out["conditioning_objects"].update(range(mask_inputs.shape[0]))
+ else:
+ # propagation, interaction, or both
+ propagation_out = None
+ if mode in ["propagation_only", "propagation_and_interaction"]:
+ # gather the memory
+ assert backbone_features_propagation is not None
+ assert propagation_vision_feats is not None
+ assert propagation_vision_masks is not None
+ assert propagation_vision_pos_embeds is not None
+ assert propagation_feat_sizes is not None
+ pix_feat_with_mem = self._prepare_memory_conditioned_features(
+ frame_idx=frame_idx,
+ is_init_cond_frame=is_init_cond_frame,
+ current_vision_feats=propagation_vision_feats[-1:],
+ current_vision_masks=propagation_vision_masks[-1:],
+ current_vision_pos_embeds=propagation_vision_pos_embeds[-1:],
+ feat_sizes=propagation_feat_sizes[-1:],
+ output_dict=output_dict,
+ num_frames=num_frames,
+ track_in_reverse=track_in_reverse,
+ multiplex_state=multiplex_state,
+ )
+
+ # propagate the mask
+ # this is the propagation step; do not consider point_inputs here
+ multimask_output = self._use_multimask(
+ is_init_cond_frame, point_inputs=None
+ )
+ propagation_out = self._forward_sam_heads(
+ backbone_features=pix_feat_with_mem,
+ propagation_high_res_features=propagation_high_res_features,
+ multimask_output=multimask_output,
+ objects_to_interact=list(
+ range(multiplex_state.total_valid_entries)
+ ),
+ multiplex_state=multiplex_state,
+ )
+
+ interaction_out = None
+ if mode in ["interaction_only", "propagation_and_interaction"]:
+ assert backbone_features_interactive is not None
+ assert interactive_vision_feats is not None
+ assert interactive_feat_sizes is not None
+ interactive_pix_feat = self._get_interactive_pix_mem(
+ interactive_vision_feats, interactive_feat_sizes
+ )
+
+ # apply SAM-style segmentation head
+ # here we might feed previously predicted low-res SAM mask logits into the SAM mask decoder,
+ # e.g. in demo where such logits come from earlier interaction instead of correction sampling
+ # (in this case, the SAM mask decoder should have `self.iter_use_prev_mask_pred=True`, and
+ # any `mask_inputs` shouldn't reach here as they are sent to _use_mask_as_output instead)
+ assert mask_inputs is None and point_inputs is not None
+ if prev_sam_mask_logits is not None:
+ assert objects_to_interact is not None
+ assert self.iter_use_prev_mask_pred
+ assert mode != "propagation_and_interaction"
+ mask_inputs = prev_sam_mask_logits[objects_to_interact]
+ elif mode == "propagation_and_interaction":
+ # use propagated masks as mask input
+ assert objects_to_interact is not None
+ assert propagation_out is not None
+ mask_inputs = propagation_out["low_res_masks"][objects_to_interact]
+
+ if objects_to_interact is not None:
+ assert point_inputs["point_coords"].shape[0] == len(
+ objects_to_interact
+ )
+ assert point_inputs["point_labels"].shape[0] == len(
+ objects_to_interact
+ )
+
+ multimask_output = self._use_multimask(
+ is_init_cond_frame, point_inputs=point_inputs
+ )
+ interaction_out = self._forward_sam_heads(
+ backbone_features=interactive_pix_feat,
+ point_inputs=point_inputs,
+ mask_inputs=mask_inputs,
+ interactive_high_res_features=interactive_high_res_features,
+ multimask_output=multimask_output,
+ objects_to_interact=(
+ objects_to_interact
+ if objects_to_interact is not None
+ else list(range(multiplex_state.total_valid_entries))
+ ),
+ multiplex_state=multiplex_state,
+ )
+ if objects_to_interact is None:
+ current_out["conditioning_objects"].update(
+ multiplex_state.get_all_valid_object_idx()
+ )
+ else:
+ current_out["conditioning_objects"].update(objects_to_interact)
+
+ if propagation_out is None and interaction_out is not None:
+ sam_outputs = interaction_out
+ elif interaction_out is None and propagation_out is not None:
+ sam_outputs = propagation_out
+ else:
+ # merge the output
+ assert propagation_out is not None and interaction_out is not None
+ keys_to_merge = [
+ "low_res_multimasks",
+ "high_res_multimasks",
+ "low_res_masks",
+ "high_res_masks",
+ "ious",
+ "object_score_logits",
+ "obj_ptr",
+ ]
+ for k in keys_to_merge:
+ src = interaction_out[k]
+ dst = propagation_out[k]
+ # Align dtype for floating tensors before indexed assignment
+ if torch.is_tensor(src) and torch.is_tensor(dst):
+ if torch.is_floating_point(src) and src.dtype != dst.dtype:
+ src = src.to(dtype=dst.dtype)
+ propagation_out[k][objects_to_interact] = src
+ sam_outputs = propagation_out
+
+ low_res_multimasks = sam_outputs["low_res_multimasks"]
+ high_res_multimasks = sam_outputs["high_res_multimasks"]
+ ious = sam_outputs["ious"]
+ low_res_masks = sam_outputs["low_res_masks"]
+ high_res_masks = sam_outputs["high_res_masks"]
+ object_score_logits = sam_outputs["object_score_logits"]
+
+ current_out["multistep_pred_masks"] = low_res_masks
+ current_out["multistep_pred_masks_high_res"] = high_res_masks
+ current_out["multistep_pred_multimasks"] = [low_res_multimasks]
+ current_out["multistep_pred_multimasks_high_res"] = [high_res_multimasks]
+ current_out["multistep_pred_ious"] = [ious]
+ current_out["multistep_point_inputs"] = [point_inputs]
+ current_out["multistep_object_score_logits"] = [object_score_logits]
+
+ if self.use_obj_ptrs_in_encoder:
+ obj_ptr = sam_outputs["obj_ptr"]
+
+ # Optionally, sample correction points iteratively to correct the mask
+ if frame_idx in frames_to_add_correction_pt:
+ assert gt_masks is not None
+ assert interactive_vision_feats is not None
+ assert interactive_feat_sizes is not None
+ all_pred_masks = [low_res_masks]
+ all_pred_high_res_masks = [high_res_masks]
+ all_pred_multimasks = [low_res_multimasks]
+ all_pred_high_res_multimasks = [high_res_multimasks]
+ all_pred_ious = [ious]
+ all_point_inputs = [point_inputs]
+ all_object_score_logits = [object_score_logits]
+
+ # select a subset of objects to interact with
+ if self.training:
+ assert objects_to_interact is None
+
+ interact_with_all_objects = (
+ self.rng.random() < self.prob_correct_all_objects_for_train
+ ) or (
+ self.force_correct_all_for_conditional_inputs and is_init_cond_frame
+ )
+
+ if interact_with_all_objects:
+ num_objects_to_correct = gt_masks.shape[0]
+ elif self.rand_objects_to_correct_for_train:
+ num_objects_to_correct = self.rng2.integers(
+ 1,
+ int(
+ gt_masks.shape[0]
+ * self.ratio_of_objects_to_correct_for_train
+ )
+ + 1,
+ )
+ else:
+ num_objects_to_correct = max(
+ 1,
+ int(
+ gt_masks.shape[0]
+ * self.ratio_of_objects_to_correct_for_train
+ ),
+ )
+
+ objects_to_interact = self.rng2.choice(
+ range(gt_masks.shape[0]),
+ size=num_objects_to_correct,
+ replace=False,
+ ).tolist()
+
+ if point_inputs is not None:
+ # don't modify the point inputs in-place
+ point_inputs = {
+ "point_coords": point_inputs["point_coords"][
+ objects_to_interact
+ ],
+ "point_labels": point_inputs["point_labels"][
+ objects_to_interact
+ ],
+ }
+ else:
+ assert objects_to_interact is not None
+ # the point inputs should have been preselected, i.e., the following assertion should hold
+
+ if point_inputs is not None:
+ assert point_inputs["point_coords"].shape[0] == len(objects_to_interact)
+ assert point_inputs["point_labels"].shape[0] == len(objects_to_interact)
+
+ for _ in range(self.num_correction_pt_per_frame):
+ # sample a new point from the error between prediction and ground-truth
+ # (with a small probability, directly sample from GT masks instead of errors)
+ if self.training and self.prob_to_sample_from_gt_for_train > 0:
+ sample_from_gt = (
+ self.rng.random() < self.prob_to_sample_from_gt_for_train
+ )
+ else:
+ sample_from_gt = False
+ # if `pred_for_new_pt` is None, only GT masks will be used for point sampling
+ pred_for_new_pt = None if sample_from_gt else (high_res_masks > 0)
+ new_points, new_labels = get_next_point(
+ gt_masks=gt_masks[objects_to_interact],
+ pred_masks=(
+ pred_for_new_pt[objects_to_interact]
+ if pred_for_new_pt is not None
+ else None
+ ),
+ method="uniform" if self.training else self.pt_sampling_for_eval,
+ )
+ point_inputs = concat_points(point_inputs, new_points, new_labels)
+ assert low_res_masks.shape[0] > max(objects_to_interact), (
+ f"interacting {objects_to_interact} in {low_res_masks.shape}?"
+ )
+ if self.iter_use_prev_mask_pred:
+ # Feed the mask logits of the previous SAM outputs in the next SAM decoder step.
+ # For tracking, this means that when the user adds a correction click, we also feed
+ # the tracking output mask logits along with the click as input to the SAM decoder.
+ mask_inputs = low_res_masks[objects_to_interact]
+ multimask_output = self._use_multimask(is_init_cond_frame, point_inputs)
+ pix_feat_with_mem = self._get_interactive_pix_mem(
+ interactive_vision_feats, interactive_feat_sizes
+ )
+ sam_outputs = self._forward_sam_heads(
+ backbone_features=pix_feat_with_mem,
+ point_inputs=point_inputs,
+ mask_inputs=mask_inputs,
+ interactive_high_res_features=interactive_high_res_features,
+ propagation_high_res_features=propagation_high_res_features,
+ multimask_output=multimask_output,
+ gt_masks=gt_masks,
+ objects_to_interact=objects_to_interact,
+ multiplex_state=multiplex_state,
+ )
+ interact_low_res_multimasks = sam_outputs["low_res_multimasks"]
+ interact_high_res_multimasks = sam_outputs["high_res_multimasks"]
+ interact_ious = sam_outputs["ious"]
+ interact_low_res_masks = sam_outputs["low_res_masks"]
+ interact_high_res_masks = sam_outputs["high_res_masks"]
+ interact_object_score_logits = sam_outputs["object_score_logits"]
+ if self.use_obj_ptrs_in_encoder:
+ interact_obj_ptr = sam_outputs["obj_ptr"]
+
+ if self.training:
+ # combine the masks from the interacted and non-interacted objects
+ low_res_masks = low_res_masks.clone()
+ high_res_masks = high_res_masks.clone()
+ low_res_multimasks = low_res_multimasks.clone()
+ high_res_multimasks = high_res_multimasks.clone()
+ ious = ious.clone()
+ object_score_logits = object_score_logits.clone()
+ obj_ptr = obj_ptr.clone() if self.use_obj_ptrs_in_encoder else None
+
+ # Update masks for the interacted objects
+ if (
+ torch.is_floating_point(interact_low_res_masks)
+ and interact_low_res_masks.dtype != low_res_masks.dtype
+ ):
+ interact_low_res_masks = interact_low_res_masks.to(
+ dtype=low_res_masks.dtype
+ )
+ low_res_masks[objects_to_interact] = interact_low_res_masks
+ if (
+ torch.is_floating_point(interact_high_res_masks)
+ and interact_high_res_masks.dtype != high_res_masks.dtype
+ ):
+ interact_high_res_masks = interact_high_res_masks.to(
+ dtype=high_res_masks.dtype
+ )
+ high_res_masks[objects_to_interact] = interact_high_res_masks
+ if (
+ torch.is_floating_point(interact_low_res_multimasks)
+ and interact_low_res_multimasks.dtype != low_res_multimasks.dtype
+ ):
+ interact_low_res_multimasks = interact_low_res_multimasks.to(
+ dtype=low_res_multimasks.dtype
+ )
+ low_res_multimasks[objects_to_interact] = interact_low_res_multimasks
+ if (
+ torch.is_floating_point(interact_high_res_multimasks)
+ and interact_high_res_multimasks.dtype != high_res_multimasks.dtype
+ ):
+ interact_high_res_multimasks = interact_high_res_multimasks.to(
+ dtype=high_res_multimasks.dtype
+ )
+ high_res_multimasks[objects_to_interact] = interact_high_res_multimasks
+ if (
+ torch.is_floating_point(interact_ious)
+ and interact_ious.dtype != ious.dtype
+ ):
+ interact_ious = interact_ious.to(dtype=ious.dtype)
+ ious[objects_to_interact] = interact_ious
+ if (
+ torch.is_floating_point(interact_object_score_logits)
+ and interact_object_score_logits.dtype != object_score_logits.dtype
+ ):
+ interact_object_score_logits = interact_object_score_logits.to(
+ dtype=object_score_logits.dtype
+ )
+ object_score_logits[objects_to_interact] = interact_object_score_logits
+ if self.use_obj_ptrs_in_encoder:
+ obj_ptr[objects_to_interact] = interact_obj_ptr
+
+ all_pred_masks.append(low_res_masks)
+ all_pred_high_res_masks.append(high_res_masks)
+ all_pred_multimasks.append(low_res_multimasks)
+ all_pred_high_res_multimasks.append(high_res_multimasks)
+ all_pred_ious.append(ious)
+ all_point_inputs.append(point_inputs)
+ all_object_score_logits.append(object_score_logits)
+
+ # Concatenate the masks along channel (to compute losses on all of them,
+ # using `onevision.losses.loss_fns.MultiStepIteractiveMasks`)
+ current_out["multistep_pred_masks"] = torch.cat(all_pred_masks, dim=1)
+ current_out["multistep_pred_masks_high_res"] = torch.cat(
+ all_pred_high_res_masks, dim=1
+ )
+ current_out["multistep_pred_multimasks"] = all_pred_multimasks
+ current_out["multistep_pred_multimasks_high_res"] = (
+ all_pred_high_res_multimasks
+ )
+ current_out["multistep_pred_ious"] = all_pred_ious
+ current_out["multistep_point_inputs"] = all_point_inputs
+ current_out["multistep_object_score_logits"] = all_object_score_logits
+
+ if self.add_all_frames_to_correct_as_cond:
+ if objects_to_interact is None:
+ current_out["conditioning_objects"].update(
+ multiplex_state.get_all_valid_object_idx()
+ )
+ else:
+ current_out["conditioning_objects"].update(set(objects_to_interact))
+
+        # Use the final prediction (after all correction steps) for output and eval
+ current_out["pred_masks"] = low_res_masks
+ current_out["pred_masks_high_res"] = high_res_masks
+ if self.use_obj_ptrs_in_encoder:
+            # as with the spatial memory, the object pointers are stored in multiplexed form
+ current_out["obj_ptr"] = multiplex_state.mux(obj_ptr)
+ if self.use_memory_selection:
+ current_out["object_score_logits"] = object_score_logits
+ iou_score = current_out["multistep_pred_ious"][-1].max(-1)[0]
+ current_out["iou_score"] = iou_score
+ current_out["eff_iou_score"] = self.cal_mem_score(
+ object_score_logits, iou_score
+ )
+ # we need to return this for encoding new masks in the dynamic mode
+ current_out["object_score_logits"] = object_score_logits
+
+ # Finally run the memory encoder on the predicted mask to encode
+ # it into a new memory feature (that can be used in future frames)
+ # (note that `self.num_maskmem == 0` is primarily used for reproducing SAM on
+ # images, in which case we'll just skip memory encoder to save compute).
+ if run_mem_encoder and self.num_maskmem > 0:
+ high_res_masks_for_mem_enc = high_res_masks
+ maskmem_features, maskmem_pos_enc = self._encode_new_memory(
+ image=image,
+ current_vision_feats=propagation_vision_feats,
+ feat_sizes=propagation_feat_sizes,
+ pred_masks_high_res=high_res_masks_for_mem_enc,
+ object_score_logits=object_score_logits,
+ is_mask_from_pts=(point_inputs is not None),
+ conditioning_objects=current_out["conditioning_objects"],
+ multiplex_state=multiplex_state,
+ )
+ current_out["maskmem_features"] = maskmem_features
+ current_out["maskmem_pos_enc"] = maskmem_pos_enc
+
+ if self.save_image_features:
+ current_out["image_features"] = propagation_vision_feats[-1]
+ current_out["image_pos_enc"] = propagation_vision_pos_embeds[-1]
+
+ # this is to avoid recomputing some of these features for add_new_masks_to_existing_state
+ aux_output = {}
+ if need_aux_output:
+ if interactive_pix_feat is None:
+ interactive_pix_feat = self._get_interactive_pix_mem(
+ interactive_vision_feats, interactive_feat_sizes
+ )
+ aux_output["interactive_pix_feat"] = interactive_pix_feat
+ aux_output["interactive_high_res_features"] = interactive_high_res_features
+ aux_output["propagation_vision_feats"] = propagation_vision_feats
+ aux_output["propagation_feat_sizes"] = propagation_feat_sizes
+
+ return current_out, aux_output
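The four-way dispatch described in the `_track_step_aux` docstring reduces to a small pure function over "is this input present" flags. The sketch below is illustrative only. `select_track_mode` is a hypothetical helper, and it omits the `assert not self.training` check that the original applies to case 4.

```python
from typing import Optional


def select_track_mode(
    has_mask_inputs: bool,
    has_point_inputs: bool,
    has_prev_mask_logits: bool,
    is_init_cond_frame: bool,
    objects_to_interact: Optional[list[int]],
) -> str:
    """Hypothetical mirror of the mode selection in `_track_step_aux`."""
    if has_mask_inputs:
        return "mask_as_output"  # case 1: return the input mask directly
    if not has_point_inputs:
        return "propagation_only"  # case 2: memory-only propagation
    if has_prev_mask_logits:
        # case 3a: refining earlier logits requires an explicit object subset
        if objects_to_interact is None:
            raise ValueError("objects_to_interact must be specified when refining")
        return "interaction_only"
    if is_init_cond_frame:
        return "interaction_only"  # case 3b: first-frame initialization
    if objects_to_interact is not None:
        return "propagation_and_interaction"  # case 4
    raise ValueError("Unable to determine tracking case")


print(select_track_mode(False, True, False, False, [1]))
```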
+
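During training, the correction stage picks a random subset of objects to receive clicks. The sketch below shows how the subset size is chosen. It assumes Python's `random.Random` in place of the NumPy generators `self.rng`/`self.rng2`. `random.randint(1, k)` is inclusive, so it covers the same range as NumPy's `integers(1, k + 1)`. `sample_objects_to_correct` is a hypothetical name. Like the original, the randomized branch requires `int(num_objects * ratio) >= 1`.

```python
import random


def sample_objects_to_correct(
    num_objects: int,
    ratio: float,
    interact_with_all: bool,
    randomize: bool,
    rng: random.Random,
) -> list[int]:
    """Hypothetical sketch of choosing which objects get correction clicks."""
    if interact_with_all:
        k = num_objects
    elif randomize:
        # uniform in [1, int(num_objects * ratio)], inclusive on both ends
        k = rng.randint(1, int(num_objects * ratio))
    else:
        # fixed fraction of the objects, but always at least one
        k = max(1, int(num_objects * ratio))
    # sample object indices without replacement
    return rng.sample(range(num_objects), k)


print(sample_objects_to_correct(5, 0.5, False, False, random.Random(0)))
```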
+ def _trim_output_and_memory(
+ self,
+ frame_idx: int,
+ output_dict: dict[str, dict[int, StageOutput]],
+ current_out: StageOutput,
+ memory_encoder_was_used: bool,
+ ) -> StageOutput:
+ # Optionally, offload the outputs to CPU memory during evaluation to avoid
+ # GPU OOM on very long videos or very large resolution or too many objects
+ if self.offload_output_to_cpu_for_eval and not self.training:
+ # Here we only keep those keys needed for evaluation to get a compact output
+ trimmed_out: StageOutput = {
+ "conditioning_objects": current_out["conditioning_objects"],
+ "pred_masks": current_out["pred_masks"].cpu(),
+ "pred_masks_high_res": current_out["pred_masks_high_res"].cpu(),
+ # other items for evaluation (these are small tensors so we keep them on GPU)
+ "object_score_logits": current_out["object_score_logits"],
+ "multistep_point_inputs": current_out["multistep_point_inputs"],
+ }
+ if self.use_obj_ptrs_in_encoder:
+ trimmed_out["obj_ptr"] = current_out["obj_ptr"]
+ if memory_encoder_was_used and self.num_maskmem > 0:
+ trimmed_out["maskmem_features"] = current_out["maskmem_features"].cpu()
+ trimmed_out["maskmem_pos_enc"] = [
+ x.cpu() for x in current_out["maskmem_pos_enc"]
+ ]
+ if self.save_image_features:
+ trimmed_out["image_features"] = current_out["image_features"].cpu()
+ trimmed_out["image_pos_enc"] = current_out["image_pos_enc"].cpu()
+ current_out = trimmed_out
+
+        # Optionally, during evaluation, trim the output of the past non-conditioning frame
+        # that lies r * num_maskmem frames before the current frame. This saves GPU or CPU
+        # memory for semi-supervised VOS eval, where only the first frame receives prompts.
+ def _trim_past_out(
+ past_out: StageOutput, current_out: StageOutput
+ ) -> Optional[StageOutput]:
+ if past_out is None:
+ return None
+ trimmed_past_out: StageOutput = {
+ "conditioning_objects": past_out["conditioning_objects"],
+ "pred_masks": past_out["pred_masks"],
+ "object_score_logits": past_out["object_score_logits"],
+                # keep the past frame's own point inputs (not those of the current frame)
+ "multistep_point_inputs": past_out["multistep_point_inputs"],
+ }
+ if self.use_obj_ptrs_in_encoder:
+ trimmed_past_out["obj_ptr"] = past_out["obj_ptr"]
+ return trimmed_past_out
+
+ if self.trim_past_non_cond_mem_for_eval and not self.training:
+ r = self.memory_temporal_stride_for_eval
+ past_frame_idx = frame_idx - r * self.num_maskmem
+ past_out = output_dict["non_cond_frame_outputs"].get(past_frame_idx, None)
+
+ if past_out is not None:
+ if (
+ self.use_memory_selection
+ and past_out.get("eff_iou_score", 0) < self.mf_threshold
+ ) or not self.use_memory_selection:
+ output_dict["non_cond_frame_outputs"][past_frame_idx] = (
+ _trim_past_out(past_out, current_out)
+ )
+
+ if (
+ self.use_memory_selection and not self.offload_output_to_cpu_for_eval
+ ): # design for memory selection, trim too old frames to save memory
+ far_old_frame_idx = frame_idx - 20 * self.max_obj_ptrs_in_encoder
+ past_out = output_dict["non_cond_frame_outputs"].get(
+ far_old_frame_idx, None
+ )
+ if past_out is not None:
+ output_dict["non_cond_frame_outputs"][far_old_frame_idx] = (
+ _trim_past_out(past_out, current_out)
+ )
+
+ return current_out
+
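The eval-time trimming in `_trim_output_and_memory` replaces the frame `r * num_maskmem` steps back with a compact record. The sketch below covers only the path without memory selection. `trim_past_non_cond` is a hypothetical helper, and plain values stand in for tensors.

```python
from typing import Any, Optional

# keys kept by the trimming step (obj_ptr only when object pointers are used)
KEEP_KEYS = (
    "conditioning_objects",
    "pred_masks",
    "object_score_logits",
    "multistep_point_inputs",
)


def trim_past_non_cond(
    non_cond: dict[int, dict[str, Any]],
    frame_idx: int,
    stride: int,
    num_maskmem: int,
    use_obj_ptrs: bool,
) -> Optional[int]:
    """Hypothetical sketch: trim the frame r * num_maskmem steps back, in place."""
    past_idx = frame_idx - stride * num_maskmem
    past_out = non_cond.get(past_idx)
    if past_out is None:
        return None
    keys = KEEP_KEYS + (("obj_ptr",) if use_obj_ptrs else ())
    # drop heavy entries such as "maskmem_features" for that frame
    non_cond[past_idx] = {k: past_out[k] for k in keys}
    return past_idx
```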
+ def track_step(
+ self,
+ *,
+ frame_idx,
+ is_init_cond_frame,
+ backbone_features_interactive,
+ backbone_features_propagation,
+ image,
+ point_inputs,
+ mask_inputs,
+ gt_masks,
+ frames_to_add_correction_pt,
+ output_dict,
+ num_frames,
+ track_in_reverse=False, # tracking in reverse time order (for demo usage)
+ # Whether to run the memory encoder on the predicted masks. Sometimes we might want
+ # to skip the memory encoder with `run_mem_encoder=False`. For example,
+ # in demo we might call `track_step` multiple times for each user click,
+ # and only encode the memory when the user finalizes their clicks. And in ablation
+ # settings like SAM training on static images, we don't need the memory encoder.
+ run_mem_encoder=True,
+ # The previously predicted SAM mask logits (which can be fed together with new clicks in demo).
+ prev_sam_mask_logits=None,
+ multiplex_state: MultiplexState,
+ # The list of object idx that point_inputs correspond to; only this set of objects will
+ # be interacted with in the correction stage
+ objects_to_interact: Optional[list[int]] = None,
+ ) -> StageOutput:
+ current_out, _ = self._track_step_aux(
+ frame_idx=frame_idx,
+ is_init_cond_frame=is_init_cond_frame,
+ backbone_features_interactive=backbone_features_interactive,
+ backbone_features_propagation=backbone_features_propagation,
+ image=image,
+ point_inputs=point_inputs,
+ mask_inputs=mask_inputs,
+ gt_masks=gt_masks,
+ frames_to_add_correction_pt=frames_to_add_correction_pt,
+ output_dict=output_dict,
+ num_frames=num_frames,
+ track_in_reverse=track_in_reverse,
+ run_mem_encoder=run_mem_encoder,
+ prev_sam_mask_logits=prev_sam_mask_logits,
+ multiplex_state=multiplex_state,
+ objects_to_interact=objects_to_interact,
+ need_aux_output=False,
+ )
+ current_out = self._trim_output_and_memory(
+ frame_idx, output_dict, current_out, memory_encoder_was_used=run_mem_encoder
+ )
+
+ return current_out
+
+ def back_convert(self, targets):
+ """To be compatible with SetCriterionAPI losses (mask loss only)."""
+ batched_targets = {}
+ batched_targets["num_boxes"] = targets.num_boxes
+ batched_targets["masks"] = targets.segments
+ batched_targets["is_valid_mask"] = targets.is_valid_segment
+ return batched_targets
+
+ def _use_multimask(self, is_init_cond_frame, point_inputs):
+ """Whether to use multimask output in the SAM head."""
+ num_pts = 0 if point_inputs is None else point_inputs["point_labels"].size(1)
+ multimask_output = (
+ self.multimask_output_in_sam
+ and (is_init_cond_frame or self.multimask_output_for_tracking)
+ and (self.multimask_min_pt_num <= num_pts <= self.multimask_max_pt_num)
+ and self.num_multimask_outputs > 0
+ )
+ return multimask_output
+
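`_use_multimask` is a conjunction of four conditions on the model configuration and the number of clicks per object. The sketch below reuses the attribute names as keyword arguments. The default values (0 to 1 points, 3 outputs) are illustrative assumptions and were not read from any SAM 3.1 config.

```python
def use_multimask(
    num_pts: int,
    is_init_cond_frame: bool,
    *,
    multimask_output_in_sam: bool = True,
    multimask_output_for_tracking: bool = False,
    multimask_min_pt_num: int = 0,
    multimask_max_pt_num: int = 1,
    num_multimask_outputs: int = 3,
) -> bool:
    """Hypothetical standalone version of the multimask predicate."""
    return (
        multimask_output_in_sam
        # multimask on initial conditioning frames, or on all frames if enabled
        and (is_init_cond_frame or multimask_output_for_tracking)
        # only when the click count is small enough to be ambiguous
        and (multimask_min_pt_num <= num_pts <= multimask_max_pt_num)
        and num_multimask_outputs > 0
    )
```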
+ def _apply_non_overlapping_constraints(self, pred_masks):
+ """
+ Apply non-overlapping constraints to the object scores in pred_masks. Here we
+ keep only the highest scoring object at each spatial location in pred_masks.
+ """
+ batch_size = pred_masks.size(0)
+ if batch_size == 1:
+ return pred_masks
+
+ device = pred_masks.device
+ # "max_obj_inds": object index of the object with the highest score at each location
+ max_obj_inds = torch.argmax(pred_masks, dim=0, keepdim=True)
+ # "batch_obj_inds": object index of each object slice (along dim 0) in `pred_masks`
+ batch_obj_inds = torch.arange(batch_size, device=device)[:, None, None, None]
+ keep = max_obj_inds == batch_obj_inds
+ # suppress overlapping regions' scores below -10.0 so that the foreground regions
+ # don't overlap (here sigmoid(-10.0)=4.5398e-05)
+ pred_masks = torch.where(keep, pred_masks, torch.clamp(pred_masks, max=-10.0))
+ return pred_masks
+
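The non-overlapping constraint keeps each pixel's logit only for the highest-scoring object and clamps every other object's logit at that pixel to at most -10.0. The sketch below applies the same rule to nested lists indexed as `scores[object][pixel]`, which stand in for the `[N, 1, H, W]` tensor. Ties go to the lowest object index. Per the PyTorch documentation, `torch.argmax` behaves the same way because it returns the first maximal index.

```python
def apply_non_overlapping(scores: list[list[float]]) -> list[list[float]]:
    """Hypothetical list-based sketch of `_apply_non_overlapping_constraints`."""
    num_objects = len(scores)
    if num_objects == 1:
        return scores
    out = [row[:] for row in scores]
    for p in range(len(scores[0])):
        column = [scores[o][p] for o in range(num_objects)]
        winner = column.index(max(column))  # first maximal index, like argmax
        for o in range(num_objects):
            if o != winner:
                # suppress the losing objects: sigmoid(-10.0) is about 4.5e-05
                out[o][p] = min(out[o][p], -10.0)
    return out


print(apply_non_overlapping([[2.0, -1.0], [1.0, 3.0]]))
```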
+ def _compile_all_components(self):
+ """Compile all model components for faster inference."""
+        # a larger cache size to hold a varying number of shapes for torch.compile
+ # see https://github.com/pytorch/pytorch/blob/v2.5.1/torch/_dynamo/config.py#L42-L49
+ torch._dynamo.config.cache_size_limit = 64
+ torch._dynamo.config.accumulated_cache_size_limit = 2048
+
+ logging.info("Compiling all components. First time may be very slow.")
+
+ self.maskmem_backbone.forward = torch.compile(
+ self.maskmem_backbone.forward,
+ mode="max-autotune",
+ fullgraph=True,
+ dynamic=False,
+ )
+ self.transformer.encoder.forward = torch.compile(
+ self.transformer.encoder.forward,
+ mode="max-autotune",
+ fullgraph=True,
+ dynamic=True, # Num. of memories varies
+ )
+ # We disable compilation of sam_prompt_encoder as it sometimes gives a large accuracy regression,
+ # especially when sam_mask_prompt (previous mask logits) is not None
+ # self.sam_prompt_encoder.forward = torch.compile(
+ # self.sam_prompt_encoder.forward,
+ # mode="max-autotune",
+ # fullgraph=True,
+ # dynamic=False, # Accuracy regression on True
+ # )
+ self.sam_mask_decoder.forward = torch.compile(
+ self.sam_mask_decoder.forward,
+ mode="max-autotune",
+ fullgraph=True,
+ dynamic=False, # Accuracy regression on True
+ )
+
+ def _maybe_clone(self, x):
+ """Clone a tensor if and only if `self.compile_all_components` is True."""
+ return x.clone() if self.compile_all_components else x
+
+ def get_propagation_dense_pe(self) -> torch.Tensor:
+ """
+ Returns the positional encoding used to encode point prompts,
+ applied to a dense set of points the shape of the image encoding.
+
+ Returns:
+ torch.Tensor: Positional encoding with shape
+ 1x(embed_dim)x(embedding_h)x(embedding_w)
+ """
+ return self.image_pe_layer(
+ (self.sam_image_embedding_size, self.sam_image_embedding_size)
+ ).unsqueeze(0)
+
+ def cal_mem_score(self, object_score_logits, iou_score):
+ object_score_norm = torch.where(
+ object_score_logits > 0,
+ object_score_logits.sigmoid() * 2 - 1, # rescale to [0, 1]
+ torch.zeros_like(object_score_logits),
+ )
+ score_per_frame = (object_score_norm * iou_score).mean()
+ return score_per_frame
+
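`cal_mem_score` maps positive object-score logits from (0.5, 1) to (0, 1) via `2 * sigmoid(x) - 1`, zeroes non-positive logits, weights the result by the predicted IoU, and averages. The sketch below assumes both inputs are per-object 1-D lists of equal length. It does not reproduce any tensor broadcasting in the original, and `mem_score` is a hypothetical name.

```python
import math


def mem_score(object_score_logits: list[float], iou_scores: list[float]) -> float:
    """Hypothetical scalar sketch of the memory-selection score."""
    norm = [
        # 2 * sigmoid(x) - 1 for visible objects, 0 otherwise
        (1.0 / (1.0 + math.exp(-x))) * 2.0 - 1.0 if x > 0 else 0.0
        for x in object_score_logits
    ]
    weighted = [n * iou for n, iou in zip(norm, iou_scores)]
    return sum(weighted) / len(weighted)


print(mem_score([math.log(3.0)], [0.8]))  # 2 * 0.75 - 1 = 0.5, times 0.8
```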
+ def frame_filter(self, output_dict, track_in_reverse, frame_idx, num_frames, r):
+ if (frame_idx == 0 and not track_in_reverse) or (
+ frame_idx == num_frames - 1 and track_in_reverse
+ ):
+ return []
+
+ max_num = min(
+ num_frames, self.max_obj_ptrs_in_encoder
+ ) # maximum number of pointer memory frames to consider
+
+ if not track_in_reverse:
+ start = frame_idx - 1
+ end = 0
+ step = -r
+ must_include = frame_idx - 1
+ else:
+ start = frame_idx + 1
+ end = num_frames
+ step = r
+ must_include = frame_idx + 1
+
+ valid_indices = []
+ for i in range(start, end, step):
+ if (
+ i not in output_dict["non_cond_frame_outputs"]
+ or "eff_iou_score" not in output_dict["non_cond_frame_outputs"][i]
+ ):
+ continue
+
+ score_per_frame = output_dict["non_cond_frame_outputs"][i]["eff_iou_score"]
+
+ if score_per_frame > self.mf_threshold: # threshold
+ valid_indices.insert(0, i)
+
+ if len(valid_indices) >= max_num - 1:
+ break
+
+ if must_include not in valid_indices:
+ valid_indices.append(must_include)
+
+ return valid_indices
+
+
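`frame_filter` walks backward (or forward, when tracking in reverse) with stride `r`. It keeps frames whose `eff_iou_score` exceeds `mf_threshold`, stops after `max_num - 1` hits, and always includes the adjacent frame. The sketch below takes a plain `{frame_idx: eff_iou_score}` dict in place of `output_dict["non_cond_frame_outputs"]`, and its parameter names are hypothetical. As in the original, the forward `range(start, 0, -r)` never visits frame 0.

```python
def frame_filter(
    eff_iou_scores: dict[int, float],
    frame_idx: int,
    num_frames: int,
    stride: int,
    max_obj_ptrs: int,
    threshold: float,
    track_in_reverse: bool = False,
) -> list[int]:
    """Hypothetical sketch of selecting high-quality memory frames."""
    if (frame_idx == 0 and not track_in_reverse) or (
        frame_idx == num_frames - 1 and track_in_reverse
    ):
        return []
    max_num = min(num_frames, max_obj_ptrs)
    if not track_in_reverse:
        start, end, step, must_include = frame_idx - 1, 0, -stride, frame_idx - 1
    else:
        start, end, step, must_include = frame_idx + 1, num_frames, stride, frame_idx + 1
    valid: list[int] = []
    for i in range(start, end, step):
        if i not in eff_iou_scores:
            continue  # frame has no stored score
        if eff_iou_scores[i] > threshold:
            valid.insert(0, i)  # keep the list in temporal order
        if len(valid) >= max_num - 1:
            break
    if must_include not in valid:
        valid.append(must_include)  # always attend to the neighbouring frame
    return valid


print(frame_filter({1: 0.9, 2: 0.1, 3: 0.8, 4: 0.7}, 5, 10, 1, 16, 0.5))
```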
+def concat_points(old_point_inputs, new_points, new_labels):
+ """Add new points and labels to previous point inputs (add at the end)."""
+ if old_point_inputs is None:
+ points, labels = new_points, new_labels
+ else:
+ points = torch.cat([old_point_inputs["point_coords"], new_points], dim=1)
+ labels = torch.cat([old_point_inputs["point_labels"], new_labels], dim=1)
+
+ return {"point_coords": points, "point_labels": labels}
+
+
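`concat_points` appends the new clicks after the existing ones for each object, because `torch.cat(..., dim=1)` concatenates along the per-object point axis. The sketch below shows the same semantics with nested lists shaped `[num_objects][num_points][...]`. It is a hypothetical stand-in that does not use tensors.

```python
from typing import Any, Optional


def concat_points_lists(
    old_point_inputs: Optional[dict[str, list]],
    new_points: list,
    new_labels: list,
) -> dict[str, Any]:
    """Hypothetical list-based sketch of `concat_points`."""
    if old_point_inputs is None:
        return {"point_coords": new_points, "point_labels": new_labels}
    # per object, append new points/labels after the old ones (dim=1)
    points = [o + n for o, n in zip(old_point_inputs["point_coords"], new_points)]
    labels = [o + n for o, n in zip(old_point_inputs["point_labels"], new_labels)]
    return {"point_coords": points, "point_labels": labels}


old = {"point_coords": [[[1, 2]]], "point_labels": [[1]]}
print(concat_points_lists(old, [[[3, 4]]], [[0]]))
```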
+def _append(
+ d1: StageOutput, d2: SAMOutput, k1: str, k2: str, dim: int = 0, strict: bool = True
+):
+ if strict:
+ assert k1 in d1, f"{k1} not found"
+ else:
+ if k1 not in d1:
+ return
+
+ d1[k1] = torch.cat([d1[k1], d2[k2]], dim=dim)
+
+
+def _merge(
+ d1: StageOutput,
+ d2: SAMOutput,
+ k1: str,
+ k2: str,
+ d2_idx: list[int],
+ strict: bool = True,
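+):
+ """
+ Overwrite the rows `d2_idx` of `d1[k1]` with `d2[k2]` (cast to the dtype of `d1[k1]`),
+ modifying `d1` in place. If `strict` is False, silently skip when `k1` is missing from `d1`.
+ """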
+ if strict:
+ assert k1 in d1, f"{k1} not found"
+ else:
+ if k1 not in d1:
+ return
+ d1[k1][d2_idx] = d2[k2].to(dtype=d1[k1].dtype)
+
+
+class VideoTrackingDynamicMultiplex(VideoTrackingMultiplex):
+ def __init__(
+ self,
+ enable_dynamic_training: bool = True, # Allows the number of objects to increase across frames during training
+ rand_num_transition_points: bool = True, # Randomizes the number of transition points
+ max_num_transition_points: int = 3, # Maximum number of transition points
+ add_all_transition_frames_as_cond: bool = True,
+ max_trans_frames_in_attn: int = 4,
+ is_dynamic_model: bool = True, # Overrides the default
+ is_dynamic_vos_evaluation: bool = False, # For datasets like YouTubeVOS where new objects appear after the first frame
+ **kwargs,
+ ):
+ super().__init__(is_dynamic_model=is_dynamic_model, **kwargs)
+
+ self.enable_dynamic_training = enable_dynamic_training
+ self.rand_num_transition_points = rand_num_transition_points
+ self.max_num_transition_points = max_num_transition_points
+
+ self.add_all_transition_frames_as_cond = add_all_transition_frames_as_cond
+ self.max_trans_frames_in_attn = max_trans_frames_in_attn
+ self.is_dynamic_vos_evaluation = is_dynamic_vos_evaluation
+
+ def prepare_prompt_inputs(self, backbone_out, input, start_frame_idx=0):
+ """
+ Prepare input mask, point or box prompts. Optionally, we allow tracking from
+ a custom `start_frame_idx` to the end of the video (for evaluation purposes).
+
+ In addition to the prompt preparation done in the parent class, this function
+ preprocesses the masks and pre-computes the visibility/validity attributes needed
+ for training with dynamic bucketing.
+
+ **Data**
+ We use a modified dataset class and a modified collate_fn such that:
+ 1. The mask for an object is loaded if it is visible (area>0) on any of the loaded frames.
+ 2. A "visible_objects_per_frame" attribute is computed, which contains the set of objects with area>0 on each frame.
+
+ Here, we use [] to denote a set of objects; i.e., objects A and B are represented as [A, B].
+ The dataloader returns masks in an arbitrary yet deterministic order.
+ For example, [2, 3] can appear on the first frame, and [1, 2, 3, 17] can appear on the second frame.
+
+ This is incompatible with the object addition implementation, which assumes new objects are appended, not inserted.
+ Thus, we compute object_appearance_order, which sorts the object indices by the frame at which they first appear
+ (conditional frames always come first). During training, objects that appear on the same frame are shuffled as augmentation.
+ We also reorder the ground-truth masks used for supervision accordingly.
+
+ **Causal supervision**
+ Since not all objects appear on the first frame, we should not supervise objects that the model has no knowledge of yet.
+ Thus, we keep track of the set of objects that have been introduced, and the frame at which each one is introduced.
+ We compute valid_idx_per_frame (and trim the ground truth correspondingly) so that only introduced objects are supervised.
+
+ **Transition points**
+ Transition points are frames outside the initial conditioning frames that introduce new objects. We uniformly sample some frames
+ as candidate transition points, and keep a candidate only if it introduces objects not already introduced by the
+ conditioning frames or earlier transition points.
+ Transitions therefore do not always happen when an object first becomes visible, because the (initial) sampling is agnostic to visibility.
+ This is intended, as new objects are not always detected immediately in the dense tracking setting.
+ """
+
+ # First, prepare the prompt inputs following the parent class
+ backbone_out = super()._prepare_prompt_inputs_meta(
+ backbone_out, input, start_frame_idx=start_frame_idx
+ )
+
+ num_frames = backbone_out["num_frames"]
+ gt_masks_per_frame = backbone_out["gt_masks_per_frame"]
+
+ if self.training or self.is_dynamic_vos_evaluation:
+ visible_objects_per_frame: dict[int, set[int]] = (
+ input.visible_objects_per_frame
+ )
+ else:
+ visible_objects_per_frame: dict[int, set[int]] = {
+ stage_id: set(range(gt_masks_per_frame[stage_id].shape[0]))
+ for stage_id in range(num_frames)
+ }
+
+ # If we have more than one conditioning frame,
+ # all visible objects on any of the conditioning frames become valid for all frames
+ init_cond_frames: list[int] = backbone_out["init_cond_frames"]
+ init_cond_frames = sorted(init_cond_frames)
+ frames_not_in_init_cond: list[int] = backbone_out["frames_not_in_init_cond"]
+
+ # Rare case: the data guard might fail and we could have an empty first frame.
+ # In this case, we track an empty object.
+ if len(visible_objects_per_frame[start_frame_idx]) == 0:
+ if self.training:
+ logging.warning("Empty first frame, tracking an empty object")
+ visible_objects_per_frame[start_frame_idx] = {0}
+ # set the GT mask for this object to be all zeros
+ for stage_id in range(num_frames):
+ gt_masks_per_frame[stage_id][0] = torch.zeros_like(
+ gt_masks_per_frame[stage_id][0]
+ )
+ else:
+ # During evaluation, this should only happen for YouTubeVOS.
+ # We will skip the frames before the first conditional frame.
+ assert self.is_dynamic_vos_evaluation, (
+ f"{visible_objects_per_frame=} invalid"
+ )
+ assert len(init_cond_frames) == 1
+ for stage_id in range(start_frame_idx, num_frames):
+ if len(visible_objects_per_frame[stage_id]) > 0:
+ init_cond_frames = [stage_id]
+ break
+ for i in range(
+ init_cond_frames[0] + 1
+ ): # also remove init_cond_frames[0]
+ if i in frames_not_in_init_cond:
+ frames_not_in_init_cond.remove(i)
+
+ backbone_out["init_cond_frames"] = init_cond_frames
+
+ # The object idx in valid_idx_per_frame should be in sequential order.
+ # We will first reshuffle the objects using object_appearance_order,
+ # and then index via valid_idx_per_frame.
+ valid_idx_per_frame: dict[int, list[int]] = {}
+ # Importantly, we cannot simply use valid_idx_per_frame[stage_id-1] because it might be a conditional frame.
+ valid_idx_prior_to_each_transition: dict[int, list[int]] = {}
+ new_idx_per_transition: dict[int, list[int]] = {}
+
+ if self.training and self.enable_dynamic_training:
+ # Select the number of transition points
+ if self.rand_num_transition_points:
+ # Randomly select 1 to `max_num_transition_points` transition points
+ num_transition_points = self.rng.integers(
+ 1, self.max_num_transition_points, endpoint=True
+ )
+ else:
+ num_transition_points = self.max_num_transition_points
+
+ available_transition_points = frames_not_in_init_cond
+ num_transition_points = min(
+ num_transition_points, len(available_transition_points)
+ )
+ # num_transition_points can differ between GPUs so we use rng2
+ transition_points = self.rng2.choice(
+ available_transition_points, num_transition_points, replace=False
+ ).tolist()
+ transition_points = sorted(transition_points)
+
+ # Filter for the transition points that do introduce new objects
+ filtered_transition_points = []
+ objects_seen = set()
+ for stage_id in init_cond_frames:
+ objects_seen.update(visible_objects_per_frame[stage_id])
+
+ for stage_id in range(start_frame_idx, num_frames):
+ if stage_id in transition_points:
+ new_objects_seen = (
+ visible_objects_per_frame[stage_id] - objects_seen
+ )
+ if len(new_objects_seen) > 0:
+ filtered_transition_points.append(stage_id)
+ objects_seen.update(new_objects_seen)
+ new_idx_per_transition[stage_id] = list(new_objects_seen)
+ transition_points = filtered_transition_points
+
+ # Create appearance-based object ordering with randomization
+ init_objects = set()
+ for stage_id in init_cond_frames:
+ init_objects.update(visible_objects_per_frame[stage_id])
+ init_objects = list(init_objects)
+ self.rng2.shuffle(init_objects)
+
+ object_appearance_order = init_objects.copy()
+ valid_idx_per_frame[start_frame_idx] = list(range(len(init_objects)))
+ for stage_id in range(start_frame_idx + 1, num_frames):
+ if stage_id in transition_points:
+ # When objects appear at a transition point, we add them to the end of the list
+ stage_objects = new_idx_per_transition[stage_id].copy()
+ self.rng2.shuffle(stage_objects)
+ valid_idx_prior_to_each_transition[stage_id] = list(
+ range(len(object_appearance_order))
+ )
+ new_idx_per_transition[stage_id] = list(
+ range(
+ len(object_appearance_order),
+ len(object_appearance_order) + len(stage_objects),
+ )
+ )
+ object_appearance_order.extend(stage_objects)
+
+ # Update the valid objects at this frame
+ if stage_id in init_cond_frames:
+ # Note: on any non-first init cond frame, the number of valid objects
+ # might be fewer than the previous frame because we always process the init cond frames first.
+ # For example, if [1, 2, 4] are visible on the two init cond frames (e.g., frame 0 and frame 5),
+ # and object 3 appears on frame 4 (as a transition point), object 3 would not be considered valid on frame 5.
+ # This should not break any processing steps or affect correctness (since invalid objects are marked as floating).
+ valid_idx_per_frame[stage_id] = valid_idx_per_frame[
+ start_frame_idx
+ ].copy()
+ elif stage_id in frames_not_in_init_cond:
+ valid_idx_per_frame[stage_id] = list(
+ range(len(object_appearance_order))
+ )
+ else:
+ raise ValueError(
+ f"Unexpected {stage_id=}? {init_cond_frames=} {frames_not_in_init_cond=} {transition_points=}"
+ )
+ elif self.is_dynamic_vos_evaluation and not self.training:
+ # In dynamic VOS evaluation, we derive the transition points from the visible objects on each frame.
+ # Each object is introduced exactly once, on the first frame where it is visible.
+ # NOTE: The new release of YouTubeVOS apparently does not enforce this,
+ # so we enforce it here.
+
+ # Find first appearance of each object
+ object_appearance_order: list[int] = []
+ object_appear_at_stage: dict[int, int] = {}
+ transition_points: list[int] = []
+ stage_to_new_objects: dict[int, list[int]] = defaultdict(list)
+ for stage_id in range(start_frame_idx, num_frames):
+ visible_objects = sorted(list(visible_objects_per_frame[stage_id]))
+ for obj_id in visible_objects:
+ if obj_id in object_appear_at_stage:
+ continue # skip seen objects
+
+ object_appear_at_stage[obj_id] = stage_id
+ object_appearance_order.append(obj_id)
+ stage_to_new_objects[stage_id].append(obj_id)
+ if stage_id not in init_cond_frames:
+ transition_points.append(stage_id)
+
+ # Track cumulative object count
+ objects_seen_so_far = []
+ for stage_id in range(start_frame_idx, num_frames):
+ if stage_id in transition_points:
+ # New objects appear at this frame
+ new_objects = stage_to_new_objects[stage_id]
+ num_objects_before = len(objects_seen_so_far)
+
+ # Record which objects were valid before this transition
+ valid_idx_prior_to_each_transition[stage_id] = list(
+ range(num_objects_before)
+ )
+ # Record the indices of new objects
+ new_idx_per_transition[stage_id] = list(
+ range(num_objects_before, num_objects_before + len(new_objects))
+ )
+
+ objects_seen_so_far.extend(new_objects)
+
+ # Set valid objects for this frame
+ if stage_id in init_cond_frames:
+ # For init cond frames, only the initial objects are valid
+ valid_idx_per_frame[stage_id] = list(
+ range(len(stage_to_new_objects[stage_id]))
+ )
+ objects_seen_so_far.extend(stage_to_new_objects[stage_id])
+ else:
+ # For other frames, all objects seen so far are valid
+ valid_idx_per_frame[stage_id] = list(
+ range(len(objects_seen_so_far))
+ )
+
+ else:
+ # Use no transition points when dynamic training is disabled
+ transition_points = []
+ visible_objects_on_first_frame = sorted(
+ list(visible_objects_per_frame[start_frame_idx])
+ )
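+ # visible_objects_on_first_frame might not be contiguous (e.g., [0, 2, 5]); after reordering
+ # the masks by object_appearance_order, the valid indices are simply 0..n-1
+ object_orderings = list(range(len(visible_objects_on_first_frame)))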
+ # Use the original order for evaluation
+ object_appearance_order = visible_objects_on_first_frame.copy()
+ for stage_id in range(start_frame_idx, num_frames):
+ valid_idx_per_frame[stage_id] = object_orderings.copy()
+
+ # Apply the appearance-based mapping to ground-truth masks
+ for stage_id in range(start_frame_idx, num_frames):
+ gt_masks_per_frame[stage_id] = gt_masks_per_frame[stage_id][
+ object_appearance_order
+ ][valid_idx_per_frame[stage_id]]
+
+ # We also apply this change in-place to the input so that the loss is computed correctly.
+ # For targets.segments, supervision of new objects is delayed by one frame:
+ # at a transition point, the new objects are provided as input masks, so we only supervise
+ # (using the current frame's masks) the objects that were introduced before this transition.
+ for stage_id, targets in enumerate(input.find_targets):
+ if stage_id in transition_points:
+ # At transition points, only supervise the objects introduced before this transition
+ prev_objects = valid_idx_prior_to_each_transition[stage_id]
+ targets.segments = gt_masks_per_frame[stage_id][prev_objects].squeeze(1)
+ else:
+ targets.segments = gt_masks_per_frame[stage_id].squeeze(1)
+ # Ensure that we are averaging the loss correctly.
+ # Although this is called num_boxes, it actually stores an array of ones with length=number of objects in the VOS setting.
+ targets.num_boxes = targets.num_boxes[: targets.segments.shape[0]]
+
+ backbone_out["valid_idx_per_frame"] = valid_idx_per_frame
+ backbone_out["new_idx_per_transition"] = new_idx_per_transition
+ backbone_out["valid_objects_prior_to_each_transition"] = (
+ valid_idx_prior_to_each_transition
+ )
+ backbone_out["transition_points"] = set(transition_points)
+ backbone_out["gt_masks_per_frame"] = gt_masks_per_frame
+ backbone_out["object_appearance_order"] = object_appearance_order
+
+ backbone_out = self._prepare_conditional_frames(backbone_out)
+
+ return backbone_out
+
+ def add_new_masks_to_existing_state(
+ self,
+ *,
+ interactive_pix_feat: torch.Tensor,
+ interactive_high_res_features: list[torch.Tensor],
+ propagation_vision_feats: Optional[
+ list[torch.Tensor]
+ ], # needed when add_mask_to_memory=True
+ propagation_feat_sizes: Optional[
+ list[tuple[int, int]]
+ ], # needed when add_mask_to_memory=True
+ new_masks: torch.Tensor,
+ obj_idxs_in_mask: list[
+ int
+ ], # len(obj_idxs_in_mask) == new_masks.shape[0]; object idx internal to this state
+ obj_ids_in_mask: Optional[
+ list[int]
+ ], # len(obj_ids_in_mask) == new_masks.shape[0]; global object ids
+ prev_output: StageOutput, # this state will be modified in-place
+ multiplex_state: MultiplexState,
+ add_mask_to_memory: bool = True,
+ are_masks_from_pts: bool = False,
+ allow_new_buckets: bool = False,
+ prefer_new_buckets: bool = False,
+ ) -> None:
+ """
+ Add new objects to an existing output/multiplex state.
+
+ This function encodes the input masks as new objects and merges them into the existing state.
+ The new object entries are always appended after the existing objects, because in dense tracking
+ the existing state is always propagated to the current frame before new objects are introduced.
+ """
+ assert self.use_mask_input_as_output_without_sam
+ assert new_masks.shape[0] == len(obj_idxs_in_mask)
+
+ num_new_objects = new_masks.shape[0]
+
+ if obj_ids_in_mask is not None:
+ assert len(obj_ids_in_mask) == num_new_objects
+
+ if self.use_obj_ptrs_in_encoder:
+ # demux the existing pointers before we change the multiplex state
+ existing_pointers = multiplex_state.demux(prev_output["obj_ptr"])
+
+ # Step 1: Inform the multiplex state that we are adding new objects
+ new_object_idx = multiplex_state.find_next_batch_of_available_indices(
+ num_objects=num_new_objects,
+ allow_new_buckets=allow_new_buckets,
+ prefer_new_buckets=prefer_new_buckets,
+ )
+ multiplex_state.add_objects(
+ object_indices=new_object_idx,
+ object_ids=obj_ids_in_mask,
+ allow_new_buckets=allow_new_buckets,
+ prefer_new_buckets=prefer_new_buckets,
+ )
+
+ # Step 2: Encode the incoming masks
+ mask_output = self._use_mask_as_output(
+ backbone_features=interactive_pix_feat,
+ high_res_features=interactive_high_res_features,
+ mask_inputs=new_masks,
+ multiplex_state=multiplex_state,
+ objects_in_mask=new_object_idx,
+ )
+
+ # Step 3: Merge the existing state with new encoded features
+ # Handle resolution mismatch between propagation (e.g., 1008) and interactive (e.g., 288) features
+ # Determine target resolution from interactive features (newly generated masks)
+ interactive_resolution = mask_output["high_res_masks"].shape[-1]
+
+ # Check if prev_output needs resolution adjustment
+ if (
+ "pred_masks_high_res" in prev_output
+ and prev_output["pred_masks_high_res"] is not None
+ ):
+ existing_resolution = prev_output["pred_masks_high_res"].shape[-1]
+
+ if existing_resolution != interactive_resolution:
+ # Resize existing outputs to match interactive resolution
+ # This happens when frame was bootstrapped with propagation features (1008)
+ # but we're now adding interactive masks (288)
+ prev_output["pred_masks_high_res"] = F.interpolate(
+ prev_output["pred_masks_high_res"],
+ size=(interactive_resolution, interactive_resolution),
+ mode="bilinear",
+ align_corners=False,
+ )
+
+ # Resize low_res_masks to match prev_output resolution
+ h, w = prev_output["pred_masks"].shape[-2:]
+ mask_output["low_res_masks"] = F.interpolate(
+ mask_output["low_res_masks"],
+ size=(h, w),
+ align_corners=False,
+ mode="bilinear",
+ antialias=True, # use antialias for downsampling
+ )
+
+ _append(prev_output, mask_output, "pred_masks", "low_res_masks")
+ _append(
+ prev_output,
+ mask_output,
+ "pred_masks_high_res",
+ "high_res_masks",
+ strict=False,
+ )
+ _append(prev_output, mask_output, "object_score_logits", "object_score_logits")
+ if self.use_memory_selection:
+ mask_output["ious"] = mask_output["ious"].squeeze(-1)
+ _append(prev_output, mask_output, "iou_score", "ious")
+
+ # Merge the input masks
+ if "input_masks" in prev_output:
+ prev_output["input_masks"] = torch.cat(
+ [prev_output["input_masks"], new_masks], dim=0
+ )
+
+ if self.use_obj_ptrs_in_encoder:
+ # Merge the object pointers. Note that the pointers in SAMOutput are in the data space,
+ # while those in StageOutput are in the mux space.
+ new_pointers = mask_output["obj_ptr"].to(existing_pointers.dtype)
+ combined_pointers = torch.cat([existing_pointers, new_pointers], dim=0)
+ prev_output["obj_ptr"] = multiplex_state.mux(combined_pointers)
+
+ # Step 4: Update the set of conditioning objects at this frame.
+ prev_output["conditioning_objects"].update(new_object_idx)
+
+ # Step 5: Re-encode the spatial memory if needed
+ if add_mask_to_memory:
+ assert (
+ prev_output["pred_masks_high_res"].shape[0]
+ == multiplex_state.total_valid_entries
+ )
+ # Add the new masks to the memory
+ maskmem_features, maskmem_pos_enc = self._encode_new_memory(
+ image=None,
+ current_vision_feats=propagation_vision_feats,
+ feat_sizes=propagation_feat_sizes,
+ pred_masks_high_res=prev_output["pred_masks_high_res"],
+ object_score_logits=prev_output["object_score_logits"],
+ conditioning_objects=prev_output["conditioning_objects"],
+ is_mask_from_pts=are_masks_from_pts,
+ multiplex_state=multiplex_state,
+ )
+ prev_output["maskmem_features"] = maskmem_features
+ prev_output["maskmem_pos_enc"] = maskmem_pos_enc
+ if self.save_image_features:
+ # They should already be in the state; no modification is needed
+ assert "image_features" in prev_output
+ assert "image_pos_enc" in prev_output
+
+ def recondition_masks_in_existing_state(
+ self,
+ *,
+ interactive_pix_feat: torch.Tensor,
+ interactive_high_res_features: list[torch.Tensor],
+ propagation_vision_feats: Optional[
+ list[torch.Tensor]
+ ], # needed when add_mask_to_memory=True
+ propagation_feat_sizes: Optional[
+ list[tuple[int, int]]
+ ], # needed when add_mask_to_memory=True
+ new_masks: torch.Tensor,
+ obj_idxs_in_mask: list[
+ int
+ ], # len(obj_idxs_in_mask) == new_masks.shape[0]; object idx internal to this state
+ obj_ids_in_mask: Optional[
+ list[int]
+ ], # len(obj_ids_in_mask) == new_masks.shape[0]; global object ids
+ prev_output: StageOutput, # this state will be modified in-place
+ multiplex_state: MultiplexState,
+ add_mask_to_memory: bool = True,
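+ ) -> None:
+ """
+ Recondition existing objects in an existing output/multiplex state.
+
+ This function encodes the input masks and overwrites the entries of the objects in
+ `obj_idxs_in_mask` (masks, object scores, and object pointers), then optionally
+ re-encodes the spatial memory.
+ """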
+ assert self.use_mask_input_as_output_without_sam
+ assert new_masks.shape[0] == len(obj_idxs_in_mask)
+
+ num_new_objects = new_masks.shape[0]
+
+ if obj_ids_in_mask is not None:
+ assert len(obj_ids_in_mask) == num_new_objects
+
+ if self.use_obj_ptrs_in_encoder:
+ # demux the existing pointers before we change the multiplex state
+ existing_pointers = multiplex_state.demux(prev_output["obj_ptr"])
+
+ # Step 1: Encode the incoming masks
+ mask_output = self._use_mask_as_output(
+ backbone_features=interactive_pix_feat,
+ high_res_features=interactive_high_res_features,
+ mask_inputs=new_masks,
+ multiplex_state=multiplex_state,
+ objects_in_mask=obj_idxs_in_mask,
+ )
+
+ # Step 2: Merge the existing state with new encoded features
+ # TODO: Remove this and fix the resolution mismatch
+ h, w = prev_output["pred_masks"].shape[-2:]
+ mask_output["low_res_masks"] = F.interpolate(
+ mask_output["low_res_masks"],
+ size=(h, w),
+ align_corners=False,
+ mode="bilinear",
+ antialias=True, # use antialias for downsampling
+ )
+
+ _merge(
+ prev_output, mask_output, "pred_masks", "low_res_masks", obj_idxs_in_mask
+ )
+ _merge(
+ prev_output,
+ mask_output,
+ "pred_masks_high_res",
+ "high_res_masks",
+ obj_idxs_in_mask,
+ strict=False,
+ )
+ _merge(
+ prev_output,
+ mask_output,
+ "object_score_logits",
+ "object_score_logits",
+ obj_idxs_in_mask,
+ )
+ if self.use_memory_selection:
+ mask_output["ious"] = mask_output["ious"].squeeze(-1)
+ _merge(
+ prev_output,
+ mask_output,
+ "iou_score",
+ "ious",
+ obj_idxs_in_mask,
+ )
+
+ # Merge the input masks
+ if "input_masks" in prev_output:
+ prev_output["input_masks"][obj_idxs_in_mask] = new_masks
+
+ if self.use_obj_ptrs_in_encoder:
+ # Merge the object pointers. Note that the pointers in SAMOutput are in the data space,
+ # while those in StageOutput are in the mux space.
+ new_pointers = mask_output["obj_ptr"].to(existing_pointers.dtype)
+ existing_pointers[obj_idxs_in_mask] = new_pointers
+ prev_output["obj_ptr"] = multiplex_state.mux(existing_pointers)
+
+ # Step 3: Update the set of conditioning objects at this frame
+ prev_output["conditioning_objects"].update(obj_idxs_in_mask)
+
+ # Step 4: Re-encode the spatial memory if needed
+ if add_mask_to_memory:
+ assert (
+ prev_output["pred_masks_high_res"].shape[0]
+ == multiplex_state.total_valid_entries
+ )
+ # Add the new masks to the memory
+ maskmem_features, maskmem_pos_enc = self._encode_new_memory(
+ image=None,
+ current_vision_feats=propagation_vision_feats,
+ feat_sizes=propagation_feat_sizes,
+ pred_masks_high_res=prev_output["pred_masks_high_res"],
+ object_score_logits=prev_output["object_score_logits"],
+ conditioning_objects=prev_output["conditioning_objects"],
+ is_mask_from_pts=False,
+ multiplex_state=multiplex_state,
+ )
+ prev_output["maskmem_features"] = maskmem_features
+ prev_output["maskmem_pos_enc"] = maskmem_pos_enc
+ if self.save_image_features:
+ # They should already be in the state; no modification is needed
+ assert "image_features" in prev_output
+ assert "image_pos_enc" in prev_output
+
+ def track_step(
+ self,
+ *,
+ frame_idx,
+ is_init_cond_frame,
+ backbone_features_interactive,
+ backbone_features_propagation,
+ image,
+ point_inputs,
+ mask_inputs,
+ gt_masks,
+ frames_to_add_correction_pt,
+ output_dict,
+ num_frames,
+ track_in_reverse=False, # tracking in reverse time order (for demo usage)
+ # Whether to run the memory encoder on the predicted masks. Sometimes we might want
+ # to skip the memory encoder with `run_mem_encoder=False`. For example,
+ # in demo we might call `track_step` multiple times for each user click,
+ # and only encode the memory when the user finalizes their clicks. And in ablation
+ # settings like SAM training on static images, we don't need the memory encoder.
+ run_mem_encoder=True,
+ # The previously predicted SAM mask logits (which can be fed together with new clicks in demo).
+ prev_sam_mask_logits=None,
+ multiplex_state: MultiplexState,
+ # The list of object IDs that point_inputs correspond to; only this set of objects will
+ # be interacted with in the correction stage
+ objects_to_interact: Optional[list[int]] = None,
+ # The following parameters are specific to the dynamic multiplexing model
+ new_object_masks: Optional[torch.Tensor] = None,
+ new_object_idxs: Optional[list[int]] = None,
+ new_object_ids: Optional[list[int]] = None,
+ are_new_masks_from_pts: bool = False,
+ ) -> StageOutput:
+ # First, run _track_step_aux, which performs propagation, interaction, and correction.
+ current_out, aux_out = self._track_step_aux(
+ frame_idx=frame_idx,
+ is_init_cond_frame=is_init_cond_frame,
+ backbone_features_interactive=backbone_features_interactive,
+ backbone_features_propagation=backbone_features_propagation,
+ image=image,
+ point_inputs=point_inputs,
+ mask_inputs=mask_inputs,
+ gt_masks=gt_masks,
+ frames_to_add_correction_pt=frames_to_add_correction_pt,
+ output_dict=output_dict,
+ num_frames=num_frames,
+ track_in_reverse=track_in_reverse,
+ run_mem_encoder=(run_mem_encoder and new_object_masks is None),
+ prev_sam_mask_logits=prev_sam_mask_logits,
+ multiplex_state=multiplex_state,
+ objects_to_interact=objects_to_interact,
+ need_aux_output=(new_object_masks is not None),
+ )
+
+ # If new masks are provided, merge them into the existing state
+ if new_object_masks is not None:
+ assert new_object_idxs is not None
+ self.add_new_masks_to_existing_state(
+ interactive_pix_feat=aux_out["interactive_pix_feat"],
+ interactive_high_res_features=aux_out["interactive_high_res_features"],
+ propagation_vision_feats=aux_out["propagation_vision_feats"],
+ propagation_feat_sizes=aux_out["propagation_feat_sizes"],
+ new_masks=new_object_masks,
+ obj_idxs_in_mask=new_object_idxs,
+ obj_ids_in_mask=new_object_ids,
+ prev_output=current_out,
+ multiplex_state=multiplex_state,
+ add_mask_to_memory=run_mem_encoder,
+ are_masks_from_pts=are_new_masks_from_pts,
+ )
+
+ # Lastly, trim the output and memory
+ current_out = self._trim_output_and_memory(
+ frame_idx=frame_idx,
+ output_dict=output_dict,
+ current_out=current_out,
+ memory_encoder_was_used=run_mem_encoder,
+ )
+
+ return current_out
+
+ def forward_tracking(
+ self,
+ backbone_out,
+ input,
+ return_dict=False,
+ objects_to_interact: Optional[list[int]] = None,
+ ):
+ """Forward video tracking on each frame (and sample correction clicks)."""
+ img_feats_already_computed = (
+ "interactive" in backbone_out or "sam2_backbone_out" in backbone_out
+ )
+ if img_feats_already_computed:
+ # Prepare the backbone features
+ # - vision_feats and vision_pos_embeds are in (HW)BC format
+ # - vision_masks are in B(HW) format, dtype=bool (False is valid, True is padding)
+ backbone_features = self._prepare_backbone_features(backbone_out)
+
+ # Starting the stage loop
+ num_frames = backbone_out["num_frames"]
+ init_cond_frames = backbone_out["init_cond_frames"]
+ frames_to_add_correction_pt = backbone_out["frames_to_add_correction_pt"]
+ # First process all the initial conditioning frames to encode them as memory,
+ # and then condition on them to track the remaining frames
+ processing_order = init_cond_frames + backbone_out["frames_not_in_init_cond"]
+
+ new_idx_per_transition = backbone_out["new_idx_per_transition"]
+ valid_objects_prior_to_each_transition = backbone_out[
+ "valid_objects_prior_to_each_transition"
+ ]
+ transition_points = backbone_out["transition_points"]
+
+ cond_frame_outputs: dict[int, StageOutput] = {}
+ non_cond_frame_outputs: dict[int, StageOutput] = {}
+ output_dict = {
+ "cond_frame_outputs": cond_frame_outputs,
+ "non_cond_frame_outputs": non_cond_frame_outputs,
+ }
+ multiplex_state = self.multiplex_controller.get_state(
+ backbone_out["gt_masks_per_frame"][processing_order[0]].shape[0],
+ device=backbone_out["gt_masks_per_frame"][processing_order[0]].device,
+ dtype=torch.float,
+ random=self.training,
+ )
+
+ for stage_id in processing_order:
+ # Get the image features for the current frame
+ img_ids = input.find_inputs[stage_id].img_ids
+ # The image ids are for the entire batch
+ assert (img_ids == img_ids[0]).all() # should be all the same
+ # force this to have a batch size of 1
+ img_ids = torch.tensor(
+ [img_ids[0]], device=img_ids.device, dtype=img_ids.dtype
+ )
+
+ if img_feats_already_computed:
+ # Retrieve image features according to img_ids (if they are already computed).
+ current_image = input.img_batch.tensors[img_ids]
+ current_backbone_features = {}
+ for neck_k, neck_out in backbone_features.items():
+ current_backbone_features[neck_k] = {
+ "vision_feats": [
+ x[:, img_ids] for x in neck_out["vision_feats"]
+ ],
+ "vision_masks": [
+ x[img_ids] if x is not None else None
+ for x in neck_out["vision_masks"]
+ ],
+ "vision_pos_embeds": [
+ x[:, img_ids] for x in neck_out["vision_pos_embeds"]
+ ],
+ "feat_sizes": neck_out["feat_sizes"],
+ }
+ else:
+ # Otherwise, compute the image features on the fly for the given img_ids
+ # (this might be used for evaluation on long videos to avoid backbone OOM).
+ need_interactive_out = (
+ (stage_id in frames_to_add_correction_pt)
+ or (stage_id in init_cond_frames)
+ or (stage_id in transition_points)
+ )
+ (current_image, current_backbone_features) = (
+ self._prepare_backbone_features_per_frame(
+ input.img_batch,
+ img_ids,
+ need_interactive_out=need_interactive_out,
+ need_propagation_out=True,
+ )
+ )
+
+ gt_masks = backbone_out["gt_masks_per_frame"].get(stage_id, None)
+ if stage_id in transition_points:
+ assert gt_masks is not None
+
+ # Figure out new object masks / idxs
+ new_object_idxs = new_idx_per_transition[stage_id]
+ # Get the new object masks, ensure correct ordering
+ assert sorted(new_object_idxs) == new_object_idxs
+ assert new_object_idxs[0] == len(
+ valid_objects_prior_to_each_transition[stage_id]
+ ), (
+ f"{new_object_idxs=}; {gt_masks.shape=}; {valid_objects_prior_to_each_transition[stage_id]=}"
+ )
+ assert new_object_idxs[-1] == (len(gt_masks) - 1), (
+ f"{new_object_idxs=}; {gt_masks.shape=}"
+ )
+ new_object_masks = gt_masks[new_object_idxs]
+
+ # Remove the new objects from the gt masks
+ gt_masks = gt_masks[: new_object_idxs[0]]
+ else:
+ new_object_masks = None
+ new_object_idxs = None
+
+ # Get output masks based on this frame's prompts and previous memory
+ current_out = self.track_step(
+ frame_idx=stage_id,
+ is_init_cond_frame=stage_id in init_cond_frames,
+ backbone_features_interactive=current_backbone_features.get(
+ "interactive"
+ ),
+ backbone_features_propagation=current_backbone_features.get(
+ "sam2_backbone_out"
+ ),
+ image=current_image,
+ point_inputs=backbone_out["point_inputs_per_frame"].get(stage_id, None),
+ mask_inputs=backbone_out["mask_inputs_per_frame"].get(stage_id, None),
+ gt_masks=gt_masks,
+ frames_to_add_correction_pt=frames_to_add_correction_pt,
+ output_dict=output_dict,
+ num_frames=num_frames,
+ multiplex_state=multiplex_state,
+ objects_to_interact=objects_to_interact,
+ new_object_masks=new_object_masks,
+ new_object_idxs=new_object_idxs,
+ )
+ # Append the output, depending on whether it's a conditioning frame
+ add_output_as_cond_frame = (
+ stage_id in init_cond_frames
+ or (
+ self.add_all_frames_to_correct_as_cond
+ and stage_id in frames_to_add_correction_pt
+ )
+ or (
+ self.add_all_transition_frames_as_cond
+ and stage_id in transition_points
+ )
+ )
+
+ if add_output_as_cond_frame:
+ output_dict["cond_frame_outputs"][stage_id] = current_out
+ else:
+ output_dict["non_cond_frame_outputs"][stage_id] = current_out
+
+ output_dict["multiplex_state"] = multiplex_state
+
+ if return_dict:
+ return output_dict
+ # turn `output_dict` into a list for loss function
+ all_frame_outputs = {}
+ all_frame_outputs.update(output_dict["cond_frame_outputs"])
+ all_frame_outputs.update(output_dict["non_cond_frame_outputs"])
+ if self.is_dynamic_vos_evaluation:
+ all_frame_outputs = [all_frame_outputs.get(t) for t in range(num_frames)]
+ else:
+ all_frame_outputs = [all_frame_outputs[t] for t in range(num_frames)]
+ # Make DDP happy with activation checkpointing by removing unused keys
+ all_frame_outputs = [
+ {k: v for k, v in d.items() if k != "obj_ptr"} if d is not None else None
+ for d in all_frame_outputs
+ ]
+
+ if self.is_dynamic_vos_evaluation:
+ object_appearance_order = backbone_out["object_appearance_order"]
+ num_objects = len(input.find_metadatas[0].coco_image_id)
+
+ # since we have remapped the object appearance order, we need to map it back here
+ inverse_object_appearance_order = [None for _ in object_appearance_order]
+ for idx, obj_id in enumerate(object_appearance_order):
+ inverse_object_appearance_order[obj_id] = idx
+ assert all(i is not None for i in inverse_object_appearance_order)
+
+ # this handles a rare case where the dataloader thinks that there is an object
+ # (i.e., it is listed in input.find_metadatas[0].coco_image_id)
+ # but it is not visible in any of the frames.
+ # I suspect this is due to mask resizing (the object is so small that it got lost),
+ # but I am not 100% sure; I haven't investigated yet.
+ # This only happens if we evaluate on the new (fully annotated) YouTubeVOS set.
+ if len(inverse_object_appearance_order) < num_objects:
+ inverse_object_appearance_order.extend(
+ list(range(len(inverse_object_appearance_order), num_objects))
+ )
+
+ # we need to pad the outputs with zeros (for the frames before the object appears)
+ last_mask = all_frame_outputs[-1]["pred_masks"]
+
+ shape = last_mask.shape[1:]
+ dtype = last_mask.dtype
+ device = last_mask.device
+ for stage_i, frame_out in enumerate(all_frame_outputs):
+ if frame_out is None:
+ all_frame_outputs[stage_i] = {
+ "pred_masks": torch.zeros(
+ (num_objects, *shape), device=device, dtype=dtype
+ )
+ }
+ continue
+
+ pred_mask = frame_out["pred_masks"]
+ if pred_mask.shape[0] < num_objects:
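+ # this frame's mask might have a different shape (e.g., an input mask);
+ # use a local name so that `shape` is not overwritten for later empty frames
+ pad_shape = pred_mask.shape[1:]
+ frame_out["pred_masks"] = torch.cat(
+ [
+ pred_mask,
+ torch.zeros(
+ (num_objects - pred_mask.shape[0], *pad_shape),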
+ device=device,
+ dtype=dtype,
+ ),
+ ],
+ dim=0,
+ )[inverse_object_appearance_order]
+
+ return all_frame_outputs
diff --git a/sam3/model/video_tracking_multiplex_demo.py b/sam3/model/video_tracking_multiplex_demo.py
new file mode 100644
index 0000000..067907c
--- /dev/null
+++ b/sam3/model/video_tracking_multiplex_demo.py
@@ -0,0 +1,3476 @@
+import logging
+from collections import OrderedDict
+from copy import deepcopy
+from typing import Iterable, Optional
+
+import numpy as np
+import torch
+from sam3.model.data_misc import NestedTensor
+from sam3.model.io_utils import load_video_frames
+from sam3.model.multiplex_utils import MultiplexState
+from sam3.model.sam3_tracker_utils import fill_holes_in_mask_scores
+from sam3.model.video_tracking_multiplex import (
+ concat_points,
+ NO_OBJ_SCORE,
+ VideoTrackingDynamicMultiplex,
+)
+from tqdm import tqdm
+
+
+class VideoTrackingMultiplexDemo(VideoTrackingDynamicMultiplex):
+ """
+ The demo class that extends the `VideoTrackingDynamicMultiplex` to handle user interactions
+ and manage inference states, with support for multi-object tracking.
+
+ Interactions are not yet implemented.
+ """
+
+ def __init__(
+ self,
+ # whether to clear non-conditioning memory of the surrounding frames (which may contain outdated information) after adding correction clicks;
+ # note that this only applies to *single-object tracking* unless `clear_non_cond_mem_for_multi_obj` is also set to True
+ clear_non_cond_mem_around_input=False,
+ # whether to also clear non-conditioning memory of the surrounding frames in multi-object tracking (only effective when `clear_non_cond_mem_around_input` is True).
+ clear_non_cond_mem_for_multi_obj=False,
+ # if fill_hole_area > 0, we fill small holes in the final masks up to this area (after resizing them to the original video resolution)
+ fill_hole_area=0,
+ # if always_start_from_first_ann_frame is True, we always start tracking from the frame where we receive the first annotation (clicks or mask)
+ # and ignore the `start_frame_idx` passed to `propagate_in_video`
+ always_start_from_first_ann_frame=False,
+ # the maximum number of points to be used in the prompt encoder, which reduces the domain gap with training (which only uses 8 points)
+ # - if it's set to a positive integer, we only take the first `max_point_num_in_prompt_enc//2` points and
+ # the last `(max_point_num_in_prompt_enc - max_point_num_in_prompt_enc//2)` points in the prompt encoder
+ # - if it's set to 0 or negative, this option is turned off and we use all points in the prompt encoder
+ max_point_num_in_prompt_enc=16,
+ non_overlap_masks_for_output=True,
+ **kwargs,
+ ):
+ super().__init__(**kwargs)
+
+ self.clear_non_cond_mem_around_input = clear_non_cond_mem_around_input
+ self.clear_non_cond_mem_for_multi_obj = clear_non_cond_mem_for_multi_obj
+ self.fill_hole_area = fill_hole_area
+ self.always_start_from_first_ann_frame = always_start_from_first_ann_frame
+ self.max_point_num_in_prompt_enc = max_point_num_in_prompt_enc
+ self.non_overlap_masks_for_output = non_overlap_masks_for_output
+
+ @torch.inference_mode()
+ def init_state(
+ self,
+ video_path,
+ offload_video_to_cpu,
+ offload_state_to_cpu,
+ async_loading_frames=False,
+ use_torchcodec=False,
+ use_cv2=False,
+ ):
+ """Initialize an inference state."""
+ # Make sure that sigmoid is used on mask logits (should be True for all our recent models).
+ # Since we rely on large negative values as scores for missing objects, the raw logits
+ # cannot be consumed directly and must be converted into 0~1 range via sigmoid first.
+ if not self.apply_sigmoid_to_mask_logits_for_mem_enc:
+ raise NotImplementedError(
+ "Multi-object tracking requires sigmoid in memory encoder for non-overlapping constraints."
+ )
+
+ images, video_height, video_width = load_video_frames(
+ video_path=video_path,
+ image_size=self.image_size,
+ offload_video_to_cpu=offload_video_to_cpu,
+ async_loading_frames=async_loading_frames,
+ use_torchcodec=use_torchcodec,
+ use_cv2=use_cv2,
+ )
+ inference_state = {}
+ inference_state["images"] = images
+ inference_state["num_frames"] = len(images)
+ # whether to offload the video frames to CPU memory
+ # turning on this option saves the GPU memory with only a very small overhead
+ inference_state["offload_video_to_cpu"] = offload_video_to_cpu
+ # whether to offload the inference state to CPU memory
+ # turning on this option saves the GPU memory at the cost of a lower tracking fps
+ # (e.g. in a test case of 768x768 model, fps dropped from 27 to 24 when tracking one object
+ # and from 24 to 21 when tracking two objects)
+ inference_state["offload_state_to_cpu"] = offload_state_to_cpu
+ # the original video height and width, used for resizing final output scores
+ inference_state["video_height"] = video_height
+ inference_state["video_width"] = video_width
+ inference_state["device"] = torch.device("cuda")
+ if offload_state_to_cpu:
+ inference_state["storage_device"] = torch.device("cpu")
+ else:
+ inference_state["storage_device"] = torch.device("cuda")
+ # inputs on each frame
+ inference_state["point_inputs_per_obj"] = {}
+ inference_state["mask_inputs_per_obj"] = {}
+ # visual features on a small number of recently visited frames for quick interactions
+ inference_state["cached_features"] = {}
+ # values that don't change across frames (so we only need to hold one copy of them)
+ inference_state["constants"] = {}
+ # mapping between client-side object id and model-side object index
+ inference_state["obj_id_to_idx"] = OrderedDict()
+ inference_state["obj_idx_to_id"] = OrderedDict()
+ inference_state["obj_ids"] = []
+ # A storage to hold the model's tracking results and states on each frame
+ inference_state["output_dict"] = {
+ "cond_frame_outputs": {}, # dict containing {frame_idx: }
+ "non_cond_frame_outputs": {}, # dict containing {frame_idx: }
+ }
+ # The index of the frame that received the first annotation
+ inference_state["first_ann_frame_idx"] = None
+ # Slice (view) of each object's tracking results, sharing the same memory with "output_dict"
+ inference_state["output_dict_per_obj"] = {}
+ # A temporary storage to hold new outputs when the user interacts with a frame
+ # to add clicks or masks (it's merged into "output_dict" before propagation starts)
+ inference_state["temp_output_dict_per_obj"] = {}
+ # Frames that already hold consolidated outputs from click or mask inputs
+ # (we directly use their consolidated outputs during tracking)
+ inference_state["consolidated_frame_inds"] = {
+ "cond_frame_outputs": set(), # set containing frame indices
+ "non_cond_frame_outputs": set(), # set containing frame indices
+ }
+ # metadata for each tracking frame (e.g. which direction it's tracked)
+ inference_state["tracking_has_started"] = False
+ inference_state["frames_already_tracked"] = {}
+ inference_state["multiplex_state"] = None
+ # Track which frames have been refined by user interaction (per object)
+ # This is used to distinguish first refinement (fresh) vs subsequent refinements (incremental)
+ inference_state["user_refined_frames_per_obj"] = {}
+ # # Warm up the whole model and cache the image feature on frame 0
+ # # by making a dummy click on the first frame (and then cleaning it up)
+ # self.add_new_points(
+ # inference_state=inference_state,
+ # frame_idx=0,
+ # obj_id=1,
+ # points=torch.tensor([[0.5, 0.5]], dtype=torch.float32),
+ # labels=torch.tensor([1], dtype=torch.int32),
+ # clear_old_points=True,
+ # rel_coordinates=True,
+ # )
+ # self.clear_all_points_in_video(inference_state)
+ return inference_state
+
+ def _obj_id_to_idx(self, inference_state, obj_id, error_if_new=False):
+ """Map client-side object id to model-side object index."""
+ obj_idx = inference_state["obj_id_to_idx"].get(obj_id, None)
+ if obj_idx is not None:
+ return obj_idx
+
+ if (
+ self.is_dynamic_model or not inference_state["tracking_has_started"]
+ ) and not error_if_new:
+ # get the next object slot
+ obj_idx = len(inference_state["obj_id_to_idx"])
+ inference_state["obj_id_to_idx"][obj_id] = obj_idx
+ inference_state["obj_idx_to_id"][obj_idx] = obj_id
+ inference_state["obj_ids"] = list(inference_state["obj_id_to_idx"])
+ # set up input and output structures for this object
+ inference_state["point_inputs_per_obj"][obj_idx] = {}
+ inference_state["mask_inputs_per_obj"][obj_idx] = {}
+ inference_state["output_dict_per_obj"][obj_idx] = {
+ "cond_frame_outputs": {}, # dict containing {frame_idx: }
+ "non_cond_frame_outputs": {}, # dict containing {frame_idx: }
+ }
+ inference_state["temp_output_dict_per_obj"][obj_idx] = {
+ "cond_frame_outputs": {}, # dict containing {frame_idx: }
+ "non_cond_frame_outputs": {}, # dict containing {frame_idx: }
+ }
+ return obj_idx
+ else:
+ raise RuntimeError(
+ f"Cannot add new object id {obj_id}. "
+ f"All existing object ids: {inference_state['obj_ids']}."
+ )
+
+ def _obj_idx_to_id(self, inference_state, obj_idx):
+ """Map model-side object index to client-side object id."""
+ return inference_state["obj_idx_to_id"][obj_idx]
+
+ def _get_obj_num(self, inference_state):
+ """Get the total number of unique object ids received so far in this session."""
+ # return len(inference_state["obj_idx_to_id"])
+ return inference_state["multiplex_state"].total_valid_entries
+
+ @torch.inference_mode()
+ def _extract_object_for_interaction(self, inference_state, obj_id, frame_idx):
+ """
+ Extract a single object from multiplex state for singleton interaction.
+ Adapted from sam3_multiplex_tracking._extract_object_to_singleton_state()
+
+ Returns:
+ singleton_state: New inference state containing only this object
+ obj_idx_in_source: Original object index before removal (for merging back)
+ """
+ source_state = inference_state
+ obj_idx_in_source = source_state["obj_id_to_idx"][obj_id]
+
+ # Step 1: Extract all object data BEFORE removing it
+ multiplex_state = source_state.get("multiplex_state")
+
+ # Extract consolidated outputs (slice NOW before remove_object modifies tensors)
+ singleton_consolidated_outputs = {
+ "cond_frame_outputs": {},
+ "non_cond_frame_outputs": {},
+ }
+
+ if "output_dict" in source_state:
+ for storage_key in ["cond_frame_outputs", "non_cond_frame_outputs"]:
+ source_outputs = source_state["output_dict"].get(storage_key, {})
+
+ for f_idx, source_frame_out in source_outputs.items():
+ # Check if this frame has valid data for this object
+ has_valid_data = (
+ source_frame_out["pred_masks"].shape[0] >= obj_idx_in_source + 1
+ )
+
+ if has_valid_data:
+ # Create singleton frame output by slicing
+ singleton_frame_out = {
+ "pred_masks": source_frame_out["pred_masks"][
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone(),
+ "object_score_logits": source_frame_out[
+ "object_score_logits"
+ ][obj_idx_in_source : obj_idx_in_source + 1].clone(),
+ # image_features and image_pos_enc remain shared (not in multiplex space)
+ "image_features": source_frame_out.get("image_features"),
+ "image_pos_enc": source_frame_out.get("image_pos_enc"),
+ "local_obj_id_to_idx": {obj_id: 0},
+ }
+
+ # Handle maskmem_features by converting from multiplex space to data space
+ maskmem_features = source_frame_out.get("maskmem_features")
+ if maskmem_features is not None:
+ if multiplex_state is not None:
+ expected_buckets = multiplex_state.num_buckets
+ expected_multiplex = multiplex_state.multiplex_count
+ if (
+ maskmem_features.dim() >= 2
+ and maskmem_features.shape[0] == expected_buckets
+ and maskmem_features.shape[1] == expected_multiplex
+ ):
+ try:
+ demuxed_features = multiplex_state.demux(
+ maskmem_features
+ )
+ except AssertionError as exc:
+ logging.warning(
+ "[EXTRACT] demux failed for maskmem_features shape %s: %s",
+ tuple(maskmem_features.shape),
+ exc,
+ )
+ demuxed_features = None
+ if demuxed_features is not None:
+ maskmem_features = demuxed_features[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ else:
+ maskmem_features = maskmem_features[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ elif maskmem_features.shape[0] == 0:
+ # No entries for this object yet; treat as missing without warning
+ maskmem_features = None
+ elif maskmem_features.shape[0] >= obj_idx_in_source + 1:
+ # Already in data space; slice directly
+ maskmem_features = maskmem_features[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ else:
+ logging.warning(
+ "[EXTRACT] maskmem_features shape %s incompatible with multiplex state; dropping tensor",
+ tuple(maskmem_features.shape),
+ )
+ maskmem_features = None
+ else:
+ maskmem_features = maskmem_features[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ singleton_frame_out["maskmem_features"] = maskmem_features
+
+ # Handle maskmem_pos_enc similarly, level by level
+ maskmem_pos_enc = source_frame_out.get("maskmem_pos_enc")
+ if maskmem_pos_enc is not None:
+ remapped_pos_enc = []
+ for level_enc in maskmem_pos_enc:
+ if level_enc is None:
+ remapped_pos_enc.append(None)
+ continue
+ if multiplex_state is not None:
+ expected_buckets = multiplex_state.num_buckets
+ expected_multiplex = multiplex_state.multiplex_count
+ if (
+ level_enc.dim() >= 2
+ and level_enc.shape[0] == expected_buckets
+ and level_enc.shape[1] == expected_multiplex
+ ):
+ try:
+ demuxed_level = multiplex_state.demux(
+ level_enc
+ )
+ except AssertionError as exc:
+ logging.warning(
+ "[EXTRACT] demux failed for maskmem_pos_enc level shape %s: %s",
+ tuple(level_enc.shape),
+ exc,
+ )
+ demuxed_level = None
+ if demuxed_level is not None:
+ remapped_pos_enc.append(
+ demuxed_level[
+ obj_idx_in_source : obj_idx_in_source
+ + 1
+ ].clone()
+ )
+ elif (
+ level_enc.shape[0] >= obj_idx_in_source + 1
+ ):
+ remapped_pos_enc.append(
+ level_enc[
+ obj_idx_in_source : obj_idx_in_source
+ + 1
+ ].clone()
+ )
+ else:
+ logging.warning(
+ "[EXTRACT] maskmem_pos_enc level shape %s incompatible with multiplex state; dropping level",
+ tuple(level_enc.shape),
+ )
+ remapped_pos_enc.append(None)
+ elif level_enc.shape[0] >= obj_idx_in_source + 1:
+ remapped_pos_enc.append(
+ level_enc[
+ obj_idx_in_source : obj_idx_in_source
+ + 1
+ ].clone()
+ )
+ else:
+ logging.warning(
+ "[EXTRACT] maskmem_pos_enc level shape %s incompatible with multiplex state; dropping level",
+ tuple(level_enc.shape),
+ )
+ remapped_pos_enc.append(None)
+ else:
+ remapped_pos_enc.append(
+ level_enc[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ )
+ maskmem_pos_enc = remapped_pos_enc
+ singleton_frame_out["maskmem_pos_enc"] = maskmem_pos_enc
+
+ # Handle obj_ptr (must demux from multiplex space first)
+ if (
+ "obj_ptr" in source_frame_out
+ and self.use_obj_ptrs_in_encoder
+ ):
+ source_obj_ptr = source_frame_out["obj_ptr"]
+ if multiplex_state is not None:
+ # Demux: multiplex space → data space
+ obj_ptr_data_space = multiplex_state.demux(
+ source_obj_ptr
+ )
+ # Slice for this object
+ singleton_frame_out["obj_ptr"] = obj_ptr_data_space[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+ else:
+ singleton_frame_out["obj_ptr"] = source_obj_ptr[
+ obj_idx_in_source : obj_idx_in_source + 1
+ ].clone()
+
+ # Convert conditioning_objects
+ if "conditioning_objects" in source_frame_out:
+ if (
+ obj_idx_in_source
+ in source_frame_out["conditioning_objects"]
+ ):
+ singleton_frame_out["conditioning_objects"] = {0}
+ else:
+ singleton_frame_out["conditioning_objects"] = set()
+
+ singleton_consolidated_outputs[storage_key][f_idx] = (
+ singleton_frame_out
+ )
+
+ # Extract point and mask inputs
+ extracted_point_inputs = {}
+ extracted_mask_inputs = {}
+
+ if (
+ "point_inputs_per_obj" in source_state
+ and obj_idx_in_source in source_state["point_inputs_per_obj"]
+ ):
+ extracted_point_inputs = source_state["point_inputs_per_obj"][
+ obj_idx_in_source
+ ].copy()
+
+ if (
+ "mask_inputs_per_obj" in source_state
+ and obj_idx_in_source in source_state["mask_inputs_per_obj"]
+ ):
+ extracted_mask_inputs = source_state["mask_inputs_per_obj"][
+ obj_idx_in_source
+ ].copy()
+
+ # Extract per-object outputs
+ extracted_obj_cond_outputs = {}
+ extracted_obj_non_cond_outputs = {}
+ extracted_temp_cond_outputs = {}
+ extracted_temp_non_cond_outputs = {}
+
+ if (
+ "output_dict_per_obj" in source_state
+ and obj_idx_in_source in source_state["output_dict_per_obj"]
+ ):
+ obj_output_dict = source_state["output_dict_per_obj"][obj_idx_in_source]
+ extracted_obj_cond_outputs = obj_output_dict.get(
+ "cond_frame_outputs", {}
+ ).copy()
+ extracted_obj_non_cond_outputs = obj_output_dict.get(
+ "non_cond_frame_outputs", {}
+ ).copy()
+
+ if (
+ "temp_output_dict_per_obj" in source_state
+ and obj_idx_in_source in source_state["temp_output_dict_per_obj"]
+ ):
+ temp_obj_output_dict = source_state["temp_output_dict_per_obj"][
+ obj_idx_in_source
+ ]
+ extracted_temp_cond_outputs = temp_obj_output_dict.get(
+ "cond_frame_outputs", {}
+ ).copy()
+ extracted_temp_non_cond_outputs = temp_obj_output_dict.get(
+ "non_cond_frame_outputs", {}
+ ).copy()
+
+ # Step 2: Remove the object from source state
+ remaining_obj_ids, _ = self.remove_object(
+ source_state,
+ obj_id,
+ strict=False,
+ need_output=False,
+ clear_user_refined_map=False,
+ )
+
+ # If multiplex state became empty, reset it so downstream code can reinitialize
+ updated_multiplex_state = source_state.get("multiplex_state")
+ if updated_multiplex_state is not None:
+ if (
+ getattr(updated_multiplex_state, "assignments", None) is None
+ or updated_multiplex_state.total_valid_entries == 0
+ ):
+ source_state["multiplex_state"] = None
+
+ # Step 3: Create new singleton inference state
+ singleton_state = self.init_state(
+ cached_features=source_state["cached_features"],
+ video_height=source_state["video_height"],
+ video_width=source_state["video_width"],
+ num_frames=source_state["num_frames"],
+ )
+
+ # Step 4: Set up singleton state structure
+ singleton_state["obj_id_to_idx"] = {obj_id: 0}
+ singleton_state["obj_idx_to_id"] = {0: obj_id}
+ singleton_state["obj_ids"] = [obj_id]
+ singleton_state["point_inputs_per_obj"] = {0: extracted_point_inputs}
+ singleton_state["mask_inputs_per_obj"] = {0: extracted_mask_inputs}
+ singleton_state["output_dict_per_obj"] = {
+ 0: {
+ "cond_frame_outputs": extracted_obj_cond_outputs,
+ "non_cond_frame_outputs": extracted_obj_non_cond_outputs,
+ }
+ }
+ singleton_state["temp_output_dict_per_obj"] = {
+ 0: {
+ "cond_frame_outputs": extracted_temp_cond_outputs,
+ "non_cond_frame_outputs": extracted_temp_non_cond_outputs,
+ }
+ }
+ singleton_state["frames_already_tracked"] = source_state[
+ "frames_already_tracked"
+ ].copy()
+
+ # Step 5: Create new singleton multiplex state (even for 1 object, needed for obj_ptr)
+ new_multiplex_state = self.multiplex_controller.get_state(
+ num_valid_entries=1,
+ device=source_state["device"],
+ dtype=torch.float32,
+ random=False,
+ object_ids=[obj_id],
+ )
+ singleton_state["multiplex_state"] = new_multiplex_state
+
+ # Step 6: Prepare extracted tensors for the singleton state (mask memory stays in data space; obj_ptr is muxed into the singleton multiplex space)
+ for storage_key in ["cond_frame_outputs", "non_cond_frame_outputs"]:
+ for f_idx, frame_out in singleton_consolidated_outputs[storage_key].items():
+ # mask memory features
+ if frame_out.get("maskmem_features") is not None:
+ # Keep mask memory features in data space (num_objects, C, H, W)
+ frame_out["maskmem_features"] = frame_out[
+ "maskmem_features"
+ ].clone()
+
+ if frame_out.get("maskmem_pos_enc") is not None:
+ remapped_levels = []
+ for level_enc in frame_out["maskmem_pos_enc"]:
+ if level_enc is None:
+ remapped_levels.append(None)
+ continue
+ remapped_levels.append(level_enc.clone())
+ frame_out["maskmem_pos_enc"] = remapped_levels
+
+ # object pointers
+ if "obj_ptr" in frame_out and self.use_obj_ptrs_in_encoder:
+ # Mux: data space [1, D] → singleton multiplex space [1, 1, D]
+ frame_out["obj_ptr"] = new_multiplex_state.mux(frame_out["obj_ptr"])
+
+ singleton_state["output_dict"] = singleton_consolidated_outputs
+
+ return singleton_state, obj_idx_in_source
+
+ @torch.inference_mode()
+ def _merge_singleton_interaction_result(
+ self,
+ inference_state,
+ singleton_state,
+ obj_id,
+ original_obj_idx,
+ ):
+ """
+ Merge singleton interaction result back into multiplex state.
+
+ SIMPLIFIED APPROACH: Add object back at the END (new index), not at original position.
+ This avoids complex index shifting and works with multiplex controller's add_objects() API.
+
+ Args:
+ inference_state: The main multiplex inference state
+ singleton_state: The singleton state with interaction results
+ obj_id: The object ID
+ original_obj_idx: The original index before extraction (unused - we add at end instead)
+ """
+ # Determine new index (add at end)
+ new_obj_idx = len(inference_state["obj_ids"])
+
+ # Step 1: Add object mappings at new index
+ inference_state["obj_ids"].append(obj_id)
+ inference_state["obj_id_to_idx"][obj_id] = new_obj_idx
+
+ # Create entry in output_dict_per_obj and temp_output_dict_per_obj for new index
+ # These are DICTIONARIES indexed by obj_idx, not lists!
+ inference_state["output_dict_per_obj"][new_obj_idx] = {
+ "cond_frame_outputs": {},
+ "non_cond_frame_outputs": {},
+ }
+ inference_state["temp_output_dict_per_obj"][new_obj_idx] = {
+ "cond_frame_outputs": {},
+ "non_cond_frame_outputs": {},
+ }
+
+ inference_state["obj_idx_to_id"][new_obj_idx] = obj_id
+
+ # Step 2: Add object to multiplex state buckets using proper API
+ multiplex_state = inference_state.get("multiplex_state")
+
+ assignments = (
+ getattr(multiplex_state, "assignments", None)
+ if multiplex_state is not None
+ else None
+ )
+ total_valid_entries = (
+ getattr(multiplex_state, "total_valid_entries", 0)
+ if multiplex_state is not None and assignments is not None
+ else 0
+ )
+ need_state_reinit = (
+ multiplex_state is None or assignments is None or total_valid_entries == 0
+ )
+
+ if not need_state_reinit and getattr(multiplex_state, "object_ids", None):
+ if obj_id in multiplex_state.object_ids:
+ old_idx = multiplex_state.object_ids.index(obj_id)
+ multiplex_state.remove_objects(object_indices=[old_idx], strict=False)
+ assignments = getattr(multiplex_state, "assignments", None)
+ total_valid_entries = (
+ getattr(multiplex_state, "total_valid_entries", 0)
+ if assignments is not None
+ else 0
+ )
+ need_state_reinit = assignments is None or total_valid_entries == 0
+
+ if need_state_reinit:
+ inference_state["multiplex_state"] = self.multiplex_controller.get_state(
+ num_valid_entries=len(inference_state["obj_ids"]),
+ device=inference_state["device"],
+ dtype=torch.float32,
+ random=False,
+ object_ids=list(inference_state["obj_ids"]),
+ )
+ multiplex_state = inference_state["multiplex_state"]
+ else:
+ # Allow new buckets since we're adding at a new index (the old bucket slot may have been removed)
+ multiplex_state.add_objects(
+ object_indices=[new_obj_idx],
+ object_ids=[obj_id],
+ allow_new_buckets=True, # May need new bucket if old slot was compacted
+ )
+
+ # Step 3: Restore point and mask inputs at new index
+ singleton_obj_idx = 0 # Object is always at index 0 in singleton state
+ if (
+ "point_inputs_per_obj" in singleton_state
+ and singleton_obj_idx in singleton_state["point_inputs_per_obj"]
+ ):
+ if "point_inputs_per_obj" not in inference_state:
+ inference_state["point_inputs_per_obj"] = {}
+ inference_state["point_inputs_per_obj"][new_obj_idx] = singleton_state[
+ "point_inputs_per_obj"
+ ][singleton_obj_idx].copy()
+
+ if (
+ "mask_inputs_per_obj" in singleton_state
+ and singleton_obj_idx in singleton_state["mask_inputs_per_obj"]
+ ):
+ if "mask_inputs_per_obj" not in inference_state:
+ inference_state["mask_inputs_per_obj"] = {}
+ inference_state["mask_inputs_per_obj"][new_obj_idx] = singleton_state[
+ "mask_inputs_per_obj"
+ ][singleton_obj_idx].copy()
+
+ # Step 4: Restore per-object outputs at new index
+ if (
+ "output_dict_per_obj" in singleton_state
+ and singleton_obj_idx in singleton_state["output_dict_per_obj"]
+ ):
+ if "output_dict_per_obj" not in inference_state:
+ inference_state["output_dict_per_obj"] = {}
+ inference_state["output_dict_per_obj"][new_obj_idx] = singleton_state[
+ "output_dict_per_obj"
+ ][singleton_obj_idx].copy()
+
+ if (
+ "temp_output_dict_per_obj" in singleton_state
+ and singleton_obj_idx in singleton_state["temp_output_dict_per_obj"]
+ ):
+ if "temp_output_dict_per_obj" not in inference_state:
+ inference_state["temp_output_dict_per_obj"] = {}
+ inference_state["temp_output_dict_per_obj"][new_obj_idx] = singleton_state[
+ "temp_output_dict_per_obj"
+ ][singleton_obj_idx].copy()
+
+ # Step 5: Merge consolidated outputs back into multiplex (append at new_obj_idx)
+ # Preserve each frame's original storage key from the singleton state so that
+ # conditioning frames remain in cond_frame_outputs after the merge.
+ if "output_dict" in singleton_state:
+ singleton_multiplex_state = singleton_state.get("multiplex_state")
+ for singleton_storage_key in [
+ "cond_frame_outputs",
+ "non_cond_frame_outputs",
+ ]:
+ singleton_outputs = singleton_state["output_dict"].get(
+ singleton_storage_key, {}
+ )
+
+ # Skip if singleton doesn't have any frames in this storage_key
+ if not singleton_outputs:
+ continue
+
+ for frame_idx, singleton_frame_out in singleton_outputs.items():
+ # Get or create frame output in main state at the EXPECTED storage_key
+ if "output_dict" not in inference_state:
+ inference_state["output_dict"] = {
+ "cond_frame_outputs": {},
+ "non_cond_frame_outputs": {},
+ }
+
+ if (
+ frame_idx
+ not in inference_state["output_dict"][singleton_storage_key]
+ ):
+ # Frame doesn't exist - create with singleton results at new_obj_idx
+ num_objs = len(inference_state["obj_ids"])
+
+ # Ensure num_objs is at least new_obj_idx + 1
+ # (in case obj_ids list is somehow inconsistent)
+ if num_objs <= new_obj_idx:
+ num_objs = new_obj_idx + 1
+
+ new_maskmem_features = None
+ new_maskmem_pos_enc = None
+ if (
+ singleton_frame_out.get("maskmem_features") is not None
+ and multiplex_state is not None
+ ):
+ # Check if singleton features are in multiplexed format and demux if needed
+ singleton_features_muxed = singleton_frame_out[
+ "maskmem_features"
+ ]
+ if singleton_features_muxed.shape[:2] == (
+ singleton_multiplex_state.num_buckets,
+ singleton_multiplex_state.multiplex_count,
+ ):
+ # Singleton features are multiplexed, need to demux
+ singleton_features_data = (
+ singleton_multiplex_state.demux(
+ singleton_features_muxed
+ )
+ )
+ else:
+ # Singleton features are in data space
+ singleton_features_data = singleton_features_muxed
+
+ feature_shape = (num_objs,) + singleton_features_data.shape[
+ 1:
+ ]
+ maskmem_features_data = torch.zeros(
+ feature_shape,
+ dtype=singleton_features_data.dtype,
+ device=singleton_features_data.device,
+ )
+ maskmem_features_data[new_obj_idx : new_obj_idx + 1] = (
+ singleton_features_data
+ )
+ # Mux using destination multiplex state
+ new_maskmem_features = multiplex_state.mux(
+ maskmem_features_data
+ )
+
+ if (
+ singleton_frame_out.get("maskmem_pos_enc") is not None
+ and multiplex_state is not None
+ ):
+ new_maskmem_pos_enc = []
+ for level_enc in singleton_frame_out["maskmem_pos_enc"]:
+ if level_enc is None:
+ new_maskmem_pos_enc.append(None)
+ continue
+ # Check if singleton pos_enc is in multiplexed format and demux if needed
+ if level_enc.shape[:2] == (
+ singleton_multiplex_state.num_buckets,
+ singleton_multiplex_state.multiplex_count,
+ ):
+ # Singleton pos_enc is multiplexed, need to demux
+ level_data = singleton_multiplex_state.demux(
+ level_enc
+ )
+ else:
+ # Singleton pos_enc is in data space
+ level_data = level_enc
+
+ level_shape = (num_objs,) + level_data.shape[1:]
+ level_tensor = torch.zeros(
+ level_shape,
+ dtype=level_data.dtype,
+ device=level_data.device,
+ )
+ level_tensor[new_obj_idx : new_obj_idx + 1] = level_data
+ # Mux using destination multiplex state to store in multiplex format
+ new_maskmem_pos_enc.append(
+ multiplex_state.mux(level_tensor)
+ )
+
+ inference_state["output_dict"][singleton_storage_key][
+ frame_idx
+ ] = {
+ "maskmem_features": new_maskmem_features,
+ "maskmem_pos_enc": new_maskmem_pos_enc,
+ "image_features": singleton_frame_out.get("image_features"),
+ "image_pos_enc": singleton_frame_out.get("image_pos_enc"),
+ "local_obj_id_to_idx": {obj_id: new_obj_idx},
+ "conditioning_objects": (
+ set([new_obj_idx])
+ if singleton_obj_idx
+ in singleton_frame_out.get(
+ "conditioning_objects", set()
+ )
+ else set()
+ ),
+ "pred_masks": torch.zeros(
+ (
+ num_objs,
+ 1,
+ singleton_frame_out["pred_masks"].shape[2],
+ singleton_frame_out["pred_masks"].shape[3],
+ ),
+ dtype=singleton_frame_out["pred_masks"].dtype,
+ device=singleton_frame_out["pred_masks"].device,
+ ),
+ "object_score_logits": torch.full(
+ (num_objs, 1),
+ NO_OBJ_SCORE,
+ dtype=singleton_frame_out["object_score_logits"].dtype,
+ device=singleton_frame_out[
+ "object_score_logits"
+ ].device,
+ ),
+ }
+ # Set singleton results at new_obj_idx
+ inference_state["output_dict"][singleton_storage_key][
+ frame_idx
+ ]["pred_masks"][
+ new_obj_idx : new_obj_idx + 1
+ ] = singleton_frame_out["pred_masks"]
+ inference_state["output_dict"][singleton_storage_key][
+ frame_idx
+ ]["object_score_logits"][
+ new_obj_idx : new_obj_idx + 1
+ ] = singleton_frame_out["object_score_logits"]
+
+ # Also copy pred_masks_video_res if it exists in singleton output
+ if "pred_masks_video_res" in singleton_frame_out:
+ inference_state["output_dict"][singleton_storage_key][
+ frame_idx
+ ]["pred_masks_video_res"] = torch.zeros(
+ (
+ num_objs,
+ 1,
+ singleton_frame_out["pred_masks_video_res"].shape[
+ 2
+ ],
+ singleton_frame_out["pred_masks_video_res"].shape[
+ 3
+ ],
+ ),
+ dtype=singleton_frame_out["pred_masks_video_res"].dtype,
+ device=singleton_frame_out[
+ "pred_masks_video_res"
+ ].device,
+ )
+ inference_state["output_dict"][singleton_storage_key][
+ frame_idx
+ ]["pred_masks_video_res"][
+ new_obj_idx : new_obj_idx + 1
+ ] = singleton_frame_out["pred_masks_video_res"]
+
+ # Handle obj_ptr if present
+ if (
+ "obj_ptr" in singleton_frame_out
+ and self.use_obj_ptrs_in_encoder
+ ):
+ singleton_obj_ptr_data = singleton_multiplex_state.demux(
+ singleton_frame_out["obj_ptr"]
+ )
+ obj_ptr_data = torch.zeros(
+ (num_objs, singleton_obj_ptr_data.shape[1]),
+ dtype=singleton_obj_ptr_data.dtype,
+ device=singleton_obj_ptr_data.device,
+ )
+ obj_ptr_data[new_obj_idx : new_obj_idx + 1] = (
+ singleton_obj_ptr_data
+ )
+ inference_state["output_dict"][singleton_storage_key][
+ frame_idx
+ ]["obj_ptr"] = multiplex_state.mux(obj_ptr_data)
+ else:
+ # Frame exists - expand tensors and add singleton results
+ main_frame_out = inference_state["output_dict"][
+ singleton_storage_key
+ ][frame_idx]
+
+ num_objs_total = len(inference_state["obj_ids"])
+
+ if (
+ singleton_frame_out.get("maskmem_features") is not None
+ and multiplex_state is not None
+ ):
+ # Check if singleton features are in multiplexed format and demux if needed
+ singleton_features_muxed = singleton_frame_out[
+ "maskmem_features"
+ ]
+ if singleton_features_muxed.shape[:2] == (
+ singleton_multiplex_state.num_buckets,
+ singleton_multiplex_state.multiplex_count,
+ ):
+ # Singleton features are multiplexed, need to demux
+ singleton_features_data = (
+ singleton_multiplex_state.demux(
+ singleton_features_muxed
+ )
+ )
+ else:
+ # Singleton features are in data space
+ singleton_features_data = singleton_features_muxed
+
+ existing_features_muxed = main_frame_out.get(
+ "maskmem_features"
+ )
+ if existing_features_muxed is not None:
+ # Check if features are in multiplex format before demuxing
+ if existing_features_muxed.shape[:2] == (
+ multiplex_state.num_buckets,
+ multiplex_state.multiplex_count,
+ ):
+ # Features are in multiplex format, demux them
+ existing_features_data = multiplex_state.demux(
+ existing_features_muxed
+ )
+ else:
+ # Features are already in data space, use directly
+ existing_features_data = existing_features_muxed
+ else:
+ existing_features_data = None
+
+ if existing_features_data is None:
+ feature_shape = (
+ num_objs_total,
+ ) + singleton_features_data.shape[1:]
+ existing_features_data = torch.zeros(
+ feature_shape,
+ dtype=singleton_features_data.dtype,
+ device=singleton_features_data.device,
+ )
+ elif existing_features_data.shape[0] < num_objs_total:
+ pad_size = (
+ num_objs_total - existing_features_data.shape[0]
+ )
+ pad = torch.zeros(
+ (pad_size,) + existing_features_data.shape[1:],
+ dtype=existing_features_data.dtype,
+ device=existing_features_data.device,
+ )
+ existing_features_data = torch.cat(
+ [existing_features_data, pad], dim=0
+ )
+
+ existing_features_data[new_obj_idx : new_obj_idx + 1] = (
+ singleton_features_data
+ )
+ main_frame_out["maskmem_features"] = multiplex_state.mux(
+ existing_features_data
+ )
+
+ if (
+ singleton_frame_out.get("maskmem_pos_enc") is not None
+ and multiplex_state is not None
+ ):
+ existing_pos_enc_list = (
+ main_frame_out.get("maskmem_pos_enc") or []
+ )
+ new_maskmem_pos_enc = []
+ max_levels = max(
+ len(singleton_frame_out["maskmem_pos_enc"]),
+ len(existing_pos_enc_list),
+ )
+ for level_idx in range(max_levels):
+ singleton_level_muxed = (
+ singleton_frame_out["maskmem_pos_enc"][level_idx]
+ if level_idx
+ < len(singleton_frame_out["maskmem_pos_enc"])
+ else None
+ )
+ existing_level_muxed = (
+ existing_pos_enc_list[level_idx]
+ if level_idx < len(existing_pos_enc_list)
+ else None
+ )
+
+ if singleton_level_muxed is None:
+ # Keep existing entry (which may also be None)
+ new_maskmem_pos_enc.append(existing_level_muxed)
+ continue
+
+ # Check if singleton pos_enc is in multiplexed format and demux if needed
+ if singleton_level_muxed.shape[:2] == (
+ singleton_multiplex_state.num_buckets,
+ singleton_multiplex_state.multiplex_count,
+ ):
+ # Singleton pos_enc is multiplexed, need to demux
+ singleton_level_data = (
+ singleton_multiplex_state.demux(
+ singleton_level_muxed
+ )
+ )
+ else:
+ # Singleton pos_enc is in data space
+ singleton_level_data = singleton_level_muxed
+
+ if existing_level_muxed is not None:
+ # Check if pos_enc is in multiplex format before demuxing
+ if existing_level_muxed.shape[:2] == (
+ multiplex_state.num_buckets,
+ multiplex_state.multiplex_count,
+ ):
+ # Positional encoding is in multiplex format, demux it
+ existing_level_data = multiplex_state.demux(
+ existing_level_muxed
+ )
+ else:
+ # Positional encoding is already in data space, use directly
+ existing_level_data = existing_level_muxed
+ else:
+ existing_level_data = None
+
+ if existing_level_data is None:
+ level_shape = (
+ num_objs_total,
+ ) + singleton_level_data.shape[1:]
+ existing_level_data = torch.zeros(
+ level_shape,
+ dtype=singleton_level_data.dtype,
+ device=singleton_level_data.device,
+ )
+ elif existing_level_data.shape[0] < num_objs_total:
+ pad_size = (
+ num_objs_total - existing_level_data.shape[0]
+ )
+ pad = torch.zeros(
+ (pad_size,) + existing_level_data.shape[1:],
+ dtype=existing_level_data.dtype,
+ device=existing_level_data.device,
+ )
+ existing_level_data = torch.cat(
+ [existing_level_data, pad], dim=0
+ )
+
+ existing_level_data[new_obj_idx : new_obj_idx + 1] = (
+ singleton_level_data
+ )
+ new_maskmem_pos_enc.append(
+ multiplex_state.mux(existing_level_data)
+ )
+
+ main_frame_out["maskmem_pos_enc"] = new_maskmem_pos_enc
+
+ singleton_pred_masks = singleton_frame_out[
+ "pred_masks"
+ ] # [1, 1, H, W]
+ singleton_scores = singleton_frame_out[
+ "object_score_logits"
+ ] # [1, 1]
+
+ # Expand tensors if needed
+ num_existing_objs = main_frame_out["pred_masks"].shape[0]
+ if new_obj_idx >= num_existing_objs:
+ num_objs_needed = new_obj_idx + 1
+ pad_size = num_objs_needed - num_existing_objs
+
+ main_frame_out["pred_masks"] = torch.cat(
+ [
+ main_frame_out["pred_masks"],
+ torch.zeros(
+ (
+ pad_size,
+ 1,
+ singleton_pred_masks.shape[2],
+ singleton_pred_masks.shape[3],
+ ),
+ dtype=singleton_pred_masks.dtype,
+ device=singleton_pred_masks.device,
+ ),
+ ],
+ dim=0,
+ )
+
+ main_frame_out["object_score_logits"] = torch.cat(
+ [
+ main_frame_out["object_score_logits"],
+ torch.full(
+ (pad_size, 1),
+ NO_OBJ_SCORE,
+ dtype=singleton_scores.dtype,
+ device=singleton_scores.device,
+ ),
+ ],
+ dim=0,
+ )
+
+ # Set singleton results at new_obj_idx
+ main_frame_out["pred_masks"][new_obj_idx : new_obj_idx + 1] = (
+ singleton_pred_masks
+ )
+ main_frame_out["object_score_logits"][
+ new_obj_idx : new_obj_idx + 1
+ ] = singleton_scores
+ # Initialize local_obj_id_to_idx if missing (e.g., frame
+ # output was created by VG propagation's track_step which
+ # does not populate this field).
+ if "local_obj_id_to_idx" not in main_frame_out:
+ main_frame_out["local_obj_id_to_idx"] = deepcopy(
+ inference_state["obj_id_to_idx"]
+ )
+ main_frame_out["local_obj_id_to_idx"][obj_id] = new_obj_idx
+
+ # Also expand and copy pred_masks_video_res if it exists in singleton output
+ if "pred_masks_video_res" in singleton_frame_out:
+ if "pred_masks_video_res" in main_frame_out:
+ # Expand existing video_res masks
+ if (
+ main_frame_out["pred_masks_video_res"].shape[0]
+ < new_obj_idx + 1
+ ):
+ pad_size = (
+ new_obj_idx
+ + 1
+ - main_frame_out["pred_masks_video_res"].shape[
+ 0
+ ]
+ )
+ main_frame_out["pred_masks_video_res"] = torch.cat(
+ [
+ main_frame_out["pred_masks_video_res"],
+ torch.zeros(
+ (
+ pad_size,
+ 1,
+ singleton_frame_out[
+ "pred_masks_video_res"
+ ].shape[2],
+ singleton_frame_out[
+ "pred_masks_video_res"
+ ].shape[3],
+ ),
+ dtype=singleton_frame_out[
+ "pred_masks_video_res"
+ ].dtype,
+ device=singleton_frame_out[
+ "pred_masks_video_res"
+ ].device,
+ ),
+ ],
+ dim=0,
+ )
+ else:
+ # Create new video_res masks tensor
+ num_objs = len(inference_state["obj_ids"])
+ main_frame_out["pred_masks_video_res"] = torch.zeros(
+ (
+ num_objs,
+ 1,
+ singleton_frame_out[
+ "pred_masks_video_res"
+ ].shape[2],
+ singleton_frame_out[
+ "pred_masks_video_res"
+ ].shape[3],
+ ),
+ dtype=singleton_frame_out[
+ "pred_masks_video_res"
+ ].dtype,
+ device=singleton_frame_out[
+ "pred_masks_video_res"
+ ].device,
+ )
+ # Set singleton video_res mask
+ main_frame_out["pred_masks_video_res"][
+ new_obj_idx : new_obj_idx + 1
+ ] = singleton_frame_out["pred_masks_video_res"]
+
+ # Handle obj_ptr
+ if (
+ "obj_ptr" in singleton_frame_out
+ and self.use_obj_ptrs_in_encoder
+ ):
+ singleton_obj_ptr_data = singleton_multiplex_state.demux(
+ singleton_frame_out["obj_ptr"]
+ ) # [1, D]
+
+ if "obj_ptr" in main_frame_out:
+ # The existing obj_ptr may have been created with a DIFFERENT number of buckets
+ # (before we called multiplex_state.add_objects(), which may have created new buckets).
+ # Infer the OLD bucket count from the tensor shape to check whether it can be demuxed.
+
+ old_obj_ptr_muxed = main_frame_out["obj_ptr"]
+ # Infer old bucket count: shape is [B_old, M_old, D], so buckets are along dim 0
+ old_num_buckets = old_obj_ptr_muxed.shape[0]
+
+ # Only demux the old obj_ptr if the bucket count is unchanged
+ if old_num_buckets != multiplex_state.num_buckets:
+ # Bucket count changed - cannot safely demux old obj_ptr
+ # Instead, create new obj_ptr from scratch for all objects
+ num_objs = len(inference_state["obj_ids"])
+ obj_ptr_data = torch.zeros(
+ (num_objs, singleton_obj_ptr_data.shape[1]),
+ dtype=singleton_obj_ptr_data.dtype,
+ device=singleton_obj_ptr_data.device,
+ )
+ # Only set the singleton object's ptr, leave others as zeros
+ obj_ptr_data[new_obj_idx : new_obj_idx + 1] = (
+ singleton_obj_ptr_data
+ )
+ main_frame_out["obj_ptr"] = multiplex_state.mux(
+ obj_ptr_data
+ )
+ else:
+ # Bucket count matches - safe to demux
+ main_obj_ptr_data = multiplex_state.demux(
+ old_obj_ptr_muxed
+ )
+
+ # Expand if needed
+ if main_obj_ptr_data.shape[0] < new_obj_idx + 1:
+ pad_size = (
+ new_obj_idx + 1 - main_obj_ptr_data.shape[0]
+ )
+ main_obj_ptr_data = torch.cat(
+ [
+ main_obj_ptr_data,
+ torch.zeros(
+ (
+ pad_size,
+ main_obj_ptr_data.shape[1],
+ ),
+ dtype=main_obj_ptr_data.dtype,
+ device=main_obj_ptr_data.device,
+ ),
+ ],
+ dim=0,
+ )
+
+ main_obj_ptr_data[new_obj_idx : new_obj_idx + 1] = (
+ singleton_obj_ptr_data
+ )
+ main_frame_out["obj_ptr"] = multiplex_state.mux(
+ main_obj_ptr_data
+ )
+ else:
+ # Create new obj_ptr
+ num_objs = len(inference_state["obj_ids"])
+ obj_ptr_data = torch.zeros(
+ (num_objs, singleton_obj_ptr_data.shape[1]),
+ dtype=singleton_obj_ptr_data.dtype,
+ device=singleton_obj_ptr_data.device,
+ )
+ obj_ptr_data[new_obj_idx : new_obj_idx + 1] = (
+ singleton_obj_ptr_data
+ )
+ main_frame_out["obj_ptr"] = multiplex_state.mux(
+ obj_ptr_data
+ )
+
+ # Update conditioning_objects
+ if singleton_obj_idx in singleton_frame_out.get(
+ "conditioning_objects", set()
+ ):
+ main_frame_out["conditioning_objects"].add(new_obj_idx)
+
+ @torch.inference_mode()
+ def add_new_points(
+ self,
+ inference_state,
+ frame_idx,
+ obj_id,
+ points,
+ labels,
+ clear_old_points,
+ rel_coordinates=True,
+ use_prev_mem_frame=False,
+ ):
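+ """
+ Add new points for an object in the multiplex model.
+
+ Three cases are handled:
+ - New object: the points are converted to a mask via the interactivity head and the
+ object is added to the multiplex state (possibly in a new bucket, for dynamic models).
+ - Refinement: the object already exists and the frame has been tracked; the object is
+ extracted into a singleton state, refined there, and merged back.
+ - Gap fill: the object already exists but the frame has not been tracked yet; inference
+ runs directly in multiplex mode.
+
+ Args:
+ inference_state: Current inference state
+ frame_idx: Frame index on which to add the points
+ obj_id: Object ID (auto-created if new)
+ points: Point coordinates tensor (a batch dimension is added if missing)
+ labels: Point labels tensor (1 for positive, 0 for negative)
+ clear_old_points: Whether to clear previous points on this frame
+ rel_coordinates: Whether points are in relative coordinates [0, 1]
+ use_prev_mem_frame: Unused; kept for API compatibility
+
+ Returns:
+ Tuple of (frame_idx, obj_ids, low_res_masks, video_res_masks), where
+ low_res_masks is always None.
+ """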
+ obj_idx = self._obj_id_to_idx(inference_state, obj_id)
+ obj_idxs = [obj_idx]
+ obj_ids = [obj_id]
+
+ point_inputs_per_frame = inference_state["point_inputs_per_obj"][obj_idx]
+ mask_inputs_per_frame = inference_state["mask_inputs_per_obj"][obj_idx]
+
+ if points.dim() == 2:
+ points = points.unsqueeze(0)
+ if labels.dim() == 1:
+ labels = labels.unsqueeze(0)
+
+ if rel_coordinates:
+ points = points * self.image_size
+
+ points = points.to(inference_state["device"])
+ labels = labels.to(inference_state["device"])
+
+ if not clear_old_points:
+ old_point_inputs = point_inputs_per_frame.get(frame_idx, None)
+ else:
+ old_point_inputs = None
+
+ point_inputs = concat_points(old_point_inputs, points, labels)
+ point_inputs_per_frame[frame_idx] = point_inputs
+
+ is_init_cond_frame = frame_idx not in inference_state["frames_already_tracked"]
+
+ if is_init_cond_frame:
+ reverse = False
+ else:
+ reverse = inference_state["frames_already_tracked"][frame_idx]["reverse"]
+
+ is_cond = is_init_cond_frame or self.add_all_frames_to_correct_as_cond
+ storage_key = "cond_frame_outputs" if is_cond else "non_cond_frame_outputs"
+
+ multiplex_state = inference_state["multiplex_state"]
+ is_new_state = multiplex_state is None
+
+ if is_new_state:
+ multiplex_state = self.multiplex_controller.get_state(
+ num_valid_entries=1,
+ device=inference_state["device"],
+ dtype=torch.float32,
+ random=False,
+ object_ids=obj_ids,
+ )
+ inference_state["multiplex_state"] = multiplex_state
+
+ # Determine interaction case:
+ # - New object: object ID is not yet in the multiplex state
+ # - Refine: existing object on an already-tracked frame
+ # - Gap fill: existing object on a frame that has not been tracked yet
+ is_existing_object = (
+ not is_new_state
+ and multiplex_state is not None
+ and obj_id in multiplex_state.object_ids
+ )
+
+ if is_existing_object:
+ if is_init_cond_frame:
+ is_new_obj = False
+ is_refine = False
+ is_gap_fill_case = True
+ else:
+ is_new_obj = False
+ is_refine = True
+ is_gap_fill_case = False
+ else:
+ is_new_obj = True
+ is_refine = False
+ is_gap_fill_case = False
+
+ if is_new_obj:
+ should_add_to_existing = not is_new_state
+ allow_new_buckets_local = True
+ prefer_new_buckets_local = True
+
+ current_out, _ = self._run_single_frame_inference(
+ inference_state=inference_state,
+ output_dict=inference_state["output_dict"],
+ frame_idx=frame_idx,
+ batch_size=1,
+ is_init_cond_frame=True,
+ point_inputs=point_inputs,
+ mask_inputs=None,
+ reverse=False,
+ run_mem_encoder=False,
+ prev_sam_mask_logits=None,
+ add_to_existing_state=should_add_to_existing,
+ new_obj_idxs=obj_idxs,
+ new_obj_ids=obj_ids,
+ allow_new_buckets=allow_new_buckets_local,
+ prefer_new_buckets=prefer_new_buckets_local,
+ objects_to_interact=None,
+ )
+ elif is_refine:
+ singleton_state, original_obj_idx = self._extract_object_for_interaction(
+ inference_state, obj_id, frame_idx
+ )
+
+ user_refined_frames_map = inference_state.get(
+ "user_refined_frames_per_obj", {}
+ )
+ user_refined_frames = user_refined_frames_map.get(obj_id)
+ if user_refined_frames is None:
+ user_refined_frames = set()
+ is_first_refinement = frame_idx not in user_refined_frames
+
+ prev_sam_mask_logits_singleton = None
+ if not is_first_refinement:
+ singleton_obj_idx = 0
+ singleton_output_dict = singleton_state["output_dict_per_obj"][
+ singleton_obj_idx
+ ]
+ singleton_temp_output_dict = singleton_state[
+ "temp_output_dict_per_obj"
+ ][singleton_obj_idx]
+
+ # Check BOTH storage keys since previous refinement might be in a different key
+ # (e.g., first refinement creates cond_frame, but after propagation,
+ # second refinement on same frame would look for non_cond_frame)
+ prev_out = None
+
+ storage_key_current = (
+ "cond_frame_outputs" if is_cond else "non_cond_frame_outputs"
+ )
+ prev_out = singleton_temp_output_dict[storage_key_current].get(
+ frame_idx
+ )
+
+ if prev_out is None:
+ prev_out = singleton_output_dict["cond_frame_outputs"].get(
+ frame_idx
+ )
+ if prev_out is None:
+ prev_out = singleton_output_dict["non_cond_frame_outputs"].get(
+ frame_idx
+ )
+
+ if prev_out is not None and prev_out["pred_masks"] is not None:
+ prev_sam_mask_logits_singleton = prev_out["pred_masks"].to(
+ inference_state["device"], non_blocking=True
+ )
+ prev_sam_mask_logits_singleton = torch.clamp(
+ prev_sam_mask_logits_singleton, -32.0, 32.0
+ )
+
+ if is_first_refinement:
+ # First refinement: ALWAYS use is_init_cond_frame=True to force interaction_only mode,
+ # i.e. fresh segmentation from the points rather than refinement of the propagated mask.
+ singleton_is_init_cond = True
+ singleton_objects_to_interact = None
+ else:
+ # Subsequent refinements: refine incrementally from the previous mask logits
+ singleton_is_init_cond = False
+ singleton_objects_to_interact = (
+ [0] if prev_sam_mask_logits_singleton is not None else None
+ )
+
+ singleton_obj_idx = 0
+ singleton_obj_idxs = [singleton_obj_idx]
+ singleton_obj_ids = [obj_id]
+
+ current_out, _ = self._run_single_frame_inference(
+ inference_state=singleton_state,
+ output_dict=singleton_state["output_dict"],
+ frame_idx=frame_idx,
+ batch_size=1,
+ is_init_cond_frame=singleton_is_init_cond,
+ point_inputs=point_inputs,
+ mask_inputs=None,
+ reverse=False,
+ run_mem_encoder=False,
+ prev_sam_mask_logits=prev_sam_mask_logits_singleton,
+ add_to_existing_state=False,
+ new_obj_idxs=singleton_obj_idxs,
+ new_obj_ids=singleton_obj_ids,
+ allow_new_buckets=False,
+ objects_to_interact=singleton_objects_to_interact,
+ )
+
+ singleton_storage_key = (
+ "cond_frame_outputs"
+ if singleton_is_init_cond
+ else "non_cond_frame_outputs"
+ )
+
+ _, singleton_video_res_masks = self._get_orig_video_res_output(
+ singleton_state, current_out["pred_masks"]
+ )
+ current_out["pred_masks_video_res"] = singleton_video_res_masks
+
+ singleton_state["output_dict"][singleton_storage_key][frame_idx] = (
+ current_out
+ )
+
+ self._merge_singleton_interaction_result(
+ inference_state, singleton_state, obj_id, original_obj_idx
+ )
+
+ obj_idx = inference_state["obj_id_to_idx"][obj_id]
+ obj_idxs = [obj_idx]
+
+ if "user_refined_frames_per_obj" not in inference_state:
+ inference_state["user_refined_frames_per_obj"] = {}
+ if obj_id not in inference_state["user_refined_frames_per_obj"]:
+ inference_state["user_refined_frames_per_obj"][obj_id] = set()
+
+ inference_state["user_refined_frames_per_obj"][obj_id].add(frame_idx)
+
+ merged_frame_out = inference_state["output_dict"][singleton_storage_key][
+ frame_idx
+ ]
+ obj_output_dict = inference_state["output_dict_per_obj"][obj_idx]
+ obj_temp_output_dict = inference_state["temp_output_dict_per_obj"][obj_idx]
+
+ if "pred_masks_video_res" in merged_frame_out:
+ pred_masks_video_res_slice = merged_frame_out["pred_masks_video_res"][
+ obj_idx : obj_idx + 1
+ ]
+ else:
+ _, video_res_masks = self._get_orig_video_res_output(
+ inference_state, merged_frame_out["pred_masks"]
+ )
+ pred_masks_video_res_slice = video_res_masks[obj_idx : obj_idx + 1]
+
+ pred_masks_slice = merged_frame_out["pred_masks"][obj_idx : obj_idx + 1]
+
+ obj_temp_output_dict[singleton_storage_key][frame_idx] = {
+ "pred_masks": pred_masks_slice,
+ "pred_masks_video_res": pred_masks_video_res_slice,
+ "object_score_logits": merged_frame_out["object_score_logits"][
+ obj_idx : obj_idx + 1
+ ],
+ }
+ obj_output_dict[singleton_storage_key][frame_idx] = obj_temp_output_dict[
+ singleton_storage_key
+ ][frame_idx]
+
+ elif is_gap_fill_case:
+ # Gap fill: Run inference directly in multiplex mode (no singleton extraction)
+ # Even though is_init_cond_frame=True, we use add_to_existing_state=False
+ # because the object ALREADY EXISTS in multiplex state.
+ obj_idx = inference_state["obj_id_to_idx"][obj_id]
+ obj_idxs = [obj_idx]
+ batch_size = self._get_obj_num(inference_state)
+
+ obj_output_dict = inference_state["output_dict_per_obj"][obj_idx]
+ obj_temp_output_dict = inference_state["temp_output_dict_per_obj"][obj_idx]
+
+ current_out, _ = self._run_single_frame_inference(
+ inference_state=inference_state,
+ output_dict=inference_state["output_dict"],
+ frame_idx=frame_idx,
+ batch_size=batch_size,
+ is_init_cond_frame=True,
+ point_inputs=point_inputs,
+ mask_inputs=None,
+ reverse=False,
+ run_mem_encoder=False,
+ prev_sam_mask_logits=None,
+ add_to_existing_state=False,
+ new_obj_idxs=[obj_idx],
+ new_obj_ids=[obj_id],
+ allow_new_buckets=False,
+ prefer_new_buckets=False,
+ objects_to_interact=[obj_idx],
+ )
+
+ current_out["local_obj_id_to_idx"] = deepcopy(
+ inference_state["obj_id_to_idx"]
+ )
+
+ _, video_res_masks = self._get_orig_video_res_output(
+ inference_state, current_out["pred_masks"]
+ )
+ current_out["pred_masks_video_res"] = video_res_masks
+
+ is_cond = storage_key == "cond_frame_outputs"
+ if (
+ is_cond
+ and frame_idx
+ in inference_state["output_dict"]["non_cond_frame_outputs"]
+ ):
+ del inference_state["output_dict"]["non_cond_frame_outputs"][frame_idx]
+ if "consolidated_frame_inds" in inference_state:
+ inference_state["consolidated_frame_inds"][
+ "non_cond_frame_outputs"
+ ].discard(frame_idx)
+
+ # Store consolidated output (has obj_ptr, maskmem_features, etc.)
+ inference_state["output_dict"][storage_key][frame_idx] = current_out
+
+ # Mark as consolidated
+ if "consolidated_frame_inds" in inference_state:
+ inference_state["consolidated_frame_inds"][storage_key].add(frame_idx)
+
+ # Also store per-object slices in temp_output_dict_per_obj
+ obj_temp_output_dict[storage_key][frame_idx] = {
+ "pred_masks": current_out["pred_masks"][obj_idx : obj_idx + 1],
+ "pred_masks_video_res": video_res_masks[obj_idx : obj_idx + 1],
+ "object_score_logits": current_out["object_score_logits"][
+ obj_idx : obj_idx + 1
+ ],
+ }
+ obj_output_dict[storage_key][frame_idx] = obj_temp_output_dict[storage_key][
+ frame_idx
+ ]
+
+ # Store outputs and prepare return values
+ obj_output_dict = inference_state["output_dict_per_obj"][obj_idx]
+ obj_temp_output_dict = inference_state["temp_output_dict_per_obj"][obj_idx]
+
+ # Refinement and gap fill already stored their outputs in output_dict above
+ # (via the singleton merge for refinement, directly for gap fill).
+ if is_refine or is_gap_fill_case:
+ # Here we only mark the frame as consolidated and select the mask to return.
+
+ singleton_obj_idx = 0
+
+ # Get video resolution masks from the current output
+ _, video_res_masks_singleton = self._get_orig_video_res_output(
+ inference_state, current_out["pred_masks"]
+ )
+
+ # Mark frame as consolidated (prevents double consolidation in preflight)
+ if "consolidated_frame_inds" in inference_state:
+ inference_state["consolidated_frame_inds"][storage_key].add(frame_idx)
+
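+ # For the return value: refinement output holds only the singleton object (index 0),
+ # while gap-fill output holds all objects, so index it by obj_idx
+ return_idx = obj_idx if is_gap_fill_case else singleton_obj_idx
+ video_res_masks_to_return = video_res_masks_singleton[
+ return_idx : return_idx + 1
+ ]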
+ else:
+ # Standard multiplex output - use obj_idx
+ _, video_res_masks = self._get_orig_video_res_output(
+ inference_state, current_out["pred_masks"]
+ )
+
+ current_out["pred_masks_video_res"] = video_res_masks
+ current_out["local_obj_id_to_idx"] = deepcopy(
+ inference_state["obj_id_to_idx"]
+ )
+
+ # Remove from non_cond if this becomes a cond frame
+ if (
+ is_cond
+ and frame_idx
+ in inference_state["output_dict"]["non_cond_frame_outputs"]
+ ):
+ del inference_state["output_dict"]["non_cond_frame_outputs"][frame_idx]
+ # Also update consolidated_frame_inds
+ if "consolidated_frame_inds" in inference_state:
+ inference_state["consolidated_frame_inds"][
+ "non_cond_frame_outputs"
+ ].discard(frame_idx)
+
+ inference_state["output_dict"][storage_key][frame_idx] = current_out
+
+ # Update consolidated_frame_inds to track this frame
+ if "consolidated_frame_inds" in inference_state:
+ inference_state["consolidated_frame_inds"][storage_key].add(frame_idx)
+
+ # Store per-object outputs (slice from the full multiplex output)
+ obj_temp_output_dict[storage_key][frame_idx] = {
+ "pred_masks_video_res": current_out["pred_masks_video_res"][
+ obj_idx : obj_idx + 1
+ ],
+ "pred_masks": current_out["pred_masks"][obj_idx : obj_idx + 1],
+ "object_score_logits": current_out["object_score_logits"][
+ obj_idx : obj_idx + 1
+ ],
+ }
+
+ obj_output_dict[storage_key][frame_idx] = obj_temp_output_dict[storage_key][
+ frame_idx
+ ]
+
+ video_res_masks_to_return = video_res_masks[obj_idx : obj_idx + 1]
+
+ low_res_masks = None
+ return frame_idx, obj_ids, low_res_masks, video_res_masks_to_return
+
+ @torch.inference_mode()
+ def add_new_masks(
+ self,
+ inference_state,
+ frame_idx,
+ obj_ids,
+ masks,
+ # for compatibility with per_obj_inference class, not used here
+ add_mask_to_memory=False,
+ # for object reconditioning; do not update the multiplex state
+ reconditioning=False,
+ ):
+ """Add input masks for one or more objects on a frame."""
+ if isinstance(obj_ids, np.ndarray):
+ obj_ids = obj_ids.tolist()
+ obj_idxs = [
+ self._obj_id_to_idx(inference_state, obj_id, error_if_new=reconditioning)
+ for obj_id in obj_ids
+ ]
+ point_inputs_per_frame = [
+ inference_state["point_inputs_per_obj"][obj_idx] for obj_idx in obj_idxs
+ ]
+ mask_inputs_per_frame = [
+ inference_state["mask_inputs_per_obj"][obj_idx] for obj_idx in obj_idxs
+ ]
+
+ assert masks.dim() == 3
+ num_objects, mask_H, mask_W = masks.shape
+ assert num_objects == len(obj_ids)
+ masks_inputs_orig = masks[:, None, :, :] # add channel dimension
+ masks_inputs_orig = masks_inputs_orig.float().to(inference_state["device"])
+
+ # resize the mask if it doesn't match the model's input mask size
+ if mask_H != self.input_mask_size or mask_W != self.input_mask_size:
+ mask_inputs = torch.nn.functional.interpolate(
+ masks_inputs_orig,
+ size=(self.input_mask_size, self.input_mask_size),
+ align_corners=False,
+ mode="bilinear",
+ antialias=True, # use antialias for downsampling
+ )
+ else:
+ mask_inputs = masks_inputs_orig
+
+ # also get the mask at the original video resolution (for outputting)
+ video_H = inference_state["video_height"]
+ video_W = inference_state["video_width"]
+ if mask_H != video_H or mask_W != video_W:
+ mask_inputs_video_res = torch.nn.functional.interpolate(
+ masks_inputs_orig,
+ size=(video_H, video_W),
+ align_corners=False,
+ mode="bilinear",
+ antialias=True, # use antialias for potential downsampling
+ )
+ else:
+ mask_inputs_video_res = masks_inputs_orig
+ # convert mask_inputs_video_res to binary (threshold at 0.5 as it is in range 0~1)
+ mask_inputs_video_res = mask_inputs_video_res > 0.5
+
+ multiplex_state = inference_state["multiplex_state"]
+ is_new_state = multiplex_state is None
+
+ if not reconditioning:
+ if is_new_state:
+ multiplex_state = self.multiplex_controller.get_state(
+ num_valid_entries=num_objects,
+ device=inference_state["device"],
+ dtype=torch.float32, # lower precision is also fine
+ random=False,
+ object_ids=obj_ids,
+ )
+ inference_state["multiplex_state"] = multiplex_state
+ else:
+ assert self.is_dynamic_model, (
+ "New objects are not allowed after state creation"
+ )
+
+ for i in range(num_objects):
+ mask_inputs_per_frame[i][frame_idx] = mask_inputs_video_res[i : i + 1]
+ point_inputs_per_frame[i].pop(frame_idx, None)
+ # If this frame hasn't been tracked before, we treat it as an initial conditioning
+ # frame, meaning that the input masks are used to generate segments on this frame without
+ # using any memory from other frames, like in SAM. Otherwise (if it has been tracked),
+ # the input masks will be used to correct the already tracked masks.
+ is_init_cond_frame = frame_idx not in inference_state["frames_already_tracked"]
+ # whether to track in reverse time order
+ if is_init_cond_frame:
+ reverse = False
+ else:
+ reverse = inference_state["frames_already_tracked"][frame_idx]["reverse"]
+ obj_output_dicts = [
+ inference_state["output_dict_per_obj"][obj_idx] for obj_idx in obj_idxs
+ ]
+ obj_temp_output_dicts = [
+ inference_state["temp_output_dict_per_obj"][obj_idx] for obj_idx in obj_idxs
+ ]
+ # Add a frame to conditioning output if it's an initial conditioning frame or
+ # if the model sees all frames receiving clicks/mask as conditioning frames.
+ is_cond = is_init_cond_frame or self.add_all_frames_to_correct_as_cond
+ storage_key = "cond_frame_outputs" if is_cond else "non_cond_frame_outputs"
+
+ # Allow creating a new bucket only when existing buckets cannot fit the new objects
+ allow_new_buckets_local = False
+ if not is_new_state and not reconditioning and multiplex_state is not None:
+ if multiplex_state.available_slots < num_objects:
+ allow_new_buckets_local = True
+
+ current_out, _ = self._run_single_frame_inference(
+ inference_state=inference_state,
+ output_dict=inference_state["output_dict"],
+ frame_idx=frame_idx,
+ batch_size=num_objects,
+ is_init_cond_frame=is_init_cond_frame,
+ point_inputs=None,
+ mask_inputs=mask_inputs,
+ reverse=reverse,
+ # Skip the memory encoder when adding clicks or masks. We execute the memory encoder
+ # at the beginning of `propagate_in_video` (after the user finalizes their clicks). This
+ # allows us to enforce non-overlapping constraints on all objects before encoding
+ # them into memory.
+ run_mem_encoder=False,
+ add_to_existing_state=not is_new_state and not reconditioning,
+ new_obj_idxs=obj_idxs,
+ new_obj_ids=obj_ids,
+ allow_new_buckets=allow_new_buckets_local,
+ reconditioning=reconditioning,
+ )
+ # We directly use the input mask at video resolution as the output mask for a better
+ # video editing experience (so that the masks don't change after each brushing).
+ # Here NO_OBJ_SCORE is a large negative value to represent the background and
+ # similarly -NO_OBJ_SCORE is a large positive value to represent the foreground.
+ _, video_res_masks = self._get_orig_video_res_output(
+ inference_state, current_out["pred_masks"]
+ )
+ obj_idxs_t = torch.as_tensor(obj_idxs, device=video_res_masks.device)
+ video_res_masks[obj_idxs_t] = torch.where(
+ mask_inputs_video_res, -NO_OBJ_SCORE, NO_OBJ_SCORE
+ )
+
+ current_out["pred_masks_video_res"] = video_res_masks
+ with torch.profiler.record_function("add_new_masks._deepcopy"):
+ current_out["local_obj_id_to_idx"] = deepcopy(
+ inference_state["obj_id_to_idx"]
+ )
+ if (
+ is_cond
+ and frame_idx in inference_state["output_dict"]["non_cond_frame_outputs"]
+ ):
+ del inference_state["output_dict"]["non_cond_frame_outputs"][frame_idx]
+ # Also update consolidated_frame_inds
+ if "consolidated_frame_inds" in inference_state:
+ inference_state["consolidated_frame_inds"][
+ "non_cond_frame_outputs"
+ ].discard(frame_idx)
+
+ inference_state["output_dict"][storage_key][frame_idx] = current_out
+
+ # Update consolidated_frame_inds to track this frame
+ if "consolidated_frame_inds" in inference_state:
+ inference_state["consolidated_frame_inds"][storage_key].add(frame_idx)
+
+ with torch.profiler.record_function("add_new_masks.obj_loop"):
+ # Step 1: Set all new object masks first (batched)
+ for i, obj_idx in enumerate(obj_idxs):
+ # Add the predicted masks to the output dict
+ # NOTE: object ordering matters here; it is assumed to match the per-object implementation
+ obj_temp_output_dicts[i][storage_key][frame_idx] = {
+ "pred_masks_video_res": current_out["pred_masks_video_res"][
+ obj_idx : obj_idx + 1
+ ]
+ }
+ obj_output_dicts[i][storage_key][frame_idx] = obj_temp_output_dicts[i][
+ storage_key
+ ][frame_idx]
+
+ # Step 2: Precompute suppression masks once, instead of one torch.where per (object, new object) pair
+ # Combined mask of all new objects (for existing objects)
+ combined_new_mask = mask_inputs_video_res.any(
+ dim=0, keepdim=True
+ ) # (1, 1, H, W)
+
+ # Precompute exclude-self masks for new objects (if there are multiple new objects)
+ num_new = len(obj_idxs)
+ exclude_self_masks = {}
+ if num_new > 1:
+ for i in range(num_new):
+ other_indices = torch.cat(
+ [
+ torch.arange(i, device=mask_inputs_video_res.device),
+ torch.arange(
+ i + 1, num_new, device=mask_inputs_video_res.device
+ ),
+ ]
+ )
+ exclude_self_masks[obj_idxs[i]] = mask_inputs_video_res[
+ other_indices
+ ].any(dim=0, keepdim=True)
+
+ # Step 3: Apply suppression to all objects in a single pass
+ temp_output_dict_per_obj = inference_state["temp_output_dict_per_obj"]
+ obj_idxs_set = set(obj_idxs)
+
+ for obj_idx2, obj_temp_output_dict2 in temp_output_dict_per_obj.items():
+ current_out2 = obj_temp_output_dict2[storage_key].get(frame_idx, None)
+ if current_out2 is None:
+ continue
+
+ if obj_idx2 not in obj_idxs_set:
+ # Existing object: suppress by all new masks
+ suppress_mask = combined_new_mask
+ elif obj_idx2 in exclude_self_masks:
+ # New object: suppress by other new objects' masks
+ suppress_mask = exclude_self_masks[obj_idx2]
+ else:
+ # Only one new object - nothing to suppress for itself
+ continue
+
+ current_out2["pred_masks_video_res"] = torch.where(
+ suppress_mask,
+ NO_OBJ_SCORE,
+ current_out2["pred_masks_video_res"],
+ )
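+ # Illustrative note: for an existing object, any pixel covered by a new object's mask
+ # has its score replaced with NO_OBJ_SCORE (so the newly prompted object takes that
+ # pixel), while pixels outside suppress_mask keep their previous scores. For a new
+ # object, only the pixels claimed by the other new objects are suppressed.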
+
+ # Resize the output mask to the original video resolution
+ obj_ids = inference_state["obj_ids"]
+ consolidated_out = self._consolidate_temp_output_across_obj(
+ inference_state,
+ frame_idx,
+ is_cond=is_cond,
+ run_mem_encoder=False,
+ consolidate_at_video_res=True,
+ )
+ _, video_res_masks = self._get_orig_video_res_output(
+ inference_state, consolidated_out["pred_masks_video_res"]
+ )
+ low_res_masks = None # not needed by the demo
+
+ consolidated_out["local_obj_id_to_idx"] = current_out["local_obj_id_to_idx"]
+
+ return frame_idx, obj_ids, low_res_masks, video_res_masks
+
+ def _get_orig_video_res_output(self, inference_state, any_res_masks):
+ """
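+ Move the mask scores to the inference device, resize them to the original
+ video resolution (video_res_masks), and optionally apply non-overlapping
+ constraints and hole filling for the final output. Returns the input masks
+ (on device) together with the video-resolution masks.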
+ """
+ device = inference_state["device"]
+ video_H = inference_state["video_height"]
+ video_W = inference_state["video_width"]
+ any_res_masks = any_res_masks.to(device, non_blocking=True)
+ if any_res_masks.shape[-2:] == (video_H, video_W):
+ video_res_masks = any_res_masks
+ else:
+ video_res_masks = torch.nn.functional.interpolate(
+ any_res_masks,
+ size=(video_H, video_W),
+ mode="bilinear",
+ align_corners=False,
+ )
+ if self.non_overlap_masks_for_output:
+ video_res_masks = self._apply_non_overlapping_constraints(video_res_masks)
+ # potentially fill holes in the predicted masks
+ if self.fill_hole_area > 0:
+ video_res_masks = fill_holes_in_mask_scores(
+ video_res_masks, self.fill_hole_area
+ )
+ return any_res_masks, video_res_masks
+
+ def _consolidate_temp_output_across_obj(
+ self,
+ inference_state,
+ frame_idx,
+ is_cond,
+ run_mem_encoder,
+ consolidate_at_video_res=False,
+ ):
+ """
+ Consolidate the per-object temporary outputs in `temp_output_dict_per_obj` on
+ a frame into a single output for all objects, including
+ 1) fill any missing objects either from `output_dict_per_obj` (if they exist in
+ `output_dict_per_obj` for this frame) or leave them as placeholder values
+ (if they don't exist in `output_dict_per_obj` for this frame);
+ 2) if specified, rerun the memory encoder after applying non-overlapping
+ constraints on the object scores.
+ """
+ batch_size = self._get_obj_num(inference_state)
+ storage_key = "cond_frame_outputs" if is_cond else "non_cond_frame_outputs"
+
+ # After a singleton merge, objects can be added at indices beyond batch_size, so
+ # find the maximum object index that has temporary or regular outputs in order to
+ # size the consolidated tensor correctly.
+ max_obj_idx = batch_size - 1 # default when no larger index is found
+
+ # Check both temp and regular output dicts to find max index
+ for obj_idx in inference_state["temp_output_dict_per_obj"].keys():
+ if obj_idx > max_obj_idx:
+ max_obj_idx = obj_idx
+ for obj_idx in inference_state["output_dict_per_obj"].keys():
+ if obj_idx > max_obj_idx:
+ max_obj_idx = obj_idx
+
+ # Size the consolidated tensor to accommodate all object indices (not just count)
+ consolidated_batch_size = max(max_obj_idx + 1, 0) # Ensure non-negative
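+ # Worked example (illustrative values): with batch_size = 2 and per-object entries
+ # at indices {0, 1, 3}, max_obj_idx = 3 and consolidated_batch_size = 4; index 2 has
+ # no per-object output, so the loop below skips it and leaves that row unchanged.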
+
+ # Optionally, we allow consolidating the temporary outputs at the original
+ # video resolution (to provide a better editing experience for mask prompts).
+ if consolidate_at_video_res:
+ assert not run_mem_encoder, "memory encoder cannot run at video resolution"
+ consolidated_H = inference_state["video_height"]
+ consolidated_W = inference_state["video_width"]
+ consolidated_mask_key = "pred_masks_video_res"
+ else:
+ consolidated_H = consolidated_W = self.low_res_mask_size
+ consolidated_mask_key = "pred_masks"
+
+ # Initialize `consolidated_out`. Its "maskmem_features" and "maskmem_pos_enc"
+ # will be added when rerunning the memory encoder after applying non-overlapping
+ # constraints to object scores. Its "pred_masks" are prefilled with a large
+ # negative value (NO_OBJ_SCORE) to represent missing objects.
+
+ consolidated_out = {
+ "conditioning_objects": None,
+ "maskmem_features": None,
+ "maskmem_pos_enc": None,
+ "image_features": None,
+ "image_pos_enc": None,
+ "obj_ptr": None,
+ consolidated_mask_key: torch.full(
+ size=(
+ consolidated_batch_size,
+ 1,
+ consolidated_H,
+ consolidated_W,
+ ), # Use consolidated_batch_size, not batch_size!
+ fill_value=NO_OBJ_SCORE,
+ dtype=torch.float32,
+ device=inference_state["storage_device"],
+ ),
+ }
+
+ all_out = inference_state["output_dict"]["cond_frame_outputs"].get(
+ frame_idx, None
+ )
+ if all_out is None:
+ all_out = inference_state["output_dict"]["non_cond_frame_outputs"].get(
+ frame_idx, None
+ )
+
+ # Handle the case where output_dict has no entry for this frame (e.g., during demo VG propagation).
+ # In this case, we reconstruct the consolidated output from the per-object outputs.
+ need_to_reconstruct_from_per_obj = all_out is None
+
+ if need_to_reconstruct_from_per_obj:
+ # Determine the conditioning objects on this frame, i.e. those with point or mask inputs here;
+ # the remaining fields are filled from per-object outputs below or by the memory encoder.
+ conditioning_objects = set()
+ for obj_idx in range(batch_size):
+ # Check if this object has point inputs on this frame
+ if obj_idx in inference_state["point_inputs_per_obj"]:
+ point_inputs = inference_state["point_inputs_per_obj"][obj_idx]
+ if (
+ frame_idx in point_inputs
+ and point_inputs[frame_idx] is not None
+ ):
+ conditioning_objects.add(obj_idx)
+ continue
+
+ # Check if this object has mask inputs on this frame
+ if obj_idx in inference_state["mask_inputs_per_obj"]:
+ mask_inputs = inference_state["mask_inputs_per_obj"][obj_idx]
+ if frame_idx in mask_inputs and mask_inputs[frame_idx] is not None:
+ conditioning_objects.add(obj_idx)
+
+ consolidated_out["conditioning_objects"] = conditioning_objects
+ # Shared features (maskmem/image features) are populated when running the memory encoder.
+ # object_score_logits (and iou_score, if used) are collected from per-object outputs below; obj_ptr is left as None.
+ else:
+ # Normal case: populate from existing consolidated output
+ consolidated_out["conditioning_objects"] = all_out.get(
+ "conditioning_objects", set()
+ )
+ consolidated_out["obj_ptr"] = all_out["obj_ptr"]
+ consolidated_out["object_score_logits"] = all_out["object_score_logits"]
+ if self.use_memory_selection:
+ consolidated_out["iou_score"] = all_out["iou_score"]
+ # These fields might not exist in per-object outputs (e.g., after singleton extraction)
+ consolidated_out["maskmem_features"] = all_out.get("maskmem_features")
+ consolidated_out["maskmem_pos_enc"] = all_out.get("maskmem_pos_enc")
+ consolidated_out["image_features"] = all_out.get("image_features")
+ consolidated_out["image_pos_enc"] = all_out.get("image_pos_enc")
+ consolidated_out["local_obj_id_to_idx"] = all_out.get(
+ "local_obj_id_to_idx", {}
+ )
+ all_mask = all_out.get("pred_masks_video_res", all_out["pred_masks"])
+ # Ensure masks are at the correct consolidated resolution
+ # This handles the case where all_out has interactive resolution (288) masks
+ # that need to be resized to SAM2's low_res_mask_size (256) for consistency
+ if all_mask.shape[-2:] == (consolidated_H, consolidated_W):
+ consolidated_out[consolidated_mask_key] = all_mask
+ else:
+ # Resize first if mask has a different resolution (e.g., 288 from interactive)
+ # Determine if we're downsampling or upsampling
+ is_downsampling = all_mask.shape[-1] > consolidated_W
+ resized_mask = torch.nn.functional.interpolate(
+ all_mask,
+ size=(consolidated_H, consolidated_W),
+ mode="bilinear",
+ align_corners=False,
+ antialias=is_downsampling, # use antialias for downsampling
+ )
+ consolidated_out[consolidated_mask_key] = resized_mask
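+ # Illustrative sketch of the resize above, assuming a (N, 1, 288, 288) interactive mask
+ # and a 256 x 256 target; PyTorch supports antialias only for the "bilinear" and
+ # "bicubic" modes:
+ #   torch.nn.functional.interpolate(
+ #       all_mask, size=(256, 256), mode="bilinear", align_corners=False, antialias=True
+ #   )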
+
+ # Collect per-object outputs (masks and scores) to build consolidated output
+ # When reconstructing from per-object outputs, we also need to collect obj_ptr and object_score_logits
+ obj_score_logits_list = []
+ obj_ptr_list = [] if need_to_reconstruct_from_per_obj else None
+ iou_scores_list = (
+ []
+ if need_to_reconstruct_from_per_obj and self.use_memory_selection
+ else None
+ )
+
+ # When reconstructing from per-object outputs, initialize the mask tensor
+ # with the correct size (consolidated_batch_size, not batch_size)
+ if (
+ need_to_reconstruct_from_per_obj
+ and consolidated_mask_key not in consolidated_out
+ ):
+ # Initialize with zeros - will be populated from per-object outputs below
+ consolidated_out[consolidated_mask_key] = torch.zeros(
+ (consolidated_batch_size, 1, consolidated_H, consolidated_W),
+ dtype=torch.float32,
+ device=inference_state["storage_device"],
+ )
+ consolidated_out["object_score_logits"] = torch.full(
+ (consolidated_batch_size, 1),
+ NO_OBJ_SCORE,
+ dtype=torch.float32,
+ device=inference_state["storage_device"],
+ )
+
+ # Iterate over consolidated_batch_size (not batch_size) to cover indices added after a singleton merge
+ for obj_idx in range(consolidated_batch_size):
+ # Check if this object index exists in temp/output dicts (it may not if object was just added)
+ if obj_idx not in inference_state["temp_output_dict_per_obj"]:
+ continue
+ if obj_idx not in inference_state["output_dict_per_obj"]:
+ continue
+ obj_temp_output_dict = inference_state["temp_output_dict_per_obj"][obj_idx]
+ obj_output_dict = inference_state["output_dict_per_obj"][obj_idx]
+ out = obj_temp_output_dict[storage_key].get(frame_idx, None)
+ # If the object doesn't appear in "temp_output_dict_per_obj" on this frame,
+ # we fall back and look up its previous output in "output_dict_per_obj".
+ # We look up both "cond_frame_outputs" and "non_cond_frame_outputs" in
+ # "output_dict_per_obj" to find a previous output for this object.
+ if out is None:
+ out = obj_output_dict["cond_frame_outputs"].get(frame_idx, None)
+ if out is None:
+ out = obj_output_dict["non_cond_frame_outputs"].get(frame_idx, None)
+ if out is None:
+ # object pointers are filled globally above; we don't need empty_mask_ptr
+ continue
+ # Add the temporary object output mask to consolidated output mask
+ # (use "pred_masks_video_res" if it's available)
+ obj_mask = out.get("pred_masks_video_res")
+ if obj_mask is None:
+ obj_mask = out.get("pred_masks")
+ consolidated_pred_masks = consolidated_out[consolidated_mask_key]
+
+ # If obj_idx is beyond the consolidated_pred_masks size,
+ # we need to expand it (can happen after singleton merge adds object at end)
+ if obj_idx >= consolidated_pred_masks.shape[0]:
+ pad_size = obj_idx + 1 - consolidated_pred_masks.shape[0]
+ consolidated_pred_masks = torch.cat(
+ [
+ consolidated_pred_masks,
+ torch.zeros(
+ (
+ pad_size,
+ 1,
+ consolidated_pred_masks.shape[-2],
+ consolidated_pred_masks.shape[-1],
+ ),
+ dtype=consolidated_pred_masks.dtype,
+ device=consolidated_pred_masks.device,
+ ),
+ ],
+ dim=0,
+ )
+ consolidated_out[consolidated_mask_key] = consolidated_pred_masks
+ # Also expand object_score_logits if present
+ if "object_score_logits" in consolidated_out:
+ consolidated_scores = consolidated_out["object_score_logits"]
+ consolidated_scores = torch.cat(
+ [
+ consolidated_scores,
+ torch.full(
+ (pad_size, 1),
+ NO_OBJ_SCORE,
+ dtype=consolidated_scores.dtype,
+ device=consolidated_scores.device,
+ ),
+ ],
+ dim=0,
+ )
+ consolidated_out["object_score_logits"] = consolidated_scores
+
+ if obj_mask.shape[-2:] == consolidated_pred_masks.shape[-2:]:
+ # Ensure dtype match between source and destination before assignment
+ if obj_mask.dtype != consolidated_pred_masks.dtype:
+ obj_mask = obj_mask.to(consolidated_pred_masks.dtype)
+ consolidated_pred_masks[obj_idx : obj_idx + 1] = obj_mask
+ else:
+ # Resize first if temporary object mask has a different resolution
+ is_downsampling = "pred_masks_video_res" in out
+ resized_obj_mask = torch.nn.functional.interpolate(
+ obj_mask,
+ size=consolidated_pred_masks.shape[-2:],
+ mode="bilinear",
+ align_corners=False,
+ antialias=is_downsampling, # use antialias for downsampling
+ )
+ # Ensure dtype match between source and destination before assignment
+ if resized_obj_mask.dtype != consolidated_pred_masks.dtype:
+ resized_obj_mask = resized_obj_mask.to(
+ consolidated_pred_masks.dtype
+ )
+ consolidated_pred_masks[obj_idx : obj_idx + 1] = resized_obj_mask
+
+ # When reconstructing from per-object outputs, also collect scores
+ if need_to_reconstruct_from_per_obj:
+ if "object_score_logits" in out:
+ obj_score_logits_list.append(out["object_score_logits"])
+ if self.use_memory_selection and "iou_score" in out:
+ iou_scores_list.append(out["iou_score"])
+
+ # If we reconstructed from per-object outputs, consolidate the score fields
+ if need_to_reconstruct_from_per_obj:
+ # Check if we have ANY valid per-object outputs
+ # If not, we're trying to consolidate a VG-propagated frame that was never
+ # stored in output_dict (only in cached_frame_outputs)
+ # In this case, we SKIP memory encoding during preflight and will do it
+ # during the first propagation step instead
+ if not obj_score_logits_list and run_mem_encoder:
+ run_mem_encoder = False # Skip for now, will encode during propagation
+
+ if obj_score_logits_list:
+ consolidated_out["object_score_logits"] = torch.cat(
+ obj_score_logits_list, dim=0
+ )
+ else:
+ # Create placeholder scores - these will be replaced when memory encoder runs
+ device = inference_state["device"]
+ consolidated_out["object_score_logits"] = torch.zeros(
+ (batch_size, 1),
+ dtype=torch.float32,
+ device=device,
+ )
+
+ if self.use_memory_selection:
+ if iou_scores_list:
+ consolidated_out["iou_score"] = torch.cat(iou_scores_list, dim=0)
+ else:
+ consolidated_out["iou_score"] = None
+
+ # obj_ptr will be populated by memory encoder, set to None for now
+ consolidated_out["obj_ptr"] = None
+
+ # Optionally, apply non-overlapping constraints on the consolidated scores
+ # and rerun the memory encoder
+ if run_mem_encoder:
+ device = inference_state["device"]
+ high_res_masks = torch.nn.functional.interpolate(
+ consolidated_out["pred_masks"].to(device, non_blocking=True),
+ size=(self.image_size, self.image_size),
+ mode="bilinear",
+ align_corners=False,
+ )
+ high_res_masks = self._apply_non_overlapping_constraints(high_res_masks)
+ maskmem_features, maskmem_pos_enc, image_features, image_pos_enc = (
+ self._run_memory_encoder(
+ inference_state=inference_state,
+ frame_idx=frame_idx,
+ batch_size=batch_size,
+ high_res_masks=high_res_masks,
+ object_score_logits=consolidated_out["object_score_logits"],
+ is_mask_from_pts=True, # these frames are what the user interacted with
+ conditioning_objects=consolidated_out["conditioning_objects"],
+ )
+ )
+ consolidated_out["maskmem_features"] = maskmem_features
+ consolidated_out["maskmem_pos_enc"] = maskmem_pos_enc
+ consolidated_out["image_features"] = image_features
+ consolidated_out["image_pos_enc"] = image_pos_enc
+
+ return consolidated_out
+
+ @torch.inference_mode()
+ def propagate_in_video_preflight(self, inference_state, run_mem_encoder=True):
+ """Prepare inference_state and consolidate temporary outputs before tracking."""
+ inference_state["tracking_has_started"] = True
+ batch_size = self._get_obj_num(inference_state)
+
+ # Consolidate per-object temporary outputs in "temp_output_dict_per_obj" and
+ # add them into "output_dict".
+ temp_output_dict_per_obj = inference_state["temp_output_dict_per_obj"]
+ output_dict = inference_state["output_dict"]
+ # "consolidated_frame_inds" contains indices of those frames where consolidated
+ # temporary outputs have been added (either in this call or any previous calls
+ # to `propagate_in_video_preflight`).
+ consolidated_frame_inds = inference_state["consolidated_frame_inds"]
+ for is_cond in [False, True]:
+ # Separately consolidate conditioning and non-conditioning temporary outputs
+ storage_key = "cond_frame_outputs" if is_cond else "non_cond_frame_outputs"
+ # Find all the frames that contain temporary outputs for any objects
+ # (these should be the frames that have just received clicks for mask inputs
+ # via `add_new_points` or `add_new_mask`)
+ temp_frame_inds = set()
+ for obj_temp_output_dict in temp_output_dict_per_obj.values():
+ temp_frame_inds.update(obj_temp_output_dict[storage_key].keys())
+ consolidated_frame_inds[storage_key].update(temp_frame_inds)
+ # consolidate the temporary output across all objects on this frame
+ for frame_idx in temp_frame_inds:
+ consolidated_out = self._consolidate_temp_output_across_obj(
+ inference_state,
+ frame_idx,
+ is_cond=is_cond,
+ run_mem_encoder=run_mem_encoder,
+ )
+ # merge them into "output_dict" and also create per-object slices
+ output_dict[storage_key][frame_idx] = consolidated_out
+ self._add_output_per_object(
+ inference_state, frame_idx, consolidated_out, storage_key
+ )
+ clear_non_cond_mem = self.clear_non_cond_mem_around_input and (
+ self.clear_non_cond_mem_for_multi_obj or batch_size <= 1
+ )
+ if clear_non_cond_mem:
+ # clear non-conditioning memory of the surrounding frames
+ self._clear_non_cond_mem_around_input(inference_state, frame_idx)
+
+ # clear temporary outputs in `temp_output_dict_per_obj`
+ for obj_temp_output_dict in temp_output_dict_per_obj.values():
+ obj_temp_output_dict[storage_key].clear()
+
+ # edge case: if an output is added to "cond_frame_outputs", we remove any prior
+ # output on the same frame in "non_cond_frame_outputs"
+ for frame_idx in output_dict["cond_frame_outputs"]:
+ output_dict["non_cond_frame_outputs"].pop(frame_idx, None)
+ for obj_output_dict in inference_state["output_dict_per_obj"].values():
+ for frame_idx in obj_output_dict["cond_frame_outputs"]:
+ obj_output_dict["non_cond_frame_outputs"].pop(frame_idx, None)
+ for frame_idx in consolidated_frame_inds["cond_frame_outputs"]:
+ assert frame_idx in output_dict["cond_frame_outputs"]
+ consolidated_frame_inds["non_cond_frame_outputs"].discard(frame_idx)
+
+ # Make sure that the frame indices in "consolidated_frame_inds" are exactly those frames
+ # with either points or mask inputs (which should be true under a correct demo workflow).
+ all_consolidated_frame_inds = (
+ consolidated_frame_inds["cond_frame_outputs"]
+ | consolidated_frame_inds["non_cond_frame_outputs"]
+ )
+
+ input_frames_inds = set()
+ for point_inputs_per_frame in inference_state["point_inputs_per_obj"].values():
+ input_frames_inds.update(point_inputs_per_frame.keys())
+ for mask_inputs_per_frame in inference_state["mask_inputs_per_obj"].values():
+ input_frames_inds.update(mask_inputs_per_frame.keys())
+ assert all_consolidated_frame_inds == input_frames_inds
+ # Record the first interacted frame index (for tracking start)
+ if inference_state["first_ann_frame_idx"] is None:
+ inference_state["first_ann_frame_idx"] = min(
+ input_frames_inds, default=None
+ )
+ # In case `first_ann_frame_idx` is not in the conditioning frames (e.g. because
+ # we cleared the input points on that frame), pick the first conditioning frame
+ if (
+ inference_state["first_ann_frame_idx"]
+ not in output_dict["cond_frame_outputs"]
+ ):
+ inference_state["first_ann_frame_idx"] = min(
+ output_dict["cond_frame_outputs"], default=None
+ )
+
+ def _get_processing_order(
+ self, inference_state, start_frame_idx, max_frame_num_to_track, reverse
+ ):
+ num_frames = inference_state["num_frames"]
+ # set start index, end index, and processing order
+ if self.always_start_from_first_ann_frame:
+ # in this case, we always start tracking from the frame where we receive
+ # the initial annotation and ignore the provided start_frame_idx
+ start_frame_idx = inference_state["first_ann_frame_idx"]
+ if start_frame_idx is None:
+ # default: start from the earliest frame with input points
+ start_frame_idx = min(inference_state["output_dict"]["cond_frame_outputs"])
+ if max_frame_num_to_track is None:
+ # default: track all the frames in the video
+ max_frame_num_to_track = num_frames
+ if reverse:
+ end_frame_idx = max(start_frame_idx - max_frame_num_to_track, 0)
+ if start_frame_idx > 0:
+ processing_order = range(start_frame_idx, end_frame_idx - 1, -1)
+ else:
+ # TODO: Jie - edge case: starting from frame 0 and tracking in reverse order. When tracking
+ # a single frame for dense tracking, this should still process one frame (idx=0).
+ # Possible side effects of this change have not been verified.
+ # Original behaviour: processing_order = [] (skip reverse tracking when starting from frame 0)
+ processing_order = [0]
+ else:
+ end_frame_idx = min(
+ start_frame_idx + max_frame_num_to_track, num_frames - 1
+ )
+ processing_order = range(start_frame_idx, end_frame_idx + 1)
+ return processing_order
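+ # Worked example (illustrative; assumes num_frames = 10 and
+ # always_start_from_first_ann_frame = False):
+ #   start_frame_idx=5, max_frame_num_to_track=3, reverse=False -> frames 5, 6, 7, 8
+ #   start_frame_idx=5, max_frame_num_to_track=3, reverse=True  -> frames 5, 4, 3, 2
+ #   start_frame_idx=0, reverse=True                             -> frame 0 only
+ # i.e. up to max_frame_num_to_track frames are tracked beyond the start frame.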
+
+ @torch.inference_mode()
+ def propagate_in_video(
+ self,
+ inference_state,
+ start_frame_idx,
+ max_frame_num_to_track,
+ reverse,
+ tqdm_disable=False,
+ obj_ids=None,
+ run_mem_encoder=True,
+ ):
+ """Propagate the input points across frames to track in the entire video."""
+ output_dict = inference_state["output_dict"]
+ consolidated_frame_inds = inference_state["consolidated_frame_inds"]
+ if obj_ids is not None:
+ raise NotImplementedError(
+ "Per-object tracking is not yet implemented for batched inference."
+ )
+ obj_ids = inference_state["obj_ids"]
+ batch_size = self._get_obj_num(inference_state)
+ if len(output_dict["cond_frame_outputs"]) == 0:
+ raise RuntimeError("No points are provided; please add points first")
+ clear_non_cond_mem = self.clear_non_cond_mem_around_input and (
+ self.clear_non_cond_mem_for_multi_obj or batch_size <= 1
+ )
+ assert clear_non_cond_mem is False, "Not implemented"
+
+ processing_order = self._get_processing_order(
+ inference_state,
+ start_frame_idx,
+ max_frame_num_to_track,
+ reverse,
+ )
+
+ for frame_idx in tqdm(
+ processing_order, desc="propagate in video", disable=tqdm_disable
+ ):
+ # We skip those frames already in consolidated outputs (these are frames
+ # that received input clicks or mask). Note that we cannot directly run
+ # batched forward on them via `_run_single_frame_inference` because the
+ # number of clicks on each object might be different.
+ if frame_idx in consolidated_frame_inds["cond_frame_outputs"]:
+ storage_key = "cond_frame_outputs"
+ current_out = output_dict[storage_key][frame_idx]
+ pred_masks = current_out["pred_masks"]
+ if clear_non_cond_mem:
+ # clear non-conditioning memory of the surrounding frames
+ self._clear_non_cond_mem_around_input(inference_state, frame_idx)
+ elif frame_idx in consolidated_frame_inds["non_cond_frame_outputs"]:
+ storage_key = "non_cond_frame_outputs"
+ current_out = output_dict[storage_key][frame_idx]
+ pred_masks = current_out["pred_masks"]
+ else:
+ storage_key = "non_cond_frame_outputs"
+ with torch.profiler.record_function(
+ "VideoTrackingMultiplexDemo._run_single_frame_inference"
+ ):
+ current_out, pred_masks = self._run_single_frame_inference(
+ inference_state=inference_state,
+ output_dict=output_dict,
+ frame_idx=frame_idx,
+ batch_size=batch_size,
+ is_init_cond_frame=False,
+ point_inputs=None,
+ mask_inputs=None,
+ reverse=reverse,
+ run_mem_encoder=run_mem_encoder,
+ )
+ current_out["local_obj_id_to_idx"] = deepcopy(
+ inference_state["obj_id_to_idx"]
+ )
+ output_dict[storage_key][frame_idx] = current_out
+ # Create slices of per-object outputs for subsequent interaction with each
+ # individual object after tracking.
+ self._add_output_per_object(
+ inference_state, frame_idx, current_out, storage_key
+ )
+ inference_state["frames_already_tracked"][frame_idx] = {"reverse": reverse}
+
+ # Resize the output mask to the original video resolution (we directly use
+ # the mask scores on GPU for output to avoid any CPU conversion in between)
+ low_res_masks, video_res_masks = self._get_orig_video_res_output(
+ inference_state, pred_masks
+ )
+ yield frame_idx, obj_ids, low_res_masks, video_res_masks
+
+ def _add_output_per_object(
+ self, inference_state, frame_idx, current_out, storage_key
+ ):
+ """
+ Split a multi-object output into per-object output slices and add them into
+ `output_dict_per_obj`. The resulting slices share the same tensor storage.
+ """
+ # Note for the multiplex model: we don't store the maskmem features
+ # because we don't use the memory during interaction
+
+ output_dict_per_obj = inference_state["output_dict_per_obj"]
+ for obj_idx, obj_output_dict in output_dict_per_obj.items():
+ obj_slice = slice(obj_idx, obj_idx + 1)
+ obj_out = {
+ "pred_masks": current_out["pred_masks"][obj_slice],
+ "object_score_logits": current_out["object_score_logits"][obj_slice],
+ }
+ if self.use_memory_selection:
+ obj_out["iou_score"] = current_out["iou_score"][obj_slice]
+ obj_output_dict[storage_key][frame_idx] = obj_out
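+ # Illustrative note: basic slicing such as current_out["pred_masks"][obj_slice] returns
+ # a view, not a copy. For a tensor t with zero storage offset,
+ #   t[1:2].data_ptr() == t.data_ptr() + t.stride(0) * t.element_size(),
+ # so in-place edits to a per-object slice are visible in the consolidated tensor.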
+
+ @torch.inference_mode()
+ def clear_all_points_in_frame(
+ self,
+ inference_state,
+ frame_idx,
+ obj_id,
+ need_output=True,
+ preserve_user_refined: bool = False,
+ ):
+ """Remove all input points or mask in a specific frame for a given object."""
+ obj_idx = self._obj_id_to_idx(inference_state, obj_id)
+
+ # Clear the conditioning information on the given frame
+ inference_state["point_inputs_per_obj"][obj_idx].pop(frame_idx, None)
+ inference_state["mask_inputs_per_obj"][obj_idx].pop(frame_idx, None)
+
+ # Clear user refinement tracking for this frame and object unless preserving it
+ if (
+ not preserve_user_refined
+ and "user_refined_frames_per_obj" in inference_state
+ ):
+ user_refined_map = inference_state["user_refined_frames_per_obj"]
+ if obj_id in user_refined_map:
+ user_refined_map[obj_id].discard(frame_idx)
+
+ temp_output_dict_per_obj = inference_state["temp_output_dict_per_obj"]
+ temp_output_dict_per_obj[obj_idx]["cond_frame_outputs"].pop(frame_idx, None)
+ temp_output_dict_per_obj[obj_idx]["non_cond_frame_outputs"].pop(frame_idx, None)
+
+ # Check and see if there are still any inputs left on this frame
+ batch_size = self._get_obj_num(inference_state)
+ frame_has_input = False
+ for obj_idx2 in range(batch_size):
+ # Skip if this object doesn't exist in the input dictionaries
+ if obj_idx2 not in inference_state["point_inputs_per_obj"]:
+ continue
+ if obj_idx2 not in inference_state["mask_inputs_per_obj"]:
+ continue
+ if frame_idx in inference_state["point_inputs_per_obj"][obj_idx2]:
+ frame_has_input = True
+ break
+ if frame_idx in inference_state["mask_inputs_per_obj"][obj_idx2]:
+ frame_has_input = True
+ break
+
+ # If this frame has no remaining inputs for any objects, we further clear its
+ # conditioning frame status
+ if not frame_has_input:
+ output_dict = inference_state["output_dict"]
+ consolidated_frame_inds = inference_state["consolidated_frame_inds"]
+ consolidated_frame_inds["cond_frame_outputs"].discard(frame_idx)
+ consolidated_frame_inds["non_cond_frame_outputs"].discard(frame_idx)
+ # Remove the frame's conditioning output (possibly downgrading it to non-conditioning)
+ out = output_dict["cond_frame_outputs"].pop(frame_idx, None)
+ if out is not None:
+ # The frame is not a conditioning frame anymore since it's not receiving inputs,
+ # so we "downgrade" its output (if exists) to a non-conditioning frame output.
+ output_dict["non_cond_frame_outputs"][frame_idx] = out
+ inference_state["frames_already_tracked"].pop(frame_idx, None)
+ # Similarly, do it for the sliced output on each object.
+ for obj_idx2 in range(batch_size):
+ # Skip if this object doesn't exist in the output dictionary
+ if obj_idx2 not in inference_state["output_dict_per_obj"]:
+ continue
+ obj_output_dict = inference_state["output_dict_per_obj"][obj_idx2]
+ obj_out = obj_output_dict["cond_frame_outputs"].pop(frame_idx, None)
+ if obj_out is not None:
+ obj_output_dict["non_cond_frame_outputs"][frame_idx] = obj_out
+
+ # If all the conditioning frames have been removed, we also clear the tracking outputs
+ if len(output_dict["cond_frame_outputs"]) == 0:
+ self._reset_tracking_results(inference_state)
+
+ if not need_output:
+ return
+ # Finally, output updated masks per object (after removing the inputs above)
+ obj_ids = inference_state["obj_ids"]
+ is_cond = any(
+ frame_idx in obj_temp_output_dict["cond_frame_outputs"]
+ for obj_temp_output_dict in temp_output_dict_per_obj.values()
+ )
+ consolidated_out = self._consolidate_temp_output_across_obj(
+ inference_state,
+ frame_idx,
+ is_cond=is_cond,
+ run_mem_encoder=False,
+ consolidate_at_video_res=True,
+ )
+ _, video_res_masks = self._get_orig_video_res_output(
+ inference_state, consolidated_out["pred_masks_video_res"]
+ )
+ low_res_masks = None # not needed by the demo
+ return frame_idx, obj_ids, low_res_masks, video_res_masks
+
+ @torch.inference_mode()
+ def clear_all_points_in_video(self, inference_state):
+ """Remove all input points or mask in all frames throughout the video."""
+ self._reset_tracking_results(inference_state)
+ # Remove all object ids
+ inference_state["obj_id_to_idx"].clear()
+ inference_state["obj_idx_to_id"].clear()
+ inference_state["obj_ids"].clear()
+ inference_state["point_inputs_per_obj"].clear()
+ inference_state["mask_inputs_per_obj"].clear()
+ inference_state["output_dict_per_obj"].clear()
+ inference_state["temp_output_dict_per_obj"].clear()
+ inference_state["multiplex_state"] = None
+
+ def _reset_tracking_results(self, inference_state):
+ """Reset all tracking inputs and results across the videos."""
+ for v in inference_state["point_inputs_per_obj"].values():
+ v.clear()
+ for v in inference_state["mask_inputs_per_obj"].values():
+ v.clear()
+ for v in inference_state["output_dict_per_obj"].values():
+ v["cond_frame_outputs"].clear()
+ v["non_cond_frame_outputs"].clear()
+ for v in inference_state["temp_output_dict_per_obj"].values():
+ v["cond_frame_outputs"].clear()
+ v["non_cond_frame_outputs"].clear()
+ inference_state["output_dict"]["cond_frame_outputs"].clear()
+ inference_state["output_dict"]["non_cond_frame_outputs"].clear()
+ inference_state["consolidated_frame_inds"]["cond_frame_outputs"].clear()
+ inference_state["consolidated_frame_inds"]["non_cond_frame_outputs"].clear()
+ inference_state["tracking_has_started"] = False
+ inference_state["frames_already_tracked"].clear()
+ inference_state["first_ann_frame_idx"] = None
+
+ def _get_image_feature(self, inference_state, frame_idx, batch_size):
+ """Compute the image features on a given frame."""
+ # Look up in the cache first
+ image, backbone_out = inference_state["cached_features"].get(
+ frame_idx, (None, None)
+ )
+ if backbone_out is None:
+ # Cache miss -- we will run inference on a single image
+ image = inference_state["images"][frame_idx].cuda().float().unsqueeze(0)
+ # TODO: We should optimize this because we don't always need all three outputs
+ backbone_out = self.forward_image(
+ NestedTensor(tensors=image, mask=None),
+ need_sam3_out=True,
+ need_interactive_out=True,
+ need_propagation_out=True,
+ )
+ # Cache the most recent frame's feature (for repeated interactions with
+ # a frame; we can use an LRU cache for more frames in the future).
+ inference_state["cached_features"] = {frame_idx: (image, backbone_out)}
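+ # Hypothetical sketch of the LRU variant mentioned above (not implemented; assumes
+ # cached_features is a collections.OrderedDict and max_cached_frames is a new setting):
+ #   cache = inference_state["cached_features"]
+ #   cache[frame_idx] = (image, backbone_out)
+ #   cache.move_to_end(frame_idx)  # also call this on a cache hit
+ #   while len(cache) > max_cached_frames:
+ #       cache.popitem(last=False)  # evict the least recently used frame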
+
+ features = self._prepare_backbone_features(backbone_out)
+ return image, features
+
+ def _run_single_frame_inference(
+ self,
+ inference_state,
+ output_dict,
+ frame_idx,
+ batch_size,
+ is_init_cond_frame,
+ point_inputs,
+ mask_inputs,
+ reverse,
+ run_mem_encoder,
+ prev_sam_mask_logits=None,
+ add_to_existing_state: bool = False,
+ new_obj_idxs: Optional[list[int]] = None,
+ new_obj_ids: Optional[list[int]] = None,
+ allow_new_buckets: bool = False,
+ prefer_new_buckets: bool = False,
+ reconditioning: bool = False,
+ objects_to_interact: Optional[list[int]] = None,
+ ):
+ """Run tracking on a single frame based on current inputs and previous memory."""
+ # Retrieve correct image features
+ with torch.profiler.record_function(
+ "VideoTrackingMultiplexDemo._get_image_feature"
+ ):
+ image, backbone_features = self._get_image_feature(
+ inference_state, frame_idx, batch_size
+ )
+
+ if add_to_existing_state or reconditioning:
+ assert new_obj_idxs is not None
+ assert new_obj_ids is not None
+
+ backbone_features_interactive = backbone_features["interactive"]
+ backbone_features_propagation = backbone_features["sam2_backbone_out"]
+
+ if add_to_existing_state or reconditioning:
+ with torch.profiler.record_function(
+ "VideoTrackingMultiplexDemo.add_new_masks_to_existing_state"
+ ):
+ # Get existing output from current frame to modify in-place
+ # Try both storage keys since the output could be in either location
+ existing_out = output_dict["cond_frame_outputs"].get(frame_idx)
+ if existing_out is None:
+ existing_out = output_dict["non_cond_frame_outputs"].get(frame_idx)
+ if existing_out is None:
+ raise RuntimeError(
+ f"No existing output found for frame {frame_idx} in either storage"
+ )
+
+ # Prepare interactive features
+ interactive_pix_feat = self._get_interactive_pix_mem(
+ backbone_features_interactive["vision_feats"],
+ backbone_features_interactive["feat_sizes"],
+ )
+
+ # High-resolution feature maps for the SAM head, reshape (HW)BC => BCHW
+ interactive_high_res_features = [
+ x.permute(1, 2, 0).view(x.size(1), x.size(2), *s)
+ for x, s in zip(
+ backbone_features_interactive["vision_feats"][:-1],
+ backbone_features_interactive["feat_sizes"][:-1],
+ )
+ ]
+
+ # Prepare propagation features for memory encoding
+ propagation_vision_feats = (
+ backbone_features_propagation["vision_feats"]
+ if run_mem_encoder
+ else None
+ )
+ propagation_feat_sizes = (
+ backbone_features_propagation["feat_sizes"]
+ if run_mem_encoder
+ else None
+ )
+
+ # Add new masks to existing state
+ if reconditioning:
+ self.recondition_masks_in_existing_state(
+ interactive_pix_feat=interactive_pix_feat,
+ interactive_high_res_features=interactive_high_res_features,
+ propagation_vision_feats=propagation_vision_feats,
+ propagation_feat_sizes=propagation_feat_sizes,
+ new_masks=mask_inputs,
+ obj_idxs_in_mask=new_obj_idxs,
+ obj_ids_in_mask=new_obj_ids,
+ prev_output=existing_out,
+ multiplex_state=inference_state["multiplex_state"],
+ add_mask_to_memory=run_mem_encoder,
+ )
+ else:
+ # If we are adding to existing state using points (mask_inputs is None),
+ # first convert points -> masks via the interactivity head.
+ new_masks_from_points = None
+ if mask_inputs is None and point_inputs is not None:
+ with torch.profiler.record_function(
+ "VideoTrackingMultiplexDemo.points_to_masks"
+ ):
+ multimask_output = self._use_multimask(
+ is_init_cond_frame, point_inputs=point_inputs
+ )
+ interaction_out = self._forward_sam_heads(
+ backbone_features=interactive_pix_feat,
+ point_inputs=point_inputs,
+ mask_inputs=None,
+ interactive_high_res_features=interactive_high_res_features,
+ multimask_output=multimask_output,
+ objects_to_interact=new_obj_idxs,
+ multiplex_state=inference_state["multiplex_state"],
+ )
+ new_masks_from_points = interaction_out["low_res_masks"]
+
+ self.add_new_masks_to_existing_state(
+ interactive_pix_feat=interactive_pix_feat,
+ interactive_high_res_features=interactive_high_res_features,
+ propagation_vision_feats=propagation_vision_feats,
+ propagation_feat_sizes=propagation_feat_sizes,
+ new_masks=(
+ mask_inputs
+ if mask_inputs is not None
+ else new_masks_from_points
+ ),
+ obj_idxs_in_mask=new_obj_idxs,
+ obj_ids_in_mask=new_obj_ids,
+ prev_output=existing_out,
+ multiplex_state=inference_state["multiplex_state"],
+ add_mask_to_memory=run_mem_encoder,
+ are_masks_from_pts=(mask_inputs is None),
+ allow_new_buckets=allow_new_buckets,
+ prefer_new_buckets=prefer_new_buckets,
+ )
+
+ # Return the modified existing output
+ current_out = existing_out
+ else:
+ # point and mask should not appear as input simultaneously on the same frame
+ assert point_inputs is None or mask_inputs is None
+ with torch.profiler.record_function(
+ "VideoTrackingMultiplexDemo.track_step"
+ ):
+ current_out = self.track_step(
+ frame_idx=frame_idx,
+ is_init_cond_frame=is_init_cond_frame,
+ backbone_features_interactive=backbone_features_interactive,
+ backbone_features_propagation=backbone_features_propagation,
+ image=image,
+ point_inputs=point_inputs,
+ mask_inputs=mask_inputs,
+ gt_masks=None,
+ frames_to_add_correction_pt=[],
+ output_dict=output_dict,
+ num_frames=inference_state["num_frames"],
+ track_in_reverse=reverse,
+ run_mem_encoder=run_mem_encoder,
+ prev_sam_mask_logits=prev_sam_mask_logits,
+ multiplex_state=inference_state["multiplex_state"],
+ objects_to_interact=objects_to_interact,
+ )
+
+ # optionally offload the output to CPU memory to save GPU space
+ storage_device = inference_state["storage_device"]
+ if current_out.get("maskmem_features") is not None:
+ maskmem_features = current_out["maskmem_features"]
+ maskmem_features = maskmem_features.to(
+ device=storage_device, dtype=torch.bfloat16, non_blocking=True
+ )
+ else:
+ maskmem_features = None
+
+ if current_out.get("image_features") is not None:
+ assert "image_pos_enc" in current_out
+ image_features = current_out["image_features"].to(
+ storage_device, non_blocking=True
+ )
+ image_pos_enc = current_out["image_pos_enc"].to(
+ storage_device, non_blocking=True
+ )
+ else:
+ image_features = image_pos_enc = None
+
+ pred_masks_gpu = current_out["pred_masks"]
+ pred_masks = pred_masks_gpu.to(storage_device, non_blocking=True)
+ # "maskmem_pos_enc" is the same across frames, so we only need to store one copy of it
+ with torch.profiler.record_function(
+ "VideoTrackingMultiplexDemo.maskmem_pos_enc"
+ ):
+ maskmem_pos_enc = self._get_maskmem_pos_enc(inference_state, current_out)
+ # object pointer is a small tensor, so we always keep it on GPU memory for fast access
+ obj_ptr = current_out["obj_ptr"]
+ object_score_logits = current_out["object_score_logits"]
+ conditioning_objects = current_out["conditioning_objects"]
+ # make a compact version of this frame's output to reduce the state size
+ compact_current_out = {
+ "maskmem_features": maskmem_features,
+ "maskmem_pos_enc": maskmem_pos_enc,
+ "image_features": image_features,
+ "image_pos_enc": image_pos_enc,
+ "pred_masks": pred_masks,
+ "obj_ptr": obj_ptr,
+ "object_score_logits": object_score_logits,
+ "conditioning_objects": conditioning_objects,
+ }
+ if self.use_memory_selection:
+ with torch.profiler.record_function(
+ "VideoTrackingMultiplexDemo.use_memory_selection"
+ ):
+ compact_current_out["iou_score"] = current_out["iou_score"]
+ compact_current_out["eff_iou_score"] = self.cal_mem_score(
+ object_score_logits, current_out["iou_score"]
+ )
+ return compact_current_out, pred_masks_gpu
+
+ def _run_memory_encoder(
+ self,
+ inference_state,
+ frame_idx,
+ batch_size,
+ high_res_masks,
+ object_score_logits,
+ is_mask_from_pts,
+ conditioning_objects=None, # if None, looked up from inference_state["output_dict"]
+ ):
+ """
+ Run the memory encoder on `high_res_masks`. This is usually done after applying
+ non-overlapping constraints to the object scores. Since those scores changed, the
+ memory features also need to be recomputed with the memory encoder.
+ """
+ # Retrieve correct image features
+ image, backbone_features = self._get_image_feature(
+ inference_state, frame_idx, batch_size
+ )
+ backbone_features_propagation = backbone_features["sam2_backbone_out"]
+ propagation_vision_feats = backbone_features_propagation["vision_feats"]
+ propagation_vision_pos_embeds = backbone_features_propagation[
+ "vision_pos_embeds"
+ ]
+ propagation_feat_sizes = backbone_features_propagation["feat_sizes"]
+
+ # If conditioning_objects is not provided, look it up from output_dict
+ if conditioning_objects is None:
+ output_dict = inference_state["output_dict"]
+ for storage_key in ["cond_frame_outputs", "non_cond_frame_outputs"]:
+ storage = output_dict[storage_key]
+ if frame_idx not in storage:
+ continue
+ conditioning_objects = storage[frame_idx]["conditioning_objects"]
+ break
+ else:
+ raise ValueError(f"conditioning objects not found at {frame_idx=}")
+
+ maskmem_features, maskmem_pos_enc = self._encode_new_memory(
+ image=image,
+ current_vision_feats=propagation_vision_feats,
+ feat_sizes=propagation_feat_sizes,
+ pred_masks_high_res=high_res_masks,
+ object_score_logits=object_score_logits,
+ is_mask_from_pts=is_mask_from_pts,
+ conditioning_objects=conditioning_objects,
+ multiplex_state=inference_state["multiplex_state"],
+ )
+
+ # optionally offload the output to CPU memory to save GPU space
+ storage_device = inference_state["storage_device"]
+ maskmem_features = maskmem_features.to(torch.bfloat16)
+ maskmem_features = maskmem_features.to(storage_device, non_blocking=True)
+ # "maskmem_pos_enc" is the same across frames, so we only need to store one copy of it
+ maskmem_pos_enc = self._get_maskmem_pos_enc(
+ inference_state, {"maskmem_pos_enc": maskmem_pos_enc}
+ )
+
+ image_features = propagation_vision_feats[-1]
+ image_features = image_features.to(storage_device, non_blocking=True)
+ image_pos_enc = propagation_vision_pos_embeds[-1]
+ image_pos_enc = image_pos_enc.to(storage_device, non_blocking=True)
+ return maskmem_features, maskmem_pos_enc, image_features, image_pos_enc
+
+ def _get_maskmem_pos_enc(self, inference_state, current_out):
+ """
+ `maskmem_pos_enc` is the same across frames and objects, so we cache it as
+ a constant in the inference session to reduce session storage size.
+ """
+ model_constants = inference_state["constants"]
+ # "out_maskmem_pos_enc" should be either a list of tensors or None
+ out_maskmem_pos_enc = current_out.get("maskmem_pos_enc")
+ if out_maskmem_pos_enc is not None:
+ if "maskmem_pos_enc" not in model_constants:
+ assert isinstance(out_maskmem_pos_enc, list)
+ # only take the slice for one object, since it's the same across objects
+ maskmem_pos_enc = [x[0:1].clone() for x in out_maskmem_pos_enc]
+ model_constants["maskmem_pos_enc"] = maskmem_pos_enc
+ else:
+ maskmem_pos_enc = model_constants["maskmem_pos_enc"]
+ # expand the cached maskmem_pos_enc to the actual batch size
+ batch_size = out_maskmem_pos_enc[0].size(0)
+ expanded_maskmem_pos_enc = [
+ x.expand(batch_size, -1, -1, -1) for x in maskmem_pos_enc
+ ]
+ else:
+ expanded_maskmem_pos_enc = None
+ return expanded_maskmem_pos_enc
+
+ @torch.inference_mode()
+ def remove_object(
+ self,
+ inference_state,
+ obj_id: int,
+ strict=False,
+ need_output=True,
+ clear_user_refined_map: bool = True,
+ ):
+ """
+ Remove a single object from the tracking state.
+
+ This is a convenience wrapper around remove_objects() for removing a single object.
+
+ Args:
+ inference_state: Current inference state
+ obj_id: Object ID to remove
+ strict: If True, raise a ValueError if the object doesn't exist
+ need_output: Whether to return updated frames
+ clear_user_refined_map: If True, also drop the object's user-refined frame records
+
+ Returns:
+ Tuple of (remaining_obj_ids, updated_frames)
+ """
+ return self.remove_objects(
+ inference_state,
+ obj_ids=[obj_id],
+ strict=strict,
+ need_output=need_output,
+ clear_user_refined_map=clear_user_refined_map,
+ )
+
+ @torch.inference_mode()
+ def remove_objects(
+ self,
+ inference_state,
+ obj_ids: Iterable[int],
+ strict=False,
+ need_output=True,
+ clear_user_refined_map: bool = True,
+ ):
+ """
+ Remove a list of object ids from the tracking state. If strict is True, raise a ValueError
+ if any of the ids does not exist; otherwise, unknown ids are silently ignored.
+ """
+ obj_ids = list(obj_ids)
+ old_obj_idxs_to_rm = [
+ inference_state["obj_id_to_idx"].get(obj_id, None) for obj_id in obj_ids
+ ]
+ updated_frames = []
+ actually_used_obj_ids = []
+ removing_any = False
+ for old_obj_idx_to_rm, obj_id in zip(old_obj_idxs_to_rm, obj_ids, strict=True):
+ if old_obj_idx_to_rm is None:
+ if strict:
+ raise ValueError(
+ f"Object id {obj_id} does not exist in the tracking state."
+ )
+ else:
+ actually_used_obj_ids.append(obj_id)
+ removing_any = True
+ if not removing_any:
+ return inference_state["obj_ids"], updated_frames
+
+ # ignore any object IDs that don't exist
+ old_obj_idxs_to_rm = [x for x in old_obj_idxs_to_rm if x is not None]
+ obj_ids = actually_used_obj_ids
+ removed_obj_ids = list(obj_ids)
+
+ # Delete the storage of the removed objects from the inference state (if no objects
+ # remain after Step 1, we return early).
+ # Step 0: clear the inputs on those frames where these object ids have point or mask inputs
+ # (note that this step is required as it might downgrade conditioning frames to
+ # non-conditioning ones)
+ if clear_user_refined_map and "user_refined_frames_per_obj" in inference_state:
+ user_refined_map = inference_state["user_refined_frames_per_obj"]
+ for removed_obj_id in removed_obj_ids:
+ if removed_obj_id in user_refined_map:
+ user_refined_map.pop(removed_obj_id, None)
+
+ all_obj_input_frames_inds = set()
+ for old_obj_idx_to_rm, obj_id in zip(old_obj_idxs_to_rm, obj_ids, strict=True):
+ obj_input_frames_inds = set()
+ obj_input_frames_inds.update(
+ inference_state["point_inputs_per_obj"][old_obj_idx_to_rm]
+ )
+ obj_input_frames_inds.update(
+ inference_state["mask_inputs_per_obj"][old_obj_idx_to_rm]
+ )
+ for frame_idx in obj_input_frames_inds:
+ self.clear_all_points_in_frame(
+ inference_state,
+ frame_idx,
+ obj_id,
+ need_output=False,
+ preserve_user_refined=not clear_user_refined_map,
+ )
+ all_obj_input_frames_inds.update(obj_input_frames_inds)
+
+ # Step 1: Update the object id mapping (note that it must be done after Step 0,
+ # since Step 0 still requires the old object id mappings in inference_state)
+ old_obj_ids = inference_state["obj_ids"]
+ old_obj_inds = list(range(len(old_obj_ids)))
+ remain_old_obj_inds = old_obj_inds.copy()
+ for old_obj_idx_to_rm in old_obj_idxs_to_rm:
+ remain_old_obj_inds.remove(old_obj_idx_to_rm)
+ new_obj_ids = [old_obj_ids[old_idx] for old_idx in remain_old_obj_inds]
+ new_obj_inds = list(range(len(new_obj_ids)))
+ # build new mappings
+ old_idx_to_new_idx = dict(zip(remain_old_obj_inds, new_obj_inds))
+ inference_state["obj_id_to_idx"] = dict(zip(new_obj_ids, new_obj_inds))
+ inference_state["obj_idx_to_id"] = dict(zip(new_obj_inds, new_obj_ids))
+ inference_state["obj_ids"] = new_obj_ids
+
+ if len(new_obj_ids) == 0:
+ return new_obj_ids, updated_frames
+
+ # Step 2: For per-object tensor storage, we shift their obj_idx in the dict keys.
+ # (note that "consolidated_frame_inds" doesn't need to be updated in this step as
+ # it's already handled in Step 0)
+ def _map_keys(container):
+ new_kvs = []
+ for k in old_obj_inds:
+ v = container.pop(k)
+ if k in old_idx_to_new_idx:
+ new_kvs.append((old_idx_to_new_idx[k], v))
+ container.update(new_kvs)
+
+ _map_keys(inference_state["point_inputs_per_obj"])
+ _map_keys(inference_state["mask_inputs_per_obj"])
+ _map_keys(inference_state["output_dict_per_obj"])
+ _map_keys(inference_state["temp_output_dict_per_obj"])
+
+ multiplex_state: MultiplexState = inference_state["multiplex_state"]
+ # strict is set to True because we have done the filtering above
+ buckets_to_keep = multiplex_state.remove_objects(
+ old_obj_idxs_to_rm, strict=True
+ )
+ obj_ids = set(obj_ids)
+
+ # Step 3: For packed tensor storage, we index the remaining ids and rebuild the per-bucket/per-object slices.
+ def _slice_state(output_dict, storage_key):
+ for frame_idx, out in output_dict[storage_key].items():
+ out["maskmem_features"] = out["maskmem_features"][buckets_to_keep]
+ out["maskmem_pos_enc"] = [
+ x[buckets_to_keep] for x in out["maskmem_pos_enc"]
+ ]
+ # "maskmem_pos_enc" is the same across frames, so we only need to store one copy of it
+ out["maskmem_pos_enc"] = self._get_maskmem_pos_enc(inference_state, out)
+ out["obj_ptr"] = out["obj_ptr"][buckets_to_keep]
+
+ # Note that pred_masks and score_logits are stored in a per-object manner
+ # When we add new objects, obj_id_to_idx mapping could be different
+ # locally (at this past frame) versus globally (at the current frame),
+ # so we need to use a local copy of this mapping
+ local_obj_id_to_idx = out["local_obj_id_to_idx"]
+
+ # Find which local indices correspond to the remaining old object indices
+ local_remain_old_obj_inds = [
+ obj_idx
+ for obj_id, obj_idx in local_obj_id_to_idx.items()
+ if obj_id not in obj_ids
+ ]
+
+ # Guard against stale indices by intersecting with available rows
+ max_pred = out["pred_masks"].shape[0]
+ max_scores = out["object_score_logits"].shape[0]
+ keep_indices = [
+ idx
+ for idx in local_remain_old_obj_inds
+ if 0 <= idx < max_pred and 0 <= idx < max_scores
+ ]
+ out["pred_masks"] = out["pred_masks"][keep_indices]
+ out["object_score_logits"] = out["object_score_logits"][keep_indices]
+ if self.use_memory_selection:
+ out["iou_score"] = out["iou_score"][keep_indices]
+ out["eff_iou_score"] = self.cal_mem_score(
+ out["object_score_logits"], out["iou_score"]
+ ) # recalculate the memory frame score
+ sliced_conditioning_objects = set()
+
+ # Update local_obj_id_to_idx to reflect the new indices after removal
+ new_local_obj_id_to_idx = {}
+ old_to_new = {
+ old_idx: new_i for new_i, old_idx in enumerate(keep_indices)
+ }
+ for obj_id, old_idx in local_obj_id_to_idx.items():
+ if obj_id not in obj_ids: # Keep objects not being removed
+ # Find the new index for this object if it was kept
+ if old_idx in old_to_new:
+ new_idx = old_to_new[old_idx]
+ new_local_obj_id_to_idx[obj_id] = new_idx
+ if old_idx in out["conditioning_objects"]:
+ sliced_conditioning_objects.add(new_idx)
+
+ out["local_obj_id_to_idx"] = new_local_obj_id_to_idx
+ out["conditioning_objects"] = sliced_conditioning_objects
+
+ # also update the per-object slices
+ self._add_output_per_object(
+ inference_state, frame_idx, out, storage_key
+ )
+
+ _slice_state(inference_state["output_dict"], "cond_frame_outputs")
+ _slice_state(inference_state["output_dict"], "non_cond_frame_outputs")
+
+ # Step 4: Further collect the outputs on the frames in `all_obj_input_frames_inds`, which
+ # could show updated masks for objects previously occluded by the objects being removed
+ if need_output:
+ temp_output_dict_per_obj = inference_state["temp_output_dict_per_obj"]
+ for frame_idx in all_obj_input_frames_inds:
+ is_cond = any(
+ frame_idx in obj_temp_output_dict["cond_frame_outputs"]
+ for obj_temp_output_dict in temp_output_dict_per_obj.values()
+ )
+ consolidated_out = self._consolidate_temp_output_across_obj(
+ inference_state,
+ frame_idx,
+ is_cond=is_cond,
+ run_mem_encoder=False,
+ consolidate_at_video_res=True,
+ )
+ _, video_res_masks = self._get_orig_video_res_output(
+ inference_state, consolidated_out["pred_masks_video_res"]
+ )
+ updated_frames.append((frame_idx, video_res_masks))
+
+ return inference_state["obj_ids"], updated_frames
+
+ def _clear_non_cond_mem_around_input(self, inference_state, frame_idx):
+ """
+ Remove the non-conditioning memory around the input frame. When users provide
+ correction clicks, the surrounding frames' non-conditioning memories can still
+ contain outdated object appearance information and could confuse the model.
+
+ This function clears those non-conditioning memories surrounding the interacted
+ frame to avoid giving the model both old and new information about the object.
+ """
+ r = self.memory_temporal_stride_for_eval
+ frame_idx_begin = frame_idx - r * self.num_maskmem
+ frame_idx_end = frame_idx + r * self.num_maskmem
+ output_dict = inference_state["output_dict"]
+ non_cond_frame_outputs = output_dict["non_cond_frame_outputs"]
+ for t in range(frame_idx_begin, frame_idx_end + 1):
+ non_cond_frame_outputs.pop(t, None)
+ for obj_output_dict in inference_state["output_dict_per_obj"].values():
+ obj_output_dict["non_cond_frame_outputs"].pop(t, None)
+
+ @torch.inference_mode()
+ @torch.autocast(device_type="cuda", dtype=torch.bfloat16)
+ def warm_up_compilation(
+ self, offload_video_to_cpu=False, offload_state_to_cpu=False
+ ):
+ """
+ Warm up the model by running a dummy inference to compile the model. This is
+ useful to avoid the compilation overhead in the first inference call.
+ """
+ if not self.compile_all_components:
+ return
+
+ raise NotImplementedError(
+ "Please use `VideoTrackingMultiplexDemoPerBucketInference` instead for full model compilation."
+ )
+
+
+class Sam3VideoTrackingMultiplexDemo(VideoTrackingMultiplexDemo):
+ @torch.inference_mode()
+ def init_state(
+ self,
+ video_height,
+ video_width,
+ num_frames,
+ cached_features=None,
+ offload_video_to_cpu=False,
+ offload_state_to_cpu=False,
+ ):
+ """Initialize an inference state."""
+ # Make sure that sigmoid is used on mask logits (should be True for all our recent models).
+ # Since we rely on large negative values as scores for missing objects, the raw logits
+ # cannot be consumed directly and must be converted into 0~1 range via sigmoid first.
+ if not self.apply_sigmoid_to_mask_logits_for_mem_enc:
+ raise NotImplementedError(
+ "Multi-object tracking requires sigmoid in memory encoder for non-overlapping constraints."
+ )
+ inference_state = {}
+ # inference_state["images"] = images
+ inference_state["num_frames"] = num_frames
+ # whether to offload the video frames to CPU memory
+ # turning on this option saves the GPU memory with only a very small overhead
+ inference_state["offload_video_to_cpu"] = offload_video_to_cpu
+ # whether to offload the inference state to CPU memory
+ # turning on this option saves the GPU memory at the cost of a lower tracking fps
+ # (e.g. in a test case with a 768x768 model, fps dropped from 27 to 24 when tracking one object
+ # and from 24 to 21 when tracking two objects)
+ inference_state["offload_state_to_cpu"] = offload_state_to_cpu
+ # the original video height and width, used for resizing final output scores
+ inference_state["video_height"] = video_height
+ inference_state["video_width"] = video_width
+ inference_state["device"] = torch.device("cuda")
+ if offload_state_to_cpu:
+ inference_state["storage_device"] = torch.device("cpu")
+ else:
+ inference_state["storage_device"] = torch.device("cuda")
+ # inputs on each frame
+ inference_state["point_inputs_per_obj"] = {}
+ inference_state["mask_inputs_per_obj"] = {}
+ # visual features on a small number of recently visited frames for quick interactions
+ inference_state["cached_features"] = (
+ {} if cached_features is None else cached_features
+ )
+ # values that don't change across frames (so we only need to hold one copy of them)
+ inference_state["constants"] = {}
+ # mapping between client-side object id and model-side object index
+ inference_state["obj_id_to_idx"] = OrderedDict()
+ inference_state["obj_idx_to_id"] = OrderedDict()
+ inference_state["obj_ids"] = []
+ # A storage to hold the model's tracking results and states on each frame
+ inference_state["output_dict"] = {
+ "cond_frame_outputs": {}, # dict containing {frame_idx: compact frame output}
+ "non_cond_frame_outputs": {}, # dict containing {frame_idx: compact frame output}
+ }
+ # The index of the frame that received the first annotation
+ inference_state["first_ann_frame_idx"] = None
+ # Slice (view) of each object tracking results, sharing the same memory with "output_dict"
+ inference_state["output_dict_per_obj"] = {}
+ # A temporary storage to hold new outputs when the user interacts with a frame
+ # to add clicks or mask (it's merged into "output_dict" before propagation starts)
+ inference_state["temp_output_dict_per_obj"] = {}
+ # Frames that already hold consolidated outputs from click or mask inputs
+ # (we directly use their consolidated outputs during tracking)
+ inference_state["consolidated_frame_inds"] = {
+ "cond_frame_outputs": set(), # set containing frame indices
+ "non_cond_frame_outputs": set(), # set containing frame indices
+ }
+ # metadata for each tracking frame (e.g. which direction it's tracked)
+ inference_state["tracking_has_started"] = False
+ inference_state["frames_already_tracked"] = {}
+ inference_state["multiplex_state"] = None
+ # (Disabled) Warm up the whole model and cache the image feature on frame 0
+ # by making a dummy click on the first frame (and then cleaning it up)
+ # self.add_new_points(
+ # inference_state=inference_state,
+ # frame_idx=0,
+ # obj_id=1,
+ # points=torch.tensor([[0.5, 0.5]], dtype=torch.float32),
+ # labels=torch.tensor([1], dtype=torch.int32),
+ # clear_old_points=True,
+ # rel_coordinates=True,
+ # )
+ self.clear_all_points_in_video(inference_state)
+ return inference_state
+
+ def _suppress_shrinked_masks(
+ self, pred_masks, new_pred_masks, shrink_threshold=0.3
+ ):
+ area_before = (pred_masks > 0).sum(dim=(-1, -2))
+ area_after = (new_pred_masks > 0).sum(dim=(-1, -2))
+ area_before = torch.clamp(area_before, min=1.0)
+ area_ratio = area_after / area_before
+ keep = area_ratio >= shrink_threshold
+ keep_mask = keep[..., None, None].expand_as(pred_masks)
+ pred_masks_after = torch.where(
+ keep_mask, pred_masks, torch.clamp(pred_masks, max=-10.0)
+ )
+ return pred_masks_after
+
+ @staticmethod
+ def _suppress_object_pw_area_shrinkage(pred_masks):
+ """
+ This function suppresses masks that shrink in area after applying pixel-wise non-overlapping constraints.
+ Note that the final output can still be overlapping.
+ """
+ # Apply pixel-wise non-overlapping constraint based on mask scores
+ # pixel_level_non_overlapping_masks = super()._apply_non_overlapping_constraints(
+ # pred_masks
+ # )
+
+ batch_size = pred_masks.size(0)
+ if batch_size == 1:
+ return pred_masks
+
+ device = pred_masks.device
+ # "max_obj_inds": object index of the object with the highest score at each location
+ max_obj_inds = torch.argmax(pred_masks, dim=0, keepdim=True)
+ # "batch_obj_inds": object index of each object slice (along dim 0) in `pred_masks`
+ batch_obj_inds = torch.arange(batch_size, device=device)[:, None, None, None]
+ keep = max_obj_inds == batch_obj_inds
+ # suppress overlapping regions' scores below -10.0 so that the foreground regions
+ # don't overlap (here sigmoid(-10.0)=4.5398e-05)
+ pixel_level_non_overlapping_masks = torch.where(
+ keep, pred_masks, torch.clamp(pred_masks, max=-10.0)
+ )
+
+ # Fully suppress masks with high shrinkage (probably noisy) under the pixel-wise non-overlapping constraints
+ # NOTE: this step is a no-op if none of the masks shrink by a large factor.
+ # pred_masks = self._suppress_shrinked_masks(
+ # pred_masks, pixel_level_non_overlapping_masks
+ # )
+
+ shrink_threshold = 0.3
+ area_before = (pred_masks > 0).sum(dim=(-1, -2))
+ area_after = (pixel_level_non_overlapping_masks > 0).sum(dim=(-1, -2))
+ area_before = torch.clamp(area_before, min=1.0)
+ area_ratio = area_after / area_before
+ keep = area_ratio >= shrink_threshold
+ keep_mask = keep[..., None, None].expand_as(pred_masks)
+ pred_masks_after = torch.where(
+ keep_mask, pred_masks, torch.clamp(pred_masks, max=-10.0)
+ )
+
+ return pred_masks_after
+
+ def _apply_object_wise_non_overlapping_constraints(
+ self, pred_masks, obj_scores, background_value=-10.0
+ ):
+ """
+ Applies non-overlapping constraints object-wise (i.e. only the object with the highest object score can claim an overlapping region)
+ """
+ # TODO: Try suppression based on IoM here as well.
+ # Replace pixel scores with object scores
+ pred_masks_single_score = torch.where(
+ pred_masks > 0, obj_scores[..., None, None], background_value
+ )
+ # Apply pixel-wise non-overlapping constraint based on mask scores
+ pixel_level_non_overlapping_masks = super()._apply_non_overlapping_constraints(
+ pred_masks_single_score
+ )
+ # Replace object scores with pixel scores. Note that now only one object can claim the overlapping region
+ pred_masks = torch.where(
+ pixel_level_non_overlapping_masks > 0,
+ pred_masks,
+ torch.clamp(pred_masks, max=background_value),
+ )
+ return pred_masks
+
+ @torch.inference_mode()
+ def propagate_in_video(
+ self,
+ inference_state,
+ start_frame_idx,
+ max_frame_num_to_track,
+ reverse,
+ tqdm_disable=False,
+ obj_ids=None,
+ run_mem_encoder=True,
+ ):
+ """Propagate the input points across frames to track in the entire video."""
+ # NOTE: This is a copy from the parent class, except that we return object scores as well.
+ output_dict = inference_state["output_dict"]
+ consolidated_frame_inds = inference_state["consolidated_frame_inds"]
+ if obj_ids is not None:
+ raise NotImplementedError(
+ "Per-object tracking is not yet implemented for batched inference."
+ )
+ obj_ids = inference_state["obj_ids"]
+ batch_size = self._get_obj_num(inference_state)
+ if len(output_dict["cond_frame_outputs"]) == 0:
+ raise RuntimeError("No points are provided; please add points first")
+ clear_non_cond_mem = self.clear_non_cond_mem_around_input and (
+ self.clear_non_cond_mem_for_multi_obj or batch_size <= 1
+ )
+
+ processing_order = self._get_processing_order(
+ inference_state,
+ start_frame_idx,
+ max_frame_num_to_track,
+ reverse,
+ )
+
+ for frame_idx in tqdm(
+ processing_order, desc="propagate in video", disable=tqdm_disable
+ ):
+ # We skip those frames already in consolidated outputs (these are frames
+ # that received input clicks or mask). Note that we cannot directly run
+ # batched forward on them via `_run_single_frame_inference` because the
+ # number of clicks on each object might be different.
+ if frame_idx in consolidated_frame_inds["cond_frame_outputs"]:
+ storage_key = "cond_frame_outputs"
+ current_out = output_dict[storage_key][frame_idx]
+ pred_masks = current_out["pred_masks"]
+ obj_scores = current_out["object_score_logits"]
+ if clear_non_cond_mem:
+ # clear non-conditioning memory of the surrounding frames
+ self._clear_non_cond_mem_around_input(inference_state, frame_idx)
+ elif frame_idx in consolidated_frame_inds["non_cond_frame_outputs"]:
+ storage_key = "non_cond_frame_outputs"
+ current_out = output_dict[storage_key][frame_idx]
+ pred_masks = current_out["pred_masks"]
+ obj_scores = current_out["object_score_logits"]
+ else:
+ storage_key = "non_cond_frame_outputs"
+ with torch.profiler.record_function(
+ "VideoTrackingMultiplexDemo._run_single_frame_inference"
+ ):
+ current_out, pred_masks = self._run_single_frame_inference(
+ inference_state=inference_state,
+ output_dict=output_dict,
+ frame_idx=frame_idx,
+ batch_size=batch_size,
+ is_init_cond_frame=False,
+ point_inputs=None,
+ mask_inputs=None,
+ reverse=reverse,
+ run_mem_encoder=run_mem_encoder,
+ )
+ obj_scores = current_out["object_score_logits"]
+ current_out["local_obj_id_to_idx"] = deepcopy(
+ inference_state["obj_id_to_idx"]
+ )
+ output_dict[storage_key][frame_idx] = current_out
+
+ # Create slices of per-object outputs for subsequent interaction with each
+ # individual object after tracking.
+ self._add_output_per_object(
+ inference_state, frame_idx, current_out, storage_key
+ )
+ inference_state["frames_already_tracked"][frame_idx] = {"reverse": reverse}
+
+ # Resize the output mask to the original video resolution (we directly use
+ # the mask scores on GPU for output to avoid any CPU conversion in between)
+ low_res_masks, video_res_masks = self._get_orig_video_res_output(
+ inference_state, pred_masks
+ )
+ yield frame_idx, obj_ids, low_res_masks, video_res_masks, obj_scores
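The area-shrinkage suppression in `_suppress_object_pw_area_shrinkage` above can be hard to follow through the tensor broadcasting. Below is a minimal pure-Python sketch of the same logic as read from this diff. It assumes each object's mask is a nested list of logits (H x W) rather than a torch tensor. The names `suppress_pw_area_shrinkage`, `Mask`, and `SUPPRESSED_LOGIT` are hypothetical and are not part of the repo.

```python
from typing import List

Mask = List[List[float]]  # hypothetical: one object's H x W grid of mask logits

# Logit used to suppress pixels, mirroring the -10.0 in the patch (sigmoid(-10) ~ 4.5e-5).
SUPPRESSED_LOGIT = -10.0


def suppress_pw_area_shrinkage(
    pred_masks: List[Mask], shrink_threshold: float = 0.3
) -> List[Mask]:
    """Suppress objects that lose too much foreground area under a pixel-wise argmax."""
    num_objs = len(pred_masks)
    if num_objs == 1:
        return pred_masks
    height, width = len(pred_masks[0]), len(pred_masks[0][0])

    # Pixel-wise non-overlapping step: record which object has the highest logit at
    # each pixel (Python's max() returns the first index among ties).
    winner = [
        [max(range(num_objs), key=lambda i: pred_masks[i][y][x]) for x in range(width)]
        for y in range(height)
    ]

    result = []
    for obj_idx, mask in enumerate(pred_masks):
        area_before = sum(v > 0 for row in mask for v in row)
        # Foreground pixels this object still owns after the pixel-wise step.
        area_after = sum(
            mask[y][x] > 0 and winner[y][x] == obj_idx
            for y in range(height)
            for x in range(width)
        )
        area_ratio = area_after / max(area_before, 1)  # avoid division by zero
        if area_ratio >= shrink_threshold:
            # Keep the ORIGINAL logits, so kept masks may still overlap.
            result.append([row[:] for row in mask])
        else:
            # Clamp every logit to at most SUPPRESSED_LOGIT (object effectively removed).
            result.append([[min(v, SUPPRESSED_LOGIT) for v in row] for row in mask])
    return result


big = [[5.0, 5.0], [5.0, 5.0]]
small = [[1.0, -1.0], [-1.0, -1.0]]  # its only foreground pixel is claimed by `big`
print(suppress_pw_area_shrinkage([big, small])[1])
# [[-10.0, -10.0], [-10.0, -10.0]]
```

As in the patch, the pixel-wise argmax is used only to *measure* how much area each object would lose; objects that pass the threshold keep their original logits. This is why the method's docstring warns that the final output can still overlap.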
diff --git a/sam3/model/vitdet.py b/sam3/model/vitdet.py
index bc4eeb0..f6771a1 100644
--- a/sam3/model/vitdet.py
+++ b/sam3/model/vitdet.py
@@ -22,13 +22,58 @@
import torch.utils.checkpoint as checkpoint
try:
- from timm.layers import DropPath, Mlp, trunc_normal_
+ from timm.layers import DropPath, trunc_normal_
except ModuleNotFoundError:
# compatibility for older timm versions
- from timm.models.layers import DropPath, Mlp, trunc_normal_
+ from timm.models.layers import DropPath, trunc_normal_
+from sam3.model.data_misc import NestedTensor
+from sam3.model.model_misc import AttentionType, LayerScale
+from sam3.perflib.fused import addmm_act
+from sam3.sam.rope import apply_rotary_enc_real, VisionRotaryEmbeddingVE
from torch import Tensor
-from .model_misc import LayerScale
+
+class Mlp(nn.Module):
+ """MLP as used in Vision Transformer, MLP-Mixer and related networks.
+
+ fc1 and the activation are computed together via sam3.perflib.fused.addmm_act.
+ """
+
+ def __init__(
+ self,
+ in_features,
+ hidden_features=None,
+ out_features=None,
+ act_layer=nn.GELU,
+ norm_layer=None,
+ bias=True,
+ drop=0.0,
+ use_conv=False,
+ ):
+ super().__init__()
+ out_features = out_features or in_features
+ hidden_features = hidden_features or in_features
+ if isinstance(bias, bool):
+ bias = (bias, bias)
+ if isinstance(drop, (int, float)):
+ drop_probs = (drop, drop)
+ else:
+ drop_probs = drop
+ linear_layer = partial(nn.Conv2d, kernel_size=1) if use_conv else nn.Linear
+
+ self.fc1 = linear_layer(in_features, hidden_features, bias=bias[0])
+ self.act = act_layer()
+ self.drop1 = nn.Dropout(drop_probs[0])
+ self.norm = (
+ norm_layer(hidden_features) if norm_layer is not None else nn.Identity()
+ )
+ self.fc2 = linear_layer(hidden_features, out_features, bias=bias[1])
+ self.drop2 = nn.Dropout(drop_probs[1])
+
+ def forward(self, x):
+ x = addmm_act(type(self.act), self.fc1, x)
+ x = self.drop1(x)
+ x = self.norm(x)
+ x = self.fc2(x)
+ x = self.drop2(x)
+ return x
def init_t_xy(
@@ -349,11 +394,16 @@ def __init__(
use_rel_pos: bool = False,
rel_pos_zero_init: bool = True,
input_size: Optional[Tuple[int, int]] = None,
+ attn_type: AttentionType = AttentionType.Vanilla,
cls_token: bool = False,
use_rope: bool = False,
rope_theta: float = 10000.0,
rope_pt_size: Optional[Tuple[int, int]] = None,
rope_interp: bool = False,
+ rope_tiled: bool = False,
+ use_ve_rope: bool = False,
+ use_fa3: bool = False,
+ use_rope_real: bool = False,
):
"""
Args:
@@ -369,7 +419,9 @@ def __init__(
use_rope: whether to use rope 2d (indep of use_rel_pos, as it can be used together)
rope_theta: control frequencies of rope
rope_pt_size: size of rope in previous stage of training, needed for interpolation or tiling
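+ rope_tiled: whether to tile RoPE; each tile has size rope_pt_size and input_size must be a multiple of it
rope_interp: whether to interpolate (or extrapolate) rope to match input size
+ use_ve_rope: use the original VE RoPE implementation; only needed if small numerical differences matter (normally they do not)
+ attn_type: attention variant; only AttentionType.Vanilla is implemented
+ use_fa3: use FlashAttention 3 instead of F.scaled_dot_product_attention
+ use_rope_real: apply RoPE from the real and imaginary parts of freqs_cis (float buffers) instead of the complex tensor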
"""
super().__init__()
self.num_heads = num_heads
@@ -377,6 +429,7 @@ def __init__(
self.scale = self.head_dim**-0.5
self.cls_token = cls_token
+ self.attn_type = attn_type
self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
self.proj = nn.Linear(dim, dim)
@@ -388,6 +441,10 @@ def __init__(
self.rope_theta = rope_theta
self.rope_pt_size = rope_pt_size
self.rope_interp = rope_interp
+ self.rope_tiled = rope_tiled
+ self.use_ve_rope = use_ve_rope
+ self.use_fa3 = use_fa3
+ self.use_rope_real = use_rope_real
# init rel_pos embeddings and rope
self._setup_rel_pos(rel_pos_zero_init)
@@ -430,6 +487,15 @@ def _setup_rope_freqs(self) -> None:
if self.rope_pt_size is None:
self.rope_pt_size = self.input_size
+ if self.use_ve_rope:
+ assert not self.rope_tiled, "rope_tiled is not supported with use_ve_rope"
+ self.rope = VisionRotaryEmbeddingVE(
+ dim=self.head_dim // 2,
+ seq_len=self.input_size[0],
+ pt_seq_len=self.rope_pt_size[0],
+ )
+ return
+
# initialize 2d rope freqs
self.compute_cis = partial(
compute_axial_cis,
@@ -437,16 +503,40 @@ def _setup_rope_freqs(self) -> None:
theta=self.rope_theta,
)
- # interpolate rope
- scale_pos = 1.0
- if self.rope_interp:
- scale_pos = self.rope_pt_size[0] / self.input_size[0]
- # get scaled freqs_cis
- freqs_cis = self.compute_cis(
- end_x=self.input_size[0],
- end_y=self.input_size[1],
- scale_pos=scale_pos,
- )
+ if self.rope_pt_size != self.input_size and self.rope_tiled:
+ assert not self.rope_interp
+ # window/tiled rope
+ freqs_cis = self.compute_cis(
+ end_x=self.rope_pt_size[0], end_y=self.rope_pt_size[1]
+ )
+ # check dims are tileable
+ rh, rw = (
+ self.input_size[0] // self.rope_pt_size[0],
+ self.input_size[1] // self.rope_pt_size[1],
+ )
+ assert rh >= 1 and rw >= 1, "input_size must be at least rope_pt_size"
+ assert (
+ self.input_size[0] % self.rope_pt_size[0] == 0
+ and self.input_size[1] % self.rope_pt_size[1] == 0
+ )
+
+ # restore spatial shape, tile and then flatten spatial dims
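+ # Illustrative sizes (not from any config): rope_pt_size=(24, 24) and
+ # input_size=(72, 72) give rh = rw = 3, so the (576, C) freqs are reshaped to
+ # (24, 24, C), tiled to (72, 72, C) and flattened back to (5184, C); every
+ # 24x24 tile then reuses the same positional phases.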
+ freqs_cis = (
+ freqs_cis.reshape(self.rope_pt_size[0], self.rope_pt_size[1], -1)
+ .tile(rh, rw, 1)
+ .reshape(-1, freqs_cis.shape[-1])
+ )
+ else:
+ # interpolate rope
+ scale_pos = 1.0
+ if self.rope_interp:
+ scale_pos = self.rope_pt_size[0] / self.input_size[0]
+ # get scaled freqs_cis
+ freqs_cis = self.compute_cis(
+ end_x=self.input_size[0],
+ end_y=self.input_size[1],
+ scale_pos=scale_pos,
+ )
if self.cls_token:
t = torch.zeros(
self.head_dim // 2,
@@ -457,12 +547,27 @@ def _setup_rope_freqs(self) -> None:
freqs_cis = torch.cat([cls_freqs_cis, freqs_cis], dim=0)
self.register_buffer("freqs_cis", freqs_cis)
+ if self.use_rope_real:
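+ # freqs_cis holds unit phasors cos(t) + i*sin(t). Keeping .real and .imag as
+ # float buffers allows a complex-free rotation: a pair (a, b) maps to
+ # (a*cos(t) - b*sin(t), a*sin(t) + b*cos(t)), the real form of (a + ib) * e^(it).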
+ self.register_buffer("freqs_cis_real", freqs_cis.real)
+ self.register_buffer("freqs_cis_imag", freqs_cis.imag)
def _apply_rope(self, q, k) -> Tuple[Tensor, Tensor]:
if not self.use_rope:
return q, k
+ if self.use_ve_rope:
+ dtype = q.dtype
+ return self.rope(q).to(dtype), self.rope(k).to(dtype)
+
assert self.freqs_cis is not None
+
+ if self.use_rope_real:
+ return apply_rotary_enc_real(
+ q,
+ k,
+ freqs_cis_imag=self.freqs_cis_imag,
+ freqs_cis_real=self.freqs_cis_real,
+ )
return apply_rotary_enc(q, k, freqs_cis=self.freqs_cis)
def forward(self, x: Tensor) -> Tensor:
@@ -501,7 +606,17 @@ def forward(self, x: Tensor) -> Tensor:
q = q.reshape(B, self.num_heads, H * W, -1)
k = k.reshape(B, self.num_heads, H * W, -1)
- x = F.scaled_dot_product_attention(q, k, v)
+ if self.attn_type == AttentionType.Vanilla:
+ if self.use_fa3:
+ from sam3.perflib.fa3 import flash_attn_func
+
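+ # SDPA takes (B, num_heads, L, head_dim); this FA3 call is given
+ # (B, L, num_heads, head_dim), hence the transposes in and out. Illustrative
+ # shapes: B=1, 16 heads, L=72*72=5184, head_dim=64 -> (1, 5184, 16, 64).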
+ x = flash_attn_func(
+ q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
+ ).transpose(1, 2)
+ else:
+ x = F.scaled_dot_product_attention(q, k, v)
+ else:
+ raise NotImplementedError(f"Unsupported attention type: {self.attn_type}")
if ndim == 4:
x = (
@@ -541,6 +656,9 @@ def __init__(
cls_token: bool = False,
dropout: float = 0.0,
init_values: Optional[float] = None,
+ attn_type: AttentionType = AttentionType.Vanilla,
+ use_fa3: bool = False,
+ use_rope_real: bool = False,
):
"""
Args:
@@ -561,8 +679,10 @@ def __init__(
cls_token: whether a cls_token is present.
use_rope: whether to use rope 2d (indep of use_rel_pos, as it can be used together)
rope_pt_size: size of rope in previous stage of training, needed for interpolation or tiling
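+ rope_tiled: whether to tile RoPE; each tile has size rope_pt_size and input_size must be a multiple of it
rope_interp: whether to interpolate (or extrapolate) rope to match target input size,
expected to specify source size as rope_pt_size.
+ use_ve_rope: use the original VE RoPE implementation; only needed if small numerical differences matter (normally they do not)
+ attn_type: attention variant passed to Attention; only AttentionType.Vanilla is implemented
+ use_fa3: use FlashAttention 3 in Attention instead of F.scaled_dot_product_attention
+ use_rope_real: apply RoPE from the real and imaginary parts of freqs_cis instead of the complex tensor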
"""
super().__init__()
self.norm1 = norm_layer(dim)
@@ -573,10 +693,15 @@ def __init__(
use_rel_pos=use_rel_pos,
rel_pos_zero_init=rel_pos_zero_init,
input_size=input_size if window_size == 0 else (window_size, window_size),
+ attn_type=attn_type,
use_rope=use_rope,
rope_pt_size=rope_pt_size,
+ rope_tiled=rope_tiled,
rope_interp=rope_interp,
+ use_ve_rope=use_ve_rope,
cls_token=cls_token,
+ use_fa3=use_fa3,
+ use_rope_real=use_rope_real,
)
self.ls1 = (
LayerScale(dim, init_values=init_values) if init_values else nn.Identity()
@@ -642,19 +767,24 @@ def __init__(
window_size: int = 14,
global_att_blocks: Tuple[int, ...] = (2, 5, 8, 11),
use_rope: bool = False,
+ use_tiled_rope: bool = False,
rope_pt_size: Optional[int] = None,
use_interp_rope: bool = False,
+ use_ve_rope: bool = False,
+ use_act_checkpoint: bool = True,
pretrain_img_size: int = 224,
pretrain_use_cls_token: bool = True,
retain_cls_token: bool = True,
dropout: float = 0.0,
return_interm_layers: bool = False,
init_values: Optional[float] = None, # for layerscale
+ attn_type: AttentionType = AttentionType.Vanilla,
ln_pre: bool = False,
ln_post: bool = False,
bias_patch_embed: bool = True,
compile_mode: Optional[str] = None,
- use_act_checkpoint: bool = True,
+ use_fa3: bool = False,
+ use_rope_real: bool = False,
):
"""
Args:
@@ -768,10 +898,15 @@ def __init__(
if rope_pt_size is None
else (rope_pt_size, rope_pt_size)
),
+ rope_tiled=use_tiled_rope,
+ use_ve_rope=use_ve_rope,
rope_interp=use_interp_rope,
cls_token=self.retain_cls_token,
dropout=dropout,
init_values=init_values,
+ attn_type=attn_type,
+ use_fa3=use_fa3,
+ use_rope_real=use_rope_real,
)
if i not in window_block_indexes:
@@ -812,7 +947,14 @@ def _init_weights(self, m: nn.Module) -> None:
nn.init.constant_(m.bias, 0)
nn.init.constant_(m.weight, 1.0)
- def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
+ def forward(self, tensor_list):
+ if isinstance(tensor_list, NestedTensor):
+ x = tensor_list.tensors
+ mask = tensor_list.mask
+ else:
+ x = tensor_list
+ mask = None
+
x = self.patch_embed(x)
h, w = x.shape[1], x.shape[2]
@@ -835,6 +977,7 @@ def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
x = self.ln_pre(x)
outputs = []
+ masks = None
for i, blk in enumerate(self.blocks):
if self.use_act_checkpoint and self.training:
x = checkpoint.checkpoint(blk, x, use_reentrant=False)
@@ -856,7 +999,15 @@ def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
feats.shape[0], h, w, feats.shape[-1]
).permute(0, 3, 1, 2)
- outputs.append(feats)
+ if isinstance(tensor_list, NestedTensor):
+ # Interpolate the padding mask only if it has any True entry, and only once:
+ # all outputs share the same (h, w), so the resized mask is reused
+ if mask is not None and mask.any() and masks is None:
+ masks = F.interpolate(
+ mask[None].float(), size=feats.shape[-2:]
+ ).bool()[0]
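+ # e.g. (illustrative) a (B, 1008, 1008) padding mask: mask[None] is
+ # (1, B, 1008, 1008), nearest-resized to (1, B, 72, 72); [0] gives (B, 72, 72)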
+ outputs.append(NestedTensor(feats, masks))
+ else:
+ outputs.append(feats)
return outputs
diff --git a/sam3/model/vl_combiner.py b/sam3/model/vl_combiner.py
index faf5504..b2cf102 100644
--- a/sam3/model/vl_combiner.py
+++ b/sam3/model/vl_combiner.py
@@ -12,7 +12,8 @@
from torch.nn.attention import sdpa_kernel, SDPBackend
from .act_ckpt_utils import activation_ckpt_wrapper
-from .necks import Sam3DualViTDetNeck
+from .data_misc import NestedTensor
+from .necks import Sam3DualViTDetNeck, Sam3TriViTDetNeck
class SAM3VLBackbone(nn.Module):
@@ -175,3 +176,255 @@ def _forward_text_no_ack_ckpt(
)
return output
+
+
+class SAM3VLBackboneTri(SAM3VLBackbone):
+ """VL backbone with triple-head vision (sam3, interactive, propagation) + text encoder."""
+
+ def __init__(self, visual, text, compile_visual=False, scalp=0):
+ super().__init__(
+ visual=visual, text=text, compile_visual=compile_visual, scalp=scalp
+ )
+ assert isinstance(self.vision_backbone, Sam3TriViTDetNeck), (
+ f"Expected vision backbone to be of type Sam3TriViTDetNeck, got {type(self.vision_backbone)}"
+ )
+
+ def forward_image(
+ self,
+ samples,
+ *,
+ need_sam3_out: bool = True,
+ need_interactive_out: bool = True,
+ need_propagation_out: bool = True,
+ ):
+ return activation_ckpt_wrapper(self._forward_image_tri_no_act_ckpt)(
+ samples=samples,
+ need_sam3_out=need_sam3_out,
+ need_interactive_out=need_interactive_out,
+ need_propagation_out=need_propagation_out,
+ act_ckpt_enable=self.act_ckpt_whole_vision_backbone and self.training,
+ )
+
+ def _forward_image_tri_no_act_ckpt(
+ self,
+ samples,
+ need_sam3_out=True,
+ need_interactive_out=True,
+ need_propagation_out=True,
+ ):
+ (
+ sam3_features,
+ sam3_pos,
+ interactive_features,
+ interactive_pos,
+ propagation_features,
+ propagation_pos,
+ ) = self.vision_backbone.forward(
+ samples,
+ need_sam3_out=need_sam3_out,
+ need_interactive_out=need_interactive_out,
+ need_propagation_out=need_propagation_out,
+ )
+ if self.scalp > 0:
+ sam3_features, sam3_pos = (
+ sam3_features[: -self.scalp],
+ sam3_pos[: -self.scalp],
+ )
+ interactive_features, interactive_pos = (
+ interactive_features[: -self.scalp],
+ interactive_pos[: -self.scalp],
+ )
+ propagation_features, propagation_pos = (
+ propagation_features[: -self.scalp],
+ propagation_pos[: -self.scalp],
+ )
+
+ output = {}
+ if need_sam3_out:
+ sam3_last = sam3_features[-1]
+ output.update(
+ {
+ "vision_features": sam3_last.tensors,
+ "vision_mask": sam3_last.mask,
+ "vision_pos_enc": sam3_pos,
+ "backbone_fpn": sam3_features,
+ }
+ )
+ if need_interactive_out:
+ inte_last = interactive_features[-1]
+ output["interactive"] = {
+ "vision_features": inte_last.tensors,
+ "vision_mask": inte_last.mask,
+ "vision_pos_enc": interactive_pos,
+ "backbone_fpn": interactive_features,
+ }
+ if need_propagation_out:
+ prop_last = propagation_features[-1]
+ output["sam2_backbone_out"] = {
+ "vision_features": prop_last.tensors,
+ "vision_mask": prop_last.mask,
+ "vision_pos_enc": propagation_pos,
+ "backbone_fpn": propagation_features,
+ }
+ return output
+
+
+class VisionOnly(nn.Module):
+ def __init__(
+ self,
+ visual,
+ n_features,
+ forward_in_chunk_for_eval=False,
+ eval_chunk_size=4,
+ eval_cast_to_cpu=False,
+ scalp=0,
+ compile_mode: Optional[str] = None,
+ compile_extra_args: Optional[dict] = None,
+ ):
+ super().__init__()
+ self.vision_backbone = visual
+ self.should_compile = compile_mode is not None or compile_extra_args is not None
+ self.compile_mode = compile_mode
+ self.compile_extra_args = compile_extra_args or {}
+ self.compiled = False
+ self.n_features = n_features
+ self.forward_in_chunk_for_eval = forward_in_chunk_for_eval
+ self.eval_chunk_size = eval_chunk_size
+ self.eval_cast_to_cpu = eval_cast_to_cpu
+ self.scalp = scalp
+
+ def _compile(self):
+ if self.should_compile and not self.compiled:
+ self.vision_backbone = torch.compile(
+ self.vision_backbone, mode=self.compile_mode, **self.compile_extra_args
+ )
+ self.compiled = True
+
+ def forward_image(self, samples):
+ self._compile()
+ # Forward through backbone
+ features, pos = self.vision_backbone(samples)
+ if self.scalp > 0:
+ features, pos = features[: -self.scalp], pos[: -self.scalp]
+ elif self.scalp < 0:
+ features.pop(self.scalp)
+ pos.pop(self.scalp)
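+ # e.g. features=[f0, f1, f2, f3]: scalp=1 keeps [f0, f1, f2] (drops the last
+ # level), while scalp=-2 pops only index -2, leaving [f0, f1, f3]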
+
+ src, mask = features[-1].decompose()
+ output = {
+ "vision_features": src,
+ "vision_mask": mask,
+ "vision_pos_enc": pos,
+ "backbone_fpn": features,
+ }
+ return output
+
+ def forward_text(
+ self,
+ captions,
+ input_boxes=None,
+ additional_text=None,
+ device="cuda",
+ ):
+ bs = len(captions)
+ output = {
+ "language_features": torch.zeros((0, bs, self.n_features), device=device),
+ "language_mask": torch.zeros((bs, 0), device=device),
+ }
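+ # e.g. 2 captions with n_features=256 -> language_features of shape (0, 2, 256)
+ # and language_mask of shape (2, 0): empty placeholders for the vision-only model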
+ return output
+
+
+class TriHeadVisionOnly(VisionOnly):
+ def __init__(
+ self,
+ visual,
+ n_features,
+ forward_in_chunk_for_eval=False,
+ eval_chunk_size=4,
+ eval_cast_to_cpu=False,
+ scalp=0,
+ compile_mode: Optional[str] = None,
+ compile_extra_args: Optional[dict] = None,
+ ):
+ super().__init__(
+ visual=visual,
+ n_features=n_features,
+ forward_in_chunk_for_eval=forward_in_chunk_for_eval,
+ eval_chunk_size=eval_chunk_size,
+ eval_cast_to_cpu=eval_cast_to_cpu,
+ scalp=scalp,
+ compile_mode=compile_mode,
+ compile_extra_args=compile_extra_args,
+ )
+ assert isinstance(self.vision_backbone, Sam3TriViTDetNeck), (
+ f"Expected vision backbone to be of type Sam3TriViTDetNeck, got {type(self.vision_backbone)}"
+ )
+
+ def forward_image(
+ self,
+ samples,
+ *,
+ need_sam3_out: bool = True,
+ need_interactive_out: bool = True,
+ need_propagation_out: bool = True,
+ ):
+ self._compile()
+ # Forward through backbone
+ (
+ sam3_features,
+ sam3_pos,
+ interactive_features,
+ interactive_pos,
+ propagation_features,
+ propagation_pos,
+ ) = self.vision_backbone(
+ samples,
+ need_sam3_out=need_sam3_out,
+ need_interactive_out=need_interactive_out,
+ need_propagation_out=need_propagation_out,
+ )
+
+ if self.scalp > 0:
+ sam3_features, sam3_pos = (
+ sam3_features[: -self.scalp],
+ sam3_pos[: -self.scalp],
+ )
+ interactive_features, interactive_pos = (
+ interactive_features[: -self.scalp],
+ interactive_pos[: -self.scalp],
+ )
+ propagation_features, propagation_pos = (
+ propagation_features[: -self.scalp],
+ propagation_pos[: -self.scalp],
+ )
+
+ output = {}
+
+ if need_sam3_out:
+ sam3_last = sam3_features[-1]
+ output.update(
+ {
+ "vision_features": sam3_last.tensors,
+ "vision_mask": sam3_last.mask,
+ "vision_pos_enc": sam3_pos,
+ "backbone_fpn": sam3_features,
+ }
+ )
+ if need_interactive_out:
+ inte_last = interactive_features[-1]
+ output["interactive"] = {
+ "vision_features": inte_last.tensors,
+ "vision_mask": inte_last.mask,
+ "vision_pos_enc": interactive_pos,
+ "backbone_fpn": interactive_features,
+ }
+ if need_propagation_out:
+ prop_last = propagation_features[-1]
+ output["sam2_backbone_out"] = {
+ "vision_features": prop_last.tensors,
+ "vision_mask": prop_last.mask,
+ "vision_pos_enc": propagation_pos,
+ "backbone_fpn": propagation_features,
+ }
+
+ return output
diff --git a/sam3/model_builder.py b/sam3/model_builder.py
index 103b324..af15056 100644
--- a/sam3/model_builder.py
+++ b/sam3/model_builder.py
@@ -11,10 +11,13 @@
from huggingface_hub import hf_hub_download
from iopath.common.file_io import g_pathmgr
from sam3.model.decoder import (
+ DecoupledTransformerDecoderLayerv2,
+ SimpleRoPEAttention,
TransformerDecoder,
TransformerDecoderLayer,
TransformerDecoderLayerv2,
TransformerEncoderCrossAttention,
+ TransformerEncoderDecoupledCrossAttention,
)
from sam3.model.encoder import TransformerEncoderFusion, TransformerEncoderLayer
from sam3.model.geometry_encoders import SequenceGeometryEncoder
@@ -31,7 +34,8 @@
MultiheadAttentionWrapper as MultiheadAttention,
TransformerWrapper,
)
-from sam3.model.necks import Sam3DualViTDetNeck
+from sam3.model.multiplex_utils import MultiplexController
+from sam3.model.necks import Sam3DualViTDetNeck, Sam3TriViTDetNeck
from sam3.model.position_encoding import PositionEmbeddingSine
from sam3.model.sam1_task_predictor import SAM3InteractiveImagePredictor
from sam3.model.sam3_image import Sam3Image, Sam3ImageOnVideoMultiGPU
@@ -40,8 +44,9 @@
from sam3.model.sam3_video_predictor import Sam3VideoPredictorMultiGPU
from sam3.model.text_encoder_ve import VETextEncoder
from sam3.model.tokenizer_ve import SimpleTokenizer
+from sam3.model.video_tracking_multiplex import VideoTrackingDynamicMultiplex
from sam3.model.vitdet import ViT
-from sam3.model.vl_combiner import SAM3VLBackbone
+from sam3.model.vl_combiner import SAM3VLBackbone, SAM3VLBackboneTri, TriHeadVisionOnly
from sam3.sam.transformer import RoPEAttention
@@ -69,7 +74,7 @@ def _create_position_encoding(precompute_resolution=None):
)
-def _create_vit_backbone(compile_mode=None):
+def _create_vit_backbone(compile_mode=None, use_fa3=False, use_rope_real=False):
"""Create ViT backbone for visual feature extraction."""
return ViT(
img_size=1008,
@@ -96,6 +101,8 @@ def _create_vit_backbone(compile_mode=None):
return_interm_layers=False,
bias_patch_embed=False,
compile_mode=compile_mode,
+ use_fa3=use_fa3,
+ use_rope_real=use_rope_real,
)
@@ -115,7 +122,10 @@ def _create_vl_backbone(vit_neck, text_encoder):
return SAM3VLBackbone(visual=vit_neck, text=text_encoder, scalp=1)
-def _create_transformer_encoder() -> TransformerEncoderFusion:
+def _create_transformer_encoder(
+ use_fa3: bool = False,
+ num_feature_levels: int = 1,
+) -> TransformerEncoderFusion:
"""Create transformer encoder with its layer."""
encoder_layer = TransformerEncoderLayer(
activation="relu",
@@ -131,12 +141,14 @@ def _create_transformer_encoder() -> TransformerEncoderFusion:
dropout=0.1,
embed_dim=256,
batch_first=True,
+ use_fa3=use_fa3,
),
cross_attention=MultiheadAttention(
num_heads=8,
dropout=0.1,
embed_dim=256,
batch_first=True,
+ use_fa3=use_fa3,
),
)
@@ -144,7 +156,7 @@ def _create_transformer_encoder() -> TransformerEncoderFusion:
layer=encoder_layer,
num_layers=6,
d_model=256,
- num_feature_levels=1,
+ num_feature_levels=num_feature_levels,
frozen=False,
use_act_checkpoint=True,
add_pooled_text_to_img_feat=False,
@@ -153,7 +165,7 @@ def _create_transformer_encoder() -> TransformerEncoderFusion:
return encoder
-def _create_transformer_decoder() -> TransformerDecoder:
+def _create_transformer_decoder(use_fa3=False) -> TransformerDecoder:
"""Create transformer decoder with its layer."""
decoder_layer = TransformerDecoderLayer(
activation="relu",
@@ -164,6 +176,7 @@ def _create_transformer_decoder() -> TransformerDecoder:
num_heads=8,
dropout=0.1,
embed_dim=256,
+ use_fa3=use_fa3,
),
n_heads=8,
use_text_cross_attention=True,
@@ -204,7 +217,7 @@ def _create_dot_product_scoring():
return DotProductScoring(d_model=256, d_proj=256, prompt_mlp=prompt_mlp)
-def _create_segmentation_head(compile_mode=None):
+def _create_segmentation_head(compile_mode=None, use_fa3=False):
"""Create segmentation head with pixel decoder."""
pixel_decoder = PixelDecoder(
num_upsampling_stages=3,
@@ -217,6 +230,7 @@ def _create_segmentation_head(compile_mode=None):
num_heads=8,
dropout=0,
embed_dim=256,
+ use_fa3=use_fa3,
)
segmentation_head = UniversalSegmentationHead(
@@ -296,6 +310,7 @@ def _create_sam3_model(
dot_prod_scoring,
inst_interactive_predictor,
eval_mode,
+ num_feature_levels: int = 1,
):
"""Create the SAM3 image model."""
common_params = {
@@ -303,7 +318,7 @@ def _create_sam3_model(
"transformer": transformer,
"input_geometry_encoder": input_geometry_encoder,
"segmentation_head": segmentation_head,
- "num_feature_levels": 1,
+ "num_feature_levels": num_feature_levels,
"o2m_mask_predict": True,
"dot_prod_scoring": dot_prod_scoring,
"use_instance_query": False,
@@ -515,10 +530,16 @@ def _create_vision_backbone(
return vit_neck
-def _create_sam3_transformer(has_presence_token: bool = True) -> TransformerWrapper:
+def _create_sam3_transformer(
+ has_presence_token: bool = True,
+ use_fa3: bool = False,
+ num_feature_levels: int = 1,
+) -> TransformerWrapper:
"""Create SAM3 transformer encoder and decoder."""
- encoder: TransformerEncoderFusion = _create_transformer_encoder()
- decoder: TransformerDecoder = _create_transformer_decoder()
+ encoder: TransformerEncoderFusion = _create_transformer_encoder(
+ use_fa3=use_fa3, num_feature_levels=num_feature_levels
+ )
+ decoder: TransformerDecoder = _create_transformer_decoder(use_fa3=use_fa3)
return TransformerWrapper(encoder=encoder, decoder=decoder, d_model=256)
@@ -566,6 +587,7 @@ def build_sam3_image_model(
enable_segmentation=True,
enable_inst_interactivity=False,
compile=False,
+ num_feature_levels: int = 1,
):
"""
Build SAM3 image model
@@ -600,7 +622,7 @@ def build_sam3_image_model(
backbone = _create_vl_backbone(vision_encoder, text_encoder)
# Create transformer components
- transformer = _create_sam3_transformer()
+ transformer = _create_sam3_transformer(num_feature_levels=num_feature_levels)
# Create dot product scoring
dot_prod_scoring = _create_dot_product_scoring()
@@ -628,9 +650,10 @@ def build_sam3_image_model(
dot_prod_scoring,
inst_predictor,
eval_mode,
+ num_feature_levels=num_feature_levels,
)
if load_from_HF and checkpoint_path is None:
- checkpoint_path = download_ckpt_from_hf()
+ checkpoint_path = download_ckpt_from_hf(version="sam3")
# Load checkpoint if provided
if checkpoint_path is not None:
_load_checkpoint(model, checkpoint_path)
@@ -641,12 +664,22 @@ def build_sam3_image_model(
return model
-def download_ckpt_from_hf():
- SAM3_MODEL_ID = "facebook/sam3"
- SAM3_CKPT_NAME = "sam3.pt"
- SAM3_CFG_NAME = "config.json"
- _ = hf_hub_download(repo_id=SAM3_MODEL_ID, filename=SAM3_CFG_NAME)
- checkpoint_path = hf_hub_download(repo_id=SAM3_MODEL_ID, filename=SAM3_CKPT_NAME)
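+def download_ckpt_from_hf(version="sam3"):
+ """Download a model checkpoint (and its config.json) from the Hugging Face Hub.
+
+ Args:
+ version: "sam3" (facebook/sam3, sam3.pt) or "sam3.1"
+ (facebook/sam3.1, sam3.1_multiplex.pt)
+
+ Returns:
+ Local path to the downloaded checkpoint file.
+ """
+ if version == "sam3.1":
+ repo_id = "facebook/sam3.1"
+ ckpt_name = "sam3.1_multiplex.pt"
+ elif version == "sam3":
+ repo_id = "facebook/sam3"
+ ckpt_name = "sam3.pt"
+ else:
+ raise ValueError(f"Unknown version {version!r}; expected 'sam3' or 'sam3.1'")
+ cfg_name = "config.json"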
+ _ = hf_hub_download(repo_id=repo_id, filename=cfg_name)
+ checkpoint_path = hf_hub_download(repo_id=repo_id, filename=ckpt_name)
return checkpoint_path
@@ -683,7 +716,9 @@ def build_sam3_video_model(
visual_neck = _create_vision_backbone()
text_encoder = _create_text_encoder(bpe_path)
backbone = SAM3VLBackbone(scalp=1, visual=visual_neck, text=text_encoder)
- transformer = _create_sam3_transformer(has_presence_token=has_presence_token)
+ transformer = _create_sam3_transformer(
+ has_presence_token=has_presence_token, num_feature_levels=1
+ )
segmentation_head: UniversalSegmentationHead = _create_segmentation_head()
input_geometry_encoder = _create_geometry_encoder()
@@ -772,7 +807,7 @@ def build_sam3_video_model(
# Load checkpoint if provided
if load_from_HF and checkpoint_path is None:
- checkpoint_path = download_ckpt_from_hf()
+ checkpoint_path = download_ckpt_from_hf(version="sam3")
if checkpoint_path is not None:
with g_pathmgr.open(checkpoint_path, "rb") as f:
ckpt = torch.load(f, map_location="cpu", weights_only=True)
@@ -795,3 +830,501 @@ def build_sam3_video_predictor(*model_args, gpus_to_use=None, **model_kwargs):
return Sam3VideoPredictorMultiGPU(
*model_args, gpus_to_use=gpus_to_use, **model_kwargs
)
+
+
+def _create_multiplex_maskmem_backbone(multiplex_count=16):
+ """Create the multiplex memory encoder with per-object mask channels."""
+ position_encoding = PositionEmbeddingSine(
+ num_pos_feats=256,
+ normalize=True,
+ scale=None,
+ temperature=10000,
+ precompute_resolution=1008,
+ )
+
+ mask_downsampler = SimpleMaskDownSampler(
+ kernel_size=3,
+ stride=2,
+ padding=1,
+ interpol_size=[1152, 1152],
+ multiplex_count=multiplex_count,
+ starting_out_chan=4,
+ input_channel_multiplier=2,
+ )
+
+ cx_block_layer = CXBlock(
+ dim=256,
+ kernel_size=7,
+ padding=3,
+ layer_scale_init_value=1.0e-06,
+ use_dwconv=True,
+ )
+
+ fuser = SimpleFuser(layer=cx_block_layer, num_layers=2)
+
+ maskmem_backbone = SimpleMaskEncoder(
+ out_dim=256,
+ position_encoding=position_encoding,
+ mask_downsampler=mask_downsampler,
+ fuser=fuser,
+ )
+
+ return maskmem_backbone
+
+
+def _create_multiplex_transformer(use_fa3=False, use_rope_real=False):
+ """Create the decoupled transformer for multiplex memory attention."""
+ self_attention_rope = SimpleRoPEAttention(
+ d_model=256,
+ num_heads=8,
+ dropout_p=0.1,
+ rope_theta=10000.0,
+ feat_sizes=[72, 72],
+ use_fa3=use_fa3,
+ use_rope_real=use_rope_real,
+ )
+
+ cross_attention_rope = SimpleRoPEAttention(
+ d_model=256,
+ num_heads=8,
+ dropout_p=0.1,
+ rope_theta=10000.0,
+ feat_sizes=[72, 72],
+ rope_k_repeat=True,
+ use_fa3=use_fa3,
+ use_rope_real=use_rope_real,
+ )
+
+ encoder_layer = DecoupledTransformerDecoderLayerv2(
+ activation="gelu",
+ d_model=256,
+ num_heads=8,
+ dropout=0.1,
+ dim_feedforward=2048,
+ pos_enc_at_attn=False,
+ pre_norm=True,
+ pos_enc_at_cross_attn_keys=True,
+ pos_enc_at_cross_attn_queries=False,
+ self_attention_rope=self_attention_rope,
+ cross_attention_rope=cross_attention_rope,
+ )
+
+ encoder = TransformerEncoderDecoupledCrossAttention(
+ d_model=256,
+ frozen=False,
+ pos_enc_at_input=True,
+ use_image_in_output=False,
+ layer=encoder_layer,
+ num_layers=4,
+ use_act_checkpoint=False,
+ batch_first=True,
+ )
+
+ transformer = TransformerWrapper(
+ encoder=encoder,
+ decoder=None,
+ d_model=256,
+ )
+
+ return transformer
+
+
+def _create_multiplex_tri_backbone(
+ compile_mode=None, use_fa3=False, use_rope_real=False
+):
+ """Create the TriHead vision backbone for multiplex model."""
+ position_encoding = _create_position_encoding(precompute_resolution=1008)
+ vit_backbone = _create_vit_backbone(
+ compile_mode=compile_mode, use_fa3=use_fa3, use_rope_real=use_rope_real
+ )
+ tri_neck = Sam3TriViTDetNeck(
+ trunk=vit_backbone,
+ position_encoding=position_encoding,
+ d_model=256,
+ scale_factors=[4.0, 2.0, 1.0],
+ )
+ return tri_neck
+
+
+def build_sam3_multiplex_video_model(
+ checkpoint_path: Optional[str] = None,
+ load_from_HF=True,
+ multiplex_count: int = 16,
+ use_fa3: bool = False,
+ use_rope_real: bool = False,
+ strict_state_dict_loading: bool = True,
+ device="cuda" if torch.cuda.is_available() else "cpu",
+ compile=False,
+):
+ """
+ Build SAM3 multiplex video tracking model.
+
+ Args:
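+ checkpoint_path: Optional path to a checkpoint file
+ load_from_HF: Whether to download the SAM 3.1 checkpoint from Hugging Face
+ when checkpoint_path is None
+ multiplex_count: Number of objects per multiplex bucket
+ use_fa3: Whether to use FlashAttention 3 in the memory-attention transformer
+ use_rope_real: Whether to use real-valued RoPE in the memory-attention transformer (for compile compatibility)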
+ strict_state_dict_loading: Whether to use strict state dict loading
+ device: Device to place model on
+ compile: Whether to compile model components
+
+ Returns:
+ VideoTrackingDynamicMultiplex: The instantiated multiplex tracking model
+ """
+ # Build multiplex-specific components
+ maskmem_backbone = _create_multiplex_maskmem_backbone(
+ multiplex_count=multiplex_count
+ )
+ transformer = _create_multiplex_transformer(
+ use_fa3=use_fa3, use_rope_real=use_rope_real
+ )
+ tri_neck = _create_multiplex_tri_backbone(
+ compile_mode="max-autotune" if compile else None
+ )
+ backbone = TriHeadVisionOnly(
+ visual=tri_neck,
+ n_features=256,
+ scalp=0,
+ )
+ multiplex_controller = MultiplexController(
+ multiplex_count=multiplex_count,
+ eval_multiplex_count=multiplex_count,
+ )
+
+ # Build the multiplex model (use demo class for init_state and other demo methods)
+ from sam3.model.video_tracking_multiplex_demo import Sam3VideoTrackingMultiplexDemo
+
+ model = Sam3VideoTrackingMultiplexDemo(
+ backbone=backbone,
+ transformer=transformer,
+ maskmem_backbone=maskmem_backbone,
+ multiplex_controller=multiplex_controller,
+ image_size=1008,
+ backbone_stride=14,
+ num_maskmem=7,
+ # Multiplex-specific settings
+ use_high_res_features_in_sam=True,
+ use_obj_ptrs_in_encoder=True,
+ max_obj_ptrs_in_encoder=16,
+ add_tpos_enc_to_obj_ptrs=True,
+ proj_tpos_enc_in_obj_ptrs=True,
+ use_mlp_for_obj_ptr_proj=True,
+ pred_obj_scores=True,
+ pred_obj_scores_mlp=True,
+ fixed_no_obj_ptr=True,
+ use_no_obj_ptr=True,
+ use_linear_no_obj_ptr=True,
+ no_obj_embed_spatial=True,
+ sincos_tpos_enc=True,
+ # Multimask settings
+ multimask_output_in_sam=True,
+ multimask_output_for_tracking=True,
+ multimask_min_pt_num=0,
+ multimask_max_pt_num=1,
+ use_multimask_token_for_obj_ptr=True,
+ num_multimask_outputs=3,
+ # Memory encoder settings
+ apply_sigmoid_to_mask_logits_for_mem_enc=True,
+ sigmoid_scale_for_mem_enc=2.0,
+ sigmoid_bias_for_mem_enc=-1.0,
+ non_overlap_masks_for_mem_enc=False,
+ # Suppression/conditional embeddings
+ add_output_suppression_embeddings=True,
+ add_object_conditional_embeddings=False,
+ condition_as_mask_input=True,
+ condition_as_mask_input_fg=1.0,
+ condition_as_mask_input_bg=0.0,
+ # Memory settings
+ use_maskmem_tpos_v2=True,
+ save_image_features=True,
+ randomness_fix=True,
+ # Interaction settings
+ use_mask_input_as_output_without_sam=True,
+ directly_add_no_mem_embed=True,
+ iou_prediction_use_sigmoid=False,
+ forward_backbone_per_frame_for_eval=True,
+ offload_output_to_cpu_for_eval=False,
+ trim_past_non_cond_mem_for_eval=False,
+ max_cond_frames_in_attn=4,
+ # Dynamic multiplex settings
+ is_dynamic_model=True,
+ # SAM mask decoder extra args
+ sam_mask_decoder_extra_args={
+ "dynamic_multimask_via_stability": True,
+ "dynamic_multimask_stability_delta": 0.05,
+ "dynamic_multimask_stability_thresh": 0.98,
+ },
+ compile_all_components=compile,
+ use_memory_selection=False,
+ )
+
+ # Load checkpoint if provided
+ if load_from_HF and checkpoint_path is None:
+ checkpoint_path = download_ckpt_from_hf(version="sam3.1")
+ if checkpoint_path is not None:
+ with g_pathmgr.open(checkpoint_path, "rb") as f:
+ ckpt = torch.load(f, map_location="cpu", weights_only=True)
+ if "model" in ckpt and isinstance(ckpt["model"], dict):
+ ckpt = ckpt["model"]
+
+ missing_keys, unexpected_keys = model.load_state_dict(
+ ckpt, strict=strict_state_dict_loading
+ )
+ if missing_keys:
+ print(f"Missing keys: {missing_keys}")
+ if unexpected_keys:
+ print(f"Unexpected keys: {unexpected_keys}")
+
+ model.to(device=device)
+ return model
+
+
+def build_sam3_multiplex_video_predictor(
+ checkpoint_path: Optional[str] = None,
+ bpe_path: Optional[str] = None,
+ max_num_objects: int = 16,
+ multiplex_count: int = 16,
+ use_fa3: bool = True,
+ use_rope_real: bool = True,
+ compile: bool = False,
+ warm_up: bool = False,
+ session_expiration_sec: int = 1200,
+ default_output_prob_thresh: float = 0.5,
+ async_loading_frames: bool = True,
+):
+ """
+ Build a fully-initialized Sam3MultiplexVideoPredictor.
+
+ This is the recommended entry point for SAM 3.1 multiplex video tracking.
+ It builds the full model stack (tracker + detector + demo model), loads
+ the checkpoint, and wraps everything in a Sam3MultiplexVideoPredictor, which
+ exposes the handle_request / handle_stream_request API.
+
+ Args:
+ checkpoint_path: Path to the merged multiplex checkpoint (downloaded from Hugging Face if None)
+ bpe_path: Path to the BPE tokenizer vocabulary
+ max_num_objects: Maximum number of tracked objects
+ multiplex_count: Number of objects per multiplex bucket
+ use_fa3: Whether to use FlashAttention 3
+ use_rope_real: Whether to use real-valued RoPE (for compile compat)
+ compile: Whether to enable torch.compile on model components
+ warm_up: Whether to run warm-up compilation (requires compile=True)
+ session_expiration_sec: Session expiration timeout in seconds
+ default_output_prob_thresh: Default probability threshold for output masks
+ async_loading_frames: Whether to load frames asynchronously
+
+ Returns:
+ Sam3MultiplexVideoPredictor: The fully-initialized predictor
+ """
+ if bpe_path is None:
+ bpe_path = pkg_resources.resource_filename(
+ "sam3", "assets/bpe_simple_vocab_16e6.txt.gz"
+ )
+
+ from sam3.model.sam3_multiplex_base import Sam3MultiplexPredictorWrapper
+ from sam3.model.sam3_multiplex_detector import Sam3MultiplexDetector
+ from sam3.model.sam3_multiplex_tracking import (
+ Sam3MultiplexTrackingWithInteractivity,
+ )
+ from sam3.model.sam3_multiplex_video_predictor import Sam3MultiplexVideoPredictor
+
+ # Build tracker
+ tracker_model = build_sam3_multiplex_video_model(
+ checkpoint_path=checkpoint_path,
+ load_from_HF=False,
+ multiplex_count=multiplex_count,
+ use_fa3=use_fa3,
+ use_rope_real=use_rope_real,
+ compile=False,
+ strict_state_dict_loading=False,
+ )
+ del tracker_model.backbone
+ tracker_model.backbone = None
+
+ sam2_predictor = Sam3MultiplexPredictorWrapper(
+ model=tracker_model,
+ per_obj_inference=False,
+ fill_hole_area=0,
+ is_multiplex=True,
+ is_multiplex_dynamic=True,
+ )
+
+ # Build detector
+ tri_neck = _create_multiplex_tri_backbone(
+ compile_mode=None, use_fa3=use_fa3, use_rope_real=use_rope_real
+ )
+ text_encoder = _create_text_encoder(bpe_path)
+ backbone = SAM3VLBackboneTri(scalp=0, visual=tri_neck, text=text_encoder)
+ transformer = _create_sam3_transformer(use_fa3=use_fa3)
+ segmentation_head = _create_segmentation_head(use_fa3=use_fa3)
+ geometry_encoder = _create_geometry_encoder()
+ dot_prod_scoring = _create_dot_product_scoring()
+
+ detector = Sam3MultiplexDetector(
+ num_feature_levels=1,
+ backbone=backbone,
+ transformer=transformer,
+ segmentation_head=segmentation_head,
+ semantic_segmentation_head=None,
+ input_geometry_encoder=geometry_encoder,
+ use_early_fusion=True,
+ use_dot_prod_scoring=True,
+ dot_prod_scoring=dot_prod_scoring,
+ supervise_joint_box_scores=True,
+ is_multiplex=True,
+ )
+
+ # Assemble demo model
+ demo_model = Sam3MultiplexTrackingWithInteractivity(
+ tracker=sam2_predictor,
+ detector=detector,
+ score_threshold_detection=0.4,
+ det_nms_thresh=0.1,
+ det_nms_use_iom=True,
+ assoc_iou_thresh=0.1,
+ new_det_thresh=0.65,
+ hotstart_delay=15,
+ hotstart_unmatch_thresh=8,
+ hotstart_dup_thresh=8,
+ suppress_unmatched_only_within_hotstart=False,
+ suppress_overlapping_based_on_recent_occlusion_threshold=0.7,
+ suppress_det_close_to_boundary=True,
+ fill_hole_area=0, # Effectively 0: the Sam3MultiplexTrackerPredictor Hydra override replaces the yaml value of 16
+ recondition_every_nth_frame=16,
+ use_iom_recondition=True,
+ iom_thresh_recondition=0.5,
+ masklet_confirmation_enable=True,
+ reconstruction_bbox_iou_thresh=-1,
+ reconstruction_bbox_det_score=0.8,
+ max_num_objects=max_num_objects,
+ postprocess_batch_size=16,
+ use_batched_grounding=True,
+ batched_grounding_batch_size=16,
+ max_num_kboxes=0,
+ sprinkle_removal_area=0,
+ is_multiplex=True,
+ image_size=1008,
+ image_mean=(0.5, 0.5, 0.5),
+ image_std=(0.5, 0.5, 0.5),
+ compile_model=compile,
+ )
+
+ # Load checkpoint (auto-download from HuggingFace if not provided)
+ if checkpoint_path is None:
+ checkpoint_path = download_ckpt_from_hf(version="sam3.1")
+ if checkpoint_path is not None:
+ with g_pathmgr.open(checkpoint_path, "rb") as f:
+ ckpt = torch.load(f, map_location="cpu", weights_only=True)
+ if "model" in ckpt and isinstance(ckpt["model"], dict):
+ ckpt = ckpt["model"]
+ # Remap checkpoint keys if needed (internal naming -> OSS naming)
+ # HF checkpoints are already remapped; local checkpoints may use old naming
+ needs_remap = any(
+ k.startswith("sam3_model.") or k.startswith("sam2_predictor.") for k in ckpt
+ )
+ if needs_remap:
+ remapped_ckpt = {}
+ for k, v in ckpt.items():
+ new_k = k
+ if k.startswith("sam3_model."):
+ new_k = "detector." + k[len("sam3_model.") :]
+ elif k.startswith("sam2_predictor."):
+ new_k = "tracker." + k[len("sam2_predictor.") :]
+ remapped_ckpt[new_k] = v
+ ckpt = remapped_ckpt
+ missing_keys, unexpected_keys = demo_model.load_state_dict(ckpt, strict=False)
+ if missing_keys:
+ print(f"Missing keys ({len(missing_keys)}): {missing_keys[:10]}...")
+ if unexpected_keys:
+ print(
+ f"Unexpected keys ({len(unexpected_keys)}): {unexpected_keys[:10]}..."
+ )
+
+ demo_model.cuda().eval()
+
+ # Wrap in predictor
+ predictor = Sam3MultiplexVideoPredictor(
+ model=demo_model,
+ session_expiration_sec=session_expiration_sec,
+ default_output_prob_thresh=default_output_prob_thresh,
+ async_loading_frames=async_loading_frames,
+ warm_up=warm_up,
+ )
+ return predictor
+
+
+def build_sam3_predictor(
+ checkpoint_path: Optional[str] = None,
+ bpe_path: Optional[str] = None,
+ version: str = "sam3.1", # "sam3" or "sam3.1"
+ compile: bool = False,
+ warm_up: bool = False,
+ # SAM 3.1 specific
+ max_num_objects: int = 16,
+ multiplex_count: int = 16,
+ # Common (use_fa3 and use_rope_real are currently forwarded only for SAM 3.1)
+ use_fa3: bool = True,
+ use_rope_real: bool = True,
+ async_loading_frames: bool = True,
+ **kwargs,
+):
+ """
+ Build a SAM3 video predictor.
+
+ Args:
+ checkpoint_path: Path to model checkpoint
+ bpe_path: Path to BPE tokenizer vocabulary
+ version: Model version - "sam3" for base or "sam3.1" for multiplex
+ compile: Enable torch.compile for ~2x speedup (SAM 3.1 only currently)
+ warm_up: Run warm-up compilation passes (SAM 3.1 only)
+ max_num_objects: Maximum tracked objects (SAM 3.1 only)
+ multiplex_count: Objects per multiplex bucket (SAM 3.1 only)
+ use_fa3: Use FlashAttention 3 (currently SAM 3.1 only)
+ use_rope_real: Use real-valued RoPE (currently SAM 3.1 only)
+ async_loading_frames: Load video frames asynchronously
+ **kwargs: Additional arguments passed to the underlying builder
+
+ Returns:
+ A predictor with handle_request() and handle_stream_request() API.
+ Both versions support: start_session, add_prompt, propagate_in_video,
+ remove_object, reset_session, close_session.
+
+ Example:
+ # SAM 3.1 (auto-downloads from HuggingFace):
+ predictor = build_sam3_predictor(version="sam3.1", compile=True)
+
+ # SAM 3 (auto-downloads from HuggingFace):
+ predictor = build_sam3_predictor(version="sam3")
+
+ # Or with a local checkpoint:
+ predictor = build_sam3_predictor(checkpoint_path="path/to/ckpt.pt", version="sam3.1")
+
+ # Both use the same API:
+ response = predictor.handle_request({"type": "start_session", "resource_path": video_dir})
+ session_id = response["session_id"]
+ predictor.handle_request({"type": "add_prompt", "session_id": session_id, "frame_index": 0, "text": "person"})
+ for out in predictor.handle_stream_request({"type": "propagate_in_video", "session_id": session_id}):
+ masks = out["out_binary_masks"]
+ """
+ if version == "sam3.1":
+ return build_sam3_multiplex_video_predictor(
+ checkpoint_path=checkpoint_path,
+ bpe_path=bpe_path,
+ max_num_objects=max_num_objects,
+ multiplex_count=multiplex_count,
+ use_fa3=use_fa3,
+ use_rope_real=use_rope_real,
+ compile=compile,
+ warm_up=warm_up,
+ async_loading_frames=async_loading_frames,
+ **kwargs,
+ )
+ elif version == "sam3":
+ return build_sam3_video_predictor(
+ checkpoint_path=checkpoint_path,
+ bpe_path=bpe_path,
+ compile=compile,
+ async_loading_frames=async_loading_frames,
+ **kwargs,
+ )
+ else:
+ raise ValueError(f"Unknown version: {version!r}. Use 'sam3' or 'sam3.1'.")
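The key remapping inside `build_sam3_multiplex_video_predictor` (the `needs_remap` branch) can be pulled out and tested on its own. The sketch below is a hypothetical standalone helper; `remap_checkpoint_keys` and `PREFIX_MAP` are illustrative names, not part of the repo. It reproduces the inline logic under that assumption: keys starting with `sam3_model.` move under `detector.`, keys starting with `sam2_predictor.` move under `tracker.`, and a checkpoint with neither prefix (such as the Hugging Face release) is returned unchanged.

```python
from typing import Any, Dict

# Hypothetical prefix table mirroring the inline remap in the builder:
# old (internal) prefix -> new (OSS) prefix.
PREFIX_MAP = {
    "sam3_model.": "detector.",
    "sam2_predictor.": "tracker.",
}


def remap_checkpoint_keys(
    state_dict: Dict[str, Any], prefix_map: Dict[str, str] = PREFIX_MAP
) -> Dict[str, Any]:
    """Return a state dict whose old-style key prefixes are renamed."""
    old_prefixes = tuple(prefix_map)
    # Checkpoints that already use the new naming are passed through untouched.
    if not any(key.startswith(old_prefixes) for key in state_dict):
        return state_dict
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in prefix_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        remapped[new_key] = value
    return remapped
```

Keys that match no old prefix are kept as-is, so the result can still be passed to `load_state_dict(..., strict=False)` and any leftovers show up as unexpected keys.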
diff --git a/sam3/perflib/compile.py b/sam3/perflib/compile.py
index 914c31b..8471406 100644
--- a/sam3/perflib/compile.py
+++ b/sam3/perflib/compile.py
@@ -2,7 +2,11 @@
# pyre-unsafe
+from functools import wraps
+
import torch
+from sam3.model.data_misc import BatchedDatapoint, NestedTensor
+from torch.utils._pytree import tree_map_only
def recursive_fn_factory(fn):
@@ -13,15 +17,18 @@ def recursive_fn(b):
return [recursive_fn(t) for t in b]
if isinstance(b, tuple):
return tuple(recursive_fn(t) for t in b)
+ if isinstance(b, NestedTensor):
+ tensors = fn(b.tensors)
+ if b.mask is None:
+ mask = None
+ else:
+ mask = fn(b.mask)
+ return NestedTensor(tensors=tensors, mask=mask)
if isinstance(b, torch.Tensor):
return fn(b)
- # Yes, writing out an explicit white list of
- # trivial types is tedious, but so are bugs that
- # come from not applying fn, when expected to have
- # applied it.
if b is None:
return b
- trivial_types = [bool, int]
+ trivial_types = [bool, int, float]
for t in trivial_types:
if isinstance(b, t):
return b
@@ -34,20 +41,41 @@ def recursive_fn(b):
recursive_clone = recursive_fn_factory(torch.clone)
+def clone_output_wrapper(f):
+ """
+ Clone the CUDA tensors in a function's outputs so callers never hold buffers
+ that a later call may overwrite in place (e.g. CUDA graph static outputs).
+ Uses torch.utils._pytree.tree_map_only to traverse arbitrarily nested outputs.
+ Requires NestedTensor to be registered as a pytree node (see data_misc.py).
+ """
+
+ @wraps(f)
+ def wrapped(*args, **kwargs):
+ outputs = f(*args, **kwargs)
+ return tree_map_only(
+ torch.Tensor, lambda t: t.clone() if t.is_cuda else t, outputs
+ )
+
+ return wrapped
+
+
def compile_wrapper(
fn, *, mode="max-autotune", fullgraph=True, dynamic=False, name=None
):
+ """Compile fn with recursive_contiguous on inputs; in CUDA-graph modes
+ (max-autotune, reduce-overhead), also apply recursive_clone to outputs.
+ Used for SAM2 tracker components that need contiguous inputs for CUDA graphs."""
compiled_fn = torch.compile(fn, mode=mode, fullgraph=fullgraph, dynamic=dynamic)
def compiled_fn_wrapper(*args, **kwargs):
with torch.autograd.profiler.record_function(
f"compiled {fn}" if name is None else name
):
- cont_args = recursive_contiguous(args)
- cont_kwargs = recursive_contiguous(kwargs)
- result = compiled_fn(*cont_args, **cont_kwargs)
- cloned_result = recursive_clone(result)
- return cloned_result
+ CUDAGRAPH_MODES = ["max-autotune", "reduce-overhead"]
+ args = recursive_contiguous(args)
+ kwargs = recursive_contiguous(kwargs)
+ result = compiled_fn(*args, **kwargs)
+ if mode in CUDAGRAPH_MODES:
+ result = recursive_clone(result)
+ return result
return compiled_fn_wrapper
@@ -56,11 +84,6 @@ def shape_logging_wrapper(fn, keep_kwargs, enable_logging=False):
"""
Wraps a function and prints the shapes of all tensor inputs.
Only prints when a new combination of shapes is seen.
- Thread-safe.
-
- Args:
- fn: Function to wrap
- enable_logging: Boolean flag to enable/disable logging
"""
seen_shapes = set()
@@ -89,7 +112,6 @@ def wrapper(*args, **kwargs):
print(f"[ShapeLogger] New input shapes for {fn.__qualname__}: {shapes}")
return fn(*args, **kwargs)
- # Allow toggling the flag at runtime
wrapper.enable_logging = enable_logging
def set_logging(enabled=False):
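The `recursive_fn_factory` change teaches the helper to descend into `NestedTensor`, applying `fn` to `tensors` and to `mask` only when a mask is present, and adds `float` to the pass-through whitelist. The torch-free sketch below illustrates the same dispatch order. `Leaf` and `Nested` are hypothetical stand-ins for `torch.Tensor` and `NestedTensor`; dict handling and raising `TypeError` on unknown types are assumptions (the latter inferred from the whitelist comment this diff removes), not confirmed details of the repo code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Leaf:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without torch."""
    value: int


@dataclass
class Nested:
    """Hypothetical stand-in for NestedTensor: a tensor plus an optional mask."""
    tensors: Leaf
    mask: Optional[Leaf]


# Values returned unchanged; bool is listed first but is also an int subclass.
TRIVIAL_TYPES = (bool, int, float)


def recursive_fn_factory(fn: Callable[[Leaf], Leaf]) -> Callable[[Any], Any]:
    def recursive_fn(b: Any) -> Any:
        if isinstance(b, dict):
            return {k: recursive_fn(v) for k, v in b.items()}
        if isinstance(b, list):
            return [recursive_fn(t) for t in b]
        if isinstance(b, tuple):
            return tuple(recursive_fn(t) for t in b)
        if isinstance(b, Nested):
            # Apply fn to both fields, but keep a missing mask as None.
            mask = None if b.mask is None else fn(b.mask)
            return Nested(tensors=fn(b.tensors), mask=mask)
        if isinstance(b, Leaf):
            return fn(b)
        if b is None or isinstance(b, TRIVIAL_TYPES):
            return b
        # Fail loudly instead of silently skipping fn on an unexpected type.
        raise TypeError(f"Unexpected type {type(b)}")

    return recursive_fn
```

In the real module `fn` would be something like `torch.clone` or a `.contiguous()` call; here a function that doubles `Leaf.value` makes the traversal easy to check.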
diff --git a/sam3/perflib/fused.py b/sam3/perflib/fused.py
new file mode 100644
index 0000000..6800cca
--- /dev/null
+++ b/sam3/perflib/fused.py
@@ -0,0 +1,25 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
+
+# pyre-unsafe
+
+import torch
+
+addmm_act_op = torch.ops.aten._addmm_activation
+
+
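+def addmm_act(activation, linear, mat1):
+ """Fused activation(linear(mat1)) via aten._addmm_activation (inference only).
+
+ Raises if grad is enabled. Inputs are cast to bfloat16 and the result is
+ bfloat16; only ReLU and GELU activations are supported.
+ """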
+ if torch.is_grad_enabled():
+ raise ValueError("Expected grad to be disabled.")
+ self = linear.bias.detach()
+ mat2 = linear.weight.detach()
+ self = self.to(torch.bfloat16)
+ mat1 = mat1.to(torch.bfloat16)
+ mat2 = mat2.to(torch.bfloat16)
+ mat1_flat = mat1.view(-1, mat1.shape[-1])
+ if activation in [torch.nn.functional.relu, torch.nn.ReLU]:
+ y = addmm_act_op(self, mat1_flat, mat2.t(), beta=1, alpha=1, use_gelu=False)
+ return y.view(mat1.shape[:-1] + (y.shape[-1],))
+ if activation in [torch.nn.functional.gelu, torch.nn.GELU]:
+ y = addmm_act_op(self, mat1_flat, mat2.t(), beta=1, alpha=1, use_gelu=True)
+ return y.view(mat1.shape[:-1] + (y.shape[-1],))
+ raise ValueError(f"Unexpected activation {activation}")
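`addmm_act` computes `activation(mat1 @ linear.weight.T + linear.bias)` in a single fused call. A plain-Python reference such as the hypothetical `linear_act_reference` below can serve as a check on small inputs. It handles 2-D inputs only (the real function flattens leading dimensions with `view` and restores them afterwards). Because the fused path runs in bfloat16 and its GELU variant may differ from the exact erf-based form used here, comparisons against it should use a loose tolerance.

```python
import math
from typing import Callable, List, Sequence


def relu(v: float) -> float:
    return max(v, 0.0)


def gelu(v: float) -> float:
    # Exact (erf-based) GELU, the default form of torch.nn.GELU.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear_act_reference(
    x: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
    act: Callable[[float], float],
) -> List[List[float]]:
    """Compute act(x @ weight^T + bias) for x of shape (rows, in_features).

    weight has shape (out_features, in_features), matching nn.Linear.weight,
    which is why the fused call passes linear.weight.t() as mat2.
    """
    out = []
    for row in x:
        out.append(
            [act(sum(xi * wi for xi, wi in zip(row, w_row)) + b) for w_row, b in zip(weight, bias)]
        )
    return out
```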
diff --git a/sam3/perflib/iou.py b/sam3/perflib/iou.py
new file mode 100644
index 0000000..2b32f80
--- /dev/null
+++ b/sam3/perflib/iou.py
@@ -0,0 +1,38 @@
+import torch
+
+
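+def pairwise_iou(pred_masks, gt_masks, eps=1e-6):
+ """IoU for every pair of (N, H, W) and (M, H, W) binary masks, shape (N, M).
+
+ If eps is None, the union is clamped to at least 1 instead of adding eps.
+ """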
+ N, H, W = pred_masks.shape
+ M = gt_masks.shape[0]
+ # Flatten and convert to float for matmul
+ pred_flat = pred_masks.reshape(N, -1).float()
+ gt_flat = gt_masks.reshape(M, -1).float()
+ # Intersection: (N, M)
+ intersection = torch.matmul(pred_flat, gt_flat.t())
+ # Areas
+ area_pred = pred_flat.sum(dim=1, keepdim=True) # (N, 1)
+ area_gt = gt_flat.sum(dim=1, keepdim=True) # (M, 1)
+ # Union: (N, M)
+ union = area_pred + area_gt.t() - intersection
+ if eps is None:
+ iou = intersection / union.clamp(min=1)
+ else:
+ iou = intersection / (union + eps)
+ return iou # shape: (N, M)
+
+
+def pairwise_iom(pred_masks, gt_masks, eps=1e-8):
+ """Intersection over the smaller of the two mask areas for every pair, shape (N, M)."""
+ N, H, W = pred_masks.shape
+ M = gt_masks.shape[0]
+ # Flatten and convert to float for matmul
+ pred_flat = pred_masks.reshape(N, -1).float()
+ gt_flat = gt_masks.reshape(M, -1).float()
+ # Intersection: (N, M)
+ intersection = torch.matmul(pred_flat, gt_flat.t())
+ # Areas
+ area_pred = pred_flat.sum(dim=1, keepdim=True) # (N, 1)
+ area_gt = gt_flat.sum(dim=1, keepdim=True) # (M, 1)
+ # Smaller of the two areas, broadcast (N, 1) vs (1, M) -> (N, M)
+ min_area = torch.min(area_pred, area_gt.t())
+ iom = intersection / (min_area + eps)
+ return iom # shape: (N, M)
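Both functions rely on the same identity: for flattened 0/1 masks, the intersection of mask `i` and mask `j` is a dot product, so all pairs come from one `pred_flat @ gt_flat.T`. The (N, 1) and (1, M) area vectors then broadcast to (N, M): IoU divides by `area_pred[i] + area_gt[j] - inter[i][j]`, IoM by `min(area_pred[i], area_gt[j])`. The pure-Python sketch below (hypothetical `pairwise_iou_iom`, list-of-rows masks instead of tensors) spells out that per-pair arithmetic.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # H x W grid of 0/1 values


def _flatten(mask: Mask) -> List[float]:
    return [float(v) for row in mask for v in row]


def pairwise_iou_iom(
    pred_masks: Sequence[Mask], gt_masks: Sequence[Mask], eps: float = 1e-6
) -> Tuple[List[List[float]], List[List[float]]]:
    pred = [_flatten(m) for m in pred_masks]
    gt = [_flatten(m) for m in gt_masks]
    # inter[i][j] is a dot product, i.e. one entry of pred_flat @ gt_flat.T.
    inter = [[sum(a * b for a, b in zip(p, g)) for g in gt] for p in pred]
    area_pred = [sum(p) for p in pred]
    area_gt = [sum(g) for g in gt]
    iou = [
        [inter[i][j] / (area_pred[i] + area_gt[j] - inter[i][j] + eps) for j in range(len(gt))]
        for i in range(len(pred))
    ]
    # IoM divides by the smaller area, so a mask fully inside another scores ~1.
    iom = [
        [inter[i][j] / (min(area_pred[i], area_gt[j]) + eps) for j in range(len(gt))]
        for i in range(len(pred))
    ]
    return iou, iom
```

For example, a 2-pixel prediction that fully contains a 1-pixel ground-truth mask has IoU 0.5 but IoM close to 1, which is why IoM is the natural choice for detecting nested or duplicate masks.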
diff --git a/sam3/perflib/masks_ops.py b/sam3/perflib/masks_ops.py
index 7946996..806172d 100644
--- a/sam3/perflib/masks_ops.py
+++ b/sam3/perflib/masks_ops.py
@@ -50,6 +50,8 @@ def masks_to_boxes(masks: torch.Tensor, obj_ids: list[int]):
def mask_iou(pred_masks: torch.Tensor, gt_masks: torch.Tensor) -> torch.Tensor:
"""
Compute the IoU (Intersection over Union) between predicted masks and ground truth masks.
+ Uses matmul-based vectorized intersection for Tensor Core acceleration.
+
Args:
- pred_masks: (N, H, W) bool Tensor, containing binary predicted segmentation masks
- gt_masks: (M, H, W) bool Tensor, containing binary ground truth segmentation masks
@@ -57,15 +59,14 @@ def mask_iou(pred_masks: torch.Tensor, gt_masks: torch.Tensor) -> torch.Tensor:
- ious: (N, M) float Tensor, containing IoUs for each pair of predicted and ground truth masks
"""
assert pred_masks.dtype == gt_masks.dtype == torch.bool
- N, H, W = pred_masks.shape
- M, _, _ = gt_masks.shape
+ assert pred_masks.shape[1:] == gt_masks.shape[1:]
- # Flatten masks: (N, 1, H*W) and (1, M, H*W)
- pred_flat = pred_masks.view(N, 1, H * W)
- gt_flat = gt_masks.view(1, M, H * W)
+ # Matmul-based intersection (uses Tensor Cores via float mm)
+ m1_flat = pred_masks.flatten(1).float()
+ m2_flat = gt_masks.flatten(1).float()
+ intersection = torch.mm(m1_flat, m2_flat.t())
- # Compute intersection and union: (N, M)
- intersection = (pred_flat & gt_flat).sum(dim=2).float()
- union = (pred_flat | gt_flat).sum(dim=2).float()
- ious = intersection / union.clamp(min=1)
- return ious # shape: (N, M)
+ area1 = m1_flat.sum(dim=1)
+ area2 = m2_flat.sum(dim=1)
+ union = area1[:, None] + area2[None, :] - intersection
+ return intersection / union.clamp(min=1)
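The rewritten `mask_iou` replaces the bitwise `&` / `|` reduction with a matmul intersection and derives the union from the two areas; `union.clamp(min=1)` keeps the empty-vs-empty case at 0 instead of NaN. The sketch below checks, in plain Python on flat boolean lists, that both formulations agree. The function names are hypothetical and the random masks are synthetic test inputs, not data from the repo.

```python
import random
from typing import Sequence


def iou_bitwise(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Previous formulation: |a AND b| / max(|a OR b|, 1)."""
    inter = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    return inter / max(union, 1)


def iou_dot(a: Sequence[bool], b: Sequence[bool]) -> float:
    """New formulation: dot-product intersection, union from the two areas."""
    fa = [float(x) for x in a]
    fb = [float(x) for x in b]
    inter = sum(x * y for x, y in zip(fa, fb))
    union = sum(fa) + sum(fb) - inter
    # clamp(min=1): two empty masks give 0 / 1 = 0 rather than 0 / 0.
    return inter / max(union, 1.0)


def max_disagreement(trials: int = 100, size: int = 64, seed: int = 0) -> float:
    """Largest absolute difference between the two formulations on random masks."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        a = [rng.random() < 0.3 for _ in range(size)]
        b = [rng.random() < 0.3 for _ in range(size)]
        worst = max(worst, abs(iou_bitwise(a, b) - iou_dot(a, b)))
    return worst
```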
diff --git a/sam3/train/data/collator.py b/sam3/train/data/collator.py
index 4a0f2e8..38d031d 100644
--- a/sam3/train/data/collator.py
+++ b/sam3/train/data/collator.py
@@ -194,7 +194,7 @@ def collate_fn_api(
offset_img_id = 0
offset_query_id = [0 for _ in range(num_stages)]
- for i, data in enumerate(batch):
+ for data in batch:
img_batch.extend([img.data for img in data.images])
if data.raw_images is not None:
@@ -209,7 +209,7 @@ def collate_fn_api(
datapoint_query_id_2_stage_query_id.append(offset_query_id[stage_id])
offset_query_id[stage_id] += 1
- for j, q in enumerate(data.find_queries):
+ for q in data.find_queries:
stage_id = q.query_processing_order
stages[stage_id].img_ids.append(q.image_id + offset_img_id)
if q.query_text not in text_batch:
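In `sam3/train/data/sam3_image_dataset.py`, the `from decord import cpu, VideoReader` import moves from module level into the `.mp4` branch of `_load_images`, so decord is only required when a video frame is actually loaded. A hypothetical helper such as `require_optional` (not part of the repo) sketches how the same lazy import could report a clearer error when a video path is hit without the package installed.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import module_name on first use, with an actionable error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Assumes the pip package shares the module's name (true for decord).
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'; "
            f"install it with `pip install {module_name}`."
        ) from exc
```

Inside the `.mp4` branch this would read `decord = require_optional("decord", "Loading .mp4 frames")`, followed by `decord.VideoReader(video_path, ctx=decord.cpu(0))`.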
diff --git a/sam3/train/data/sam3_image_dataset.py b/sam3/train/data/sam3_image_dataset.py
index c5a1c83..28e7425 100644
--- a/sam3/train/data/sam3_image_dataset.py
+++ b/sam3/train/data/sam3_image_dataset.py
@@ -17,7 +17,6 @@
import torch
import torch.utils.data
import torchvision
-from decord import cpu, VideoReader
from iopath.common.file_io import g_pathmgr
from PIL import Image as PILImage
from PIL.Image import DecompressionBombError
@@ -202,6 +201,8 @@ def _load_images(
try:
if ".mp4" in path and path[-4:] == ".mp4":
# Going to load a video frame
+ from decord import cpu, VideoReader
+
video_path, frame = path.split("@")
video = VideoReader(video_path, ctx=cpu(0))
# Convert to PIL image
@@ -328,7 +329,7 @@ def load_queries(self, pil_images, annotations, queries, img_metadata):
f"Number of queries in stage {stage} is {num_queries}, expected {num_queries_per_stage}"
)
- for query_id, query in enumerate(queries):
+ for query in queries:
h, w = id2imsize[query["image_id"]]
if (
"input_box" in query
diff --git a/sam3/train/masks_ops.py b/sam3/train/masks_ops.py
index 113a9c4..eefc17b 100644
--- a/sam3/train/masks_ops.py
+++ b/sam3/train/masks_ops.py
@@ -36,6 +36,23 @@ def instance_masks_to_semantic_masks(
return torch.stack([torch.any(masks, dim=0) for masks in masks_per_query], dim=0)
+def mask_intersection_vectorized(masks1, masks2):
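+ """
+ Vectorized mask intersection using matrix multiplication.
+
+ Args:
+ masks1: binary (e.g. bool) tensor of shape (N, H, W)
+ masks2: binary (e.g. bool) tensor of shape (M, H, W)
+ Returns:
+ int64 tensor of shape (N, M) with the number of overlapping pixels
+ """
+ # Cast to float for Tensor Core acceleration via torch.mm; counts remain exact
+ # while H * W <= 2**24, assuming float32 accumulation.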
+ m1_flat = masks1.flatten(1).float()
+ m2_flat = masks2.flatten(1).float()
+ intersection = torch.mm(m1_flat, m2_flat.t())
+ return intersection.long()
+
+
def mask_intersection(masks1, masks2, block_size=16):
"""Compute the intersection of two sets of masks, without blowing the memory"""
@@ -63,8 +80,7 @@ def mask_iom(masks1, masks2):
assert masks1.shape[1:] == masks2.shape[1:]
assert masks1.dtype == torch.bool and masks2.dtype == torch.bool
- # intersection = (masks1[:, None] * masks2[None]).flatten(-2).sum(-1)
- intersection = mask_intersection(masks1, masks2)
+ intersection = mask_intersection_vectorized(masks1, masks2)
area1 = masks1.flatten(-2).sum(-1)
area2 = masks2.flatten(-2).sum(-1)
min_area = torch.min(area1[:, None], area2[None, :])
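`mask_intersection_vectorized` counts overlapping pixels with a float matmul and converts the result back with `.long()`. That round trip is exact only while every count fits in float32's 24-bit significand, that is, up to 2**24 = 16,777,216 pixels per mask, assuming the matmul accumulates in float32. The sketch below (hypothetical helper names) uses `struct` to round values to float32 and show where exactness ends; a 1008 x 1008 mask has 1,016,064 pixels, well inside the limit.

```python
import struct

# float32 has a 24-bit significand, so every integer in [0, 2**24] is exact.
FLOAT32_EXACT_INT_LIMIT = 2 ** 24


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest IEEE-754 float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mask_counts_are_exact(height: int, width: int) -> bool:
    """True if any pixel count for an H x W mask survives a float32 round trip."""
    return height * width <= FLOAT32_EXACT_INT_LIMIT
```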
diff --git a/scripts/benchmark_sam3_artifacts.py b/scripts/benchmark_sam3_artifacts.py
new file mode 100644
index 0000000..f0ea143
--- /dev/null
+++ b/scripts/benchmark_sam3_artifacts.py
@@ -0,0 +1,351 @@
+import argparse
+import sys
+import time
+from pathlib import Path
+
+import numpy as np
+import torch
+from PIL import Image
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(REPO_ROOT))
+
+from sam3.model_builder import build_sam3_image_model
+from sam3.model.data_misc import FindStage
+from sam3.model.geometry_encoders import Prompt
+
+
+def _load_image(path: Path, device: torch.device) -> torch.Tensor:
+ image = Image.open(path).convert("RGB")
+ np_image = np.array(image, dtype=np.float32) / 255.0
+ tensor = torch.from_numpy(np_image).permute(2, 0, 1).unsqueeze(0)
+ return tensor.to(device)
+
+
+def _prepare_image(image: torch.Tensor, size: int) -> torch.Tensor:
+ image = image.clamp(0, 1)
+ image = torch.nn.functional.interpolate(
+ image, size=(size, size), mode="bilinear", align_corners=False
+ )
+ mean = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ std = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ return (image - mean) / std
+
+
+def _make_inputs(model, image: torch.Tensor, prompts):
+ device = image.device
+
+ tokenizer = model.backbone.language_backbone.tokenizer
+ token_ids = tokenizer(prompts, context_length=32).to(device)
+
+ return (
+ image,
+ token_ids,
+ )
+
+
+def _run_full_model(model, inputs):
+ images, token_ids = inputs
+ num_images = images.shape[0]
+ num_prompts = token_ids.shape[0]
+ device = images.device
+ bs = num_images * num_prompts
+
+ img_ids = torch.arange(num_images, device=device, dtype=torch.long)
+ img_ids = img_ids.repeat_interleave(num_prompts)
+ text_ids = torch.arange(num_prompts, device=device, dtype=torch.long)
+ text_ids = text_ids.repeat(num_images)
+
+ box_embeddings = torch.zeros(1, bs, 4, device=device)
+ box_mask = torch.zeros(bs, 1, device=device, dtype=torch.bool)
+ box_labels = torch.zeros(1, bs, device=device, dtype=torch.long)
+ backbone_out = model.backbone.forward_image(images)
+ text_encoder = model.backbone.language_backbone
+ _, text_tokens = text_encoder.encoder(token_ids)
+ text_tokens = text_tokens.transpose(0, 1)
+ text_memory = text_encoder.resizer(text_tokens)
+ text_attention_mask = token_ids.eq(0) # True at padding positions (token id 0)
+ backbone_out["language_features"] = text_memory
+ backbone_out["language_mask"] = text_attention_mask
+
+ find_input = FindStage(
+ img_ids=img_ids,
+ text_ids=text_ids,
+ input_boxes=box_embeddings,
+ input_boxes_mask=box_mask,
+ input_boxes_label=box_labels,
+ input_points=torch.zeros(0, bs, 2, device=device),
+ input_points_mask=torch.zeros(bs, 0, device=device, dtype=torch.bool),
+ )
+ geometric_prompt = Prompt(
+ box_embeddings=box_embeddings,
+ box_mask=box_mask,
+ box_labels=box_labels,
+ )
+ out = model.forward_grounding(
+ backbone_out=backbone_out,
+ find_input=find_input,
+ find_target=None,
+ geometric_prompt=geometric_prompt,
+ )
+ return out["pred_masks"], out["pred_boxes"], out["pred_logits"]
+
+
+def _make_decoder_only_inputs_from_model(
+ model,
+ backbone_fpn,
+ vision_pos_enc,
+ text_memory,
+ text_attention_mask,
+ inputs,
+):
+ images, token_ids = inputs
+ num_images = images.shape[0]
+ num_prompts = token_ids.shape[0]
+ device = images.device
+ bs = num_images * num_prompts
+
+ img_ids = torch.arange(num_images, device=device, dtype=torch.long)
+ img_ids = img_ids.repeat_interleave(num_prompts)
+ text_ids = torch.arange(num_prompts, device=device, dtype=torch.long)
+ text_ids = text_ids.repeat(num_images)
+
+ box_embeddings = torch.zeros(1, bs, 4, device=device)
+ box_mask = torch.zeros(bs, 1, device=device, dtype=torch.bool)
+ box_labels = torch.zeros(1, bs, device=device, dtype=torch.long)
+ backbone_out = {
+ "backbone_fpn": backbone_fpn,
+ "vision_pos_enc": vision_pos_enc,
+ "language_features": text_memory,
+ "language_mask": text_attention_mask,
+ }
+ find_input = FindStage(
+ img_ids=img_ids,
+ text_ids=text_ids,
+ input_boxes=box_embeddings,
+ input_boxes_mask=box_mask,
+ input_boxes_label=box_labels,
+ input_points=torch.zeros(0, bs, 2, device=device),
+ input_points_mask=torch.zeros(bs, 0, device=device, dtype=torch.bool),
+ )
+ geometric_prompt = Prompt(
+ box_embeddings=box_embeddings,
+ box_mask=box_mask,
+ box_labels=box_labels,
+ )
+ prompt, prompt_mask, backbone_out = model._encode_prompt(
+ backbone_out, find_input, geometric_prompt
+ )
+ backbone_out, encoder_out, _ = model._run_encoder(backbone_out, find_input, prompt, prompt_mask)
+ return (
+ backbone_out["backbone_fpn"],
+ img_ids,
+ encoder_out["encoder_hidden_states"],
+ encoder_out["pos_embed"],
+ prompt,
+ prompt_mask,
+ encoder_out["level_start_index"],
+ encoder_out["spatial_shapes"],
+ encoder_out["valid_ratios"],
+ )
+
+
+def _load_export(path: Path):
+ exported = torch.export.load(str(path))
+ return exported.module()
+
+
+def _timeit(fn, iters: int, device: torch.device):
+ if device.type == "cuda":
+ torch.cuda.synchronize()
+ start = time.perf_counter()
+ for _ in range(iters):
+ fn()
+ if device.type == "cuda":
+ torch.cuda.synchronize()
+ end = time.perf_counter()
+ return (end - start) / iters
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--image",
+ type=Path,
+ default=Path("assets/images/cat_dog.jpg"),
+ help="Path to input image",
+ )
+ parser.add_argument(
+ "--prompts",
+ type=str,
+ default="cat,dog",
+ help="Comma-separated text prompts",
+ )
+ parser.add_argument(
+ "--device",
+ type=str,
+ default="cuda" if torch.cuda.is_available() else "cpu",
+ )
+ parser.add_argument(
+ "--artifact-dir",
+ type=Path,
+ default=Path("artifacts/export"),
+ help="Directory with exported artifacts",
+ )
+ parser.add_argument(
+ "--num-feature-levels",
+ type=int,
+ default=1,
+ help="Number of feature levels to use",
+ )
+ parser.add_argument("--warmup", type=int, default=3)
+ parser.add_argument("--iters", type=int, default=10)
+ args = parser.parse_args()
+
+ prompts = [p.strip() for p in args.prompts.split(",") if p.strip()]
+ if not prompts:
+ raise ValueError("Provide at least one prompt")
+
+ model = build_sam3_image_model(
+ device=args.device,
+ eval_mode=True,
+ enable_segmentation=True,
+ num_feature_levels=args.num_feature_levels,
+ )
+ model.eval()
+
+ device = torch.device(args.device)
+ image = _load_image(args.image, device)
+ image = _prepare_image(image, size=1008)
+ inputs = _make_inputs(model, image, prompts)
+
+ print("Device (eager):", next(model.parameters()).device)
+
+ def eager_fn():
+ _run_full_model(model, inputs)
+
+ with torch.inference_mode():
+ for _ in range(args.warmup):
+ eager_fn()
+ eager_ms = _timeit(eager_fn, args.iters, device) * 1000
+
+ if device.type == "cuda":
+ torch.cuda.empty_cache()
+
+ image_module = _load_export(args.artifact_dir / "image_encoder.pt2")
+ text_module = _load_export(args.artifact_dir / "text_encoder.pt2")
+ encoder_module = _load_export(args.artifact_dir / "encoder_fusion.pt2")
+ pipeline_module = _load_export(args.artifact_dir / "full_sam3_pipeline.pt2")
+ decoder_module = _load_export(args.artifact_dir / "decoder_only.pt2")
+ print("Device (export):", inputs[0].device)
+
+ def image_fn():
+ return image_module(inputs[0])
+
+ def text_fn():
+ return text_module(inputs[1])
+
+ def encoder_from_outputs(image_out, text_out):
+ vision_pos_enc = image_out[1]
+ backbone_fpn = image_out[2]
+ text_attention_mask, text_memory = text_out
+ img_feats = backbone_fpn[-1]
+ img_pos = vision_pos_enc[-1]
+ prompt_batch = text_attention_mask.shape[0]
+ if img_feats.shape[0] != prompt_batch:
+ if img_feats.shape[0] != 1:
+ raise ValueError("Image batch does not match prompt batch")
+ img_feats = img_feats.repeat(prompt_batch, 1, 1, 1)
+ img_pos = img_pos.repeat(prompt_batch, 1, 1, 1)
+ img_mask = torch.zeros(
+ img_feats.shape[0],
+ img_feats.shape[2],
+ img_feats.shape[3],
+ device=img_feats.device,
+ dtype=torch.bool,
+ )
+ encoder_module(img_feats, img_pos, img_mask, text_memory, text_attention_mask)
+
+ with torch.inference_mode():
+ cached_image_out = image_fn()
+ cached_text_out = text_fn()
+
+ def encoder_fn():
+ encoder_from_outputs(cached_image_out, cached_text_out)
+
+ pipeline_inputs = inputs
+
+ def pipeline_fn():
+ pipeline_module(*pipeline_inputs)
+
+ with torch.inference_mode():
+ cached_image_out = image_fn()
+ cached_text_out = text_fn()
+ decoder_only_inputs = _make_decoder_only_inputs_from_model(
+ model,
+ cached_image_out[2],
+ cached_image_out[1],
+ cached_text_out[1],
+ cached_text_out[0],
+ inputs,
+ )
+ (
+ decoder_backbone_fpn,
+ decoder_img_ids,
+ decoder_memory,
+ decoder_pos_embed,
+ decoder_prompt,
+ decoder_prompt_mask,
+ decoder_level_start_index,
+ decoder_spatial_shapes,
+ decoder_valid_ratios,
+ ) = decoder_only_inputs
+ if decoder_img_ids.shape[0] < 2:
+ repeat = 2 // decoder_img_ids.shape[0]
+ decoder_img_ids = decoder_img_ids.repeat(repeat)
+ decoder_memory = decoder_memory.repeat(1, repeat, 1)
+ decoder_pos_embed = decoder_pos_embed.repeat(1, repeat, 1)
+ decoder_prompt = decoder_prompt.repeat(1, repeat, 1)
+ decoder_prompt_mask = decoder_prompt_mask.repeat(repeat, 1)
+ decoder_valid_ratios = decoder_valid_ratios.repeat(repeat, 1, 1)
+ decoder_backbone_fpn = [feat.repeat(repeat, 1, 1, 1) for feat in decoder_backbone_fpn]
+ decoder_only_inputs = (
+ decoder_backbone_fpn,
+ decoder_img_ids,
+ decoder_memory,
+ decoder_pos_embed,
+ decoder_prompt,
+ decoder_prompt_mask,
+ decoder_level_start_index,
+ decoder_spatial_shapes,
+ decoder_valid_ratios,
+ )
+
+ def decoder_only_fn():
+ decoder_module(*decoder_only_inputs)
+
+ with torch.inference_mode():
+ for _ in range(args.warmup):
+ for warm_fn in (image_fn, text_fn, encoder_fn, pipeline_fn, decoder_only_fn):
+ warm_fn()
+ image_ms = _timeit(image_fn, args.iters, device) * 1000
+ text_ms = _timeit(text_fn, args.iters, device) * 1000
+ encoder_ms = _timeit(encoder_fn, args.iters, device) * 1000
+ pipeline_ms = _timeit(pipeline_fn, args.iters, device) * 1000
+ decoder_only_ms = _timeit(decoder_only_fn, args.iters, device) * 1000
+
+ print("Eager total (ms):", round(eager_ms, 2))
+ print("Export full pipeline total (ms):", round(pipeline_ms, 2))
+ print("Export image encoder (ms):", round(image_ms, 2))
+ print("Export text encoder (ms):", round(text_ms, 2))
+ print("Export encoder fusion (ms):", round(encoder_ms, 2))
+ print("Export decoder only (ms):", round(decoder_only_ms, 2))
+ del model
+
+
+if __name__ == "__main__":
+ main()
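The script's `_timeit` synchronizes once before and once after the loop and reports the mean time per call. A hypothetical variant, `time_ms` (not in the repo), reports the median per call instead, which is less sensitive to one-off stalls such as allocator growth or a late recompilation. The `sync` hook keeps the sketch torch-free; in the benchmark it would be `torch.cuda.synchronize`. Syncing after every call adds a small overhead and removes overlap between consecutive calls, so the numbers are closer to per-request latency than to throughput.

```python
import statistics
import time
from typing import Callable, Optional


def time_ms(
    fn: Callable[[], object],
    iters: int,
    warmup: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Median wall-clock milliseconds per call of fn.

    sync (for example torch.cuda.synchronize) is called after warm-up and after
    every timed call so queued asynchronous GPU work is counted.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```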
diff --git a/scripts/benchmark_sam3_export_times.py b/scripts/benchmark_sam3_export_times.py
new file mode 100644
index 0000000..b45b6a8
--- /dev/null
+++ b/scripts/benchmark_sam3_export_times.py
@@ -0,0 +1,170 @@
+import argparse
+import sys
+import time
+from pathlib import Path
+
+import numpy as np
+import torch
+from PIL import Image
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(REPO_ROOT))
+
+from sam3.model_builder import build_sam3_image_model
+from tests.export.test_decoder_export import (
+ _export_decoder_only,
+ _export_full_sam3_pipeline,
+ _make_decoder_only_inputs,
+ _make_inputs,
+)
+from tests.export.test_encoder_export import EncoderFusionWrapper
+from tests.export.test_image_encoder_export import _export_image_encoder
+from tests.export.test_text_encoder_export import _export_text_encoder
+
+
+def _load_image(path: Path, device: torch.device) -> torch.Tensor:
+ image = Image.open(path).convert("RGB")
+ np_image = np.array(image, dtype=np.float32) / 255.0
+ return torch.from_numpy(np_image).permute(2, 0, 1).unsqueeze(0).to(device)
+
+
+def _prepare_image(image: torch.Tensor, size: int) -> torch.Tensor:
+ image = image.clamp(0, 1)
+ image = torch.nn.functional.interpolate(
+ image, size=(size, size), mode="bilinear", align_corners=False
+ )
+ mean = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ std = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ return (image - mean) / std
+
+
+def _time(label: str, fn) -> float:
+ start = time.perf_counter()
+ fn()
+ elapsed = time.perf_counter() - start
+ print(f"{label}: {elapsed:.3f}s")
+ return elapsed
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--image",
+ type=Path,
+ default=Path("assets/images/cat_dog.jpg"),
+ help="Path to input image",
+ )
+ parser.add_argument(
+ "--device",
+ type=str,
+ default="cuda" if torch.cuda.is_available() else "cpu",
+ )
+ parser.add_argument(
+ "--num-feature-levels",
+ type=int,
+ default=1,
+ help="Number of feature levels to use",
+ )
+ args = parser.parse_args()
+
+ device = torch.device(args.device)
+ model = build_sam3_image_model(
+ device=args.device,
+ eval_mode=True,
+ enable_segmentation=True,
+ num_feature_levels=args.num_feature_levels,
+ )
+ model.eval()
+
+ image = _prepare_image(_load_image(args.image, device), size=1008)
+ inputs = _make_inputs(1, 1008, 1008, str(device))
+
+ decoder_inputs = None
+ decoder_inputs_error = None
+ with torch.no_grad():
+ backbone_out = model.backbone.forward_image(image)
+ text_encoder = model.backbone.language_backbone
+ _, text_tokens = text_encoder.encoder(inputs[1])
+ text_tokens = text_tokens.transpose(0, 1)
+ text_memory = text_encoder.resizer(text_tokens)
+        # True at padding positions (token id 0).
+        text_attention_mask = inputs[1].eq(0)
+ img_feats = backbone_out["backbone_fpn"][-1]
+ img_pos = backbone_out["vision_pos_enc"][-1]
+ if img_feats.shape[0] < 2:
+ repeat = 2 // img_feats.shape[0]
+ img_feats = img_feats.repeat(repeat, 1, 1, 1)
+ img_pos = img_pos.repeat(repeat, 1, 1, 1)
+ text_memory = text_memory.repeat(1, repeat, 1)
+ text_attention_mask = text_attention_mask.repeat(repeat, 1)
+ img_mask = torch.zeros(
+ img_feats.shape[0],
+ img_feats.shape[2],
+ img_feats.shape[3],
+ device=img_feats.device,
+ dtype=torch.bool,
+ )
+ try:
+ decoder_inputs = _make_decoder_only_inputs(model, inputs)
+ except Exception as exc:
+ decoder_inputs_error = exc
+
+ def export_image_encoder():
+ _export_image_encoder(model, image)
+
+ def export_text_encoder():
+ _export_text_encoder(model, inputs[1])
+
+ def export_encoder_fusion():
+ encoder_wrapper = (
+ EncoderFusionWrapper(model.transformer.encoder).to(img_feats.device).eval()
+ )
+ if args.num_feature_levels != 1:
+ raise RuntimeError("encoder_fusion export currently expects num_feature_levels=1")
+ torch.export.export(
+ encoder_wrapper,
+ (img_feats, img_pos, img_mask, text_memory, text_attention_mask),
+ dynamic_shapes={
+ "img_feats": {0: torch.export.Dim.AUTO},
+ "img_pos": {0: torch.export.Dim.AUTO},
+ "img_mask": {0: torch.export.Dim.AUTO},
+ "prompt": {
+ 0: torch.export.Dim("seq", min=1, max=64),
+ 1: torch.export.Dim.AUTO,
+ },
+ "prompt_mask": {
+ 0: torch.export.Dim.AUTO,
+ 1: torch.export.Dim("seq", min=1, max=64),
+ },
+ },
+ strict=False,
+ prefer_deferred_runtime_asserts_over_guards=True,
+ )
+
+ def export_full_pipeline():
+ _export_full_sam3_pipeline(model, inputs)
+
+ def export_decoder_only():
+ if decoder_inputs_error is not None:
+ raise decoder_inputs_error
+ _export_decoder_only(model, decoder_inputs)
+
+ with torch.no_grad():
+ _time("Export image encoder", export_image_encoder)
+ _time("Export text encoder", export_text_encoder)
+ if args.num_feature_levels == 1:
+ _time("Export encoder fusion", export_encoder_fusion)
+ else:
+ print("Export encoder fusion: skipped (num_feature_levels != 1)")
+ try:
+ _time("Export full pipeline", export_full_pipeline)
+ except Exception as exc:
+ print(f"Export full pipeline: failed ({type(exc).__name__}: {exc})")
+ try:
+ _time("Export decoder only", export_decoder_only)
+ except Exception as exc:
+ print(f"Export decoder only: failed ({type(exc).__name__}: {exc})")
+
+
+if __name__ == "__main__":
+ main()
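The `_time` helper above reads `time.perf_counter()` around one call and discards the callable's return value. Export is mostly host-side tracing, so that is adequate here. When timing GPU work, however, CUDA kernels launch asynchronously and can be undercounted without a synchronization point. Below is a minimal sketch of a variant. It is a hypothetical `timed` helper, not part of this change. It takes an optional sync callback, such as `torch.cuda.synchronize`, and returns the result together with the elapsed time:

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def timed(
    label: str,
    fn: Callable[[], T],
    sync: Optional[Callable[[], None]] = None,
) -> Tuple[T, float]:
    """Run ``fn`` once and return ``(result, elapsed_seconds)``.

    ``sync`` (for example ``torch.cuda.synchronize``) is called before the
    clock starts and before it stops, so queued asynchronous work is counted.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed


if __name__ == "__main__":
    total, seconds = timed("sum of squares", lambda: sum(i * i for i in range(10_000)))
    print(total)  # 333283335000
```

Because it returns the result, a caller can keep an exported program instead of discarding it. Passing `sync=None` on CPU keeps the helper free of any CUDA dependency.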
diff --git a/scripts/benchmark_sam3_load_time.py b/scripts/benchmark_sam3_load_time.py
new file mode 100644
index 0000000..47f5586
--- /dev/null
+++ b/scripts/benchmark_sam3_load_time.py
@@ -0,0 +1,46 @@
+import argparse
+import sys
+import time
+from pathlib import Path
+
+import torch
+import torchvision.ops # noqa: F401
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(REPO_ROOT))
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--artifact",
+ type=Path,
+ default=Path("artifacts/export/full_sam3_pipeline.pt2"),
+ help="Path to exported full pipeline artifact",
+ )
+ parser.add_argument(
+ "--device",
+ type=str,
+ default="cuda" if torch.cuda.is_available() else "cpu",
+ )
+ parser.add_argument(
+ "--num-feature-levels",
+ type=int,
+ default=1,
+ help="Unused; kept for parity with other benchmarks",
+ )
+ args = parser.parse_args()
+
+ if args.device.startswith("cuda"):
+ torch.cuda.synchronize()
+ start = time.perf_counter()
+ module = torch.export.load(str(args.artifact)).module()
+ module.to(args.device)
+ if args.device.startswith("cuda"):
+ torch.cuda.synchronize()
+ elapsed = time.perf_counter() - start
+ print(f"Load full pipeline to {args.device}: {elapsed:.3f}s")
+
+
+if __name__ == "__main__":
+ main()
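`benchmark_sam3_load_time.py` times a single `torch.export.load(...).module()` call followed by `.to(args.device)`. A single sample like that may be dominated by a cold read of the `.pt2` file. Below is a hedged sketch of a hypothetical `benchmark` helper (not in the repo). It separates untimed warm-up calls from timed calls and reports the min, median, and max. For the load benchmark, `fn` would wrap the load, the device move, and, on CUDA, a trailing `torch.cuda.synchronize()`:

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Call ``fn`` ``warmup`` times untimed, then ``repeats`` times timed.

    Returns the min, median and max wall-clock seconds of the timed calls.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()  # untimed, e.g. to warm the OS file cache for the artifact
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    stats = benchmark(lambda: sorted(range(100_000), reverse=True), repeats=3)
    print({name: f"{value:.4f}s" for name, value in stats.items()})
```

Whether a cold or a warm load is the number of interest depends on the deployment. Passing `warmup=0` keeps the first, possibly cold, call among the timed samples.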
diff --git a/scripts/export_sam3_artifacts.py b/scripts/export_sam3_artifacts.py
new file mode 100644
index 0000000..7cc0426
--- /dev/null
+++ b/scripts/export_sam3_artifacts.py
@@ -0,0 +1,185 @@
+import argparse
+import sys
+from pathlib import Path
+
+import numpy as np
+import torch
+from PIL import Image
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(REPO_ROOT))
+
+from sam3.model_builder import build_sam3_image_model
+from tests.export.test_decoder_export import (
+ _export_decoder_only,
+ _export_full_sam3_pipeline,
+ _make_decoder_only_inputs,
+)
+from tests.export.test_encoder_export import EncoderFusionWrapper
+from tests.export.test_image_encoder_export import _export_image_encoder
+from tests.export.test_text_encoder_export import _export_text_encoder
+
+
+def _load_image(path: Path, device: torch.device) -> torch.Tensor:
+ image = Image.open(path).convert("RGB")
+ np_image = np.array(image, dtype=np.float32) / 255.0
+ tensor = torch.from_numpy(np_image).permute(2, 0, 1).unsqueeze(0)
+ return tensor.to(device)
+
+
+def _prepare_image(image: torch.Tensor, size: int) -> torch.Tensor:
+ image = image.clamp(0, 1)
+ image = torch.nn.functional.interpolate(
+ image, size=(size, size), mode="bilinear", align_corners=False
+ )
+ mean = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ std = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ return (image - mean) / std
+
+
+def _make_inputs(model, image: torch.Tensor, prompts):
+ device = image.device
+
+ tokenizer = model.backbone.language_backbone.tokenizer
+ token_ids = tokenizer(prompts, context_length=32).to(device)
+
+ return (
+ image,
+ token_ids,
+ )
+
+
+def _save_export(exported, path: Path) -> None:
+ path.parent.mkdir(parents=True, exist_ok=True)
+ torch.export.save(exported, str(path))
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--image",
+ type=Path,
+ default=Path("assets/images/cat_dog.jpg"),
+ help="Path to input image",
+ )
+ parser.add_argument(
+ "--prompts",
+ type=str,
+ default="cat,dog",
+ help="Comma-separated text prompts",
+ )
+ parser.add_argument(
+ "--device",
+ type=str,
+ default="cuda" if torch.cuda.is_available() else "cpu",
+ )
+ parser.add_argument(
+ "--out-dir",
+ type=Path,
+ default=Path("artifacts/export"),
+ help="Directory to write exported artifacts",
+ )
+ args = parser.parse_args()
+
+ prompts = [p.strip() for p in args.prompts.split(",") if p.strip()]
+ if not prompts:
+ raise ValueError("Provide at least one prompt")
+
+ model = build_sam3_image_model(device=args.device, eval_mode=True, enable_segmentation=True)
+ model.eval()
+
+ image = _load_image(args.image, torch.device(args.device))
+ image = _prepare_image(image, size=1008)
+ inputs = _make_inputs(model, image, prompts)
+
+ print("Exporting image encoder...")
+ image_encoder = _export_image_encoder(model, inputs[0])
+ print("Exporting text encoder...")
+ text_encoder = _export_text_encoder(model, inputs[1])
+ print("Exporting encoder fusion...")
+ with torch.no_grad():
+ image_module = image_encoder.module()
+ text_module = text_encoder.module()
+ _, vision_pos_enc, backbone_fpn = image_module(inputs[0])
+ text_attention_mask, text_memory = text_module(inputs[1])
+ prompt = text_memory
+ prompt_mask = text_attention_mask
+ img_feats = backbone_fpn[-1]
+ img_pos = vision_pos_enc[-1]
+ img_mask = torch.zeros(
+ img_feats.shape[0],
+ img_feats.shape[2],
+ img_feats.shape[3],
+ device=img_feats.device,
+ dtype=torch.bool,
+ )
+ prompt_batch = prompt.shape[1]
+ if img_feats.shape[0] != prompt_batch:
+ if img_feats.shape[0] != 1:
+ raise ValueError("Image batch does not match prompt batch")
+ img_feats = img_feats.repeat(prompt_batch, 1, 1, 1)
+ img_pos = img_pos.repeat(prompt_batch, 1, 1, 1)
+ img_mask = img_mask.repeat(prompt_batch, 1, 1)
+
+ encoder_wrapper = EncoderFusionWrapper(model.transformer.encoder).to(img_feats.device).eval()
+ encoder = torch.export.export(
+ encoder_wrapper,
+ (img_feats, img_pos, img_mask, prompt, prompt_mask),
+ dynamic_shapes={
+ "img_feats": {0: torch.export.Dim("batch", min=1, max=4)},
+ "img_pos": {0: torch.export.Dim("batch", min=1, max=4)},
+ "img_mask": {0: torch.export.Dim("batch", min=1, max=4)},
+ "prompt": {0: 32, 1: torch.export.Dim("batch", min=1, max=4)},
+ "prompt_mask": {0: torch.export.Dim("batch", min=1, max=4), 1: 32},
+ },
+ strict=False,
+ prefer_deferred_runtime_asserts_over_guards=True,
+ )
+ print("Exporting full pipeline...")
+ pipeline_inputs = _make_inputs(model, image, prompts[:1])
+ full_pipeline = _export_full_sam3_pipeline(model, pipeline_inputs)
+ print("Exporting decoder only...")
+ decoder_inputs = _make_decoder_only_inputs(model, pipeline_inputs)
+ (
+ backbone_fpn,
+ img_ids,
+ memory,
+ pos_embed,
+ prompt,
+ prompt_mask,
+ level_start_index,
+ spatial_shapes,
+ valid_ratios,
+ ) = decoder_inputs
+ if img_ids.shape[0] < 2:
+ repeat = 2 // img_ids.shape[0]
+ img_ids = img_ids.repeat(repeat)
+ memory = memory.repeat(1, repeat, 1)
+ pos_embed = pos_embed.repeat(1, repeat, 1)
+ prompt = prompt.repeat(1, repeat, 1)
+ prompt_mask = prompt_mask.repeat(repeat, 1)
+ valid_ratios = valid_ratios.repeat(repeat, 1, 1)
+ backbone_fpn = [feat.repeat(repeat, 1, 1, 1) for feat in backbone_fpn]
+ decoder_inputs = (
+ backbone_fpn,
+ img_ids,
+ memory,
+ pos_embed,
+ prompt,
+ prompt_mask,
+ level_start_index,
+ spatial_shapes,
+ valid_ratios,
+ )
+ decoder_only = _export_decoder_only(model, decoder_inputs)
+
+ _save_export(image_encoder, args.out_dir / "image_encoder.pt2")
+ _save_export(text_encoder, args.out_dir / "text_encoder.pt2")
+ _save_export(encoder, args.out_dir / "encoder_fusion.pt2")
+ _save_export(full_pipeline, args.out_dir / "full_sam3_pipeline.pt2")
+ _save_export(decoder_only, args.out_dir / "decoder_only.pt2")
+ print("Saved exports to", args.out_dir)
+
+
+if __name__ == "__main__":
+ main()
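Several scripts in this change tile batch-1 inputs up to a batch of 2 before calling `torch.export.export` (`repeat = 2 // x.shape[0]`). The scripts do not state why. A plausible reason is that `torch.export` specializes dimensions whose example size is 0 or 1, so a batch of 1 would be baked in as a constant even when the dimension is marked dynamic. The tiling bookkeeping itself is simple. A hedged pure-Python sketch, with hypothetical helper names:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def min_batch_repeat(batch: int, min_batch: int = 2) -> int:
    """Smallest tiling factor that brings ``batch`` to at least ``min_batch``.

    For batch == 1 this equals the scripts' ``2 // batch``; for larger batches it
    returns 1, matching the scripts, which leave those batches unchanged.
    """
    if batch < 1:
        raise ValueError("batch must be >= 1")
    return max(1, -(-min_batch // batch))  # ceiling division


def tile_batch(rows: Sequence[T], factor: int) -> List[T]:
    """List analogue of ``tensor.repeat(factor, ...)`` along dim 0."""
    return list(rows) * factor


if __name__ == "__main__":
    factor = min_batch_repeat(1)
    print(factor, tile_batch([0], factor))  # 2 [0, 0]
```

In the decoder-only export above, every tensor that shares the batch has to be tiled by the same factor, and along the right axis. That is dim 0 for `img_ids`, `prompt_mask`, `valid_ratios` and the `backbone_fpn` maps, and dim 1 for the sequence-first `memory`, `pos_embed` and `prompt`.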
diff --git a/scripts/export_sam3_full_pipeline.py b/scripts/export_sam3_full_pipeline.py
new file mode 100644
index 0000000..9f50f3e
--- /dev/null
+++ b/scripts/export_sam3_full_pipeline.py
@@ -0,0 +1,188 @@
+import argparse
+import sys
+from pathlib import Path
+from typing import Any, cast
+
+import numpy as np
+import torch
+from PIL import Image
+
+# Make the repository root importable when this file is run as a script.
+REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(REPO_ROOT))
+
+from sam3.model_builder import build_sam3_image_model
+from sam3.model.data_misc import FindStage
+from sam3.model.geometry_encoders import Prompt
+
+
+class FullSam3PipelineWrapper(torch.nn.Module):
+ def __init__(self, model: torch.nn.Module):
+ super().__init__()
+ self.model = model
+
+ def forward(
+ self,
+ images: torch.Tensor,
+ token_ids: torch.Tensor,
+ ):
+ model = cast(Any, self.model)
+ num_images = images.shape[0]
+ num_prompts = token_ids.shape[0]
+ device = images.device
+ bs = num_images * num_prompts
+
+ img_ids = torch.arange(num_images, device=device, dtype=torch.long)
+ img_ids = img_ids.repeat_interleave(num_prompts)
+ text_ids = torch.arange(num_prompts, device=device, dtype=torch.long)
+ text_ids = text_ids.repeat(num_images)
+
+ box_embeddings = torch.zeros(1, bs, 4, device=device)
+ box_mask = torch.zeros(bs, 1, device=device, dtype=torch.bool)
+ box_labels = torch.zeros(1, bs, device=device, dtype=torch.long)
+
+ backbone_out = model.backbone.forward_image(images)
+ text_encoder = model.backbone.language_backbone
+ _, text_tokens = text_encoder.encoder(token_ids)
+ text_tokens = text_tokens.transpose(0, 1)
+ text_memory = text_encoder.resizer(text_tokens)
+        # True at padding positions (token id 0).
+        text_attention_mask = token_ids.eq(0)
+ backbone_out["language_features"] = text_memory
+ backbone_out["language_mask"] = text_attention_mask
+
+ find_input = FindStage(
+ img_ids=img_ids,
+ text_ids=text_ids,
+ input_boxes=box_embeddings,
+ input_boxes_mask=box_mask,
+ input_boxes_label=box_labels,
+ input_points=torch.zeros(0, bs, 2, device=device),
+ input_points_mask=torch.zeros(bs, 0, device=device, dtype=torch.bool),
+ )
+ geometric_prompt = Prompt(
+ box_embeddings=box_embeddings,
+ box_mask=box_mask,
+ box_labels=box_labels,
+ )
+ out = model.forward_grounding(
+ backbone_out=backbone_out,
+ find_input=find_input,
+ find_target=None,
+ geometric_prompt=geometric_prompt,
+ )
+ return (
+ out["pred_logits"],
+ out["pred_boxes"],
+ out["pred_masks"],
+ out.get("presence_logit_dec"),
+ )
+
+
+def _load_image(path: Path, device: torch.device) -> torch.Tensor:
+ image = Image.open(path).convert("RGB")
+ np_image = np.array(image, dtype=np.float32) / 255.0
+ return torch.from_numpy(np_image).permute(2, 0, 1).unsqueeze(0).to(device)
+
+
+def _prepare_image(image: torch.Tensor, size: int) -> torch.Tensor:
+ image = image.clamp(0, 1)
+ image = torch.nn.functional.interpolate(
+ image, size=(size, size), mode="bilinear", align_corners=False
+ )
+ mean = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ std = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ return (image - mean) / std
+
+
+def _make_inputs(model, image: torch.Tensor, prompts):
+ device = image.device
+ token_ids = model.backbone.language_backbone.tokenizer(prompts, context_length=32).to(device)
+ return (
+ image,
+ token_ids,
+ )
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--image",
+ type=Path,
+ default=Path("assets/images/cat_dog.jpg"),
+ help="Path to input image",
+ )
+ parser.add_argument(
+ "--prompts",
+ type=str,
+ default="cat,dog",
+ help="Comma-separated text prompts",
+ )
+ parser.add_argument(
+ "--device",
+ type=str,
+ default="cuda" if torch.cuda.is_available() else "cpu",
+ )
+ parser.add_argument(
+ "--num-feature-levels",
+ type=int,
+ default=1,
+ help="Number of feature levels to use",
+ )
+ parser.add_argument(
+ "--out-dir",
+ type=Path,
+ default=Path("artifacts/export"),
+ help="Directory to write exported artifact",
+ )
+ args = parser.parse_args()
+
+ prompts = [p.strip() for p in args.prompts.split(",") if p.strip()]
+ if not prompts:
+ raise ValueError("Provide at least one prompt")
+
+ model = build_sam3_image_model(
+ device=args.device,
+ eval_mode=True,
+ enable_segmentation=True,
+ num_feature_levels=args.num_feature_levels,
+ )
+ model.eval()
+
+ image = _prepare_image(_load_image(args.image, torch.device(args.device)), size=1008)
+ inputs = _make_inputs(model, image, prompts)
+ wrapper = FullSam3PipelineWrapper(model).to(image.device).eval()
+ if image.shape[0] < 2:
+ repeat = 2 // image.shape[0]
+ export_inputs = (
+ image.repeat(repeat, 1, 1, 1),
+ inputs[1].repeat(repeat, 1),
+ )
+ else:
+ export_inputs = inputs
+ with torch.no_grad():
+ exported = torch.export.export(
+ wrapper,
+ export_inputs,
+ dynamic_shapes={
+ "images": {
+ 0: torch.export.Dim.AUTO,
+ 2: 1008,
+ 3: 1008,
+ },
+ "token_ids": {
+ 0: torch.export.Dim("num_prompts", min=1),
+ 1: 32,
+ },
+ },
+ strict=False,
+ prefer_deferred_runtime_asserts_over_guards=True,
+ )
+ out_path = args.out_dir / "full_sam3_pipeline.pt2"
+ out_path.parent.mkdir(parents=True, exist_ok=True)
+ torch.export.save(exported, str(out_path))
+ print("Saved export to", out_path)
+
+
+if __name__ == "__main__":
+ main()
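`FullSam3PipelineWrapper.forward` runs every prompt against every image. With `bs = num_images * num_prompts`, `img_ids` comes from `repeat_interleave` and `text_ids` from `repeat`, so the pairs are image-major. The wrapper also builds `text_attention_mask` as `token_ids.ne(0).ne(1)`, which is simply `token_ids.eq(0)`: it is True wherever the token id is 0, which the wrapper treats as padding. A stdlib-only sketch of both pieces, with hypothetical helper names:

```python
from typing import List, Sequence, Tuple


def pair_ids(num_images: int, num_prompts: int) -> Tuple[List[int], List[int]]:
    """Image-major (image, prompt) pairing for a batch of num_images * num_prompts.

    img_ids  ~ torch.arange(num_images).repeat_interleave(num_prompts)
    text_ids ~ torch.arange(num_prompts).repeat(num_images)
    """
    img_ids = [i for i in range(num_images) for _ in range(num_prompts)]
    text_ids = [t for _ in range(num_images) for t in range(num_prompts)]
    return img_ids, text_ids


def padding_mask(token_ids: Sequence[Sequence[int]], pad_id: int = 0) -> List[List[bool]]:
    """True where a token equals ``pad_id``.

    With the default pad_id=0 this matches token_ids.ne(0).ne(1) in the wrapper.
    """
    return [[tok == pad_id for tok in row] for row in token_ids]


if __name__ == "__main__":
    print(pair_ids(2, 3))  # ([0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2])
    print(padding_mask([[11, 12, 13, 0, 0]]))  # [[False, False, False, True, True]]
```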
diff --git a/scripts/measure_speed.py b/scripts/measure_speed.py
new file mode 100644
index 0000000..bacea1a
--- /dev/null
+++ b/scripts/measure_speed.py
@@ -0,0 +1,304 @@
+"""
+SAM3 Speed Test — supports both SAM3 and SAM3.1 (multiplex).
+
+Generates a synthetic video of moving circles, runs text-prompt detection
+and propagation, and measures FPS. Checkpoints are auto-downloaded from
+HuggingFace if not provided.
+
+Usage:
+ # SAM 3.1 (default, auto-downloads from HuggingFace):
+ python scripts/measure_speed.py
+
+ # SAM 3 (non-multiplex):
+ python scripts/measure_speed.py --version sam3
+
+ # Custom settings:
+ python scripts/measure_speed.py --num_objects 32 --n_frames 100 --no-compile
+    python scripts/measure_speed.py --version sam3.1 --num_objects 5
+"""
+
+import argparse
+import getpass
+import os
+import shutil
+import time
+
+import numpy as np
+import torch
+from PIL import Image, ImageDraw
+
+
+def max_memory_allocated():
+    peak_bytes = torch.cuda.max_memory_allocated()
+    _, total_memory = torch.cuda.mem_get_info()
+    peak_percentage = int(
+        100 * (peak_bytes / total_memory)
+    )
+    peak_mib = peak_bytes >> 20  # bytes -> MiB
+    print(
+        f"max_memory_allocated: {peak_mib}MiB or {peak_percentage}%"
+    )
+
+
+def synthesize_video_data(
+ num_objects: int,
+ out_dir: str,
+ radius: int,
+ speed: int,
+ width: int,
+ height: int,
+ n_frames: int,
+):
+ circle_colors = [
+ tuple(np.random.randint(0, 256, size=3).tolist()) for _ in range(num_objects)
+ ]
+
+ if os.path.exists(out_dir):
+ shutil.rmtree(out_dir)
+ os.makedirs(out_dir, exist_ok=True)
+
+ positions = []
+ velocities = []
+ for _ in range(num_objects):
+ px = float(np.random.randint(radius, width - radius))
+ py = float(np.random.randint(radius, height - radius))
+ vx = np.random.choice([-1, 1]) * speed
+ vy = np.random.choice([-1, 1]) * speed
+ positions.append([px, py])
+ velocities.append([vx, vy])
+
+    print(f"Generating {n_frames} frames with {num_objects} objects")
+ for i in range(n_frames):
+ img = Image.new("RGB", (width, height), (0, 0, 0))
+ draw = ImageDraw.Draw(img)
+ for obj_idx in range(num_objects):
+ x, y = positions[obj_idx]
+ rx, ry = round(x), round(y)
+ draw.ellipse(
+ [(rx - radius, ry - radius), (rx + radius, ry + radius)],
+ fill=circle_colors[obj_idx],
+ )
+ vx, vy = velocities[obj_idx]
+ x += vx
+ y += vy
+ positions[obj_idx] = [
+ np.clip(x, radius, width - radius),
+ np.clip(y, radius, height - radius),
+ ]
+ if x - radius < 0 or x + radius > width:
+ vx *= -1
+ if y - radius < 0 or y + radius > height:
+ vy *= -1
+ velocities[obj_idx] = [vx, vy]
+
+ img.save(os.path.join(out_dir, f"{i:03d}.jpg"))
+
+
+def profiler_runner(fn, profile_save_dir=None, profile_end_frame=-1, *args, **kwargs):
+ if profile_save_dir is None:
+ profile_save_dir = os.path.expanduser("~/traces")
+
+ os.environ["ENABLE_PROFILING"] = "1"
+ os.environ["PROFILE_SAVE_DIR"] = profile_save_dir
+ if profile_end_frame >= 0:
+ os.environ["PROFILE_END_FRAME"] = str(profile_end_frame)
+
+ print(f"Profiling enabled. Traces will be saved to: {profile_save_dir}")
+ if profile_end_frame >= 0:
+ print(f"Profiling will stop at frame: {profile_end_frame}")
+
+ try:
+ result = fn(*args, **kwargs)
+ finally:
+ os.environ.pop("ENABLE_PROFILING", None)
+ os.environ.pop("PROFILE_SAVE_DIR", None)
+ os.environ.pop("PROFILE_END_FRAME", None)
+
+ return result
+
+
+def main_loop(model_wrapper, session_id, text_prompt):
+ model_wrapper.handle_request({"type": "reset_session", "session_id": session_id})
+ model_wrapper.handle_request(
+ {
+ "type": "add_prompt",
+ "session_id": session_id,
+ "frame_index": 0,
+ "text": text_prompt,
+ }
+ )
+
+ t0 = time.perf_counter()
+ frame_count = 0
+ for _response in model_wrapper.handle_stream_request(
+ {"type": "propagate_in_video", "session_id": session_id}
+ ):
+ frame_count += 1
+ torch.cuda.synchronize()
+ t1 = time.perf_counter()
+
+ if frame_count > 0:
+ return frame_count / (t1 - t0)
+ return -1
+
+
+def run_test(
+ version: str,
+ profile: bool,
+ video_dir: str,
+ num_objects: int,
+ radius: int,
+ speed: int,
+ width: int,
+ height: int,
+ n_frames: int,
+ synthesize_data: bool = True,
+ profile_save_dir: str = None,
+ profile_end_frame: int = -1,
+ do_compile: bool = True,
+ checkpoint_path: str = None,
+) -> float:
+    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()  # bf16 autocast, never exited
+
+ if synthesize_data:
+ synthesize_video_data(
+ num_objects=num_objects,
+ out_dir=video_dir,
+ radius=radius,
+ speed=speed,
+ width=width,
+ height=height,
+ n_frames=n_frames,
+ )
+
+ from sam3 import build_sam3_predictor
+
+ print(f"Building {version} model...")
+ build_kwargs = dict(
+ version=version,
+ compile=do_compile,
+ async_loading_frames=False,
+ )
+ if checkpoint_path:
+ build_kwargs["checkpoint_path"] = checkpoint_path
+ if version == "sam3.1":
+ build_kwargs["warm_up"] = do_compile
+ build_kwargs["max_num_objects"] = num_objects
+
+ model_wrapper = build_sam3_predictor(**build_kwargs)
+
+ # Initialize session
+ response = model_wrapper.handle_request(
+ {"type": "start_session", "resource_path": video_dir}
+ )
+ session_id = response["session_id"]
+
+ print("\nWarm-up round.")
+ NUM_WARMUP_TRIES = 3
+ fps = 0
+ for _ in range(NUM_WARMUP_TRIES):
+ fps = max(
+ main_loop(
+ model_wrapper=model_wrapper, session_id=session_id, text_prompt="circle"
+ ),
+ fps,
+ )
+
+ print("\nProfile round.")
+ if profile:
+ profiler_runner(
+ main_loop,
+ profile_save_dir=profile_save_dir or os.path.expanduser("~/traces"),
+ profile_end_frame=profile_end_frame,
+ model_wrapper=model_wrapper,
+ session_id=session_id,
+ text_prompt="circle",
+ )
+ else:
+ fps = max(
+ main_loop(
+ model_wrapper=model_wrapper, session_id=session_id, text_prompt="circle"
+ ),
+ fps,
+ )
+
+ NUM_TRIES = 10
+ for i in range(NUM_TRIES):
+ torch.cuda.empty_cache()
+ torch.cuda.reset_peak_memory_stats()
+        print(f"\nTiming round {i + 1}")
+ fps = max(
+ main_loop(
+ model_wrapper=model_wrapper, session_id=session_id, text_prompt="circle"
+ ),
+ fps,
+ )
+ print(f"Frames per second (FPS): {fps:.2f}")
+ max_memory_allocated()
+
+ if synthesize_data:
+ print("\nDeleting temporary video directory.")
+ shutil.rmtree(video_dir)
+
+ return fps
+
+
+if __name__ == "__main__":
+ username = getpass.getuser()
+ os.environ["TORCHINDUCTOR_CACHE_DIR"] = f"/tmp/torchinductor_cache_{username}"
+ os.environ["USE_PERFLIB"] = "1"
+
+ parser = argparse.ArgumentParser(description="SAM3 Speed Test")
+ parser.add_argument(
+ "--version",
+ type=str,
+ default="sam3.1",
+ choices=["sam3", "sam3.1"],
+ help="Model version (default: sam3.1)",
+ )
+ parser.add_argument(
+ "--checkpoint",
+ type=str,
+ default=None,
+ help="Path to checkpoint (auto-downloads from HuggingFace if not provided)",
+ )
+ parser.add_argument(
+ "--video_dir", type=str, default="/tmp/segment-anything-3/synth_video"
+ )
+ parser.add_argument("--num_objects", type=int, default=5)
+ parser.add_argument("--n_frames", type=int, default=50)
+ parser.add_argument("--radius", type=int, default=50)
+ parser.add_argument("--speed", type=int, default=20)
+ parser.add_argument("--width", type=int, default=1024)
+ parser.add_argument("--height", type=int, default=1024)
+ parser.add_argument(
+ "--no-compile",
+ action="store_false",
+ dest="compile",
+ help="Disable torch.compile",
+ )
+ parser.add_argument("--no-torch-profiling", action="store_false", dest="profile")
+ parser.add_argument(
+ "--no-data-synthesis", action="store_false", dest="synthesize_data"
+ )
+ parser.add_argument("--profile-save-dir", type=str, default=None)
+ parser.add_argument("--profile-end-frame", type=int, default=-1)
+
+ args = parser.parse_args()
+
+ run_test(
+ version=args.version,
+ profile=args.profile,
+ num_objects=args.num_objects,
+ video_dir=args.video_dir,
+ radius=args.radius,
+ speed=args.speed,
+ width=args.width,
+ height=args.height,
+ n_frames=args.n_frames,
+ synthesize_data=args.synthesize_data,
+ profile_save_dir=args.profile_save_dir,
+ profile_end_frame=args.profile_end_frame,
+ do_compile=args.compile,
+ checkpoint_path=args.checkpoint,
+ )
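`synthesize_video_data` moves each circle by its velocity every frame. It reflects a velocity component when the moved circle would cross a frame edge, then clamps the stored position so the circle stays fully visible. Pulled out per axis as a pure function (a hypothetical `bounce_step`, not in the repo), the rule is easy to check in isolation:

```python
from typing import Tuple


def bounce_step(pos: float, vel: float, radius: float, limit: float) -> Tuple[float, float]:
    """Advance one coordinate of a circle by one frame.

    Same rule as synthesize_video_data: move, flip the velocity if the moved
    circle crosses an edge, then clamp so it stays fully inside [0, limit].
    """
    new_pos = pos + vel
    if new_pos - radius < 0 or new_pos + radius > limit:
        vel = -vel
    new_pos = min(max(new_pos, radius), limit - radius)
    return new_pos, vel


if __name__ == "__main__":
    print(bounce_step(500, 20, 50, 1024))  # (520, 20)
    print(bounce_step(60, -20, 50, 1024))  # (50, 20)
```

The edge test uses the unclipped position. The clamped value is always within `[radius, limit - radius]`, so testing it instead would never trigger a bounce.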
diff --git a/scripts/qualitative_test.py b/scripts/qualitative_test.py
new file mode 100644
index 0000000..2b5a881
--- /dev/null
+++ b/scripts/qualitative_test.py
@@ -0,0 +1,312 @@
+"""
+SAM3 Qualitative Test — supports both SAM3 and SAM3.1.
+
+Tests text prompt detection + propagation on a synthetic video.
+Checkpoints are auto-downloaded from HuggingFace.
+
+Usage:
+ python scripts/qualitative_test.py # SAM 3.1 default
+ python scripts/qualitative_test.py --version sam3 # SAM 3
+ python scripts/qualitative_test.py --video /path/to/video.mp4
+"""
+
+import argparse
+import getpass
+import os
+import shutil
+
+import cv2
+import matplotlib
+import numpy as np
+import torch
+from PIL import Image as PIL_Image, ImageDraw
+
+# Select the non-interactive Agg backend before pyplot is imported.
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt  # noqa: E402
+
+
+OUTPUT_DIR = "/tmp/sam3_qualitative_test"
+
+MASK_COLORS = [
+ (255, 0, 0),
+ (0, 255, 0),
+ (0, 0, 255),
+ (255, 255, 0),
+ (255, 0, 255),
+ (0, 255, 255),
+ (255, 128, 0),
+ (128, 0, 255),
+ (0, 128, 255),
+ (255, 64, 128),
+ (128, 255, 0),
+ (64, 128, 255),
+ (255, 200, 0),
+ (0, 200, 128),
+ (200, 0, 128),
+ (128, 128, 255),
+ (255, 128, 128),
+ (128, 255, 128),
+ (128, 128, 0),
+ (0, 128, 128),
+]
+
+
+def extract_frames(video_path, output_dir):
+ if os.path.exists(output_dir) and len(os.listdir(output_dir)) > 0:
+ n = len([f for f in os.listdir(output_dir) if f.endswith(".jpg")])
+ print(f"Using existing {n} frames in {output_dir}")
+ return n
+ if os.path.exists(output_dir):
+ shutil.rmtree(output_dir)
+ os.makedirs(output_dir)
+ cap = cv2.VideoCapture(video_path)
+ idx = 0
+ while True:
+ ret, frame = cap.read()
+ if not ret:
+ break
+ cv2.imwrite(os.path.join(output_dir, f"{idx:05d}.jpg"), frame)
+ idx += 1
+ cap.release()
+ print(f"Extracted {idx} frames to {output_dir}")
+ return idx
+
+
+def synthesize_video(out_dir, num_objects=5, n_frames=30, width=1024, height=1024):
+ if os.path.exists(out_dir):
+ shutil.rmtree(out_dir)
+ os.makedirs(out_dir)
+ colors = [
+ tuple(np.random.randint(0, 256, size=3).tolist()) for _ in range(num_objects)
+ ]
+ positions = [
+ [
+ float(np.random.randint(80, width - 80)),
+ float(np.random.randint(80, height - 80)),
+ ]
+ for _ in range(num_objects)
+ ]
+ velocities = [
+ [np.random.choice([-1, 1]) * 15, np.random.choice([-1, 1]) * 15]
+ for _ in range(num_objects)
+ ]
+ for i in range(n_frames):
+ img = PIL_Image.new("RGB", (width, height), (0, 0, 0))
+ draw = ImageDraw.Draw(img)
+ for j in range(num_objects):
+ x, y = positions[j]
+ draw.ellipse([(x - 50, y - 50), (x + 50, y + 50)], fill=colors[j])
+        x, y = x + velocities[j][0], y + velocities[j][1]
+        positions[j] = [
+            np.clip(x, 50, width - 50),
+            np.clip(y, 50, height - 50),
+        ]
+        if x < 50 or x > width - 50:
+            velocities[j][0] *= -1
+        if y < 50 or y > height - 50:
+            velocities[j][1] *= -1
+ img.save(os.path.join(out_dir, f"{i:05d}.jpg"))
+ print(f"Generated {n_frames} synthetic frames with {num_objects} circles")
+ return n_frames
+
+
+def load_frame(frame_dir, frame_idx):
+ return cv2.cvtColor(
+ cv2.imread(os.path.join(frame_dir, f"{frame_idx:05d}.jpg")),
+ cv2.COLOR_BGR2RGB,
+ )
+
+
+def render_overlay(frame_rgb, masks_by_obj, alpha=0.4):
+ overlay = frame_rgb.copy().astype(np.float32)
+ for obj_id, mask in sorted(masks_by_obj.items()):
+ color = MASK_COLORS[obj_id % len(MASK_COLORS)]
+ mask_bool = mask.astype(bool)
+ for c in range(3):
+ overlay[:, :, c] = np.where(
+ mask_bool,
+ overlay[:, :, c] * (1 - alpha) + color[c] * alpha,
+ overlay[:, :, c],
+ )
+ return overlay.astype(np.uint8)
+
+
+def save_overlay(frame_rgb, masks_by_obj, output_path, title=None):
+ overlay = render_overlay(frame_rgb, masks_by_obj)
+ fig, ax = plt.subplots(1, 1, figsize=(12, 7), dpi=100)
+ ax.imshow(overlay)
+ for obj_id, mask in sorted(masks_by_obj.items()):
+ mask_bool = mask.astype(bool)
+ if mask_bool.any():
+ ys, xs = np.where(mask_bool)
+ cx, cy = int(xs.mean()), int(ys.mean())
+ color_rgb = MASK_COLORS[obj_id % len(MASK_COLORS)]
+ facecolor = (color_rgb[0] / 255, color_rgb[1] / 255, color_rgb[2] / 255)
+ ax.text(
+ cx,
+ cy,
+ str(obj_id),
+ color="white",
+ fontsize=10,
+ ha="center",
+ va="center",
+ fontweight="bold",
+ bbox=dict(boxstyle="round,pad=0.2", facecolor=facecolor, alpha=0.8),
+ )
+ if title:
+ ax.set_title(title, fontsize=12, fontweight="bold", pad=8)
+ ax.axis("off")
+ fig.tight_layout(pad=0)
+ fig.savefig(output_path, bbox_inches="tight", pad_inches=0)
+ plt.close(fig)
+
+
+def collect_propagation(model, session_id):
+ mask_dict = {}
+ for response in model.handle_stream_request(
+ {"type": "propagate_in_video", "session_id": session_id}
+ ):
+ frame_idx = response.get("frame_index")
+ if frame_idx is None:
+ continue
+ outputs = response.get("outputs", {})
+ obj_ids = outputs.get("out_obj_ids", [])
+ binary_masks = outputs.get("out_binary_masks")
+ if binary_masks is None:
+ mask_dict[frame_idx] = {}
+ continue
+ if isinstance(obj_ids, torch.Tensor):
+ obj_ids = obj_ids.cpu().numpy()
+ if isinstance(binary_masks, torch.Tensor):
+ binary_masks = binary_masks.cpu().numpy()
+ masks = {}
+ for i, oid in enumerate(obj_ids):
+ m = binary_masks[i]
+ if m.ndim == 3:
+ m = m[0]
+ masks[int(oid)] = m
+ mask_dict[frame_idx] = masks
+ torch.cuda.synchronize()
+ return mask_dict
+
+
+def main():
+ parser = argparse.ArgumentParser(description="SAM3 Qualitative Test")
+ parser.add_argument(
+ "--version", type=str, default="sam3.1", choices=["sam3", "sam3.1"]
+ )
+ parser.add_argument(
+ "--video",
+ type=str,
+ default=None,
+ help="Path to video file. If not provided, generates synthetic video.",
+ )
+ parser.add_argument(
+ "--checkpoint",
+ type=str,
+ default=None,
+ help="Path to checkpoint (auto-downloads from HuggingFace if not provided)",
+ )
+ parser.add_argument(
+ "--text_prompt", type=str, default="circle", help="Text prompt for detection"
+ )
+ parser.add_argument(
+ "--n_frames", type=int, default=30, help="Number of frames for synthetic video"
+ )
+ args = parser.parse_args()
+
+ username = getpass.getuser()
+ os.environ["TORCHINDUCTOR_CACHE_DIR"] = f"/tmp/torchinductor_cache_{username}"
+ os.environ["USE_PERFLIB"] = "1"
+    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()  # bf16 autocast, never exited
+
+ # Prepare video frames
+ frame_dir = "/tmp/sam3_qualitative_frames"
+ if args.video:
+ n_frames = extract_frames(args.video, frame_dir)
+ else:
+ n_frames = synthesize_video(frame_dir, n_frames=args.n_frames)
+
+ img = load_frame(frame_dir, 0)
+ img_h, img_w = img.shape[:2]
+ print(f"Video: {img_w}x{img_h}, {n_frames} frames")
+
+ # Build model
+ from sam3 import build_sam3_predictor
+
+ print(f"\nBuilding {args.version} model...")
+ build_kwargs = dict(version=args.version, compile=False, async_loading_frames=False)
+ if args.checkpoint:
+ build_kwargs["checkpoint_path"] = args.checkpoint
+ model = build_sam3_predictor(**build_kwargs)
+
+ # Start session
+ response = model.handle_request(
+ {"type": "start_session", "resource_path": frame_dir}
+ )
+ session_id = response["session_id"]
+ print(f"Session: {session_id}")
+
+ # Test: text prompt -> propagate
+ out_dir = os.path.join(OUTPUT_DIR, f"{args.version}_text_{args.text_prompt}")
+ if os.path.exists(out_dir):
+ shutil.rmtree(out_dir)
+ os.makedirs(out_dir)
+
+ print(f"\nTest: text prompt '{args.text_prompt}' -> propagate")
+ model.handle_request(
+ {
+ "type": "add_prompt",
+ "session_id": session_id,
+ "frame_index": 0,
+ "text": args.text_prompt,
+ }
+ )
+
+ mask_dict = collect_propagation(model, session_id)
+ print(f"Propagated through {len(mask_dict)} frames")
+
+ # Save overlays
+ saved = 0
+ for frame_idx in sorted(mask_dict.keys()):
+ if frame_idx % 5 != 0:
+ continue
+ masks = mask_dict[frame_idx]
+ if not masks:
+ continue
+ frame_rgb = load_frame(frame_dir, frame_idx)
+ save_overlay(
+ frame_rgb,
+ masks,
+ os.path.join(out_dir, f"frame_{frame_idx:05d}.png"),
+ title=f"{args.version} | frame {frame_idx} | {len(masks)} objects",
+ )
+ saved += 1
+
+ # Print results
+ frame0 = mask_dict.get(0, {})
+ print(f"\nDetected {len(frame0)} objects on frame 0:")
+ for obj_id, mask in sorted(frame0.items()):
+ mask_bool = mask.astype(bool)
+ n_pixels = int(mask_bool.sum())
+ if mask_bool.any():
+ ys, xs = np.where(mask_bool)
+ print(
+ f" obj {obj_id}: centroid ({int(xs.mean())}, {int(ys.mean())}), {n_pixels} pixels"
+ )
+
+ print(f"\nSaved {saved} overlay images to {out_dir}")
+ print(
+ "QUALITATIVE TEST PASSED"
+ if len(frame0) > 0
+ else "WARNING: No objects detected!"
+ )
+
+ # Cleanup
+ if not args.video:
+ shutil.rmtree(frame_dir)
+
+
+if __name__ == "__main__":
+ main()
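The frame-0 summary printed by the script above reduces each binary mask to a pixel count and an integer centroid (the truncated mean of the foreground x and y coordinates). Below is a minimal NumPy sketch of that reduction. It assumes `frame_masks` maps object ids to 2-D mask arrays, as `mask_dict[0]` does in the script. `summarize_masks` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def summarize_masks(frame_masks):
    """Return {obj_id: (cx, cy, n_pixels)} for every non-empty mask.

    Hypothetical helper mirroring the script's per-object printout: the
    centroid is the mean foreground x / y coordinate, truncated to int.
    """
    summary = {}
    for obj_id, mask in sorted(frame_masks.items()):
        mask_bool = np.asarray(mask).astype(bool)
        if not mask_bool.any():
            continue  # the script only prints objects with foreground pixels
        ys, xs = np.where(mask_bool)
        summary[obj_id] = (int(xs.mean()), int(ys.mean()), int(mask_bool.sum()))
    return summary


mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:4] = 1  # 2x2 square: x in {2, 3}, y in {1, 2}
print(summarize_masks({1: mask, 2: np.zeros((4, 6))}))  # {1: (2, 1, 4)}
```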
diff --git a/scripts/test_sam3_artifacts.py b/scripts/test_sam3_artifacts.py
new file mode 100644
index 0000000..5d6c678
--- /dev/null
+++ b/scripts/test_sam3_artifacts.py
@@ -0,0 +1,447 @@
+import argparse
+import sys
+from pathlib import Path
+
+import numpy as np
+import torch
+from PIL import Image
+from PIL import ImageDraw
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(REPO_ROOT))
+
+from sam3.model_builder import build_sam3_image_model
+from sam3.model.data_misc import FindStage
+from sam3.model.geometry_encoders import Prompt
+
+
+def _load_pil_image(path: Path) -> Image.Image:
+ return Image.open(path).convert("RGB")
+
+
+def _pil_to_tensor(image: Image.Image, device: torch.device) -> torch.Tensor:
+ np_image = np.array(image, dtype=np.float32) / 255.0
+ tensor = torch.from_numpy(np_image).permute(2, 0, 1).unsqueeze(0)
+ return tensor.to(device)
+
+
+def _load_image(path: Path, device: torch.device) -> torch.Tensor:
+ return _pil_to_tensor(_load_pil_image(path), device)
+
+
+def _prepare_image(image: torch.Tensor, size: int) -> torch.Tensor:
+ image = image.clamp(0, 1)
+ image = torch.nn.functional.interpolate(
+ image, size=(size, size), mode="bilinear", align_corners=False
+ )
+ mean = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ std = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ return (image - mean) / std
+
+
+def _make_inputs(model, image: torch.Tensor, prompts):
+ device = image.device
+
+ tokenizer = model.backbone.language_backbone.tokenizer
+ token_ids = tokenizer(prompts, context_length=32).to(device)
+
+ return (
+ image,
+ token_ids,
+ )
+
+
+def _run_full_model(model, inputs):
+ images, token_ids = inputs
+ num_images = images.shape[0]
+ num_prompts = token_ids.shape[0]
+ device = images.device
+ bs = num_images * num_prompts
+
+ img_ids = torch.arange(num_images, device=device, dtype=torch.long)
+ img_ids = img_ids.repeat_interleave(num_prompts)
+ text_ids = torch.arange(num_prompts, device=device, dtype=torch.long)
+ text_ids = text_ids.repeat(num_images)
+
+ box_embeddings = torch.zeros(1, bs, 4, device=device)
+ box_mask = torch.zeros(bs, 1, device=device, dtype=torch.bool)
+ box_labels = torch.zeros(1, bs, device=device, dtype=torch.long)
+ backbone_out = model.backbone.forward_image(images)
+ text_encoder = model.backbone.language_backbone
+ _, text_tokens = text_encoder.encoder(token_ids)
+ text_tokens = text_tokens.transpose(0, 1)
+ text_memory = text_encoder.resizer(text_tokens)
+ # Key-padding mask: True marks padded positions (token id 0)
+ text_attention_mask = token_ids.eq(0)
+ backbone_out["language_features"] = text_memory
+ backbone_out["language_mask"] = text_attention_mask
+
+ find_input = FindStage(
+ img_ids=img_ids,
+ text_ids=text_ids,
+ input_boxes=box_embeddings,
+ input_boxes_mask=box_mask,
+ input_boxes_label=box_labels,
+ input_points=torch.zeros(0, bs, 2, device=device),
+ input_points_mask=torch.zeros(
+ bs, 0, device=device, dtype=torch.bool
+ ),
+ )
+ geometric_prompt = Prompt(
+ box_embeddings=box_embeddings,
+ box_mask=box_mask,
+ box_labels=box_labels,
+ )
+ out = model.forward_grounding(
+ backbone_out=backbone_out,
+ find_input=find_input,
+ find_target=None,
+ geometric_prompt=geometric_prompt,
+ )
+ return (
+ out["pred_masks"],
+ out["pred_boxes"],
+ out["pred_logits"],
+ out["pred_boxes_xyxy"],
+ )
+
+
+def _make_decoder_only_inputs_from_model(
+ model,
+ backbone_fpn,
+ vision_pos_enc,
+ text_memory,
+ text_attention_mask,
+ inputs,
+):
+ images, token_ids = inputs
+ num_images = images.shape[0]
+ num_prompts = token_ids.shape[0]
+ device = images.device
+ bs = num_images * num_prompts
+
+ img_ids = torch.arange(num_images, device=device, dtype=torch.long)
+ img_ids = img_ids.repeat_interleave(num_prompts)
+ text_ids = torch.arange(num_prompts, device=device, dtype=torch.long)
+ text_ids = text_ids.repeat(num_images)
+
+ box_embeddings = torch.zeros(1, bs, 4, device=device)
+ box_mask = torch.zeros(bs, 1, device=device, dtype=torch.bool)
+ box_labels = torch.zeros(1, bs, device=device, dtype=torch.long)
+ backbone_out = {
+ "backbone_fpn": backbone_fpn,
+ "vision_pos_enc": vision_pos_enc,
+ "language_features": text_memory,
+ "language_mask": text_attention_mask,
+ }
+ find_input = FindStage(
+ img_ids=img_ids,
+ text_ids=text_ids,
+ input_boxes=box_embeddings,
+ input_boxes_mask=box_mask,
+ input_boxes_label=box_labels,
+ input_points=torch.zeros(0, bs, 2, device=device),
+ input_points_mask=torch.zeros(
+ bs, 0, device=device, dtype=torch.bool
+ ),
+ )
+ geometric_prompt = Prompt(
+ box_embeddings=box_embeddings,
+ box_mask=box_mask,
+ box_labels=box_labels,
+ )
+ prompt, prompt_mask, backbone_out = model._encode_prompt(
+ backbone_out, find_input, geometric_prompt
+ )
+ backbone_out, encoder_out, _ = model._run_encoder(backbone_out, find_input, prompt, prompt_mask)
+ return (
+ backbone_out["backbone_fpn"],
+ img_ids,
+ encoder_out["encoder_hidden_states"],
+ encoder_out["pos_embed"],
+ prompt,
+ prompt_mask,
+ encoder_out["level_start_index"],
+ encoder_out["spatial_shapes"],
+ encoder_out["valid_ratios"],
+ )
+
+
+def _load_export(path: Path):
+ exported = torch.export.load(str(path))
+ return exported.module()
+
+
+def _to_pil_image(image: torch.Tensor) -> Image.Image:
+ image = image.detach().cpu().clamp(0, 1)
+ np_image = (image.permute(1, 2, 0).numpy() * 255.0).astype(np.uint8)
+ return Image.fromarray(np_image)
+
+
+def _color_palette(num_colors: int):
+ base = [
+ (255, 99, 71),
+ (65, 105, 225),
+ (60, 179, 113),
+ (238, 130, 238),
+ (255, 215, 0),
+ (255, 165, 0),
+ ]
+ return [base[i % len(base)] for i in range(num_colors)]
+
+
+def _overlay_masks(image: Image.Image, masks: torch.Tensor, scores: torch.Tensor, out_path: Path):
+ num_prompts, num_queries = scores.shape[:2]
+ best_idx = scores.squeeze(-1).argmax(dim=1)
+ colors = _color_palette(num_prompts)
+ base = image.copy().convert("RGBA")
+ overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))
+ for i in range(num_prompts):
+ mask = masks[i, best_idx[i]].detach().cpu()
+ mask = mask > 0
+ mask_img = Image.fromarray((mask.numpy() * 255).astype(np.uint8))  # 2-D uint8 -> mode "L"
+ if mask_img.size != base.size:
+ mask_img = mask_img.resize(base.size, resample=Image.Resampling.NEAREST)
+ color = colors[i]
+ color_img = Image.new("RGBA", base.size, (*color, 120))
+ overlay = Image.composite(color_img, overlay, mask_img)
+ blended = Image.alpha_composite(base, overlay)
+ blended.convert("RGB").save(out_path)
+
+
+def _draw_boxes(image: Image.Image, boxes_xyxy: torch.Tensor, scores: torch.Tensor, out_path: Path):
+ num_prompts, num_queries = scores.shape[:2]
+ best_idx = scores.squeeze(-1).argmax(dim=1).clamp(max=boxes_xyxy.shape[1] - 1)
+ colors = _color_palette(num_prompts)
+ draw = ImageDraw.Draw(image)
+ for i in range(num_prompts):
+ box_tensor = boxes_xyxy[i, best_idx[i]].detach().cpu().flatten()
+ if box_tensor.numel() != 4:
+ continue
+ box = box_tensor.tolist()
+ width, height = image.size
+ if max(box) <= 1.0:
+ box = [
+ box[0] * width,
+ box[1] * height,
+ box[2] * width,
+ box[3] * height,
+ ]
+ box = [
+ max(0.0, min(box[0], width)),
+ max(0.0, min(box[1], height)),
+ max(0.0, min(box[2], width)),
+ max(0.0, min(box[3], height)),
+ ]
+ color = colors[i]
+ draw.rectangle(box, outline=color, width=3)
+ image.save(out_path)
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--image",
+ type=Path,
+ default=Path("assets/images/cat_dog.jpg"),
+ help="Path to input image",
+ )
+ parser.add_argument(
+ "--prompts",
+ type=str,
+ default="cat,dog",
+ help="Comma-separated text prompts",
+ )
+ parser.add_argument(
+ "--device",
+ type=str,
+ default="cuda" if torch.cuda.is_available() else "cpu",
+ )
+ parser.add_argument(
+ "--artifact-dir",
+ type=Path,
+ default=Path("artifacts/export"),
+ help="Directory with exported artifacts",
+ )
+ args = parser.parse_args()
+
+ prompts = [p.strip() for p in args.prompts.split(",") if p.strip()]
+ if not prompts:
+ raise ValueError("Provide at least one prompt")
+ prompt_count = len(prompts)
+
+ model = build_sam3_image_model(device=args.device, eval_mode=True, enable_segmentation=True)
+ model.eval()
+
+ pil_image = _load_pil_image(args.image)
+ image = _pil_to_tensor(pil_image, torch.device(args.device))
+ image = _prepare_image(image, size=1008)
+ inputs = _make_inputs(model, image, prompts)
+
+ with torch.no_grad():
+ eager_masks, eager_boxes, eager_logits, eager_boxes_xyxy = _run_full_model(model, inputs)
+
+ image_module = _load_export(args.artifact_dir / "image_encoder.pt2")
+ text_module = _load_export(args.artifact_dir / "text_encoder.pt2")
+ encoder_module = _load_export(args.artifact_dir / "encoder_fusion.pt2")
+ pipeline_module = _load_export(args.artifact_dir / "full_sam3_pipeline.pt2")
+ decoder_module = _load_export(args.artifact_dir / "decoder_only.pt2")
+
+ with torch.no_grad():
+ _, vision_pos_enc, backbone_fpn = image_module(inputs[0])
+ text_attention_mask, text_memory = text_module(inputs[1])
+ img_feats = backbone_fpn[-1]
+ img_pos = vision_pos_enc[-1]
+ prompt_batch = text_attention_mask.shape[0]
+ if img_feats.shape[0] != prompt_batch:
+ if img_feats.shape[0] != 1:
+ raise ValueError("Image batch does not match prompt batch")
+ img_feats = img_feats.repeat(prompt_batch, 1, 1, 1)
+ img_pos = img_pos.repeat(prompt_batch, 1, 1, 1)
+ img_mask = torch.zeros(
+ img_feats.shape[0],
+ img_feats.shape[2],
+ img_feats.shape[3],
+ device=img_feats.device,
+ dtype=torch.bool,
+ )
+ enc_out = encoder_module(img_feats, img_pos, img_mask, text_memory, text_attention_mask)
+ assert isinstance(enc_out, tuple)
+ pipeline_logits, pipeline_boxes, pipeline_masks, pipeline_boxes_xyxy = pipeline_module(
+ *inputs
+ )
+ (
+ decoder_backbone_fpn,
+ decoder_img_ids,
+ decoder_memory,
+ decoder_pos_embed,
+ decoder_prompt,
+ decoder_prompt_mask,
+ decoder_level_start_index,
+ decoder_spatial_shapes,
+ decoder_valid_ratios,
+ ) = _make_decoder_only_inputs_from_model(
+ model,
+ backbone_fpn,
+ vision_pos_enc,
+ text_memory,
+ text_attention_mask,
+ inputs,
+ )
+ batch_target = decoder_img_ids.shape[0]
+ if batch_target < 2:
+ repeat = 2 // batch_target
+ decoder_img_ids = decoder_img_ids.repeat(repeat)
+ decoder_memory = decoder_memory.repeat(1, repeat, 1)
+ decoder_pos_embed = decoder_pos_embed.repeat(1, repeat, 1)
+ decoder_prompt = decoder_prompt.repeat(1, repeat, 1)
+ decoder_prompt_mask = decoder_prompt_mask.repeat(repeat, 1)
+ decoder_valid_ratios = decoder_valid_ratios.repeat(repeat, 1, 1)
+ decoder_backbone_fpn = [feat.repeat(repeat, 1, 1, 1) for feat in decoder_backbone_fpn]
+ pred_logits, pred_boxes, pred_masks, pred_boxes_xyxy = decoder_module(
+ decoder_backbone_fpn,
+ decoder_img_ids,
+ decoder_memory,
+ decoder_pos_embed,
+ decoder_prompt,
+ decoder_prompt_mask,
+ decoder_level_start_index,
+ decoder_spatial_shapes,
+ decoder_valid_ratios,
+ )
+ # Eager reference: reuse the outputs computed above instead of
+ # running the same model on the same inputs a second time.
+ eager_ref_logits = eager_logits
+
+ pred_logits = pred_logits[:prompt_count]
+ pred_boxes = pred_boxes[:prompt_count]
+ pred_masks = pred_masks[:prompt_count]
+ pred_boxes_xyxy = pred_boxes_xyxy[:prompt_count]
+ pipeline_logits = pipeline_logits[:prompt_count]
+ pipeline_masks = pipeline_masks[:prompt_count]
+ pipeline_boxes_xyxy = pipeline_boxes_xyxy[:prompt_count]
+ eager_ref_logits = eager_ref_logits[:prompt_count]
+
+ print("Prompt count:", prompt_count)
+ print("Pred logits shape:", pred_logits.shape)
+ print("Pred boxes shape:", pred_boxes.shape)
+ print("Pred masks shape:", pred_masks.shape)
+ pred_scores = pred_logits.squeeze(-1)
+ eager_scores = eager_ref_logits.squeeze(-1)
+ pred_max = pred_scores.max(dim=1).values
+ eager_max = eager_scores.max(dim=1).values
+ pred_best_idx = pred_scores.argmax(dim=1)
+ eager_best_idx = eager_scores.argmax(dim=1)
+ for idx, prompt_text in enumerate(prompts):
+ print(
+ f"Prompt '{prompt_text}' max logit: "
+ f"export={pred_max[idx].item():.4f} (idx {pred_best_idx[idx].item()}), "
+ f"eager={eager_max[idx].item():.4f} (idx {eager_best_idx[idx].item()})"
+ )
+ torch.testing.assert_close(pred_logits, eager_ref_logits, rtol=1e-3, atol=1e-3)
+ torch.testing.assert_close(pipeline_logits, eager_ref_logits, rtol=1e-3, atol=1e-3)
+ print("Eager vs export logits match")
+
+ out_dir = args.artifact_dir
+ out_dir.mkdir(parents=True, exist_ok=True)
+ base_image = pil_image.copy()
+ _overlay_masks(
+ base_image.copy(),
+ eager_masks,
+ eager_logits,
+ out_dir / "eager_masks_overlay.jpg",
+ )
+ _overlay_masks(
+ base_image.copy(),
+ pipeline_masks,
+ pipeline_logits,
+ out_dir / "pipeline_masks_overlay.jpg",
+ )
+ _overlay_masks(
+ base_image.copy(),
+ pred_masks,
+ pred_logits,
+ out_dir / "decoder_only_masks_overlay.jpg",
+ )
+ for idx, prompt_text in enumerate(prompts):
+ _overlay_masks(
+ base_image.copy(),
+ eager_masks[idx : idx + 1],
+ eager_logits[idx : idx + 1],
+ out_dir / f"eager_mask_overlay_{idx}.jpg",
+ )
+ _overlay_masks(
+ base_image.copy(),
+ pipeline_masks[idx : idx + 1],
+ pipeline_logits[idx : idx + 1],
+ out_dir / f"pipeline_mask_overlay_{idx}.jpg",
+ )
+ _overlay_masks(
+ base_image.copy(),
+ pred_masks[idx : idx + 1],
+ pred_logits[idx : idx + 1],
+ out_dir / f"decoder_only_mask_overlay_{idx}.jpg",
+ )
+ _draw_boxes(
+ base_image.copy(),
+ eager_boxes_xyxy,
+ eager_logits,
+ out_dir / "eager_boxes_overlay.jpg",
+ )
+ _draw_boxes(
+ base_image.copy(),
+ pipeline_boxes_xyxy,
+ pipeline_logits,
+ out_dir / "pipeline_boxes_overlay.jpg",
+ )
+ _draw_boxes(
+ base_image.copy(),
+ pred_boxes_xyxy,
+ pred_logits,
+ out_dir / "decoder_only_boxes_overlay.jpg",
+ )
+ print("Saved overlays to", out_dir)
+
+
+if __name__ == "__main__":
+ main()
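`_run_full_model` and `_make_decoder_only_inputs_from_model` pair every image with every prompt. Applying `repeat_interleave` to the image ids and `repeat` to the text ids enumerates the image-major cross product. Row `k` of the flattened batch of size `bs = num_images * num_prompts` therefore holds image `k // num_prompts` with prompt `k % num_prompts`. The NumPy sketch below reproduces that indexing, with `np.repeat` and `np.tile` standing in for the two torch calls. `pair_ids` is a hypothetical name.

```python
import numpy as np


def pair_ids(num_images, num_prompts):
    """Flattened (image, prompt) pairing used to build img_ids / text_ids.

    np.repeat mirrors torch.repeat_interleave; np.tile mirrors Tensor.repeat
    on a 1-D tensor.
    """
    img_ids = np.repeat(np.arange(num_images), num_prompts)
    text_ids = np.tile(np.arange(num_prompts), num_images)
    return img_ids, text_ids


img_ids, text_ids = pair_ids(num_images=2, num_prompts=3)
print(img_ids.tolist())   # [0, 0, 0, 1, 1, 1]
print(text_ids.tolist())  # [0, 1, 2, 0, 1, 2]
# Row k pairs image k // num_prompts with prompt k % num_prompts.
assert all(img_ids[k] == k // 3 and text_ids[k] == k % 3 for k in range(6))
```

The zero box and point tensors are sized by the same `bs` so that every (image, prompt) row gets an empty geometric prompt.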
diff --git a/scripts/test_sam3_full_pipeline.py b/scripts/test_sam3_full_pipeline.py
new file mode 100644
index 0000000..6118e01
--- /dev/null
+++ b/scripts/test_sam3_full_pipeline.py
@@ -0,0 +1,121 @@
+import argparse
+import sys
+from pathlib import Path
+
+import numpy as np
+import torch
+from PIL import Image
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(REPO_ROOT))
+
+from sam3.model_builder import build_sam3_image_model
+
+
+def _load_image(path: Path, device: torch.device) -> torch.Tensor:
+ image = Image.open(path).convert("RGB")
+ np_image = np.array(image, dtype=np.float32) / 255.0
+ return torch.from_numpy(np_image).permute(2, 0, 1).unsqueeze(0).to(device)
+
+
+def _prepare_image(image: torch.Tensor, size: int) -> torch.Tensor:
+ image = image.clamp(0, 1)
+ image = torch.nn.functional.interpolate(
+ image, size=(size, size), mode="bilinear", align_corners=False
+ )
+ mean = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ std = torch.tensor([0.5, 0.5, 0.5], device=image.device).view(1, 3, 1, 1)
+ return (image - mean) / std
+
+
+def _make_inputs(model, image: torch.Tensor, prompts):
+ device = image.device
+ num_prompts = len(prompts)
+ num_images = int(image.shape[0])
+ token_ids = model.backbone.language_backbone.tokenizer(
+ prompts, context_length=32
+ ).to(device)
+ img_ids = torch.arange(num_images, device=device, dtype=torch.long)
+ img_ids = img_ids.repeat_interleave(num_prompts)
+ text_ids = torch.arange(num_prompts, device=device, dtype=torch.long)
+ text_ids = text_ids.repeat(num_images)
+ return (
+ image,
+ token_ids,
+ img_ids,
+ text_ids,
+ torch.zeros(1, num_prompts, 4, device=device),
+ torch.zeros(num_prompts, 1, device=device, dtype=torch.bool),
+ torch.zeros(1, num_prompts, device=device, dtype=torch.long),
+ )
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ "--image",
+ type=Path,
+ default=Path("assets/images/cat_dog.jpg"),
+ help="Path to input image",
+ )
+ parser.add_argument(
+ "--prompts",
+ type=str,
+ default="cat,dog",
+ help="Comma-separated text prompts",
+ )
+ parser.add_argument(
+ "--device",
+ type=str,
+ default="cuda" if torch.cuda.is_available() else "cpu",
+ )
+ parser.add_argument(
+ "--num-feature-levels",
+ type=int,
+ default=1,
+ help="Number of feature levels to use",
+ )
+ parser.add_argument(
+ "--artifact",
+ type=Path,
+ default=Path("artifacts/export/full_sam3_pipeline.pt2"),
+ help="Path to exported full pipeline artifact",
+ )
+ args = parser.parse_args()
+
+ prompts = [p.strip() for p in args.prompts.split(",") if p.strip()]
+ if not prompts:
+ raise ValueError("Provide at least one prompt")
+
+ model = build_sam3_image_model(
+ device=args.device,
+ eval_mode=True,
+ enable_segmentation=True,
+ num_feature_levels=args.num_feature_levels,
+ )
+ model.eval()
+
+ image = _prepare_image(
+ _load_image(args.image, torch.device(args.device)), size=1008
+ )
+ inputs = _make_inputs(model, image, prompts)
+
+ module = torch.export.load(str(args.artifact)).module()
+ with torch.no_grad():
+ pred_logits, pred_boxes, pred_masks, pred_presence = module(*inputs)
+
+ pred_scores = pred_logits.squeeze(-1)
+ print("Prompt count:", len(prompts))
+ print("Pred logits shape:", pred_logits.shape)
+ print("Pred boxes shape:", pred_boxes.shape)
+ print("Pred masks shape:", pred_masks.shape)
+ if pred_presence is not None:
+ print("Presence logits shape:", pred_presence.shape)
+ for idx, prompt_text in enumerate(prompts):
+ max_val = pred_scores[idx].max().item()
+ max_idx = pred_scores[idx].argmax().item()
+ print(f"Prompt '{prompt_text}' max logit: {max_val:.4f} (idx {max_idx})")
+
+
+if __name__ == "__main__":
+ main()
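The script squeezes the trailing axis of `pred_logits` (shape `(num_prompts, num_queries, 1)`) and reports the highest-scoring query for each prompt. Below is a NumPy sketch of that reduction on a made-up logits array. The values are illustrative only, not model output, and `best_query_per_prompt` is a hypothetical helper.

```python
import numpy as np


def best_query_per_prompt(pred_logits):
    """Return (max_logit, argmax_query) per prompt for logits of shape (P, Q, 1)."""
    scores = np.squeeze(pred_logits, axis=-1)  # (P, Q)
    return scores.max(axis=1), scores.argmax(axis=1)


logits = np.array([[[-2.0], [3.5], [0.1]],
                   [[1.2], [-0.4], [4.0]]])  # 2 prompts x 3 queries
max_vals, max_idx = best_query_per_prompt(logits)
for prompt, val, idx in zip(["cat", "dog"], max_vals, max_idx):
    print(f"Prompt '{prompt}' max logit: {val:.4f} (idx {idx})")
# Prompt 'cat' max logit: 3.5000 (idx 1)
# Prompt 'dog' max logit: 4.0000 (idx 2)
```

Because the sigmoid is monotonic, the selected query is the same whether you rank raw logits or sigmoid probabilities.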
diff --git a/test/test_io_utils.py b/test/test_io_utils.py
new file mode 100644
index 0000000..780b8af
--- /dev/null
+++ b/test/test_io_utils.py
@@ -0,0 +1,122 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates. All Rights Reserved
+
+"""Tests for io_utils extensionless video file handling (D99228861)."""
+
+import tempfile
+import unittest
+from unittest.mock import MagicMock, patch
+
+from sam3.model.io_utils import load_video_frames
+
+
+class TestLoadVideoFramesRouting(unittest.TestCase):
+ """Test that load_video_frames routes paths correctly based on extension."""
+
+ @patch("sam3.model.io_utils.load_video_frames_from_video_file")
+ def test_mp4_extension_routes_to_video_loader(
+ self, mock_load_video: MagicMock
+ ) -> None:
+ """Paths with .mp4 extension should route to load_video_frames_from_video_file."""
+ mock_load_video.return_value = ("frames", 480, 640)
+ result = load_video_frames(
+ video_path="/tmp/test_video.mp4",
+ image_size=256,
+ offload_video_to_cpu=True,
+ )
+ mock_load_video.assert_called_once()
+ self.assertEqual(result, ("frames", 480, 640))
+
+ @patch("sam3.model.io_utils.load_video_frames_from_video_file")
+ def test_mov_extension_routes_to_video_loader(
+ self, mock_load_video: MagicMock
+ ) -> None:
+ """Paths with .mov extension should route to load_video_frames_from_video_file."""
+ mock_load_video.return_value = ("frames", 480, 640)
+ load_video_frames(
+ video_path="/tmp/test_video.mov",
+ image_size=256,
+ offload_video_to_cpu=True,
+ )
+ mock_load_video.assert_called_once()
+
+ @patch("sam3.model.io_utils.load_video_frames_from_video_file")
+ def test_extensionless_oil_path_routes_to_video_loader(
+ self, mock_load_video: MagicMock
+ ) -> None:
+ """Extensionless OIL paths should attempt video loading (D99228861 fix)."""
+ mock_load_video.return_value = ("frames", 480, 640)
+ result = load_video_frames(
+ video_path="oil://fb_permanent/abc123def456",
+ image_size=256,
+ offload_video_to_cpu=True,
+ )
+ mock_load_video.assert_called_once()
+ self.assertEqual(result, ("frames", 480, 640))
+
+ @patch("sam3.model.io_utils.load_video_frames_from_video_file")
+ def test_extensionless_bare_hash_routes_to_video_loader(
+ self, mock_load_video: MagicMock
+ ) -> None:
+ """Bare hash paths without extension should attempt video loading."""
+ mock_load_video.return_value = ("frames", 480, 640)
+ result = load_video_frames(
+ video_path="/data/videos/abc123def456",
+ image_size=256,
+ offload_video_to_cpu=True,
+ )
+ mock_load_video.assert_called_once()
+ self.assertEqual(result, ("frames", 480, 640))
+
+ @patch("sam3.model.io_utils.load_video_frames_from_video_file")
+ def test_extensionless_path_raises_on_decode_failure(
+ self, mock_load_video: MagicMock
+ ) -> None:
+ """Extensionless path that fails to decode should raise NotImplementedError."""
+ mock_load_video.side_effect = RuntimeError("Could not decode video")
+ with self.assertRaises(NotImplementedError) as ctx:
+ load_video_frames(
+ video_path="oil://fb_permanent/corrupted_file",
+ image_size=256,
+ offload_video_to_cpu=True,
+ )
+ self.assertIn("failed to load", str(ctx.exception))
+ self.assertIn("oil://fb_permanent/corrupted_file", str(ctx.exception))
+
+ @patch("sam3.model.io_utils.load_video_frames_from_image_folder")
+ def test_directory_routes_to_image_folder_loader(
+ self, mock_load_folder: MagicMock
+ ) -> None:
+ """Directory paths should route to load_video_frames_from_image_folder."""
+ mock_load_folder.return_value = ("frames", 480, 640)
+ with tempfile.TemporaryDirectory() as tmpdir:
+ load_video_frames(
+ video_path=tmpdir,
+ image_size=256,
+ offload_video_to_cpu=True,
+ )
+ mock_load_folder.assert_called_once()
+
+ def test_dummy_video_pattern(self) -> None:
+ """ pattern should return dummy frames."""
+ frames, h, w = load_video_frames(
+ video_path="",
+ image_size=64,
+ offload_video_to_cpu=True,
+ )
+ self.assertEqual(frames.shape[0], 5) # 5 frames
+ self.assertEqual(h, 480)
+ self.assertEqual(w, 640)
+
+ @patch("sam3.model.io_utils.load_video_frames_from_video_file")
+ def test_unknown_extension_routes_to_video_loader(
+ self, mock_load_video: MagicMock
+ ) -> None:
+ """Paths with unrecognized extensions should attempt video loading."""
+ mock_load_video.return_value = ("frames", 480, 640)
+ result = load_video_frames(
+ video_path="/tmp/video.xyz",
+ image_size=256,
+ offload_video_to_cpu=True,
+ )
+ mock_load_video.assert_called_once()
+ self.assertEqual(result, ("frames", 480, 640))
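Taken together, these tests pin down a routing contract for `load_video_frames`:

- Directories go to the image-folder loader.
- Every other path goes to the video-file loader. This includes `.mp4`, `.mov`, unrecognized extensions, and extensionless OIL or bare-hash paths.
- A decode failure on an extensionless path surfaces as `NotImplementedError`, and the message contains `failed to load` and the path.

The pure-Python sketch below encodes that contract using injected loader callables. `route_video_path` and its parameters are hypothetical; this is not the `sam3.model.io_utils` implementation. The sketch omits the dummy-video case. It also assumes that errors on paths with an extension are re-raised unchanged, which the tests do not cover.

```python
import os


def route_video_path(video_path, load_folder, load_video_file):
    """Hypothetical sketch of the routing contract exercised by the tests.

    load_folder / load_video_file stand in for the real image-folder and
    video-file loaders; both take the path and return (frames, height, width).
    """
    if os.path.isdir(video_path):
        return load_folder(video_path)
    _, ext = os.path.splitext(video_path)
    try:
        # Known, unknown, and missing extensions all try the video decoder.
        return load_video_file(video_path)
    except RuntimeError as err:
        if ext:
            raise  # assumption: paths with an extension keep the original error
        raise NotImplementedError(
            f"Extensionless path {video_path!r} failed to load as a video"
        ) from err


def fake_video_loader(path):
    return ("frames", 480, 640)


def broken_video_loader(path):
    raise RuntimeError("Could not decode video")


print(route_video_path("oil://fb_permanent/abc123def456", None, fake_video_loader))
# ('frames', 480, 640)
```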
diff --git a/tests/__init__.py b/tests/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/export/__init__.py b/tests/export/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/export/conftest.py b/tests/export/conftest.py
new file mode 100644
index 0000000..00ab475
--- /dev/null
+++ b/tests/export/conftest.py
@@ -0,0 +1,31 @@
+from __future__ import annotations
+
+import os
+import sys
+from pathlib import Path
+
+import pytest
+import torch
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+sys.path.insert(0, str(REPO_ROOT))
+
+from sam3.model_builder import build_sam3_image_model
+
+
+@pytest.fixture(scope="session")
+def sam3_model() -> torch.nn.Module:
+ force_cpu = os.getenv("SAM3_EXPORT_FORCE_CPU", "0") == "1"
+ device = os.getenv("SAM3_EXPORT_DEVICE")
+ if device is None:
+ device = "cuda" if torch.cuda.is_available() and not force_cpu else "cpu"
+ try:
+ model = build_sam3_image_model(
+ device=device, eval_mode=True, enable_segmentation=True
+ )
+ except torch.OutOfMemoryError:
+ if device.startswith("cuda"):
+ torch.cuda.empty_cache()
+ pytest.skip("CUDA OOM while loading SAM3 model; free GPU memory or set SAM3_EXPORT_FORCE_CPU=1")
+ model.eval()
+ return model
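The `sam3_model` fixture resolves its device from two environment variables. If `SAM3_EXPORT_DEVICE` is set, it is used verbatim. Otherwise CUDA is chosen only if it is available and `SAM3_EXPORT_FORCE_CPU` is not `"1"`. Factoring that decision into a pure function makes it testable without a GPU. `resolve_export_device` is a hypothetical helper, not part of the repository.

```python
from typing import Mapping


def resolve_export_device(env: Mapping[str, str], cuda_available: bool) -> str:
    """Mirror the fixture's device choice without touching torch or os.environ."""
    explicit = env.get("SAM3_EXPORT_DEVICE")
    if explicit is not None:
        return explicit  # an explicit device wins, even over FORCE_CPU
    force_cpu = env.get("SAM3_EXPORT_FORCE_CPU", "0") == "1"
    return "cuda" if cuda_available and not force_cpu else "cpu"


print(resolve_export_device({}, cuda_available=True))                                 # cuda
print(resolve_export_device({"SAM3_EXPORT_FORCE_CPU": "1"}, cuda_available=True))     # cpu
print(resolve_export_device({"SAM3_EXPORT_DEVICE": "cuda:1"}, cuda_available=False))  # cuda:1
```

Inside the fixture, the call would be `resolve_export_device(os.environ, torch.cuda.is_available())`, since `os.environ` is a mapping.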
diff --git a/tests/export/test_decoder_export.py b/tests/export/test_decoder_export.py
new file mode 100644
index 0000000..998cf6f
--- /dev/null
+++ b/tests/export/test_decoder_export.py
@@ -0,0 +1,370 @@
+from __future__ import annotations
+
+from typing import Any, cast
+
+import pytest
+import torch
+
+from sam3.model.data_misc import FindStage
+from sam3.model.geometry_encoders import Prompt
+from tests.export.utils import capture_stderr_on_fail, get_device, save_output_shapes
+
+
+class FullSam3PipelineWrapper(torch.nn.Module):
+ def __init__(self, model: Any):
+ super().__init__()
+ self.model = model
+
+ def forward(
+ self,
+ images: torch.Tensor,
+ token_ids: torch.Tensor,
+ ):
+ model = cast(Any, self.model)
+ num_images = images.shape[0]
+ num_prompts = token_ids.shape[0]
+ device = images.device
+ bs = num_images * num_prompts
+
+ img_ids = torch.arange(num_images, device=device, dtype=torch.long)
+ img_ids = img_ids.repeat_interleave(num_prompts)
+ text_ids = torch.arange(num_prompts, device=device, dtype=torch.long)
+ text_ids = text_ids.repeat(num_images)
+
+ box_embeddings = torch.zeros(1, bs, 4, device=device)
+ box_mask = torch.zeros(bs, 1, device=device, dtype=torch.bool)
+ box_labels = torch.zeros(1, bs, device=device, dtype=torch.long)
+
+ backbone_out = model.backbone.forward_image(images)
+ text_encoder = model.backbone.language_backbone
+ _, text_tokens = text_encoder.encoder(token_ids)
+ text_tokens = text_tokens.transpose(0, 1)
+ text_memory = text_encoder.resizer(text_tokens)
+ # Key-padding mask: True marks padded positions (token id 0)
+ text_attention_mask = token_ids.eq(0)
+ backbone_out["language_features"] = text_memory
+ backbone_out["language_mask"] = text_attention_mask
+
+ find_input = FindStage(
+ img_ids=img_ids,
+ text_ids=text_ids,
+ input_boxes=box_embeddings,
+ input_boxes_mask=box_mask,
+ input_boxes_label=box_labels,
+ input_points=torch.zeros(0, bs, 2, device=device),
+ input_points_mask=torch.zeros(bs, 0, device=device, dtype=torch.bool),
+ )
+
+ geometric_prompt = Prompt(
+ box_embeddings=box_embeddings,
+ box_mask=box_mask,
+ box_labels=box_labels,
+ )
+
+ out = model.forward_grounding(
+ backbone_out=backbone_out,
+ find_input=find_input,
+ find_target=None,
+ geometric_prompt=geometric_prompt,
+ )
+
+ return (
+ out["pred_logits"],
+ out["pred_boxes"],
+ out["pred_masks"],
+ out.get("presence_logit_dec"),
+ )
+
+
+def _make_inputs(batch: int, height: int, width: int, device: str):
+ images = torch.randn(batch, 3, height, width, device=device)
+ token_ids = torch.ones(batch, 32, device=device, dtype=torch.long)
+ token_ids[:, -1] = 0
+ return images, token_ids
+
+
+def _make_decoder_only_inputs(model: Any, inputs):
+ images, token_ids = inputs
+ model_any = cast(Any, model)
+ model_any = model_any.eval()
+ num_images = images.shape[0]
+ num_prompts = token_ids.shape[0]
+ device = images.device
+ bs = num_images * num_prompts
+
+ img_ids = torch.arange(num_images, device=device, dtype=torch.long)
+ img_ids = img_ids.repeat_interleave(num_prompts)
+ text_ids = torch.arange(num_prompts, device=device, dtype=torch.long)
+ text_ids = text_ids.repeat(num_images)
+
+ box_embeddings = torch.zeros(1, bs, 4, device=device)
+ box_mask = torch.zeros(bs, 1, device=device, dtype=torch.bool)
+ box_labels = torch.zeros(1, bs, device=device, dtype=torch.long)
+
+ backbone_out = model_any.backbone.forward_image(images)
+ text_encoder = model_any.backbone.language_backbone
+ _, text_tokens = text_encoder.encoder(token_ids)
+ text_tokens = text_tokens.transpose(0, 1)
+ text_memory = text_encoder.resizer(text_tokens)
+ # Key-padding mask: True marks padded positions (token id 0)
+ text_attention_mask = token_ids.eq(0)
+ backbone_out["language_features"] = text_memory
+ backbone_out["language_mask"] = text_attention_mask
+
+ find_input = FindStage(
+ img_ids=img_ids,
+ text_ids=text_ids,
+ input_boxes=box_embeddings,
+ input_boxes_mask=box_mask,
+ input_boxes_label=box_labels,
+ input_points=torch.zeros(0, bs, 2, device=device),
+ input_points_mask=torch.zeros(bs, 0, device=device, dtype=torch.bool),
+ )
+ geometric_prompt = Prompt(
+ box_embeddings=box_embeddings,
+ box_mask=box_mask,
+ box_labels=box_labels,
+ )
+
+ prompt, prompt_mask, backbone_out = model_any._encode_prompt(
+ backbone_out, find_input, geometric_prompt
+ )
+ backbone_out, encoder_out, _ = model_any._run_encoder(
+ backbone_out, find_input, prompt, prompt_mask
+ )
+ return (
+ backbone_out["backbone_fpn"],
+ img_ids,
+ encoder_out["encoder_hidden_states"],
+ encoder_out["pos_embed"],
+ prompt,
+ prompt_mask,
+ encoder_out["level_start_index"],
+ encoder_out["spatial_shapes"],
+ encoder_out["valid_ratios"],
+ )
+
+
+def _export_full_sam3_pipeline(model: Any, inputs):
+ images, token_ids = inputs
+ device = images.device
+ wrapper = FullSam3PipelineWrapper(model).to(device).eval() # type: ignore[arg-type]
+ if images.shape[0] < 2:
+ repeat = 2 // images.shape[0]
+ export_inputs = (
+ images.repeat(repeat, 1, 1, 1),
+ token_ids.repeat(repeat, 1),
+ )
+ else:
+ export_inputs = inputs
+ with torch.no_grad():
+ return torch.export.export(
+ wrapper,
+ export_inputs,
+ dynamic_shapes={
+ "images": {
+ 0: torch.export.Dim.AUTO,
+ 2: 1008,
+ 3: 1008,
+ },
+ "token_ids": {
+ 0: torch.export.Dim("num_prompts", min=1),
+ 1: 32,
+ },
+ },
+ strict=False,
+ prefer_deferred_runtime_asserts_over_guards=True,
+ )
+
+
+class DecoderOnlyWrapper(torch.nn.Module):
+ def __init__(self, model: torch.nn.Module):
+ super().__init__()
+ self.model = model
+
+ def forward(
+ self,
+ backbone_fpn,
+ img_ids: torch.Tensor,
+ memory: torch.Tensor,
+ pos_embed: torch.Tensor,
+ prompt: torch.Tensor,
+ prompt_mask: torch.Tensor,
+ level_start_index: torch.Tensor,
+ spatial_shapes: torch.Tensor,
+ valid_ratios: torch.Tensor,
+ ):
+ model = cast(Any, self.model)
+ vis_feat_sizes = [(feat.shape[-2], feat.shape[-1]) for feat in backbone_fpn]
+ encoder_out = {
+ "pos_embed": pos_embed,
+ "padding_mask": None,
+ "level_start_index": level_start_index,
+ "spatial_shapes": spatial_shapes,
+ "valid_ratios": valid_ratios,
+ "vis_feat_sizes": vis_feat_sizes,
+ }
+ out = {"encoder_hidden_states": memory}
+ out, hs = model._run_decoder(
+ memory=memory,
+ pos_embed=pos_embed,
+ src_mask=None,
+ out=out,
+ prompt=prompt,
+ prompt_mask=prompt_mask,
+ encoder_out=encoder_out,
+ )
+ backbone_out = {"backbone_fpn": backbone_fpn}
+ model._run_segmentation_heads(
+ out=out,
+ backbone_out=backbone_out,
+ img_ids=img_ids,
+ vis_feat_sizes=vis_feat_sizes,
+ encoder_hidden_states=out["encoder_hidden_states"],
+ prompt=prompt,
+ prompt_mask=prompt_mask,
+ hs=hs,
+ )
+ return (
+ out["pred_logits"],
+ out["pred_boxes"],
+ out["pred_masks"],
+ out["pred_boxes_xyxy"],
+ )
+
+
+def _export_decoder_only(model: Any, inputs):
+ (
+ backbone_fpn,
+ img_ids,
+ memory,
+ pos_embed,
+ prompt,
+ prompt_mask,
+ level_start_index,
+ spatial_shapes,
+ valid_ratios,
+ ) = inputs
+ device = memory.device
+ wrapper = DecoderOnlyWrapper(model).to(device).eval() # type: ignore[arg-type]
+ dynamic_shapes = [
+ [{0: torch.export.Dim.AUTO} for _ in backbone_fpn],
+ {0: torch.export.Dim.AUTO},
+ {1: torch.export.Dim.AUTO},
+ {1: torch.export.Dim.AUTO},
+ {1: torch.export.Dim.AUTO},
+ {0: torch.export.Dim.AUTO},
+ {},
+ {},
+ {0: torch.export.Dim.AUTO},
+ ]
+ with torch.no_grad():
+ exported = torch.export.export(
+ wrapper,
+ (
+ backbone_fpn,
+ img_ids,
+ memory,
+ pos_embed,
+ prompt,
+ prompt_mask,
+ level_start_index,
+ spatial_shapes,
+ valid_ratios,
+ ),
+ dynamic_shapes=dynamic_shapes,
+ strict=False,
+ prefer_deferred_runtime_asserts_over_guards=True,
+ )
+ return exported
+
+
+@pytest.mark.slow
+def test_decoder_export_static(sam3_model):
+ device = get_device()
+ inputs = _make_inputs(1, 1008, 1008, device)
+ with capture_stderr_on_fail("export_static"):
+ exported = _export_full_sam3_pipeline(sam3_model, inputs)
+ assert exported is not None
+
+
+@pytest.mark.slow
+def test_decoder_export_loads(sam3_model):
+ device = get_device()
+ inputs = _make_inputs(1, 1008, 1008, device)
+ with capture_stderr_on_fail("export_loads"):
+ exported = _export_full_sam3_pipeline(sam3_model, inputs)
+ module = exported.module()
+ with torch.no_grad():
+ out = module(*inputs)
+ assert isinstance(out, tuple)
+ assert len(out) == 4
+
+
+@pytest.mark.slow
+def test_decoder_export_matches_eager(sam3_model):
+ device = get_device()
+ inputs = _make_inputs(1, 1008, 1008, device)
+ wrapper = FullSam3PipelineWrapper(sam3_model).to(device).eval()
+ with torch.no_grad():
+ eager_out = wrapper(*inputs)
+ with capture_stderr_on_fail("export_match"):
+ exported = _export_full_sam3_pipeline(sam3_model, inputs)
+ module = exported.module()
+ with torch.no_grad():
+ export_out = module(*inputs)
+ save_output_shapes("full_pipeline_eager", inputs, eager_out)
+ save_output_shapes("full_pipeline_export", inputs, export_out)
+ for eager, compiled in zip(eager_out, export_out):
+ if eager is None:
+ assert compiled is None
+ else:
+ torch.testing.assert_close(eager, compiled, rtol=1e-3, atol=1e-3)
+
+
+@pytest.mark.slow
+@pytest.mark.parametrize("batch", [1, 2])
+def test_full_sam3_pipeline_export_inference_shapes(sam3_model, batch: int):
+ device = get_device()
+ inputs = _make_inputs(1, 1008, 1008, device)
+ with capture_stderr_on_fail("export_inference_shapes"):
+ exported = _export_full_sam3_pipeline(sam3_model, inputs)
+ module = exported.module()
+ new_inputs = _make_inputs(batch, 1008, 1008, device)
+ with torch.no_grad():
+ out = module(*new_inputs)
+ save_output_shapes(f"full_pipeline_export_batch_{batch}", new_inputs, out)
+ assert isinstance(out, tuple)
+
+
+@pytest.mark.slow
+def test_decoder_only_export_loads(sam3_model):
+ device = get_device()
+ inputs = _make_inputs(1, 1008, 1008, device)
+ decoder_inputs = _make_decoder_only_inputs(sam3_model, inputs)
+ with capture_stderr_on_fail("export_decoder_only_loads"):
+ exported = _export_decoder_only(sam3_model, decoder_inputs)
+ module = exported.module()
+ with torch.no_grad():
+ out = module(*decoder_inputs)
+ assert isinstance(out, tuple)
+ assert len(out) == 4
+
+
+@pytest.mark.slow
+def test_decoder_only_export_matches_eager(sam3_model):
+ device = get_device()
+ inputs = _make_inputs(1, 1008, 1008, device)
+ decoder_inputs = _make_decoder_only_inputs(sam3_model, inputs)
+ wrapper = DecoderOnlyWrapper(sam3_model).to(device).eval()
+ with torch.no_grad():
+ eager_out = wrapper(*decoder_inputs)
+ with capture_stderr_on_fail("export_decoder_only_match"):
+ exported = _export_decoder_only(sam3_model, decoder_inputs)
+ module = exported.module()
+ with torch.no_grad():
+ export_out = module(*decoder_inputs)
+ save_output_shapes("decoder_only_eager", None, eager_out)
+ save_output_shapes("decoder_only_export", None, export_out)
+ for eager, compiled in zip(eager_out, export_out):
+ torch.testing.assert_close(eager, compiled, rtol=1e-3, atol=1e-3)
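The eager-vs-export checks in these tests walk the output tuples by hand: the full-pipeline test tolerates `None` entries, the image-encoder test descends one level into lists, and the `zip` loops stop at the shorter sequence, so an exported module that returned fewer outputs would still pass the comparison. A minimal sketch of a recursive comparer that covers all three cases, assuming a pluggable `close` predicate. `outputs_match` and `scalar_close` are hypothetical helpers, not part of this PR, and `math.isclose` on plain floats only stands in for `torch.testing.assert_close` on tensors (the two tolerance formulas are not identical).

```python
import math
from typing import Any, Callable


def outputs_match(eager: Any, exported: Any, close: Callable[[Any, Any], bool]) -> bool:
    """Recursively compare eager and exported outputs.

    None must pair with None, lists/tuples must have equal length and match
    element-wise, and leaf values are compared with `close`.
    """
    if eager is None or exported is None:
        # A missing output on one side only is a mismatch.
        return eager is None and exported is None
    if isinstance(eager, (list, tuple)):
        # Unlike a bare zip(), reject sequences of different lengths.
        if not isinstance(exported, (list, tuple)) or len(eager) != len(exported):
            return False
        return all(outputs_match(e, x, close) for e, x in zip(eager, exported))
    return close(eager, exported)


def scalar_close(a: float, b: float) -> bool:
    # Scalar stand-in for rtol=1e-3, atol=1e-3.
    return math.isclose(a, b, rel_tol=1e-3, abs_tol=1e-3)


eager = (1.0, None, [2.0, 3.0])
exported = (1.0004, None, [2.0, 3.0005])
print(outputs_match(eager, exported, scalar_close))  # True
```

With tensors, `close` could wrap `torch.testing.assert_close` in a `try`/`except AssertionError`, keeping the structural checks separate from the numeric ones.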
diff --git a/tests/export/test_encoder_export.py b/tests/export/test_encoder_export.py
new file mode 100644
index 0000000..3285628
--- /dev/null
+++ b/tests/export/test_encoder_export.py
@@ -0,0 +1,156 @@
+from __future__ import annotations
+
+import pytest
+import torch
+
+from sam3.model.encoder import TransformerEncoderFusion
+from tests.export.utils import capture_stderr_on_fail, get_device
+
+
+class EncoderFusionWrapper(torch.nn.Module):
+ def __init__(self, encoder: TransformerEncoderFusion):
+ super().__init__()
+ self.encoder = encoder
+
+ def forward(
+ self,
+ img_feats: torch.Tensor,
+ img_pos: torch.Tensor,
+ img_mask: torch.Tensor,
+ prompt: torch.Tensor,
+ prompt_mask: torch.Tensor,
+ ):
+ out = self.encoder(
+ src=[img_feats],
+ src_pos=[img_pos],
+ src_key_padding_mask=[img_mask],
+ prompt=prompt,
+ prompt_key_padding_mask=prompt_mask,
+ )
+ return (
+ out["memory"],
+ out["pos_embed"],
+ out["padding_mask"],
+ out["level_start_index"],
+ out["spatial_shapes"],
+ out["valid_ratios"],
+ )
+
+
+def _make_image_tokens(batch: int, height: int, width: int, device: str):
+ channels = 256
+ img_feats = torch.randn(batch, channels, height, width, device=device)
+ img_pos = torch.randn(batch, channels, height, width, device=device)
+ img_mask = torch.zeros(batch, height, width, dtype=torch.bool, device=device)
+ return img_feats, img_pos, img_mask
+
+
+def _make_prompt(batch: int, seq_len: int, device: str):
+ prompt = torch.randn(seq_len, batch, 256, device=device)
+ prompt_mask = torch.zeros(batch, seq_len, dtype=torch.bool, device=device)
+ return prompt, prompt_mask
+
+
+def _export_encoder(model: torch.nn.Module, img_feats, img_pos, prompt, prompt_mask):
+ device = img_feats.device
+ wrapper = EncoderFusionWrapper(model.transformer.encoder).to(device).eval() # type: ignore[arg-type]
+ if img_feats.shape[0] == 1:  # trace at batch 2: export specializes size-1 dims to constants
+ img_feats = img_feats.repeat(2, 1, 1, 1)
+ img_pos = img_pos.repeat(2, 1, 1, 1)
+ img_mask = torch.zeros(
+ 2, img_feats.shape[2], img_feats.shape[3], dtype=torch.bool, device=device
+ )
+ prompt = prompt.repeat(1, 2, 1)
+ prompt_mask = prompt_mask.repeat(2, 1)
+ else:
+ img_mask = torch.zeros(
+ img_feats.shape[0],
+ img_feats.shape[2],
+ img_feats.shape[3],
+ dtype=torch.bool,
+ device=device,
+ )
+ with torch.no_grad():
+ exported = torch.export.export(
+ wrapper,
+ (img_feats, img_pos, img_mask, prompt, prompt_mask),
+ dynamic_shapes={
+ "img_feats": {
+ 0: torch.export.Dim.AUTO,
+ },
+ "img_pos": {
+ 0: torch.export.Dim.AUTO,
+ },
+ "img_mask": {
+ 0: torch.export.Dim.AUTO,
+ },
+ "prompt": {
+ 0: torch.export.Dim("seq", min=1, max=64),
+ 1: torch.export.Dim.AUTO,
+ },
+ "prompt_mask": {
+ 0: torch.export.Dim.AUTO,
+ 1: torch.export.Dim("seq", min=1, max=64),
+ },
+ },
+ strict=False,
+ prefer_deferred_runtime_asserts_over_guards=True,
+ )
+ return exported
+
+
+@pytest.mark.slow
+def test_encoder_export_static(sam3_model):
+ device = get_device()
+ img_feats, img_pos, img_mask = _make_image_tokens(1, 72, 72, device)
+ prompt, prompt_mask = _make_prompt(1, 4, device)
+ with capture_stderr_on_fail("export_static"):
+ exported = _export_encoder(sam3_model, img_feats, img_pos, prompt, prompt_mask)
+ assert exported is not None
+
+
+@pytest.mark.slow
+def test_encoder_export_loads(sam3_model):
+ device = get_device()
+ img_feats, img_pos, img_mask = _make_image_tokens(1, 72, 72, device)
+ prompt, prompt_mask = _make_prompt(1, 4, device)
+ with capture_stderr_on_fail("export_loads"):
+ exported = _export_encoder(sam3_model, img_feats, img_pos, prompt, prompt_mask)
+ module = exported.module()
+ with torch.no_grad():
+ out = module(img_feats, img_pos, img_mask, prompt, prompt_mask)
+ assert isinstance(out, tuple)
+ assert len(out) == 6
+
+
+@pytest.mark.slow
+def test_encoder_export_matches_eager(sam3_model):
+ device = get_device()
+ img_feats, img_pos, img_mask = _make_image_tokens(1, 72, 72, device)
+ prompt, prompt_mask = _make_prompt(1, 4, device)
+ wrapper = EncoderFusionWrapper(sam3_model.transformer.encoder).to(device).eval() # type: ignore[arg-type]
+ with torch.no_grad():
+ eager_out = wrapper(img_feats, img_pos, img_mask, prompt, prompt_mask)
+ with capture_stderr_on_fail("export_match"):
+ exported = _export_encoder(sam3_model, img_feats, img_pos, prompt, prompt_mask)
+ module = exported.module()
+ with torch.no_grad():
+ export_out = module(img_feats, img_pos, img_mask, prompt, prompt_mask)
+ for eager, compiled in zip(eager_out, export_out):
+ torch.testing.assert_close(eager, compiled, rtol=1e-3, atol=1e-3)
+
+
+@pytest.mark.slow
+@pytest.mark.parametrize("batch,seq_len", [(1, 4), (2, 8)])
+def test_encoder_export_inference_shapes(sam3_model, batch: int, seq_len: int):
+ device = get_device()
+ img_feats, img_pos, img_mask = _make_image_tokens(1, 72, 72, device)
+ prompt, prompt_mask = _make_prompt(1, 4, device)
+ with capture_stderr_on_fail("export_inference_shapes"):
+ exported = _export_encoder(sam3_model, img_feats, img_pos, prompt, prompt_mask)
+ module = exported.module()
+ img_feats2, img_pos2, img_mask2 = _make_image_tokens(batch, 72, 72, device)
+ prompt2, prompt_mask2 = _make_prompt(batch, seq_len, device)
+ with torch.no_grad():
+ out = module(img_feats2, img_pos2, img_mask2, prompt2, prompt_mask2)
+ assert isinstance(out, tuple)
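`_make_prompt` builds `prompt` sequence-first, `(seq_len, batch, 256)`, while `prompt_mask` is batch-first, `(batch, seq_len)`. That is why the shared `"seq"` dimension sits at index 0 in one spec and index 1 in the other, and why the batch-1 path repeats them with `(1, 2, 1)` and `(2, 1)` respectively. The pure-Python sketch below mirrors the shape rule of `torch.Tensor.repeat` (the output size is the element-wise product of the input size, left-padded with 1s, and the repeat counts) and checks that the two tensors still agree after duplication. `repeat_shape` and `prompt_layout_ok` are illustrative names, not torch APIs.

```python
def repeat_shape(shape: tuple[int, ...], reps: tuple[int, ...]) -> tuple[int, ...]:
    """Shape produced by torch.Tensor.repeat(*reps) on a tensor of `shape`."""
    if len(reps) < len(shape):
        raise ValueError("repeat needs at least as many counts as tensor dims")
    # Missing leading dims are treated as size 1, as torch.Tensor.repeat does.
    padded = (1,) * (len(reps) - len(shape)) + tuple(shape)
    return tuple(s * r for s, r in zip(padded, reps))


def prompt_layout_ok(
    prompt_shape: tuple[int, ...], mask_shape: tuple[int, ...], channels: int = 256
) -> bool:
    """prompt is (seq, batch, C); prompt_mask is (batch, seq)."""
    seq, batch, c = prompt_shape
    return c == channels and tuple(mask_shape) == (batch, seq)


prompt_shape, mask_shape = (4, 1, 256), (1, 4)   # _make_prompt(1, 4, ...)
prompt2 = repeat_shape(prompt_shape, (1, 2, 1))  # prompt.repeat(1, 2, 1)
mask2 = repeat_shape(mask_shape, (2, 1))         # prompt_mask.repeat(2, 1)
print(prompt2, mask2, prompt_layout_ok(prompt2, mask2))
# (4, 2, 256) (2, 4) True
```

Repeating the mask along its first dim and the prompt along its second keeps `(batch, seq)` and `(seq, batch)` transposes of each other, which is the invariant the encoder's key-padding logic expects.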
diff --git a/tests/export/test_image_encoder_export.py b/tests/export/test_image_encoder_export.py
new file mode 100644
index 0000000..9597691
--- /dev/null
+++ b/tests/export/test_image_encoder_export.py
@@ -0,0 +1,105 @@
+from __future__ import annotations
+
+import pytest
+import torch
+
+from sam3.model.vl_combiner import SAM3VLBackbone
+from tests.export.utils import capture_stderr_on_fail, get_device
+
+
+class ImageEncoderWrapper(torch.nn.Module):
+ def __init__(self, backbone: SAM3VLBackbone):
+ super().__init__()
+ self.backbone = backbone
+
+ def forward(self, images: torch.Tensor):
+ out = self.backbone._forward_image_no_act_ckpt(images)
+ return (
+ out["vision_features"],
+ out["vision_pos_enc"],
+ out["backbone_fpn"],
+ )
+
+
+def _make_images(batch: int, height: int, width: int, device: str) -> torch.Tensor:
+ return torch.randn(batch, 3, height, width, device=device, dtype=torch.float32)
+
+
+def _export_image_encoder(model: torch.nn.Module, images: torch.Tensor):
+ device = images.device
+ wrapper = ImageEncoderWrapper(model.backbone).to(device).eval() # type: ignore[arg-type]
+ export_images = images
+ if images.shape[0] == 1:  # trace at batch 2: export specializes size-1 dims to constants
+ export_images = images.repeat(2, 1, 1, 1)
+ with torch.no_grad():
+ exported = torch.export.export(
+ wrapper,
+ (export_images,),
+ dynamic_shapes={
+ "images": {
+ 0: torch.export.Dim("batch", min=1, max=4),
+ 2: 1008,
+ 3: 1008,
+ }
+ },
+ strict=False,
+ prefer_deferred_runtime_asserts_over_guards=True,
+ )
+ return exported
+
+
+@pytest.mark.slow
+def test_image_encoder_export_static(sam3_model):
+ device = get_device()
+ images = _make_images(1, 1008, 1008, device)
+ with capture_stderr_on_fail("export_static"):
+ exported = _export_image_encoder(sam3_model, images)
+ assert exported is not None
+
+
+@pytest.mark.slow
+def test_image_encoder_export_loads(sam3_model):
+ device = get_device()
+ images = _make_images(1, 1008, 1008, device)
+ with capture_stderr_on_fail("export_loads"):
+ exported = _export_image_encoder(sam3_model, images)
+ module = exported.module()
+ with torch.no_grad():
+ out = module(images)
+ assert isinstance(out, tuple)
+ assert len(out) == 3
+
+
+@pytest.mark.slow
+def test_image_encoder_export_matches_eager(sam3_model):
+ device = get_device()
+ images = _make_images(1, 1008, 1008, device)
+ wrapper = ImageEncoderWrapper(sam3_model.backbone).to(device).eval()
+ with torch.no_grad():
+ eager_out = wrapper(images)
+ with capture_stderr_on_fail("export_match"):
+ exported = _export_image_encoder(sam3_model, images)
+ module = exported.module()
+ with torch.no_grad():
+ export_out = module(images)
+ for eager, compiled in zip(eager_out, export_out):
+ if isinstance(eager, (list, tuple)):
+ for e_item, c_item in zip(eager, compiled):
+ torch.testing.assert_close(e_item, c_item, rtol=1e-3, atol=1e-3)
+ else:
+ torch.testing.assert_close(eager, compiled, rtol=1e-3, atol=1e-3)
+
+
+@pytest.mark.slow
+@pytest.mark.parametrize("batch,height,width", [(2, 1008, 1008)])
+def test_image_encoder_export_inference_shapes(
+ sam3_model, batch: int, height: int, width: int
+):
+ device = get_device()
+ images = _make_images(1, 1008, 1008, device)
+ with capture_stderr_on_fail("export_inference_shapes"):
+ exported = _export_image_encoder(sam3_model, images)
+ module = exported.module()
+ with torch.no_grad():
+ out = module(_make_images(batch, height, width, device))
+ assert isinstance(out, tuple)
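`_export_image_encoder` pins height and width to 1008 with plain integers and bounds the batch with `Dim("batch", min=1, max=4)`; the channel dim is left out of the spec and so stays static at its traced size. As declared, the exported module should accept batches of 1 to 4 at 1008 by 1008 and reject other shapes. The hypothetical checker below spells out which concrete shapes satisfy such a spec. It models only the constraints written in the spec and is not how `torch.export` validates inputs internally.

```python
from typing import Union

# A spec entry is either a fixed size (int) or an inclusive (min, max) range.
DimSpec = Union[int, tuple[int, int]]


def shape_allowed(
    shape: tuple[int, ...], traced_shape: tuple[int, ...], spec: dict[int, DimSpec]
) -> bool:
    """Check `shape` against a dynamic-shape spec.

    Dims absent from `spec` stay static at their traced size.
    """
    if len(shape) != len(traced_shape):
        return False
    for i, (size, traced) in enumerate(zip(shape, traced_shape)):
        rule = spec.get(i, traced)
        if isinstance(rule, tuple):
            lo, hi = rule
            if not lo <= size <= hi:
                return False
        elif size != rule:
            return False
    return True


# Mirrors the image-encoder spec: batch in [1, 4], H = W = 1008, 3 channels.
IMAGE_SPEC: dict[int, DimSpec] = {0: (1, 4), 2: 1008, 3: 1008}
TRACED = (2, 3, 1008, 1008)  # the batch-1 input is repeated to 2 before export

print(shape_allowed((1, 3, 1008, 1008), TRACED, IMAGE_SPEC))  # True
print(shape_allowed((8, 3, 1008, 1008), TRACED, IMAGE_SPEC))  # False: batch > 4
print(shape_allowed((2, 3, 512, 512), TRACED, IMAGE_SPEC))    # False: H/W pinned
```

This is also why `test_image_encoder_export_inference_shapes` only parametrizes `(2, 1008, 1008)`: any other height or width falls outside the exported graph's contract.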
diff --git a/tests/export/test_text_encoder_export.py b/tests/export/test_text_encoder_export.py
new file mode 100644
index 0000000..79ec249
--- /dev/null
+++ b/tests/export/test_text_encoder_export.py
@@ -0,0 +1,111 @@
+from __future__ import annotations
+
+from typing import Any, cast
+
+import pytest
+import torch
+
+from sam3.model.text_encoder_ve import VETextEncoder
+from tests.export.utils import capture_stderr_on_fail, get_device
+
+
+class TextEncoderWrapper(torch.nn.Module):
+ def __init__(self, text_encoder: VETextEncoder):
+ super().__init__()
+ self.text_encoder = text_encoder
+
+ def forward(self, token_ids: torch.Tensor):
+ _, text_tokens = self.text_encoder.encoder(token_ids)
+ text_tokens = text_tokens.transpose(0, 1)
+ text_memory = self.text_encoder.resizer(text_tokens)
+ text_attention_mask = token_ids.ne(0)
+ text_attention_mask = text_attention_mask.ne(1)  # invert: True now marks padding (id 0)
+ return text_attention_mask, text_memory
+
+
+def _make_tokens(
+ batch: int, seq_len: int, vocab_size: int, device: str
+) -> torch.Tensor:
+ token_ids = torch.randint(0, vocab_size, (batch, seq_len), device=device)
+ token_ids[:, -1] = 1
+ return token_ids
+
+
+def _export_text_encoder(model: Any, token_ids: torch.Tensor):
+ device = token_ids.device
+ model_any = cast(Any, model)
+ text_encoder = cast(VETextEncoder, model_any.backbone.language_backbone)
+ wrapper = TextEncoderWrapper(text_encoder).to(device).eval()
+ export_tokens = token_ids
+ if token_ids.shape[0] == 1:  # trace at batch 2: export specializes size-1 dims to constants
+ export_tokens = token_ids.repeat(2, 1)
+ with torch.no_grad():
+ exported = torch.export.export(
+ wrapper,
+ (export_tokens,),
+ dynamic_shapes={
+ "token_ids": {
+ 0: torch.export.Dim("batch", min=1, max=4),
+ 1: torch.export.Dim.AUTO,
+ }
+ },
+ strict=False,
+ prefer_deferred_runtime_asserts_over_guards=True,
+ )
+ return exported
+
+
+@pytest.mark.slow
+def test_text_encoder_export_static(sam3_model):
+ device = get_device()
+ vocab_size = sam3_model.backbone.language_backbone.encoder.vocab_size
+ token_ids = _make_tokens(1, 32, vocab_size, device)
+ with capture_stderr_on_fail("export_static"):
+ exported = _export_text_encoder(sam3_model, token_ids)
+ assert exported is not None
+
+
+@pytest.mark.slow
+def test_text_encoder_export_loads(sam3_model):
+ device = get_device()
+ vocab_size = sam3_model.backbone.language_backbone.encoder.vocab_size
+ token_ids = _make_tokens(1, 32, vocab_size, device)
+ with capture_stderr_on_fail("export_loads"):
+ exported = _export_text_encoder(sam3_model, token_ids)
+ module = exported.module()
+ with torch.no_grad():
+ out = module(token_ids)
+ assert isinstance(out, tuple)
+ assert len(out) == 2
+
+
+@pytest.mark.slow
+def test_text_encoder_export_matches_eager(sam3_model):
+ device = get_device()
+ vocab_size = sam3_model.backbone.language_backbone.encoder.vocab_size
+ token_ids = _make_tokens(1, 32, vocab_size, device)
+ text_encoder = cast(VETextEncoder, sam3_model.backbone.language_backbone)
+ wrapper = TextEncoderWrapper(text_encoder).to(device).eval()
+ with torch.no_grad():
+ eager_out = wrapper(token_ids)
+ with capture_stderr_on_fail("export_match"):
+ exported = _export_text_encoder(sam3_model, token_ids)
+ module = exported.module()
+ with torch.no_grad():
+ export_out = module(token_ids)
+ torch.testing.assert_close(eager_out[0], export_out[0])
+ torch.testing.assert_close(eager_out[1], export_out[1], rtol=1e-3, atol=1e-3)
+
+
+@pytest.mark.slow
+@pytest.mark.parametrize("batch,seq_len", [(1, 32), (2, 32)])
+def test_text_encoder_export_inference_shapes(sam3_model, batch: int, seq_len: int):
+ device = get_device()
+ vocab_size = sam3_model.backbone.language_backbone.encoder.vocab_size
+ token_ids = _make_tokens(1, 32, vocab_size, device)
+ with capture_stderr_on_fail("export_inference_shapes"):
+ exported = _export_text_encoder(sam3_model, token_ids)
+ module = exported.module()
+ with torch.no_grad():
+ out = module(_make_tokens(batch, seq_len, vocab_size, device))
+ assert isinstance(out, tuple)
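The attention mask returned by `TextEncoderWrapper` is `True` exactly where the token id is 0, the usual `key_padding_mask` convention in which `True` means "ignore this position". Whether it is written as `token_ids.ne(0)` followed by `.ne(1)` (a double negation on a boolean tensor) or as a single `token_ids.eq(0)`, the result is the same. A list-based sketch of both forms, for illustration only; like the wrapper, it treats 0 as the padding id.

```python
def padding_mask_two_step(token_ids: list[list[int]]) -> list[list[bool]]:
    # Step 1: True for real tokens (id != 0), as in token_ids.ne(0).
    valid = [[tok != 0 for tok in row] for row in token_ids]
    # Step 2: compare the booleans against 1 (True), as in .ne(1), which flips them.
    return [[flag != 1 for flag in row] for row in valid]


def padding_mask_direct(token_ids: list[list[int]]) -> list[list[bool]]:
    # Equivalent single step: True exactly at padding positions (id == 0).
    return [[tok == 0 for tok in row] for row in token_ids]


tokens = [[7, 3, 0, 1], [5, 0, 0, 1]]
print(padding_mask_two_step(tokens))
# [[False, False, True, False], [False, True, True, False]]
```

Because `_make_tokens` samples ids from `[0, vocab_size)`, random positions can land on 0 and be masked, so the mask in these tests is not necessarily all `False`.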
diff --git a/tests/export/utils.py b/tests/export/utils.py
new file mode 100644
index 0000000..bddf172
--- /dev/null
+++ b/tests/export/utils.py
@@ -0,0 +1,67 @@
+from __future__ import annotations
+
+import io
+import os
+import re
+from contextlib import contextmanager, redirect_stderr
+from pathlib import Path
+from typing import Generator
+
+import torch
+
+LOG_DIR = Path(__file__).resolve().parent / "export_logs"
+
+
+def _sanitize_test_name(name: str) -> str:
+ return re.sub(r"[^A-Za-z0-9_.-]+", "_", name)
+
+
+def _current_test_name() -> str:
+ raw = os.getenv("PYTEST_CURRENT_TEST", "unknown")
+ return _sanitize_test_name(raw.split(" ")[0])
+
+
+@contextmanager
+def capture_stderr_on_fail(suffix: str) -> Generator[None, None, None]:
+ LOG_DIR.mkdir(parents=True, exist_ok=True)
+ buffer = io.StringIO()
+ with redirect_stderr(buffer):
+ try:
+ yield
+ except Exception:
+ log_path = LOG_DIR / f"{_current_test_name()}-{suffix}.stderr.txt"
+ with log_path.open("w", encoding="utf-8") as handle:
+ handle.write(buffer.getvalue())
+ raise
+
+
+def get_device() -> str:
+ force_cpu = os.getenv("SAM3_EXPORT_FORCE_CPU", "0") == "1"
+ device = os.getenv("SAM3_EXPORT_DEVICE")
+ if device is None:
+ device = "cuda" if torch.cuda.is_available() and not force_cpu else "cpu"
+ return device
+
+
+def save_output_shapes(
+ suffix: str,
+ inputs: tuple[torch.Tensor, ...] | None,
+ outputs: tuple[torch.Tensor | None, ...],
+) -> None:
+ LOG_DIR.mkdir(parents=True, exist_ok=True)
+ lines: list[str] = []
+ if inputs is not None:
+ for idx, value in enumerate(inputs):
+ lines.append(
+ f"input[{idx}] shape={tuple(value.shape)} dtype={value.dtype} device={value.device}"
+ )
+ for idx, value in enumerate(outputs):
+ if value is None:
+ lines.append(f"output[{idx}] None")
+ else:
+ lines.append(
+ f"output[{idx}] shape={tuple(value.shape)} dtype={value.dtype} device={value.device}"
+ )
+ log_path = LOG_DIR / f"{_current_test_name()}-{suffix}.shapes.txt"
+ with log_path.open("w", encoding="utf-8") as handle:
+ handle.write("\n".join(lines))
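`capture_stderr_on_fail` relies on `contextlib.redirect_stderr`, which only rebinds `sys.stderr`; on success the buffered text is simply discarded. Anything that writes through file descriptor 2 directly (for example C++ code in native extensions) or through a stream object captured before the redirect bypasses the buffer. A `logging.StreamHandler()` created earlier is the common Python-level case. A stdlib-only sketch of the difference; the logger name is arbitrary.

```python
import io
import logging
import sys
from contextlib import redirect_stderr


def capture_demo() -> str:
    logger = logging.getLogger("redirect_stderr_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo independent of root handlers
    logger.handlers.clear()   # make repeated calls idempotent
    # StreamHandler() stores whatever sys.stderr is *right now*.
    logger.addHandler(logging.StreamHandler())

    buffer = io.StringIO()
    with redirect_stderr(buffer):
        # sys.stderr is looked up at call time, so this lands in the buffer.
        print("written via sys.stderr", file=sys.stderr)
        # This handler still holds the pre-redirect stream, so it escapes.
        logger.info("written via logging")
    return buffer.getvalue()


captured = capture_demo()
print(repr(captured))  # 'written via sys.stderr\n'
```

If export failures log through handlers like this, the saved `.stderr.txt` file may be missing that output; capturing at the file-descriptor level (for example with pytest's `capfd` fixture) is one alternative.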
diff --git a/uv.lock b/uv.lock
new file mode 100644
index 0000000..5cd9d5a
--- /dev/null
+++ b/uv.lock
@@ -0,0 +1,3920 @@
+version = 1
+revision = 3
+requires-python = ">=3.10, <3.13"
+resolution-markers = [
+ "python_full_version >= '3.12' and sys_platform == 'darwin'",
+ "python_full_version >= '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version >= '3.12' and sys_platform == 'win32'",
+ "python_full_version >= '3.12' and sys_platform == 'emscripten'",
+ "(python_full_version >= '3.12' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version == '3.11.*' and sys_platform == 'darwin'",
+ "python_full_version == '3.11.*' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version == '3.11.*' and sys_platform == 'win32'",
+ "python_full_version == '3.11.*' and sys_platform == 'emscripten'",
+ "(python_full_version == '3.11.*' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version == '3.11.*' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version < '3.11' and sys_platform == 'darwin'",
+ "python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",
+]
+
+[[package]]
+name = "absl-py"
+version = "2.4.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/64/c7/8de93764ad66968d19329a7e0c147a2bb3c7054c554d4a119111b8f9440f/absl_py-2.4.0.tar.gz", hash = "sha256:8c6af82722b35cf71e0f4d1d47dcaebfff286e27110a99fc359349b247dfb5d4", size = 116543, upload-time = "2026-01-28T10:17:05.322Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/18/a6/907a406bb7d359e6a63f99c313846d9eec4f7e6f7437809e03aa00fa3074/absl_py-2.4.0-py3-none-any.whl", hash = "sha256:88476fd881ca8aab94ffa78b7b6c632a782ab3ba1cd19c9bd423abc4fb4cd28d", size = 135750, upload-time = "2026-01-28T10:17:04.19Z" },
+]
+
+[[package]]
+name = "antlr4-python3-runtime"
+version = "4.9.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/3e/38/7859ff46355f76f8d19459005ca000b6e7012f2f1ca597746cbcd1fbfe5e/antlr4-python3-runtime-4.9.3.tar.gz", hash = "sha256:f224469b4168294902bb1efa80a8bf7855f24c99aef99cbefc1bcd3cce77881b", size = 117034, upload-time = "2021-11-06T17:52:23.524Z" }
+
+[[package]]
+name = "anyio"
+version = "4.12.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "exceptiongroup", marker = "python_full_version < '3.11'" },
+ { name = "idna" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/96/f0/5eb65b2bb0d09ac6776f2eb54adee6abe8228ea05b20a5ad0e4945de8aac/anyio-4.12.1.tar.gz", hash = "sha256:41cfcc3a4c85d3f05c932da7c26d0201ac36f72abd4435ba90d0464a3ffed703", size = 228685, upload-time = "2026-01-06T11:45:21.246Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/38/0e/27be9fdef66e72d64c0cdc3cc2823101b80585f8119b5c112c2e8f5f7dab/anyio-4.12.1-py3-none-any.whl", hash = "sha256:d405828884fc140aa80a3c667b8beed277f1dfedec42ba031bd6ac3db606ab6c", size = 113592, upload-time = "2026-01-06T11:45:19.497Z" },
+]
+
+[[package]]
+name = "appnope"
+version = "0.1.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/35/5d/752690df9ef5b76e169e68d6a129fa6d08a7100ca7f754c89495db3c6019/appnope-0.1.4.tar.gz", hash = "sha256:1de3860566df9caf38f01f86f65e0e13e379af54f9e4bee1e66b48f2efffd1ee", size = 4170, upload-time = "2024-02-06T09:43:11.258Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/81/29/5ecc3a15d5a33e31b26c11426c45c501e439cb865d0bff96315d86443b78/appnope-0.1.4-py2.py3-none-any.whl", hash = "sha256:502575ee11cd7a28c0205f379b525beefebab9d161b7c964670864014ed7213c", size = 4321, upload-time = "2024-02-06T09:43:09.663Z" },
+]
+
+[[package]]
+name = "argon2-cffi"
+version = "25.1.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "argon2-cffi-bindings" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/0e/89/ce5af8a7d472a67cc819d5d998aa8c82c5d860608c4db9f46f1162d7dab9/argon2_cffi-25.1.0.tar.gz", hash = "sha256:694ae5cc8a42f4c4e2bf2ca0e64e51e23a040c6a517a85074683d3959e1346c1", size = 45706, upload-time = "2025-06-03T06:55:32.073Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/4f/d3/a8b22fa575b297cd6e3e3b0155c7e25db170edf1c74783d6a31a2490b8d9/argon2_cffi-25.1.0-py3-none-any.whl", hash = "sha256:fdc8b074db390fccb6eb4a3604ae7231f219aa669a2652e0f20e16ba513d5741", size = 14657, upload-time = "2025-06-03T06:55:30.804Z" },
+]
+
+[[package]]
+name = "argon2-cffi-bindings"
+version = "25.1.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "cffi" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/5c/2d/db8af0df73c1cf454f71b2bbe5e356b8c1f8041c979f505b3d3186e520a9/argon2_cffi_bindings-25.1.0.tar.gz", hash = "sha256:b957f3e6ea4d55d820e40ff76f450952807013d361a65d7f28acc0acbf29229d", size = 1783441, upload-time = "2025-07-30T10:02:05.147Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1d/57/96b8b9f93166147826da5f90376e784a10582dd39a393c99bb62cfcf52f0/argon2_cffi_bindings-25.1.0-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:aecba1723ae35330a008418a91ea6cfcedf6d31e5fbaa056a166462ff066d500", size = 54121, upload-time = "2025-07-30T10:01:50.815Z" },
+ { url = "https://files.pythonhosted.org/packages/0a/08/a9bebdb2e0e602dde230bdde8021b29f71f7841bd54801bcfd514acb5dcf/argon2_cffi_bindings-25.1.0-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:2630b6240b495dfab90aebe159ff784d08ea999aa4b0d17efa734055a07d2f44", size = 29177, upload-time = "2025-07-30T10:01:51.681Z" },
+ { url = "https://files.pythonhosted.org/packages/b6/02/d297943bcacf05e4f2a94ab6f462831dc20158614e5d067c35d4e63b9acb/argon2_cffi_bindings-25.1.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:7aef0c91e2c0fbca6fc68e7555aa60ef7008a739cbe045541e438373bc54d2b0", size = 31090, upload-time = "2025-07-30T10:01:53.184Z" },
+ { url = "https://files.pythonhosted.org/packages/c1/93/44365f3d75053e53893ec6d733e4a5e3147502663554b4d864587c7828a7/argon2_cffi_bindings-25.1.0-cp39-abi3-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1e021e87faa76ae0d413b619fe2b65ab9a037f24c60a1e6cc43457ae20de6dc6", size = 81246, upload-time = "2025-07-30T10:01:54.145Z" },
+ { url = "https://files.pythonhosted.org/packages/09/52/94108adfdd6e2ddf58be64f959a0b9c7d4ef2fa71086c38356d22dc501ea/argon2_cffi_bindings-25.1.0-cp39-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d3e924cfc503018a714f94a49a149fdc0b644eaead5d1f089330399134fa028a", size = 87126, upload-time = "2025-07-30T10:01:55.074Z" },
+ { url = "https://files.pythonhosted.org/packages/72/70/7a2993a12b0ffa2a9271259b79cc616e2389ed1a4d93842fac5a1f923ffd/argon2_cffi_bindings-25.1.0-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:c87b72589133f0346a1cb8d5ecca4b933e3c9b64656c9d175270a000e73b288d", size = 80343, upload-time = "2025-07-30T10:01:56.007Z" },
+ { url = "https://files.pythonhosted.org/packages/78/9a/4e5157d893ffc712b74dbd868c7f62365618266982b64accab26bab01edc/argon2_cffi_bindings-25.1.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:1db89609c06afa1a214a69a462ea741cf735b29a57530478c06eb81dd403de99", size = 86777, upload-time = "2025-07-30T10:01:56.943Z" },
+ { url = "https://files.pythonhosted.org/packages/74/cd/15777dfde1c29d96de7f18edf4cc94c385646852e7c7b0320aa91ccca583/argon2_cffi_bindings-25.1.0-cp39-abi3-win32.whl", hash = "sha256:473bcb5f82924b1becbb637b63303ec8d10e84c8d241119419897a26116515d2", size = 27180, upload-time = "2025-07-30T10:01:57.759Z" },
+ { url = "https://files.pythonhosted.org/packages/e2/c6/a759ece8f1829d1f162261226fbfd2c6832b3ff7657384045286d2afa384/argon2_cffi_bindings-25.1.0-cp39-abi3-win_amd64.whl", hash = "sha256:a98cd7d17e9f7ce244c0803cad3c23a7d379c301ba618a5fa76a67d116618b98", size = 31715, upload-time = "2025-07-30T10:01:58.56Z" },
+ { url = "https://files.pythonhosted.org/packages/42/b9/f8d6fa329ab25128b7e98fd83a3cb34d9db5b059a9847eddb840a0af45dd/argon2_cffi_bindings-25.1.0-cp39-abi3-win_arm64.whl", hash = "sha256:b0fdbcf513833809c882823f98dc2f931cf659d9a1429616ac3adebb49f5db94", size = 27149, upload-time = "2025-07-30T10:01:59.329Z" },
+ { url = "https://files.pythonhosted.org/packages/11/2d/ba4e4ca8d149f8dcc0d952ac0967089e1d759c7e5fcf0865a317eb680fbb/argon2_cffi_bindings-25.1.0-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:6dca33a9859abf613e22733131fc9194091c1fa7cb3e131c143056b4856aa47e", size = 24549, upload-time = "2025-07-30T10:02:00.101Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/82/9b2386cc75ac0bd3210e12a44bfc7fd1632065ed8b80d573036eecb10442/argon2_cffi_bindings-25.1.0-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:21378b40e1b8d1655dd5310c84a40fc19a9aa5e6366e835ceb8576bf0fea716d", size = 25539, upload-time = "2025-07-30T10:02:00.929Z" },
+ { url = "https://files.pythonhosted.org/packages/31/db/740de99a37aa727623730c90d92c22c9e12585b3c98c54b7960f7810289f/argon2_cffi_bindings-25.1.0-pp310-pypy310_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5d588dec224e2a83edbdc785a5e6f3c6cd736f46bfd4b441bbb5aa1f5085e584", size = 28467, upload-time = "2025-07-30T10:02:02.08Z" },
+ { url = "https://files.pythonhosted.org/packages/71/7a/47c4509ea18d755f44e2b92b7178914f0c113946d11e16e626df8eaa2b0b/argon2_cffi_bindings-25.1.0-pp310-pypy310_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5acb4e41090d53f17ca1110c3427f0a130f944b896fc8c83973219c97f57b690", size = 27355, upload-time = "2025-07-30T10:02:02.867Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/82/82745642d3c46e7cea25e1885b014b033f4693346ce46b7f47483cf5d448/argon2_cffi_bindings-25.1.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:da0c79c23a63723aa5d782250fbf51b768abca630285262fb5144ba5ae01e520", size = 29187, upload-time = "2025-07-30T10:02:03.674Z" },
+]
+
+[[package]]
+name = "arrow"
+version = "1.4.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "python-dateutil" },
+ { name = "tzdata" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b9/33/032cdc44182491aa708d06a68b62434140d8c50820a087fac7af37703357/arrow-1.4.0.tar.gz", hash = "sha256:ed0cc050e98001b8779e84d461b0098c4ac597e88704a655582b21d116e526d7", size = 152931, upload-time = "2025-10-18T17:46:46.761Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ed/c9/d7977eaacb9df673210491da99e6a247e93df98c715fc43fd136ce1d3d33/arrow-1.4.0-py3-none-any.whl", hash = "sha256:749f0769958ebdc79c173ff0b0670d59051a535fa26e8eba02953dc19eb43205", size = 68797, upload-time = "2025-10-18T17:46:45.663Z" },
+]
+
+[[package]]
+name = "asttokens"
+version = "3.0.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/be/a5/8e3f9b6771b0b408517c82d97aed8f2036509bc247d46114925e32fe33f0/asttokens-3.0.1.tar.gz", hash = "sha256:71a4ee5de0bde6a31d64f6b13f2293ac190344478f081c3d1bccfcf5eacb0cb7", size = 62308, upload-time = "2025-11-15T16:43:48.578Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/d2/39/e7eaf1799466a4aef85b6a4fe7bd175ad2b1c6345066aa33f1f58d4b18d0/asttokens-3.0.1-py3-none-any.whl", hash = "sha256:15a3ebc0f43c2d0a50eeafea25e19046c68398e487b9f1f5b517f7c0f40f976a", size = 27047, upload-time = "2025-11-15T16:43:16.109Z" },
+]
+
+[[package]]
+name = "async-lru"
+version = "2.1.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "typing-extensions", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/ef/c3/bbf34f15ea88dfb649ab2c40f9d75081784a50573a9ea431563cab64adb8/async_lru-2.1.0.tar.gz", hash = "sha256:9eeb2fecd3fe42cc8a787fc32ead53a3a7158cc43d039c3c55ab3e4e5b2a80ed", size = 12041, upload-time = "2026-01-17T22:52:18.931Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/2e/e9/eb6a5db5ac505d5d45715388e92bced7a5bb556facc4d0865d192823f2d2/async_lru-2.1.0-py3-none-any.whl", hash = "sha256:fa12dcf99a42ac1280bc16c634bbaf06883809790f6304d85cdab3f666f33a7e", size = 6933, upload-time = "2026-01-17T22:52:17.389Z" },
+]
+
+[[package]]
+name = "attrs"
+version = "25.4.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/6b/5c/685e6633917e101e5dcb62b9dd76946cbb57c26e133bae9e0cd36033c0a9/attrs-25.4.0.tar.gz", hash = "sha256:16d5969b87f0859ef33a48b35d55ac1be6e42ae49d5e853b597db70c35c57e11", size = 934251, upload-time = "2025-10-06T13:54:44.725Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3a/2a/7cc015f5b9f5db42b7d48157e23356022889fc354a2813c15934b7cb5c0e/attrs-25.4.0-py3-none-any.whl", hash = "sha256:adcf7e2a1fb3b36ac48d97835bb6d8ade15b8dcce26aba8bf1d14847b57a3373", size = 67615, upload-time = "2025-10-06T13:54:43.17Z" },
+]
+
+[[package]]
+name = "babel"
+version = "2.18.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/7d/b2/51899539b6ceeeb420d40ed3cd4b7a40519404f9baf3d4ac99dc413a834b/babel-2.18.0.tar.gz", hash = "sha256:b80b99a14bd085fcacfa15c9165f651fbb3406e66cc603abf11c5750937c992d", size = 9959554, upload-time = "2026-02-01T12:30:56.078Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/77/f5/21d2de20e8b8b0408f0681956ca2c69f1320a3848ac50e6e7f39c6159675/babel-2.18.0-py3-none-any.whl", hash = "sha256:e2b422b277c2b9a9630c1d7903c2a00d0830c409c59ac8cae9081c92f1aeba35", size = 10196845, upload-time = "2026-02-01T12:30:53.445Z" },
+]
+
+[[package]]
+name = "beautifulsoup4"
+version = "4.14.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "soupsieve" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/c3/b0/1c6a16426d389813b48d95e26898aff79abbde42ad353958ad95cc8c9b21/beautifulsoup4-4.14.3.tar.gz", hash = "sha256:6292b1c5186d356bba669ef9f7f051757099565ad9ada5dd630bd9de5fa7fb86", size = 627737, upload-time = "2025-11-30T15:08:26.084Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1a/39/47f9197bdd44df24d67ac8893641e16f386c984a0619ef2ee4c51fbbc019/beautifulsoup4-4.14.3-py3-none-any.whl", hash = "sha256:0918bfe44902e6ad8d57732ba310582e98da931428d231a5ecb9e7c703a735bb", size = 107721, upload-time = "2025-11-30T15:08:24.087Z" },
+]
+
+[[package]]
+name = "black"
+version = "24.2.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "click" },
+ { name = "mypy-extensions" },
+ { name = "packaging" },
+ { name = "pathspec" },
+ { name = "platformdirs" },
+ { name = "tomli", marker = "python_full_version < '3.11'" },
+ { name = "typing-extensions", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/29/69/f3ab49cdb938b3eecb048fa64f86bdadb1fac26e92c435d287181d543b0a/black-24.2.0.tar.gz", hash = "sha256:bce4f25c27c3435e4dace4815bcb2008b87e167e3bf4ee47ccdc5ce906eb4894", size = 631598, upload-time = "2024-02-12T20:21:26.969Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/eb/75/25d478751e03a46663b3ae9ff961deb3c2ce1b24885f6fc010df5ba08b1d/black-24.2.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:6981eae48b3b33399c8757036c7f5d48a535b962a7c2310d19361edeef64ce29", size = 1575905, upload-time = "2024-02-12T20:32:30.688Z" },
+ { url = "https://files.pythonhosted.org/packages/88/12/59b6ed8eefa43a0b131a97db5820141b94f6722c627cb10346c5408cf7f3/black-24.2.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:d533d5e3259720fdbc1b37444491b024003e012c5173f7d06825a77508085430", size = 1423215, upload-time = "2024-02-12T20:31:24.135Z" },
+ { url = "https://files.pythonhosted.org/packages/46/b8/70a3cab340301d480f601d483452e6e68da61202abad881f1a91250c2d27/black-24.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:61a0391772490ddfb8a693c067df1ef5227257e72b0e4108482b8d41b5aee13f", size = 1730127, upload-time = "2024-02-12T20:23:44.397Z" },
+ { url = "https://files.pythonhosted.org/packages/64/5f/74be45f22ac104b4759084c792798f6b5695a5b4b2064e31222fea7fd3c7/black-24.2.0-cp310-cp310-win_amd64.whl", hash = "sha256:992e451b04667116680cb88f63449267c13e1ad134f30087dec8527242e9862a", size = 1355109, upload-time = "2024-02-12T20:25:09.905Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/de/938fd8c271ee903212f3599d7ece0d4a930af667c51135d4e6d1fac50455/black-24.2.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:163baf4ef40e6897a2a9b83890e59141cc8c2a98f2dda5080dc15c00ee1e62cd", size = 1556863, upload-time = "2024-02-12T20:33:27.299Z" },
+ { url = "https://files.pythonhosted.org/packages/11/15/4646143c19bc896a0cbc0c6663abf5751b465edefbc49491179454e2e2fd/black-24.2.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:e37c99f89929af50ffaf912454b3e3b47fd64109659026b678c091a4cd450fb2", size = 1403471, upload-time = "2024-02-12T20:33:11.704Z" },
+ { url = "https://files.pythonhosted.org/packages/1a/53/0b82f36f0a8beee5388f6bcc3a1591e094c87a275c64532ac6ce69fb491f/black-24.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4f9de21bafcba9683853f6c96c2d515e364aee631b178eaa5145fc1c61a3cc92", size = 1710660, upload-time = "2024-02-12T20:24:29.273Z" },
+ { url = "https://files.pythonhosted.org/packages/72/be/58ae1a41e1d174a05d074a7a10524f1cdf37df13be3ac314098ab9435550/black-24.2.0-cp311-cp311-win_amd64.whl", hash = "sha256:9db528bccb9e8e20c08e716b3b09c6bdd64da0dd129b11e160bf082d4642ac23", size = 1361881, upload-time = "2024-02-12T20:25:27.484Z" },
+ { url = "https://files.pythonhosted.org/packages/43/1e/67c87a1fb39592aa944f35cc26892946ebe0a10aa324b87f9380b8753862/black-24.2.0-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:d84f29eb3ee44859052073b7636533ec995bd0f64e2fb43aeceefc70090e752b", size = 1585288, upload-time = "2024-02-12T20:37:13.8Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/62/6437212cf40e40b74dbc7e134700a21cb21a9ac7e46ade940b5d4826456f/black-24.2.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:1e08fb9a15c914b81dd734ddd7fb10513016e5ce7e6704bdd5e1251ceee51ac9", size = 1417360, upload-time = "2024-02-12T20:34:56.41Z" },
+ { url = "https://files.pythonhosted.org/packages/36/8f/de0d339ae683422a8e15d6f74b8022d4947009c347d8c2178c303c68cc4d/black-24.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:810d445ae6069ce64030c78ff6127cd9cd178a9ac3361435708b907d8a04c693", size = 1739406, upload-time = "2024-02-12T20:23:59.596Z" },
+ { url = "https://files.pythonhosted.org/packages/3e/58/89e5f5a1c4c5b66dc74eabe6337623d53b4d1c27fbbbe16defee53397f60/black-24.2.0-cp312-cp312-win_amd64.whl", hash = "sha256:ba15742a13de85e9b8f3239c8f807723991fbfae24bad92d34a2b12e81904982", size = 1373310, upload-time = "2024-02-12T20:25:27.243Z" },
+ { url = "https://files.pythonhosted.org/packages/47/15/b3770bc3328685a53bc9c041136240146c5cd866a1f020c2cf47f2ff9683/black-24.2.0-py3-none-any.whl", hash = "sha256:e8a6ae970537e67830776488bca52000eaa37fa63b9988e8c487458d9cd5ace6", size = 200610, upload-time = "2024-02-12T20:21:17.657Z" },
+]
+
+[[package]]
+name = "bleach"
+version = "6.3.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "webencodings" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/07/18/3c8523962314be6bf4c8989c79ad9531c825210dd13a8669f6b84336e8bd/bleach-6.3.0.tar.gz", hash = "sha256:6f3b91b1c0a02bb9a78b5a454c92506aa0fdf197e1d5e114d2e00c6f64306d22", size = 203533, upload-time = "2025-10-27T17:57:39.211Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/cd/3a/577b549de0cc09d95f11087ee63c739bba856cd3952697eec4c4bb91350a/bleach-6.3.0-py3-none-any.whl", hash = "sha256:fe10ec77c93ddf3d13a73b035abaac7a9f5e436513864ccdad516693213c65d6", size = 164437, upload-time = "2025-10-27T17:57:37.538Z" },
+]
+
+[package.optional-dependencies]
+css = [
+ { name = "tinycss2" },
+]
+
+[[package]]
+name = "certifi"
+version = "2026.1.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/e0/2d/a891ca51311197f6ad14a7ef42e2399f36cf2f9bd44752b3dc4eab60fdc5/certifi-2026.1.4.tar.gz", hash = "sha256:ac726dd470482006e014ad384921ed6438c457018f4b3d204aea4281258b2120", size = 154268, upload-time = "2026-01-04T02:42:41.825Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e6/ad/3cc14f097111b4de0040c83a525973216457bbeeb63739ef1ed275c1c021/certifi-2026.1.4-py3-none-any.whl", hash = "sha256:9943707519e4add1115f44c2bc244f782c0249876bf51b6599fee1ffbedd685c", size = 152900, upload-time = "2026-01-04T02:42:40.15Z" },
+]
+
+[[package]]
+name = "cffi"
+version = "2.0.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "pycparser", marker = "implementation_name != 'PyPy'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/eb/56/b1ba7935a17738ae8453301356628e8147c79dbb825bcbc73dc7401f9846/cffi-2.0.0.tar.gz", hash = "sha256:44d1b5909021139fe36001ae048dbdde8214afa20200eda0f64c068cac5d5529", size = 523588, upload-time = "2025-09-08T23:24:04.541Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/93/d7/516d984057745a6cd96575eea814fe1edd6646ee6efd552fb7b0921dec83/cffi-2.0.0-cp310-cp310-macosx_10_13_x86_64.whl", hash = "sha256:0cf2d91ecc3fcc0625c2c530fe004f82c110405f101548512cce44322fa8ac44", size = 184283, upload-time = "2025-09-08T23:22:08.01Z" },
+ { url = "https://files.pythonhosted.org/packages/9e/84/ad6a0b408daa859246f57c03efd28e5dd1b33c21737c2db84cae8c237aa5/cffi-2.0.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:f73b96c41e3b2adedc34a7356e64c8eb96e03a3782b535e043a986276ce12a49", size = 180504, upload-time = "2025-09-08T23:22:10.637Z" },
+ { url = "https://files.pythonhosted.org/packages/50/bd/b1a6362b80628111e6653c961f987faa55262b4002fcec42308cad1db680/cffi-2.0.0-cp310-cp310-manylinux1_i686.manylinux2014_i686.manylinux_2_17_i686.manylinux_2_5_i686.whl", hash = "sha256:53f77cbe57044e88bbd5ed26ac1d0514d2acf0591dd6bb02a3ae37f76811b80c", size = 208811, upload-time = "2025-09-08T23:22:12.267Z" },
+ { url = "https://files.pythonhosted.org/packages/4f/27/6933a8b2562d7bd1fb595074cf99cc81fc3789f6a6c05cdabb46284a3188/cffi-2.0.0-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:3e837e369566884707ddaf85fc1744b47575005c0a229de3327f8f9a20f4efeb", size = 216402, upload-time = "2025-09-08T23:22:13.455Z" },
+ { url = "https://files.pythonhosted.org/packages/05/eb/b86f2a2645b62adcfff53b0dd97e8dfafb5c8aa864bd0d9a2c2049a0d551/cffi-2.0.0-cp310-cp310-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:5eda85d6d1879e692d546a078b44251cdd08dd1cfb98dfb77b670c97cee49ea0", size = 203217, upload-time = "2025-09-08T23:22:14.596Z" },
+ { url = "https://files.pythonhosted.org/packages/9f/e0/6cbe77a53acf5acc7c08cc186c9928864bd7c005f9efd0d126884858a5fe/cffi-2.0.0-cp310-cp310-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:9332088d75dc3241c702d852d4671613136d90fa6881da7d770a483fd05248b4", size = 203079, upload-time = "2025-09-08T23:22:15.769Z" },
+ { url = "https://files.pythonhosted.org/packages/98/29/9b366e70e243eb3d14a5cb488dfd3a0b6b2f1fb001a203f653b93ccfac88/cffi-2.0.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:fc7de24befaeae77ba923797c7c87834c73648a05a4bde34b3b7e5588973a453", size = 216475, upload-time = "2025-09-08T23:22:17.427Z" },
+ { url = "https://files.pythonhosted.org/packages/21/7a/13b24e70d2f90a322f2900c5d8e1f14fa7e2a6b3332b7309ba7b2ba51a5a/cffi-2.0.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:cf364028c016c03078a23b503f02058f1814320a56ad535686f90565636a9495", size = 218829, upload-time = "2025-09-08T23:22:19.069Z" },
+ { url = "https://files.pythonhosted.org/packages/60/99/c9dc110974c59cc981b1f5b66e1d8af8af764e00f0293266824d9c4254bc/cffi-2.0.0-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:e11e82b744887154b182fd3e7e8512418446501191994dbf9c9fc1f32cc8efd5", size = 211211, upload-time = "2025-09-08T23:22:20.588Z" },
+ { url = "https://files.pythonhosted.org/packages/49/72/ff2d12dbf21aca1b32a40ed792ee6b40f6dc3a9cf1644bd7ef6e95e0ac5e/cffi-2.0.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:8ea985900c5c95ce9db1745f7933eeef5d314f0565b27625d9a10ec9881e1bfb", size = 218036, upload-time = "2025-09-08T23:22:22.143Z" },
+ { url = "https://files.pythonhosted.org/packages/e2/cc/027d7fb82e58c48ea717149b03bcadcbdc293553edb283af792bd4bcbb3f/cffi-2.0.0-cp310-cp310-win32.whl", hash = "sha256:1f72fb8906754ac8a2cc3f9f5aaa298070652a0ffae577e0ea9bd480dc3c931a", size = 172184, upload-time = "2025-09-08T23:22:23.328Z" },
+ { url = "https://files.pythonhosted.org/packages/33/fa/072dd15ae27fbb4e06b437eb6e944e75b068deb09e2a2826039e49ee2045/cffi-2.0.0-cp310-cp310-win_amd64.whl", hash = "sha256:b18a3ed7d5b3bd8d9ef7a8cb226502c6bf8308df1525e1cc676c3680e7176739", size = 182790, upload-time = "2025-09-08T23:22:24.752Z" },
+ { url = "https://files.pythonhosted.org/packages/12/4a/3dfd5f7850cbf0d06dc84ba9aa00db766b52ca38d8b86e3a38314d52498c/cffi-2.0.0-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:b4c854ef3adc177950a8dfc81a86f5115d2abd545751a304c5bcf2c2c7283cfe", size = 184344, upload-time = "2025-09-08T23:22:26.456Z" },
+ { url = "https://files.pythonhosted.org/packages/4f/8b/f0e4c441227ba756aafbe78f117485b25bb26b1c059d01f137fa6d14896b/cffi-2.0.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:2de9a304e27f7596cd03d16f1b7c72219bd944e99cc52b84d0145aefb07cbd3c", size = 180560, upload-time = "2025-09-08T23:22:28.197Z" },
+ { url = "https://files.pythonhosted.org/packages/b1/b7/1200d354378ef52ec227395d95c2576330fd22a869f7a70e88e1447eb234/cffi-2.0.0-cp311-cp311-manylinux1_i686.manylinux2014_i686.manylinux_2_17_i686.manylinux_2_5_i686.whl", hash = "sha256:baf5215e0ab74c16e2dd324e8ec067ef59e41125d3eade2b863d294fd5035c92", size = 209613, upload-time = "2025-09-08T23:22:29.475Z" },
+ { url = "https://files.pythonhosted.org/packages/b8/56/6033f5e86e8cc9bb629f0077ba71679508bdf54a9a5e112a3c0b91870332/cffi-2.0.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:730cacb21e1bdff3ce90babf007d0a0917cc3e6492f336c2f0134101e0944f93", size = 216476, upload-time = "2025-09-08T23:22:31.063Z" },
+ { url = "https://files.pythonhosted.org/packages/dc/7f/55fecd70f7ece178db2f26128ec41430d8720f2d12ca97bf8f0a628207d5/cffi-2.0.0-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:6824f87845e3396029f3820c206e459ccc91760e8fa24422f8b0c3d1731cbec5", size = 203374, upload-time = "2025-09-08T23:22:32.507Z" },
+ { url = "https://files.pythonhosted.org/packages/84/ef/a7b77c8bdc0f77adc3b46888f1ad54be8f3b7821697a7b89126e829e676a/cffi-2.0.0-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:9de40a7b0323d889cf8d23d1ef214f565ab154443c42737dfe52ff82cf857664", size = 202597, upload-time = "2025-09-08T23:22:34.132Z" },
+ { url = "https://files.pythonhosted.org/packages/d7/91/500d892b2bf36529a75b77958edfcd5ad8e2ce4064ce2ecfeab2125d72d1/cffi-2.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:8941aaadaf67246224cee8c3803777eed332a19d909b47e29c9842ef1e79ac26", size = 215574, upload-time = "2025-09-08T23:22:35.443Z" },
+ { url = "https://files.pythonhosted.org/packages/44/64/58f6255b62b101093d5df22dcb752596066c7e89dd725e0afaed242a61be/cffi-2.0.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:a05d0c237b3349096d3981b727493e22147f934b20f6f125a3eba8f994bec4a9", size = 218971, upload-time = "2025-09-08T23:22:36.805Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/49/fa72cebe2fd8a55fbe14956f9970fe8eb1ac59e5df042f603ef7c8ba0adc/cffi-2.0.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:94698a9c5f91f9d138526b48fe26a199609544591f859c870d477351dc7b2414", size = 211972, upload-time = "2025-09-08T23:22:38.436Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/28/dd0967a76aab36731b6ebfe64dec4e981aff7e0608f60c2d46b46982607d/cffi-2.0.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:5fed36fccc0612a53f1d4d9a816b50a36702c28a2aa880cb8a122b3466638743", size = 217078, upload-time = "2025-09-08T23:22:39.776Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/c0/015b25184413d7ab0a410775fdb4a50fca20f5589b5dab1dbbfa3baad8ce/cffi-2.0.0-cp311-cp311-win32.whl", hash = "sha256:c649e3a33450ec82378822b3dad03cc228b8f5963c0c12fc3b1e0ab940f768a5", size = 172076, upload-time = "2025-09-08T23:22:40.95Z" },
+ { url = "https://files.pythonhosted.org/packages/ae/8f/dc5531155e7070361eb1b7e4c1a9d896d0cb21c49f807a6c03fd63fc877e/cffi-2.0.0-cp311-cp311-win_amd64.whl", hash = "sha256:66f011380d0e49ed280c789fbd08ff0d40968ee7b665575489afa95c98196ab5", size = 182820, upload-time = "2025-09-08T23:22:42.463Z" },
+ { url = "https://files.pythonhosted.org/packages/95/5c/1b493356429f9aecfd56bc171285a4c4ac8697f76e9bbbbb105e537853a1/cffi-2.0.0-cp311-cp311-win_arm64.whl", hash = "sha256:c6638687455baf640e37344fe26d37c404db8b80d037c3d29f58fe8d1c3b194d", size = 177635, upload-time = "2025-09-08T23:22:43.623Z" },
+ { url = "https://files.pythonhosted.org/packages/ea/47/4f61023ea636104d4f16ab488e268b93008c3d0bb76893b1b31db1f96802/cffi-2.0.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:6d02d6655b0e54f54c4ef0b94eb6be0607b70853c45ce98bd278dc7de718be5d", size = 185271, upload-time = "2025-09-08T23:22:44.795Z" },
+ { url = "https://files.pythonhosted.org/packages/df/a2/781b623f57358e360d62cdd7a8c681f074a71d445418a776eef0aadb4ab4/cffi-2.0.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:8eca2a813c1cb7ad4fb74d368c2ffbbb4789d377ee5bb8df98373c2cc0dee76c", size = 181048, upload-time = "2025-09-08T23:22:45.938Z" },
+ { url = "https://files.pythonhosted.org/packages/ff/df/a4f0fbd47331ceeba3d37c2e51e9dfc9722498becbeec2bd8bc856c9538a/cffi-2.0.0-cp312-cp312-manylinux1_i686.manylinux2014_i686.manylinux_2_17_i686.manylinux_2_5_i686.whl", hash = "sha256:21d1152871b019407d8ac3985f6775c079416c282e431a4da6afe7aefd2bccbe", size = 212529, upload-time = "2025-09-08T23:22:47.349Z" },
+ { url = "https://files.pythonhosted.org/packages/d5/72/12b5f8d3865bf0f87cf1404d8c374e7487dcf097a1c91c436e72e6badd83/cffi-2.0.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:b21e08af67b8a103c71a250401c78d5e0893beff75e28c53c98f4de42f774062", size = 220097, upload-time = "2025-09-08T23:22:48.677Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/95/7a135d52a50dfa7c882ab0ac17e8dc11cec9d55d2c18dda414c051c5e69e/cffi-2.0.0-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:1e3a615586f05fc4065a8b22b8152f0c1b00cdbc60596d187c2a74f9e3036e4e", size = 207983, upload-time = "2025-09-08T23:22:50.06Z" },
+ { url = "https://files.pythonhosted.org/packages/3a/c8/15cb9ada8895957ea171c62dc78ff3e99159ee7adb13c0123c001a2546c1/cffi-2.0.0-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:81afed14892743bbe14dacb9e36d9e0e504cd204e0b165062c488942b9718037", size = 206519, upload-time = "2025-09-08T23:22:51.364Z" },
+ { url = "https://files.pythonhosted.org/packages/78/2d/7fa73dfa841b5ac06c7b8855cfc18622132e365f5b81d02230333ff26e9e/cffi-2.0.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3e17ed538242334bf70832644a32a7aae3d83b57567f9fd60a26257e992b79ba", size = 219572, upload-time = "2025-09-08T23:22:52.902Z" },
+ { url = "https://files.pythonhosted.org/packages/07/e0/267e57e387b4ca276b90f0434ff88b2c2241ad72b16d31836adddfd6031b/cffi-2.0.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:3925dd22fa2b7699ed2617149842d2e6adde22b262fcbfada50e3d195e4b3a94", size = 222963, upload-time = "2025-09-08T23:22:54.518Z" },
+ { url = "https://files.pythonhosted.org/packages/b6/75/1f2747525e06f53efbd878f4d03bac5b859cbc11c633d0fb81432d98a795/cffi-2.0.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:2c8f814d84194c9ea681642fd164267891702542f028a15fc97d4674b6206187", size = 221361, upload-time = "2025-09-08T23:22:55.867Z" },
+ { url = "https://files.pythonhosted.org/packages/7b/2b/2b6435f76bfeb6bbf055596976da087377ede68df465419d192acf00c437/cffi-2.0.0-cp312-cp312-win32.whl", hash = "sha256:da902562c3e9c550df360bfa53c035b2f241fed6d9aef119048073680ace4a18", size = 172932, upload-time = "2025-09-08T23:22:57.188Z" },
+ { url = "https://files.pythonhosted.org/packages/f8/ed/13bd4418627013bec4ed6e54283b1959cf6db888048c7cf4b4c3b5b36002/cffi-2.0.0-cp312-cp312-win_amd64.whl", hash = "sha256:da68248800ad6320861f129cd9c1bf96ca849a2771a59e0344e88681905916f5", size = 183557, upload-time = "2025-09-08T23:22:58.351Z" },
+ { url = "https://files.pythonhosted.org/packages/95/31/9f7f93ad2f8eff1dbc1c3656d7ca5bfd8fb52c9d786b4dcf19b2d02217fa/cffi-2.0.0-cp312-cp312-win_arm64.whl", hash = "sha256:4671d9dd5ec934cb9a73e7ee9676f9362aba54f7f34910956b84d727b0d73fb6", size = 177762, upload-time = "2025-09-08T23:22:59.668Z" },
+]
+
+[[package]]
+name = "charset-normalizer"
+version = "3.4.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/13/69/33ddede1939fdd074bce5434295f38fae7136463422fe4fd3e0e89b98062/charset_normalizer-3.4.4.tar.gz", hash = "sha256:94537985111c35f28720e43603b8e7b43a6ecfb2ce1d3058bbe955b73404e21a", size = 129418, upload-time = "2025-10-14T04:42:32.879Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1f/b8/6d51fc1d52cbd52cd4ccedd5b5b2f0f6a11bbf6765c782298b0f3e808541/charset_normalizer-3.4.4-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:e824f1492727fa856dd6eda4f7cee25f8518a12f3c4a56a74e8095695089cf6d", size = 209709, upload-time = "2025-10-14T04:40:11.385Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/af/1f9d7f7faafe2ddfb6f72a2e07a548a629c61ad510fe60f9630309908fef/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4bd5d4137d500351a30687c2d3971758aac9a19208fc110ccb9d7188fbe709e8", size = 148814, upload-time = "2025-10-14T04:40:13.135Z" },
+ { url = "https://files.pythonhosted.org/packages/79/3d/f2e3ac2bbc056ca0c204298ea4e3d9db9b4afe437812638759db2c976b5f/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:027f6de494925c0ab2a55eab46ae5129951638a49a34d87f4c3eda90f696b4ad", size = 144467, upload-time = "2025-10-14T04:40:14.728Z" },
+ { url = "https://files.pythonhosted.org/packages/ec/85/1bf997003815e60d57de7bd972c57dc6950446a3e4ccac43bc3070721856/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:f820802628d2694cb7e56db99213f930856014862f3fd943d290ea8438d07ca8", size = 162280, upload-time = "2025-10-14T04:40:16.14Z" },
+ { url = "https://files.pythonhosted.org/packages/3e/8e/6aa1952f56b192f54921c436b87f2aaf7c7a7c3d0d1a765547d64fd83c13/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:798d75d81754988d2565bff1b97ba5a44411867c0cf32b77a7e8f8d84796b10d", size = 159454, upload-time = "2025-10-14T04:40:17.567Z" },
+ { url = "https://files.pythonhosted.org/packages/36/3b/60cbd1f8e93aa25d1c669c649b7a655b0b5fb4c571858910ea9332678558/charset_normalizer-3.4.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9d1bb833febdff5c8927f922386db610b49db6e0d4f4ee29601d71e7c2694313", size = 153609, upload-time = "2025-10-14T04:40:19.08Z" },
+ { url = "https://files.pythonhosted.org/packages/64/91/6a13396948b8fd3c4b4fd5bc74d045f5637d78c9675585e8e9fbe5636554/charset_normalizer-3.4.4-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:9cd98cdc06614a2f768d2b7286d66805f94c48cde050acdbbb7db2600ab3197e", size = 151849, upload-time = "2025-10-14T04:40:20.607Z" },
+ { url = "https://files.pythonhosted.org/packages/b7/7a/59482e28b9981d105691e968c544cc0df3b7d6133152fb3dcdc8f135da7a/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:077fbb858e903c73f6c9db43374fd213b0b6a778106bc7032446a8e8b5b38b93", size = 151586, upload-time = "2025-10-14T04:40:21.719Z" },
+ { url = "https://files.pythonhosted.org/packages/92/59/f64ef6a1c4bdd2baf892b04cd78792ed8684fbc48d4c2afe467d96b4df57/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_armv7l.whl", hash = "sha256:244bfb999c71b35de57821b8ea746b24e863398194a4014e4c76adc2bbdfeff0", size = 145290, upload-time = "2025-10-14T04:40:23.069Z" },
+ { url = "https://files.pythonhosted.org/packages/6b/63/3bf9f279ddfa641ffa1962b0db6a57a9c294361cc2f5fcac997049a00e9c/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:64b55f9dce520635f018f907ff1b0df1fdc31f2795a922fb49dd14fbcdf48c84", size = 163663, upload-time = "2025-10-14T04:40:24.17Z" },
+ { url = "https://files.pythonhosted.org/packages/ed/09/c9e38fc8fa9e0849b172b581fd9803bdf6e694041127933934184e19f8c3/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:faa3a41b2b66b6e50f84ae4a68c64fcd0c44355741c6374813a800cd6695db9e", size = 151964, upload-time = "2025-10-14T04:40:25.368Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/d1/d28b747e512d0da79d8b6a1ac18b7ab2ecfd81b2944c4c710e166d8dd09c/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_s390x.whl", hash = "sha256:6515f3182dbe4ea06ced2d9e8666d97b46ef4c75e326b79bb624110f122551db", size = 161064, upload-time = "2025-10-14T04:40:26.806Z" },
+ { url = "https://files.pythonhosted.org/packages/bb/9a/31d62b611d901c3b9e5500c36aab0ff5eb442043fb3a1c254200d3d397d9/charset_normalizer-3.4.4-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:cc00f04ed596e9dc0da42ed17ac5e596c6ccba999ba6bd92b0e0aef2f170f2d6", size = 155015, upload-time = "2025-10-14T04:40:28.284Z" },
+ { url = "https://files.pythonhosted.org/packages/1f/f3/107e008fa2bff0c8b9319584174418e5e5285fef32f79d8ee6a430d0039c/charset_normalizer-3.4.4-cp310-cp310-win32.whl", hash = "sha256:f34be2938726fc13801220747472850852fe6b1ea75869a048d6f896838c896f", size = 99792, upload-time = "2025-10-14T04:40:29.613Z" },
+ { url = "https://files.pythonhosted.org/packages/eb/66/e396e8a408843337d7315bab30dbf106c38966f1819f123257f5520f8a96/charset_normalizer-3.4.4-cp310-cp310-win_amd64.whl", hash = "sha256:a61900df84c667873b292c3de315a786dd8dac506704dea57bc957bd31e22c7d", size = 107198, upload-time = "2025-10-14T04:40:30.644Z" },
+ { url = "https://files.pythonhosted.org/packages/b5/58/01b4f815bf0312704c267f2ccb6e5d42bcc7752340cd487bc9f8c3710597/charset_normalizer-3.4.4-cp310-cp310-win_arm64.whl", hash = "sha256:cead0978fc57397645f12578bfd2d5ea9138ea0fac82b2f63f7f7c6877986a69", size = 100262, upload-time = "2025-10-14T04:40:32.108Z" },
+ { url = "https://files.pythonhosted.org/packages/ed/27/c6491ff4954e58a10f69ad90aca8a1b6fe9c5d3c6f380907af3c37435b59/charset_normalizer-3.4.4-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:6e1fcf0720908f200cd21aa4e6750a48ff6ce4afe7ff5a79a90d5ed8a08296f8", size = 206988, upload-time = "2025-10-14T04:40:33.79Z" },
+ { url = "https://files.pythonhosted.org/packages/94/59/2e87300fe67ab820b5428580a53cad894272dbb97f38a7a814a2a1ac1011/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5f819d5fe9234f9f82d75bdfa9aef3a3d72c4d24a6e57aeaebba32a704553aa0", size = 147324, upload-time = "2025-10-14T04:40:34.961Z" },
+ { url = "https://files.pythonhosted.org/packages/07/fb/0cf61dc84b2b088391830f6274cb57c82e4da8bbc2efeac8c025edb88772/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:a59cb51917aa591b1c4e6a43c132f0cdc3c76dbad6155df4e28ee626cc77a0a3", size = 142742, upload-time = "2025-10-14T04:40:36.105Z" },
+ { url = "https://files.pythonhosted.org/packages/62/8b/171935adf2312cd745d290ed93cf16cf0dfe320863ab7cbeeae1dcd6535f/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:8ef3c867360f88ac904fd3f5e1f902f13307af9052646963ee08ff4f131adafc", size = 160863, upload-time = "2025-10-14T04:40:37.188Z" },
+ { url = "https://files.pythonhosted.org/packages/09/73/ad875b192bda14f2173bfc1bc9a55e009808484a4b256748d931b6948442/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:d9e45d7faa48ee908174d8fe84854479ef838fc6a705c9315372eacbc2f02897", size = 157837, upload-time = "2025-10-14T04:40:38.435Z" },
+ { url = "https://files.pythonhosted.org/packages/6d/fc/de9cce525b2c5b94b47c70a4b4fb19f871b24995c728e957ee68ab1671ea/charset_normalizer-3.4.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:840c25fb618a231545cbab0564a799f101b63b9901f2569faecd6b222ac72381", size = 151550, upload-time = "2025-10-14T04:40:40.053Z" },
+ { url = "https://files.pythonhosted.org/packages/55/c2/43edd615fdfba8c6f2dfbd459b25a6b3b551f24ea21981e23fb768503ce1/charset_normalizer-3.4.4-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:ca5862d5b3928c4940729dacc329aa9102900382fea192fc5e52eb69d6093815", size = 149162, upload-time = "2025-10-14T04:40:41.163Z" },
+ { url = "https://files.pythonhosted.org/packages/03/86/bde4ad8b4d0e9429a4e82c1e8f5c659993a9a863ad62c7df05cf7b678d75/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d9c7f57c3d666a53421049053eaacdd14bbd0a528e2186fcb2e672effd053bb0", size = 150019, upload-time = "2025-10-14T04:40:42.276Z" },
+ { url = "https://files.pythonhosted.org/packages/1f/86/a151eb2af293a7e7bac3a739b81072585ce36ccfb4493039f49f1d3cae8c/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_armv7l.whl", hash = "sha256:277e970e750505ed74c832b4bf75dac7476262ee2a013f5574dd49075879e161", size = 143310, upload-time = "2025-10-14T04:40:43.439Z" },
+ { url = "https://files.pythonhosted.org/packages/b5/fe/43dae6144a7e07b87478fdfc4dbe9efd5defb0e7ec29f5f58a55aeef7bf7/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:31fd66405eaf47bb62e8cd575dc621c56c668f27d46a61d975a249930dd5e2a4", size = 162022, upload-time = "2025-10-14T04:40:44.547Z" },
+ { url = "https://files.pythonhosted.org/packages/80/e6/7aab83774f5d2bca81f42ac58d04caf44f0cc2b65fc6db2b3b2e8a05f3b3/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:0d3d8f15c07f86e9ff82319b3d9ef6f4bf907608f53fe9d92b28ea9ae3d1fd89", size = 149383, upload-time = "2025-10-14T04:40:46.018Z" },
+ { url = "https://files.pythonhosted.org/packages/4f/e8/b289173b4edae05c0dde07f69f8db476a0b511eac556dfe0d6bda3c43384/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:9f7fcd74d410a36883701fafa2482a6af2ff5ba96b9a620e9e0721e28ead5569", size = 159098, upload-time = "2025-10-14T04:40:47.081Z" },
+ { url = "https://files.pythonhosted.org/packages/d8/df/fe699727754cae3f8478493c7f45f777b17c3ef0600e28abfec8619eb49c/charset_normalizer-3.4.4-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:ebf3e58c7ec8a8bed6d66a75d7fb37b55e5015b03ceae72a8e7c74495551e224", size = 152991, upload-time = "2025-10-14T04:40:48.246Z" },
+ { url = "https://files.pythonhosted.org/packages/1a/86/584869fe4ddb6ffa3bd9f491b87a01568797fb9bd8933f557dba9771beaf/charset_normalizer-3.4.4-cp311-cp311-win32.whl", hash = "sha256:eecbc200c7fd5ddb9a7f16c7decb07b566c29fa2161a16cf67b8d068bd21690a", size = 99456, upload-time = "2025-10-14T04:40:49.376Z" },
+ { url = "https://files.pythonhosted.org/packages/65/f6/62fdd5feb60530f50f7e38b4f6a1d5203f4d16ff4f9f0952962c044e919a/charset_normalizer-3.4.4-cp311-cp311-win_amd64.whl", hash = "sha256:5ae497466c7901d54b639cf42d5b8c1b6a4fead55215500d2f486d34db48d016", size = 106978, upload-time = "2025-10-14T04:40:50.844Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/9d/0710916e6c82948b3be62d9d398cb4fcf4e97b56d6a6aeccd66c4b2f2bd5/charset_normalizer-3.4.4-cp311-cp311-win_arm64.whl", hash = "sha256:65e2befcd84bc6f37095f5961e68a6f077bf44946771354a28ad434c2cce0ae1", size = 99969, upload-time = "2025-10-14T04:40:52.272Z" },
+ { url = "https://files.pythonhosted.org/packages/f3/85/1637cd4af66fa687396e757dec650f28025f2a2f5a5531a3208dc0ec43f2/charset_normalizer-3.4.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:0a98e6759f854bd25a58a73fa88833fba3b7c491169f86ce1180c948ab3fd394", size = 208425, upload-time = "2025-10-14T04:40:53.353Z" },
+ { url = "https://files.pythonhosted.org/packages/9d/6a/04130023fef2a0d9c62d0bae2649b69f7b7d8d24ea5536feef50551029df/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b5b290ccc2a263e8d185130284f8501e3e36c5e02750fc6b6bdeb2e9e96f1e25", size = 148162, upload-time = "2025-10-14T04:40:54.558Z" },
+ { url = "https://files.pythonhosted.org/packages/78/29/62328d79aa60da22c9e0b9a66539feae06ca0f5a4171ac4f7dc285b83688/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_armv7l.manylinux_2_17_armv7l.manylinux_2_31_armv7l.whl", hash = "sha256:74bb723680f9f7a6234dcf67aea57e708ec1fbdf5699fb91dfd6f511b0a320ef", size = 144558, upload-time = "2025-10-14T04:40:55.677Z" },
+ { url = "https://files.pythonhosted.org/packages/86/bb/b32194a4bf15b88403537c2e120b817c61cd4ecffa9b6876e941c3ee38fe/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:f1e34719c6ed0b92f418c7c780480b26b5d9c50349e9a9af7d76bf757530350d", size = 161497, upload-time = "2025-10-14T04:40:57.217Z" },
+ { url = "https://files.pythonhosted.org/packages/19/89/a54c82b253d5b9b111dc74aca196ba5ccfcca8242d0fb64146d4d3183ff1/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2437418e20515acec67d86e12bf70056a33abdacb5cb1655042f6538d6b085a8", size = 159240, upload-time = "2025-10-14T04:40:58.358Z" },
+ { url = "https://files.pythonhosted.org/packages/c0/10/d20b513afe03acc89ec33948320a5544d31f21b05368436d580dec4e234d/charset_normalizer-3.4.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:11d694519d7f29d6cd09f6ac70028dba10f92f6cdd059096db198c283794ac86", size = 153471, upload-time = "2025-10-14T04:40:59.468Z" },
+ { url = "https://files.pythonhosted.org/packages/61/fa/fbf177b55bdd727010f9c0a3c49eefa1d10f960e5f09d1d887bf93c2e698/charset_normalizer-3.4.4-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:ac1c4a689edcc530fc9d9aa11f5774b9e2f33f9a0c6a57864e90908f5208d30a", size = 150864, upload-time = "2025-10-14T04:41:00.623Z" },
+ { url = "https://files.pythonhosted.org/packages/05/12/9fbc6a4d39c0198adeebbde20b619790e9236557ca59fc40e0e3cebe6f40/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:21d142cc6c0ec30d2efee5068ca36c128a30b0f2c53c1c07bd78cb6bc1d3be5f", size = 150647, upload-time = "2025-10-14T04:41:01.754Z" },
+ { url = "https://files.pythonhosted.org/packages/ad/1f/6a9a593d52e3e8c5d2b167daf8c6b968808efb57ef4c210acb907c365bc4/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:5dbe56a36425d26d6cfb40ce79c314a2e4dd6211d51d6d2191c00bed34f354cc", size = 145110, upload-time = "2025-10-14T04:41:03.231Z" },
+ { url = "https://files.pythonhosted.org/packages/30/42/9a52c609e72471b0fc54386dc63c3781a387bb4fe61c20231a4ebcd58bdd/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:5bfbb1b9acf3334612667b61bd3002196fe2a1eb4dd74d247e0f2a4d50ec9bbf", size = 162839, upload-time = "2025-10-14T04:41:04.715Z" },
+ { url = "https://files.pythonhosted.org/packages/c4/5b/c0682bbf9f11597073052628ddd38344a3d673fda35a36773f7d19344b23/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:d055ec1e26e441f6187acf818b73564e6e6282709e9bcb5b63f5b23068356a15", size = 150667, upload-time = "2025-10-14T04:41:05.827Z" },
+ { url = "https://files.pythonhosted.org/packages/e4/24/a41afeab6f990cf2daf6cb8c67419b63b48cf518e4f56022230840c9bfb2/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:af2d8c67d8e573d6de5bc30cdb27e9b95e49115cd9baad5ddbd1a6207aaa82a9", size = 160535, upload-time = "2025-10-14T04:41:06.938Z" },
+ { url = "https://files.pythonhosted.org/packages/2a/e5/6a4ce77ed243c4a50a1fecca6aaaab419628c818a49434be428fe24c9957/charset_normalizer-3.4.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:780236ac706e66881f3b7f2f32dfe90507a09e67d1d454c762cf642e6e1586e0", size = 154816, upload-time = "2025-10-14T04:41:08.101Z" },
+ { url = "https://files.pythonhosted.org/packages/a8/ef/89297262b8092b312d29cdb2517cb1237e51db8ecef2e9af5edbe7b683b1/charset_normalizer-3.4.4-cp312-cp312-win32.whl", hash = "sha256:5833d2c39d8896e4e19b689ffc198f08ea58116bee26dea51e362ecc7cd3ed26", size = 99694, upload-time = "2025-10-14T04:41:09.23Z" },
+ { url = "https://files.pythonhosted.org/packages/3d/2d/1e5ed9dd3b3803994c155cd9aacb60c82c331bad84daf75bcb9c91b3295e/charset_normalizer-3.4.4-cp312-cp312-win_amd64.whl", hash = "sha256:a79cfe37875f822425b89a82333404539ae63dbdddf97f84dcbc3d339aae9525", size = 107131, upload-time = "2025-10-14T04:41:10.467Z" },
+ { url = "https://files.pythonhosted.org/packages/d0/d9/0ed4c7098a861482a7b6a95603edce4c0d9db2311af23da1fb2b75ec26fc/charset_normalizer-3.4.4-cp312-cp312-win_arm64.whl", hash = "sha256:376bec83a63b8021bb5c8ea75e21c4ccb86e7e45ca4eb81146091b56599b80c3", size = 100390, upload-time = "2025-10-14T04:41:11.915Z" },
+ { url = "https://files.pythonhosted.org/packages/0a/4c/925909008ed5a988ccbb72dcc897407e5d6d3bd72410d69e051fc0c14647/charset_normalizer-3.4.4-py3-none-any.whl", hash = "sha256:7a32c560861a02ff789ad905a2fe94e3f840803362c84fecf1851cb4cf3dc37f", size = 53402, upload-time = "2025-10-14T04:42:31.76Z" },
+]
+
+[[package]]
+name = "click"
+version = "8.3.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "colorama", marker = "sys_platform == 'win32'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/3d/fa/656b739db8587d7b5dfa22e22ed02566950fbfbcdc20311993483657a5c0/click-8.3.1.tar.gz", hash = "sha256:12ff4785d337a1bb490bb7e9c2b1ee5da3112e94a8622f26a6c77f5d2fc6842a", size = 295065, upload-time = "2025-11-15T20:45:42.706Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/98/78/01c019cdb5d6498122777c1a43056ebb3ebfeef2076d9d026bfe15583b2b/click-8.3.1-py3-none-any.whl", hash = "sha256:981153a64e25f12d547d3426c367a4857371575ee7ad18df2a6183ab0545b2a6", size = 108274, upload-time = "2025-11-15T20:45:41.139Z" },
+]
+
+[[package]]
+name = "cloudpickle"
+version = "3.1.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/27/fb/576f067976d320f5f0114a8d9fa1215425441bb35627b1993e5afd8111e5/cloudpickle-3.1.2.tar.gz", hash = "sha256:7fda9eb655c9c230dab534f1983763de5835249750e85fbcef43aaa30a9a2414", size = 22330, upload-time = "2025-11-03T09:25:26.604Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/88/39/799be3f2f0f38cc727ee3b4f1445fe6d5e4133064ec2e4115069418a5bb6/cloudpickle-3.1.2-py3-none-any.whl", hash = "sha256:9acb47f6afd73f60dc1df93bb801b472f05ff42fa6c84167d25cb206be1fbf4a", size = 22228, upload-time = "2025-11-03T09:25:25.534Z" },
+]
+
+[[package]]
+name = "colorama"
+version = "0.4.6"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697, upload-time = "2022-10-25T02:36:22.414Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335, upload-time = "2022-10-25T02:36:20.889Z" },
+]
+
+[[package]]
+name = "comm"
+version = "0.2.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/4c/13/7d740c5849255756bc17888787313b61fd38a0a8304fc4f073dfc46122aa/comm-0.2.3.tar.gz", hash = "sha256:2dc8048c10962d55d7ad693be1e7045d891b7ce8d999c97963a5e3e99c055971", size = 6319, upload-time = "2025-07-25T14:02:04.452Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/60/97/891a0971e1e4a8c5d2b20bbe0e524dc04548d2307fee33cdeba148fd4fc7/comm-0.2.3-py3-none-any.whl", hash = "sha256:c615d91d75f7f04f095b30d1c1711babd43bdc6419c1be9886a85f2f4e489417", size = 7294, upload-time = "2025-07-25T14:02:02.896Z" },
+]
+
+[[package]]
+name = "contourpy"
+version = "1.3.2"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version < '3.11' and sys_platform == 'darwin'",
+ "python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",
+]
+dependencies = [
+ { name = "numpy", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/66/54/eb9bfc647b19f2009dd5c7f5ec51c4e6ca831725f1aea7a993034f483147/contourpy-1.3.2.tar.gz", hash = "sha256:b6945942715a034c671b7fc54f9588126b0b8bf23db2696e3ca8328f3ff0ab54", size = 13466130, upload-time = "2025-04-15T17:47:53.79Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/12/a3/da4153ec8fe25d263aa48c1a4cbde7f49b59af86f0b6f7862788c60da737/contourpy-1.3.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:ba38e3f9f330af820c4b27ceb4b9c7feee5fe0493ea53a8720f4792667465934", size = 268551, upload-time = "2025-04-15T17:34:46.581Z" },
+ { url = "https://files.pythonhosted.org/packages/2f/6c/330de89ae1087eb622bfca0177d32a7ece50c3ef07b28002de4757d9d875/contourpy-1.3.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:dc41ba0714aa2968d1f8674ec97504a8f7e334f48eeacebcaa6256213acb0989", size = 253399, upload-time = "2025-04-15T17:34:51.427Z" },
+ { url = "https://files.pythonhosted.org/packages/c1/bd/20c6726b1b7f81a8bee5271bed5c165f0a8e1f572578a9d27e2ccb763cb2/contourpy-1.3.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9be002b31c558d1ddf1b9b415b162c603405414bacd6932d031c5b5a8b757f0d", size = 312061, upload-time = "2025-04-15T17:34:55.961Z" },
+ { url = "https://files.pythonhosted.org/packages/22/fc/a9665c88f8a2473f823cf1ec601de9e5375050f1958cbb356cdf06ef1ab6/contourpy-1.3.2-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8d2e74acbcba3bfdb6d9d8384cdc4f9260cae86ed9beee8bd5f54fee49a430b9", size = 351956, upload-time = "2025-04-15T17:35:00.992Z" },
+ { url = "https://files.pythonhosted.org/packages/25/eb/9f0a0238f305ad8fb7ef42481020d6e20cf15e46be99a1fcf939546a177e/contourpy-1.3.2-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e259bced5549ac64410162adc973c5e2fb77f04df4a439d00b478e57a0e65512", size = 320872, upload-time = "2025-04-15T17:35:06.177Z" },
+ { url = "https://files.pythonhosted.org/packages/32/5c/1ee32d1c7956923202f00cf8d2a14a62ed7517bdc0ee1e55301227fc273c/contourpy-1.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ad687a04bc802cbe8b9c399c07162a3c35e227e2daccf1668eb1f278cb698631", size = 325027, upload-time = "2025-04-15T17:35:11.244Z" },
+ { url = "https://files.pythonhosted.org/packages/83/bf/9baed89785ba743ef329c2b07fd0611d12bfecbedbdd3eeecf929d8d3b52/contourpy-1.3.2-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:cdd22595308f53ef2f891040ab2b93d79192513ffccbd7fe19be7aa773a5e09f", size = 1306641, upload-time = "2025-04-15T17:35:26.701Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/cc/74e5e83d1e35de2d28bd97033426b450bc4fd96e092a1f7a63dc7369b55d/contourpy-1.3.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:b4f54d6a2defe9f257327b0f243612dd051cc43825587520b1bf74a31e2f6ef2", size = 1374075, upload-time = "2025-04-15T17:35:43.204Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/42/17f3b798fd5e033b46a16f8d9fcb39f1aba051307f5ebf441bad1ecf78f8/contourpy-1.3.2-cp310-cp310-win32.whl", hash = "sha256:f939a054192ddc596e031e50bb13b657ce318cf13d264f095ce9db7dc6ae81c0", size = 177534, upload-time = "2025-04-15T17:35:46.554Z" },
+ { url = "https://files.pythonhosted.org/packages/54/ec/5162b8582f2c994721018d0c9ece9dc6ff769d298a8ac6b6a652c307e7df/contourpy-1.3.2-cp310-cp310-win_amd64.whl", hash = "sha256:c440093bbc8fc21c637c03bafcbef95ccd963bc6e0514ad887932c18ca2a759a", size = 221188, upload-time = "2025-04-15T17:35:50.064Z" },
+ { url = "https://files.pythonhosted.org/packages/b3/b9/ede788a0b56fc5b071639d06c33cb893f68b1178938f3425debebe2dab78/contourpy-1.3.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:6a37a2fb93d4df3fc4c0e363ea4d16f83195fc09c891bc8ce072b9d084853445", size = 269636, upload-time = "2025-04-15T17:35:54.473Z" },
+ { url = "https://files.pythonhosted.org/packages/e6/75/3469f011d64b8bbfa04f709bfc23e1dd71be54d05b1b083be9f5b22750d1/contourpy-1.3.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:b7cd50c38f500bbcc9b6a46643a40e0913673f869315d8e70de0438817cb7773", size = 254636, upload-time = "2025-04-15T17:35:58.283Z" },
+ { url = "https://files.pythonhosted.org/packages/8d/2f/95adb8dae08ce0ebca4fd8e7ad653159565d9739128b2d5977806656fcd2/contourpy-1.3.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d6658ccc7251a4433eebd89ed2672c2ed96fba367fd25ca9512aa92a4b46c4f1", size = 313053, upload-time = "2025-04-15T17:36:03.235Z" },
+ { url = "https://files.pythonhosted.org/packages/c3/a6/8ccf97a50f31adfa36917707fe39c9a0cbc24b3bbb58185577f119736cc9/contourpy-1.3.2-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:70771a461aaeb335df14deb6c97439973d253ae70660ca085eec25241137ef43", size = 352985, upload-time = "2025-04-15T17:36:08.275Z" },
+ { url = "https://files.pythonhosted.org/packages/1d/b6/7925ab9b77386143f39d9c3243fdd101621b4532eb126743201160ffa7e6/contourpy-1.3.2-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:65a887a6e8c4cd0897507d814b14c54a8c2e2aa4ac9f7686292f9769fcf9a6ab", size = 323750, upload-time = "2025-04-15T17:36:13.29Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/f3/20c5d1ef4f4748e52d60771b8560cf00b69d5c6368b5c2e9311bcfa2a08b/contourpy-1.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3859783aefa2b8355697f16642695a5b9792e7a46ab86da1118a4a23a51a33d7", size = 326246, upload-time = "2025-04-15T17:36:18.329Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/e5/9dae809e7e0b2d9d70c52b3d24cba134dd3dad979eb3e5e71f5df22ed1f5/contourpy-1.3.2-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:eab0f6db315fa4d70f1d8ab514e527f0366ec021ff853d7ed6a2d33605cf4b83", size = 1308728, upload-time = "2025-04-15T17:36:33.878Z" },
+ { url = "https://files.pythonhosted.org/packages/e2/4a/0058ba34aeea35c0b442ae61a4f4d4ca84d6df8f91309bc2d43bb8dd248f/contourpy-1.3.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:d91a3ccc7fea94ca0acab82ceb77f396d50a1f67412efe4c526f5d20264e6ecd", size = 1375762, upload-time = "2025-04-15T17:36:51.295Z" },
+ { url = "https://files.pythonhosted.org/packages/09/33/7174bdfc8b7767ef2c08ed81244762d93d5c579336fc0b51ca57b33d1b80/contourpy-1.3.2-cp311-cp311-win32.whl", hash = "sha256:1c48188778d4d2f3d48e4643fb15d8608b1d01e4b4d6b0548d9b336c28fc9b6f", size = 178196, upload-time = "2025-04-15T17:36:55.002Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/fe/4029038b4e1c4485cef18e480b0e2cd2d755448bb071eb9977caac80b77b/contourpy-1.3.2-cp311-cp311-win_amd64.whl", hash = "sha256:5ebac872ba09cb8f2131c46b8739a7ff71de28a24c869bcad554477eb089a878", size = 222017, upload-time = "2025-04-15T17:36:58.576Z" },
+ { url = "https://files.pythonhosted.org/packages/34/f7/44785876384eff370c251d58fd65f6ad7f39adce4a093c934d4a67a7c6b6/contourpy-1.3.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:4caf2bcd2969402bf77edc4cb6034c7dd7c0803213b3523f111eb7460a51b8d2", size = 271580, upload-time = "2025-04-15T17:37:03.105Z" },
+ { url = "https://files.pythonhosted.org/packages/93/3b/0004767622a9826ea3d95f0e9d98cd8729015768075d61f9fea8eeca42a8/contourpy-1.3.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:82199cb78276249796419fe36b7386bd8d2cc3f28b3bc19fe2454fe2e26c4c15", size = 255530, upload-time = "2025-04-15T17:37:07.026Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/bb/7bd49e1f4fa805772d9fd130e0d375554ebc771ed7172f48dfcd4ca61549/contourpy-1.3.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:106fab697af11456fcba3e352ad50effe493a90f893fca6c2ca5c033820cea92", size = 307688, upload-time = "2025-04-15T17:37:11.481Z" },
+ { url = "https://files.pythonhosted.org/packages/fc/97/e1d5dbbfa170725ef78357a9a0edc996b09ae4af170927ba8ce977e60a5f/contourpy-1.3.2-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:d14f12932a8d620e307f715857107b1d1845cc44fdb5da2bc8e850f5ceba9f87", size = 347331, upload-time = "2025-04-15T17:37:18.212Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/66/e69e6e904f5ecf6901be3dd16e7e54d41b6ec6ae3405a535286d4418ffb4/contourpy-1.3.2-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:532fd26e715560721bb0d5fc7610fce279b3699b018600ab999d1be895b09415", size = 318963, upload-time = "2025-04-15T17:37:22.76Z" },
+ { url = "https://files.pythonhosted.org/packages/a8/32/b8a1c8965e4f72482ff2d1ac2cd670ce0b542f203c8e1d34e7c3e6925da7/contourpy-1.3.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f26b383144cf2d2c29f01a1e8170f50dacf0eac02d64139dcd709a8ac4eb3cfe", size = 323681, upload-time = "2025-04-15T17:37:33.001Z" },
+ { url = "https://files.pythonhosted.org/packages/30/c6/12a7e6811d08757c7162a541ca4c5c6a34c0f4e98ef2b338791093518e40/contourpy-1.3.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:c49f73e61f1f774650a55d221803b101d966ca0c5a2d6d5e4320ec3997489441", size = 1308674, upload-time = "2025-04-15T17:37:48.64Z" },
+ { url = "https://files.pythonhosted.org/packages/2a/8a/bebe5a3f68b484d3a2b8ffaf84704b3e343ef1addea528132ef148e22b3b/contourpy-1.3.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:3d80b2c0300583228ac98d0a927a1ba6a2ba6b8a742463c564f1d419ee5b211e", size = 1380480, upload-time = "2025-04-15T17:38:06.7Z" },
+ { url = "https://files.pythonhosted.org/packages/34/db/fcd325f19b5978fb509a7d55e06d99f5f856294c1991097534360b307cf1/contourpy-1.3.2-cp312-cp312-win32.whl", hash = "sha256:90df94c89a91b7362e1142cbee7568f86514412ab8a2c0d0fca72d7e91b62912", size = 178489, upload-time = "2025-04-15T17:38:10.338Z" },
+ { url = "https://files.pythonhosted.org/packages/01/c8/fadd0b92ffa7b5eb5949bf340a63a4a496a6930a6c37a7ba0f12acb076d6/contourpy-1.3.2-cp312-cp312-win_amd64.whl", hash = "sha256:8c942a01d9163e2e5cfb05cb66110121b8d07ad438a17f9e766317bcb62abf73", size = 223042, upload-time = "2025-04-15T17:38:14.239Z" },
+ { url = "https://files.pythonhosted.org/packages/33/05/b26e3c6ecc05f349ee0013f0bb850a761016d89cec528a98193a48c34033/contourpy-1.3.2-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:fd93cc7f3139b6dd7aab2f26a90dde0aa9fc264dbf70f6740d498a70b860b82c", size = 265681, upload-time = "2025-04-15T17:44:59.314Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/25/ac07d6ad12affa7d1ffed11b77417d0a6308170f44ff20fa1d5aa6333f03/contourpy-1.3.2-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:107ba8a6a7eec58bb475329e6d3b95deba9440667c4d62b9b6063942b61d7f16", size = 315101, upload-time = "2025-04-15T17:45:04.165Z" },
+ { url = "https://files.pythonhosted.org/packages/8f/4d/5bb3192bbe9d3f27e3061a6a8e7733c9120e203cb8515767d30973f71030/contourpy-1.3.2-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:ded1706ed0c1049224531b81128efbd5084598f18d8a2d9efae833edbd2b40ad", size = 220599, upload-time = "2025-04-15T17:45:08.456Z" },
+ { url = "https://files.pythonhosted.org/packages/ff/c0/91f1215d0d9f9f343e4773ba6c9b89e8c0cc7a64a6263f21139da639d848/contourpy-1.3.2-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:5f5964cdad279256c084b69c3f412b7801e15356b16efa9d78aa974041903da0", size = 266807, upload-time = "2025-04-15T17:45:15.535Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/79/6be7e90c955c0487e7712660d6cead01fa17bff98e0ea275737cc2bc8e71/contourpy-1.3.2-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:49b65a95d642d4efa8f64ba12558fcb83407e58a2dfba9d796d77b63ccfcaff5", size = 318729, upload-time = "2025-04-15T17:45:20.166Z" },
+ { url = "https://files.pythonhosted.org/packages/87/68/7f46fb537958e87427d98a4074bcde4b67a70b04900cfc5ce29bc2f556c1/contourpy-1.3.2-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:8c5acb8dddb0752bf252e01a3035b21443158910ac16a3b0d20e7fed7d534ce5", size = 221791, upload-time = "2025-04-15T17:45:24.794Z" },
+]
+
+[[package]]
+name = "contourpy"
+version = "1.3.3"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version >= '3.12' and sys_platform == 'darwin'",
+ "python_full_version >= '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version >= '3.12' and sys_platform == 'win32'",
+ "python_full_version >= '3.12' and sys_platform == 'emscripten'",
+ "(python_full_version >= '3.12' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version == '3.11.*' and sys_platform == 'darwin'",
+ "python_full_version == '3.11.*' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version == '3.11.*' and sys_platform == 'win32'",
+ "python_full_version == '3.11.*' and sys_platform == 'emscripten'",
+ "(python_full_version == '3.11.*' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version == '3.11.*' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+]
+dependencies = [
+ { name = "numpy", marker = "python_full_version >= '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/58/01/1253e6698a07380cd31a736d248a3f2a50a7c88779a1813da27503cadc2a/contourpy-1.3.3.tar.gz", hash = "sha256:083e12155b210502d0bca491432bb04d56dc3432f95a979b429f2848c3dbe880", size = 13466174, upload-time = "2025-07-26T12:03:12.549Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/91/2e/c4390a31919d8a78b90e8ecf87cd4b4c4f05a5b48d05ec17db8e5404c6f4/contourpy-1.3.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:709a48ef9a690e1343202916450bc48b9e51c049b089c7f79a267b46cffcdaa1", size = 288773, upload-time = "2025-07-26T12:01:02.277Z" },
+ { url = "https://files.pythonhosted.org/packages/0d/44/c4b0b6095fef4dc9c420e041799591e3b63e9619e3044f7f4f6c21c0ab24/contourpy-1.3.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:23416f38bfd74d5d28ab8429cc4d63fa67d5068bd711a85edb1c3fb0c3e2f381", size = 270149, upload-time = "2025-07-26T12:01:04.072Z" },
+ { url = "https://files.pythonhosted.org/packages/30/2e/dd4ced42fefac8470661d7cb7e264808425e6c5d56d175291e93890cce09/contourpy-1.3.3-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:929ddf8c4c7f348e4c0a5a3a714b5c8542ffaa8c22954862a46ca1813b667ee7", size = 329222, upload-time = "2025-07-26T12:01:05.688Z" },
+ { url = "https://files.pythonhosted.org/packages/f2/74/cc6ec2548e3d276c71389ea4802a774b7aa3558223b7bade3f25787fafc2/contourpy-1.3.3-cp311-cp311-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:9e999574eddae35f1312c2b4b717b7885d4edd6cb46700e04f7f02db454e67c1", size = 377234, upload-time = "2025-07-26T12:01:07.054Z" },
+ { url = "https://files.pythonhosted.org/packages/03/b3/64ef723029f917410f75c09da54254c5f9ea90ef89b143ccadb09df14c15/contourpy-1.3.3-cp311-cp311-manylinux_2_26_s390x.manylinux_2_28_s390x.whl", hash = "sha256:0bf67e0e3f482cb69779dd3061b534eb35ac9b17f163d851e2a547d56dba0a3a", size = 380555, upload-time = "2025-07-26T12:01:08.801Z" },
+ { url = "https://files.pythonhosted.org/packages/5f/4b/6157f24ca425b89fe2eb7e7be642375711ab671135be21e6faa100f7448c/contourpy-1.3.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:51e79c1f7470158e838808d4a996fa9bac72c498e93d8ebe5119bc1e6becb0db", size = 355238, upload-time = "2025-07-26T12:01:10.319Z" },
+ { url = "https://files.pythonhosted.org/packages/98/56/f914f0dd678480708a04cfd2206e7c382533249bc5001eb9f58aa693e200/contourpy-1.3.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:598c3aaece21c503615fd59c92a3598b428b2f01bfb4b8ca9c4edeecc2438620", size = 1326218, upload-time = "2025-07-26T12:01:12.659Z" },
+ { url = "https://files.pythonhosted.org/packages/fb/d7/4a972334a0c971acd5172389671113ae82aa7527073980c38d5868ff1161/contourpy-1.3.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:322ab1c99b008dad206d406bb61d014cf0174df491ae9d9d0fac6a6fda4f977f", size = 1392867, upload-time = "2025-07-26T12:01:15.533Z" },
+ { url = "https://files.pythonhosted.org/packages/75/3e/f2cc6cd56dc8cff46b1a56232eabc6feea52720083ea71ab15523daab796/contourpy-1.3.3-cp311-cp311-win32.whl", hash = "sha256:fd907ae12cd483cd83e414b12941c632a969171bf90fc937d0c9f268a31cafff", size = 183677, upload-time = "2025-07-26T12:01:17.088Z" },
+ { url = "https://files.pythonhosted.org/packages/98/4b/9bd370b004b5c9d8045c6c33cf65bae018b27aca550a3f657cdc99acdbd8/contourpy-1.3.3-cp311-cp311-win_amd64.whl", hash = "sha256:3519428f6be58431c56581f1694ba8e50626f2dd550af225f82fb5f5814d2a42", size = 225234, upload-time = "2025-07-26T12:01:18.256Z" },
+ { url = "https://files.pythonhosted.org/packages/d9/b6/71771e02c2e004450c12b1120a5f488cad2e4d5b590b1af8bad060360fe4/contourpy-1.3.3-cp311-cp311-win_arm64.whl", hash = "sha256:15ff10bfada4bf92ec8b31c62bf7c1834c244019b4a33095a68000d7075df470", size = 193123, upload-time = "2025-07-26T12:01:19.848Z" },
+ { url = "https://files.pythonhosted.org/packages/be/45/adfee365d9ea3d853550b2e735f9d66366701c65db7855cd07621732ccfc/contourpy-1.3.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b08a32ea2f8e42cf1d4be3169a98dd4be32bafe4f22b6c4cb4ba810fa9e5d2cb", size = 293419, upload-time = "2025-07-26T12:01:21.16Z" },
+ { url = "https://files.pythonhosted.org/packages/53/3e/405b59cfa13021a56bba395a6b3aca8cec012b45bf177b0eaf7a202cde2c/contourpy-1.3.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:556dba8fb6f5d8742f2923fe9457dbdd51e1049c4a43fd3986a0b14a1d815fc6", size = 273979, upload-time = "2025-07-26T12:01:22.448Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/1c/a12359b9b2ca3a845e8f7f9ac08bdf776114eb931392fcad91743e2ea17b/contourpy-1.3.3-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:92d9abc807cf7d0e047b95ca5d957cf4792fcd04e920ca70d48add15c1a90ea7", size = 332653, upload-time = "2025-07-26T12:01:24.155Z" },
+ { url = "https://files.pythonhosted.org/packages/63/12/897aeebfb475b7748ea67b61e045accdfcf0d971f8a588b67108ed7f5512/contourpy-1.3.3-cp312-cp312-manylinux_2_26_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:b2e8faa0ed68cb29af51edd8e24798bb661eac3bd9f65420c1887b6ca89987c8", size = 379536, upload-time = "2025-07-26T12:01:25.91Z" },
+ { url = "https://files.pythonhosted.org/packages/43/8a/a8c584b82deb248930ce069e71576fc09bd7174bbd35183b7943fb1064fd/contourpy-1.3.3-cp312-cp312-manylinux_2_26_s390x.manylinux_2_28_s390x.whl", hash = "sha256:626d60935cf668e70a5ce6ff184fd713e9683fb458898e4249b63be9e28286ea", size = 384397, upload-time = "2025-07-26T12:01:27.152Z" },
+ { url = "https://files.pythonhosted.org/packages/cc/8f/ec6289987824b29529d0dfda0d74a07cec60e54b9c92f3c9da4c0ac732de/contourpy-1.3.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4d00e655fcef08aba35ec9610536bfe90267d7ab5ba944f7032549c55a146da1", size = 362601, upload-time = "2025-07-26T12:01:28.808Z" },
+ { url = "https://files.pythonhosted.org/packages/05/0a/a3fe3be3ee2dceb3e615ebb4df97ae6f3828aa915d3e10549ce016302bd1/contourpy-1.3.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:451e71b5a7d597379ef572de31eeb909a87246974d960049a9848c3bc6c41bf7", size = 1331288, upload-time = "2025-07-26T12:01:31.198Z" },
+ { url = "https://files.pythonhosted.org/packages/33/1d/acad9bd4e97f13f3e2b18a3977fe1b4a37ecf3d38d815333980c6c72e963/contourpy-1.3.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:459c1f020cd59fcfe6650180678a9993932d80d44ccde1fa1868977438f0b411", size = 1403386, upload-time = "2025-07-26T12:01:33.947Z" },
+ { url = "https://files.pythonhosted.org/packages/cf/8f/5847f44a7fddf859704217a99a23a4f6417b10e5ab1256a179264561540e/contourpy-1.3.3-cp312-cp312-win32.whl", hash = "sha256:023b44101dfe49d7d53932be418477dba359649246075c996866106da069af69", size = 185018, upload-time = "2025-07-26T12:01:35.64Z" },
+ { url = "https://files.pythonhosted.org/packages/19/e8/6026ed58a64563186a9ee3f29f41261fd1828f527dd93d33b60feca63352/contourpy-1.3.3-cp312-cp312-win_amd64.whl", hash = "sha256:8153b8bfc11e1e4d75bcb0bff1db232f9e10b274e0929de9d608027e0d34ff8b", size = 226567, upload-time = "2025-07-26T12:01:36.804Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/e2/f05240d2c39a1ed228d8328a78b6f44cd695f7ef47beb3e684cf93604f86/contourpy-1.3.3-cp312-cp312-win_arm64.whl", hash = "sha256:07ce5ed73ecdc4a03ffe3e1b3e3c1166db35ae7584be76f65dbbe28a7791b0cc", size = 193655, upload-time = "2025-07-26T12:01:37.999Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/29/8dcfe16f0107943fa92388c23f6e05cff0ba58058c4c95b00280d4c75a14/contourpy-1.3.3-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:cd5dfcaeb10f7b7f9dc8941717c6c2ade08f587be2226222c12b25f0483ed497", size = 278809, upload-time = "2025-07-26T12:02:52.74Z" },
+ { url = "https://files.pythonhosted.org/packages/85/a9/8b37ef4f7dafeb335daee3c8254645ef5725be4d9c6aa70b50ec46ef2f7e/contourpy-1.3.3-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:0c1fc238306b35f246d61a1d416a627348b5cf0648648a031e14bb8705fcdfe8", size = 261593, upload-time = "2025-07-26T12:02:54.037Z" },
+ { url = "https://files.pythonhosted.org/packages/0a/59/ebfb8c677c75605cc27f7122c90313fd2f375ff3c8d19a1694bda74aaa63/contourpy-1.3.3-pp311-pypy311_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:70f9aad7de812d6541d29d2bbf8feb22ff7e1c299523db288004e3157ff4674e", size = 302202, upload-time = "2025-07-26T12:02:55.947Z" },
+ { url = "https://files.pythonhosted.org/packages/3c/37/21972a15834d90bfbfb009b9d004779bd5a07a0ec0234e5ba8f64d5736f4/contourpy-1.3.3-pp311-pypy311_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5ed3657edf08512fc3fe81b510e35c2012fbd3081d2e26160f27ca28affec989", size = 329207, upload-time = "2025-07-26T12:02:57.468Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/58/bd257695f39d05594ca4ad60df5bcb7e32247f9951fd09a9b8edb82d1daa/contourpy-1.3.3-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:3d1a3799d62d45c18bafd41c5fa05120b96a28079f2393af559b843d1a966a77", size = 225315, upload-time = "2025-07-26T12:02:58.801Z" },
+]
+
+[[package]]
+name = "coverage"
+version = "7.13.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/11/43/3e4ac666cc35f231fa70c94e9f38459299de1a152813f9d2f60fc5f3ecaf/coverage-7.13.3.tar.gz", hash = "sha256:f7f6182d3dfb8802c1747eacbfe611b669455b69b7c037484bb1efbbb56711ac", size = 826832, upload-time = "2026-02-03T14:02:30.944Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ab/07/1c8099563a8a6c389a31c2d0aa1497cee86d6248bb4b9ba5e779215db9f9/coverage-7.13.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:0b4f345f7265cdbdb5ec2521ffff15fa49de6d6c39abf89fc7ad68aa9e3a55f0", size = 219143, upload-time = "2026-02-03T13:59:40.459Z" },
+ { url = "https://files.pythonhosted.org/packages/69/39/a892d44af7aa092cab70e0cc5cdbba18eeccfe1d6930695dab1742eef9e9/coverage-7.13.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:96c3be8bae9d0333e403cc1a8eb078a7f928b5650bae94a18fb4820cc993fb9b", size = 219663, upload-time = "2026-02-03T13:59:41.951Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/25/9669dcf4c2bb4c3861469e6db20e52e8c11908cf53c14ec9b12e9fd4d602/coverage-7.13.3-cp310-cp310-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:d6f4a21328ea49d38565b55599e1c02834e76583a6953e5586d65cb1efebd8f8", size = 246424, upload-time = "2026-02-03T13:59:43.418Z" },
+ { url = "https://files.pythonhosted.org/packages/f3/68/d9766c4e298aca62ea5d9543e1dd1e4e1439d7284815244d8b7db1840bfb/coverage-7.13.3-cp310-cp310-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:fc970575799a9d17d5c3fafd83a0f6ccf5d5117cdc9ad6fbd791e9ead82418b0", size = 248228, upload-time = "2026-02-03T13:59:44.816Z" },
+ { url = "https://files.pythonhosted.org/packages/f0/e2/eea6cb4a4bd443741adf008d4cccec83a1f75401df59b6559aca2bdd9710/coverage-7.13.3-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:87ff33b652b3556b05e204ae20793d1f872161b0fa5ec8a9ac76f8430e152ed6", size = 250103, upload-time = "2026-02-03T13:59:46.271Z" },
+ { url = "https://files.pythonhosted.org/packages/db/77/664280ecd666c2191610842177e2fab9e5dbdeef97178e2078fed46a3d2c/coverage-7.13.3-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:7df8759ee57b9f3f7b66799b7660c282f4375bef620ade1686d6a7b03699e75f", size = 247107, upload-time = "2026-02-03T13:59:48.53Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/df/2a672eab99e0d0eba52d8a63e47dc92245eee26954d1b2d3c8f7d372151f/coverage-7.13.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:f45c9bcb16bee25a798ccba8a2f6a1251b19de6a0d617bb365d7d2f386c4e20e", size = 248143, upload-time = "2026-02-03T13:59:50.027Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/dc/a104e7a87c13e57a358b8b9199a8955676e1703bb372d79722b54978ae45/coverage-7.13.3-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:318b2e4753cbf611061e01b6cc81477e1cdfeb69c36c4a14e6595e674caadb56", size = 246148, upload-time = "2026-02-03T13:59:52.025Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/89/e113d3a58dc20b03b7e59aed1e53ebc9ca6167f961876443e002b10e3ae9/coverage-7.13.3-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:24db3959de8ee394eeeca89ccb8ba25305c2da9a668dd44173394cbd5aa0777f", size = 246414, upload-time = "2026-02-03T13:59:53.859Z" },
+ { url = "https://files.pythonhosted.org/packages/3f/60/a3fd0a6e8d89b488396019a2268b6a1f25ab56d6d18f3be50f35d77b47dc/coverage-7.13.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:be14d0622125edef21b3a4d8cd2d138c4872bf6e38adc90fd92385e3312f406a", size = 247023, upload-time = "2026-02-03T13:59:55.454Z" },
+ { url = "https://files.pythonhosted.org/packages/19/fa/de4840bb939dbb22ba0648a6d8069fa91c9cf3b3fca8b0d1df461e885b3d/coverage-7.13.3-cp310-cp310-win32.whl", hash = "sha256:53be4aab8ddef18beb6188f3a3fdbf4d1af2277d098d4e618be3a8e6c88e74be", size = 221751, upload-time = "2026-02-03T13:59:57.383Z" },
+ { url = "https://files.pythonhosted.org/packages/de/87/233ff8b7ef62fb63f58c78623b50bef69681111e0c4d43504f422d88cda4/coverage-7.13.3-cp310-cp310-win_amd64.whl", hash = "sha256:bfeee64ad8b4aae3233abb77eb6b52b51b05fa89da9645518671b9939a78732b", size = 222686, upload-time = "2026-02-03T13:59:58.825Z" },
+ { url = "https://files.pythonhosted.org/packages/ec/09/1ac74e37cf45f17eb41e11a21854f7f92a4c2d6c6098ef4a1becb0c6d8d3/coverage-7.13.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:5907605ee20e126eeee2abe14aae137043c2c8af2fa9b38d2ab3b7a6b8137f73", size = 219276, upload-time = "2026-02-03T14:00:00.296Z" },
+ { url = "https://files.pythonhosted.org/packages/2e/cb/71908b08b21beb2c437d0d5870c4ec129c570ca1b386a8427fcdb11cf89c/coverage-7.13.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:a88705500988c8acad8b8fd86c2a933d3aa96bec1ddc4bc5cb256360db7bbd00", size = 219776, upload-time = "2026-02-03T14:00:02.414Z" },
+ { url = "https://files.pythonhosted.org/packages/09/85/c4f3dd69232887666a2c0394d4be21c60ea934d404db068e6c96aa59cd87/coverage-7.13.3-cp311-cp311-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:7bbb5aa9016c4c29e3432e087aa29ebee3f8fda089cfbfb4e6d64bd292dcd1c2", size = 250196, upload-time = "2026-02-03T14:00:04.197Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/cc/560ad6f12010344d0778e268df5ba9aa990aacccc310d478bf82bf3d302c/coverage-7.13.3-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:0c2be202a83dde768937a61cdc5d06bf9fb204048ca199d93479488e6247656c", size = 252111, upload-time = "2026-02-03T14:00:05.639Z" },
+ { url = "https://files.pythonhosted.org/packages/f0/66/3193985fb2c58e91f94cfbe9e21a6fdf941e9301fe2be9e92c072e9c8f8c/coverage-7.13.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0f45e32ef383ce56e0ca099b2e02fcdf7950be4b1b56afaab27b4ad790befe5b", size = 254217, upload-time = "2026-02-03T14:00:07.738Z" },
+ { url = "https://files.pythonhosted.org/packages/c5/78/f0f91556bf1faa416792e537c523c5ef9db9b1d32a50572c102b3d7c45b3/coverage-7.13.3-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:6ed2e787249b922a93cd95c671cc9f4c9797a106e81b455c83a9ddb9d34590c0", size = 250318, upload-time = "2026-02-03T14:00:09.224Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/aa/fc654e45e837d137b2c1f3a2cc09b4aea1e8b015acd2f774fa0f3d2ddeba/coverage-7.13.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:05dd25b21afffe545e808265897c35f32d3e4437663923e0d256d9ab5031fb14", size = 251909, upload-time = "2026-02-03T14:00:10.712Z" },
+ { url = "https://files.pythonhosted.org/packages/73/4d/ab53063992add8a9ca0463c9d92cce5994a29e17affd1c2daa091b922a93/coverage-7.13.3-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:46d29926349b5c4f1ea4fca95e8c892835515f3600995a383fa9a923b5739ea4", size = 249971, upload-time = "2026-02-03T14:00:12.402Z" },
+ { url = "https://files.pythonhosted.org/packages/29/25/83694b81e46fcff9899694a1b6f57573429cdd82b57932f09a698f03eea5/coverage-7.13.3-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:fae6a21537519c2af00245e834e5bf2884699cc7c1055738fd0f9dc37a3644ad", size = 249692, upload-time = "2026-02-03T14:00:13.868Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/ef/d68fc304301f4cb4bf6aefa0045310520789ca38dabdfba9dbecd3f37919/coverage-7.13.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:c672d4e2f0575a4ca2bf2aa0c5ced5188220ab806c1bb6d7179f70a11a017222", size = 250597, upload-time = "2026-02-03T14:00:15.461Z" },
+ { url = "https://files.pythonhosted.org/packages/8d/85/240ad396f914df361d0f71e912ddcedb48130c71b88dc4193fe3c0306f00/coverage-7.13.3-cp311-cp311-win32.whl", hash = "sha256:fcda51c918c7a13ad93b5f89a58d56e3a072c9e0ba5c231b0ed81404bf2648fb", size = 221773, upload-time = "2026-02-03T14:00:17.462Z" },
+ { url = "https://files.pythonhosted.org/packages/2f/71/165b3a6d3d052704a9ab52d11ea64ef3426745de517dda44d872716213a7/coverage-7.13.3-cp311-cp311-win_amd64.whl", hash = "sha256:d1a049b5c51b3b679928dd35e47c4a2235e0b6128b479a7596d0ef5b42fa6301", size = 222711, upload-time = "2026-02-03T14:00:19.449Z" },
+ { url = "https://files.pythonhosted.org/packages/51/d0/0ddc9c5934cdd52639c5df1f1eb0fdab51bb52348f3a8d1c7db9c600d93a/coverage-7.13.3-cp311-cp311-win_arm64.whl", hash = "sha256:79f2670c7e772f4917895c3d89aad59e01f3dbe68a4ed2d0373b431fad1dcfba", size = 221377, upload-time = "2026-02-03T14:00:20.968Z" },
+ { url = "https://files.pythonhosted.org/packages/94/44/330f8e83b143f6668778ed61d17ece9dc48459e9e74669177de02f45fec5/coverage-7.13.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ed48b4170caa2c4420e0cd27dc977caaffc7eecc317355751df8373dddcef595", size = 219441, upload-time = "2026-02-03T14:00:22.585Z" },
+ { url = "https://files.pythonhosted.org/packages/08/e7/29db05693562c2e65bdf6910c0af2fd6f9325b8f43caf7a258413f369e30/coverage-7.13.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:8f2adf4bcffbbec41f366f2e6dffb9d24e8172d16e91da5799c9b7ed6b5716e6", size = 219801, upload-time = "2026-02-03T14:00:24.186Z" },
+ { url = "https://files.pythonhosted.org/packages/90/ae/7f8a78249b02b0818db46220795f8ac8312ea4abd1d37d79ea81db5cae81/coverage-7.13.3-cp312-cp312-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:01119735c690786b6966a1e9f098da4cd7ca9174c4cfe076d04e653105488395", size = 251306, upload-time = "2026-02-03T14:00:25.798Z" },
+ { url = "https://files.pythonhosted.org/packages/62/71/a18a53d1808e09b2e9ebd6b47dad5e92daf4c38b0686b4c4d1b2f3e42b7f/coverage-7.13.3-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:8bb09e83c603f152d855f666d70a71765ca8e67332e5829e62cb9466c176af23", size = 254051, upload-time = "2026-02-03T14:00:27.474Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/0a/eb30f6455d04c5a3396d0696cad2df0269ae7444bb322f86ffe3376f7bf9/coverage-7.13.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b607a40cba795cfac6d130220d25962931ce101f2f478a29822b19755377fb34", size = 255160, upload-time = "2026-02-03T14:00:29.024Z" },
+ { url = "https://files.pythonhosted.org/packages/7b/7e/a45baac86274ce3ed842dbb84f14560c673ad30535f397d89164ec56c5df/coverage-7.13.3-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:44f14a62f5da2e9aedf9080e01d2cda61df39197d48e323538ec037336d68da8", size = 251709, upload-time = "2026-02-03T14:00:30.641Z" },
+ { url = "https://files.pythonhosted.org/packages/c0/df/dd0dc12f30da11349993f3e218901fdf82f45ee44773596050c8f5a1fb25/coverage-7.13.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:debf29e0b157769843dff0981cc76f79e0ed04e36bb773c6cac5f6029054bd8a", size = 253083, upload-time = "2026-02-03T14:00:32.14Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/32/fc764c8389a8ce95cb90eb97af4c32f392ab0ac23ec57cadeefb887188d3/coverage-7.13.3-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:824bb95cd71604031ae9a48edb91fd6effde669522f960375668ed21b36e3ec4", size = 251227, upload-time = "2026-02-03T14:00:34.721Z" },
+ { url = "https://files.pythonhosted.org/packages/dd/ca/d025e9da8f06f24c34d2da9873957cfc5f7e0d67802c3e34d0caa8452130/coverage-7.13.3-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:8f1010029a5b52dc427c8e2a8dbddb2303ddd180b806687d1acd1bb1d06649e7", size = 250794, upload-time = "2026-02-03T14:00:36.278Z" },
+ { url = "https://files.pythonhosted.org/packages/45/c7/76bf35d5d488ec8f68682eb8e7671acc50a6d2d1c1182de1d2b6d4ffad3b/coverage-7.13.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:cd5dee4fd7659d8306ffa79eeaaafd91fa30a302dac3af723b9b469e549247e0", size = 252671, upload-time = "2026-02-03T14:00:38.368Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/10/1921f1a03a7c209e1cb374f81a6b9b68b03cdb3ecc3433c189bc90e2a3d5/coverage-7.13.3-cp312-cp312-win32.whl", hash = "sha256:f7f153d0184d45f3873b3ad3ad22694fd73aadcb8cdbc4337ab4b41ea6b4dff1", size = 221986, upload-time = "2026-02-03T14:00:40.442Z" },
+ { url = "https://files.pythonhosted.org/packages/3c/7c/f5d93297f8e125a80c15545edc754d93e0ed8ba255b65e609b185296af01/coverage-7.13.3-cp312-cp312-win_amd64.whl", hash = "sha256:03a6e5e1e50819d6d7436f5bc40c92ded7e484e400716886ac921e35c133149d", size = 222793, upload-time = "2026-02-03T14:00:42.106Z" },
+ { url = "https://files.pythonhosted.org/packages/43/59/c86b84170015b4555ebabca8649bdf9f4a1f737a73168088385ed0f947c4/coverage-7.13.3-cp312-cp312-win_arm64.whl", hash = "sha256:51c4c42c0e7d09a822b08b6cf79b3c4db8333fffde7450da946719ba0d45730f", size = 221410, upload-time = "2026-02-03T14:00:43.726Z" },
+ { url = "https://files.pythonhosted.org/packages/7d/fb/70af542d2d938c778c9373ce253aa4116dbe7c0a5672f78b2b2ae0e1b94b/coverage-7.13.3-py3-none-any.whl", hash = "sha256:90a8af9dba6429b2573199622d72e0ebf024d6276f16abce394ad4d181bb0910", size = 211237, upload-time = "2026-02-03T14:02:27.986Z" },
+]
+
+[package.optional-dependencies]
+toml = [
+ { name = "tomli", marker = "python_full_version <= '3.11'" },
+]
+
+[[package]]
+name = "cuda-bindings"
+version = "13.2.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "cuda-pathfinder" },
+]
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1a/fe/7351d7e586a8b4c9f89731bfe4cf0148223e8f9903ff09571f78b3fb0682/cuda_bindings-13.2.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:08b395f79cb89ce0cd8effff07c4a1e20101b873c256a1aeb286e8fd7bd0f556", size = 5744254, upload-time = "2026-03-11T00:12:29.798Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/ef/184aa775e970fc089942cd9ec6302e6e44679d4c14549c6a7ea45bf7f798/cuda_bindings-13.2.0-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d6f3682ec3c4769326aafc67c2ba669d97d688d0b7e63e659d36d2f8b72f32d6", size = 6329075, upload-time = "2026-03-11T00:12:32.319Z" },
+ { url = "https://files.pythonhosted.org/packages/e0/a9/3a8241c6e19483ac1f1dcf5c10238205dcb8a6e9d0d4d4709240dff28ff4/cuda_bindings-13.2.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:721104c603f059780d287969be3d194a18d0cc3b713ed9049065a1107706759d", size = 5730273, upload-time = "2026-03-11T00:12:37.18Z" },
+ { url = "https://files.pythonhosted.org/packages/e9/94/2748597f47bb1600cd466b20cab4159f1530a3a33fe7f70fee199b3abb9e/cuda_bindings-13.2.0-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1eba9504ac70667dd48313395fe05157518fd6371b532790e96fbb31bbb5a5e1", size = 6313924, upload-time = "2026-03-11T00:12:39.462Z" },
+ { url = "https://files.pythonhosted.org/packages/52/c8/b2589d68acf7e3d63e2be330b84bc25712e97ed799affbca7edd7eae25d6/cuda_bindings-13.2.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e865447abfb83d6a98ad5130ed3c70b1fc295ae3eeee39fd07b4ddb0671b6788", size = 5722404, upload-time = "2026-03-11T00:12:44.041Z" },
+ { url = "https://files.pythonhosted.org/packages/1f/92/f899f7bbb5617bb65ec52a6eac1e9a1447a86b916c4194f8a5001b8cde0c/cuda_bindings-13.2.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:46d8776a55d6d5da9dd6e9858fba2efcda2abe6743871dee47dd06eb8cb6d955", size = 6320619, upload-time = "2026-03-11T00:12:45.939Z" },
+]
+
+[[package]]
+name = "cuda-pathfinder"
+version = "1.3.3"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/0b/02/4dbe7568a42e46582248942f54dc64ad094769532adbe21e525e4edf7bc4/cuda_pathfinder-1.3.3-py3-none-any.whl", hash = "sha256:9984b664e404f7c134954a771be8775dfd6180ea1e1aef4a5a37d4be05d9bbb1", size = 27154, upload-time = "2025-12-04T22:35:08.996Z" },
+]
+
+[[package]]
+name = "cuda-toolkit"
+version = "13.0.2"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/57/b2/453099f5f3b698d7d0eab38916aac44c7f76229f451709e2eb9db6615dcd/cuda_toolkit-13.0.2-py2.py3-none-any.whl", hash = "sha256:b198824cf2f54003f50d64ada3a0f184b42ca0846c1c94192fa269ecd97a66eb", size = 2364, upload-time = "2025-12-19T23:24:07.328Z" },
+]
+
+[package.optional-dependencies]
+cublas = [
+ { name = "nvidia-cublas", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+cudart = [
+ { name = "nvidia-cuda-runtime", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+cufft = [
+ { name = "nvidia-cufft", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+cufile = [
+ { name = "nvidia-cufile", marker = "sys_platform == 'linux'" },
+]
+cupti = [
+ { name = "nvidia-cuda-cupti", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+curand = [
+ { name = "nvidia-curand", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+cusolver = [
+ { name = "nvidia-cusolver", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+cusparse = [
+ { name = "nvidia-cusparse", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+nvjitlink = [
+ { name = "nvidia-nvjitlink", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+nvrtc = [
+ { name = "nvidia-cuda-nvrtc", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+nvtx = [
+ { name = "nvidia-nvtx", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
+]
+
+[[package]]
+name = "cycler"
+version = "0.12.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/a9/95/a3dbbb5028f35eafb79008e7522a75244477d2838f38cbb722248dabc2a8/cycler-0.12.1.tar.gz", hash = "sha256:88bb128f02ba341da8ef447245a9e138fae777f6a23943da4540077d3601eb1c", size = 7615, upload-time = "2023-10-07T05:32:18.335Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e7/05/c19819d5e3d95294a6f5947fb9b9629efb316b96de511b418c53d245aae6/cycler-0.12.1-py3-none-any.whl", hash = "sha256:85cef7cff222d8644161529808465972e51340599459b8ac3ccbac5a854e0d30", size = 8321, upload-time = "2023-10-07T05:32:16.783Z" },
+]
+
+[[package]]
+name = "debugpy"
+version = "1.8.20"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/e0/b7/cd8080344452e4874aae67c40d8940e2b4d47b01601a8fd9f44786c757c7/debugpy-1.8.20.tar.gz", hash = "sha256:55bc8701714969f1ab89a6d5f2f3d40c36f91b2cbe2f65d98bf8196f6a6a2c33", size = 1645207, upload-time = "2026-01-29T23:03:28.199Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/71/be/8bd693a0b9d53d48c8978fa5d889e06f3b5b03e45fd1ea1e78267b4887cb/debugpy-1.8.20-cp310-cp310-macosx_15_0_x86_64.whl", hash = "sha256:157e96ffb7f80b3ad36d808646198c90acb46fdcfd8bb1999838f0b6f2b59c64", size = 2099192, upload-time = "2026-01-29T23:03:29.707Z" },
+ { url = "https://files.pythonhosted.org/packages/77/1b/85326d07432086a06361d493d2743edd0c4fc2ef62162be7f8618441ac37/debugpy-1.8.20-cp310-cp310-manylinux_2_34_x86_64.whl", hash = "sha256:c1178ae571aff42e61801a38b007af504ec8e05fde1c5c12e5a7efef21009642", size = 3088568, upload-time = "2026-01-29T23:03:31.467Z" },
+ { url = "https://files.pythonhosted.org/packages/e8/60/3e08462ee3eccd10998853eb35947c416e446bfe2bc37dbb886b9044586c/debugpy-1.8.20-cp310-cp310-win32.whl", hash = "sha256:c29dd9d656c0fbd77906a6e6a82ae4881514aa3294b94c903ff99303e789b4a2", size = 5284399, upload-time = "2026-01-29T23:03:33.678Z" },
+ { url = "https://files.pythonhosted.org/packages/72/43/09d49106e770fe558ced5e80df2e3c2ebee10e576eda155dcc5670473663/debugpy-1.8.20-cp310-cp310-win_amd64.whl", hash = "sha256:3ca85463f63b5dd0aa7aaa933d97cbc47c174896dcae8431695872969f981893", size = 5316388, upload-time = "2026-01-29T23:03:35.095Z" },
+ { url = "https://files.pythonhosted.org/packages/51/56/c3baf5cbe4dd77427fd9aef99fcdade259ad128feeb8a786c246adb838e5/debugpy-1.8.20-cp311-cp311-macosx_15_0_universal2.whl", hash = "sha256:eada6042ad88fa1571b74bd5402ee8b86eded7a8f7b827849761700aff171f1b", size = 2208318, upload-time = "2026-01-29T23:03:36.481Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/7d/4fa79a57a8e69fe0d9763e98d1110320f9ecd7f1f362572e3aafd7417c9d/debugpy-1.8.20-cp311-cp311-manylinux_2_34_x86_64.whl", hash = "sha256:7de0b7dfeedc504421032afba845ae2a7bcc32ddfb07dae2c3ca5442f821c344", size = 3171493, upload-time = "2026-01-29T23:03:37.775Z" },
+ { url = "https://files.pythonhosted.org/packages/7d/f2/1e8f8affe51e12a26f3a8a8a4277d6e60aa89d0a66512f63b1e799d424a4/debugpy-1.8.20-cp311-cp311-win32.whl", hash = "sha256:773e839380cf459caf73cc533ea45ec2737a5cc184cf1b3b796cd4fd98504fec", size = 5209240, upload-time = "2026-01-29T23:03:39.109Z" },
+ { url = "https://files.pythonhosted.org/packages/d5/92/1cb532e88560cbee973396254b21bece8c5d7c2ece958a67afa08c9f10dc/debugpy-1.8.20-cp311-cp311-win_amd64.whl", hash = "sha256:1f7650546e0eded1902d0f6af28f787fa1f1dbdbc97ddabaf1cd963a405930cb", size = 5233481, upload-time = "2026-01-29T23:03:40.659Z" },
+ { url = "https://files.pythonhosted.org/packages/14/57/7f34f4736bfb6e00f2e4c96351b07805d83c9a7b33d28580ae01374430f7/debugpy-1.8.20-cp312-cp312-macosx_15_0_universal2.whl", hash = "sha256:4ae3135e2089905a916909ef31922b2d733d756f66d87345b3e5e52b7a55f13d", size = 2550686, upload-time = "2026-01-29T23:03:42.023Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/78/b193a3975ca34458f6f0e24aaf5c3e3da72f5401f6054c0dfd004b41726f/debugpy-1.8.20-cp312-cp312-manylinux_2_34_x86_64.whl", hash = "sha256:88f47850a4284b88bd2bfee1f26132147d5d504e4e86c22485dfa44b97e19b4b", size = 4310588, upload-time = "2026-01-29T23:03:43.314Z" },
+ { url = "https://files.pythonhosted.org/packages/c1/55/f14deb95eaf4f30f07ef4b90a8590fc05d9e04df85ee379712f6fb6736d7/debugpy-1.8.20-cp312-cp312-win32.whl", hash = "sha256:4057ac68f892064e5f98209ab582abfee3b543fb55d2e87610ddc133a954d390", size = 5331372, upload-time = "2026-01-29T23:03:45.526Z" },
+ { url = "https://files.pythonhosted.org/packages/a1/39/2bef246368bd42f9bd7cba99844542b74b84dacbdbea0833e610f384fee8/debugpy-1.8.20-cp312-cp312-win_amd64.whl", hash = "sha256:a1a8f851e7cf171330679ef6997e9c579ef6dd33c9098458bd9986a0f4ca52e3", size = 5372835, upload-time = "2026-01-29T23:03:47.245Z" },
+ { url = "https://files.pythonhosted.org/packages/e0/c3/7f67dea8ccf8fdcb9c99033bbe3e90b9e7395415843accb81428c441be2d/debugpy-1.8.20-py2.py3-none-any.whl", hash = "sha256:5be9bed9ae3be00665a06acaa48f8329d2b9632f15fd09f6a9a8c8d9907e54d7", size = 5337658, upload-time = "2026-01-29T23:04:17.404Z" },
+]
+
+[[package]]
+name = "decorator"
+version = "5.2.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/43/fa/6d96a0978d19e17b68d634497769987b16c8f4cd0a7a05048bec693caa6b/decorator-5.2.1.tar.gz", hash = "sha256:65f266143752f734b0a7cc83c46f4618af75b8c5911b00ccb61d0ac9b6da0360", size = 56711, upload-time = "2025-02-24T04:41:34.073Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/4e/8c/f3147f5c4b73e7550fe5f9352eaa956ae838d5c51eb58e7a25b9f3e2643b/decorator-5.2.1-py3-none-any.whl", hash = "sha256:d316bb415a2d9e2d2b3abcc4084c6502fc09240e292cd76a76afc106a1c8e04a", size = 9190, upload-time = "2025-02-24T04:41:32.565Z" },
+]
+
+[[package]]
+name = "decord"
+version = "0.6.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "numpy" },
+]
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/11/79/936af42edf90a7bd4e41a6cac89c913d4b47fa48a26b042d5129a9242ee3/decord-0.6.0-py3-none-manylinux2010_x86_64.whl", hash = "sha256:51997f20be8958e23b7c4061ba45d0efcd86bffd5fe81c695d0befee0d442976", size = 13602299, upload-time = "2021-06-14T21:30:55.486Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/be/e15b5b866da452e62635a7b27513f31cb581fa2ea9cc9b768b535d62a955/decord-0.6.0-py3-none-win_amd64.whl", hash = "sha256:02665d7c4f1193a330205a791bc128f7e108eb6ae5b67144437a02f700943bad", size = 24733380, upload-time = "2021-06-14T21:30:57.766Z" },
+]
+
+[[package]]
+name = "defusedxml"
+version = "0.7.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/0f/d5/c66da9b79e5bdb124974bfe172b4daf3c984ebd9c2a06e2b8a4dc7331c72/defusedxml-0.7.1.tar.gz", hash = "sha256:1bb3032db185915b62d7c6209c5a8792be6a32ab2fedacc84e01b52c51aa3e69", size = 75520, upload-time = "2021-03-08T10:59:26.269Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/07/6c/aa3f2f849e01cb6a001cd8554a88d4c77c5c1a31c95bdf1cf9301e6d9ef4/defusedxml-0.7.1-py2.py3-none-any.whl", hash = "sha256:a352e7e428770286cc899e2542b6cdaedb2b4953ff269a210103ec58f6198a61", size = 25604, upload-time = "2021-03-08T10:59:24.45Z" },
+]
+
+[[package]]
+name = "einops"
+version = "0.8.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/2c/77/850bef8d72ffb9219f0b1aac23fbc1bf7d038ee6ea666f331fa273031aa2/einops-0.8.2.tar.gz", hash = "sha256:609da665570e5e265e27283aab09e7f279ade90c4f01bcfca111f3d3e13f2827", size = 56261, upload-time = "2026-01-26T04:13:17.638Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/2a/09/f8d8f8f31e4483c10a906437b4ce31bdf3d6d417b73fe33f1a8b59e34228/einops-0.8.2-py3-none-any.whl", hash = "sha256:54058201ac7087911181bfec4af6091bb59380360f069276601256a76af08193", size = 65638, upload-time = "2026-01-26T04:13:18.546Z" },
+]
+
+[[package]]
+name = "exceptiongroup"
+version = "1.3.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "typing-extensions", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/50/79/66800aadf48771f6b62f7eb014e352e5d06856655206165d775e675a02c9/exceptiongroup-1.3.1.tar.gz", hash = "sha256:8b412432c6055b0b7d14c310000ae93352ed6754f70fa8f7c34141f91c4e3219", size = 30371, upload-time = "2025-11-21T23:01:54.787Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/8a/0e/97c33bf5009bdbac74fd2beace167cab3f978feb69cc36f1ef79360d6c4e/exceptiongroup-1.3.1-py3-none-any.whl", hash = "sha256:a7a39a3bd276781e98394987d3a5701d0c4edffb633bb7a5144577f82c773598", size = 16740, upload-time = "2025-11-21T23:01:53.443Z" },
+]
+
+[[package]]
+name = "executing"
+version = "2.2.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/cc/28/c14e053b6762b1044f34a13aab6859bbf40456d37d23aa286ac24cfd9a5d/executing-2.2.1.tar.gz", hash = "sha256:3632cc370565f6648cc328b32435bd120a1e4ebb20c77e3fdde9a13cd1e533c4", size = 1129488, upload-time = "2025-09-01T09:48:10.866Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c1/ea/53f2148663b321f21b5a606bd5f191517cf40b7072c0497d3c92c4a13b1e/executing-2.2.1-py2.py3-none-any.whl", hash = "sha256:760643d3452b4d777d295bb167ccc74c64a81df23fb5e08eff250c425a4b2017", size = 28317, upload-time = "2025-09-01T09:48:08.5Z" },
+]
+
+[[package]]
+name = "fairscale"
+version = "0.4.13"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "numpy" },
+ { name = "torch" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/c1/08/b3334d7b543ac10dcb129cef4f84723ab696725512f18d69ab3a784b0bf5/fairscale-0.4.13.tar.gz", hash = "sha256:1b797825c427f5dba92253fd0d8daa574e8bd651a2423497775fab1b30cfb768", size = 266261, upload-time = "2022-12-11T18:09:16.892Z" }
+
+[[package]]
+name = "fastjsonschema"
+version = "2.21.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/20/b5/23b216d9d985a956623b6bd12d4086b60f0059b27799f23016af04a74ea1/fastjsonschema-2.21.2.tar.gz", hash = "sha256:b1eb43748041c880796cd077f1a07c3d94e93ae84bba5ed36800a33554ae05de", size = 374130, upload-time = "2025-08-14T18:49:36.666Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/cb/a8/20d0723294217e47de6d9e2e40fd4a9d2f7c4b6ef974babd482a59743694/fastjsonschema-2.21.2-py3-none-any.whl", hash = "sha256:1c797122d0a86c5cace2e54bf4e819c36223b552017172f32c5c024a6b77e463", size = 24024, upload-time = "2025-08-14T18:49:34.776Z" },
+]
+
+[[package]]
+name = "filelock"
+version = "3.20.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/1d/65/ce7f1b70157833bf3cb851b556a37d4547ceafc158aa9b34b36782f23696/filelock-3.20.3.tar.gz", hash = "sha256:18c57ee915c7ec61cff0ecf7f0f869936c7c30191bb0cf406f1341778d0834e1", size = 19485, upload-time = "2026-01-09T17:55:05.421Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/b5/36/7fb70f04bf00bc646cd5bb45aa9eddb15e19437a28b8fb2b4a5249fac770/filelock-3.20.3-py3-none-any.whl", hash = "sha256:4b0dda527ee31078689fc205ec4f1c1bf7d56cf88b6dc9426c4f230e46c2dce1", size = 16701, upload-time = "2026-01-09T17:55:04.334Z" },
+]
+
+[[package]]
+name = "fonttools"
+version = "4.61.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/ec/ca/cf17b88a8df95691275a3d77dc0a5ad9907f328ae53acbe6795da1b2f5ed/fonttools-4.61.1.tar.gz", hash = "sha256:6675329885c44657f826ef01d9e4fb33b9158e9d93c537d84ad8399539bc6f69", size = 3565756, upload-time = "2025-12-12T17:31:24.246Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/5b/94/8a28707adb00bed1bf22dac16ccafe60faf2ade353dcb32c3617ee917307/fonttools-4.61.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:7c7db70d57e5e1089a274cbb2b1fd635c9a24de809a231b154965d415d6c6d24", size = 2854799, upload-time = "2025-12-12T17:29:27.5Z" },
+ { url = "https://files.pythonhosted.org/packages/94/93/c2e682faaa5ee92034818d8f8a8145ae73eb83619600495dcf8503fa7771/fonttools-4.61.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:5fe9fd43882620017add5eabb781ebfbc6998ee49b35bd7f8f79af1f9f99a958", size = 2403032, upload-time = "2025-12-12T17:29:30.115Z" },
+ { url = "https://files.pythonhosted.org/packages/f1/62/1748f7e7e1ee41aa52279fd2e3a6d0733dc42a673b16932bad8e5d0c8b28/fonttools-4.61.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d8db08051fc9e7d8bc622f2112511b8107d8f27cd89e2f64ec45e9825e8288da", size = 4897863, upload-time = "2025-12-12T17:29:32.535Z" },
+ { url = "https://files.pythonhosted.org/packages/69/69/4ca02ee367d2c98edcaeb83fc278d20972502ee071214ad9d8ca85e06080/fonttools-4.61.1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:a76d4cb80f41ba94a6691264be76435e5f72f2cb3cab0b092a6212855f71c2f6", size = 4859076, upload-time = "2025-12-12T17:29:34.907Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/f5/660f9e3cefa078861a7f099107c6d203b568a6227eef163dd173bfc56bdc/fonttools-4.61.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:a13fc8aeb24bad755eea8f7f9d409438eb94e82cf86b08fe77a03fbc8f6a96b1", size = 4875623, upload-time = "2025-12-12T17:29:37.33Z" },
+ { url = "https://files.pythonhosted.org/packages/63/d1/9d7c5091d2276ed47795c131c1bf9316c3c1ab2789c22e2f59e0572ccd38/fonttools-4.61.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:b846a1fcf8beadeb9ea4f44ec5bdde393e2f1569e17d700bfc49cd69bde75881", size = 4993327, upload-time = "2025-12-12T17:29:39.781Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/2d/28def73837885ae32260d07660a052b99f0aa00454867d33745dfe49dbf0/fonttools-4.61.1-cp310-cp310-win32.whl", hash = "sha256:78a7d3ab09dc47ac1a363a493e6112d8cabed7ba7caad5f54dbe2f08676d1b47", size = 1502180, upload-time = "2025-12-12T17:29:42.217Z" },
+ { url = "https://files.pythonhosted.org/packages/63/fa/bfdc98abb4dd2bd491033e85e3ba69a2313c850e759a6daa014bc9433b0f/fonttools-4.61.1-cp310-cp310-win_amd64.whl", hash = "sha256:eff1ac3cc66c2ac7cda1e64b4e2f3ffef474b7335f92fc3833fc632d595fcee6", size = 1550654, upload-time = "2025-12-12T17:29:44.564Z" },
+ { url = "https://files.pythonhosted.org/packages/69/12/bf9f4eaa2fad039356cc627587e30ed008c03f1cebd3034376b5ee8d1d44/fonttools-4.61.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:c6604b735bb12fef8e0efd5578c9fb5d3d8532d5001ea13a19cddf295673ee09", size = 2852213, upload-time = "2025-12-12T17:29:46.675Z" },
+ { url = "https://files.pythonhosted.org/packages/ac/49/4138d1acb6261499bedde1c07f8c2605d1d8f9d77a151e5507fd3ef084b6/fonttools-4.61.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:5ce02f38a754f207f2f06557523cd39a06438ba3aafc0639c477ac409fc64e37", size = 2401689, upload-time = "2025-12-12T17:29:48.769Z" },
+ { url = "https://files.pythonhosted.org/packages/e5/fe/e6ce0fe20a40e03aef906af60aa87668696f9e4802fa283627d0b5ed777f/fonttools-4.61.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:77efb033d8d7ff233385f30c62c7c79271c8885d5c9657d967ede124671bbdfb", size = 5058809, upload-time = "2025-12-12T17:29:51.701Z" },
+ { url = "https://files.pythonhosted.org/packages/79/61/1ca198af22f7dd22c17ab86e9024ed3c06299cfdb08170640e9996d501a0/fonttools-4.61.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:75c1a6dfac6abd407634420c93864a1e274ebc1c7531346d9254c0d8f6ca00f9", size = 5036039, upload-time = "2025-12-12T17:29:53.659Z" },
+ { url = "https://files.pythonhosted.org/packages/99/cc/fa1801e408586b5fce4da9f5455af8d770f4fc57391cd5da7256bb364d38/fonttools-4.61.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:0de30bfe7745c0d1ffa2b0b7048fb7123ad0d71107e10ee090fa0b16b9452e87", size = 5034714, upload-time = "2025-12-12T17:29:55.592Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/aa/b7aeafe65adb1b0a925f8f25725e09f078c635bc22754f3fecb7456955b0/fonttools-4.61.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:58b0ee0ab5b1fc9921eccfe11d1435added19d6494dde14e323f25ad2bc30c56", size = 5158648, upload-time = "2025-12-12T17:29:57.861Z" },
+ { url = "https://files.pythonhosted.org/packages/99/f9/08ea7a38663328881384c6e7777bbefc46fd7d282adfd87a7d2b84ec9d50/fonttools-4.61.1-cp311-cp311-win32.whl", hash = "sha256:f79b168428351d11e10c5aeb61a74e1851ec221081299f4cf56036a95431c43a", size = 2280681, upload-time = "2025-12-12T17:29:59.943Z" },
+ { url = "https://files.pythonhosted.org/packages/07/ad/37dd1ae5fa6e01612a1fbb954f0927681f282925a86e86198ccd7b15d515/fonttools-4.61.1-cp311-cp311-win_amd64.whl", hash = "sha256:fe2efccb324948a11dd09d22136fe2ac8a97d6c1347cf0b58a911dcd529f66b7", size = 2331951, upload-time = "2025-12-12T17:30:02.254Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/16/7decaa24a1bd3a70c607b2e29f0adc6159f36a7e40eaba59846414765fd4/fonttools-4.61.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:f3cb4a569029b9f291f88aafc927dd53683757e640081ca8c412781ea144565e", size = 2851593, upload-time = "2025-12-12T17:30:04.225Z" },
+ { url = "https://files.pythonhosted.org/packages/94/98/3c4cb97c64713a8cf499b3245c3bf9a2b8fd16a3e375feff2aed78f96259/fonttools-4.61.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:41a7170d042e8c0024703ed13b71893519a1a6d6e18e933e3ec7507a2c26a4b2", size = 2400231, upload-time = "2025-12-12T17:30:06.47Z" },
+ { url = "https://files.pythonhosted.org/packages/b7/37/82dbef0f6342eb01f54bca073ac1498433d6ce71e50c3c3282b655733b31/fonttools-4.61.1-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:10d88e55330e092940584774ee5e8a6971b01fc2f4d3466a1d6c158230880796", size = 4954103, upload-time = "2025-12-12T17:30:08.432Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/44/f3aeac0fa98e7ad527f479e161aca6c3a1e47bb6996b053d45226fe37bf2/fonttools-4.61.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:15acc09befd16a0fb8a8f62bc147e1a82817542d72184acca9ce6e0aeda9fa6d", size = 5004295, upload-time = "2025-12-12T17:30:10.56Z" },
+ { url = "https://files.pythonhosted.org/packages/14/e8/7424ced75473983b964d09f6747fa09f054a6d656f60e9ac9324cf40c743/fonttools-4.61.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e6bcdf33aec38d16508ce61fd81838f24c83c90a1d1b8c68982857038673d6b8", size = 4944109, upload-time = "2025-12-12T17:30:12.874Z" },
+ { url = "https://files.pythonhosted.org/packages/c8/8b/6391b257fa3d0b553d73e778f953a2f0154292a7a7a085e2374b111e5410/fonttools-4.61.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:5fade934607a523614726119164ff621e8c30e8fa1ffffbbd358662056ba69f0", size = 5093598, upload-time = "2025-12-12T17:30:15.79Z" },
+ { url = "https://files.pythonhosted.org/packages/d9/71/fd2ea96cdc512d92da5678a1c98c267ddd4d8c5130b76d0f7a80f9a9fde8/fonttools-4.61.1-cp312-cp312-win32.whl", hash = "sha256:75da8f28eff26defba42c52986de97b22106cb8f26515b7c22443ebc9c2d3261", size = 2269060, upload-time = "2025-12-12T17:30:18.058Z" },
+ { url = "https://files.pythonhosted.org/packages/80/3b/a3e81b71aed5a688e89dfe0e2694b26b78c7d7f39a5ffd8a7d75f54a12a8/fonttools-4.61.1-cp312-cp312-win_amd64.whl", hash = "sha256:497c31ce314219888c0e2fce5ad9178ca83fe5230b01a5006726cdf3ac9f24d9", size = 2319078, upload-time = "2025-12-12T17:30:22.862Z" },
+ { url = "https://files.pythonhosted.org/packages/c7/4e/ce75a57ff3aebf6fc1f4e9d508b8e5810618a33d900ad6c19eb30b290b97/fonttools-4.61.1-py3-none-any.whl", hash = "sha256:17d2bf5d541add43822bcf0c43d7d847b160c9bb01d15d5007d84e2217aaa371", size = 1148996, upload-time = "2025-12-12T17:31:21.03Z" },
+]
+
+[[package]]
+name = "fqdn"
+version = "1.5.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/30/3e/a80a8c077fd798951169626cde3e239adeba7dab75deb3555716415bd9b0/fqdn-1.5.1.tar.gz", hash = "sha256:105ed3677e767fb5ca086a0c1f4bb66ebc3c100be518f0e0d755d9eae164d89f", size = 6015, upload-time = "2021-03-11T07:16:29.08Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/cf/58/8acf1b3e91c58313ce5cb67df61001fc9dcd21be4fadb76c1a2d540e09ed/fqdn-1.5.1-py3-none-any.whl", hash = "sha256:3a179af3761e4df6eb2e026ff9e1a3033d3587bf980a0b1b2e1e5d08d7358014", size = 9121, upload-time = "2021-03-11T07:16:28.351Z" },
+]
+
+[[package]]
+name = "fsspec"
+version = "2026.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d5/7d/5df2650c57d47c57232af5ef4b4fdbff182070421e405e0d62c6cdbfaa87/fsspec-2026.1.0.tar.gz", hash = "sha256:e987cb0496a0d81bba3a9d1cee62922fb395e7d4c3b575e57f547953334fe07b", size = 310496, upload-time = "2026-01-09T15:21:35.562Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/01/c9/97cc5aae1648dcb851958a3ddf73ccd7dbe5650d95203ecb4d7720b4cdbf/fsspec-2026.1.0-py3-none-any.whl", hash = "sha256:cb76aa913c2285a3b49bdd5fc55b1d7c708d7208126b60f2eb8194fe1b4cbdcc", size = 201838, upload-time = "2026-01-09T15:21:34.041Z" },
+]
+
+[[package]]
+name = "ftfy"
+version = "6.1.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "wcwidth" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/97/16/79c6e17bd3465f6498282dd23813846c68cd0989fe60bfef68bb1918d041/ftfy-6.1.1.tar.gz", hash = "sha256:bfc2019f84fcd851419152320a6375604a0f1459c281b5b199b2cd0d2e727f8f", size = 63156, upload-time = "2022-02-09T19:44:17.423Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e1/1e/bf736f9576a8979752b826b75cbd83663ff86634ea3055a766e2d8ad3ee5/ftfy-6.1.1-py3-none-any.whl", hash = "sha256:0ffd33fce16b54cccaec78d6ec73d95ad370e5df5a25255c8966a6147bd667ca", size = 53098, upload-time = "2022-02-09T19:44:15.655Z" },
+]
+
+[[package]]
+name = "fvcore"
+version = "0.1.5.post20221221"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "iopath" },
+ { name = "numpy" },
+ { name = "pillow" },
+ { name = "pyyaml" },
+ { name = "tabulate" },
+ { name = "termcolor" },
+ { name = "tqdm" },
+ { name = "yacs" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a5/93/d056a9c4efc6c79ba7b5159cc66bb436db93d2cc46dca18ed65c59cc8e4e/fvcore-0.1.5.post20221221.tar.gz", hash = "sha256:f2fb0bb90572ae651c11c78e20493ed19b2240550a7e4bbb2d6de87bdd037860", size = 50217, upload-time = "2022-12-21T08:10:53.563Z" }
+
+[[package]]
+name = "gitdb"
+version = "4.0.12"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "smmap" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/72/94/63b0fc47eb32792c7ba1fe1b694daec9a63620db1e313033d18140c2320a/gitdb-4.0.12.tar.gz", hash = "sha256:5ef71f855d191a3326fcfbc0d5da835f26b13fbcba60c32c21091c349ffdb571", size = 394684, upload-time = "2025-01-02T07:20:46.413Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/a0/61/5c78b91c3143ed5c14207f463aecfc8f9dbb5092fb2869baf37c273b2705/gitdb-4.0.12-py3-none-any.whl", hash = "sha256:67073e15955400952c6565cc3e707c554a4eea2e428946f7a4c162fab9bd9bcf", size = 62794, upload-time = "2025-01-02T07:20:43.624Z" },
+]
+
+[[package]]
+name = "gitpython"
+version = "3.1.31"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "gitdb" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/5f/11/2b0f60686dbda49028cec8c66bd18a5e82c96d92eef4bc34961e35bb3762/GitPython-3.1.31.tar.gz", hash = "sha256:8ce3bcf69adfdf7c7d503e78fd3b1c492af782d58893b650adb2ac8912ddd573", size = 195822, upload-time = "2023-02-16T16:33:16.327Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/9e/8a/d1e02cc111d65b0346f70abb83c51f8593e7134bf694a4a56d1a470caaf7/GitPython-3.1.31-py3-none-any.whl", hash = "sha256:f04893614f6aa713a60cbbe1e6a97403ef633103cdd0ef5eb6efe0deb98dbe8d", size = 184332, upload-time = "2023-02-16T16:33:14.13Z" },
+]
+
+[[package]]
+name = "grpcio"
+version = "1.76.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b6/e0/318c1ce3ae5a17894d5791e87aea147587c9e702f24122cc7a5c8bbaeeb1/grpcio-1.76.0.tar.gz", hash = "sha256:7be78388d6da1a25c0d5ec506523db58b18be22d9c37d8d3a32c08be4987bd73", size = 12785182, upload-time = "2025-10-21T16:23:12.106Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/88/17/ff4795dc9a34b6aee6ec379f1b66438a3789cd1315aac0cbab60d92f74b3/grpcio-1.76.0-cp310-cp310-linux_armv7l.whl", hash = "sha256:65a20de41e85648e00305c1bb09a3598f840422e522277641145a32d42dcefcc", size = 5840037, upload-time = "2025-10-21T16:20:25.069Z" },
+ { url = "https://files.pythonhosted.org/packages/4e/ff/35f9b96e3fa2f12e1dcd58a4513a2e2294a001d64dec81677361b7040c9a/grpcio-1.76.0-cp310-cp310-macosx_11_0_universal2.whl", hash = "sha256:40ad3afe81676fd9ec6d9d406eda00933f218038433980aa19d401490e46ecde", size = 11836482, upload-time = "2025-10-21T16:20:30.113Z" },
+ { url = "https://files.pythonhosted.org/packages/3e/1c/8374990f9545e99462caacea5413ed783014b3b66ace49e35c533f07507b/grpcio-1.76.0-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:035d90bc79eaa4bed83f524331d55e35820725c9fbb00ffa1904d5550ed7ede3", size = 6407178, upload-time = "2025-10-21T16:20:32.733Z" },
+ { url = "https://files.pythonhosted.org/packages/1e/77/36fd7d7c75a6c12542c90a6d647a27935a1ecaad03e0ffdb7c42db6b04d2/grpcio-1.76.0-cp310-cp310-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:4215d3a102bd95e2e11b5395c78562967959824156af11fa93d18fdd18050990", size = 7075684, upload-time = "2025-10-21T16:20:35.435Z" },
+ { url = "https://files.pythonhosted.org/packages/38/f7/e3cdb252492278e004722306c5a8935eae91e64ea11f0af3437a7de2e2b7/grpcio-1.76.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:49ce47231818806067aea3324d4bf13825b658ad662d3b25fada0bdad9b8a6af", size = 6611133, upload-time = "2025-10-21T16:20:37.541Z" },
+ { url = "https://files.pythonhosted.org/packages/7e/20/340db7af162ccd20a0893b5f3c4a5d676af7b71105517e62279b5b61d95a/grpcio-1.76.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:8cc3309d8e08fd79089e13ed4819d0af72aa935dd8f435a195fd152796752ff2", size = 7195507, upload-time = "2025-10-21T16:20:39.643Z" },
+ { url = "https://files.pythonhosted.org/packages/10/f0/b2160addc1487bd8fa4810857a27132fb4ce35c1b330c2f3ac45d697b106/grpcio-1.76.0-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:971fd5a1d6e62e00d945423a567e42eb1fa678ba89072832185ca836a94daaa6", size = 8160651, upload-time = "2025-10-21T16:20:42.492Z" },
+ { url = "https://files.pythonhosted.org/packages/2c/2c/ac6f98aa113c6ef111b3f347854e99ebb7fb9d8f7bb3af1491d438f62af4/grpcio-1.76.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:9d9adda641db7207e800a7f089068f6f645959f2df27e870ee81d44701dd9db3", size = 7620568, upload-time = "2025-10-21T16:20:45.995Z" },
+ { url = "https://files.pythonhosted.org/packages/90/84/7852f7e087285e3ac17a2703bc4129fafee52d77c6c82af97d905566857e/grpcio-1.76.0-cp310-cp310-win32.whl", hash = "sha256:063065249d9e7e0782d03d2bca50787f53bd0fb89a67de9a7b521c4a01f1989b", size = 3998879, upload-time = "2025-10-21T16:20:48.592Z" },
+ { url = "https://files.pythonhosted.org/packages/10/30/d3d2adcbb6dd3ff59d6ac3df6ef830e02b437fb5c90990429fd180e52f30/grpcio-1.76.0-cp310-cp310-win_amd64.whl", hash = "sha256:a6ae758eb08088d36812dd5d9af7a9859c05b1e0f714470ea243694b49278e7b", size = 4706892, upload-time = "2025-10-21T16:20:50.697Z" },
+ { url = "https://files.pythonhosted.org/packages/a0/00/8163a1beeb6971f66b4bbe6ac9457b97948beba8dd2fc8e1281dce7f79ec/grpcio-1.76.0-cp311-cp311-linux_armv7l.whl", hash = "sha256:2e1743fbd7f5fa713a1b0a8ac8ebabf0ec980b5d8809ec358d488e273b9cf02a", size = 5843567, upload-time = "2025-10-21T16:20:52.829Z" },
+ { url = "https://files.pythonhosted.org/packages/10/c1/934202f5cf335e6d852530ce14ddb0fef21be612ba9ecbbcbd4d748ca32d/grpcio-1.76.0-cp311-cp311-macosx_11_0_universal2.whl", hash = "sha256:a8c2cf1209497cf659a667d7dea88985e834c24b7c3b605e6254cbb5076d985c", size = 11848017, upload-time = "2025-10-21T16:20:56.705Z" },
+ { url = "https://files.pythonhosted.org/packages/11/0b/8dec16b1863d74af6eb3543928600ec2195af49ca58b16334972f6775663/grpcio-1.76.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:08caea849a9d3c71a542827d6df9d5a69067b0a1efbea8a855633ff5d9571465", size = 6412027, upload-time = "2025-10-21T16:20:59.3Z" },
+ { url = "https://files.pythonhosted.org/packages/d7/64/7b9e6e7ab910bea9d46f2c090380bab274a0b91fb0a2fe9b0cd399fffa12/grpcio-1.76.0-cp311-cp311-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:f0e34c2079d47ae9f6188211db9e777c619a21d4faba6977774e8fa43b085e48", size = 7075913, upload-time = "2025-10-21T16:21:01.645Z" },
+ { url = "https://files.pythonhosted.org/packages/68/86/093c46e9546073cefa789bd76d44c5cb2abc824ca62af0c18be590ff13ba/grpcio-1.76.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:8843114c0cfce61b40ad48df65abcfc00d4dba82eae8718fab5352390848c5da", size = 6615417, upload-time = "2025-10-21T16:21:03.844Z" },
+ { url = "https://files.pythonhosted.org/packages/f7/b6/5709a3a68500a9c03da6fb71740dcdd5ef245e39266461a03f31a57036d8/grpcio-1.76.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:8eddfb4d203a237da6f3cc8a540dad0517d274b5a1e9e636fd8d2c79b5c1d397", size = 7199683, upload-time = "2025-10-21T16:21:06.195Z" },
+ { url = "https://files.pythonhosted.org/packages/91/d3/4b1f2bf16ed52ce0b508161df3a2d186e4935379a159a834cb4a7d687429/grpcio-1.76.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:32483fe2aab2c3794101c2a159070584e5db11d0aa091b2c0ea9c4fc43d0d749", size = 8163109, upload-time = "2025-10-21T16:21:08.498Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/61/d9043f95f5f4cf085ac5dd6137b469d41befb04bd80280952ffa2a4c3f12/grpcio-1.76.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:dcfe41187da8992c5f40aa8c5ec086fa3672834d2be57a32384c08d5a05b4c00", size = 7626676, upload-time = "2025-10-21T16:21:10.693Z" },
+ { url = "https://files.pythonhosted.org/packages/36/95/fd9a5152ca02d8881e4dd419cdd790e11805979f499a2e5b96488b85cf27/grpcio-1.76.0-cp311-cp311-win32.whl", hash = "sha256:2107b0c024d1b35f4083f11245c0e23846ae64d02f40b2b226684840260ed054", size = 3997688, upload-time = "2025-10-21T16:21:12.746Z" },
+ { url = "https://files.pythonhosted.org/packages/60/9c/5c359c8d4c9176cfa3c61ecd4efe5affe1f38d9bae81e81ac7186b4c9cc8/grpcio-1.76.0-cp311-cp311-win_amd64.whl", hash = "sha256:522175aba7af9113c48ec10cc471b9b9bd4f6ceb36aeb4544a8e2c80ed9d252d", size = 4709315, upload-time = "2025-10-21T16:21:15.26Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/05/8e29121994b8d959ffa0afd28996d452f291b48cfc0875619de0bde2c50c/grpcio-1.76.0-cp312-cp312-linux_armv7l.whl", hash = "sha256:81fd9652b37b36f16138611c7e884eb82e0cec137c40d3ef7c3f9b3ed00f6ed8", size = 5799718, upload-time = "2025-10-21T16:21:17.939Z" },
+ { url = "https://files.pythonhosted.org/packages/d9/75/11d0e66b3cdf998c996489581bdad8900db79ebd83513e45c19548f1cba4/grpcio-1.76.0-cp312-cp312-macosx_11_0_universal2.whl", hash = "sha256:04bbe1bfe3a68bbfd4e52402ab7d4eb59d72d02647ae2042204326cf4bbad280", size = 11825627, upload-time = "2025-10-21T16:21:20.466Z" },
+ { url = "https://files.pythonhosted.org/packages/28/50/2f0aa0498bc188048f5d9504dcc5c2c24f2eb1a9337cd0fa09a61a2e75f0/grpcio-1.76.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:d388087771c837cdb6515539f43b9d4bf0b0f23593a24054ac16f7a960be16f4", size = 6359167, upload-time = "2025-10-21T16:21:23.122Z" },
+ { url = "https://files.pythonhosted.org/packages/66/e5/bbf0bb97d29ede1d59d6588af40018cfc345b17ce979b7b45424628dc8bb/grpcio-1.76.0-cp312-cp312-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:9f8f757bebaaea112c00dba718fc0d3260052ce714e25804a03f93f5d1c6cc11", size = 7044267, upload-time = "2025-10-21T16:21:25.995Z" },
+ { url = "https://files.pythonhosted.org/packages/f5/86/f6ec2164f743d9609691115ae8ece098c76b894ebe4f7c94a655c6b03e98/grpcio-1.76.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:980a846182ce88c4f2f7e2c22c56aefd515daeb36149d1c897f83cf57999e0b6", size = 6573963, upload-time = "2025-10-21T16:21:28.631Z" },
+ { url = "https://files.pythonhosted.org/packages/60/bc/8d9d0d8505feccfdf38a766d262c71e73639c165b311c9457208b56d92ae/grpcio-1.76.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:f92f88e6c033db65a5ae3d97905c8fea9c725b63e28d5a75cb73b49bda5024d8", size = 7164484, upload-time = "2025-10-21T16:21:30.837Z" },
+ { url = "https://files.pythonhosted.org/packages/67/e6/5d6c2fc10b95edf6df9b8f19cf10a34263b7fd48493936fffd5085521292/grpcio-1.76.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:4baf3cbe2f0be3289eb68ac8ae771156971848bb8aaff60bad42005539431980", size = 8127777, upload-time = "2025-10-21T16:21:33.577Z" },
+ { url = "https://files.pythonhosted.org/packages/3f/c8/dce8ff21c86abe025efe304d9e31fdb0deaaa3b502b6a78141080f206da0/grpcio-1.76.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:615ba64c208aaceb5ec83bfdce7728b80bfeb8be97562944836a7a0a9647d882", size = 7594014, upload-time = "2025-10-21T16:21:41.882Z" },
+ { url = "https://files.pythonhosted.org/packages/e0/42/ad28191ebf983a5d0ecef90bab66baa5a6b18f2bfdef9d0a63b1973d9f75/grpcio-1.76.0-cp312-cp312-win32.whl", hash = "sha256:45d59a649a82df5718fd9527ce775fd66d1af35e6d31abdcdc906a49c6822958", size = 3984750, upload-time = "2025-10-21T16:21:44.006Z" },
+ { url = "https://files.pythonhosted.org/packages/9e/00/7bd478cbb851c04a48baccaa49b75abaa8e4122f7d86da797500cccdd771/grpcio-1.76.0-cp312-cp312-win_amd64.whl", hash = "sha256:c088e7a90b6017307f423efbb9d1ba97a22aa2170876223f9709e9d1de0b5347", size = 4704003, upload-time = "2025-10-21T16:21:46.244Z" },
+]
+
+[[package]]
+name = "h11"
+version = "0.16.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/01/ee/02a2c011bdab74c6fb3c75474d40b3052059d95df7e73351460c8588d963/h11-0.16.0.tar.gz", hash = "sha256:4e35b956cf45792e4caa5885e69fba00bdbc6ffafbfa020300e549b208ee5ff1", size = 101250, upload-time = "2025-04-24T03:35:25.427Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515, upload-time = "2025-04-24T03:35:24.344Z" },
+]
+
+[[package]]
+name = "hf-xet"
+version = "1.2.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/5e/6e/0f11bacf08a67f7fb5ee09740f2ca54163863b07b70d579356e9222ce5d8/hf_xet-1.2.0.tar.gz", hash = "sha256:a8c27070ca547293b6890c4bf389f713f80e8c478631432962bb7f4bc0bd7d7f", size = 506020, upload-time = "2025-10-24T19:04:32.129Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/96/2d/22338486473df5923a9ab7107d375dbef9173c338ebef5098ef593d2b560/hf_xet-1.2.0-cp37-abi3-macosx_10_12_x86_64.whl", hash = "sha256:46740d4ac024a7ca9b22bebf77460ff43332868b661186a8e46c227fdae01848", size = 2866099, upload-time = "2025-10-24T19:04:15.366Z" },
+ { url = "https://files.pythonhosted.org/packages/7f/8c/c5becfa53234299bc2210ba314eaaae36c2875e0045809b82e40a9544f0c/hf_xet-1.2.0-cp37-abi3-macosx_11_0_arm64.whl", hash = "sha256:27df617a076420d8845bea087f59303da8be17ed7ec0cd7ee3b9b9f579dff0e4", size = 2722178, upload-time = "2025-10-24T19:04:13.695Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/92/cf3ab0b652b082e66876d08da57fcc6fa2f0e6c70dfbbafbd470bb73eb47/hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3651fd5bfe0281951b988c0facbe726aa5e347b103a675f49a3fa8144c7968fd", size = 3320214, upload-time = "2025-10-24T19:04:03.596Z" },
+ { url = "https://files.pythonhosted.org/packages/46/92/3f7ec4a1b6a65bf45b059b6d4a5d38988f63e193056de2f420137e3c3244/hf_xet-1.2.0-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:d06fa97c8562fb3ee7a378dd9b51e343bc5bc8190254202c9771029152f5e08c", size = 3229054, upload-time = "2025-10-24T19:04:01.949Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/dd/7ac658d54b9fb7999a0ccb07ad863b413cbaf5cf172f48ebcd9497ec7263/hf_xet-1.2.0-cp37-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:4c1428c9ae73ec0939410ec73023c4f842927f39db09b063b9482dac5a3bb737", size = 3413812, upload-time = "2025-10-24T19:04:24.585Z" },
+ { url = "https://files.pythonhosted.org/packages/92/68/89ac4e5b12a9ff6286a12174c8538a5930e2ed662091dd2572bbe0a18c8a/hf_xet-1.2.0-cp37-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a55558084c16b09b5ed32ab9ed38421e2d87cf3f1f89815764d1177081b99865", size = 3508920, upload-time = "2025-10-24T19:04:26.927Z" },
+ { url = "https://files.pythonhosted.org/packages/cb/44/870d44b30e1dcfb6a65932e3e1506c103a8a5aea9103c337e7a53180322c/hf_xet-1.2.0-cp37-abi3-win_amd64.whl", hash = "sha256:e6584a52253f72c9f52f9e549d5895ca7a471608495c4ecaa6cc73dba2b24d69", size = 2905735, upload-time = "2025-10-24T19:04:35.928Z" },
+]
+
+[[package]]
+name = "httpcore"
+version = "1.0.9"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "certifi" },
+ { name = "h11" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/06/94/82699a10bca87a5556c9c59b5963f2d039dbd239f25bc2a63907a05a14cb/httpcore-1.0.9.tar.gz", hash = "sha256:6e34463af53fd2ab5d807f399a9b45ea31c3dfa2276f15a2c3f00afff6e176e8", size = 85484, upload-time = "2025-04-24T22:06:22.219Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = "2025-04-24T22:06:20.566Z" },
+]
+
+[[package]]
+name = "httpx"
+version = "0.28.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "anyio" },
+ { name = "certifi" },
+ { name = "httpcore" },
+ { name = "idna" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b1/df/48c586a5fe32a0f01324ee087459e112ebb7224f646c0b5023f5e79e9956/httpx-0.28.1.tar.gz", hash = "sha256:75e98c5f16b0f35b567856f597f06ff2270a374470a5c2392242528e3e3e42fc", size = 141406, upload-time = "2024-12-06T15:37:23.222Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl", hash = "sha256:d909fcccc110f8c7faf814ca82a9a4d816bc5a6dbfea25d6591d6985b8ba59ad", size = 73517, upload-time = "2024-12-06T15:37:21.509Z" },
+]
+
+[[package]]
+name = "huggingface-hub"
+version = "1.4.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "filelock" },
+ { name = "fsspec" },
+ { name = "hf-xet", marker = "platform_machine == 'AMD64' or platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'arm64' or platform_machine == 'x86_64'" },
+ { name = "httpx" },
+ { name = "packaging" },
+ { name = "pyyaml" },
+ { name = "shellingham" },
+ { name = "tqdm" },
+ { name = "typer-slim" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/d9/0e/e73927175162b8a4702b9f59268860f441fbe037c3960b1b6791eeb1deb7/huggingface_hub-1.4.0.tar.gz", hash = "sha256:dd8ca29409be10f544b624265f7ffe13a1a5c3f049f493b5dc9816ef3c6bd57b", size = 641608, upload-time = "2026-02-04T13:48:55.341Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3f/74/f0fb3a54fbca7c0aeff85f41d93b90ca3f6a36d918459401a3890763c54b/huggingface_hub-1.4.0-py3-none-any.whl", hash = "sha256:49d380ffddb31d9d4b6acc0792691f8fa077e1ed51980ed42c7abca62ec1b3b6", size = 553202, upload-time = "2026-02-04T13:48:53.545Z" },
+]
+
+[[package]]
+name = "hydra-core"
+version = "1.3.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "antlr4-python3-runtime" },
+ { name = "omegaconf" },
+ { name = "packaging" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/6d/8e/07e42bc434a847154083b315779b0a81d567154504624e181caf2c71cd98/hydra-core-1.3.2.tar.gz", hash = "sha256:8a878ed67216997c3e9d88a8e72e7b4767e81af37afb4ea3334b269a4390a824", size = 3263494, upload-time = "2023-02-23T18:33:43.03Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c6/50/e0edd38dcd63fb26a8547f13d28f7a008bc4a3fd4eb4ff030673f22ad41a/hydra_core-1.3.2-py3-none-any.whl", hash = "sha256:fa0238a9e31df3373b35b0bfb672c34cc92718d21f81311d8996a16de1141d8b", size = 154547, upload-time = "2023-02-23T18:33:40.801Z" },
+]
+
+[[package]]
+name = "idna"
+version = "3.11"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/6f/6d/0703ccc57f3a7233505399edb88de3cbd678da106337b9fcde432b65ed60/idna-3.11.tar.gz", hash = "sha256:795dafcc9c04ed0c1fb032c2aa73654d8e8c5023a7df64a53f39190ada629902", size = 194582, upload-time = "2025-10-12T14:55:20.501Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/0e/61/66938bbb5fc52dbdf84594873d5b51fb1f7c7794e9c0f5bd885f30bc507b/idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea", size = 71008, upload-time = "2025-10-12T14:55:18.883Z" },
+]
+
+[[package]]
+name = "imageio"
+version = "2.37.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "numpy" },
+ { name = "pillow" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a3/6f/606be632e37bf8d05b253e8626c2291d74c691ddc7bcdf7d6aaf33b32f6a/imageio-2.37.2.tar.gz", hash = "sha256:0212ef2727ac9caa5ca4b2c75ae89454312f440a756fcfc8ef1993e718f50f8a", size = 389600, upload-time = "2025-11-04T14:29:39.898Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/fb/fe/301e0936b79bcab4cacc7548bf2853fc28dced0a578bab1f7ef53c9aa75b/imageio-2.37.2-py3-none-any.whl", hash = "sha256:ad9adfb20335d718c03de457358ed69f141021a333c40a53e57273d8a5bd0b9b", size = 317646, upload-time = "2025-11-04T14:29:37.948Z" },
+]
+
+[[package]]
+name = "iniconfig"
+version = "2.3.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/72/34/14ca021ce8e5dfedc35312d08ba8bf51fdd999c576889fc2c24cb97f4f10/iniconfig-2.3.0.tar.gz", hash = "sha256:c76315c77db068650d49c5b56314774a7804df16fee4402c1f19d6d15d8c4730", size = 20503, upload-time = "2025-10-18T21:55:43.219Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484, upload-time = "2025-10-18T21:55:41.639Z" },
+]
+
+[[package]]
+name = "iopath"
+version = "0.1.10"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "portalocker" },
+ { name = "tqdm" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/72/73/b3d451dfc523756cf177d3ebb0af76dc7751b341c60e2a21871be400ae29/iopath-0.1.10.tar.gz", hash = "sha256:3311c16a4d9137223e20f141655759933e1eda24f8bff166af834af3c645ef01", size = 42226, upload-time = "2022-07-09T19:00:50.866Z" }
+
+[[package]]
+name = "ipycanvas"
+version = "0.14.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "ipywidgets" },
+ { name = "numpy" },
+ { name = "pillow" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/50/56/484c8979bbcaa3e3f2da4eac6a1eb41e998e353e4c6ef89e9612889813c8/ipycanvas-0.14.3.tar.gz", hash = "sha256:c6a53a22eebf4d611b168b8f4434145883f27a7575509bd99a4bfc48c5385a39", size = 4150499, upload-time = "2025-12-11T09:12:59.916Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/11/37/c6880bd16093793dcb4c005011cf968f45fd815b7b5094fa8374524add26/ipycanvas-0.14.3-py2.py3-none-any.whl", hash = "sha256:8a2f48e1e079355d3e7d5683e5c6e7684a87c15c3750c8d8cd2289c95383ee3e", size = 142962, upload-time = "2025-12-11T09:12:50.5Z" },
+]
+
+[[package]]
+name = "ipykernel"
+version = "7.1.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "appnope", marker = "sys_platform == 'darwin'" },
+ { name = "comm" },
+ { name = "debugpy" },
+ { name = "ipython", version = "8.38.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "ipython", version = "9.10.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "jupyter-client" },
+ { name = "jupyter-core" },
+ { name = "matplotlib-inline" },
+ { name = "nest-asyncio" },
+ { name = "packaging" },
+ { name = "psutil" },
+ { name = "pyzmq" },
+ { name = "tornado" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b9/a4/4948be6eb88628505b83a1f2f40d90254cab66abf2043b3c40fa07dfce0f/ipykernel-7.1.0.tar.gz", hash = "sha256:58a3fc88533d5930c3546dc7eac66c6d288acde4f801e2001e65edc5dc9cf0db", size = 174579, upload-time = "2025-10-27T09:46:39.471Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/a3/17/20c2552266728ceba271967b87919664ecc0e33efca29c3efc6baf88c5f9/ipykernel-7.1.0-py3-none-any.whl", hash = "sha256:763b5ec6c5b7776f6a8d7ce09b267693b4e5ce75cb50ae696aaefb3c85e1ea4c", size = 117968, upload-time = "2025-10-27T09:46:37.805Z" },
+]
+
+[[package]]
+name = "ipympl"
+version = "0.10.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "ipython", version = "8.38.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "ipython", version = "9.10.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "ipywidgets" },
+ { name = "matplotlib" },
+ { name = "numpy" },
+ { name = "pillow" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/16/9c/f79e29f6262e821a15757662aa11cbb1db0a51ef836a32a46ddcb25e6832/ipympl-0.10.0.tar.gz", hash = "sha256:eda69602a010af2a42e8ebd069b0ee0dbe8df7fc69d7c1e8b99fece0a2fe613f", size = 3595672, upload-time = "2026-01-21T20:19:47.971Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/12/b3/88c0ef22878c86035f058df0ac6c171319ffd0aa52a406455ed3a3847566/ipympl-0.10.0-py3-none-any.whl", hash = "sha256:a09c4f0ff86490cc62aed45e53b912fb706e3ec3506c4a51ce4a670d6667f5ce", size = 519020, upload-time = "2026-01-21T20:19:46.325Z" },
+]
+
+[[package]]
+name = "ipython"
+version = "8.38.0"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version < '3.11' and sys_platform == 'darwin'",
+ "python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",
+]
+dependencies = [
+ { name = "colorama", marker = "python_full_version < '3.11' and sys_platform == 'win32'" },
+ { name = "decorator", marker = "python_full_version < '3.11'" },
+ { name = "exceptiongroup", marker = "python_full_version < '3.11'" },
+ { name = "jedi", marker = "python_full_version < '3.11'" },
+ { name = "matplotlib-inline", marker = "python_full_version < '3.11'" },
+ { name = "pexpect", marker = "python_full_version < '3.11' and sys_platform != 'emscripten' and sys_platform != 'win32'" },
+ { name = "prompt-toolkit", marker = "python_full_version < '3.11'" },
+ { name = "pygments", marker = "python_full_version < '3.11'" },
+ { name = "stack-data", marker = "python_full_version < '3.11'" },
+ { name = "traitlets", marker = "python_full_version < '3.11'" },
+ { name = "typing-extensions", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/e5/61/1810830e8b93c72dcd3c0f150c80a00c3deb229562d9423807ec92c3a539/ipython-8.38.0.tar.gz", hash = "sha256:9cfea8c903ce0867cc2f23199ed8545eb741f3a69420bfcf3743ad1cec856d39", size = 5513996, upload-time = "2026-01-05T10:59:06.901Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/9f/df/db59624f4c71b39717c423409950ac3f2c8b2ce4b0aac843112c7fb3f721/ipython-8.38.0-py3-none-any.whl", hash = "sha256:750162629d800ac65bb3b543a14e7a74b0e88063eac9b92124d4b2aa3f6d8e86", size = 831813, upload-time = "2026-01-05T10:59:04.239Z" },
+]
+
+[[package]]
+name = "ipython"
+version = "9.10.0"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version >= '3.12' and sys_platform == 'darwin'",
+ "python_full_version >= '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version >= '3.12' and sys_platform == 'win32'",
+ "python_full_version >= '3.12' and sys_platform == 'emscripten'",
+ "(python_full_version >= '3.12' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version == '3.11.*' and sys_platform == 'darwin'",
+ "python_full_version == '3.11.*' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version == '3.11.*' and sys_platform == 'win32'",
+ "python_full_version == '3.11.*' and sys_platform == 'emscripten'",
+ "(python_full_version == '3.11.*' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version == '3.11.*' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+]
+dependencies = [
+ { name = "colorama", marker = "python_full_version >= '3.11' and sys_platform == 'win32'" },
+ { name = "decorator", marker = "python_full_version >= '3.11'" },
+ { name = "ipython-pygments-lexers", marker = "python_full_version >= '3.11'" },
+ { name = "jedi", marker = "python_full_version >= '3.11'" },
+ { name = "matplotlib-inline", marker = "python_full_version >= '3.11'" },
+ { name = "pexpect", marker = "python_full_version >= '3.11' and sys_platform != 'emscripten' and sys_platform != 'win32'" },
+ { name = "prompt-toolkit", marker = "python_full_version >= '3.11'" },
+ { name = "pygments", marker = "python_full_version >= '3.11'" },
+ { name = "stack-data", marker = "python_full_version >= '3.11'" },
+ { name = "traitlets", marker = "python_full_version >= '3.11'" },
+ { name = "typing-extensions", marker = "python_full_version == '3.11.*'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a6/60/2111715ea11f39b1535bed6024b7dec7918b71e5e5d30855a5b503056b50/ipython-9.10.0.tar.gz", hash = "sha256:cd9e656be97618a0676d058134cd44e6dc7012c0e5cb36a9ce96a8c904adaf77", size = 4426526, upload-time = "2026-02-02T10:00:33.594Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3d/aa/898dec789a05731cd5a9f50605b7b44a72bd198fd0d4528e11fc610177cc/ipython-9.10.0-py3-none-any.whl", hash = "sha256:c6ab68cc23bba8c7e18e9b932797014cc61ea7fd6f19de180ab9ba73e65ee58d", size = 622774, upload-time = "2026-02-02T10:00:31.503Z" },
+]
+
+[[package]]
+name = "ipython-pygments-lexers"
+version = "1.1.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "pygments", marker = "python_full_version >= '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/ef/4c/5dd1d8af08107f88c7f741ead7a40854b8ac24ddf9ae850afbcf698aa552/ipython_pygments_lexers-1.1.1.tar.gz", hash = "sha256:09c0138009e56b6854f9535736f4171d855c8c08a563a0dcd8022f78355c7e81", size = 8393, upload-time = "2025-01-17T11:24:34.505Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/d9/33/1f075bf72b0b747cb3288d011319aaf64083cf2efef8354174e3ed4540e2/ipython_pygments_lexers-1.1.1-py3-none-any.whl", hash = "sha256:a9462224a505ade19a605f71f8fa63c2048833ce50abc86768a0d81d876dc81c", size = 8074, upload-time = "2025-01-17T11:24:33.271Z" },
+]
+
+[[package]]
+name = "ipywidgets"
+version = "8.1.8"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "comm" },
+ { name = "ipython", version = "8.38.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "ipython", version = "9.10.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "jupyterlab-widgets" },
+ { name = "traitlets" },
+ { name = "widgetsnbextension" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/4c/ae/c5ce1edc1afe042eadb445e95b0671b03cee61895264357956e61c0d2ac0/ipywidgets-8.1.8.tar.gz", hash = "sha256:61f969306b95f85fba6b6986b7fe45d73124d1d9e3023a8068710d47a22ea668", size = 116739, upload-time = "2025-11-01T21:18:12.393Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/56/6d/0d9848617b9f753b87f214f1c682592f7ca42de085f564352f10f0843026/ipywidgets-8.1.8-py3-none-any.whl", hash = "sha256:ecaca67aed704a338f88f67b1181b58f821ab5dc89c1f0f5ef99db43c1c2921e", size = 139808, upload-time = "2025-11-01T21:18:10.956Z" },
+]
+
+[[package]]
+name = "isoduration"
+version = "20.11.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "arrow" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/7c/1a/3c8edc664e06e6bd06cce40c6b22da5f1429aa4224d0c590f3be21c91ead/isoduration-20.11.0.tar.gz", hash = "sha256:ac2f9015137935279eac671f94f89eb00584f940f5dc49462a0c4ee692ba1bd9", size = 11649, upload-time = "2020-11-01T11:00:00.312Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/7b/55/e5326141505c5d5e34c5e0935d2908a74e4561eca44108fbfb9c13d2911a/isoduration-20.11.0-py3-none-any.whl", hash = "sha256:b2904c2a4228c3d44f409c8ae8e2370eb21a26f7ac2ec5446df141dde3452042", size = 11321, upload-time = "2020-11-01T10:59:58.02Z" },
+]
+
+[[package]]
+name = "jedi"
+version = "0.19.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "parso" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/72/3a/79a912fbd4d8dd6fbb02bf69afd3bb72cf0c729bb3063c6f4498603db17a/jedi-0.19.2.tar.gz", hash = "sha256:4770dc3de41bde3966b02eb84fbcf557fb33cce26ad23da12c742fb50ecb11f0", size = 1231287, upload-time = "2024-11-11T01:41:42.873Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c0/5a/9cac0c82afec3d09ccd97c8b6502d48f165f9124db81b4bcb90b4af974ee/jedi-0.19.2-py2.py3-none-any.whl", hash = "sha256:a8ef22bde8490f57fe5c7681a3c83cb58874daf72b4784de3cce5b6ef6edb5b9", size = 1572278, upload-time = "2024-11-11T01:41:40.175Z" },
+]
+
+[[package]]
+name = "jinja2"
+version = "3.1.6"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "markupsafe" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/df/bf/f7da0350254c0ed7c72f3e33cef02e048281fec7ecec5f032d4aac52226b/jinja2-3.1.6.tar.gz", hash = "sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d", size = 245115, upload-time = "2025-03-05T20:05:02.478Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67", size = 134899, upload-time = "2025-03-05T20:05:00.369Z" },
+]
+
+[[package]]
+name = "joblib"
+version = "1.5.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/41/f2/d34e8b3a08a9cc79a50b2208a93dce981fe615b64d5a4d4abee421d898df/joblib-1.5.3.tar.gz", hash = "sha256:8561a3269e6801106863fd0d6d84bb737be9e7631e33aaed3fb9ce5953688da3", size = 331603, upload-time = "2025-12-15T08:41:46.427Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/7b/91/984aca2ec129e2757d1e4e3c81c3fcda9d0f85b74670a094cc443d9ee949/joblib-1.5.3-py3-none-any.whl", hash = "sha256:5fc3c5039fc5ca8c0276333a188bbd59d6b7ab37fe6632daa76bc7f9ec18e713", size = 309071, upload-time = "2025-12-15T08:41:44.973Z" },
+]
+
+[[package]]
+name = "json5"
+version = "0.13.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/77/e8/a3f261a66e4663f22700bc8a17c08cb83e91fbf086726e7a228398968981/json5-0.13.0.tar.gz", hash = "sha256:b1edf8d487721c0bf64d83c28e91280781f6e21f4a797d3261c7c828d4c165bf", size = 52441, upload-time = "2026-01-01T19:42:14.99Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/d7/9e/038522f50ceb7e74f1f991bf1b699f24b0c2bbe7c390dd36ad69f4582258/json5-0.13.0-py3-none-any.whl", hash = "sha256:9a08e1dd65f6a4d4c6fa82d216cf2477349ec2346a38fd70cc11d2557499fbcc", size = 36163, upload-time = "2026-01-01T19:42:13.962Z" },
+]
+
+[[package]]
+name = "jsonpointer"
+version = "3.0.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/6a/0a/eebeb1fa92507ea94016a2a790b93c2ae41a7e18778f85471dc54475ed25/jsonpointer-3.0.0.tar.gz", hash = "sha256:2b2d729f2091522d61c3b31f82e11870f60b68f43fbc705cb76bf4b832af59ef", size = 9114, upload-time = "2024-06-10T19:24:42.462Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/71/92/5e77f98553e9e75130c78900d000368476aed74276eb8ae8796f65f00918/jsonpointer-3.0.0-py2.py3-none-any.whl", hash = "sha256:13e088adc14fca8b6aa8177c044e12701e6ad4b28ff10e65f2267a90109c9942", size = 7595, upload-time = "2024-06-10T19:24:40.698Z" },
+]
+
+[[package]]
+name = "jsonschema"
+version = "4.26.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "attrs" },
+ { name = "jsonschema-specifications" },
+ { name = "referencing" },
+ { name = "rpds-py" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b3/fc/e067678238fa451312d4c62bf6e6cf5ec56375422aee02f9cb5f909b3047/jsonschema-4.26.0.tar.gz", hash = "sha256:0c26707e2efad8aa1bfc5b7ce170f3fccc2e4918ff85989ba9ffa9facb2be326", size = 366583, upload-time = "2026-01-07T13:41:07.246Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/69/90/f63fb5873511e014207a475e2bb4e8b2e570d655b00ac19a9a0ca0a385ee/jsonschema-4.26.0-py3-none-any.whl", hash = "sha256:d489f15263b8d200f8387e64b4c3a75f06629559fb73deb8fdfb525f2dab50ce", size = 90630, upload-time = "2026-01-07T13:41:05.306Z" },
+]
+
+[package.optional-dependencies]
+format-nongpl = [
+ { name = "fqdn" },
+ { name = "idna" },
+ { name = "isoduration" },
+ { name = "jsonpointer" },
+ { name = "rfc3339-validator" },
+ { name = "rfc3986-validator" },
+ { name = "rfc3987-syntax" },
+ { name = "uri-template" },
+ { name = "webcolors" },
+]
+
+[[package]]
+name = "jsonschema-specifications"
+version = "2025.9.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "referencing" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/19/74/a633ee74eb36c44aa6d1095e7cc5569bebf04342ee146178e2d36600708b/jsonschema_specifications-2025.9.1.tar.gz", hash = "sha256:b540987f239e745613c7a9176f3edb72b832a4ac465cf02712288397832b5e8d", size = 32855, upload-time = "2025-09-08T01:34:59.186Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/41/45/1a4ed80516f02155c51f51e8cedb3c1902296743db0bbc66608a0db2814f/jsonschema_specifications-2025.9.1-py3-none-any.whl", hash = "sha256:98802fee3a11ee76ecaca44429fda8a41bff98b00a0f2838151b113f210cc6fe", size = 18437, upload-time = "2025-09-08T01:34:57.871Z" },
+]
+
+[[package]]
+name = "jupyter"
+version = "1.1.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "ipykernel" },
+ { name = "ipywidgets" },
+ { name = "jupyter-console" },
+ { name = "jupyterlab" },
+ { name = "nbconvert" },
+ { name = "notebook" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/58/f3/af28ea964ab8bc1e472dba2e82627d36d470c51f5cd38c37502eeffaa25e/jupyter-1.1.1.tar.gz", hash = "sha256:d55467bceabdea49d7e3624af7e33d59c37fff53ed3a350e1ac957bed731de7a", size = 5714959, upload-time = "2024-08-30T07:15:48.299Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/38/64/285f20a31679bf547b75602702f7800e74dbabae36ef324f716c02804753/jupyter-1.1.1-py2.py3-none-any.whl", hash = "sha256:7a59533c22af65439b24bbe60373a4e95af8f16ac65a6c00820ad378e3f7cc83", size = 2657, upload-time = "2024-08-30T07:15:47.045Z" },
+]
+
+[[package]]
+name = "jupyter-client"
+version = "8.8.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "jupyter-core" },
+ { name = "python-dateutil" },
+ { name = "pyzmq" },
+ { name = "tornado" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/05/e4/ba649102a3bc3fbca54e7239fb924fd434c766f855693d86de0b1f2bec81/jupyter_client-8.8.0.tar.gz", hash = "sha256:d556811419a4f2d96c869af34e854e3f059b7cc2d6d01a9cd9c85c267691be3e", size = 348020, upload-time = "2026-01-08T13:55:47.938Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/2d/0b/ceb7694d864abc0a047649aec263878acb9f792e1fec3e676f22dc9015e3/jupyter_client-8.8.0-py3-none-any.whl", hash = "sha256:f93a5b99c5e23a507b773d3a1136bd6e16c67883ccdbd9a829b0bbdb98cd7d7a", size = 107371, upload-time = "2026-01-08T13:55:45.562Z" },
+]
+
+[[package]]
+name = "jupyter-console"
+version = "6.6.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "ipykernel" },
+ { name = "ipython", version = "8.38.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "ipython", version = "9.10.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "jupyter-client" },
+ { name = "jupyter-core" },
+ { name = "prompt-toolkit" },
+ { name = "pygments" },
+ { name = "pyzmq" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/bd/2d/e2fd31e2fc41c14e2bcb6c976ab732597e907523f6b2420305f9fc7fdbdb/jupyter_console-6.6.3.tar.gz", hash = "sha256:566a4bf31c87adbfadf22cdf846e3069b59a71ed5da71d6ba4d8aaad14a53539", size = 34363, upload-time = "2023-03-06T14:13:31.02Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ca/77/71d78d58f15c22db16328a476426f7ac4a60d3a5a7ba3b9627ee2f7903d4/jupyter_console-6.6.3-py3-none-any.whl", hash = "sha256:309d33409fcc92ffdad25f0bcdf9a4a9daa61b6f341177570fdac03de5352485", size = 24510, upload-time = "2023-03-06T14:13:28.229Z" },
+]
+
+[[package]]
+name = "jupyter-core"
+version = "5.9.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "platformdirs" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/02/49/9d1284d0dc65e2c757b74c6687b6d319b02f822ad039e5c512df9194d9dd/jupyter_core-5.9.1.tar.gz", hash = "sha256:4d09aaff303b9566c3ce657f580bd089ff5c91f5f89cf7d8846c3cdf465b5508", size = 89814, upload-time = "2025-10-16T19:19:18.444Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e7/e7/80988e32bf6f73919a113473a604f5a8f09094de312b9d52b79c2df7612b/jupyter_core-5.9.1-py3-none-any.whl", hash = "sha256:ebf87fdc6073d142e114c72c9e29a9d7ca03fad818c5d300ce2adc1fb0743407", size = 29032, upload-time = "2025-10-16T19:19:16.783Z" },
+]
+
+[[package]]
+name = "jupyter-events"
+version = "0.12.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "jsonschema", extra = ["format-nongpl"] },
+ { name = "packaging" },
+ { name = "python-json-logger" },
+ { name = "pyyaml" },
+ { name = "referencing" },
+ { name = "rfc3339-validator" },
+ { name = "rfc3986-validator" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9d/c3/306d090461e4cf3cd91eceaff84bede12a8e52cd821c2d20c9a4fd728385/jupyter_events-0.12.0.tar.gz", hash = "sha256:fc3fce98865f6784c9cd0a56a20644fc6098f21c8c33834a8d9fe383c17e554b", size = 62196, upload-time = "2025-02-03T17:23:41.485Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e2/48/577993f1f99c552f18a0428731a755e06171f9902fa118c379eb7c04ea22/jupyter_events-0.12.0-py3-none-any.whl", hash = "sha256:6464b2fa5ad10451c3d35fabc75eab39556ae1e2853ad0c0cc31b656731a97fb", size = 19430, upload-time = "2025-02-03T17:23:38.643Z" },
+]
+
+[[package]]
+name = "jupyter-lsp"
+version = "2.3.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "jupyter-server" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/eb/5a/9066c9f8e94ee517133cd98dba393459a16cd48bba71a82f16a65415206c/jupyter_lsp-2.3.0.tar.gz", hash = "sha256:458aa59339dc868fb784d73364f17dbce8836e906cd75fd471a325cba02e0245", size = 54823, upload-time = "2025-08-27T17:47:34.671Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1a/60/1f6cee0c46263de1173894f0fafcb3475ded276c472c14d25e0280c18d6d/jupyter_lsp-2.3.0-py3-none-any.whl", hash = "sha256:e914a3cb2addf48b1c7710914771aaf1819d46b2e5a79b0f917b5478ec93f34f", size = 76687, upload-time = "2025-08-27T17:47:33.15Z" },
+]
+
+[[package]]
+name = "jupyter-server"
+version = "2.17.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "anyio" },
+ { name = "argon2-cffi" },
+ { name = "jinja2" },
+ { name = "jupyter-client" },
+ { name = "jupyter-core" },
+ { name = "jupyter-events" },
+ { name = "jupyter-server-terminals" },
+ { name = "nbconvert" },
+ { name = "nbformat" },
+ { name = "overrides", marker = "python_full_version < '3.12'" },
+ { name = "packaging" },
+ { name = "prometheus-client" },
+ { name = "pywinpty", marker = "(os_name == 'nt' and platform_machine != 'aarch64' and sys_platform == 'linux') or (os_name == 'nt' and sys_platform != 'darwin' and sys_platform != 'linux')" },
+ { name = "pyzmq" },
+ { name = "send2trash" },
+ { name = "terminado" },
+ { name = "tornado" },
+ { name = "traitlets" },
+ { name = "websocket-client" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/5b/ac/e040ec363d7b6b1f11304cc9f209dac4517ece5d5e01821366b924a64a50/jupyter_server-2.17.0.tar.gz", hash = "sha256:c38ea898566964c888b4772ae1ed58eca84592e88251d2cfc4d171f81f7e99d5", size = 731949, upload-time = "2025-08-21T14:42:54.042Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/92/80/a24767e6ca280f5a49525d987bf3e4d7552bf67c8be07e8ccf20271f8568/jupyter_server-2.17.0-py3-none-any.whl", hash = "sha256:e8cb9c7db4251f51ed307e329b81b72ccf2056ff82d50524debde1ee1870e13f", size = 388221, upload-time = "2025-08-21T14:42:52.034Z" },
+]
+
+[[package]]
+name = "jupyter-server-terminals"
+version = "0.5.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "pywinpty", marker = "(os_name == 'nt' and platform_machine != 'aarch64' and sys_platform == 'linux') or (os_name == 'nt' and sys_platform != 'darwin' and sys_platform != 'linux')" },
+ { name = "terminado" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/f4/a7/bcd0a9b0cbba88986fe944aaaf91bfda603e5a50bda8ed15123f381a3b2f/jupyter_server_terminals-0.5.4.tar.gz", hash = "sha256:bbda128ed41d0be9020349f9f1f2a4ab9952a73ed5f5ac9f1419794761fb87f5", size = 31770, upload-time = "2026-01-14T16:53:20.213Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/d1/2d/6674563f71c6320841fc300911a55143925112a72a883e2ca71fba4c618d/jupyter_server_terminals-0.5.4-py3-none-any.whl", hash = "sha256:55be353fc74a80bc7f3b20e6be50a55a61cd525626f578dcb66a5708e2007d14", size = 13704, upload-time = "2026-01-14T16:53:18.738Z" },
+]
+
+[[package]]
+name = "jupyterlab"
+version = "4.5.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "async-lru" },
+ { name = "httpx" },
+ { name = "ipykernel" },
+ { name = "jinja2" },
+ { name = "jupyter-core" },
+ { name = "jupyter-lsp" },
+ { name = "jupyter-server" },
+ { name = "jupyterlab-server" },
+ { name = "notebook-shim" },
+ { name = "packaging" },
+ { name = "setuptools" },
+ { name = "tomli", marker = "python_full_version < '3.11'" },
+ { name = "tornado" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/3e/76/393eae3349f9a39bf21f8f5406e5244d36e2bfc932049b6070c271f92764/jupyterlab-4.5.3.tar.gz", hash = "sha256:4a159f71067cb38e4a82e86a42de8e7e926f384d7f2291964f282282096d27e8", size = 23939231, upload-time = "2026-01-23T15:04:25.768Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/9e/9a/0bf9a7a45f0006d7ff4fdc4fc313de4255acab02bf4db1887c65f0472c01/jupyterlab-4.5.3-py3-none-any.whl", hash = "sha256:63c9f3a48de72ba00df766ad6eed416394f5bb883829f11eeff0872302520ba7", size = 12391761, upload-time = "2026-01-23T15:04:21.214Z" },
+]
+
+[[package]]
+name = "jupyterlab-pygments"
+version = "0.3.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/90/51/9187be60d989df97f5f0aba133fa54e7300f17616e065d1ada7d7646b6d6/jupyterlab_pygments-0.3.0.tar.gz", hash = "sha256:721aca4d9029252b11cfa9d185e5b5af4d54772bb8072f9b7036f4170054d35d", size = 512900, upload-time = "2023-11-23T09:26:37.44Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/b1/dd/ead9d8ea85bf202d90cc513b533f9c363121c7792674f78e0d8a854b63b4/jupyterlab_pygments-0.3.0-py3-none-any.whl", hash = "sha256:841a89020971da1d8693f1a99997aefc5dc424bb1b251fd6322462a1b8842780", size = 15884, upload-time = "2023-11-23T09:26:34.325Z" },
+]
+
+[[package]]
+name = "jupyterlab-server"
+version = "2.28.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "babel" },
+ { name = "jinja2" },
+ { name = "json5" },
+ { name = "jsonschema" },
+ { name = "jupyter-server" },
+ { name = "packaging" },
+ { name = "requests" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/d6/2c/90153f189e421e93c4bb4f9e3f59802a1f01abd2ac5cf40b152d7f735232/jupyterlab_server-2.28.0.tar.gz", hash = "sha256:35baa81898b15f93573e2deca50d11ac0ae407ebb688299d3a5213265033712c", size = 76996, upload-time = "2025-10-22T13:59:18.37Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e0/07/a000fe835f76b7e1143242ab1122e6362ef1c03f23f83a045c38859c2ae0/jupyterlab_server-2.28.0-py3-none-any.whl", hash = "sha256:e4355b148fdcf34d312bbbc80f22467d6d20460e8b8736bf235577dd18506968", size = 59830, upload-time = "2025-10-22T13:59:16.767Z" },
+]
+
+[[package]]
+name = "jupyterlab-widgets"
+version = "3.0.16"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/26/2d/ef58fed122b268c69c0aa099da20bc67657cdfb2e222688d5731bd5b971d/jupyterlab_widgets-3.0.16.tar.gz", hash = "sha256:423da05071d55cf27a9e602216d35a3a65a3e41cdf9c5d3b643b814ce38c19e0", size = 897423, upload-time = "2025-11-01T21:11:29.724Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ab/b5/36c712098e6191d1b4e349304ef73a8d06aed77e56ceaac8c0a306c7bda1/jupyterlab_widgets-3.0.16-py3-none-any.whl", hash = "sha256:45fa36d9c6422cf2559198e4db481aa243c7a32d9926b500781c830c80f7ecf8", size = 914926, upload-time = "2025-11-01T21:11:28.008Z" },
+]
+
+[[package]]
+name = "kiwisolver"
+version = "1.4.9"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/5c/3c/85844f1b0feb11ee581ac23fe5fce65cd049a200c1446708cc1b7f922875/kiwisolver-1.4.9.tar.gz", hash = "sha256:c3b22c26c6fd6811b0ae8363b95ca8ce4ea3c202d3d0975b2914310ceb1bcc4d", size = 97564, upload-time = "2025-08-10T21:27:49.279Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c6/5d/8ce64e36d4e3aac5ca96996457dcf33e34e6051492399a3f1fec5657f30b/kiwisolver-1.4.9-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:b4b4d74bda2b8ebf4da5bd42af11d02d04428b2c32846e4c2c93219df8a7987b", size = 124159, upload-time = "2025-08-10T21:25:35.472Z" },
+ { url = "https://files.pythonhosted.org/packages/96/1e/22f63ec454874378175a5f435d6ea1363dd33fb2af832c6643e4ccea0dc8/kiwisolver-1.4.9-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:fb3b8132019ea572f4611d770991000d7f58127560c4889729248eb5852a102f", size = 66578, upload-time = "2025-08-10T21:25:36.73Z" },
+ { url = "https://files.pythonhosted.org/packages/41/4c/1925dcfff47a02d465121967b95151c82d11027d5ec5242771e580e731bd/kiwisolver-1.4.9-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:84fd60810829c27ae375114cd379da1fa65e6918e1da405f356a775d49a62bcf", size = 65312, upload-time = "2025-08-10T21:25:37.658Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/42/0f333164e6307a0687d1eb9ad256215aae2f4bd5d28f4653d6cd319a3ba3/kiwisolver-1.4.9-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl", hash = "sha256:b78efa4c6e804ecdf727e580dbb9cba85624d2e1c6b5cb059c66290063bd99a9", size = 1628458, upload-time = "2025-08-10T21:25:39.067Z" },
+ { url = "https://files.pythonhosted.org/packages/86/b6/2dccb977d651943995a90bfe3495c2ab2ba5cd77093d9f2318a20c9a6f59/kiwisolver-1.4.9-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d4efec7bcf21671db6a3294ff301d2fc861c31faa3c8740d1a94689234d1b415", size = 1225640, upload-time = "2025-08-10T21:25:40.489Z" },
+ { url = "https://files.pythonhosted.org/packages/50/2b/362ebd3eec46c850ccf2bfe3e30f2fc4c008750011f38a850f088c56a1c6/kiwisolver-1.4.9-cp310-cp310-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:90f47e70293fc3688b71271100a1a5453aa9944a81d27ff779c108372cf5567b", size = 1244074, upload-time = "2025-08-10T21:25:42.221Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/bb/f09a1e66dab8984773d13184a10a29fe67125337649d26bdef547024ed6b/kiwisolver-1.4.9-cp310-cp310-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:8fdca1def57a2e88ef339de1737a1449d6dbf5fab184c54a1fca01d541317154", size = 1293036, upload-time = "2025-08-10T21:25:43.801Z" },
+ { url = "https://files.pythonhosted.org/packages/ea/01/11ecf892f201cafda0f68fa59212edaea93e96c37884b747c181303fccd1/kiwisolver-1.4.9-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:9cf554f21be770f5111a1690d42313e140355e687e05cf82cb23d0a721a64a48", size = 2175310, upload-time = "2025-08-10T21:25:45.045Z" },
+ { url = "https://files.pythonhosted.org/packages/7f/5f/bfe11d5b934f500cc004314819ea92427e6e5462706a498c1d4fc052e08f/kiwisolver-1.4.9-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:fc1795ac5cd0510207482c3d1d3ed781143383b8cfd36f5c645f3897ce066220", size = 2270943, upload-time = "2025-08-10T21:25:46.393Z" },
+ { url = "https://files.pythonhosted.org/packages/3d/de/259f786bf71f1e03e73d87e2db1a9a3bcab64d7b4fd780167123161630ad/kiwisolver-1.4.9-cp310-cp310-musllinux_1_2_s390x.whl", hash = "sha256:ccd09f20ccdbbd341b21a67ab50a119b64a403b09288c27481575105283c1586", size = 2440488, upload-time = "2025-08-10T21:25:48.074Z" },
+ { url = "https://files.pythonhosted.org/packages/1b/76/c989c278faf037c4d3421ec07a5c452cd3e09545d6dae7f87c15f54e4edf/kiwisolver-1.4.9-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:540c7c72324d864406a009d72f5d6856f49693db95d1fbb46cf86febef873634", size = 2246787, upload-time = "2025-08-10T21:25:49.442Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/55/c2898d84ca440852e560ca9f2a0d28e6e931ac0849b896d77231929900e7/kiwisolver-1.4.9-cp310-cp310-win_amd64.whl", hash = "sha256:ede8c6d533bc6601a47ad4046080d36b8fc99f81e6f1c17b0ac3c2dc91ac7611", size = 73730, upload-time = "2025-08-10T21:25:51.102Z" },
+ { url = "https://files.pythonhosted.org/packages/e8/09/486d6ac523dd33b80b368247f238125d027964cfacb45c654841e88fb2ae/kiwisolver-1.4.9-cp310-cp310-win_arm64.whl", hash = "sha256:7b4da0d01ac866a57dd61ac258c5607b4cd677f63abaec7b148354d2b2cdd536", size = 65036, upload-time = "2025-08-10T21:25:52.063Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/ab/c80b0d5a9d8a1a65f4f815f2afff9798b12c3b9f31f1d304dd233dd920e2/kiwisolver-1.4.9-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:eb14a5da6dc7642b0f3a18f13654847cd8b7a2550e2645a5bda677862b03ba16", size = 124167, upload-time = "2025-08-10T21:25:53.403Z" },
+ { url = "https://files.pythonhosted.org/packages/a0/c0/27fe1a68a39cf62472a300e2879ffc13c0538546c359b86f149cc19f6ac3/kiwisolver-1.4.9-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:39a219e1c81ae3b103643d2aedb90f1ef22650deb266ff12a19e7773f3e5f089", size = 66579, upload-time = "2025-08-10T21:25:54.79Z" },
+ { url = "https://files.pythonhosted.org/packages/31/a2/a12a503ac1fd4943c50f9822678e8015a790a13b5490354c68afb8489814/kiwisolver-1.4.9-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:2405a7d98604b87f3fc28b1716783534b1b4b8510d8142adca34ee0bc3c87543", size = 65309, upload-time = "2025-08-10T21:25:55.76Z" },
+ { url = "https://files.pythonhosted.org/packages/66/e1/e533435c0be77c3f64040d68d7a657771194a63c279f55573188161e81ca/kiwisolver-1.4.9-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:dc1ae486f9abcef254b5618dfb4113dd49f94c68e3e027d03cf0143f3f772b61", size = 1435596, upload-time = "2025-08-10T21:25:56.861Z" },
+ { url = "https://files.pythonhosted.org/packages/67/1e/51b73c7347f9aabdc7215aa79e8b15299097dc2f8e67dee2b095faca9cb0/kiwisolver-1.4.9-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8a1f570ce4d62d718dce3f179ee78dac3b545ac16c0c04bb363b7607a949c0d1", size = 1246548, upload-time = "2025-08-10T21:25:58.246Z" },
+ { url = "https://files.pythonhosted.org/packages/21/aa/72a1c5d1e430294f2d32adb9542719cfb441b5da368d09d268c7757af46c/kiwisolver-1.4.9-cp311-cp311-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:cb27e7b78d716c591e88e0a09a2139c6577865d7f2e152488c2cc6257f460872", size = 1263618, upload-time = "2025-08-10T21:25:59.857Z" },
+ { url = "https://files.pythonhosted.org/packages/a3/af/db1509a9e79dbf4c260ce0cfa3903ea8945f6240e9e59d1e4deb731b1a40/kiwisolver-1.4.9-cp311-cp311-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:15163165efc2f627eb9687ea5f3a28137217d217ac4024893d753f46bce9de26", size = 1317437, upload-time = "2025-08-10T21:26:01.105Z" },
+ { url = "https://files.pythonhosted.org/packages/e0/f2/3ea5ee5d52abacdd12013a94130436e19969fa183faa1e7c7fbc89e9a42f/kiwisolver-1.4.9-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:bdee92c56a71d2b24c33a7d4c2856bd6419d017e08caa7802d2963870e315028", size = 2195742, upload-time = "2025-08-10T21:26:02.675Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/9b/1efdd3013c2d9a2566aa6a337e9923a00590c516add9a1e89a768a3eb2fc/kiwisolver-1.4.9-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:412f287c55a6f54b0650bd9b6dce5aceddb95864a1a90c87af16979d37c89771", size = 2290810, upload-time = "2025-08-10T21:26:04.009Z" },
+ { url = "https://files.pythonhosted.org/packages/fb/e5/cfdc36109ae4e67361f9bc5b41323648cb24a01b9ade18784657e022e65f/kiwisolver-1.4.9-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:2c93f00dcba2eea70af2be5f11a830a742fe6b579a1d4e00f47760ef13be247a", size = 2461579, upload-time = "2025-08-10T21:26:05.317Z" },
+ { url = "https://files.pythonhosted.org/packages/62/86/b589e5e86c7610842213994cdea5add00960076bef4ae290c5fa68589cac/kiwisolver-1.4.9-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f117e1a089d9411663a3207ba874f31be9ac8eaa5b533787024dc07aeb74f464", size = 2268071, upload-time = "2025-08-10T21:26:06.686Z" },
+ { url = "https://files.pythonhosted.org/packages/3b/c6/f8df8509fd1eee6c622febe54384a96cfaf4d43bf2ccec7a0cc17e4715c9/kiwisolver-1.4.9-cp311-cp311-win_amd64.whl", hash = "sha256:be6a04e6c79819c9a8c2373317d19a96048e5a3f90bec587787e86a1153883c2", size = 73840, upload-time = "2025-08-10T21:26:07.94Z" },
+ { url = "https://files.pythonhosted.org/packages/e2/2d/16e0581daafd147bc11ac53f032a2b45eabac897f42a338d0a13c1e5c436/kiwisolver-1.4.9-cp311-cp311-win_arm64.whl", hash = "sha256:0ae37737256ba2de764ddc12aed4956460277f00c4996d51a197e72f62f5eec7", size = 65159, upload-time = "2025-08-10T21:26:09.048Z" },
+ { url = "https://files.pythonhosted.org/packages/86/c9/13573a747838aeb1c76e3267620daa054f4152444d1f3d1a2324b78255b5/kiwisolver-1.4.9-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:ac5a486ac389dddcc5bef4f365b6ae3ffff2c433324fb38dd35e3fab7c957999", size = 123686, upload-time = "2025-08-10T21:26:10.034Z" },
+ { url = "https://files.pythonhosted.org/packages/51/ea/2ecf727927f103ffd1739271ca19c424d0e65ea473fbaeea1c014aea93f6/kiwisolver-1.4.9-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:f2ba92255faa7309d06fe44c3a4a97efe1c8d640c2a79a5ef728b685762a6fd2", size = 66460, upload-time = "2025-08-10T21:26:11.083Z" },
+ { url = "https://files.pythonhosted.org/packages/5b/5a/51f5464373ce2aeb5194508298a508b6f21d3867f499556263c64c621914/kiwisolver-1.4.9-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4a2899935e724dd1074cb568ce7ac0dce28b2cd6ab539c8e001a8578eb106d14", size = 64952, upload-time = "2025-08-10T21:26:12.058Z" },
+ { url = "https://files.pythonhosted.org/packages/70/90/6d240beb0f24b74371762873e9b7f499f1e02166a2d9c5801f4dbf8fa12e/kiwisolver-1.4.9-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:f6008a4919fdbc0b0097089f67a1eb55d950ed7e90ce2cc3e640abadd2757a04", size = 1474756, upload-time = "2025-08-10T21:26:13.096Z" },
+ { url = "https://files.pythonhosted.org/packages/12/42/f36816eaf465220f683fb711efdd1bbf7a7005a2473d0e4ed421389bd26c/kiwisolver-1.4.9-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:67bb8b474b4181770f926f7b7d2f8c0248cbcb78b660fdd41a47054b28d2a752", size = 1276404, upload-time = "2025-08-10T21:26:14.457Z" },
+ { url = "https://files.pythonhosted.org/packages/2e/64/bc2de94800adc830c476dce44e9b40fd0809cddeef1fde9fcf0f73da301f/kiwisolver-1.4.9-cp312-cp312-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:2327a4a30d3ee07d2fbe2e7933e8a37c591663b96ce42a00bc67461a87d7df77", size = 1294410, upload-time = "2025-08-10T21:26:15.73Z" },
+ { url = "https://files.pythonhosted.org/packages/5f/42/2dc82330a70aa8e55b6d395b11018045e58d0bb00834502bf11509f79091/kiwisolver-1.4.9-cp312-cp312-manylinux_2_24_s390x.manylinux_2_28_s390x.whl", hash = "sha256:7a08b491ec91b1d5053ac177afe5290adacf1f0f6307d771ccac5de30592d198", size = 1343631, upload-time = "2025-08-10T21:26:17.045Z" },
+ { url = "https://files.pythonhosted.org/packages/22/fd/f4c67a6ed1aab149ec5a8a401c323cee7a1cbe364381bb6c9c0d564e0e20/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:d8fc5c867c22b828001b6a38d2eaeb88160bf5783c6cb4a5e440efc981ce286d", size = 2224963, upload-time = "2025-08-10T21:26:18.737Z" },
+ { url = "https://files.pythonhosted.org/packages/45/aa/76720bd4cb3713314677d9ec94dcc21ced3f1baf4830adde5bb9b2430a5f/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:3b3115b2581ea35bb6d1f24a4c90af37e5d9b49dcff267eeed14c3893c5b86ab", size = 2321295, upload-time = "2025-08-10T21:26:20.11Z" },
+ { url = "https://files.pythonhosted.org/packages/80/19/d3ec0d9ab711242f56ae0dc2fc5d70e298bb4a1f9dfab44c027668c673a1/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:858e4c22fb075920b96a291928cb7dea5644e94c0ee4fcd5af7e865655e4ccf2", size = 2487987, upload-time = "2025-08-10T21:26:21.49Z" },
+ { url = "https://files.pythonhosted.org/packages/39/e9/61e4813b2c97e86b6fdbd4dd824bf72d28bcd8d4849b8084a357bc0dd64d/kiwisolver-1.4.9-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ed0fecd28cc62c54b262e3736f8bb2512d8dcfdc2bcf08be5f47f96bf405b145", size = 2291817, upload-time = "2025-08-10T21:26:22.812Z" },
+ { url = "https://files.pythonhosted.org/packages/a0/41/85d82b0291db7504da3c2defe35c9a8a5c9803a730f297bd823d11d5fb77/kiwisolver-1.4.9-cp312-cp312-win_amd64.whl", hash = "sha256:f68208a520c3d86ea51acf688a3e3002615a7f0238002cccc17affecc86a8a54", size = 73895, upload-time = "2025-08-10T21:26:24.37Z" },
+ { url = "https://files.pythonhosted.org/packages/e2/92/5f3068cf15ee5cb624a0c7596e67e2a0bb2adee33f71c379054a491d07da/kiwisolver-1.4.9-cp312-cp312-win_arm64.whl", hash = "sha256:2c1a4f57df73965f3f14df20b80ee29e6a7930a57d2d9e8491a25f676e197c60", size = 64992, upload-time = "2025-08-10T21:26:25.732Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/63/fde392691690f55b38d5dd7b3710f5353bf7a8e52de93a22968801ab8978/kiwisolver-1.4.9-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:4d1d9e582ad4d63062d34077a9a1e9f3c34088a2ec5135b1f7190c07cf366527", size = 60183, upload-time = "2025-08-10T21:27:37.669Z" },
+ { url = "https://files.pythonhosted.org/packages/27/b1/6aad34edfdb7cced27f371866f211332bba215bfd918ad3322a58f480d8b/kiwisolver-1.4.9-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:deed0c7258ceb4c44ad5ec7d9918f9f14fd05b2be86378d86cf50e63d1e7b771", size = 58675, upload-time = "2025-08-10T21:27:39.031Z" },
+ { url = "https://files.pythonhosted.org/packages/9d/1a/23d855a702bb35a76faed5ae2ba3de57d323f48b1f6b17ee2176c4849463/kiwisolver-1.4.9-pp310-pypy310_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:0a590506f303f512dff6b7f75fd2fd18e16943efee932008fe7140e5fa91d80e", size = 80277, upload-time = "2025-08-10T21:27:40.129Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/5b/5239e3c2b8fb5afa1e8508f721bb77325f740ab6994d963e61b2b7abcc1e/kiwisolver-1.4.9-pp310-pypy310_pp73-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e09c2279a4d01f099f52d5c4b3d9e208e91edcbd1a175c9662a8b16e000fece9", size = 77994, upload-time = "2025-08-10T21:27:41.181Z" },
+ { url = "https://files.pythonhosted.org/packages/f9/1c/5d4d468fb16f8410e596ed0eac02d2c68752aa7dc92997fe9d60a7147665/kiwisolver-1.4.9-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:c9e7cdf45d594ee04d5be1b24dd9d49f3d1590959b2271fb30b5ca2b262c00fb", size = 73744, upload-time = "2025-08-10T21:27:42.254Z" },
+ { url = "https://files.pythonhosted.org/packages/a3/0f/36d89194b5a32c054ce93e586d4049b6c2c22887b0eb229c61c68afd3078/kiwisolver-1.4.9-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:720e05574713db64c356e86732c0f3c5252818d05f9df320f0ad8380641acea5", size = 60104, upload-time = "2025-08-10T21:27:43.287Z" },
+ { url = "https://files.pythonhosted.org/packages/52/ba/4ed75f59e4658fd21fe7dde1fee0ac397c678ec3befba3fe6482d987af87/kiwisolver-1.4.9-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:17680d737d5335b552994a2008fab4c851bcd7de33094a82067ef3a576ff02fa", size = 58592, upload-time = "2025-08-10T21:27:44.314Z" },
+ { url = "https://files.pythonhosted.org/packages/33/01/a8ea7c5ea32a9b45ceeaee051a04c8ed4320f5add3c51bfa20879b765b70/kiwisolver-1.4.9-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:85b5352f94e490c028926ea567fc569c52ec79ce131dadb968d3853e809518c2", size = 80281, upload-time = "2025-08-10T21:27:45.369Z" },
+ { url = "https://files.pythonhosted.org/packages/da/e3/dbd2ecdce306f1d07a1aaf324817ee993aab7aee9db47ceac757deabafbe/kiwisolver-1.4.9-pp311-pypy311_pp73-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:464415881e4801295659462c49461a24fb107c140de781d55518c4b80cb6790f", size = 78009, upload-time = "2025-08-10T21:27:46.376Z" },
+ { url = "https://files.pythonhosted.org/packages/da/e9/0d4add7873a73e462aeb45c036a2dead2562b825aa46ba326727b3f31016/kiwisolver-1.4.9-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:fb940820c63a9590d31d88b815e7a3aa5915cad3ce735ab45f0c730b39547de1", size = 73929, upload-time = "2025-08-10T21:27:48.236Z" },
+]
+
+[[package]]
+name = "lark"
+version = "1.3.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/da/34/28fff3ab31ccff1fd4f6c7c7b0ceb2b6968d8ea4950663eadcb5720591a0/lark-1.3.1.tar.gz", hash = "sha256:b426a7a6d6d53189d318f2b6236ab5d6429eaf09259f1ca33eb716eed10d2905", size = 382732, upload-time = "2025-10-27T18:25:56.653Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/82/3d/14ce75ef66813643812f3093ab17e46d3a206942ce7376d31ec2d36229e7/lark-1.3.1-py3-none-any.whl", hash = "sha256:c629b661023a014c37da873b4ff58a817398d12635d3bbb2c5a03be7fe5d1e12", size = 113151, upload-time = "2025-10-27T18:25:54.882Z" },
+]
+
+[[package]]
+name = "lazy-loader"
+version = "0.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "packaging" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/6f/6b/c875b30a1ba490860c93da4cabf479e03f584eba06fe5963f6f6644653d8/lazy_loader-0.4.tar.gz", hash = "sha256:47c75182589b91a4e1a85a136c074285a5ad4d9f39c63e0d7fb76391c4574cd1", size = 15431, upload-time = "2024-04-05T13:03:12.261Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/83/60/d497a310bde3f01cb805196ac61b7ad6dc5dcf8dce66634dc34364b20b4f/lazy_loader-0.4-py3-none-any.whl", hash = "sha256:342aa8e14d543a154047afb4ba8ef17f5563baad3fc610d7b15b213b0f119efc", size = 12097, upload-time = "2024-04-05T13:03:10.514Z" },
+]
+
+[[package]]
+name = "libcst"
+version = "1.8.6"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "pyyaml" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/de/cd/337df968b38d94c5aabd3e1b10630f047a2b345f6e1d4456bd9fe7417537/libcst-1.8.6.tar.gz", hash = "sha256:f729c37c9317126da9475bdd06a7208eb52fcbd180a6341648b45a56b4ba708b", size = 891354, upload-time = "2025-11-03T22:33:30.621Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c4/52/97d5454dee9d014821fe0c88f3dc0e83131b97dd074a4d49537056a75475/libcst-1.8.6-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:a20c5182af04332cc94d8520792befda06d73daf2865e6dddc5161c72ea92cb9", size = 2211698, upload-time = "2025-11-03T22:31:50.117Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/a4/d1205985d378164687af3247a9c8f8bdb96278b0686ac98ab951bc6d336a/libcst-1.8.6-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:36473e47cb199b7e6531d653ee6ffed057de1d179301e6c67f651f3af0b499d6", size = 2093104, upload-time = "2025-11-03T22:31:52.189Z" },
+ { url = "https://files.pythonhosted.org/packages/9e/de/1338da681b7625b51e584922576d54f1b8db8fc7ff4dc79121afc5d4d2cd/libcst-1.8.6-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:06fc56335a45d61b7c1b856bfab4587b84cfe31e9d6368f60bb3c9129d900f58", size = 2237419, upload-time = "2025-11-03T22:31:53.526Z" },
+ { url = "https://files.pythonhosted.org/packages/50/06/ee66f2d83b870534756e593d464d8b33b0914c224dff3a407e0f74dc04e0/libcst-1.8.6-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:6b23d14a7fc0addd9795795763af26b185deb7c456b1e7cc4d5228e69dab5ce8", size = 2300820, upload-time = "2025-11-03T22:31:55.995Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/ca/959088729de8e0eac8dd516e4fb8623d8d92bad539060fa85c9e94d418a5/libcst-1.8.6-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:16cfe0cfca5fd840e1fb2c30afb628b023d3085b30c3484a79b61eae9d6fe7ba", size = 2301201, upload-time = "2025-11-03T22:31:57.347Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/4c/2a21a8c452436097dfe1da277f738c3517f3f728713f16d84b9a3d67ca8d/libcst-1.8.6-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:455f49a93aea4070132c30ebb6c07c2dea0ba6c1fde5ffde59fc45dbb9cfbe4b", size = 2408213, upload-time = "2025-11-03T22:31:59.221Z" },
+ { url = "https://files.pythonhosted.org/packages/3e/26/8f7b671fad38a515bb20b038718fd2221ab658299119ac9bcec56c2ced27/libcst-1.8.6-cp310-cp310-win_amd64.whl", hash = "sha256:72cca15800ffc00ba25788e4626189fe0bc5fe2a0c1cb4294bce2e4df21cc073", size = 2119189, upload-time = "2025-11-03T22:32:00.696Z" },
+ { url = "https://files.pythonhosted.org/packages/5b/bf/ffb23a48e27001165cc5c81c5d9b3d6583b21b7f5449109e03a0020b060c/libcst-1.8.6-cp310-cp310-win_arm64.whl", hash = "sha256:6cad63e3a26556b020b634d25a8703b605c0e0b491426b3e6b9e12ed20f09100", size = 2001736, upload-time = "2025-11-03T22:32:02.986Z" },
+ { url = "https://files.pythonhosted.org/packages/dc/15/95c2ecadc0fb4af8a7057ac2012a4c0ad5921b9ef1ace6c20006b56d3b5f/libcst-1.8.6-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:3649a813660fbffd7bc24d3f810b1f75ac98bd40d9d6f56d1f0ee38579021073", size = 2211289, upload-time = "2025-11-03T22:32:04.673Z" },
+ { url = "https://files.pythonhosted.org/packages/80/c3/7e1107acd5ed15cf60cc07c7bb64498a33042dc4821874aea3ec4942f3cd/libcst-1.8.6-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:0cbe17067055829607c5ba4afa46bfa4d0dd554c0b5a583546e690b7367a29b6", size = 2092927, upload-time = "2025-11-03T22:32:06.209Z" },
+ { url = "https://files.pythonhosted.org/packages/c1/ff/0d2be87f67e2841a4a37d35505e74b65991d30693295c46fc0380ace0454/libcst-1.8.6-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:59a7e388c57d21d63722018978a8ddba7b176e3a99bd34b9b84a576ed53f2978", size = 2237002, upload-time = "2025-11-03T22:32:07.559Z" },
+ { url = "https://files.pythonhosted.org/packages/69/99/8c4a1b35c7894ccd7d33eae01ac8967122f43da41325223181ca7e4738fe/libcst-1.8.6-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:b6c1248cc62952a3a005792b10cdef2a4e130847be9c74f33a7d617486f7e532", size = 2301048, upload-time = "2025-11-03T22:32:08.869Z" },
+ { url = "https://files.pythonhosted.org/packages/9b/8b/d1aa811eacf936cccfb386ae0585aa530ea1221ccf528d67144e041f5915/libcst-1.8.6-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:6421a930b028c5ef4a943b32a5a78b7f1bf15138214525a2088f11acbb7d3d64", size = 2300675, upload-time = "2025-11-03T22:32:10.579Z" },
+ { url = "https://files.pythonhosted.org/packages/c6/6b/7b65cd41f25a10c1fef2389ddc5c2b2cc23dc4d648083fa3e1aa7e0eeac2/libcst-1.8.6-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:6d8b67874f2188399a71a71731e1ba2d1a2c3173b7565d1cc7ffb32e8fbaba5b", size = 2407934, upload-time = "2025-11-03T22:32:11.856Z" },
+ { url = "https://files.pythonhosted.org/packages/c5/8b/401cfff374bb3b785adfad78f05225225767ee190997176b2a9da9ed9460/libcst-1.8.6-cp311-cp311-win_amd64.whl", hash = "sha256:b0d8c364c44ae343937f474b2e492c1040df96d94530377c2f9263fb77096e4f", size = 2119247, upload-time = "2025-11-03T22:32:13.279Z" },
+ { url = "https://files.pythonhosted.org/packages/f1/17/085f59eaa044b6ff6bc42148a5449df2b7f0ba567307de7782fe85c39ee2/libcst-1.8.6-cp311-cp311-win_arm64.whl", hash = "sha256:5dcaaebc835dfe5755bc85f9b186fb7e2895dda78e805e577fef1011d51d5a5c", size = 2001774, upload-time = "2025-11-03T22:32:14.647Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/3c/93365c17da3d42b055a8edb0e1e99f1c60c776471db6c9b7f1ddf6a44b28/libcst-1.8.6-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:0c13d5bd3d8414a129e9dccaf0e5785108a4441e9b266e1e5e9d1f82d1b943c9", size = 2206166, upload-time = "2025-11-03T22:32:16.012Z" },
+ { url = "https://files.pythonhosted.org/packages/1d/cb/7530940e6ac50c6dd6022349721074e19309eb6aa296e942ede2213c1a19/libcst-1.8.6-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f1472eeafd67cdb22544e59cf3bfc25d23dc94058a68cf41f6654ff4fcb92e09", size = 2083726, upload-time = "2025-11-03T22:32:17.312Z" },
+ { url = "https://files.pythonhosted.org/packages/1b/cf/7e5eaa8c8f2c54913160671575351d129170db757bb5e4b7faffed022271/libcst-1.8.6-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:089c58e75cb142ec33738a1a4ea7760a28b40c078ab2fd26b270dac7d2633a4d", size = 2235755, upload-time = "2025-11-03T22:32:18.859Z" },
+ { url = "https://files.pythonhosted.org/packages/55/54/570ec2b0e9a3de0af9922e3bb1b69a5429beefbc753a7ea770a27ad308bd/libcst-1.8.6-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:c9d7aeafb1b07d25a964b148c0dda9451efb47bbbf67756e16eeae65004b0eb5", size = 2301473, upload-time = "2025-11-03T22:32:20.499Z" },
+ { url = "https://files.pythonhosted.org/packages/11/4c/163457d1717cd12181c421a4cca493454bcabd143fc7e53313bc6a4ad82a/libcst-1.8.6-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:207481197afd328aa91d02670c15b48d0256e676ce1ad4bafb6dc2b593cc58f1", size = 2298899, upload-time = "2025-11-03T22:32:21.765Z" },
+ { url = "https://files.pythonhosted.org/packages/35/1d/317ddef3669883619ef3d3395ea583305f353ef4ad87d7a5ac1c39be38e3/libcst-1.8.6-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:375965f34cc6f09f5f809244d3ff9bd4f6cb6699f571121cebce53622e7e0b86", size = 2408239, upload-time = "2025-11-03T22:32:23.275Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/a1/f47d8cccf74e212dd6044b9d6dbc223636508da99acff1d54786653196bc/libcst-1.8.6-cp312-cp312-win_amd64.whl", hash = "sha256:da95b38693b989eaa8d32e452e8261cfa77fe5babfef1d8d2ac25af8c4aa7e6d", size = 2119660, upload-time = "2025-11-03T22:32:24.822Z" },
+ { url = "https://files.pythonhosted.org/packages/19/d0/dd313bf6a7942cdf951828f07ecc1a7695263f385065edc75ef3016a3cb5/libcst-1.8.6-cp312-cp312-win_arm64.whl", hash = "sha256:bff00e1c766658adbd09a175267f8b2f7616e5ee70ce45db3d7c4ce6d9f6bec7", size = 1999824, upload-time = "2025-11-03T22:32:26.131Z" },
+]
+
+[[package]]
+name = "lightning-utilities"
+version = "0.15.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "packaging" },
+ { name = "setuptools" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b8/39/6fc58ca81492db047149b4b8fd385aa1bfb8c28cd7cacb0c7eb0c44d842f/lightning_utilities-0.15.2.tar.gz", hash = "sha256:cdf12f530214a63dacefd713f180d1ecf5d165338101617b4742e8f22c032e24", size = 31090, upload-time = "2025-08-06T13:57:39.242Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/de/73/3d757cb3fc16f0f9794dd289bcd0c4a031d9cf54d8137d6b984b2d02edf3/lightning_utilities-0.15.2-py3-none-any.whl", hash = "sha256:ad3ab1703775044bbf880dbf7ddaaac899396c96315f3aa1779cec9d618a9841", size = 29431, upload-time = "2025-08-06T13:57:38.046Z" },
+]
+
+[[package]]
+name = "llvmlite"
+version = "0.46.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/74/cd/08ae687ba099c7e3d21fe2ea536500563ef1943c5105bf6ab4ee3829f68e/llvmlite-0.46.0.tar.gz", hash = "sha256:227c9fd6d09dce2783c18b754b7cd9d9b3b3515210c46acc2d3c5badd9870ceb", size = 193456, upload-time = "2025-12-08T18:15:36.295Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3d/a4/3959e1c61c5ca9db7921e5fd115b344c29b9d57a5dadd87bef97963ca1a5/llvmlite-0.46.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:4323177e936d61ae0f73e653e2e614284d97d14d5dd12579adc92b6c2b0597b0", size = 37232766, upload-time = "2025-12-08T18:14:34.765Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/a5/a4d916f1015106e1da876028606a8e87fd5d5c840f98c87bc2d5153b6a2f/llvmlite-0.46.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:0a2d461cb89537b7c20feb04c46c32e12d5ad4f0896c9dfc0f60336219ff248e", size = 56275176, upload-time = "2025-12-08T18:14:37.944Z" },
+ { url = "https://files.pythonhosted.org/packages/79/7f/a7f2028805dac8c1a6fae7bda4e739b7ebbcd45b29e15bf6d21556fcd3d5/llvmlite-0.46.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b1f6595a35b7b39c3518b85a28bf18f45e075264e4b2dce3f0c2a4f232b4a910", size = 55128629, upload-time = "2025-12-08T18:14:41.674Z" },
+ { url = "https://files.pythonhosted.org/packages/b2/bc/4689e1ba0c073c196b594471eb21be0aa51d9e64b911728aa13cd85ef0ae/llvmlite-0.46.0-cp310-cp310-win_amd64.whl", hash = "sha256:e7a34d4aa6f9a97ee006b504be6d2b8cb7f755b80ab2f344dda1ef992f828559", size = 38138651, upload-time = "2025-12-08T18:14:45.845Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/a1/2ad4b2367915faeebe8447f0a057861f646dbf5fbbb3561db42c65659cf3/llvmlite-0.46.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:82f3d39b16f19aa1a56d5fe625883a6ab600d5cc9ea8906cca70ce94cabba067", size = 37232766, upload-time = "2025-12-08T18:14:48.836Z" },
+ { url = "https://files.pythonhosted.org/packages/12/b5/99cf8772fdd846c07da4fd70f07812a3c8fd17ea2409522c946bb0f2b277/llvmlite-0.46.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:a3df43900119803bbc52720e758c76f316a9a0f34612a886862dfe0a5591a17e", size = 56275175, upload-time = "2025-12-08T18:14:51.604Z" },
+ { url = "https://files.pythonhosted.org/packages/38/f2/ed806f9c003563732da156139c45d970ee435bd0bfa5ed8de87ba972b452/llvmlite-0.46.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:de183fefc8022d21b0aa37fc3e90410bc3524aed8617f0ff76732fc6c3af5361", size = 55128630, upload-time = "2025-12-08T18:14:55.107Z" },
+ { url = "https://files.pythonhosted.org/packages/19/0c/8f5a37a65fc9b7b17408508145edd5f86263ad69c19d3574e818f533a0eb/llvmlite-0.46.0-cp311-cp311-win_amd64.whl", hash = "sha256:e8b10bc585c58bdffec9e0c309bb7d51be1f2f15e169a4b4d42f2389e431eb93", size = 38138652, upload-time = "2025-12-08T18:14:58.171Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/f8/4db016a5e547d4e054ff2f3b99203d63a497465f81ab78ec8eb2ff7b2304/llvmlite-0.46.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:6b9588ad4c63b4f0175a3984b85494f0c927c6b001e3a246a3a7fb3920d9a137", size = 37232767, upload-time = "2025-12-08T18:15:00.737Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/85/4890a7c14b4fa54400945cb52ac3cd88545bbdb973c440f98ca41591cdc5/llvmlite-0.46.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3535bd2bb6a2d7ae4012681ac228e5132cdb75fefb1bcb24e33f2f3e0c865ed4", size = 56275176, upload-time = "2025-12-08T18:15:03.936Z" },
+ { url = "https://files.pythonhosted.org/packages/6a/07/3d31d39c1a1a08cd5337e78299fca77e6aebc07c059fbd0033e3edfab45c/llvmlite-0.46.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4cbfd366e60ff87ea6cc62f50bc4cd800ebb13ed4c149466f50cf2163a473d1e", size = 55128630, upload-time = "2025-12-08T18:15:07.196Z" },
+ { url = "https://files.pythonhosted.org/packages/2a/6b/d139535d7590a1bba1ceb68751bef22fadaa5b815bbdf0e858e3875726b2/llvmlite-0.46.0-cp312-cp312-win_amd64.whl", hash = "sha256:398b39db462c39563a97b912d4f2866cd37cba60537975a09679b28fbbc0fb38", size = 38138940, upload-time = "2025-12-08T18:15:10.162Z" },
+]
+
+[[package]]
+name = "markdown"
+version = "3.10.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/b7/b1/af95bcae8549f1f3fd70faacb29075826a0d689a27f232e8cee315efa053/markdown-3.10.1.tar.gz", hash = "sha256:1c19c10bd5c14ac948c53d0d762a04e2fa35a6d58a6b7b1e6bfcbe6fefc0001a", size = 365402, upload-time = "2026-01-21T18:09:28.206Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/59/1b/6ef961f543593969d25b2afe57a3564200280528caa9bd1082eecdd7b3bc/markdown-3.10.1-py3-none-any.whl", hash = "sha256:867d788939fe33e4b736426f5b9f651ad0c0ae0ecf89df0ca5d1176c70812fe3", size = 107684, upload-time = "2026-01-21T18:09:27.203Z" },
+]
+
+[[package]]
+name = "markupsafe"
+version = "3.0.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/7e/99/7690b6d4034fffd95959cbe0c02de8deb3098cc577c67bb6a24fe5d7caa7/markupsafe-3.0.3.tar.gz", hash = "sha256:722695808f4b6457b320fdc131280796bdceb04ab50fe1795cd540799ebe1698", size = 80313, upload-time = "2025-09-27T18:37:40.426Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e8/4b/3541d44f3937ba468b75da9eebcae497dcf67adb65caa16760b0a6807ebb/markupsafe-3.0.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:2f981d352f04553a7171b8e44369f2af4055f888dfb147d55e42d29e29e74559", size = 11631, upload-time = "2025-09-27T18:36:05.558Z" },
+ { url = "https://files.pythonhosted.org/packages/98/1b/fbd8eed11021cabd9226c37342fa6ca4e8a98d8188a8d9b66740494960e4/markupsafe-3.0.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:e1c1493fb6e50ab01d20a22826e57520f1284df32f2d8601fdd90b6304601419", size = 12057, upload-time = "2025-09-27T18:36:07.165Z" },
+ { url = "https://files.pythonhosted.org/packages/40/01/e560d658dc0bb8ab762670ece35281dec7b6c1b33f5fbc09ebb57a185519/markupsafe-3.0.3-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1ba88449deb3de88bd40044603fafffb7bc2b055d626a330323a9ed736661695", size = 22050, upload-time = "2025-09-27T18:36:08.005Z" },
+ { url = "https://files.pythonhosted.org/packages/af/cd/ce6e848bbf2c32314c9b237839119c5a564a59725b53157c856e90937b7a/markupsafe-3.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f42d0984e947b8adf7dd6dde396e720934d12c506ce84eea8476409563607591", size = 20681, upload-time = "2025-09-27T18:36:08.881Z" },
+ { url = "https://files.pythonhosted.org/packages/c9/2a/b5c12c809f1c3045c4d580b035a743d12fcde53cf685dbc44660826308da/markupsafe-3.0.3-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:c0c0b3ade1c0b13b936d7970b1d37a57acde9199dc2aecc4c336773e1d86049c", size = 20705, upload-time = "2025-09-27T18:36:10.131Z" },
+ { url = "https://files.pythonhosted.org/packages/cf/e3/9427a68c82728d0a88c50f890d0fc072a1484de2f3ac1ad0bfc1a7214fd5/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:0303439a41979d9e74d18ff5e2dd8c43ed6c6001fd40e5bf2e43f7bd9bbc523f", size = 21524, upload-time = "2025-09-27T18:36:11.324Z" },
+ { url = "https://files.pythonhosted.org/packages/bc/36/23578f29e9e582a4d0278e009b38081dbe363c5e7165113fad546918a232/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:d2ee202e79d8ed691ceebae8e0486bd9a2cd4794cec4824e1c99b6f5009502f6", size = 20282, upload-time = "2025-09-27T18:36:12.573Z" },
+ { url = "https://files.pythonhosted.org/packages/56/21/dca11354e756ebd03e036bd8ad58d6d7168c80ce1fe5e75218e4945cbab7/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:177b5253b2834fe3678cb4a5f0059808258584c559193998be2601324fdeafb1", size = 20745, upload-time = "2025-09-27T18:36:13.504Z" },
+ { url = "https://files.pythonhosted.org/packages/87/99/faba9369a7ad6e4d10b6a5fbf71fa2a188fe4a593b15f0963b73859a1bbd/markupsafe-3.0.3-cp310-cp310-win32.whl", hash = "sha256:2a15a08b17dd94c53a1da0438822d70ebcd13f8c3a95abe3a9ef9f11a94830aa", size = 14571, upload-time = "2025-09-27T18:36:14.779Z" },
+ { url = "https://files.pythonhosted.org/packages/d6/25/55dc3ab959917602c96985cb1253efaa4ff42f71194bddeb61eb7278b8be/markupsafe-3.0.3-cp310-cp310-win_amd64.whl", hash = "sha256:c4ffb7ebf07cfe8931028e3e4c85f0357459a3f9f9490886198848f4fa002ec8", size = 15056, upload-time = "2025-09-27T18:36:16.125Z" },
+ { url = "https://files.pythonhosted.org/packages/d0/9e/0a02226640c255d1da0b8d12e24ac2aa6734da68bff14c05dd53b94a0fc3/markupsafe-3.0.3-cp310-cp310-win_arm64.whl", hash = "sha256:e2103a929dfa2fcaf9bb4e7c091983a49c9ac3b19c9061b6d5427dd7d14d81a1", size = 13932, upload-time = "2025-09-27T18:36:17.311Z" },
+ { url = "https://files.pythonhosted.org/packages/08/db/fefacb2136439fc8dd20e797950e749aa1f4997ed584c62cfb8ef7c2be0e/markupsafe-3.0.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:1cc7ea17a6824959616c525620e387f6dd30fec8cb44f649e31712db02123dad", size = 11631, upload-time = "2025-09-27T18:36:18.185Z" },
+ { url = "https://files.pythonhosted.org/packages/e1/2e/5898933336b61975ce9dc04decbc0a7f2fee78c30353c5efba7f2d6ff27a/markupsafe-3.0.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:4bd4cd07944443f5a265608cc6aab442e4f74dff8088b0dfc8238647b8f6ae9a", size = 12058, upload-time = "2025-09-27T18:36:19.444Z" },
+ { url = "https://files.pythonhosted.org/packages/1d/09/adf2df3699d87d1d8184038df46a9c80d78c0148492323f4693df54e17bb/markupsafe-3.0.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6b5420a1d9450023228968e7e6a9ce57f65d148ab56d2313fcd589eee96a7a50", size = 24287, upload-time = "2025-09-27T18:36:20.768Z" },
+ { url = "https://files.pythonhosted.org/packages/30/ac/0273f6fcb5f42e314c6d8cd99effae6a5354604d461b8d392b5ec9530a54/markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0bf2a864d67e76e5c9a34dc26ec616a66b9888e25e7b9460e1c76d3293bd9dbf", size = 22940, upload-time = "2025-09-27T18:36:22.249Z" },
+ { url = "https://files.pythonhosted.org/packages/19/ae/31c1be199ef767124c042c6c3e904da327a2f7f0cd63a0337e1eca2967a8/markupsafe-3.0.3-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:bc51efed119bc9cfdf792cdeaa4d67e8f6fcccab66ed4bfdd6bde3e59bfcbb2f", size = 21887, upload-time = "2025-09-27T18:36:23.535Z" },
+ { url = "https://files.pythonhosted.org/packages/b2/76/7edcab99d5349a4532a459e1fe64f0b0467a3365056ae550d3bcf3f79e1e/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:068f375c472b3e7acbe2d5318dea141359e6900156b5b2ba06a30b169086b91a", size = 23692, upload-time = "2025-09-27T18:36:24.823Z" },
+ { url = "https://files.pythonhosted.org/packages/a4/28/6e74cdd26d7514849143d69f0bf2399f929c37dc2b31e6829fd2045b2765/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:7be7b61bb172e1ed687f1754f8e7484f1c8019780f6f6b0786e76bb01c2ae115", size = 21471, upload-time = "2025-09-27T18:36:25.95Z" },
+ { url = "https://files.pythonhosted.org/packages/62/7e/a145f36a5c2945673e590850a6f8014318d5577ed7e5920a4b3448e0865d/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f9e130248f4462aaa8e2552d547f36ddadbeaa573879158d721bbd33dfe4743a", size = 22923, upload-time = "2025-09-27T18:36:27.109Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/62/d9c46a7f5c9adbeeeda52f5b8d802e1094e9717705a645efc71b0913a0a8/markupsafe-3.0.3-cp311-cp311-win32.whl", hash = "sha256:0db14f5dafddbb6d9208827849fad01f1a2609380add406671a26386cdf15a19", size = 14572, upload-time = "2025-09-27T18:36:28.045Z" },
+ { url = "https://files.pythonhosted.org/packages/83/8a/4414c03d3f891739326e1783338e48fb49781cc915b2e0ee052aa490d586/markupsafe-3.0.3-cp311-cp311-win_amd64.whl", hash = "sha256:de8a88e63464af587c950061a5e6a67d3632e36df62b986892331d4620a35c01", size = 15077, upload-time = "2025-09-27T18:36:29.025Z" },
+ { url = "https://files.pythonhosted.org/packages/35/73/893072b42e6862f319b5207adc9ae06070f095b358655f077f69a35601f0/markupsafe-3.0.3-cp311-cp311-win_arm64.whl", hash = "sha256:3b562dd9e9ea93f13d53989d23a7e775fdfd1066c33494ff43f5418bc8c58a5c", size = 13876, upload-time = "2025-09-27T18:36:29.954Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/72/147da192e38635ada20e0a2e1a51cf8823d2119ce8883f7053879c2199b5/markupsafe-3.0.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:d53197da72cc091b024dd97249dfc7794d6a56530370992a5e1a08983ad9230e", size = 11615, upload-time = "2025-09-27T18:36:30.854Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/81/7e4e08678a1f98521201c3079f77db69fb552acd56067661f8c2f534a718/markupsafe-3.0.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:1872df69a4de6aead3491198eaf13810b565bdbeec3ae2dc8780f14458ec73ce", size = 12020, upload-time = "2025-09-27T18:36:31.971Z" },
+ { url = "https://files.pythonhosted.org/packages/1e/2c/799f4742efc39633a1b54a92eec4082e4f815314869865d876824c257c1e/markupsafe-3.0.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3a7e8ae81ae39e62a41ec302f972ba6ae23a5c5396c8e60113e9066ef893da0d", size = 24332, upload-time = "2025-09-27T18:36:32.813Z" },
+ { url = "https://files.pythonhosted.org/packages/3c/2e/8d0c2ab90a8c1d9a24f0399058ab8519a3279d1bd4289511d74e909f060e/markupsafe-3.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d6dd0be5b5b189d31db7cda48b91d7e0a9795f31430b7f271219ab30f1d3ac9d", size = 22947, upload-time = "2025-09-27T18:36:33.86Z" },
+ { url = "https://files.pythonhosted.org/packages/2c/54/887f3092a85238093a0b2154bd629c89444f395618842e8b0c41783898ea/markupsafe-3.0.3-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:94c6f0bb423f739146aec64595853541634bde58b2135f27f61c1ffd1cd4d16a", size = 21962, upload-time = "2025-09-27T18:36:35.099Z" },
+ { url = "https://files.pythonhosted.org/packages/c9/2f/336b8c7b6f4a4d95e91119dc8521402461b74a485558d8f238a68312f11c/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:be8813b57049a7dc738189df53d69395eba14fb99345e0a5994914a3864c8a4b", size = 23760, upload-time = "2025-09-27T18:36:36.001Z" },
+ { url = "https://files.pythonhosted.org/packages/32/43/67935f2b7e4982ffb50a4d169b724d74b62a3964bc1a9a527f5ac4f1ee2b/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:83891d0e9fb81a825d9a6d61e3f07550ca70a076484292a70fde82c4b807286f", size = 21529, upload-time = "2025-09-27T18:36:36.906Z" },
+ { url = "https://files.pythonhosted.org/packages/89/e0/4486f11e51bbba8b0c041098859e869e304d1c261e59244baa3d295d47b7/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:77f0643abe7495da77fb436f50f8dab76dbc6e5fd25d39589a0f1fe6548bfa2b", size = 23015, upload-time = "2025-09-27T18:36:37.868Z" },
+ { url = "https://files.pythonhosted.org/packages/2f/e1/78ee7a023dac597a5825441ebd17170785a9dab23de95d2c7508ade94e0e/markupsafe-3.0.3-cp312-cp312-win32.whl", hash = "sha256:d88b440e37a16e651bda4c7c2b930eb586fd15ca7406cb39e211fcff3bf3017d", size = 14540, upload-time = "2025-09-27T18:36:38.761Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/5b/bec5aa9bbbb2c946ca2733ef9c4ca91c91b6a24580193e891b5f7dbe8e1e/markupsafe-3.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:26a5784ded40c9e318cfc2bdb30fe164bdb8665ded9cd64d500a34fb42067b1c", size = 15105, upload-time = "2025-09-27T18:36:39.701Z" },
+ { url = "https://files.pythonhosted.org/packages/e5/f1/216fc1bbfd74011693a4fd837e7026152e89c4bcf3e77b6692fba9923123/markupsafe-3.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:35add3b638a5d900e807944a078b51922212fb3dedb01633a8defc4b01a3c85f", size = 13906, upload-time = "2025-09-27T18:36:40.689Z" },
+]
+
+[[package]]
+name = "matplotlib"
+version = "3.10.8"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "contourpy", version = "1.3.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "contourpy", version = "1.3.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "cycler" },
+ { name = "fonttools" },
+ { name = "kiwisolver" },
+ { name = "numpy" },
+ { name = "packaging" },
+ { name = "pillow" },
+ { name = "pyparsing" },
+ { name = "python-dateutil" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/8a/76/d3c6e3a13fe484ebe7718d14e269c9569c4eb0020a968a327acb3b9a8fe6/matplotlib-3.10.8.tar.gz", hash = "sha256:2299372c19d56bcd35cf05a2738308758d32b9eaed2371898d8f5bd33f084aa3", size = 34806269, upload-time = "2025-12-10T22:56:51.155Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/58/be/a30bd917018ad220c400169fba298f2bb7003c8ccbc0c3e24ae2aacad1e8/matplotlib-3.10.8-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:00270d217d6b20d14b584c521f810d60c5c78406dc289859776550df837dcda7", size = 8239828, upload-time = "2025-12-10T22:55:02.313Z" },
+ { url = "https://files.pythonhosted.org/packages/58/27/ca01e043c4841078e82cf6e80a6993dfecd315c3d79f5f3153afbb8e1ec6/matplotlib-3.10.8-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:37b3c1cc42aa184b3f738cfa18c1c1d72fd496d85467a6cf7b807936d39aa656", size = 8128050, upload-time = "2025-12-10T22:55:04.997Z" },
+ { url = "https://files.pythonhosted.org/packages/cb/aa/7ab67f2b729ae6a91bcf9dcac0affb95fb8c56f7fd2b2af894ae0b0cf6fa/matplotlib-3.10.8-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:ee40c27c795bda6a5292e9cff9890189d32f7e3a0bf04e0e3c9430c4a00c37df", size = 8700452, upload-time = "2025-12-10T22:55:07.47Z" },
+ { url = "https://files.pythonhosted.org/packages/73/ae/2d5817b0acee3c49b7e7ccfbf5b273f284957cc8e270adf36375db353190/matplotlib-3.10.8-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a48f2b74020919552ea25d222d5cc6af9ca3f4eb43a93e14d068457f545c2a17", size = 9534928, upload-time = "2025-12-10T22:55:10.566Z" },
+ { url = "https://files.pythonhosted.org/packages/c9/5b/8e66653e9f7c39cb2e5cab25fce4810daffa2bff02cbf5f3077cea9e942c/matplotlib-3.10.8-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:f254d118d14a7f99d616271d6c3c27922c092dac11112670b157798b89bf4933", size = 9586377, upload-time = "2025-12-10T22:55:12.362Z" },
+ { url = "https://files.pythonhosted.org/packages/e2/e2/fd0bbadf837f81edb0d208ba8f8cb552874c3b16e27cb91a31977d90875d/matplotlib-3.10.8-cp310-cp310-win_amd64.whl", hash = "sha256:f9b587c9c7274c1613a30afabf65a272114cd6cdbe67b3406f818c79d7ab2e2a", size = 8128127, upload-time = "2025-12-10T22:55:14.436Z" },
+ { url = "https://files.pythonhosted.org/packages/f8/86/de7e3a1cdcfc941483af70609edc06b83e7c8a0e0dc9ac325200a3f4d220/matplotlib-3.10.8-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:6be43b667360fef5c754dda5d25a32e6307a03c204f3c0fc5468b78fa87b4160", size = 8251215, upload-time = "2025-12-10T22:55:16.175Z" },
+ { url = "https://files.pythonhosted.org/packages/fd/14/baad3222f424b19ce6ad243c71de1ad9ec6b2e4eb1e458a48fdc6d120401/matplotlib-3.10.8-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:a2b336e2d91a3d7006864e0990c83b216fcdca64b5a6484912902cef87313d78", size = 8139625, upload-time = "2025-12-10T22:55:17.712Z" },
+ { url = "https://files.pythonhosted.org/packages/8f/a0/7024215e95d456de5883e6732e708d8187d9753a21d32f8ddb3befc0c445/matplotlib-3.10.8-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:efb30e3baaea72ce5928e32bab719ab4770099079d66726a62b11b1ef7273be4", size = 8712614, upload-time = "2025-12-10T22:55:20.8Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/f4/b8347351da9a5b3f41e26cf547252d861f685c6867d179a7c9d60ad50189/matplotlib-3.10.8-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d56a1efd5bfd61486c8bc968fa18734464556f0fb8e51690f4ac25d85cbbbbc2", size = 9540997, upload-time = "2025-12-10T22:55:23.258Z" },
+ { url = "https://files.pythonhosted.org/packages/9e/c0/c7b914e297efe0bc36917bf216b2acb91044b91e930e878ae12981e461e5/matplotlib-3.10.8-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:238b7ce5717600615c895050239ec955d91f321c209dd110db988500558e70d6", size = 9596825, upload-time = "2025-12-10T22:55:25.217Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/d3/a4bbc01c237ab710a1f22b4da72f4ff6d77eb4c7735ea9811a94ae239067/matplotlib-3.10.8-cp311-cp311-win_amd64.whl", hash = "sha256:18821ace09c763ec93aef5eeff087ee493a24051936d7b9ebcad9662f66501f9", size = 8135090, upload-time = "2025-12-10T22:55:27.162Z" },
+ { url = "https://files.pythonhosted.org/packages/89/dd/a0b6588f102beab33ca6f5218b31725216577b2a24172f327eaf6417d5c9/matplotlib-3.10.8-cp311-cp311-win_arm64.whl", hash = "sha256:bab485bcf8b1c7d2060b4fcb6fc368a9e6f4cd754c9c2fea281f4be21df394a2", size = 8012377, upload-time = "2025-12-10T22:55:29.185Z" },
+ { url = "https://files.pythonhosted.org/packages/9e/67/f997cdcbb514012eb0d10cd2b4b332667997fb5ebe26b8d41d04962fa0e6/matplotlib-3.10.8-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:64fcc24778ca0404ce0cb7b6b77ae1f4c7231cdd60e6778f999ee05cbd581b9a", size = 8260453, upload-time = "2025-12-10T22:55:30.709Z" },
+ { url = "https://files.pythonhosted.org/packages/7e/65/07d5f5c7f7c994f12c768708bd2e17a4f01a2b0f44a1c9eccad872433e2e/matplotlib-3.10.8-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:b9a5ca4ac220a0cdd1ba6bcba3608547117d30468fefce49bb26f55c1a3d5c58", size = 8148321, upload-time = "2025-12-10T22:55:33.265Z" },
+ { url = "https://files.pythonhosted.org/packages/3e/f3/c5195b1ae57ef85339fd7285dfb603b22c8b4e79114bae5f4f0fcf688677/matplotlib-3.10.8-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:3ab4aabc72de4ff77b3ec33a6d78a68227bf1123465887f9905ba79184a1cc04", size = 8716944, upload-time = "2025-12-10T22:55:34.922Z" },
+ { url = "https://files.pythonhosted.org/packages/00/f9/7638f5cc82ec8a7aa005de48622eecc3ed7c9854b96ba15bd76b7fd27574/matplotlib-3.10.8-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:24d50994d8c5816ddc35411e50a86ab05f575e2530c02752e02538122613371f", size = 9550099, upload-time = "2025-12-10T22:55:36.789Z" },
+ { url = "https://files.pythonhosted.org/packages/57/61/78cd5920d35b29fd2a0fe894de8adf672ff52939d2e9b43cb83cd5ce1bc7/matplotlib-3.10.8-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:99eefd13c0dc3b3c1b4d561c1169e65fe47aab7b8158754d7c084088e2329466", size = 9613040, upload-time = "2025-12-10T22:55:38.715Z" },
+ { url = "https://files.pythonhosted.org/packages/30/4e/c10f171b6e2f44d9e3a2b96efa38b1677439d79c99357600a62cc1e9594e/matplotlib-3.10.8-cp312-cp312-win_amd64.whl", hash = "sha256:dd80ecb295460a5d9d260df63c43f4afbdd832d725a531f008dad1664f458adf", size = 8142717, upload-time = "2025-12-10T22:55:41.103Z" },
+ { url = "https://files.pythonhosted.org/packages/f1/76/934db220026b5fef85f45d51a738b91dea7d70207581063cd9bd8fafcf74/matplotlib-3.10.8-cp312-cp312-win_arm64.whl", hash = "sha256:3c624e43ed56313651bc18a47f838b60d7b8032ed348911c54906b130b20071b", size = 8012751, upload-time = "2025-12-10T22:55:42.684Z" },
+ { url = "https://files.pythonhosted.org/packages/f5/43/31d59500bb950b0d188e149a2e552040528c13d6e3d6e84d0cccac593dcd/matplotlib-3.10.8-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:f97aeb209c3d2511443f8797e3e5a569aebb040d4f8bc79aa3ee78a8fb9e3dd8", size = 8237252, upload-time = "2025-12-10T22:56:39.529Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/2c/615c09984f3c5f907f51c886538ad785cf72e0e11a3225de2c0f9442aecc/matplotlib-3.10.8-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:fb061f596dad3a0f52b60dc6a5dec4a0c300dec41e058a7efe09256188d170b7", size = 8124693, upload-time = "2025-12-10T22:56:41.758Z" },
+ { url = "https://files.pythonhosted.org/packages/91/e1/2757277a1c56041e1fc104b51a0f7b9a4afc8eb737865d63cababe30bc61/matplotlib-3.10.8-pp310-pypy310_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:12d90df9183093fcd479f4172ac26b322b1248b15729cb57f42f71f24c7e37a3", size = 8702205, upload-time = "2025-12-10T22:56:43.415Z" },
+ { url = "https://files.pythonhosted.org/packages/04/30/3afaa31c757f34b7725ab9d2ba8b48b5e89c2019c003e7d0ead143aabc5a/matplotlib-3.10.8-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:6da7c2ce169267d0d066adcf63758f0604aa6c3eebf67458930f9d9b79ad1db1", size = 8249198, upload-time = "2025-12-10T22:56:45.584Z" },
+ { url = "https://files.pythonhosted.org/packages/48/2f/6334aec331f57485a642a7c8be03cb286f29111ae71c46c38b363230063c/matplotlib-3.10.8-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:9153c3292705be9f9c64498a8872118540c3f4123d1a1c840172edf262c8be4a", size = 8136817, upload-time = "2025-12-10T22:56:47.339Z" },
+ { url = "https://files.pythonhosted.org/packages/73/e4/6d6f14b2a759c622f191b2d67e9075a3f56aaccb3be4bb9bb6890030d0a0/matplotlib-3.10.8-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:1ae029229a57cd1e8fe542485f27e7ca7b23aa9e8944ddb4985d0bc444f1eca2", size = 8713867, upload-time = "2025-12-10T22:56:48.954Z" },
+]
+
+[[package]]
+name = "matplotlib-inline"
+version = "0.2.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/c7/74/97e72a36efd4ae2bccb3463284300f8953f199b5ffbc04cbbb0ec78f74b1/matplotlib_inline-0.2.1.tar.gz", hash = "sha256:e1ee949c340d771fc39e241ea75683deb94762c8fa5f2927ec57c83c4dffa9fe", size = 8110, upload-time = "2025-10-23T09:00:22.126Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/af/33/ee4519fa02ed11a94aef9559552f3b17bb863f2ecfe1a35dc7f548cde231/matplotlib_inline-0.2.1-py3-none-any.whl", hash = "sha256:d56ce5156ba6085e00a9d54fead6ed29a9c47e215cd1bba2e976ef39f5710a76", size = 9516, upload-time = "2025-10-23T09:00:20.675Z" },
+]
+
+[[package]]
+name = "mistune"
+version = "3.2.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "typing-extensions", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9d/55/d01f0c4b45ade6536c51170b9043db8b2ec6ddf4a35c7ea3f5f559ac935b/mistune-3.2.0.tar.gz", hash = "sha256:708487c8a8cdd99c9d90eb3ed4c3ed961246ff78ac82f03418f5183ab70e398a", size = 95467, upload-time = "2025-12-23T11:36:34.994Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/9b/f7/4a5e785ec9fbd65146a27b6b70b6cdc161a66f2024e4b04ac06a67f5578b/mistune-3.2.0-py3-none-any.whl", hash = "sha256:febdc629a3c78616b94393c6580551e0e34cc289987ec6c35ed3f4be42d0eee1", size = 53598, upload-time = "2025-12-23T11:36:33.211Z" },
+]
+
+[[package]]
+name = "moreorless"
+version = "0.5.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "click" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/8d/85/2e4999ac4a21ab3c5f31e2a48e0989a80be3afc512a7983e3253615983d4/moreorless-0.5.0.tar.gz", hash = "sha256:560a04f85006fccd74feaa4b6213a446392ff7b5ec0194a5464b6c30f182fa33", size = 14093, upload-time = "2025-05-04T22:29:59.006Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/fa/2e/9ea80ca55b73530b7639c6f146a58f636ddfe5a852ad467a44fe3e80d809/moreorless-0.5.0-py3-none-any.whl", hash = "sha256:66228870cd2f14bad5c3c3780aa71e29d3b2d9b5a01c03bfbf105efd4f668ecf", size = 14380, upload-time = "2025-05-04T22:29:57.417Z" },
+]
+
+[[package]]
+name = "mpmath"
+version = "1.3.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/e0/47/dd32fa426cc72114383ac549964eecb20ecfd886d1e5ccf5340b55b02f57/mpmath-1.3.0.tar.gz", hash = "sha256:7a28eb2a9774d00c7bc92411c19a89209d5da7c4c9a9e227be8330a23a25b91f", size = 508106, upload-time = "2023-03-07T16:47:11.061Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl", hash = "sha256:a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c", size = 536198, upload-time = "2023-03-07T16:47:09.197Z" },
+]
+
+[[package]]
+name = "mypy-extensions"
+version = "1.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/a2/6e/371856a3fb9d31ca8dac321cda606860fa4548858c0cc45d9d1d4ca2628b/mypy_extensions-1.1.0.tar.gz", hash = "sha256:52e68efc3284861e772bbcd66823fde5ae21fd2fdb51c62a211403730b916558", size = 6343, upload-time = "2025-04-22T14:54:24.164Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/79/7b/2c79738432f5c924bef5071f933bcc9efd0473bac3b4aa584a6f7c1c8df8/mypy_extensions-1.1.0-py3-none-any.whl", hash = "sha256:1be4cccdb0f2482337c4743e60421de3a356cd97508abadd57d47403e94f5505", size = 4963, upload-time = "2025-04-22T14:54:22.983Z" },
+]
+
+[[package]]
+name = "nbclient"
+version = "0.10.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "jupyter-client" },
+ { name = "jupyter-core" },
+ { name = "nbformat" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/56/91/1c1d5a4b9a9ebba2b4e32b8c852c2975c872aec1fe42ab5e516b2cecd193/nbclient-0.10.4.tar.gz", hash = "sha256:1e54091b16e6da39e297b0ece3e10f6f29f4ac4e8ee515d29f8a7099bd6553c9", size = 62554, upload-time = "2025-12-23T07:45:46.369Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/83/a0/5b0c2f11142ed1dddec842457d3f65eaf71a0080894eb6f018755b319c3a/nbclient-0.10.4-py3-none-any.whl", hash = "sha256:9162df5a7373d70d606527300a95a975a47c137776cd942e52d9c7e29ff83440", size = 25465, upload-time = "2025-12-23T07:45:44.51Z" },
+]
+
+[[package]]
+name = "nbconvert"
+version = "7.17.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "beautifulsoup4" },
+ { name = "bleach", extra = ["css"] },
+ { name = "defusedxml" },
+ { name = "jinja2" },
+ { name = "jupyter-core" },
+ { name = "jupyterlab-pygments" },
+ { name = "markupsafe" },
+ { name = "mistune" },
+ { name = "nbclient" },
+ { name = "nbformat" },
+ { name = "packaging" },
+ { name = "pandocfilters" },
+ { name = "pygments" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/38/47/81f886b699450d0569f7bc551df2b1673d18df7ff25cc0c21ca36ed8a5ff/nbconvert-7.17.0.tar.gz", hash = "sha256:1b2696f1b5be12309f6c7d707c24af604b87dfaf6d950794c7b07acab96dda78", size = 862855, upload-time = "2026-01-29T16:37:48.478Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/0d/4b/8d5f796a792f8a25f6925a96032f098789f448571eb92011df1ae59e8ea8/nbconvert-7.17.0-py3-none-any.whl", hash = "sha256:4f99a63b337b9a23504347afdab24a11faa7d86b405e5c8f9881cd313336d518", size = 261510, upload-time = "2026-01-29T16:37:46.322Z" },
+]
+
+[[package]]
+name = "nbformat"
+version = "5.10.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "fastjsonschema" },
+ { name = "jsonschema" },
+ { name = "jupyter-core" },
+ { name = "traitlets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/6d/fd/91545e604bc3dad7dca9ed03284086039b294c6b3d75c0d2fa45f9e9caf3/nbformat-5.10.4.tar.gz", hash = "sha256:322168b14f937a5d11362988ecac2a4952d3d8e3a2cbeb2319584631226d5b3a", size = 142749, upload-time = "2024-04-04T11:20:37.371Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/a9/82/0340caa499416c78e5d8f5f05947ae4bc3cba53c9f038ab6e9ed964e22f1/nbformat-5.10.4-py3-none-any.whl", hash = "sha256:3b48d6c8fbca4b299bf3982ea7db1af21580e4fec269ad087b9e81588891200b", size = 78454, upload-time = "2024-04-04T11:20:34.895Z" },
+]
+
+[[package]]
+name = "nest-asyncio"
+version = "1.6.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/83/f8/51569ac65d696c8ecbee95938f89d4abf00f47d58d48f6fbabfe8f0baefe/nest_asyncio-1.6.0.tar.gz", hash = "sha256:6f172d5449aca15afd6c646851f4e31e02c598d553a667e38cafa997cfec55fe", size = 7418, upload-time = "2024-01-21T14:25:19.227Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/a0/c4/c2971a3ba4c6103a3d10c4b0f24f461ddc027f0f09763220cf35ca1401b3/nest_asyncio-1.6.0-py3-none-any.whl", hash = "sha256:87af6efd6b5e897c81050477ef65c62e2b2f35d51703cae01aff2905b1852e1c", size = 5195, upload-time = "2024-01-21T14:25:17.223Z" },
+]
+
+[[package]]
+name = "networkx"
+version = "3.4.2"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version < '3.11' and sys_platform == 'darwin'",
+ "python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",
+]
+sdist = { url = "https://files.pythonhosted.org/packages/fd/1d/06475e1cd5264c0b870ea2cc6fdb3e37177c1e565c43f56ff17a10e3937f/networkx-3.4.2.tar.gz", hash = "sha256:307c3669428c5362aab27c8a1260aa8f47c4e91d3891f48be0141738d8d053e1", size = 2151368, upload-time = "2024-10-21T12:39:38.695Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/b9/54/dd730b32ea14ea797530a4479b2ed46a6fb250f682a9cfb997e968bf0261/networkx-3.4.2-py3-none-any.whl", hash = "sha256:df5d4365b724cf81b8c6a7312509d0c22386097011ad1abe274afd5e9d3bbc5f", size = 1723263, upload-time = "2024-10-21T12:39:36.247Z" },
+]
+
+[[package]]
+name = "networkx"
+version = "3.6.1"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version >= '3.12' and sys_platform == 'darwin'",
+ "python_full_version >= '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version >= '3.12' and sys_platform == 'win32'",
+ "python_full_version >= '3.12' and sys_platform == 'emscripten'",
+ "(python_full_version >= '3.12' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version == '3.11.*' and sys_platform == 'darwin'",
+ "python_full_version == '3.11.*' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version == '3.11.*' and sys_platform == 'win32'",
+ "python_full_version == '3.11.*' and sys_platform == 'emscripten'",
+ "(python_full_version == '3.11.*' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version == '3.11.*' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+]
+sdist = { url = "https://files.pythonhosted.org/packages/6a/51/63fe664f3908c97be9d2e4f1158eb633317598cfa6e1fc14af5383f17512/networkx-3.6.1.tar.gz", hash = "sha256:26b7c357accc0c8cde558ad486283728b65b6a95d85ee1cd66bafab4c8168509", size = 2517025, upload-time = "2025-12-08T17:02:39.908Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/9e/c9/b2622292ea83fbb4ec318f5b9ab867d0a28ab43c5717bb85b0a5f6b3b0a4/networkx-3.6.1-py3-none-any.whl", hash = "sha256:d47fbf302e7d9cbbb9e2555a0d267983d2aa476bac30e90dfbe5669bd57f3762", size = 2068504, upload-time = "2025-12-08T17:02:38.159Z" },
+]
+
+[[package]]
+name = "notebook"
+version = "7.5.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "jupyter-server" },
+ { name = "jupyterlab" },
+ { name = "jupyterlab-server" },
+ { name = "notebook-shim" },
+ { name = "tornado" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b8/cb/cc7f4df5cee315dd126a47eb60890690a0438d5e0dd40c32d60ce16de377/notebook-7.5.3.tar.gz", hash = "sha256:393ceb269cf9fdb02a3be607a57d7bd5c2c14604f1818a17dbeb38e04f98cbfa", size = 14073140, upload-time = "2026-01-26T07:28:36.605Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/96/98/9286e7f35e5584ebb79f997f2fb0cb66745c86f6c5fccf15ba32aac5e908/notebook-7.5.3-py3-none-any.whl", hash = "sha256:c997bfa1a2a9eb58c9bbb7e77d50428befb1033dd6f02c482922e96851d67354", size = 14481744, upload-time = "2026-01-26T07:28:31.867Z" },
+]
+
+[[package]]
+name = "notebook-shim"
+version = "0.2.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "jupyter-server" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/54/d2/92fa3243712b9a3e8bafaf60aac366da1cada3639ca767ff4b5b3654ec28/notebook_shim-0.2.4.tar.gz", hash = "sha256:b4b2cfa1b65d98307ca24361f5b30fe785b53c3fd07b7a47e89acb5e6ac638cb", size = 13167, upload-time = "2024-02-14T23:35:18.353Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/f9/33/bd5b9137445ea4b680023eb0469b2bb969d61303dedb2aac6560ff3d14a1/notebook_shim-0.2.4-py3-none-any.whl", hash = "sha256:411a5be4e9dc882a074ccbcae671eda64cceb068767e9a3419096986560e1cef", size = 13307, upload-time = "2024-02-14T23:35:16.286Z" },
+]
+
+[[package]]
+name = "numba"
+version = "0.63.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "llvmlite" },
+ { name = "numpy" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/dc/60/0145d479b2209bd8fdae5f44201eceb8ce5a23e0ed54c71f57db24618665/numba-0.63.1.tar.gz", hash = "sha256:b320aa675d0e3b17b40364935ea52a7b1c670c9037c39cf92c49502a75902f4b", size = 2761666, upload-time = "2025-12-10T02:57:39.002Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/5e/ce/5283d4ffa568f795bb0fd61ee1f0efc0c6094b94209259167fc8d4276bde/numba-0.63.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:c6d6bf5bf00f7db629305caaec82a2ffb8abe2bf45eaad0d0738dc7de4113779", size = 2680810, upload-time = "2025-12-10T02:56:55.269Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/72/a8bda517e26d912633b32626333339b7c769ea73a5c688365ea5f88fd07e/numba-0.63.1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:08653d0dfc9cc9c4c9a8fba29ceb1f2d5340c3b86c4a7e5e07e42b643bc6a2f4", size = 3739735, upload-time = "2025-12-10T02:56:57.922Z" },
+ { url = "https://files.pythonhosted.org/packages/ca/17/1913b7c1173b2db30fb7a9696892a7c4c59aeee777a9af6859e9e01bac51/numba-0.63.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f09eebf5650246ce2a4e9a8d38270e2d4b0b0ae978103bafb38ed7adc5ea906e", size = 3446707, upload-time = "2025-12-10T02:56:59.837Z" },
+ { url = "https://files.pythonhosted.org/packages/b4/77/703db56c3061e9fdad5e79c91452947fdeb2ec0bdfe4affe9b144e7025e0/numba-0.63.1-cp310-cp310-win_amd64.whl", hash = "sha256:f8bba17421d865d8c0f7be2142754ebce53e009daba41c44cf6909207d1a8d7d", size = 2747374, upload-time = "2025-12-10T02:57:07.908Z" },
+ { url = "https://files.pythonhosted.org/packages/70/90/5f8614c165d2e256fbc6c57028519db6f32e4982475a372bbe550ea0454c/numba-0.63.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:b33db00f18ccc790ee9911ce03fcdfe9d5124637d1ecc266f5ae0df06e02fec3", size = 2680501, upload-time = "2025-12-10T02:57:09.797Z" },
+ { url = "https://files.pythonhosted.org/packages/dc/9d/d0afc4cf915edd8eadd9b2ab5b696242886ee4f97720d9322650d66a88c6/numba-0.63.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7d31ea186a78a7c0f6b1b2a3fe68057fdb291b045c52d86232b5383b6cf4fc25", size = 3744945, upload-time = "2025-12-10T02:57:11.697Z" },
+ { url = "https://files.pythonhosted.org/packages/05/a9/d82f38f2ab73f3be6f838a826b545b80339762ee8969c16a8bf1d39395a8/numba-0.63.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ed3bb2fbdb651d6aac394388130a7001aab6f4541837123a4b4ab8b02716530c", size = 3450827, upload-time = "2025-12-10T02:57:13.709Z" },
+ { url = "https://files.pythonhosted.org/packages/18/3f/a9b106e93c5bd7434e65f044bae0d204e20aa7f7f85d72ceb872c7c04216/numba-0.63.1-cp311-cp311-win_amd64.whl", hash = "sha256:1ecbff7688f044b1601be70113e2fb1835367ee0b28ffa8f3adf3a05418c5c87", size = 2747262, upload-time = "2025-12-10T02:57:15.664Z" },
+ { url = "https://files.pythonhosted.org/packages/14/9c/c0974cd3d00ff70d30e8ff90522ba5fbb2bcee168a867d2321d8d0457676/numba-0.63.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:2819cd52afa5d8d04e057bdfd54367575105f8829350d8fb5e4066fb7591cc71", size = 2680981, upload-time = "2025-12-10T02:57:17.579Z" },
+ { url = "https://files.pythonhosted.org/packages/cb/70/ea2bc45205f206b7a24ee68a159f5097c9ca7e6466806e7c213587e0c2b1/numba-0.63.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5cfd45dbd3d409e713b1ccfdc2ee72ca82006860254429f4ef01867fdba5845f", size = 3801656, upload-time = "2025-12-10T02:57:19.106Z" },
+ { url = "https://files.pythonhosted.org/packages/0d/82/4f4ba4fd0f99825cbf3cdefd682ca3678be1702b63362011de6e5f71f831/numba-0.63.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:69a599df6976c03b7ecf15d05302696f79f7e6d10d620367407517943355bcb0", size = 3501857, upload-time = "2025-12-10T02:57:20.721Z" },
+ { url = "https://files.pythonhosted.org/packages/af/fd/6540456efa90b5f6604a86ff50dabefb187e43557e9081adcad3be44f048/numba-0.63.1-cp312-cp312-win_amd64.whl", hash = "sha256:bbad8c63e4fc7eb3cdb2c2da52178e180419f7969f9a685f283b313a70b92af3", size = 2750282, upload-time = "2025-12-10T02:57:22.474Z" },
+]
+
+[[package]]
+name = "numpy"
+version = "1.26.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/65/6e/09db70a523a96d25e115e71cc56a6f9031e7b8cd166c1ac8438307c14058/numpy-1.26.4.tar.gz", hash = "sha256:2a02aba9ed12e4ac4eb3ea9421c420301a0c6460d9830d74a9df87efa4912010", size = 15786129, upload-time = "2024-02-06T00:26:44.495Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/a7/94/ace0fdea5241a27d13543ee117cbc65868e82213fb31a8eb7fe9ff23f313/numpy-1.26.4-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:9ff0f4f29c51e2803569d7a51c2304de5554655a60c5d776e35b4a41413830d0", size = 20631468, upload-time = "2024-02-05T23:48:01.194Z" },
+ { url = "https://files.pythonhosted.org/packages/20/f7/b24208eba89f9d1b58c1668bc6c8c4fd472b20c45573cb767f59d49fb0f6/numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:2e4ee3380d6de9c9ec04745830fd9e2eccb3e6cf790d39d7b98ffd19b0dd754a", size = 13966411, upload-time = "2024-02-05T23:48:29.038Z" },
+ { url = "https://files.pythonhosted.org/packages/fc/a5/4beee6488160798683eed5bdb7eead455892c3b4e1f78d79d8d3f3b084ac/numpy-1.26.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d209d8969599b27ad20994c8e41936ee0964e6da07478d6c35016bc386b66ad4", size = 14219016, upload-time = "2024-02-05T23:48:54.098Z" },
+ { url = "https://files.pythonhosted.org/packages/4b/d7/ecf66c1cd12dc28b4040b15ab4d17b773b87fa9d29ca16125de01adb36cd/numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ffa75af20b44f8dba823498024771d5ac50620e6915abac414251bd971b4529f", size = 18240889, upload-time = "2024-02-05T23:49:25.361Z" },
+ { url = "https://files.pythonhosted.org/packages/24/03/6f229fe3187546435c4f6f89f6d26c129d4f5bed40552899fcf1f0bf9e50/numpy-1.26.4-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:62b8e4b1e28009ef2846b4c7852046736bab361f7aeadeb6a5b89ebec3c7055a", size = 13876746, upload-time = "2024-02-05T23:49:51.983Z" },
+ { url = "https://files.pythonhosted.org/packages/39/fe/39ada9b094f01f5a35486577c848fe274e374bbf8d8f472e1423a0bbd26d/numpy-1.26.4-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:a4abb4f9001ad2858e7ac189089c42178fcce737e4169dc61321660f1a96c7d2", size = 18078620, upload-time = "2024-02-05T23:50:22.515Z" },
+ { url = "https://files.pythonhosted.org/packages/d5/ef/6ad11d51197aad206a9ad2286dc1aac6a378059e06e8cf22cd08ed4f20dc/numpy-1.26.4-cp310-cp310-win32.whl", hash = "sha256:bfe25acf8b437eb2a8b2d49d443800a5f18508cd811fea3181723922a8a82b07", size = 5972659, upload-time = "2024-02-05T23:50:35.834Z" },
+ { url = "https://files.pythonhosted.org/packages/19/77/538f202862b9183f54108557bfda67e17603fc560c384559e769321c9d92/numpy-1.26.4-cp310-cp310-win_amd64.whl", hash = "sha256:b97fe8060236edf3662adfc2c633f56a08ae30560c56310562cb4f95500022d5", size = 15808905, upload-time = "2024-02-05T23:51:03.701Z" },
+ { url = "https://files.pythonhosted.org/packages/11/57/baae43d14fe163fa0e4c47f307b6b2511ab8d7d30177c491960504252053/numpy-1.26.4-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:4c66707fabe114439db9068ee468c26bbdf909cac0fb58686a42a24de1760c71", size = 20630554, upload-time = "2024-02-05T23:51:50.149Z" },
+ { url = "https://files.pythonhosted.org/packages/1a/2e/151484f49fd03944c4a3ad9c418ed193cfd02724e138ac8a9505d056c582/numpy-1.26.4-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:edd8b5fe47dab091176d21bb6de568acdd906d1887a4584a15a9a96a1dca06ef", size = 13997127, upload-time = "2024-02-05T23:52:15.314Z" },
+ { url = "https://files.pythonhosted.org/packages/79/ae/7e5b85136806f9dadf4878bf73cf223fe5c2636818ba3ab1c585d0403164/numpy-1.26.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7ab55401287bfec946ced39700c053796e7cc0e3acbef09993a9ad2adba6ca6e", size = 14222994, upload-time = "2024-02-05T23:52:47.569Z" },
+ { url = "https://files.pythonhosted.org/packages/3a/d0/edc009c27b406c4f9cbc79274d6e46d634d139075492ad055e3d68445925/numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:666dbfb6ec68962c033a450943ded891bed2d54e6755e35e5835d63f4f6931d5", size = 18252005, upload-time = "2024-02-05T23:53:15.637Z" },
+ { url = "https://files.pythonhosted.org/packages/09/bf/2b1aaf8f525f2923ff6cfcf134ae5e750e279ac65ebf386c75a0cf6da06a/numpy-1.26.4-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:96ff0b2ad353d8f990b63294c8986f1ec3cb19d749234014f4e7eb0112ceba5a", size = 13885297, upload-time = "2024-02-05T23:53:42.16Z" },
+ { url = "https://files.pythonhosted.org/packages/df/a0/4e0f14d847cfc2a633a1c8621d00724f3206cfeddeb66d35698c4e2cf3d2/numpy-1.26.4-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:60dedbb91afcbfdc9bc0b1f3f402804070deed7392c23eb7a7f07fa857868e8a", size = 18093567, upload-time = "2024-02-05T23:54:11.696Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/b7/a734c733286e10a7f1a8ad1ae8c90f2d33bf604a96548e0a4a3a6739b468/numpy-1.26.4-cp311-cp311-win32.whl", hash = "sha256:1af303d6b2210eb850fcf03064d364652b7120803a0b872f5211f5234b399f20", size = 5968812, upload-time = "2024-02-05T23:54:26.453Z" },
+ { url = "https://files.pythonhosted.org/packages/3f/6b/5610004206cf7f8e7ad91c5a85a8c71b2f2f8051a0c0c4d5916b76d6cbb2/numpy-1.26.4-cp311-cp311-win_amd64.whl", hash = "sha256:cd25bcecc4974d09257ffcd1f098ee778f7834c3ad767fe5db785be9a4aa9cb2", size = 15811913, upload-time = "2024-02-05T23:54:53.933Z" },
+ { url = "https://files.pythonhosted.org/packages/95/12/8f2020a8e8b8383ac0177dc9570aad031a3beb12e38847f7129bacd96228/numpy-1.26.4-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:b3ce300f3644fb06443ee2222c2201dd3a89ea6040541412b8fa189341847218", size = 20335901, upload-time = "2024-02-05T23:55:32.801Z" },
+ { url = "https://files.pythonhosted.org/packages/75/5b/ca6c8bd14007e5ca171c7c03102d17b4f4e0ceb53957e8c44343a9546dcc/numpy-1.26.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:03a8c78d01d9781b28a6989f6fa1bb2c4f2d51201cf99d3dd875df6fbd96b23b", size = 13685868, upload-time = "2024-02-05T23:55:56.28Z" },
+ { url = "https://files.pythonhosted.org/packages/79/f8/97f10e6755e2a7d027ca783f63044d5b1bc1ae7acb12afe6a9b4286eac17/numpy-1.26.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9fad7dcb1aac3c7f0584a5a8133e3a43eeb2fe127f47e3632d43d677c66c102b", size = 13925109, upload-time = "2024-02-05T23:56:20.368Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/50/de23fde84e45f5c4fda2488c759b69990fd4512387a8632860f3ac9cd225/numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:675d61ffbfa78604709862923189bad94014bef562cc35cf61d3a07bba02a7ed", size = 17950613, upload-time = "2024-02-05T23:56:56.054Z" },
+ { url = "https://files.pythonhosted.org/packages/4c/0c/9c603826b6465e82591e05ca230dfc13376da512b25ccd0894709b054ed0/numpy-1.26.4-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:ab47dbe5cc8210f55aa58e4805fe224dac469cde56b9f731a4c098b91917159a", size = 13572172, upload-time = "2024-02-05T23:57:21.56Z" },
+ { url = "https://files.pythonhosted.org/packages/76/8c/2ba3902e1a0fc1c74962ea9bb33a534bb05984ad7ff9515bf8d07527cadd/numpy-1.26.4-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:1dda2e7b4ec9dd512f84935c5f126c8bd8b9f2fc001e9f54af255e8c5f16b0e0", size = 17786643, upload-time = "2024-02-05T23:57:56.585Z" },
+ { url = "https://files.pythonhosted.org/packages/28/4a/46d9e65106879492374999e76eb85f87b15328e06bd1550668f79f7b18c6/numpy-1.26.4-cp312-cp312-win32.whl", hash = "sha256:50193e430acfc1346175fcbdaa28ffec49947a06918b7b92130744e81e640110", size = 5677803, upload-time = "2024-02-05T23:58:08.963Z" },
+ { url = "https://files.pythonhosted.org/packages/16/2e/86f24451c2d530c88daf997cb8d6ac622c1d40d19f5a031ed68a4b73a374/numpy-1.26.4-cp312-cp312-win_amd64.whl", hash = "sha256:08beddf13648eb95f8d867350f6a018a4be2e5ad54c8d8caed89ebca558b2818", size = 15517754, upload-time = "2024-02-05T23:58:36.364Z" },
+]
+
+[[package]]
+name = "nvidia-cublas"
+version = "13.1.0.3"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e1/a5/fce49e2ae977e0ccc084e5adafceb4f0ac0c8333cb6863501618a7277f67/nvidia_cublas-13.1.0.3-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:c86fc7f7ae36d7528288c5d88098edcb7b02c633d262e7ddbb86b0ad91be5df2", size = 542851226, upload-time = "2025-10-09T08:59:04.818Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/44/423ac00af4dd95a5aeb27207e2c0d9b7118702149bf4704c3ddb55bb7429/nvidia_cublas-13.1.0.3-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:ee8722c1f0145ab246bccb9e452153b5e0515fd094c3678df50b2a0888b8b171", size = 423133236, upload-time = "2025-10-09T08:59:32.536Z" },
+]
+
+[[package]]
+name = "nvidia-cuda-cupti"
+version = "13.0.85"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/2a/2a/80353b103fc20ce05ef51e928daed4b6015db4aaa9162ed0997090fe2250/nvidia_cuda_cupti-13.0.85-py3-none-manylinux_2_25_aarch64.whl", hash = "sha256:796bd679890ee55fb14a94629b698b6db54bcfd833d391d5e94017dd9d7d3151", size = 10310827, upload-time = "2025-09-04T08:26:42.012Z" },
+ { url = "https://files.pythonhosted.org/packages/33/6d/737d164b4837a9bbd202f5ae3078975f0525a55730fe871d8ed4e3b952b0/nvidia_cuda_cupti-13.0.85-py3-none-manylinux_2_25_x86_64.whl", hash = "sha256:4eb01c08e859bf924d222250d2e8f8b8ff6d3db4721288cf35d14252a4d933c8", size = 10715597, upload-time = "2025-09-04T08:26:51.312Z" },
+]
+
+[[package]]
+name = "nvidia-cuda-nvrtc"
+version = "13.0.88"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c3/68/483a78f5e8f31b08fb1bb671559968c0ca3a065ac7acabfc7cee55214fd6/nvidia_cuda_nvrtc-13.0.88-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:ad9b6d2ead2435f11cbb6868809d2adeeee302e9bb94bcf0539c7a40d80e8575", size = 90215200, upload-time = "2025-09-04T08:28:44.204Z" },
+ { url = "https://files.pythonhosted.org/packages/b7/dc/6bb80850e0b7edd6588d560758f17e0550893a1feaf436807d64d2da040f/nvidia_cuda_nvrtc-13.0.88-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:d27f20a0ca67a4bb34268a5e951033496c5b74870b868bacd046b1b8e0c3267b", size = 43015449, upload-time = "2025-09-04T08:28:20.239Z" },
+]
+
+[[package]]
+name = "nvidia-cuda-runtime"
+version = "13.0.96"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/87/4f/17d7b9b8e285199c58ce28e31b5c5bbaa4d8271af06a89b6405258245de2/nvidia_cuda_runtime-13.0.96-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:ef9bcbe90493a2b9d810e43d249adb3d02e98dd30200d86607d8d02687c43f55", size = 2261060, upload-time = "2025-10-09T08:55:15.78Z" },
+ { url = "https://files.pythonhosted.org/packages/2e/24/d1558f3b68b1d26e706813b1d10aa1d785e4698c425af8db8edc3dced472/nvidia_cuda_runtime-13.0.96-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7f82250d7782aa23b6cfe765ecc7db554bd3c2870c43f3d1821f1d18aebf0548", size = 2243632, upload-time = "2025-10-09T08:55:36.117Z" },
+]
+
+[[package]]
+name = "nvidia-cudnn-cu13"
+version = "9.19.0.56"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "nvidia-cublas" },
+]
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/f1/84/26025437c1e6b61a707442184fa0c03d083b661adf3a3eecfd6d21677740/nvidia_cudnn_cu13-9.19.0.56-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:6ed29ffaee1176c612daf442e4dd6cfeb6a0caa43ddcbeb59da94953030b1be4", size = 433781201, upload-time = "2026-02-03T20:40:53.805Z" },
+ { url = "https://files.pythonhosted.org/packages/a3/22/0b4b932655d17a6da1b92fa92ab12844b053bb2ac2475e179ba6f043da1e/nvidia_cudnn_cu13-9.19.0.56-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:d20e1734305e9d68889a96e3f35094d733ff1f83932ebe462753973e53a572bf", size = 366066321, upload-time = "2026-02-03T20:44:52.837Z" },
+]
+
+[[package]]
+name = "nvidia-cufft"
+version = "12.0.0.61"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "nvidia-nvjitlink" },
+]
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/8b/ae/f417a75c0259e85c1d2f83ca4e960289a5f814ed0cea74d18c353d3e989d/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2708c852ef8cd89d1d2068bdbece0aa188813a0c934db3779b9b1faa8442e5f5", size = 214053554, upload-time = "2025-09-04T08:31:38.196Z" },
+ { url = "https://files.pythonhosted.org/packages/a8/2f/7b57e29836ea8714f81e9898409196f47d772d5ddedddf1592eadb8ab743/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:6c44f692dce8fd5ffd3e3df134b6cdb9c2f72d99cf40b62c32dde45eea9ddad3", size = 214085489, upload-time = "2025-09-04T08:31:56.044Z" },
+]
+
+[[package]]
+name = "nvidia-cufile"
+version = "1.15.1.6"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3f/70/4f193de89a48b71714e74602ee14d04e4019ad36a5a9f20c425776e72cd6/nvidia_cufile-1.15.1.6-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:08a3ecefae5a01c7f5117351c64f17c7c62efa5fffdbe24fc7d298da19cd0b44", size = 1223672, upload-time = "2025-09-04T08:32:22.779Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/73/cc4a14c9813a8a0d509417cf5f4bdaba76e924d58beb9864f5a7baceefbf/nvidia_cufile-1.15.1.6-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:bdc0deedc61f548bddf7733bdc216456c2fdb101d020e1ab4b88d232d5e2f6d1", size = 1136992, upload-time = "2025-09-04T08:32:14.119Z" },
+]
+
+[[package]]
+name = "nvidia-curand"
+version = "10.4.0.35"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1e/72/7c2ae24fb6b63a32e6ae5d241cc65263ea18d08802aaae087d9f013335a2/nvidia_curand-10.4.0.35-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:133df5a7509c3e292aaa2b477afd0194f06ce4ea24d714d616ff36439cee349a", size = 61962106, upload-time = "2025-08-04T10:21:41.128Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/9f/be0a41ca4a4917abf5cb9ae0daff1a6060cc5de950aec0396de9f3b52bc5/nvidia_curand-10.4.0.35-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:1aee33a5da6e1db083fe2b90082def8915f30f3248d5896bcec36a579d941bfc", size = 59544258, upload-time = "2025-08-04T10:22:03.992Z" },
+]
+
+[[package]]
+name = "nvidia-cusolver"
+version = "12.0.4.66"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "nvidia-cublas" },
+ { name = "nvidia-cusparse" },
+ { name = "nvidia-nvjitlink" },
+]
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c8/c3/b30c9e935fc01e3da443ec0116ed1b2a009bb867f5324d3f2d7e533e776b/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:02c2457eaa9e39de20f880f4bd8820e6a1cfb9f9a34f820eb12a155aa5bc92d2", size = 223467760, upload-time = "2025-09-04T08:33:04.222Z" },
+ { url = "https://files.pythonhosted.org/packages/5f/67/cba3777620cdacb99102da4042883709c41c709f4b6323c10781a9c3aa34/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_x86_64.whl", hash = "sha256:0a759da5dea5c0ea10fd307de75cdeb59e7ea4fcb8add0924859b944babf1112", size = 200941980, upload-time = "2025-09-04T08:33:22.767Z" },
+]
+
+[[package]]
+name = "nvidia-cusparse"
+version = "12.6.3.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "nvidia-nvjitlink" },
+]
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/f8/94/5c26f33738ae35276672f12615a64bd008ed5be6d1ebcb23579285d960a9/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:80bcc4662f23f1054ee334a15c72b8940402975e0eab63178fc7e670aa59472c", size = 162155568, upload-time = "2025-09-04T08:33:42.864Z" },
+ { url = "https://files.pythonhosted.org/packages/fa/18/623c77619c31d62efd55302939756966f3ecc8d724a14dab2b75f1508850/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:2b3c89c88d01ee0e477cb7f82ef60a11a4bcd57b6b87c33f789350b59759360b", size = 145942937, upload-time = "2025-09-04T08:33:58.029Z" },
+]
+
+[[package]]
+name = "nvidia-cusparselt-cu13"
+version = "0.8.0"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/46/10/8dcd1175260706a2fc92a16a52e306b71d4c1ea0b0cc4a9484183399818a/nvidia_cusparselt_cu13-0.8.0-py3-none-manylinux2014_aarch64.whl", hash = "sha256:400c6ed1cf6780fc6efedd64ec9f1345871767e6a1a0a552a1ea0578117ea77c", size = 220791277, upload-time = "2025-08-13T19:22:40.982Z" },
+ { url = "https://files.pythonhosted.org/packages/fd/53/43b0d71f4e702fa9733f8b4571fdca50a8813f1e450b656c239beff12315/nvidia_cusparselt_cu13-0.8.0-py3-none-manylinux2014_x86_64.whl", hash = "sha256:25e30a8a7323935d4ad0340b95a0b69926eee755767e8e0b1cf8dd85b197d3fd", size = 169884119, upload-time = "2025-08-13T19:23:41.967Z" },
+]
+
+[[package]]
+name = "nvidia-nccl-cu13"
+version = "2.28.9"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/39/55/1920646a2e43ffd4fc958536b276197ed740e9e0c54105b4bb3521591fc7/nvidia_nccl_cu13-2.28.9-py3-none-manylinux_2_18_aarch64.whl", hash = "sha256:01c873ba1626b54caa12272ed228dc5b2781545e0ae8ba3f432a8ef1c6d78643", size = 196561677, upload-time = "2025-11-18T05:49:03.45Z" },
+ { url = "https://files.pythonhosted.org/packages/b0/b4/878fefaad5b2bcc6fcf8d474a25e3e3774bc5133e4b58adff4d0bca238bc/nvidia_nccl_cu13-2.28.9-py3-none-manylinux_2_18_x86_64.whl", hash = "sha256:e4553a30f34195f3fa1da02a6da3d6337d28f2003943aa0a3d247bbc25fefc42", size = 196493177, upload-time = "2025-11-18T05:49:17.677Z" },
+]
+
+[[package]]
+name = "nvidia-nvjitlink"
+version = "13.0.88"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/56/7a/123e033aaff487c77107195fa5a2b8686795ca537935a24efae476c41f05/nvidia_nvjitlink-13.0.88-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl", hash = "sha256:13a74f429e23b921c1109976abefacc69835f2f433ebd323d3946e11d804e47b", size = 40713933, upload-time = "2025-09-04T08:35:43.553Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/2c/93c5250e64df4f894f1cbb397c6fd71f79813f9fd79d7cd61de3f97b3c2d/nvidia_nvjitlink-13.0.88-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:e931536ccc7d467a98ba1d8b89ff7fa7f1fa3b13f2b0069118cd7f47bff07d0c", size = 38768748, upload-time = "2025-09-04T08:35:20.008Z" },
+]
+
+[[package]]
+name = "nvidia-nvshmem-cu13"
+version = "3.4.5"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/dc/0f/05cc9c720236dcd2db9c1ab97fff629e96821be2e63103569da0c9b72f19/nvidia_nvshmem_cu13-3.4.5-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:6dc2a197f38e5d0376ad52cd1a2a3617d3cdc150fd5966f4aee9bcebb1d68fe9", size = 60215947, upload-time = "2025-09-06T00:32:20.022Z" },
+ { url = "https://files.pythonhosted.org/packages/3c/35/a9bf80a609e74e3b000fef598933235c908fcefcef9026042b8e6dfde2a9/nvidia_nvshmem_cu13-3.4.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:290f0a2ee94c9f3687a02502f3b9299a9f9fe826e6d0287ee18482e78d495b80", size = 60412546, upload-time = "2025-09-06T00:32:41.564Z" },
+]
+
+[[package]]
+name = "nvidia-nvtx"
+version = "13.0.85"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c2/f3/d86c845465a2723ad7e1e5c36dcd75ddb82898b3f53be47ebd429fb2fa5d/nvidia_nvtx-13.0.85-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:4936d1d6780fbe68db454f5e72a42ff64d1fd6397df9f363ae786930fd5c1cd4", size = 148047, upload-time = "2025-09-04T08:29:01.761Z" },
+ { url = "https://files.pythonhosted.org/packages/a8/64/3708a90d1ebe202ffdeb7185f878a3c84d15c2b2c31858da2ce0583e2def/nvidia_nvtx-13.0.85-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:cb7780edb6b14107373c835bf8b72e7a178bac7367e23da7acb108f973f157a6", size = 148878, upload-time = "2025-09-04T08:28:53.627Z" },
+]
+
+[[package]]
+name = "omegaconf"
+version = "2.3.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "antlr4-python3-runtime" },
+ { name = "pyyaml" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/09/48/6388f1bb9da707110532cb70ec4d2822858ddfb44f1cdf1233c20a80ea4b/omegaconf-2.3.0.tar.gz", hash = "sha256:d5d4b6d29955cc50ad50c46dc269bcd92c6e00f5f90d23ab5fee7bfca4ba4cc7", size = 3298120, upload-time = "2022-12-08T20:59:22.753Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e3/94/1843518e420fa3ed6919835845df698c7e27e183cb997394e4a670973a65/omegaconf-2.3.0-py3-none-any.whl", hash = "sha256:7b4df175cdb08ba400f45cae3bdcae7ba8365db4d165fc65fd04b050ab63b46b", size = 79500, upload-time = "2022-12-08T20:59:19.686Z" },
+]
+
+[[package]]
+name = "opencv-python"
+version = "4.11.0.86"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "numpy" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/17/06/68c27a523103dad5837dc5b87e71285280c4f098c60e4fe8a8db6486ab09/opencv-python-4.11.0.86.tar.gz", hash = "sha256:03d60ccae62304860d232272e4a4fda93c39d595780cb40b161b310244b736a4", size = 95171956, upload-time = "2025-01-16T13:52:24.737Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/05/4d/53b30a2a3ac1f75f65a59eb29cf2ee7207ce64867db47036ad61743d5a23/opencv_python-4.11.0.86-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:432f67c223f1dc2824f5e73cdfcd9db0efc8710647d4e813012195dc9122a52a", size = 37326322, upload-time = "2025-01-16T13:52:25.887Z" },
+ { url = "https://files.pythonhosted.org/packages/3b/84/0a67490741867eacdfa37bc18df96e08a9d579583b419010d7f3da8ff503/opencv_python-4.11.0.86-cp37-abi3-macosx_13_0_x86_64.whl", hash = "sha256:9d05ef13d23fe97f575153558653e2d6e87103995d54e6a35db3f282fe1f9c66", size = 56723197, upload-time = "2025-01-16T13:55:21.222Z" },
+ { url = "https://files.pythonhosted.org/packages/f3/bd/29c126788da65c1fb2b5fb621b7fed0ed5f9122aa22a0868c5e2c15c6d23/opencv_python-4.11.0.86-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1b92ae2c8852208817e6776ba1ea0d6b1e0a1b5431e971a2a0ddd2a8cc398202", size = 42230439, upload-time = "2025-01-16T13:51:35.822Z" },
+ { url = "https://files.pythonhosted.org/packages/2c/8b/90eb44a40476fa0e71e05a0283947cfd74a5d36121a11d926ad6f3193cc4/opencv_python-4.11.0.86-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6b02611523803495003bd87362db3e1d2a0454a6a63025dc6658a9830570aa0d", size = 62986597, upload-time = "2025-01-16T13:52:08.836Z" },
+ { url = "https://files.pythonhosted.org/packages/fb/d7/1d5941a9dde095468b288d989ff6539dd69cd429dbf1b9e839013d21b6f0/opencv_python-4.11.0.86-cp37-abi3-win32.whl", hash = "sha256:810549cb2a4aedaa84ad9a1c92fbfdfc14090e2749cedf2c1589ad8359aa169b", size = 29384337, upload-time = "2025-01-16T13:52:13.549Z" },
+ { url = "https://files.pythonhosted.org/packages/a4/7d/f1c30a92854540bf789e9cd5dde7ef49bbe63f855b85a2e6b3db8135c591/opencv_python-4.11.0.86-cp37-abi3-win_amd64.whl", hash = "sha256:085ad9b77c18853ea66283e98affefe2de8cc4c1f43eda4c100cf9b2721142ec", size = 39488044, upload-time = "2025-01-16T13:52:21.928Z" },
+]
+
+[[package]]
+name = "overrides"
+version = "7.7.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/36/86/b585f53236dec60aba864e050778b25045f857e17f6e5ea0ae95fe80edd2/overrides-7.7.0.tar.gz", hash = "sha256:55158fa3d93b98cc75299b1e67078ad9003ca27945c76162c1c0766d6f91820a", size = 22812, upload-time = "2024-01-27T21:01:33.423Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/2c/ab/fc8290c6a4c722e5514d80f62b2dc4c4df1a68a41d1364e625c35990fcf3/overrides-7.7.0-py3-none-any.whl", hash = "sha256:c7ed9d062f78b8e4c1a7b70bd8796b35ead4d9f510227ef9c5dc7626c60d7e49", size = 17832, upload-time = "2024-01-27T21:01:31.393Z" },
+]
+
+[[package]]
+name = "packaging"
+version = "26.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/65/ee/299d360cdc32edc7d2cf530f3accf79c4fca01e96ffc950d8a52213bd8e4/packaging-26.0.tar.gz", hash = "sha256:00243ae351a257117b6a241061796684b084ed1c516a08c48a3f7e147a9d80b4", size = 143416, upload-time = "2026-01-21T20:50:39.064Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" },
+]
+
+[[package]]
+name = "pandas"
+version = "2.3.3"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version < '3.11' and sys_platform == 'darwin'",
+ "python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",
+]
+dependencies = [
+ { name = "numpy", marker = "python_full_version < '3.11'" },
+ { name = "python-dateutil", marker = "python_full_version < '3.11'" },
+ { name = "pytz", marker = "python_full_version < '3.11'" },
+ { name = "tzdata", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/33/01/d40b85317f86cf08d853a4f495195c73815fdf205eef3993821720274518/pandas-2.3.3.tar.gz", hash = "sha256:e05e1af93b977f7eafa636d043f9f94c7ee3ac81af99c13508215942e64c993b", size = 4495223, upload-time = "2025-09-29T23:34:51.853Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3d/f7/f425a00df4fcc22b292c6895c6831c0c8ae1d9fac1e024d16f98a9ce8749/pandas-2.3.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:376c6446ae31770764215a6c937f72d917f214b43560603cd60da6408f183b6c", size = 11555763, upload-time = "2025-09-29T23:16:53.287Z" },
+ { url = "https://files.pythonhosted.org/packages/13/4f/66d99628ff8ce7857aca52fed8f0066ce209f96be2fede6cef9f84e8d04f/pandas-2.3.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:e19d192383eab2f4ceb30b412b22ea30690c9e618f78870357ae1d682912015a", size = 10801217, upload-time = "2025-09-29T23:17:04.522Z" },
+ { url = "https://files.pythonhosted.org/packages/1d/03/3fc4a529a7710f890a239cc496fc6d50ad4a0995657dccc1d64695adb9f4/pandas-2.3.3-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5caf26f64126b6c7aec964f74266f435afef1c1b13da3b0636c7518a1fa3e2b1", size = 12148791, upload-time = "2025-09-29T23:17:18.444Z" },
+ { url = "https://files.pythonhosted.org/packages/40/a8/4dac1f8f8235e5d25b9955d02ff6f29396191d4e665d71122c3722ca83c5/pandas-2.3.3-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:dd7478f1463441ae4ca7308a70e90b33470fa593429f9d4c578dd00d1fa78838", size = 12769373, upload-time = "2025-09-29T23:17:35.846Z" },
+ { url = "https://files.pythonhosted.org/packages/df/91/82cc5169b6b25440a7fc0ef3a694582418d875c8e3ebf796a6d6470aa578/pandas-2.3.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:4793891684806ae50d1288c9bae9330293ab4e083ccd1c5e383c34549c6e4250", size = 13200444, upload-time = "2025-09-29T23:17:49.341Z" },
+ { url = "https://files.pythonhosted.org/packages/10/ae/89b3283800ab58f7af2952704078555fa60c807fff764395bb57ea0b0dbd/pandas-2.3.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:28083c648d9a99a5dd035ec125d42439c6c1c525098c58af0fc38dd1a7a1b3d4", size = 13858459, upload-time = "2025-09-29T23:18:03.722Z" },
+ { url = "https://files.pythonhosted.org/packages/85/72/530900610650f54a35a19476eca5104f38555afccda1aa11a92ee14cb21d/pandas-2.3.3-cp310-cp310-win_amd64.whl", hash = "sha256:503cf027cf9940d2ceaa1a93cfb5f8c8c7e6e90720a2850378f0b3f3b1e06826", size = 11346086, upload-time = "2025-09-29T23:18:18.505Z" },
+ { url = "https://files.pythonhosted.org/packages/c1/fa/7ac648108144a095b4fb6aa3de1954689f7af60a14cf25583f4960ecb878/pandas-2.3.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:602b8615ebcc4a0c1751e71840428ddebeb142ec02c786e8ad6b1ce3c8dec523", size = 11578790, upload-time = "2025-09-29T23:18:30.065Z" },
+ { url = "https://files.pythonhosted.org/packages/9b/35/74442388c6cf008882d4d4bdfc4109be87e9b8b7ccd097ad1e7f006e2e95/pandas-2.3.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:8fe25fc7b623b0ef6b5009149627e34d2a4657e880948ec3c840e9402e5c1b45", size = 10833831, upload-time = "2025-09-29T23:38:56.071Z" },
+ { url = "https://files.pythonhosted.org/packages/fe/e4/de154cbfeee13383ad58d23017da99390b91d73f8c11856f2095e813201b/pandas-2.3.3-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b468d3dad6ff947df92dcb32ede5b7bd41a9b3cceef0a30ed925f6d01fb8fa66", size = 12199267, upload-time = "2025-09-29T23:18:41.627Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/c9/63f8d545568d9ab91476b1818b4741f521646cbdd151c6efebf40d6de6f7/pandas-2.3.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b98560e98cb334799c0b07ca7967ac361a47326e9b4e5a7dfb5ab2b1c9d35a1b", size = 12789281, upload-time = "2025-09-29T23:18:56.834Z" },
+ { url = "https://files.pythonhosted.org/packages/f2/00/a5ac8c7a0e67fd1a6059e40aa08fa1c52cc00709077d2300e210c3ce0322/pandas-2.3.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1d37b5848ba49824e5c30bedb9c830ab9b7751fd049bc7914533e01c65f79791", size = 13240453, upload-time = "2025-09-29T23:19:09.247Z" },
+ { url = "https://files.pythonhosted.org/packages/27/4d/5c23a5bc7bd209231618dd9e606ce076272c9bc4f12023a70e03a86b4067/pandas-2.3.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:db4301b2d1f926ae677a751eb2bd0e8c5f5319c9cb3f88b0becbbb0b07b34151", size = 13890361, upload-time = "2025-09-29T23:19:25.342Z" },
+ { url = "https://files.pythonhosted.org/packages/8e/59/712db1d7040520de7a4965df15b774348980e6df45c129b8c64d0dbe74ef/pandas-2.3.3-cp311-cp311-win_amd64.whl", hash = "sha256:f086f6fe114e19d92014a1966f43a3e62285109afe874f067f5abbdcbb10e59c", size = 11348702, upload-time = "2025-09-29T23:19:38.296Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/fb/231d89e8637c808b997d172b18e9d4a4bc7bf31296196c260526055d1ea0/pandas-2.3.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:6d21f6d74eb1725c2efaa71a2bfc661a0689579b58e9c0ca58a739ff0b002b53", size = 11597846, upload-time = "2025-09-29T23:19:48.856Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/bd/bf8064d9cfa214294356c2d6702b716d3cf3bb24be59287a6a21e24cae6b/pandas-2.3.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:3fd2f887589c7aa868e02632612ba39acb0b8948faf5cc58f0850e165bd46f35", size = 10729618, upload-time = "2025-09-29T23:39:08.659Z" },
+ { url = "https://files.pythonhosted.org/packages/57/56/cf2dbe1a3f5271370669475ead12ce77c61726ffd19a35546e31aa8edf4e/pandas-2.3.3-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ecaf1e12bdc03c86ad4a7ea848d66c685cb6851d807a26aa245ca3d2017a1908", size = 11737212, upload-time = "2025-09-29T23:19:59.765Z" },
+ { url = "https://files.pythonhosted.org/packages/e5/63/cd7d615331b328e287d8233ba9fdf191a9c2d11b6af0c7a59cfcec23de68/pandas-2.3.3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b3d11d2fda7eb164ef27ffc14b4fcab16a80e1ce67e9f57e19ec0afaf715ba89", size = 12362693, upload-time = "2025-09-29T23:20:14.098Z" },
+ { url = "https://files.pythonhosted.org/packages/a6/de/8b1895b107277d52f2b42d3a6806e69cfef0d5cf1d0ba343470b9d8e0a04/pandas-2.3.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:a68e15f780eddf2b07d242e17a04aa187a7ee12b40b930bfdd78070556550e98", size = 12771002, upload-time = "2025-09-29T23:20:26.76Z" },
+ { url = "https://files.pythonhosted.org/packages/87/21/84072af3187a677c5893b170ba2c8fbe450a6ff911234916da889b698220/pandas-2.3.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:371a4ab48e950033bcf52b6527eccb564f52dc826c02afd9a1bc0ab731bba084", size = 13450971, upload-time = "2025-09-29T23:20:41.344Z" },
+ { url = "https://files.pythonhosted.org/packages/86/41/585a168330ff063014880a80d744219dbf1dd7a1c706e75ab3425a987384/pandas-2.3.3-cp312-cp312-win_amd64.whl", hash = "sha256:a16dcec078a01eeef8ee61bf64074b4e524a2a3f4b3be9326420cabe59c4778b", size = 10992722, upload-time = "2025-09-29T23:20:54.139Z" },
+]
+
+[[package]]
+name = "pandas"
+version = "3.0.0"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version >= '3.12' and sys_platform == 'darwin'",
+ "python_full_version >= '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version >= '3.12' and sys_platform == 'win32'",
+ "python_full_version >= '3.12' and sys_platform == 'emscripten'",
+ "(python_full_version >= '3.12' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version == '3.11.*' and sys_platform == 'darwin'",
+ "python_full_version == '3.11.*' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version == '3.11.*' and sys_platform == 'win32'",
+ "python_full_version == '3.11.*' and sys_platform == 'emscripten'",
+ "(python_full_version == '3.11.*' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version == '3.11.*' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+]
+dependencies = [
+ { name = "numpy", marker = "python_full_version >= '3.11'" },
+ { name = "python-dateutil", marker = "python_full_version >= '3.11'" },
+ { name = "tzdata", marker = "(python_full_version >= '3.11' and sys_platform == 'emscripten') or (python_full_version >= '3.11' and sys_platform == 'win32')" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/de/da/b1dc0481ab8d55d0f46e343cfe67d4551a0e14fcee52bd38ca1bd73258d8/pandas-3.0.0.tar.gz", hash = "sha256:0facf7e87d38f721f0af46fe70d97373a37701b1c09f7ed7aeeb292ade5c050f", size = 4633005, upload-time = "2026-01-21T15:52:04.726Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/46/1e/b184654a856e75e975a6ee95d6577b51c271cd92cb2b020c9378f53e0032/pandas-3.0.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:d64ce01eb9cdca96a15266aa679ae50212ec52757c79204dbc7701a222401850", size = 10313247, upload-time = "2026-01-21T15:50:15.775Z" },
+ { url = "https://files.pythonhosted.org/packages/dd/5e/e04a547ad0f0183bf151fd7c7a477468e3b85ff2ad231c566389e6cc9587/pandas-3.0.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:613e13426069793aa1ec53bdcc3b86e8d32071daea138bbcf4fa959c9cdaa2e2", size = 9913131, upload-time = "2026-01-21T15:50:18.611Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/93/bb77bfa9fc2aba9f7204db807d5d3fb69832ed2854c60ba91b4c65ba9219/pandas-3.0.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0192fee1f1a8e743b464a6607858ee4b071deb0b118eb143d71c2a1d170996d5", size = 10741925, upload-time = "2026-01-21T15:50:21.058Z" },
+ { url = "https://files.pythonhosted.org/packages/62/fb/89319812eb1d714bfc04b7f177895caeba8ab4a37ef6712db75ed786e2e0/pandas-3.0.0-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f0b853319dec8d5e0c8b875374c078ef17f2269986a78168d9bd57e49bf650ae", size = 11245979, upload-time = "2026-01-21T15:50:23.413Z" },
+ { url = "https://files.pythonhosted.org/packages/a9/63/684120486f541fc88da3862ed31165b3b3e12b6a1c7b93be4597bc84e26c/pandas-3.0.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:707a9a877a876c326ae2cb640fbdc4ef63b0a7b9e2ef55c6df9942dcee8e2af9", size = 11756337, upload-time = "2026-01-21T15:50:25.932Z" },
+ { url = "https://files.pythonhosted.org/packages/39/92/7eb0ad232312b59aec61550c3c81ad0743898d10af5df7f80bc5e5065416/pandas-3.0.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:afd0aa3d0b5cda6e0b8ffc10dbcca3b09ef3cbcd3fe2b27364f85fdc04e1989d", size = 12325517, upload-time = "2026-01-21T15:50:27.952Z" },
+ { url = "https://files.pythonhosted.org/packages/51/27/bf9436dd0a4fc3130acec0828951c7ef96a0631969613a9a35744baf27f6/pandas-3.0.0-cp311-cp311-win_amd64.whl", hash = "sha256:113b4cca2614ff7e5b9fee9b6f066618fe73c5a83e99d721ffc41217b2bf57dd", size = 9881576, upload-time = "2026-01-21T15:50:30.149Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/2b/c618b871fce0159fd107516336e82891b404e3f340821853c2fc28c7830f/pandas-3.0.0-cp311-cp311-win_arm64.whl", hash = "sha256:c14837eba8e99a8da1527c0280bba29b0eb842f64aa94982c5e21227966e164b", size = 9140807, upload-time = "2026-01-21T15:50:32.308Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/38/db33686f4b5fa64d7af40d96361f6a4615b8c6c8f1b3d334eee46ae6160e/pandas-3.0.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:9803b31f5039b3c3b10cc858c5e40054adb4b29b4d81cb2fd789f4121c8efbcd", size = 10334013, upload-time = "2026-01-21T15:50:34.771Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/7b/9254310594e9774906bacdd4e732415e1f86ab7dbb4b377ef9ede58cd8ec/pandas-3.0.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:14c2a4099cd38a1d18ff108168ea417909b2dea3bd1ebff2ccf28ddb6a74d740", size = 9874154, upload-time = "2026-01-21T15:50:36.67Z" },
+ { url = "https://files.pythonhosted.org/packages/63/d4/726c5a67a13bc66643e66d2e9ff115cead482a44fc56991d0c4014f15aaf/pandas-3.0.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d257699b9a9960e6125686098d5714ac59d05222bef7a5e6af7a7fd87c650801", size = 10384433, upload-time = "2026-01-21T15:50:39.132Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/2e/9211f09bedb04f9832122942de8b051804b31a39cfbad199a819bb88d9f3/pandas-3.0.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:69780c98f286076dcafca38d8b8eee1676adf220199c0a39f0ecbf976b68151a", size = 10864519, upload-time = "2026-01-21T15:50:41.043Z" },
+ { url = "https://files.pythonhosted.org/packages/00/8d/50858522cdc46ac88b9afdc3015e298959a70a08cd21e008a44e9520180c/pandas-3.0.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:4a66384f017240f3858a4c8a7cf21b0591c3ac885cddb7758a589f0f71e87ebb", size = 11394124, upload-time = "2026-01-21T15:50:43.377Z" },
+ { url = "https://files.pythonhosted.org/packages/86/3f/83b2577db02503cd93d8e95b0f794ad9d4be0ba7cb6c8bcdcac964a34a42/pandas-3.0.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:be8c515c9bc33989d97b89db66ea0cececb0f6e3c2a87fcc8b69443a6923e95f", size = 11920444, upload-time = "2026-01-21T15:50:45.932Z" },
+ { url = "https://files.pythonhosted.org/packages/64/2d/4f8a2f192ed12c90a0aab47f5557ece0e56b0370c49de9454a09de7381b2/pandas-3.0.0-cp312-cp312-win_amd64.whl", hash = "sha256:a453aad8c4f4e9f166436994a33884442ea62aa8b27d007311e87521b97246e1", size = 9730970, upload-time = "2026-01-21T15:50:47.962Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/64/ff571be435cf1e643ca98d0945d76732c0b4e9c37191a89c8550b105eed1/pandas-3.0.0-cp312-cp312-win_arm64.whl", hash = "sha256:da768007b5a33057f6d9053563d6b74dd6d029c337d93c6d0d22a763a5c2ecc0", size = 9041950, upload-time = "2026-01-21T15:50:50.422Z" },
+]
+
+[[package]]
+name = "pandocfilters"
+version = "1.5.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/70/6f/3dd4940bbe001c06a65f88e36bad298bc7a0de5036115639926b0c5c0458/pandocfilters-1.5.1.tar.gz", hash = "sha256:002b4a555ee4ebc03f8b66307e287fa492e4a77b4ea14d3f934328297bb4939e", size = 8454, upload-time = "2024-01-18T20:08:13.726Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ef/af/4fbc8cab944db5d21b7e2a5b8e9211a03a79852b1157e2c102fcc61ac440/pandocfilters-1.5.1-py2.py3-none-any.whl", hash = "sha256:93be382804a9cdb0a7267585f157e5d1731bbe5545a85b268d6f5fe6232de2bc", size = 8663, upload-time = "2024-01-18T20:08:11.28Z" },
+]
+
+[[package]]
+name = "parso"
+version = "0.8.5"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d4/de/53e0bcf53d13e005bd8c92e7855142494f41171b34c2536b86187474184d/parso-0.8.5.tar.gz", hash = "sha256:034d7354a9a018bdce352f48b2a8a450f05e9d6ee85db84764e9b6bd96dafe5a", size = 401205, upload-time = "2025-08-23T15:15:28.028Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/16/32/f8e3c85d1d5250232a5d3477a2a28cc291968ff175caeadaf3cc19ce0e4a/parso-0.8.5-py2.py3-none-any.whl", hash = "sha256:646204b5ee239c396d040b90f9e272e9a8017c630092bf59980beb62fd033887", size = 106668, upload-time = "2025-08-23T15:15:25.663Z" },
+]
+
+[[package]]
+name = "pathspec"
+version = "1.0.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/fa/36/e27608899f9b8d4dff0617b2d9ab17ca5608956ca44461ac14ac48b44015/pathspec-1.0.4.tar.gz", hash = "sha256:0210e2ae8a21a9137c0d470578cb0e595af87edaa6ebf12ff176f14a02e0e645", size = 131200, upload-time = "2026-01-27T03:59:46.938Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ef/3c/2c197d226f9ea224a9ab8d197933f9da0ae0aac5b6e0f884e2b8d9c8e9f7/pathspec-1.0.4-py3-none-any.whl", hash = "sha256:fb6ae2fd4e7c921a165808a552060e722767cfa526f99ca5156ed2ce45a5c723", size = 55206, upload-time = "2026-01-27T03:59:45.137Z" },
+]
+
+[[package]]
+name = "pexpect"
+version = "4.9.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "ptyprocess", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/42/92/cc564bf6381ff43ce1f4d06852fc19a2f11d180f23dc32d9588bee2f149d/pexpect-4.9.0.tar.gz", hash = "sha256:ee7d41123f3c9911050ea2c2dac107568dc43b2d3b0c7557a33212c398ead30f", size = 166450, upload-time = "2023-11-25T09:07:26.339Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/9e/c3/059298687310d527a58bb01f3b1965787ee3b40dce76752eda8b44e9a2c5/pexpect-4.9.0-py2.py3-none-any.whl", hash = "sha256:7236d1e080e4936be2dc3e326cec0af72acf9212a7e1d060210e70a47e253523", size = 63772, upload-time = "2023-11-25T06:56:14.81Z" },
+]
+
+[[package]]
+name = "pillow"
+version = "12.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d0/02/d52c733a2452ef1ffcc123b68e6606d07276b0e358db70eabad7e40042b7/pillow-12.1.0.tar.gz", hash = "sha256:5c5ae0a06e9ea030ab786b0251b32c7e4ce10e58d983c0d5c56029455180b5b9", size = 46977283, upload-time = "2026-01-02T09:13:29.892Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/fe/41/f73d92b6b883a579e79600d391f2e21cb0df767b2714ecbd2952315dfeef/pillow-12.1.0-cp310-cp310-macosx_10_10_x86_64.whl", hash = "sha256:fb125d860738a09d363a88daa0f59c4533529a90e564785e20fe875b200b6dbd", size = 5304089, upload-time = "2026-01-02T09:10:24.953Z" },
+ { url = "https://files.pythonhosted.org/packages/94/55/7aca2891560188656e4a91ed9adba305e914a4496800da6b5c0a15f09edf/pillow-12.1.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:cad302dc10fac357d3467a74a9561c90609768a6f73a1923b0fd851b6486f8b0", size = 4657815, upload-time = "2026-01-02T09:10:27.063Z" },
+ { url = "https://files.pythonhosted.org/packages/e9/d2/b28221abaa7b4c40b7dba948f0f6a708bd7342c4d47ce342f0ea39643974/pillow-12.1.0-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:a40905599d8079e09f25027423aed94f2823adaf2868940de991e53a449e14a8", size = 6222593, upload-time = "2026-01-02T09:10:29.115Z" },
+ { url = "https://files.pythonhosted.org/packages/71/b8/7a61fb234df6a9b0b479f69e66901209d89ff72a435b49933f9122f94cac/pillow-12.1.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:92a7fe4225365c5e3a8e598982269c6d6698d3e783b3b1ae979e7819f9cd55c1", size = 8027579, upload-time = "2026-01-02T09:10:31.182Z" },
+ { url = "https://files.pythonhosted.org/packages/ea/51/55c751a57cc524a15a0e3db20e5cde517582359508d62305a627e77fd295/pillow-12.1.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:f10c98f49227ed8383d28174ee95155a675c4ed7f85e2e573b04414f7e371bda", size = 6335760, upload-time = "2026-01-02T09:10:33.02Z" },
+ { url = "https://files.pythonhosted.org/packages/dc/7c/60e3e6f5e5891a1a06b4c910f742ac862377a6fe842f7184df4a274ce7bf/pillow-12.1.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8637e29d13f478bc4f153d8daa9ffb16455f0a6cb287da1b432fdad2bfbd66c7", size = 7027127, upload-time = "2026-01-02T09:10:35.009Z" },
+ { url = "https://files.pythonhosted.org/packages/06/37/49d47266ba50b00c27ba63a7c898f1bb41a29627ced8c09e25f19ebec0ff/pillow-12.1.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:21e686a21078b0f9cb8c8a961d99e6a4ddb88e0fc5ea6e130172ddddc2e5221a", size = 6449896, upload-time = "2026-01-02T09:10:36.793Z" },
+ { url = "https://files.pythonhosted.org/packages/f9/e5/67fd87d2913902462cd9b79c6211c25bfe95fcf5783d06e1367d6d9a741f/pillow-12.1.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:2415373395a831f53933c23ce051021e79c8cd7979822d8cc478547a3f4da8ef", size = 7151345, upload-time = "2026-01-02T09:10:39.064Z" },
+ { url = "https://files.pythonhosted.org/packages/bd/15/f8c7abf82af68b29f50d77c227e7a1f87ce02fdc66ded9bf603bc3b41180/pillow-12.1.0-cp310-cp310-win32.whl", hash = "sha256:e75d3dba8fc1ddfec0cd752108f93b83b4f8d6ab40e524a95d35f016b9683b09", size = 6325568, upload-time = "2026-01-02T09:10:41.035Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/24/7d1c0e160b6b5ac2605ef7d8be537e28753c0db5363d035948073f5513d7/pillow-12.1.0-cp310-cp310-win_amd64.whl", hash = "sha256:64efdf00c09e31efd754448a383ea241f55a994fd079866b92d2bbff598aad91", size = 7032367, upload-time = "2026-01-02T09:10:43.09Z" },
+ { url = "https://files.pythonhosted.org/packages/f4/03/41c038f0d7a06099254c60f618d0ec7be11e79620fc23b8e85e5b31d9a44/pillow-12.1.0-cp310-cp310-win_arm64.whl", hash = "sha256:f188028b5af6b8fb2e9a76ac0f841a575bd1bd396e46ef0840d9b88a48fdbcea", size = 2452345, upload-time = "2026-01-02T09:10:44.795Z" },
+ { url = "https://files.pythonhosted.org/packages/43/c4/bf8328039de6cc22182c3ef007a2abfbbdab153661c0a9aa78af8d706391/pillow-12.1.0-cp311-cp311-macosx_10_10_x86_64.whl", hash = "sha256:a83e0850cb8f5ac975291ebfc4170ba481f41a28065277f7f735c202cd8e0af3", size = 5304057, upload-time = "2026-01-02T09:10:46.627Z" },
+ { url = "https://files.pythonhosted.org/packages/43/06/7264c0597e676104cc22ca73ee48f752767cd4b1fe084662620b17e10120/pillow-12.1.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:b6e53e82ec2db0717eabb276aa56cf4e500c9a7cec2c2e189b55c24f65a3e8c0", size = 4657811, upload-time = "2026-01-02T09:10:49.548Z" },
+ { url = "https://files.pythonhosted.org/packages/72/64/f9189e44474610daf83da31145fa56710b627b5c4c0b9c235e34058f6b31/pillow-12.1.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:40a8e3b9e8773876d6e30daed22f016509e3987bab61b3b7fe309d7019a87451", size = 6232243, upload-time = "2026-01-02T09:10:51.62Z" },
+ { url = "https://files.pythonhosted.org/packages/ef/30/0df458009be6a4caca4ca2c52975e6275c387d4e5c95544e34138b41dc86/pillow-12.1.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:800429ac32c9b72909c671aaf17ecd13110f823ddb7db4dfef412a5587c2c24e", size = 8037872, upload-time = "2026-01-02T09:10:53.446Z" },
+ { url = "https://files.pythonhosted.org/packages/e4/86/95845d4eda4f4f9557e25381d70876aa213560243ac1a6d619c46caaedd9/pillow-12.1.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0b022eaaf709541b391ee069f0022ee5b36c709df71986e3f7be312e46f42c84", size = 6345398, upload-time = "2026-01-02T09:10:55.426Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/1f/8e66ab9be3aaf1435bc03edd1ebdf58ffcd17f7349c1d970cafe87af27d9/pillow-12.1.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1f345e7bc9d7f368887c712aa5054558bad44d2a301ddf9248599f4161abc7c0", size = 7034667, upload-time = "2026-01-02T09:10:57.11Z" },
+ { url = "https://files.pythonhosted.org/packages/f9/f6/683b83cb9b1db1fb52b87951b1c0b99bdcfceaa75febf11406c19f82cb5e/pillow-12.1.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d70347c8a5b7ccd803ec0c85c8709f036e6348f1e6a5bf048ecd9c64d3550b8b", size = 6458743, upload-time = "2026-01-02T09:10:59.331Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/7d/de833d63622538c1d58ce5395e7c6cb7e7dce80decdd8bde4a484e095d9f/pillow-12.1.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:1fcc52d86ce7a34fd17cb04e87cfdb164648a3662a6f20565910a99653d66c18", size = 7159342, upload-time = "2026-01-02T09:11:01.82Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/40/50d86571c9e5868c42b81fe7da0c76ca26373f3b95a8dd675425f4a92ec1/pillow-12.1.0-cp311-cp311-win32.whl", hash = "sha256:3ffaa2f0659e2f740473bcf03c702c39a8d4b2b7ffc629052028764324842c64", size = 6328655, upload-time = "2026-01-02T09:11:04.556Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/af/b1d7e301c4cd26cd45d4af884d9ee9b6fab893b0ad2450d4746d74a6968c/pillow-12.1.0-cp311-cp311-win_amd64.whl", hash = "sha256:806f3987ffe10e867bab0ddad45df1148a2b98221798457fa097ad85d6e8bc75", size = 7031469, upload-time = "2026-01-02T09:11:06.538Z" },
+ { url = "https://files.pythonhosted.org/packages/48/36/d5716586d887fb2a810a4a61518a327a1e21c8b7134c89283af272efe84b/pillow-12.1.0-cp311-cp311-win_arm64.whl", hash = "sha256:9f5fefaca968e700ad1a4a9de98bf0869a94e397fe3524c4c9450c1445252304", size = 2452515, upload-time = "2026-01-02T09:11:08.226Z" },
+ { url = "https://files.pythonhosted.org/packages/20/31/dc53fe21a2f2996e1b7d92bf671cdb157079385183ef7c1ae08b485db510/pillow-12.1.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:a332ac4ccb84b6dde65dbace8431f3af08874bf9770719d32a635c4ef411b18b", size = 5262642, upload-time = "2026-01-02T09:11:10.138Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/c1/10e45ac9cc79419cedf5121b42dcca5a50ad2b601fa080f58c22fb27626e/pillow-12.1.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:907bfa8a9cb790748a9aa4513e37c88c59660da3bcfffbd24a7d9e6abf224551", size = 4657464, upload-time = "2026-01-02T09:11:12.319Z" },
+ { url = "https://files.pythonhosted.org/packages/ad/26/7b82c0ab7ef40ebede7a97c72d473bda5950f609f8e0c77b04af574a0ddb/pillow-12.1.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:efdc140e7b63b8f739d09a99033aa430accce485ff78e6d311973a67b6bf3208", size = 6234878, upload-time = "2026-01-02T09:11:14.096Z" },
+ { url = "https://files.pythonhosted.org/packages/76/25/27abc9792615b5e886ca9411ba6637b675f1b77af3104710ac7353fe5605/pillow-12.1.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:bef9768cab184e7ae6e559c032e95ba8d07b3023c289f79a2bd36e8bf85605a5", size = 8044868, upload-time = "2026-01-02T09:11:15.903Z" },
+ { url = "https://files.pythonhosted.org/packages/0a/ea/f200a4c36d836100e7bc738fc48cd963d3ba6372ebc8298a889e0cfc3359/pillow-12.1.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:742aea052cf5ab5034a53c3846165bc3ce88d7c38e954120db0ab867ca242661", size = 6349468, upload-time = "2026-01-02T09:11:17.631Z" },
+ { url = "https://files.pythonhosted.org/packages/11/8f/48d0b77ab2200374c66d344459b8958c86693be99526450e7aee714e03e4/pillow-12.1.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a6dfc2af5b082b635af6e08e0d1f9f1c4e04d17d4e2ca0ef96131e85eda6eb17", size = 7041518, upload-time = "2026-01-02T09:11:19.389Z" },
+ { url = "https://files.pythonhosted.org/packages/1d/23/c281182eb986b5d31f0a76d2a2c8cd41722d6fb8ed07521e802f9bba52de/pillow-12.1.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:609e89d9f90b581c8d16358c9087df76024cf058fa693dd3e1e1620823f39670", size = 6462829, upload-time = "2026-01-02T09:11:21.28Z" },
+ { url = "https://files.pythonhosted.org/packages/25/ef/7018273e0faac099d7b00982abdcc39142ae6f3bd9ceb06de09779c4a9d6/pillow-12.1.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:43b4899cfd091a9693a1278c4982f3e50f7fb7cff5153b05174b4afc9593b616", size = 7166756, upload-time = "2026-01-02T09:11:23.559Z" },
+ { url = "https://files.pythonhosted.org/packages/8f/c8/993d4b7ab2e341fe02ceef9576afcf5830cdec640be2ac5bee1820d693d4/pillow-12.1.0-cp312-cp312-win32.whl", hash = "sha256:aa0c9cc0b82b14766a99fbe6084409972266e82f459821cd26997a488a7261a7", size = 6328770, upload-time = "2026-01-02T09:11:25.661Z" },
+ { url = "https://files.pythonhosted.org/packages/a7/87/90b358775a3f02765d87655237229ba64a997b87efa8ccaca7dd3e36e7a7/pillow-12.1.0-cp312-cp312-win_amd64.whl", hash = "sha256:d70534cea9e7966169ad29a903b99fc507e932069a881d0965a1a84bb57f6c6d", size = 7033406, upload-time = "2026-01-02T09:11:27.474Z" },
+ { url = "https://files.pythonhosted.org/packages/5d/cf/881b457eccacac9e5b2ddd97d5071fb6d668307c57cbf4e3b5278e06e536/pillow-12.1.0-cp312-cp312-win_arm64.whl", hash = "sha256:65b80c1ee7e14a87d6a068dd3b0aea268ffcabfe0498d38661b00c5b4b22e74c", size = 2452612, upload-time = "2026-01-02T09:11:29.309Z" },
+ { url = "https://files.pythonhosted.org/packages/8b/bc/224b1d98cffd7164b14707c91aac83c07b047fbd8f58eba4066a3e53746a/pillow-12.1.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:ca94b6aac0d7af2a10ba08c0f888b3d5114439b6b3ef39968378723622fed377", size = 5228605, upload-time = "2026-01-02T09:13:14.084Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/ca/49ca7769c4550107de049ed85208240ba0f330b3f2e316f24534795702ce/pillow-12.1.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:351889afef0f485b84078ea40fe33727a0492b9af3904661b0abbafee0355b72", size = 4622245, upload-time = "2026-01-02T09:13:15.964Z" },
+ { url = "https://files.pythonhosted.org/packages/73/48/fac807ce82e5955bcc2718642b94b1bd22a82a6d452aea31cbb678cddf12/pillow-12.1.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:bb0984b30e973f7e2884362b7d23d0a348c7143ee559f38ef3eaab640144204c", size = 5247593, upload-time = "2026-01-02T09:13:17.913Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/95/3e0742fe358c4664aed4fd05d5f5373dcdad0b27af52aa0972568541e3f4/pillow-12.1.0-pp311-pypy311_pp73-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:84cabc7095dd535ca934d57e9ce2a72ffd216e435a84acb06b2277b1de2689bd", size = 6989008, upload-time = "2026-01-02T09:13:20.083Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/74/fe2ac378e4e202e56d50540d92e1ef4ff34ed687f3c60f6a121bcf99437e/pillow-12.1.0-pp311-pypy311_pp73-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:53d8b764726d3af1a138dd353116f774e3862ec7e3794e0c8781e30db0f35dfc", size = 5313824, upload-time = "2026-01-02T09:13:22.405Z" },
+ { url = "https://files.pythonhosted.org/packages/f3/77/2a60dee1adee4e2655ac328dd05c02a955c1cd683b9f1b82ec3feb44727c/pillow-12.1.0-pp311-pypy311_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5da841d81b1a05ef940a8567da92decaa15bc4d7dedb540a8c219ad83d91808a", size = 5963278, upload-time = "2026-01-02T09:13:24.706Z" },
+ { url = "https://files.pythonhosted.org/packages/2d/71/64e9b1c7f04ae0027f788a248e6297d7fcc29571371fe7d45495a78172c0/pillow-12.1.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:75af0b4c229ac519b155028fa1be632d812a519abba9b46b20e50c6caa184f19", size = 7029809, upload-time = "2026-01-02T09:13:26.541Z" },
+]
+
+[[package]]
+name = "platformdirs"
+version = "4.5.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/cf/86/0248f086a84f01b37aaec0fa567b397df1a119f73c16f6c7a9aac73ea309/platformdirs-4.5.1.tar.gz", hash = "sha256:61d5cdcc6065745cdd94f0f878977f8de9437be93de97c1c12f853c9c0cdcbda", size = 21715, upload-time = "2025-12-05T13:52:58.638Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/cb/28/3bfe2fa5a7b9c46fe7e13c97bda14c895fb10fa2ebf1d0abb90e0cea7ee1/platformdirs-4.5.1-py3-none-any.whl", hash = "sha256:d03afa3963c806a9bed9d5125c8f4cb2fdaf74a55ab60e5d59b3fde758104d31", size = 18731, upload-time = "2025-12-05T13:52:56.823Z" },
+]
+
+[[package]]
+name = "pluggy"
+version = "1.6.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412, upload-time = "2025-05-15T12:30:07.975Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
+]
+
+[[package]]
+name = "portalocker"
+version = "3.2.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "pywin32", marker = "sys_platform == 'win32'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/5e/77/65b857a69ed876e1951e88aaba60f5ce6120c33703f7cb61a3c894b8c1b6/portalocker-3.2.0.tar.gz", hash = "sha256:1f3002956a54a8c3730586c5c77bf18fae4149e07eaf1c29fc3faf4d5a3f89ac", size = 95644, upload-time = "2025-06-14T13:20:40.03Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/4b/a6/38c8e2f318bf67d338f4d629e93b0b4b9af331f455f0390ea8ce4a099b26/portalocker-3.2.0-py3-none-any.whl", hash = "sha256:3cdc5f565312224bc570c49337bd21428bba0ef363bbcf58b9ef4a9f11779968", size = 22424, upload-time = "2025-06-14T13:20:38.083Z" },
+]
+
+[[package]]
+name = "prometheus-client"
+version = "0.24.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f0/58/a794d23feb6b00fc0c72787d7e87d872a6730dd9ed7c7b3e954637d8f280/prometheus_client-0.24.1.tar.gz", hash = "sha256:7e0ced7fbbd40f7b84962d5d2ab6f17ef88a72504dcf7c0b40737b43b2a461f9", size = 85616, upload-time = "2026-01-14T15:26:26.965Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/74/c3/24a2f845e3917201628ecaba4f18bab4d18a337834c1df2a159ee9d22a42/prometheus_client-0.24.1-py3-none-any.whl", hash = "sha256:150db128af71a5c2482b36e588fc8a6b95e498750da4b17065947c16070f4055", size = 64057, upload-time = "2026-01-14T15:26:24.42Z" },
+]
+
+[[package]]
+name = "prompt-toolkit"
+version = "3.0.52"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "wcwidth" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a1/96/06e01a7b38dce6fe1db213e061a4602dd6032a8a97ef6c1a862537732421/prompt_toolkit-3.0.52.tar.gz", hash = "sha256:28cde192929c8e7321de85de1ddbe736f1375148b02f2e17edd840042b1be855", size = 434198, upload-time = "2025-08-27T15:24:02.057Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/84/03/0d3ce49e2505ae70cf43bc5bb3033955d2fc9f932163e84dc0779cc47f48/prompt_toolkit-3.0.52-py3-none-any.whl", hash = "sha256:9aac639a3bbd33284347de5ad8d68ecc044b91a762dc39b7c21095fcd6a19955", size = 391431, upload-time = "2025-08-27T15:23:59.498Z" },
+]
+
+[[package]]
+name = "protobuf"
+version = "6.33.5"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/ba/25/7c72c307aafc96fa87062aa6291d9f7c94836e43214d43722e86037aac02/protobuf-6.33.5.tar.gz", hash = "sha256:6ddcac2a081f8b7b9642c09406bc6a4290128fce5f471cddd165960bb9119e5c", size = 444465, upload-time = "2026-01-29T21:51:33.494Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/b1/79/af92d0a8369732b027e6d6084251dd8e782c685c72da161bd4a2e00fbabb/protobuf-6.33.5-cp310-abi3-win32.whl", hash = "sha256:d71b040839446bac0f4d162e758bea99c8251161dae9d0983a3b88dee345153b", size = 425769, upload-time = "2026-01-29T21:51:21.751Z" },
+ { url = "https://files.pythonhosted.org/packages/55/75/bb9bc917d10e9ee13dee8607eb9ab963b7cf8be607c46e7862c748aa2af7/protobuf-6.33.5-cp310-abi3-win_amd64.whl", hash = "sha256:3093804752167bcab3998bec9f1048baae6e29505adaf1afd14a37bddede533c", size = 437118, upload-time = "2026-01-29T21:51:24.022Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/6b/e48dfc1191bc5b52950246275bf4089773e91cb5ba3592621723cdddca62/protobuf-6.33.5-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:a5cb85982d95d906df1e2210e58f8e4f1e3cdc088e52c921a041f9c9a0386de5", size = 427766, upload-time = "2026-01-29T21:51:25.413Z" },
+ { url = "https://files.pythonhosted.org/packages/4e/b1/c79468184310de09d75095ed1314b839eb2f72df71097db9d1404a1b2717/protobuf-6.33.5-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:9b71e0281f36f179d00cbcb119cb19dec4d14a81393e5ea220f64b286173e190", size = 324638, upload-time = "2026-01-29T21:51:26.423Z" },
+ { url = "https://files.pythonhosted.org/packages/c5/f5/65d838092fd01c44d16037953fd4c2cc851e783de9b8f02b27ec4ffd906f/protobuf-6.33.5-cp39-abi3-manylinux2014_s390x.whl", hash = "sha256:8afa18e1d6d20af15b417e728e9f60f3aa108ee76f23c3b2c07a2c3b546d3afd", size = 339411, upload-time = "2026-01-29T21:51:27.446Z" },
+ { url = "https://files.pythonhosted.org/packages/9b/53/a9443aa3ca9ba8724fdfa02dd1887c1bcd8e89556b715cfbacca6b63dbec/protobuf-6.33.5-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:cbf16ba3350fb7b889fca858fb215967792dc125b35c7976ca4818bee3521cf0", size = 323465, upload-time = "2026-01-29T21:51:28.925Z" },
+ { url = "https://files.pythonhosted.org/packages/57/bf/2086963c69bdac3d7cff1cc7ff79b8ce5ea0bec6797a017e1be338a46248/protobuf-6.33.5-py3-none-any.whl", hash = "sha256:69915a973dd0f60f31a08b8318b73eab2bd6a392c79184b3612226b0a3f8ec02", size = 170687, upload-time = "2026-01-29T21:51:32.557Z" },
+]
+
+[[package]]
+name = "psutil"
+version = "7.2.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/aa/c6/d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/psutil-7.2.2.tar.gz", hash = "sha256:0746f5f8d406af344fd547f1c8daa5f5c33dbc293bb8d6a16d80b4bb88f59372", size = 493740, upload-time = "2026-01-28T18:14:54.428Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e7/36/5ee6e05c9bd427237b11b3937ad82bb8ad2752d72c6969314590dd0c2f6e/psutil-7.2.2-cp36-abi3-macosx_10_9_x86_64.whl", hash = "sha256:ed0cace939114f62738d808fdcecd4c869222507e266e574799e9c0faa17d486", size = 129090, upload-time = "2026-01-28T18:15:22.168Z" },
+ { url = "https://files.pythonhosted.org/packages/80/c4/f5af4c1ca8c1eeb2e92ccca14ce8effdeec651d5ab6053c589b074eda6e1/psutil-7.2.2-cp36-abi3-macosx_11_0_arm64.whl", hash = "sha256:1a7b04c10f32cc88ab39cbf606e117fd74721c831c98a27dc04578deb0c16979", size = 129859, upload-time = "2026-01-28T18:15:23.795Z" },
+ { url = "https://files.pythonhosted.org/packages/b5/70/5d8df3b09e25bce090399cf48e452d25c935ab72dad19406c77f4e828045/psutil-7.2.2-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:076a2d2f923fd4821644f5ba89f059523da90dc9014e85f8e45a5774ca5bc6f9", size = 155560, upload-time = "2026-01-28T18:15:25.976Z" },
+ { url = "https://files.pythonhosted.org/packages/63/65/37648c0c158dc222aba51c089eb3bdfa238e621674dc42d48706e639204f/psutil-7.2.2-cp36-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b0726cecd84f9474419d67252add4ac0cd9811b04d61123054b9fb6f57df6e9e", size = 156997, upload-time = "2026-01-28T18:15:27.794Z" },
+ { url = "https://files.pythonhosted.org/packages/8e/13/125093eadae863ce03c6ffdbae9929430d116a246ef69866dad94da3bfbc/psutil-7.2.2-cp36-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:fd04ef36b4a6d599bbdb225dd1d3f51e00105f6d48a28f006da7f9822f2606d8", size = 148972, upload-time = "2026-01-28T18:15:29.342Z" },
+ { url = "https://files.pythonhosted.org/packages/04/78/0acd37ca84ce3ddffaa92ef0f571e073faa6d8ff1f0559ab1272188ea2be/psutil-7.2.2-cp36-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:b58fabe35e80b264a4e3bb23e6b96f9e45a3df7fb7eed419ac0e5947c61e47cc", size = 148266, upload-time = "2026-01-28T18:15:31.597Z" },
+ { url = "https://files.pythonhosted.org/packages/b4/90/e2159492b5426be0c1fef7acba807a03511f97c5f86b3caeda6ad92351a7/psutil-7.2.2-cp37-abi3-win_amd64.whl", hash = "sha256:eb7e81434c8d223ec4a219b5fc1c47d0417b12be7ea866e24fb5ad6e84b3d988", size = 137737, upload-time = "2026-01-28T18:15:33.849Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/c7/7bb2e321574b10df20cbde462a94e2b71d05f9bbda251ef27d104668306a/psutil-7.2.2-cp37-abi3-win_arm64.whl", hash = "sha256:8c233660f575a5a89e6d4cb65d9f938126312bca76d8fe087b947b3a1aaac9ee", size = 134617, upload-time = "2026-01-28T18:15:36.514Z" },
+]
+
+[[package]]
+name = "ptyprocess"
+version = "0.7.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/20/e5/16ff212c1e452235a90aeb09066144d0c5a6a8c0834397e03f5224495c4e/ptyprocess-0.7.0.tar.gz", hash = "sha256:5c5d0a3b48ceee0b48485e0c26037c0acd7d29765ca3fbb5cb3831d347423220", size = 70762, upload-time = "2020-12-28T15:15:30.155Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/22/a6/858897256d0deac81a172289110f31629fc4cee19b6f01283303e18c8db3/ptyprocess-0.7.0-py2.py3-none-any.whl", hash = "sha256:4b41f3967fce3af57cc7e94b888626c18bf37a083e3651ca8feeb66d492fef35", size = 13993, upload-time = "2020-12-28T15:15:28.35Z" },
+]
+
+[[package]]
+name = "pure-eval"
+version = "0.2.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/cd/05/0a34433a064256a578f1783a10da6df098ceaa4a57bbeaa96a6c0352786b/pure_eval-0.2.3.tar.gz", hash = "sha256:5f4e983f40564c576c7c8635ae88db5956bb2229d7e9237d03b3c0b0190eaf42", size = 19752, upload-time = "2024-07-21T12:58:21.801Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/8e/37/efad0257dc6e593a18957422533ff0f87ede7c9c6ea010a2177d738fb82f/pure_eval-0.2.3-py3-none-any.whl", hash = "sha256:1db8e35b67b3d218d818ae653e27f06c3aa420901fa7b081ca98cbedc874e0d0", size = 11842, upload-time = "2024-07-21T12:58:20.04Z" },
+]
+
+[[package]]
+name = "pycocotools"
+version = "2.0.11"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "numpy" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a2/df/32354b5dda963ffdfc8f75c9acf8828ef7890723a4ed57bb3ff2dc1d6f7e/pycocotools-2.0.11.tar.gz", hash = "sha256:34254d76da85576fcaf5c1f3aa9aae16b8cb15418334ba4283b800796bd1993d", size = 25381, upload-time = "2025-12-15T22:31:46.148Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/dd/4b/0c040fcda2c4fa4827b1a64e3185d99d5f954e45cc9463ba7385a1173a77/pycocotools-2.0.11-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:484d33515353186aadba9e2a290d81b107275cdb9565084e31a5568a52a0b120", size = 160351, upload-time = "2025-12-15T22:30:53.998Z" },
+ { url = "https://files.pythonhosted.org/packages/49/fe/861db6515824815eaabce27734653a6b100ddb22364b3345dd862b2c5b65/pycocotools-2.0.11-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ca9f120f719ec405ad0c74ccfdb8402b0c37bd5f88ab5b6482a0de2efd5a36f4", size = 463947, upload-time = "2025-12-15T22:30:55.419Z" },
+ { url = "https://files.pythonhosted.org/packages/c5/a1/b4b49b85763043372e66baa10dffa42337cf4687d6db22546c27f3a4d732/pycocotools-2.0.11-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e40a3a898c6e5340b8d70cf7984868b9bff8c3d80187de9a3b661d504d665978", size = 472455, upload-time = "2025-12-15T22:30:56.895Z" },
+ { url = "https://files.pythonhosted.org/packages/48/70/fac670296e6a2b45eb7434d0480b9af6cb85a8de4f4848b49b01154bc859/pycocotools-2.0.11-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:7cd4cdfd2c676f30838aa0b1047441892fb4f97d70bf3df480bcc7a18a64d7d4", size = 457911, upload-time = "2025-12-15T22:30:58.377Z" },
+ { url = "https://files.pythonhosted.org/packages/33/f5/6158de63354dfcb677c8da34a4d205cc532e3277338ab7e6dea1310ba8de/pycocotools-2.0.11-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:08c79789fd79e801ae4ecfcfeec32b31e36254e7a2b4019af28c104975d5e730", size = 476472, upload-time = "2025-12-15T22:30:59.736Z" },
+ { url = "https://files.pythonhosted.org/packages/fc/01/46d2a782cda19ba1beb7c431f417e1e478f0bf1273fa5fe5d10de7c18d76/pycocotools-2.0.11-cp310-cp310-win_amd64.whl", hash = "sha256:f78cbb1a32d061fcad4bdba083de70a39a21c1c3d9235a3f77d8f007541ec5ef", size = 80165, upload-time = "2025-12-15T22:31:00.886Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/5c/6bd945781bb04c2148929183d1d67b05ce07996313b0f87bb88c6a805493/pycocotools-2.0.11-cp310-cp310-win_arm64.whl", hash = "sha256:e21311ea71f85591680d8992858e2d44a2a156dc3b2bf1c5c901c4a19348177b", size = 69358, upload-time = "2025-12-15T22:31:01.815Z" },
+ { url = "https://files.pythonhosted.org/packages/b3/3f/41ce3fce61b7721158f21b61727eb054805babc0088cfa48506935b80a36/pycocotools-2.0.11-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:81bdceebb4c64e9265213e2d733808a12f9c18dfb14457323cc6b9af07fa0e61", size = 158947, upload-time = "2025-12-15T22:31:03.291Z" },
+ { url = "https://files.pythonhosted.org/packages/e2/9b/a739705b246445bd1376394bf9d1ec2dd292b16740e92f203461b2bb12ed/pycocotools-2.0.11-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a1c05f91ccc658dfe01325267209c4b435da1722c93eeb5749fabc1d087b6882", size = 485174, upload-time = "2025-12-15T22:31:04.395Z" },
+ { url = "https://files.pythonhosted.org/packages/34/70/7a12752784e57d8034a76c245c618a2f88a9d2463862b990f314aea7e5d6/pycocotools-2.0.11-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:18ba75ff58cedb33a85ce2c18f1452f1fe20c9dd59925eec5300b2bf6205dbe1", size = 493172, upload-time = "2025-12-15T22:31:05.504Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/fc/d703599ac728209dba08aea8d4bee884d5adabfcd9041abed1658d863747/pycocotools-2.0.11-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:693417797f0377fd094eb815c0a1e7d1c3c0251b71e3b3779fce3b3cf24793c5", size = 480506, upload-time = "2025-12-15T22:31:06.77Z" },
+ { url = "https://files.pythonhosted.org/packages/81/d9/e1cfc320bbb2cd58c3b4398c3821cbe75d93c16ed3135ac9e774a18a02d3/pycocotools-2.0.11-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:b6a07071c441d0f5e480a8f287106191582e40289d4e242dfe684e0c8a751088", size = 497595, upload-time = "2025-12-15T22:31:08.277Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/23/d17f6111c2a6ae8631d4fa90202bea05844da715d61431fbc34d276462d5/pycocotools-2.0.11-cp311-cp311-win_amd64.whl", hash = "sha256:8e159232adae3aef6b4e2d37b008bff107b26e9ed3b48e70ea6482302834bd34", size = 80519, upload-time = "2025-12-15T22:31:09.613Z" },
+ { url = "https://files.pythonhosted.org/packages/00/4c/76b00b31a724c3f5ccdab0f85e578afb2ca38d33be0a0e98f1770cafd958/pycocotools-2.0.11-cp311-cp311-win_arm64.whl", hash = "sha256:4fc9889e819452b9c142036e1eabac8a13a8bd552d8beba299a57e0da6bfa1ec", size = 69304, upload-time = "2025-12-15T22:31:10.592Z" },
+ { url = "https://files.pythonhosted.org/packages/87/12/2f2292332456e4e4aba1dec0e3de8f1fc40fb2f4fdb0ca1cb17db9861682/pycocotools-2.0.11-cp312-abi3-macosx_10_13_universal2.whl", hash = "sha256:a2e9634bc7cadfb01c88e0b98589aaf0bd12983c7927bde93f19c0103e5441f4", size = 147795, upload-time = "2025-12-15T22:31:11.519Z" },
+ { url = "https://files.pythonhosted.org/packages/63/3c/68d7ea376aada9046e7ea2d7d0dad0d27e1ae8b4b3c26a28346689390ab2/pycocotools-2.0.11-cp312-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7fd4121766cc057133534679c0ec3f9023dbd96e9b31cf95c86a069ebdac2b65", size = 398434, upload-time = "2025-12-15T22:31:12.558Z" },
+ { url = "https://files.pythonhosted.org/packages/23/59/dc81895beff4e1207a829d40d442ea87cefaac9f6499151965f05c479619/pycocotools-2.0.11-cp312-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a82d1c9ed83f75da0b3f244f2a3cf559351a283307bd9b79a4ee2b93ab3231dd", size = 411685, upload-time = "2025-12-15T22:31:13.995Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/0b/5a8a7de300862a2eb5e2ecd3cb015126231379206cd3ebba8f025388d770/pycocotools-2.0.11-cp312-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:89e853425018e2c2920ee0f2112cf7c140a1dcf5f4f49abd9c2da112c3e0f4b3", size = 390500, upload-time = "2025-12-15T22:31:15.138Z" },
+ { url = "https://files.pythonhosted.org/packages/63/b5/519bb68647f06feea03d5f355c33c05800aeae4e57b9482b2859eb00752e/pycocotools-2.0.11-cp312-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:87af87b8d06d5b852a885a319d9362dca3bed9f8bbcc3feb6513acb1f88ea242", size = 409790, upload-time = "2025-12-15T22:31:16.326Z" },
+ { url = "https://files.pythonhosted.org/packages/83/b4/f6708404ff494706b80e714b919f76dc4ec9845a4007affd6d6b0843f928/pycocotools-2.0.11-cp312-abi3-win_amd64.whl", hash = "sha256:ffe806ce535f5996445188f9a35643791dc54beabc61bd81e2b03367356d604f", size = 77570, upload-time = "2025-12-15T22:31:17.703Z" },
+ { url = "https://files.pythonhosted.org/packages/6e/63/778cd0ddc9d4a78915ac0a72b56d7fb204f7c3fabdad067d67ea0089762e/pycocotools-2.0.11-cp312-abi3-win_arm64.whl", hash = "sha256:c230f5e7b14bd19085217b4f40bba81bf14a182b150b8e9fab1c15d504ade343", size = 64564, upload-time = "2025-12-15T22:31:18.652Z" },
+]
+
+[[package]]
+name = "pycparser"
+version = "3.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/1b/7d/92392ff7815c21062bea51aa7b87d45576f649f16458d78b7cf94b9ab2e6/pycparser-3.0.tar.gz", hash = "sha256:600f49d217304a5902ac3c37e1281c9fe94e4d0489de643a9504c5cdfdfc6b29", size = 103492, upload-time = "2026-01-21T14:26:51.89Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/0c/c3/44f3fbbfa403ea2a7c779186dc20772604442dde72947e7d01069cbe98e3/pycparser-3.0-py3-none-any.whl", hash = "sha256:b727414169a36b7d524c1c3e31839a521725078d7b2ff038656844266160a992", size = 48172, upload-time = "2026-01-21T14:26:50.693Z" },
+]
+
+[[package]]
+name = "pygments"
+version = "2.19.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/b0/77/a5b8c569bf593b0140bde72ea885a803b82086995367bf2037de0159d924/pygments-2.19.2.tar.gz", hash = "sha256:636cb2477cec7f8952536970bc533bc43743542f70392ae026374600add5b887", size = 4968631, upload-time = "2025-06-21T13:39:12.283Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c7/21/705964c7812476f378728bdf590ca4b771ec72385c533964653c68e86bdc/pygments-2.19.2-py3-none-any.whl", hash = "sha256:86540386c03d588bb81d44bc3928634ff26449851e99741617ecb9037ee5ec0b", size = 1225217, upload-time = "2025-06-21T13:39:07.939Z" },
+]
+
+[[package]]
+name = "pyparsing"
+version = "3.3.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f3/91/9c6ee907786a473bf81c5f53cf703ba0957b23ab84c264080fb5a450416f/pyparsing-3.3.2.tar.gz", hash = "sha256:c777f4d763f140633dcb6d8a3eda953bf7a214dc4eff598413c070bcdc117cbc", size = 6851574, upload-time = "2026-01-21T03:57:59.36Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/10/bd/c038d7cc38edc1aa5bf91ab8068b63d4308c66c4c8bb3cbba7dfbc049f9c/pyparsing-3.3.2-py3-none-any.whl", hash = "sha256:850ba148bd908d7e2411587e247a1e4f0327839c40e2e5e6d05a007ecc69911d", size = 122781, upload-time = "2026-01-21T03:57:55.912Z" },
+]
+
+[[package]]
+name = "pytest"
+version = "9.0.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "colorama", marker = "sys_platform == 'win32'" },
+ { name = "exceptiongroup", marker = "python_full_version < '3.11'" },
+ { name = "iniconfig" },
+ { name = "packaging" },
+ { name = "pluggy" },
+ { name = "pygments" },
+ { name = "tomli", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/d1/db/7ef3487e0fb0049ddb5ce41d3a49c235bf9ad299b6a25d5780a89f19230f/pytest-9.0.2.tar.gz", hash = "sha256:75186651a92bd89611d1d9fc20f0b4345fd827c41ccd5c299a868a05d70edf11", size = 1568901, upload-time = "2025-12-06T21:30:51.014Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3b/ab/b3226f0bd7cdcf710fbede2b3548584366da3b19b5021e74f5bde2a8fa3f/pytest-9.0.2-py3-none-any.whl", hash = "sha256:711ffd45bf766d5264d487b917733b453d917afd2b0ad65223959f59089f875b", size = 374801, upload-time = "2025-12-06T21:30:49.154Z" },
+]
+
+[[package]]
+name = "pytest-cov"
+version = "7.0.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "coverage", extra = ["toml"] },
+ { name = "pluggy" },
+ { name = "pytest" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/5e/f7/c933acc76f5208b3b00089573cf6a2bc26dc80a8aece8f52bb7d6b1855ca/pytest_cov-7.0.0.tar.gz", hash = "sha256:33c97eda2e049a0c5298e91f519302a1334c26ac65c1a483d6206fd458361af1", size = 54328, upload-time = "2025-09-09T10:57:02.113Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ee/49/1377b49de7d0c1ce41292161ea0f721913fa8722c19fb9c1e3aa0367eecb/pytest_cov-7.0.0-py3-none-any.whl", hash = "sha256:3b8e9558b16cc1479da72058bdecf8073661c7f57f7d3c5f22a1c23507f2d861", size = 22424, upload-time = "2025-09-09T10:57:00.695Z" },
+]
+
+[[package]]
+name = "python-dateutil"
+version = "2.9.0.post0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "six" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/66/c0/0c8b6ad9f17a802ee498c46e004a0eb49bc148f2fd230864601a86dcf6db/python-dateutil-2.9.0.post0.tar.gz", hash = "sha256:37dd54208da7e1cd875388217d5e00ebd4179249f90fb72437e91a35459a0ad3", size = 342432, upload-time = "2024-03-01T18:36:20.211Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl", hash = "sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427", size = 229892, upload-time = "2024-03-01T18:36:18.57Z" },
+]
+
+[[package]]
+name = "python-json-logger"
+version = "4.0.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/29/bf/eca6a3d43db1dae7070f70e160ab20b807627ba953663ba07928cdd3dc58/python_json_logger-4.0.0.tar.gz", hash = "sha256:f58e68eb46e1faed27e0f574a55a0455eecd7b8a5b88b85a784519ba3cff047f", size = 17683, upload-time = "2025-10-06T04:15:18.984Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/51/e5/fecf13f06e5e5f67e8837d777d1bc43fac0ed2b77a676804df5c34744727/python_json_logger-4.0.0-py3-none-any.whl", hash = "sha256:af09c9daf6a813aa4cc7180395f50f2a9e5fa056034c9953aec92e381c5ba1e2", size = 15548, upload-time = "2025-10-06T04:15:17.553Z" },
+]
+
+[[package]]
+name = "python-rapidjson"
+version = "1.23"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/26/3a/c32aee1dc385e50c1d6e78e56abdbc6aca283127f06f6ec0be1a86b2e3c1/python_rapidjson-1.23.tar.gz", hash = "sha256:0f845daeb26be147f5720a8c410308235092bb4fbb81ea408aa77203e26296fb", size = 239605, upload-time = "2025-12-07T06:14:27.51Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/d7/df/653bedef7af1137d015501eb00cb2e1a46964015b3bb5c4e8096451af577/python_rapidjson-1.23-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:dbb0958c5132d3def9d2be178dee45a2c41071573558348091bc539fe2671cd2", size = 215755, upload-time = "2025-12-07T07:18:44.814Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/5a/7e3f00667b949dff400e3ea36f2df4b0c7c9d12016820d2ad8289a9c2c05/python_rapidjson-1.23-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:7f9314d155f342bc95912a0db9c1087a8d043ecceccaff33c21d257de4376f20", size = 212834, upload-time = "2025-12-07T07:18:46.098Z" },
+ { url = "https://files.pythonhosted.org/packages/37/53/c3dfca7a9d1c2166bb1a74c786049c730a7534d480c11772802c6a6955e5/python_rapidjson-1.23-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:332c972ceaaa7faff559d370702682ed20bbe36fdab477679b157a0f3c0391d5", size = 1689856, upload-time = "2025-12-07T07:18:47.368Z" },
+ { url = "https://files.pythonhosted.org/packages/a4/f9/3101f069dbf95b64a1ce5d04a88161a653f7542a1312604b6361a2837610/python_rapidjson-1.23-cp310-cp310-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:c9f92eed9120b83acd7be9df932318de1c0d6e40d01594e9d855fef41cdb55c7", size = 1749875, upload-time = "2025-12-07T07:18:48.599Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/f3/87fb25da840ba08c08ca39755a3c2ef3cec74969436fb2e216f937e98a21/python_rapidjson-1.23-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b79f9a4af92188098008623b17743bf8a300abeacb383d0424d070b4fbfbf83e", size = 1727398, upload-time = "2025-12-07T07:18:50.627Z" },
+ { url = "https://files.pythonhosted.org/packages/bc/94/959517668e91294cd5881d2dfa22b78602bf39225889f76dd98d003725f7/python_rapidjson-1.23-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:e60f8aa275d403a407abeb31a6bf319826c9f6b565a569a8dfb930fe0a2cca1a", size = 2534620, upload-time = "2025-12-07T07:18:52.285Z" },
+ { url = "https://files.pythonhosted.org/packages/b8/7b/b2f08b3bbf16f30ddcd63de63cdb3b2306a6f2795d13fbf66dcaf1966352/python_rapidjson-1.23-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:4c871e878c30f87077249d65ed65c0d5c24dd93c412f487cd95f648bec209909", size = 2665485, upload-time = "2025-12-07T07:18:53.914Z" },
+ { url = "https://files.pythonhosted.org/packages/c4/fd/5cae94b4b0615a479ac2644485c9350f83bcf522d975c40e3b757b590e67/python_rapidjson-1.23-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:fc2b0c06b8e25528bbdeef45fda1ea373f62c277eb8f77518646e82ecc9b0718", size = 2648031, upload-time = "2025-12-07T07:18:55.568Z" },
+ { url = "https://files.pythonhosted.org/packages/b7/8e/3eba493d0b83d00752d6b6b4a158088888e6bc377e3621b8d895ab25fc44/python_rapidjson-1.23-cp310-cp310-win32.whl", hash = "sha256:027c2bd096ac505c52a7ff1a7f2860b0ee451bc75e69777b49fd3d842efff544", size = 130264, upload-time = "2025-12-07T07:18:57.021Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/28/01844118ab8689ea65c83f3c7008bab358e0110decf350d4aab9177863c6/python_rapidjson-1.23-cp310-cp310-win_amd64.whl", hash = "sha256:6516b8538b2081bbaf737326dba472b354d1bf873adbb155fc7c735916761afb", size = 150862, upload-time = "2025-12-07T07:18:58.35Z" },
+ { url = "https://files.pythonhosted.org/packages/d8/aa/f2252ef867c46d92d9ecf7af0b461993be3900be2e4ac4d7952c5c2541f2/python_rapidjson-1.23-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:6da3f5726c4ca30595b9ccab0e0bcfc18b624c61af5f6e3877b80e00bef62995", size = 215753, upload-time = "2025-12-07T07:18:59.636Z" },
+ { url = "https://files.pythonhosted.org/packages/c4/95/7df68969ef598f32c0ee187712a215666cf382e1e85bcd8855efb253ac89/python_rapidjson-1.23-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:316e1541c98af3af4bc0908cb9baaa44019dcedcad69a9944e2237607deb3ce5", size = 212838, upload-time = "2025-12-07T07:19:00.645Z" },
+ { url = "https://files.pythonhosted.org/packages/93/af/26ea25c2bf5c5c04aba0d66eba1ce1dd7f5dac0c3203d4ceb72617a99b20/python_rapidjson-1.23-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e771315f7abd9c345c95f82dc013b6d60f041b6d708b26585a84e011dd5e092d", size = 1708625, upload-time = "2025-12-07T07:19:02.055Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/ae/ea28dee1cce61f06768b7ea56c1644a08f5373acb1c660ff8ab186688d77/python_rapidjson-1.23-cp311-cp311-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:6ea4f914033440f4474931838b93890567f9214e26dd89d741b4260d1e243e56", size = 1768734, upload-time = "2025-12-07T07:19:03.541Z" },
+ { url = "https://files.pythonhosted.org/packages/99/8f/fb06132f7dc816b9689d43294bf58dd979da702cbdfe9fb265c5e7d54e6f/python_rapidjson-1.23-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d0702eae42704948320df851a35371db3e7a9494c123896cbb069cc8b56e3c4e", size = 1745736, upload-time = "2025-12-07T07:19:04.798Z" },
+ { url = "https://files.pythonhosted.org/packages/0a/2f/4a7f28d170c5c2498f18eba3fa2781c597bf10832898bca7a1089e0e252a/python_rapidjson-1.23-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:0cdc820cf2c7b5f4ac5ba4d697f049e0b93d2e9776b53b02a77051472d38cede", size = 2553385, upload-time = "2025-12-07T07:19:06.087Z" },
+ { url = "https://files.pythonhosted.org/packages/8a/41/dfc3d019fde28479a48bd9783f69b24a9e38f011688f9f54a157442053bb/python_rapidjson-1.23-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:016a35f2d93ee6be13c938ab4bc30dcf525c4ee6e4c3fbd5c75c2320fccea875", size = 2682591, upload-time = "2025-12-07T07:19:07.325Z" },
+ { url = "https://files.pythonhosted.org/packages/a9/46/d0f4edf6fef6ae6e544f823d1d4baa35d5e0e940c6485899ddb0577a7ceb/python_rapidjson-1.23-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:40990adbf47bcb7f80b96b586e4ce114b497c19516a8e6a0700d61447ca8d28b", size = 2663035, upload-time = "2025-12-07T07:19:08.673Z" },
+ { url = "https://files.pythonhosted.org/packages/39/4e/63a50c0ec7838da2997d3743104bf85f8c3444391ad413aca270ab93f2d9/python_rapidjson-1.23-cp311-cp311-win32.whl", hash = "sha256:8a2dc5faba744b643901489e82f037cef099be92b9d4d0eea597c1d5aea910be", size = 130259, upload-time = "2025-12-07T07:19:10.142Z" },
+ { url = "https://files.pythonhosted.org/packages/40/18/2c2836d38b0b19bbad406b1e3a138c8b28880a4f858ba008ad7ff31f1935/python_rapidjson-1.23-cp311-cp311-win_amd64.whl", hash = "sha256:6d2db055ed9728071117a8395ebb1552b937c3d5bbcf6f610420f9b6f926c654", size = 150796, upload-time = "2025-12-07T07:19:11.295Z" },
+ { url = "https://files.pythonhosted.org/packages/08/e0/a78486cfb25a8c65d5e2a947aaa000bfd211b4705dc4e0657a42c6385cc5/python_rapidjson-1.23-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:56e557fb6a7d7babfeb8ebaa4d096d4ce127477ecf46fe7de7f1edf2e1d8e4d6", size = 216508, upload-time = "2025-12-07T07:19:12.614Z" },
+ { url = "https://files.pythonhosted.org/packages/6d/f2/b8d9a47cf55e25d76865d7f1691b2b94b38061c5f3fa4b385848a362366e/python_rapidjson-1.23-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:d8e107121f5c1e98cb4f0e5fde443e0f66b45eadc3269bc2416e31261535f444", size = 213921, upload-time = "2025-12-07T07:19:13.908Z" },
+ { url = "https://files.pythonhosted.org/packages/8a/ae/700b6f039fa799c3690193424185b1a2f1a49b035dd8cf81b73406dfbfca/python_rapidjson-1.23-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fc45ef1f725b3a9a27cdedcf9997f1f8c5a523ac03882d3925c6f764b33e5e1b", size = 1722258, upload-time = "2025-12-07T07:19:15.249Z" },
+ { url = "https://files.pythonhosted.org/packages/95/89/b4d2308a065d9a5ff3afc5c93c21358b5d82f944bbed4e54847231e24f81/python_rapidjson-1.23-cp312-cp312-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:f87de7b994d65da2327fffdc5d3d7166782e3ca99c76c0560c8a7f1e109a5b54", size = 1780680, upload-time = "2025-12-07T07:19:16.71Z" },
+ { url = "https://files.pythonhosted.org/packages/61/89/7b0047dfaa014cc456b29cf66913143bd0541225defaacf1727eee13291e/python_rapidjson-1.23-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6067810f0fd57713ec733b0b6ae265ef169e13b2ce04a4938b1807cddd8b4db4", size = 1760351, upload-time = "2025-12-07T07:19:17.946Z" },
+ { url = "https://files.pythonhosted.org/packages/70/60/a2dfb056a3ad6ca07c049c9376cfa509648765e805d9588c0f48bb998c33/python_rapidjson-1.23-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:83306643cf31c0833b226d4317e8738b1b5ed4371e310f3c552be994c01a3df0", size = 2570107, upload-time = "2025-12-07T07:19:19.17Z" },
+ { url = "https://files.pythonhosted.org/packages/b8/a6/e8873f34a07a524f4cb87a8934c783207674d5587533a50d0f2c55064d7b/python_rapidjson-1.23-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:13797fdcd43e558b81d3344c637bf878878fd6dede84409769d6910f8f6a9024", size = 2696763, upload-time = "2025-12-07T07:19:21.01Z" },
+ { url = "https://files.pythonhosted.org/packages/23/cb/ad2a16d6b20a457e8acd745dca416f19cf0de738311d213c544112260cc8/python_rapidjson-1.23-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ad674edb9dfe8181fb704a14149e5eb30ae179a92021484ebe8935b8d0f88495", size = 2675144, upload-time = "2025-12-07T07:19:22.609Z" },
+ { url = "https://files.pythonhosted.org/packages/65/27/943fef83837f002d990274b82d5193d066aeef128c2ba6c009d549d0e5ad/python_rapidjson-1.23-cp312-cp312-win32.whl", hash = "sha256:0c64958048ce714ccc42c659ef954812ed6de79fe4800322b3926ca46f60ffd9", size = 130858, upload-time = "2025-12-07T07:19:23.887Z" },
+ { url = "https://files.pythonhosted.org/packages/89/cd/ef6c1bc784c3a081fabcf867c1b3affcb18ba1ffd9d71aa036f96a2ef979/python_rapidjson-1.23-cp312-cp312-win_amd64.whl", hash = "sha256:cbb0a67a5330d28279a5c3b68068e901deedcd21ade0ec23be1bcc250948ae62", size = 151270, upload-time = "2025-12-07T07:19:25.057Z" },
+]
+
+[[package]]
+name = "pytz"
+version = "2025.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f8/bf/abbd3cdfb8fbc7fb3d4d38d320f2441b1e7cbe29be4f23797b4a2b5d8aac/pytz-2025.2.tar.gz", hash = "sha256:360b9e3dbb49a209c21ad61809c7fb453643e048b38924c765813546746e81c3", size = 320884, upload-time = "2025-03-25T02:25:00.538Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/81/c4/34e93fe5f5429d7570ec1fa436f1986fb1f00c3e0f43a589fe2bbcd22c3f/pytz-2025.2-py2.py3-none-any.whl", hash = "sha256:5ddf76296dd8c44c26eb8f4b6f35488f3ccbf6fbbd7adee0b7262d43f0ec2f00", size = 509225, upload-time = "2025-03-25T02:24:58.468Z" },
+]
+
+[[package]]
+name = "pywin32"
+version = "311"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/7b/40/44efbb0dfbd33aca6a6483191dae0716070ed99e2ecb0c53683f400a0b4f/pywin32-311-cp310-cp310-win32.whl", hash = "sha256:d03ff496d2a0cd4a5893504789d4a15399133fe82517455e78bad62efbb7f0a3", size = 8760432, upload-time = "2025-07-14T20:13:05.9Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/bf/360243b1e953bd254a82f12653974be395ba880e7ec23e3731d9f73921cc/pywin32-311-cp310-cp310-win_amd64.whl", hash = "sha256:797c2772017851984b97180b0bebe4b620bb86328e8a884bb626156295a63b3b", size = 9590103, upload-time = "2025-07-14T20:13:07.698Z" },
+ { url = "https://files.pythonhosted.org/packages/57/38/d290720e6f138086fb3d5ffe0b6caa019a791dd57866940c82e4eeaf2012/pywin32-311-cp310-cp310-win_arm64.whl", hash = "sha256:0502d1facf1fed4839a9a51ccbcc63d952cf318f78ffc00a7e78528ac27d7a2b", size = 8778557, upload-time = "2025-07-14T20:13:11.11Z" },
+ { url = "https://files.pythonhosted.org/packages/7c/af/449a6a91e5d6db51420875c54f6aff7c97a86a3b13a0b4f1a5c13b988de3/pywin32-311-cp311-cp311-win32.whl", hash = "sha256:184eb5e436dea364dcd3d2316d577d625c0351bf237c4e9a5fabbcfa5a58b151", size = 8697031, upload-time = "2025-07-14T20:13:13.266Z" },
+ { url = "https://files.pythonhosted.org/packages/51/8f/9bb81dd5bb77d22243d33c8397f09377056d5c687aa6d4042bea7fbf8364/pywin32-311-cp311-cp311-win_amd64.whl", hash = "sha256:3ce80b34b22b17ccbd937a6e78e7225d80c52f5ab9940fe0506a1a16f3dab503", size = 9508308, upload-time = "2025-07-14T20:13:15.147Z" },
+ { url = "https://files.pythonhosted.org/packages/44/7b/9c2ab54f74a138c491aba1b1cd0795ba61f144c711daea84a88b63dc0f6c/pywin32-311-cp311-cp311-win_arm64.whl", hash = "sha256:a733f1388e1a842abb67ffa8e7aad0e70ac519e09b0f6a784e65a136ec7cefd2", size = 8703930, upload-time = "2025-07-14T20:13:16.945Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/ab/01ea1943d4eba0f850c3c61e78e8dd59757ff815ff3ccd0a84de5f541f42/pywin32-311-cp312-cp312-win32.whl", hash = "sha256:750ec6e621af2b948540032557b10a2d43b0cee2ae9758c54154d711cc852d31", size = 8706543, upload-time = "2025-07-14T20:13:20.765Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/a8/a0e8d07d4d051ec7502cd58b291ec98dcc0c3fff027caad0470b72cfcc2f/pywin32-311-cp312-cp312-win_amd64.whl", hash = "sha256:b8c095edad5c211ff31c05223658e71bf7116daa0ecf3ad85f3201ea3190d067", size = 9495040, upload-time = "2025-07-14T20:13:22.543Z" },
+ { url = "https://files.pythonhosted.org/packages/ba/3a/2ae996277b4b50f17d61f0603efd8253cb2d79cc7ae159468007b586396d/pywin32-311-cp312-cp312-win_arm64.whl", hash = "sha256:e286f46a9a39c4a18b319c28f59b61de793654af2f395c102b4f819e584b5852", size = 8710102, upload-time = "2025-07-14T20:13:24.682Z" },
+]
+
+[[package]]
+name = "pywinpty"
+version = "3.0.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f7/54/37c7370ba91f579235049dc26cd2c5e657d2a943e01820844ffc81f32176/pywinpty-3.0.3.tar.gz", hash = "sha256:523441dc34d231fb361b4b00f8c99d3f16de02f5005fd544a0183112bcc22412", size = 31309, upload-time = "2026-02-04T21:51:09.524Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/62/28/a652709bd76ca7533cd1c443b03add9f5051fdf71bc6bdb8801dddd4e7a3/pywinpty-3.0.3-cp310-cp310-win_amd64.whl", hash = "sha256:ff05f12d775b142b11c6fe085129bdd759b61cf7d41da6c745e78e3a1ef5bf40", size = 2114320, upload-time = "2026-02-04T21:53:50.972Z" },
+ { url = "https://files.pythonhosted.org/packages/b2/13/a0181cc5c2d5635d3dbc3802b97bc8e3ad4fa7502ccef576651a5e08e54c/pywinpty-3.0.3-cp310-cp310-win_arm64.whl", hash = "sha256:340ccacb4d74278a631923794ccd758471cfc8eeeeee4610b280420a17ad1e82", size = 235670, upload-time = "2026-02-04T21:50:20.324Z" },
+ { url = "https://files.pythonhosted.org/packages/79/c3/3e75075c7f71735f22b66fab0481f2c98e3a4d58cba55cb50ba29114bcf6/pywinpty-3.0.3-cp311-cp311-win_amd64.whl", hash = "sha256:dff25a9a6435f527d7c65608a7e62783fc12076e7d44487a4911ee91be5a8ac8", size = 2114430, upload-time = "2026-02-04T21:54:19.485Z" },
+ { url = "https://files.pythonhosted.org/packages/8d/1e/8a54166a8c5e4f5cb516514bdf4090be4d51a71e8d9f6d98c0aa00fe45d4/pywinpty-3.0.3-cp311-cp311-win_arm64.whl", hash = "sha256:fbc1e230e5b193eef4431cba3f39996a288f9958f9c9f092c8a961d930ee8f68", size = 236191, upload-time = "2026-02-04T21:50:36.239Z" },
+ { url = "https://files.pythonhosted.org/packages/7c/d4/aeb5e1784d2c5bff6e189138a9ca91a090117459cea0c30378e1f2db3d54/pywinpty-3.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:c9081df0e49ffa86d15db4a6ba61530630e48707f987df42c9d3313537e81fc0", size = 2113098, upload-time = "2026-02-04T21:54:37.711Z" },
+ { url = "https://files.pythonhosted.org/packages/b9/53/7278223c493ccfe4883239cf06c823c56460a8010e0fc778eef67858dc14/pywinpty-3.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:15e79d870e18b678fb8a5a6105fd38496b55697c66e6fc0378236026bc4d59e9", size = 234901, upload-time = "2026-02-04T21:53:31.35Z" },
+]
+
+[[package]]
+name = "pyyaml"
+version = "6.0.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/05/8e/961c0007c59b8dd7729d542c61a4d537767a59645b82a0b521206e1e25c2/pyyaml-6.0.3.tar.gz", hash = "sha256:d76623373421df22fb4cf8817020cbb7ef15c725b9d5e45f17e189bfc384190f", size = 130960, upload-time = "2025-09-25T21:33:16.546Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/f4/a0/39350dd17dd6d6c6507025c0e53aef67a9293a6d37d3511f23ea510d5800/pyyaml-6.0.3-cp310-cp310-macosx_10_13_x86_64.whl", hash = "sha256:214ed4befebe12df36bcc8bc2b64b396ca31be9304b8f59e25c11cf94a4c033b", size = 184227, upload-time = "2025-09-25T21:31:46.04Z" },
+ { url = "https://files.pythonhosted.org/packages/05/14/52d505b5c59ce73244f59c7a50ecf47093ce4765f116cdb98286a71eeca2/pyyaml-6.0.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:02ea2dfa234451bbb8772601d7b8e426c2bfa197136796224e50e35a78777956", size = 174019, upload-time = "2025-09-25T21:31:47.706Z" },
+ { url = "https://files.pythonhosted.org/packages/43/f7/0e6a5ae5599c838c696adb4e6330a59f463265bfa1e116cfd1fbb0abaaae/pyyaml-6.0.3-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b30236e45cf30d2b8e7b3e85881719e98507abed1011bf463a8fa23e9c3e98a8", size = 740646, upload-time = "2025-09-25T21:31:49.21Z" },
+ { url = "https://files.pythonhosted.org/packages/2f/3a/61b9db1d28f00f8fd0ae760459a5c4bf1b941baf714e207b6eb0657d2578/pyyaml-6.0.3-cp310-cp310-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:66291b10affd76d76f54fad28e22e51719ef9ba22b29e1d7d03d6777a9174198", size = 840793, upload-time = "2025-09-25T21:31:50.735Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/1e/7acc4f0e74c4b3d9531e24739e0ab832a5edf40e64fbae1a9c01941cabd7/pyyaml-6.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9c7708761fccb9397fe64bbc0395abcae8c4bf7b0eac081e12b809bf47700d0b", size = 770293, upload-time = "2025-09-25T21:31:51.828Z" },
+ { url = "https://files.pythonhosted.org/packages/8b/ef/abd085f06853af0cd59fa5f913d61a8eab65d7639ff2a658d18a25d6a89d/pyyaml-6.0.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:418cf3f2111bc80e0933b2cd8cd04f286338bb88bdc7bc8e6dd775ebde60b5e0", size = 732872, upload-time = "2025-09-25T21:31:53.282Z" },
+ { url = "https://files.pythonhosted.org/packages/1f/15/2bc9c8faf6450a8b3c9fc5448ed869c599c0a74ba2669772b1f3a0040180/pyyaml-6.0.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:5e0b74767e5f8c593e8c9b5912019159ed0533c70051e9cce3e8b6aa699fcd69", size = 758828, upload-time = "2025-09-25T21:31:54.807Z" },
+ { url = "https://files.pythonhosted.org/packages/a3/00/531e92e88c00f4333ce359e50c19b8d1de9fe8d581b1534e35ccfbc5f393/pyyaml-6.0.3-cp310-cp310-win32.whl", hash = "sha256:28c8d926f98f432f88adc23edf2e6d4921ac26fb084b028c733d01868d19007e", size = 142415, upload-time = "2025-09-25T21:31:55.885Z" },
+ { url = "https://files.pythonhosted.org/packages/2a/fa/926c003379b19fca39dd4634818b00dec6c62d87faf628d1394e137354d4/pyyaml-6.0.3-cp310-cp310-win_amd64.whl", hash = "sha256:bdb2c67c6c1390b63c6ff89f210c8fd09d9a1217a465701eac7316313c915e4c", size = 158561, upload-time = "2025-09-25T21:31:57.406Z" },
+ { url = "https://files.pythonhosted.org/packages/6d/16/a95b6757765b7b031c9374925bb718d55e0a9ba8a1b6a12d25962ea44347/pyyaml-6.0.3-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:44edc647873928551a01e7a563d7452ccdebee747728c1080d881d68af7b997e", size = 185826, upload-time = "2025-09-25T21:31:58.655Z" },
+ { url = "https://files.pythonhosted.org/packages/16/19/13de8e4377ed53079ee996e1ab0a9c33ec2faf808a4647b7b4c0d46dd239/pyyaml-6.0.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:652cb6edd41e718550aad172851962662ff2681490a8a711af6a4d288dd96824", size = 175577, upload-time = "2025-09-25T21:32:00.088Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/62/d2eb46264d4b157dae1275b573017abec435397aa59cbcdab6fc978a8af4/pyyaml-6.0.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:10892704fc220243f5305762e276552a0395f7beb4dbf9b14ec8fd43b57f126c", size = 775556, upload-time = "2025-09-25T21:32:01.31Z" },
+ { url = "https://files.pythonhosted.org/packages/10/cb/16c3f2cf3266edd25aaa00d6c4350381c8b012ed6f5276675b9eba8d9ff4/pyyaml-6.0.3-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:850774a7879607d3a6f50d36d04f00ee69e7fc816450e5f7e58d7f17f1ae5c00", size = 882114, upload-time = "2025-09-25T21:32:03.376Z" },
+ { url = "https://files.pythonhosted.org/packages/71/60/917329f640924b18ff085ab889a11c763e0b573da888e8404ff486657602/pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b8bb0864c5a28024fac8a632c443c87c5aa6f215c0b126c449ae1a150412f31d", size = 806638, upload-time = "2025-09-25T21:32:04.553Z" },
+ { url = "https://files.pythonhosted.org/packages/dd/6f/529b0f316a9fd167281a6c3826b5583e6192dba792dd55e3203d3f8e655a/pyyaml-6.0.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1d37d57ad971609cf3c53ba6a7e365e40660e3be0e5175fa9f2365a379d6095a", size = 767463, upload-time = "2025-09-25T21:32:06.152Z" },
+ { url = "https://files.pythonhosted.org/packages/f2/6a/b627b4e0c1dd03718543519ffb2f1deea4a1e6d42fbab8021936a4d22589/pyyaml-6.0.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:37503bfbfc9d2c40b344d06b2199cf0e96e97957ab1c1b546fd4f87e53e5d3e4", size = 794986, upload-time = "2025-09-25T21:32:07.367Z" },
+ { url = "https://files.pythonhosted.org/packages/45/91/47a6e1c42d9ee337c4839208f30d9f09caa9f720ec7582917b264defc875/pyyaml-6.0.3-cp311-cp311-win32.whl", hash = "sha256:8098f252adfa6c80ab48096053f512f2321f0b998f98150cea9bd23d83e1467b", size = 142543, upload-time = "2025-09-25T21:32:08.95Z" },
+ { url = "https://files.pythonhosted.org/packages/da/e3/ea007450a105ae919a72393cb06f122f288ef60bba2dc64b26e2646fa315/pyyaml-6.0.3-cp311-cp311-win_amd64.whl", hash = "sha256:9f3bfb4965eb874431221a3ff3fdcddc7e74e3b07799e0e84ca4a0f867d449bf", size = 158763, upload-time = "2025-09-25T21:32:09.96Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/33/422b98d2195232ca1826284a76852ad5a86fe23e31b009c9886b2d0fb8b2/pyyaml-6.0.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:7f047e29dcae44602496db43be01ad42fc6f1cc0d8cd6c83d342306c32270196", size = 182063, upload-time = "2025-09-25T21:32:11.445Z" },
+ { url = "https://files.pythonhosted.org/packages/89/a0/6cf41a19a1f2f3feab0e9c0b74134aa2ce6849093d5517a0c550fe37a648/pyyaml-6.0.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:fc09d0aa354569bc501d4e787133afc08552722d3ab34836a80547331bb5d4a0", size = 173973, upload-time = "2025-09-25T21:32:12.492Z" },
+ { url = "https://files.pythonhosted.org/packages/ed/23/7a778b6bd0b9a8039df8b1b1d80e2e2ad78aa04171592c8a5c43a56a6af4/pyyaml-6.0.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9149cad251584d5fb4981be1ecde53a1ca46c891a79788c0df828d2f166bda28", size = 775116, upload-time = "2025-09-25T21:32:13.652Z" },
+ { url = "https://files.pythonhosted.org/packages/65/30/d7353c338e12baef4ecc1b09e877c1970bd3382789c159b4f89d6a70dc09/pyyaml-6.0.3-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5fdec68f91a0c6739b380c83b951e2c72ac0197ace422360e6d5a959d8d97b2c", size = 844011, upload-time = "2025-09-25T21:32:15.21Z" },
+ { url = "https://files.pythonhosted.org/packages/8b/9d/b3589d3877982d4f2329302ef98a8026e7f4443c765c46cfecc8858c6b4b/pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ba1cc08a7ccde2d2ec775841541641e4548226580ab850948cbfda66a1befcdc", size = 807870, upload-time = "2025-09-25T21:32:16.431Z" },
+ { url = "https://files.pythonhosted.org/packages/05/c0/b3be26a015601b822b97d9149ff8cb5ead58c66f981e04fedf4e762f4bd4/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8dc52c23056b9ddd46818a57b78404882310fb473d63f17b07d5c40421e47f8e", size = 761089, upload-time = "2025-09-25T21:32:17.56Z" },
+ { url = "https://files.pythonhosted.org/packages/be/8e/98435a21d1d4b46590d5459a22d88128103f8da4c2d4cb8f14f2a96504e1/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:41715c910c881bc081f1e8872880d3c650acf13dfa8214bad49ed4cede7c34ea", size = 790181, upload-time = "2025-09-25T21:32:18.834Z" },
+ { url = "https://files.pythonhosted.org/packages/74/93/7baea19427dcfbe1e5a372d81473250b379f04b1bd3c4c5ff825e2327202/pyyaml-6.0.3-cp312-cp312-win32.whl", hash = "sha256:96b533f0e99f6579b3d4d4995707cf36df9100d67e0c8303a0c55b27b5f99bc5", size = 137658, upload-time = "2025-09-25T21:32:20.209Z" },
+ { url = "https://files.pythonhosted.org/packages/86/bf/899e81e4cce32febab4fb42bb97dcdf66bc135272882d1987881a4b519e9/pyyaml-6.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:5fcd34e47f6e0b794d17de1b4ff496c00986e1c83f7ab2fb8fcfe9616ff7477b", size = 154003, upload-time = "2025-09-25T21:32:21.167Z" },
+ { url = "https://files.pythonhosted.org/packages/1a/08/67bd04656199bbb51dbed1439b7f27601dfb576fb864099c7ef0c3e55531/pyyaml-6.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:64386e5e707d03a7e172c0701abfb7e10f0fb753ee1d773128192742712a98fd", size = 140344, upload-time = "2025-09-25T21:32:22.617Z" },
+]
+
+[[package]]
+name = "pyzmq"
+version = "27.1.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "cffi", marker = "implementation_name == 'pypy'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/04/0b/3c9baedbdf613ecaa7aa07027780b8867f57b6293b6ee50de316c9f3222b/pyzmq-27.1.0.tar.gz", hash = "sha256:ac0765e3d44455adb6ddbf4417dcce460fc40a05978c08efdf2948072f6db540", size = 281750, upload-time = "2025-09-08T23:10:18.157Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/67/b9/52aa9ec2867528b54f1e60846728d8b4d84726630874fee3a91e66c7df81/pyzmq-27.1.0-cp310-cp310-macosx_10_15_universal2.whl", hash = "sha256:508e23ec9bc44c0005c4946ea013d9317ae00ac67778bd47519fdf5a0e930ff4", size = 1329850, upload-time = "2025-09-08T23:07:26.274Z" },
+ { url = "https://files.pythonhosted.org/packages/99/64/5653e7b7425b169f994835a2b2abf9486264401fdef18df91ddae47ce2cc/pyzmq-27.1.0-cp310-cp310-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:507b6f430bdcf0ee48c0d30e734ea89ce5567fd7b8a0f0044a369c176aa44556", size = 906380, upload-time = "2025-09-08T23:07:29.78Z" },
+ { url = "https://files.pythonhosted.org/packages/73/78/7d713284dbe022f6440e391bd1f3c48d9185673878034cfb3939cdf333b2/pyzmq-27.1.0-cp310-cp310-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:bf7b38f9fd7b81cb6d9391b2946382c8237fd814075c6aa9c3b746d53076023b", size = 666421, upload-time = "2025-09-08T23:07:31.263Z" },
+ { url = "https://files.pythonhosted.org/packages/30/76/8f099f9d6482450428b17c4d6b241281af7ce6a9de8149ca8c1c649f6792/pyzmq-27.1.0-cp310-cp310-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:03ff0b279b40d687691a6217c12242ee71f0fba28bf8626ff50e3ef0f4410e1e", size = 854149, upload-time = "2025-09-08T23:07:33.17Z" },
+ { url = "https://files.pythonhosted.org/packages/59/f0/37fbfff06c68016019043897e4c969ceab18bde46cd2aca89821fcf4fb2e/pyzmq-27.1.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:677e744fee605753eac48198b15a2124016c009a11056f93807000ab11ce6526", size = 1655070, upload-time = "2025-09-08T23:07:35.205Z" },
+ { url = "https://files.pythonhosted.org/packages/47/14/7254be73f7a8edc3587609554fcaa7bfd30649bf89cd260e4487ca70fdaa/pyzmq-27.1.0-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:dd2fec2b13137416a1c5648b7009499bcc8fea78154cd888855fa32514f3dad1", size = 2033441, upload-time = "2025-09-08T23:07:37.432Z" },
+ { url = "https://files.pythonhosted.org/packages/22/dc/49f2be26c6f86f347e796a4d99b19167fc94503f0af3fd010ad262158822/pyzmq-27.1.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:08e90bb4b57603b84eab1d0ca05b3bbb10f60c1839dc471fc1c9e1507bef3386", size = 1891529, upload-time = "2025-09-08T23:07:39.047Z" },
+ { url = "https://files.pythonhosted.org/packages/a3/3e/154fb963ae25be70c0064ce97776c937ecc7d8b0259f22858154a9999769/pyzmq-27.1.0-cp310-cp310-win32.whl", hash = "sha256:a5b42d7a0658b515319148875fcb782bbf118dd41c671b62dae33666c2213bda", size = 567276, upload-time = "2025-09-08T23:07:40.695Z" },
+ { url = "https://files.pythonhosted.org/packages/62/b2/f4ab56c8c595abcb26b2be5fd9fa9e6899c1e5ad54964e93ae8bb35482be/pyzmq-27.1.0-cp310-cp310-win_amd64.whl", hash = "sha256:c0bb87227430ee3aefcc0ade2088100e528d5d3298a0a715a64f3d04c60ba02f", size = 632208, upload-time = "2025-09-08T23:07:42.298Z" },
+ { url = "https://files.pythonhosted.org/packages/3b/e3/be2cc7ab8332bdac0522fdb64c17b1b6241a795bee02e0196636ec5beb79/pyzmq-27.1.0-cp310-cp310-win_arm64.whl", hash = "sha256:9a916f76c2ab8d045b19f2286851a38e9ac94ea91faf65bd64735924522a8b32", size = 559766, upload-time = "2025-09-08T23:07:43.869Z" },
+ { url = "https://files.pythonhosted.org/packages/06/5d/305323ba86b284e6fcb0d842d6adaa2999035f70f8c38a9b6d21ad28c3d4/pyzmq-27.1.0-cp311-cp311-macosx_10_15_universal2.whl", hash = "sha256:226b091818d461a3bef763805e75685e478ac17e9008f49fce2d3e52b3d58b86", size = 1333328, upload-time = "2025-09-08T23:07:45.946Z" },
+ { url = "https://files.pythonhosted.org/packages/bd/a0/fc7e78a23748ad5443ac3275943457e8452da67fda347e05260261108cbc/pyzmq-27.1.0-cp311-cp311-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:0790a0161c281ca9723f804871b4027f2e8b5a528d357c8952d08cd1a9c15581", size = 908803, upload-time = "2025-09-08T23:07:47.551Z" },
+ { url = "https://files.pythonhosted.org/packages/7e/22/37d15eb05f3bdfa4abea6f6d96eb3bb58585fbd3e4e0ded4e743bc650c97/pyzmq-27.1.0-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c895a6f35476b0c3a54e3eb6ccf41bf3018de937016e6e18748317f25d4e925f", size = 668836, upload-time = "2025-09-08T23:07:49.436Z" },
+ { url = "https://files.pythonhosted.org/packages/b1/c4/2a6fe5111a01005fc7af3878259ce17684fabb8852815eda6225620f3c59/pyzmq-27.1.0-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5bbf8d3630bf96550b3be8e1fc0fea5cbdc8d5466c1192887bd94869da17a63e", size = 857038, upload-time = "2025-09-08T23:07:51.234Z" },
+ { url = "https://files.pythonhosted.org/packages/cb/eb/bfdcb41d0db9cd233d6fb22dc131583774135505ada800ebf14dfb0a7c40/pyzmq-27.1.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:15c8bd0fe0dabf808e2d7a681398c4e5ded70a551ab47482067a572c054c8e2e", size = 1657531, upload-time = "2025-09-08T23:07:52.795Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/21/e3180ca269ed4a0de5c34417dfe71a8ae80421198be83ee619a8a485b0c7/pyzmq-27.1.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:bafcb3dd171b4ae9f19ee6380dfc71ce0390fefaf26b504c0e5f628d7c8c54f2", size = 2034786, upload-time = "2025-09-08T23:07:55.047Z" },
+ { url = "https://files.pythonhosted.org/packages/3b/b1/5e21d0b517434b7f33588ff76c177c5a167858cc38ef740608898cd329f2/pyzmq-27.1.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:e829529fcaa09937189178115c49c504e69289abd39967cd8a4c215761373394", size = 1894220, upload-time = "2025-09-08T23:07:57.172Z" },
+ { url = "https://files.pythonhosted.org/packages/03/f2/44913a6ff6941905efc24a1acf3d3cb6146b636c546c7406c38c49c403d4/pyzmq-27.1.0-cp311-cp311-win32.whl", hash = "sha256:6df079c47d5902af6db298ec92151db82ecb557af663098b92f2508c398bb54f", size = 567155, upload-time = "2025-09-08T23:07:59.05Z" },
+ { url = "https://files.pythonhosted.org/packages/23/6d/d8d92a0eb270a925c9b4dd039c0b4dc10abc2fcbc48331788824ef113935/pyzmq-27.1.0-cp311-cp311-win_amd64.whl", hash = "sha256:190cbf120fbc0fc4957b56866830def56628934a9d112aec0e2507aa6a032b97", size = 633428, upload-time = "2025-09-08T23:08:00.663Z" },
+ { url = "https://files.pythonhosted.org/packages/ae/14/01afebc96c5abbbd713ecfc7469cfb1bc801c819a74ed5c9fad9a48801cb/pyzmq-27.1.0-cp311-cp311-win_arm64.whl", hash = "sha256:eca6b47df11a132d1745eb3b5b5e557a7dae2c303277aa0e69c6ba91b8736e07", size = 559497, upload-time = "2025-09-08T23:08:02.15Z" },
+ { url = "https://files.pythonhosted.org/packages/92/e7/038aab64a946d535901103da16b953c8c9cc9c961dadcbf3609ed6428d23/pyzmq-27.1.0-cp312-abi3-macosx_10_15_universal2.whl", hash = "sha256:452631b640340c928fa343801b0d07eb0c3789a5ffa843f6e1a9cee0ba4eb4fc", size = 1306279, upload-time = "2025-09-08T23:08:03.807Z" },
+ { url = "https://files.pythonhosted.org/packages/e8/5e/c3c49fdd0f535ef45eefcc16934648e9e59dace4a37ee88fc53f6cd8e641/pyzmq-27.1.0-cp312-abi3-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:1c179799b118e554b66da67d88ed66cd37a169f1f23b5d9f0a231b4e8d44a113", size = 895645, upload-time = "2025-09-08T23:08:05.301Z" },
+ { url = "https://files.pythonhosted.org/packages/f8/e5/b0b2504cb4e903a74dcf1ebae157f9e20ebb6ea76095f6cfffea28c42ecd/pyzmq-27.1.0-cp312-abi3-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3837439b7f99e60312f0c926a6ad437b067356dc2bc2ec96eb395fd0fe804233", size = 652574, upload-time = "2025-09-08T23:08:06.828Z" },
+ { url = "https://files.pythonhosted.org/packages/f8/9b/c108cdb55560eaf253f0cbdb61b29971e9fb34d9c3499b0e96e4e60ed8a5/pyzmq-27.1.0-cp312-abi3-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:43ad9a73e3da1fab5b0e7e13402f0b2fb934ae1c876c51d0afff0e7c052eca31", size = 840995, upload-time = "2025-09-08T23:08:08.396Z" },
+ { url = "https://files.pythonhosted.org/packages/c2/bb/b79798ca177b9eb0825b4c9998c6af8cd2a7f15a6a1a4272c1d1a21d382f/pyzmq-27.1.0-cp312-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:0de3028d69d4cdc475bfe47a6128eb38d8bc0e8f4d69646adfbcd840facbac28", size = 1642070, upload-time = "2025-09-08T23:08:09.989Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/80/2df2e7977c4ede24c79ae39dcef3899bfc5f34d1ca7a5b24f182c9b7a9ca/pyzmq-27.1.0-cp312-abi3-musllinux_1_2_i686.whl", hash = "sha256:cf44a7763aea9298c0aa7dbf859f87ed7012de8bda0f3977b6fb1d96745df856", size = 2021121, upload-time = "2025-09-08T23:08:11.907Z" },
+ { url = "https://files.pythonhosted.org/packages/46/bd/2d45ad24f5f5ae7e8d01525eb76786fa7557136555cac7d929880519e33a/pyzmq-27.1.0-cp312-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:f30f395a9e6fbca195400ce833c731e7b64c3919aa481af4d88c3759e0cb7496", size = 1878550, upload-time = "2025-09-08T23:08:13.513Z" },
+ { url = "https://files.pythonhosted.org/packages/e6/2f/104c0a3c778d7c2ab8190e9db4f62f0b6957b53c9d87db77c284b69f33ea/pyzmq-27.1.0-cp312-abi3-win32.whl", hash = "sha256:250e5436a4ba13885494412b3da5d518cd0d3a278a1ae640e113c073a5f88edd", size = 559184, upload-time = "2025-09-08T23:08:15.163Z" },
+ { url = "https://files.pythonhosted.org/packages/fc/7f/a21b20d577e4100c6a41795842028235998a643b1ad406a6d4163ea8f53e/pyzmq-27.1.0-cp312-abi3-win_amd64.whl", hash = "sha256:9ce490cf1d2ca2ad84733aa1d69ce6855372cb5ce9223802450c9b2a7cba0ccf", size = 619480, upload-time = "2025-09-08T23:08:17.192Z" },
+ { url = "https://files.pythonhosted.org/packages/78/c2/c012beae5f76b72f007a9e91ee9401cb88c51d0f83c6257a03e785c81cc2/pyzmq-27.1.0-cp312-abi3-win_arm64.whl", hash = "sha256:75a2f36223f0d535a0c919e23615fc85a1e23b71f40c7eb43d7b1dedb4d8f15f", size = 552993, upload-time = "2025-09-08T23:08:18.926Z" },
+ { url = "https://files.pythonhosted.org/packages/f3/81/a65e71c1552f74dec9dff91d95bafb6e0d33338a8dfefbc88aa562a20c92/pyzmq-27.1.0-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:c17e03cbc9312bee223864f1a2b13a99522e0dc9f7c5df0177cd45210ac286e6", size = 836266, upload-time = "2025-09-08T23:09:40.048Z" },
+ { url = "https://files.pythonhosted.org/packages/58/ed/0202ca350f4f2b69faa95c6d931e3c05c3a397c184cacb84cb4f8f42f287/pyzmq-27.1.0-pp310-pypy310_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:f328d01128373cb6763823b2b4e7f73bdf767834268c565151eacb3b7a392f90", size = 800206, upload-time = "2025-09-08T23:09:41.902Z" },
+ { url = "https://files.pythonhosted.org/packages/47/42/1ff831fa87fe8f0a840ddb399054ca0009605d820e2b44ea43114f5459f4/pyzmq-27.1.0-pp310-pypy310_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9c1790386614232e1b3a40a958454bdd42c6d1811837b15ddbb052a032a43f62", size = 567747, upload-time = "2025-09-08T23:09:43.741Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/db/5c4d6807434751e3f21231bee98109aa57b9b9b55e058e450d0aef59b70f/pyzmq-27.1.0-pp310-pypy310_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:448f9cb54eb0cee4732b46584f2710c8bc178b0e5371d9e4fc8125201e413a74", size = 747371, upload-time = "2025-09-08T23:09:45.575Z" },
+ { url = "https://files.pythonhosted.org/packages/26/af/78ce193dbf03567eb8c0dc30e3df2b9e56f12a670bf7eb20f9fb532c7e8a/pyzmq-27.1.0-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:05b12f2d32112bf8c95ef2e74ec4f1d4beb01f8b5e703b38537f8849f92cb9ba", size = 544862, upload-time = "2025-09-08T23:09:47.448Z" },
+ { url = "https://files.pythonhosted.org/packages/4c/c6/c4dcdecdbaa70969ee1fdced6d7b8f60cfabe64d25361f27ac4665a70620/pyzmq-27.1.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:18770c8d3563715387139060d37859c02ce40718d1faf299abddcdcc6a649066", size = 836265, upload-time = "2025-09-08T23:09:49.376Z" },
+ { url = "https://files.pythonhosted.org/packages/3e/79/f38c92eeaeb03a2ccc2ba9866f0439593bb08c5e3b714ac1d553e5c96e25/pyzmq-27.1.0-pp311-pypy311_pp73-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:ac25465d42f92e990f8d8b0546b01c391ad431c3bf447683fdc40565941d0604", size = 800208, upload-time = "2025-09-08T23:09:51.073Z" },
+ { url = "https://files.pythonhosted.org/packages/49/0e/3f0d0d335c6b3abb9b7b723776d0b21fa7f3a6c819a0db6097059aada160/pyzmq-27.1.0-pp311-pypy311_pp73-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:53b40f8ae006f2734ee7608d59ed661419f087521edbfc2149c3932e9c14808c", size = 567747, upload-time = "2025-09-08T23:09:52.698Z" },
+ { url = "https://files.pythonhosted.org/packages/a1/cf/f2b3784d536250ffd4be70e049f3b60981235d70c6e8ce7e3ef21e1adb25/pyzmq-27.1.0-pp311-pypy311_pp73-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f605d884e7c8be8fe1aa94e0a783bf3f591b84c24e4bc4f3e7564c82ac25e271", size = 747371, upload-time = "2025-09-08T23:09:54.563Z" },
+ { url = "https://files.pythonhosted.org/packages/01/1b/5dbe84eefc86f48473947e2f41711aded97eecef1231f4558f1f02713c12/pyzmq-27.1.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:c9f7f6e13dff2e44a6afeaf2cf54cee5929ad64afaf4d40b50f93c58fc687355", size = 544862, upload-time = "2025-09-08T23:09:56.509Z" },
+]
+
+[[package]]
+name = "referencing"
+version = "0.37.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "attrs" },
+ { name = "rpds-py" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/22/f5/df4e9027acead3ecc63e50fe1e36aca1523e1719559c499951bb4b53188f/referencing-0.37.0.tar.gz", hash = "sha256:44aefc3142c5b842538163acb373e24cce6632bd54bdb01b21ad5863489f50d8", size = 78036, upload-time = "2025-10-13T15:30:48.871Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/2c/58/ca301544e1fa93ed4f80d724bf5b194f6e4b945841c5bfd555878eea9fcb/referencing-0.37.0-py3-none-any.whl", hash = "sha256:381329a9f99628c9069361716891d34ad94af76e461dcb0335825aecc7692231", size = 26766, upload-time = "2025-10-13T15:30:47.625Z" },
+]
+
+[[package]]
+name = "regex"
+version = "2026.1.15"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/0b/86/07d5056945f9ec4590b518171c4254a5925832eb727b56d3c38a7476f316/regex-2026.1.15.tar.gz", hash = "sha256:164759aa25575cbc0651bef59a0b18353e54300d79ace8084c818ad8ac72b7d5", size = 414811, upload-time = "2026-01-14T23:18:02.775Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ea/d2/e6ee96b7dff201a83f650241c52db8e5bd080967cb93211f57aa448dc9d6/regex-2026.1.15-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:4e3dd93c8f9abe8aa4b6c652016da9a3afa190df5ad822907efe6b206c09896e", size = 488166, upload-time = "2026-01-14T23:13:46.408Z" },
+ { url = "https://files.pythonhosted.org/packages/23/8a/819e9ce14c9f87af026d0690901b3931f3101160833e5d4c8061fa3a1b67/regex-2026.1.15-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:97499ff7862e868b1977107873dd1a06e151467129159a6ffd07b66706ba3a9f", size = 290632, upload-time = "2026-01-14T23:13:48.688Z" },
+ { url = "https://files.pythonhosted.org/packages/d5/c3/23dfe15af25d1d45b07dfd4caa6003ad710dcdcb4c4b279909bdfe7a2de8/regex-2026.1.15-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:0bda75ebcac38d884240914c6c43d8ab5fb82e74cde6da94b43b17c411aa4c2b", size = 288500, upload-time = "2026-01-14T23:13:50.503Z" },
+ { url = "https://files.pythonhosted.org/packages/c6/31/1adc33e2f717df30d2f4d973f8776d2ba6ecf939301efab29fca57505c95/regex-2026.1.15-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7dcc02368585334f5bc81fc73a2a6a0bbade60e7d83da21cead622faf408f32c", size = 781670, upload-time = "2026-01-14T23:13:52.453Z" },
+ { url = "https://files.pythonhosted.org/packages/23/ce/21a8a22d13bc4adcb927c27b840c948f15fc973e21ed2346c1bd0eae22dc/regex-2026.1.15-cp310-cp310-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:693b465171707bbe882a7a05de5e866f33c76aa449750bee94a8d90463533cc9", size = 850820, upload-time = "2026-01-14T23:13:54.894Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/4f/3eeacdf587a4705a44484cd0b30e9230a0e602811fb3e2cc32268c70d509/regex-2026.1.15-cp310-cp310-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:b0d190e6f013ea938623a58706d1469a62103fb2a241ce2873a9906e0386582c", size = 898777, upload-time = "2026-01-14T23:13:56.908Z" },
+ { url = "https://files.pythonhosted.org/packages/79/a9/1898a077e2965c35fc22796488141a22676eed2d73701e37c73ad7c0b459/regex-2026.1.15-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5ff818702440a5878a81886f127b80127f5d50563753a28211482867f8318106", size = 791750, upload-time = "2026-01-14T23:13:58.527Z" },
+ { url = "https://files.pythonhosted.org/packages/4c/84/e31f9d149a178889b3817212827f5e0e8c827a049ff31b4b381e76b26e2d/regex-2026.1.15-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:f052d1be37ef35a54e394de66136e30fa1191fab64f71fc06ac7bc98c9a84618", size = 782674, upload-time = "2026-01-14T23:13:59.874Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/ff/adf60063db24532add6a1676943754a5654dcac8237af024ede38244fd12/regex-2026.1.15-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:6bfc31a37fd1592f0c4fc4bfc674b5c42e52efe45b4b7a6a14f334cca4bcebe4", size = 767906, upload-time = "2026-01-14T23:14:01.298Z" },
+ { url = "https://files.pythonhosted.org/packages/af/3e/e6a216cee1e2780fec11afe7fc47b6f3925d7264e8149c607ac389fd9b1a/regex-2026.1.15-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:3d6ce5ae80066b319ae3bc62fd55a557c9491baa5efd0d355f0de08c4ba54e79", size = 774798, upload-time = "2026-01-14T23:14:02.715Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/98/23a4a8378a9208514ed3efc7e7850c27fa01e00ed8557c958df0335edc4a/regex-2026.1.15-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:1704d204bd42b6bb80167df0e4554f35c255b579ba99616def38f69e14a5ccb9", size = 845861, upload-time = "2026-01-14T23:14:04.824Z" },
+ { url = "https://files.pythonhosted.org/packages/f8/57/d7605a9d53bd07421a8785d349cd29677fe660e13674fa4c6cbd624ae354/regex-2026.1.15-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:e3174a5ed4171570dc8318afada56373aa9289eb6dc0d96cceb48e7358b0e220", size = 755648, upload-time = "2026-01-14T23:14:06.371Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/76/6f2e24aa192da1e299cc1101674a60579d3912391867ce0b946ba83e2194/regex-2026.1.15-cp310-cp310-musllinux_1_2_s390x.whl", hash = "sha256:87adf5bd6d72e3e17c9cb59ac4096b1faaf84b7eb3037a5ffa61c4b4370f0f13", size = 836250, upload-time = "2026-01-14T23:14:08.343Z" },
+ { url = "https://files.pythonhosted.org/packages/11/3a/1f2a1d29453299a7858eab7759045fc3d9d1b429b088dec2dc85b6fa16a2/regex-2026.1.15-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:e85dc94595f4d766bd7d872a9de5ede1ca8d3063f3bdf1e2c725f5eb411159e3", size = 779919, upload-time = "2026-01-14T23:14:09.954Z" },
+ { url = "https://files.pythonhosted.org/packages/c0/67/eab9bc955c9dcc58e9b222c801e39cff7ca0b04261792a2149166ce7e792/regex-2026.1.15-cp310-cp310-win32.whl", hash = "sha256:21ca32c28c30d5d65fc9886ff576fc9b59bbca08933e844fa2363e530f4c8218", size = 265888, upload-time = "2026-01-14T23:14:11.35Z" },
+ { url = "https://files.pythonhosted.org/packages/1d/62/31d16ae24e1f8803bddb0885508acecaec997fcdcde9c243787103119ae4/regex-2026.1.15-cp310-cp310-win_amd64.whl", hash = "sha256:3038a62fc7d6e5547b8915a3d927a0fbeef84cdbe0b1deb8c99bbd4a8961b52a", size = 277830, upload-time = "2026-01-14T23:14:12.908Z" },
+ { url = "https://files.pythonhosted.org/packages/e5/36/5d9972bccd6417ecd5a8be319cebfd80b296875e7f116c37fb2a2deecebf/regex-2026.1.15-cp310-cp310-win_arm64.whl", hash = "sha256:505831646c945e3e63552cc1b1b9b514f0e93232972a2d5bedbcc32f15bc82e3", size = 270376, upload-time = "2026-01-14T23:14:14.782Z" },
+ { url = "https://files.pythonhosted.org/packages/d0/c9/0c80c96eab96948363d270143138d671d5731c3a692b417629bf3492a9d6/regex-2026.1.15-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:1ae6020fb311f68d753b7efa9d4b9a5d47a5d6466ea0d5e3b5a471a960ea6e4a", size = 488168, upload-time = "2026-01-14T23:14:16.129Z" },
+ { url = "https://files.pythonhosted.org/packages/17/f0/271c92f5389a552494c429e5cc38d76d1322eb142fb5db3c8ccc47751468/regex-2026.1.15-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:eddf73f41225942c1f994914742afa53dc0d01a6e20fe14b878a1b1edc74151f", size = 290636, upload-time = "2026-01-14T23:14:17.715Z" },
+ { url = "https://files.pythonhosted.org/packages/a0/f9/5f1fd077d106ca5655a0f9ff8f25a1ab55b92128b5713a91ed7134ff688e/regex-2026.1.15-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:1e8cd52557603f5c66a548f69421310886b28b7066853089e1a71ee710e1cdc1", size = 288496, upload-time = "2026-01-14T23:14:19.326Z" },
+ { url = "https://files.pythonhosted.org/packages/b5/e1/8f43b03a4968c748858ec77f746c286d81f896c2e437ccf050ebc5d3128c/regex-2026.1.15-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5170907244b14303edc5978f522f16c974f32d3aa92109fabc2af52411c9433b", size = 793503, upload-time = "2026-01-14T23:14:20.922Z" },
+ { url = "https://files.pythonhosted.org/packages/8d/4e/a39a5e8edc5377a46a7c875c2f9a626ed3338cb3bb06931be461c3e1a34a/regex-2026.1.15-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:2748c1ec0663580b4510bd89941a31560b4b439a0b428b49472a3d9944d11cd8", size = 860535, upload-time = "2026-01-14T23:14:22.405Z" },
+ { url = "https://files.pythonhosted.org/packages/dc/1c/9dce667a32a9477f7a2869c1c767dc00727284a9fa3ff5c09a5c6c03575e/regex-2026.1.15-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:2f2775843ca49360508d080eaa87f94fa248e2c946bbcd963bb3aae14f333413", size = 907225, upload-time = "2026-01-14T23:14:23.897Z" },
+ { url = "https://files.pythonhosted.org/packages/a4/3c/87ca0a02736d16b6262921425e84b48984e77d8e4e572c9072ce96e66c30/regex-2026.1.15-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d9ea2604370efc9a174c1b5dcc81784fb040044232150f7f33756049edfc9026", size = 800526, upload-time = "2026-01-14T23:14:26.039Z" },
+ { url = "https://files.pythonhosted.org/packages/4b/ff/647d5715aeea7c87bdcbd2f578f47b415f55c24e361e639fe8c0cc88878f/regex-2026.1.15-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:0dcd31594264029b57bf16f37fd7248a70b3b764ed9e0839a8f271b2d22c0785", size = 773446, upload-time = "2026-01-14T23:14:28.109Z" },
+ { url = "https://files.pythonhosted.org/packages/af/89/bf22cac25cb4ba0fe6bff52ebedbb65b77a179052a9d6037136ae93f42f4/regex-2026.1.15-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:c08c1f3e34338256732bd6938747daa3c0d5b251e04b6e43b5813e94d503076e", size = 783051, upload-time = "2026-01-14T23:14:29.929Z" },
+ { url = "https://files.pythonhosted.org/packages/1e/f4/6ed03e71dca6348a5188363a34f5e26ffd5db1404780288ff0d79513bce4/regex-2026.1.15-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:e43a55f378df1e7a4fa3547c88d9a5a9b7113f653a66821bcea4718fe6c58763", size = 854485, upload-time = "2026-01-14T23:14:31.366Z" },
+ { url = "https://files.pythonhosted.org/packages/d9/9a/8e8560bd78caded8eb137e3e47612430a05b9a772caf60876435192d670a/regex-2026.1.15-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:f82110ab962a541737bd0ce87978d4c658f06e7591ba899192e2712a517badbb", size = 762195, upload-time = "2026-01-14T23:14:32.802Z" },
+ { url = "https://files.pythonhosted.org/packages/38/6b/61fc710f9aa8dfcd764fe27d37edfaa023b1a23305a0d84fccd5adb346ea/regex-2026.1.15-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:27618391db7bdaf87ac6c92b31e8f0dfb83a9de0075855152b720140bda177a2", size = 845986, upload-time = "2026-01-14T23:14:34.898Z" },
+ { url = "https://files.pythonhosted.org/packages/fd/2e/fbee4cb93f9d686901a7ca8d94285b80405e8c34fe4107f63ffcbfb56379/regex-2026.1.15-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:bfb0d6be01fbae8d6655c8ca21b3b72458606c4aec9bbc932db758d47aba6db1", size = 788992, upload-time = "2026-01-14T23:14:37.116Z" },
+ { url = "https://files.pythonhosted.org/packages/ed/14/3076348f3f586de64b1ab75a3fbabdaab7684af7f308ad43be7ef1849e55/regex-2026.1.15-cp311-cp311-win32.whl", hash = "sha256:b10e42a6de0e32559a92f2f8dc908478cc0fa02838d7dbe764c44dca3fa13569", size = 265893, upload-time = "2026-01-14T23:14:38.426Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/19/772cf8b5fc803f5c89ba85d8b1870a1ca580dc482aa030383a9289c82e44/regex-2026.1.15-cp311-cp311-win_amd64.whl", hash = "sha256:e9bf3f0bbdb56633c07d7116ae60a576f846efdd86a8848f8d62b749e1209ca7", size = 277840, upload-time = "2026-01-14T23:14:39.785Z" },
+ { url = "https://files.pythonhosted.org/packages/78/84/d05f61142709474da3c0853222d91086d3e1372bcdab516c6fd8d80f3297/regex-2026.1.15-cp311-cp311-win_arm64.whl", hash = "sha256:41aef6f953283291c4e4e6850607bd71502be67779586a61472beacb315c97ec", size = 270374, upload-time = "2026-01-14T23:14:41.592Z" },
+ { url = "https://files.pythonhosted.org/packages/92/81/10d8cf43c807d0326efe874c1b79f22bfb0fb226027b0b19ebc26d301408/regex-2026.1.15-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:4c8fcc5793dde01641a35905d6731ee1548f02b956815f8f1cab89e515a5bdf1", size = 489398, upload-time = "2026-01-14T23:14:43.741Z" },
+ { url = "https://files.pythonhosted.org/packages/90/b0/7c2a74e74ef2a7c32de724658a69a862880e3e4155cba992ba04d1c70400/regex-2026.1.15-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:bfd876041a956e6a90ad7cdb3f6a630c07d491280bfeed4544053cd434901681", size = 291339, upload-time = "2026-01-14T23:14:45.183Z" },
+ { url = "https://files.pythonhosted.org/packages/19/4d/16d0773d0c818417f4cc20aa0da90064b966d22cd62a8c46765b5bd2d643/regex-2026.1.15-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:9250d087bc92b7d4899ccd5539a1b2334e44eee85d848c4c1aef8e221d3f8c8f", size = 289003, upload-time = "2026-01-14T23:14:47.25Z" },
+ { url = "https://files.pythonhosted.org/packages/c6/e4/1fc4599450c9f0863d9406e944592d968b8d6dfd0d552a7d569e43bceada/regex-2026.1.15-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c8a154cf6537ebbc110e24dabe53095e714245c272da9c1be05734bdad4a61aa", size = 798656, upload-time = "2026-01-14T23:14:48.77Z" },
+ { url = "https://files.pythonhosted.org/packages/b2/e6/59650d73a73fa8a60b3a590545bfcf1172b4384a7df2e7fe7b9aab4e2da9/regex-2026.1.15-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:8050ba2e3ea1d8731a549e83c18d2f0999fbc99a5f6bd06b4c91449f55291804", size = 864252, upload-time = "2026-01-14T23:14:50.528Z" },
+ { url = "https://files.pythonhosted.org/packages/6e/ab/1d0f4d50a1638849a97d731364c9a80fa304fec46325e48330c170ee8e80/regex-2026.1.15-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:0bf065240704cb8951cc04972cf107063917022511273e0969bdb34fc173456c", size = 912268, upload-time = "2026-01-14T23:14:52.952Z" },
+ { url = "https://files.pythonhosted.org/packages/dd/df/0d722c030c82faa1d331d1921ee268a4e8fb55ca8b9042c9341c352f17fa/regex-2026.1.15-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c32bef3e7aeee75746748643667668ef941d28b003bfc89994ecf09a10f7a1b5", size = 803589, upload-time = "2026-01-14T23:14:55.182Z" },
+ { url = "https://files.pythonhosted.org/packages/66/23/33289beba7ccb8b805c6610a8913d0131f834928afc555b241caabd422a9/regex-2026.1.15-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:d5eaa4a4c5b1906bd0d2508d68927f15b81821f85092e06f1a34a4254b0e1af3", size = 775700, upload-time = "2026-01-14T23:14:56.707Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/65/bf3a42fa6897a0d3afa81acb25c42f4b71c274f698ceabd75523259f6688/regex-2026.1.15-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:86c1077a3cc60d453d4084d5b9649065f3bf1184e22992bd322e1f081d3117fb", size = 787928, upload-time = "2026-01-14T23:14:58.312Z" },
+ { url = "https://files.pythonhosted.org/packages/f4/f5/13bf65864fc314f68cdd6d8ca94adcab064d4d39dbd0b10fef29a9da48fc/regex-2026.1.15-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:2b091aefc05c78d286657cd4db95f2e6313375ff65dcf085e42e4c04d9c8d410", size = 858607, upload-time = "2026-01-14T23:15:00.657Z" },
+ { url = "https://files.pythonhosted.org/packages/a3/31/040e589834d7a439ee43fb0e1e902bc81bd58a5ba81acffe586bb3321d35/regex-2026.1.15-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:57e7d17f59f9ebfa9667e6e5a1c0127b96b87cb9cede8335482451ed00788ba4", size = 763729, upload-time = "2026-01-14T23:15:02.248Z" },
+ { url = "https://files.pythonhosted.org/packages/9b/84/6921e8129687a427edf25a34a5594b588b6d88f491320b9de5b6339a4fcb/regex-2026.1.15-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:c6c4dcdfff2c08509faa15d36ba7e5ef5fcfab25f1e8f85a0c8f45bc3a30725d", size = 850697, upload-time = "2026-01-14T23:15:03.878Z" },
+ { url = "https://files.pythonhosted.org/packages/8a/87/3d06143d4b128f4229158f2de5de6c8f2485170c7221e61bf381313314b2/regex-2026.1.15-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:cf8ff04c642716a7f2048713ddc6278c5fd41faa3b9cab12607c7abecd012c22", size = 789849, upload-time = "2026-01-14T23:15:06.102Z" },
+ { url = "https://files.pythonhosted.org/packages/77/69/c50a63842b6bd48850ebc7ab22d46e7a2a32d824ad6c605b218441814639/regex-2026.1.15-cp312-cp312-win32.whl", hash = "sha256:82345326b1d8d56afbe41d881fdf62f1926d7264b2fc1537f99ae5da9aad7913", size = 266279, upload-time = "2026-01-14T23:15:07.678Z" },
+ { url = "https://files.pythonhosted.org/packages/f2/36/39d0b29d087e2b11fd8191e15e81cce1b635fcc845297c67f11d0d19274d/regex-2026.1.15-cp312-cp312-win_amd64.whl", hash = "sha256:4def140aa6156bc64ee9912383d4038f3fdd18fee03a6f222abd4de6357ce42a", size = 277166, upload-time = "2026-01-14T23:15:09.257Z" },
+ { url = "https://files.pythonhosted.org/packages/28/32/5b8e476a12262748851fa8ab1b0be540360692325975b094e594dfebbb52/regex-2026.1.15-cp312-cp312-win_arm64.whl", hash = "sha256:c6c565d9a6e1a8d783c1948937ffc377dd5771e83bd56de8317c450a954d2056", size = 270415, upload-time = "2026-01-14T23:15:10.743Z" },
+]
+
+[[package]]
+name = "requests"
+version = "2.32.5"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "certifi" },
+ { name = "charset-normalizer" },
+ { name = "idna" },
+ { name = "urllib3" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/c9/74/b3ff8e6c8446842c3f5c837e9c3dfcfe2018ea6ecef224c710c85ef728f4/requests-2.32.5.tar.gz", hash = "sha256:dbba0bac56e100853db0ea71b82b4dfd5fe2bf6d3754a8893c3af500cec7d7cf", size = 134517, upload-time = "2025-08-18T20:46:02.573Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1e/db/4254e3eabe8020b458f1a747140d32277ec7a271daf1d235b70dc0b4e6e3/requests-2.32.5-py3-none-any.whl", hash = "sha256:2462f94637a34fd532264295e186976db0f5d453d1cdd31473c85a6a161affb6", size = 64738, upload-time = "2025-08-18T20:46:00.542Z" },
+]
+
+[[package]]
+name = "rfc3339-validator"
+version = "0.1.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "six" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/28/ea/a9387748e2d111c3c2b275ba970b735e04e15cdb1eb30693b6b5708c4dbd/rfc3339_validator-0.1.4.tar.gz", hash = "sha256:138a2abdf93304ad60530167e51d2dfb9549521a836871b88d7f4695d0022f6b", size = 5513, upload-time = "2021-05-12T16:37:54.178Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/7b/44/4e421b96b67b2daff264473f7465db72fbdf36a07e05494f50300cc7b0c6/rfc3339_validator-0.1.4-py2.py3-none-any.whl", hash = "sha256:24f6ec1eda14ef823da9e36ec7113124b39c04d50a4d3d3a3c2859577e7791fa", size = 3490, upload-time = "2021-05-12T16:37:52.536Z" },
+]
+
+[[package]]
+name = "rfc3986-validator"
+version = "0.1.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/da/88/f270de456dd7d11dcc808abfa291ecdd3f45ff44e3b549ffa01b126464d0/rfc3986_validator-0.1.1.tar.gz", hash = "sha256:3d44bde7921b3b9ec3ae4e3adca370438eccebc676456449b145d533b240d055", size = 6760, upload-time = "2019-10-28T16:00:19.144Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/9e/51/17023c0f8f1869d8806b979a2bffa3f861f26a3f1a66b094288323fba52f/rfc3986_validator-0.1.1-py2.py3-none-any.whl", hash = "sha256:2f235c432ef459970b4306369336b9d5dbdda31b510ca1e327636e01f528bfa9", size = 4242, upload-time = "2019-10-28T16:00:13.976Z" },
+]
+
+[[package]]
+name = "rfc3987-syntax"
+version = "1.1.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "lark" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/2c/06/37c1a5557acf449e8e406a830a05bf885ac47d33270aec454ef78675008d/rfc3987_syntax-1.1.0.tar.gz", hash = "sha256:717a62cbf33cffdd16dfa3a497d81ce48a660ea691b1ddd7be710c22f00b4a0d", size = 14239, upload-time = "2025-07-18T01:05:05.015Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/7e/71/44ce230e1b7fadd372515a97e32a83011f906ddded8d03e3c6aafbdedbb7/rfc3987_syntax-1.1.0-py3-none-any.whl", hash = "sha256:6c3d97604e4c5ce9f714898e05401a0445a641cfa276432b0a648c80856f6a3f", size = 8046, upload-time = "2025-07-18T01:05:03.843Z" },
+]
+
+[[package]]
+name = "rpds-py"
+version = "0.30.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/20/af/3f2f423103f1113b36230496629986e0ef7e199d2aa8392452b484b38ced/rpds_py-0.30.0.tar.gz", hash = "sha256:dd8ff7cf90014af0c0f787eea34794ebf6415242ee1d6fa91eaba725cc441e84", size = 69469, upload-time = "2025-11-30T20:24:38.837Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/06/0c/0c411a0ec64ccb6d104dcabe0e713e05e153a9a2c3c2bd2b32ce412166fe/rpds_py-0.30.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:679ae98e00c0e8d68a7fda324e16b90fd5260945b45d3b824c892cec9eea3288", size = 370490, upload-time = "2025-11-30T20:21:33.256Z" },
+ { url = "https://files.pythonhosted.org/packages/19/6a/4ba3d0fb7297ebae71171822554abe48d7cab29c28b8f9f2c04b79988c05/rpds_py-0.30.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:4cc2206b76b4f576934f0ed374b10d7ca5f457858b157ca52064bdfc26b9fc00", size = 359751, upload-time = "2025-11-30T20:21:34.591Z" },
+ { url = "https://files.pythonhosted.org/packages/cd/7c/e4933565ef7f7a0818985d87c15d9d273f1a649afa6a52ea35ad011195ea/rpds_py-0.30.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:389a2d49eded1896c3d48b0136ead37c48e221b391c052fba3f4055c367f60a6", size = 389696, upload-time = "2025-11-30T20:21:36.122Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/01/6271a2511ad0815f00f7ed4390cf2567bec1d4b1da39e2c27a41e6e3b4de/rpds_py-0.30.0-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:32c8528634e1bf7121f3de08fa85b138f4e0dc47657866630611b03967f041d7", size = 403136, upload-time = "2025-11-30T20:21:37.728Z" },
+ { url = "https://files.pythonhosted.org/packages/55/64/c857eb7cd7541e9b4eee9d49c196e833128a55b89a9850a9c9ac33ccf897/rpds_py-0.30.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f207f69853edd6f6700b86efb84999651baf3789e78a466431df1331608e5324", size = 524699, upload-time = "2025-11-30T20:21:38.92Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/ed/94816543404078af9ab26159c44f9e98e20fe47e2126d5d32c9d9948d10a/rpds_py-0.30.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:67b02ec25ba7a9e8fa74c63b6ca44cf5707f2fbfadae3ee8e7494297d56aa9df", size = 412022, upload-time = "2025-11-30T20:21:40.407Z" },
+ { url = "https://files.pythonhosted.org/packages/61/b5/707f6cf0066a6412aacc11d17920ea2e19e5b2f04081c64526eb35b5c6e7/rpds_py-0.30.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0c0e95f6819a19965ff420f65578bacb0b00f251fefe2c8b23347c37174271f3", size = 390522, upload-time = "2025-11-30T20:21:42.17Z" },
+ { url = "https://files.pythonhosted.org/packages/13/4e/57a85fda37a229ff4226f8cbcf09f2a455d1ed20e802ce5b2b4a7f5ed053/rpds_py-0.30.0-cp310-cp310-manylinux_2_31_riscv64.whl", hash = "sha256:a452763cc5198f2f98898eb98f7569649fe5da666c2dc6b5ddb10fde5a574221", size = 404579, upload-time = "2025-11-30T20:21:43.769Z" },
+ { url = "https://files.pythonhosted.org/packages/f9/da/c9339293513ec680a721e0e16bf2bac3db6e5d7e922488de471308349bba/rpds_py-0.30.0-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:e0b65193a413ccc930671c55153a03ee57cecb49e6227204b04fae512eb657a7", size = 421305, upload-time = "2025-11-30T20:21:44.994Z" },
+ { url = "https://files.pythonhosted.org/packages/f9/be/522cb84751114f4ad9d822ff5a1aa3c98006341895d5f084779b99596e5c/rpds_py-0.30.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:858738e9c32147f78b3ac24dc0edb6610000e56dc0f700fd5f651d0a0f0eb9ff", size = 572503, upload-time = "2025-11-30T20:21:46.91Z" },
+ { url = "https://files.pythonhosted.org/packages/a2/9b/de879f7e7ceddc973ea6e4629e9b380213a6938a249e94b0cdbcc325bb66/rpds_py-0.30.0-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:da279aa314f00acbb803da1e76fa18666778e8a8f83484fba94526da5de2cba7", size = 598322, upload-time = "2025-11-30T20:21:48.709Z" },
+ { url = "https://files.pythonhosted.org/packages/48/ac/f01fc22efec3f37d8a914fc1b2fb9bcafd56a299edbe96406f3053edea5a/rpds_py-0.30.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:7c64d38fb49b6cdeda16ab49e35fe0da2e1e9b34bc38bd78386530f218b37139", size = 560792, upload-time = "2025-11-30T20:21:50.024Z" },
+ { url = "https://files.pythonhosted.org/packages/e2/da/4e2b19d0f131f35b6146425f846563d0ce036763e38913d917187307a671/rpds_py-0.30.0-cp310-cp310-win32.whl", hash = "sha256:6de2a32a1665b93233cde140ff8b3467bdb9e2af2b91079f0333a0974d12d464", size = 221901, upload-time = "2025-11-30T20:21:51.32Z" },
+ { url = "https://files.pythonhosted.org/packages/96/cb/156d7a5cf4f78a7cc571465d8aec7a3c447c94f6749c5123f08438bcf7bc/rpds_py-0.30.0-cp310-cp310-win_amd64.whl", hash = "sha256:1726859cd0de969f88dc8673bdd954185b9104e05806be64bcd87badbe313169", size = 235823, upload-time = "2025-11-30T20:21:52.505Z" },
+ { url = "https://files.pythonhosted.org/packages/4d/6e/f964e88b3d2abee2a82c1ac8366da848fce1c6d834dc2132c3fda3970290/rpds_py-0.30.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:a2bffea6a4ca9f01b3f8e548302470306689684e61602aa3d141e34da06cf425", size = 370157, upload-time = "2025-11-30T20:21:53.789Z" },
+ { url = "https://files.pythonhosted.org/packages/94/ba/24e5ebb7c1c82e74c4e4f33b2112a5573ddc703915b13a073737b59b86e0/rpds_py-0.30.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:dc4f992dfe1e2bc3ebc7444f6c7051b4bc13cd8e33e43511e8ffd13bf407010d", size = 359676, upload-time = "2025-11-30T20:21:55.475Z" },
+ { url = "https://files.pythonhosted.org/packages/84/86/04dbba1b087227747d64d80c3b74df946b986c57af0a9f0c98726d4d7a3b/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:422c3cb9856d80b09d30d2eb255d0754b23e090034e1deb4083f8004bd0761e4", size = 389938, upload-time = "2025-11-30T20:21:57.079Z" },
+ { url = "https://files.pythonhosted.org/packages/42/bb/1463f0b1722b7f45431bdd468301991d1328b16cffe0b1c2918eba2c4eee/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:07ae8a593e1c3c6b82ca3292efbe73c30b61332fd612e05abee07c79359f292f", size = 402932, upload-time = "2025-11-30T20:21:58.47Z" },
+ { url = "https://files.pythonhosted.org/packages/99/ee/2520700a5c1f2d76631f948b0736cdf9b0acb25abd0ca8e889b5c62ac2e3/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:12f90dd7557b6bd57f40abe7747e81e0c0b119bef015ea7726e69fe550e394a4", size = 525830, upload-time = "2025-11-30T20:21:59.699Z" },
+ { url = "https://files.pythonhosted.org/packages/e0/ad/bd0331f740f5705cc555a5e17fdf334671262160270962e69a2bdef3bf76/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:99b47d6ad9a6da00bec6aabe5a6279ecd3c06a329d4aa4771034a21e335c3a97", size = 412033, upload-time = "2025-11-30T20:22:00.991Z" },
+ { url = "https://files.pythonhosted.org/packages/f8/1e/372195d326549bb51f0ba0f2ecb9874579906b97e08880e7a65c3bef1a99/rpds_py-0.30.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:33f559f3104504506a44bb666b93a33f5d33133765b0c216a5bf2f1e1503af89", size = 390828, upload-time = "2025-11-30T20:22:02.723Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/2b/d88bb33294e3e0c76bc8f351a3721212713629ffca1700fa94979cb3eae8/rpds_py-0.30.0-cp311-cp311-manylinux_2_31_riscv64.whl", hash = "sha256:946fe926af6e44f3697abbc305ea168c2c31d3e3ef1058cf68f379bf0335a78d", size = 404683, upload-time = "2025-11-30T20:22:04.367Z" },
+ { url = "https://files.pythonhosted.org/packages/50/32/c759a8d42bcb5289c1fac697cd92f6fe01a018dd937e62ae77e0e7f15702/rpds_py-0.30.0-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:495aeca4b93d465efde585977365187149e75383ad2684f81519f504f5c13038", size = 421583, upload-time = "2025-11-30T20:22:05.814Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/81/e729761dbd55ddf5d84ec4ff1f47857f4374b0f19bdabfcf929164da3e24/rpds_py-0.30.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d9a0ca5da0386dee0655b4ccdf46119df60e0f10da268d04fe7cc87886872ba7", size = 572496, upload-time = "2025-11-30T20:22:07.713Z" },
+ { url = "https://files.pythonhosted.org/packages/14/f6/69066a924c3557c9c30baa6ec3a0aa07526305684c6f86c696b08860726c/rpds_py-0.30.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:8d6d1cc13664ec13c1b84241204ff3b12f9bb82464b8ad6e7a5d3486975c2eed", size = 598669, upload-time = "2025-11-30T20:22:09.312Z" },
+ { url = "https://files.pythonhosted.org/packages/5f/48/905896b1eb8a05630d20333d1d8ffd162394127b74ce0b0784ae04498d32/rpds_py-0.30.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:3896fa1be39912cf0757753826bc8bdc8ca331a28a7c4ae46b7a21280b06bb85", size = 561011, upload-time = "2025-11-30T20:22:11.309Z" },
+ { url = "https://files.pythonhosted.org/packages/22/16/cd3027c7e279d22e5eb431dd3c0fbc677bed58797fe7581e148f3f68818b/rpds_py-0.30.0-cp311-cp311-win32.whl", hash = "sha256:55f66022632205940f1827effeff17c4fa7ae1953d2b74a8581baaefb7d16f8c", size = 221406, upload-time = "2025-11-30T20:22:13.101Z" },
+ { url = "https://files.pythonhosted.org/packages/fa/5b/e7b7aa136f28462b344e652ee010d4de26ee9fd16f1bfd5811f5153ccf89/rpds_py-0.30.0-cp311-cp311-win_amd64.whl", hash = "sha256:a51033ff701fca756439d641c0ad09a41d9242fa69121c7d8769604a0a629825", size = 236024, upload-time = "2025-11-30T20:22:14.853Z" },
+ { url = "https://files.pythonhosted.org/packages/14/a6/364bba985e4c13658edb156640608f2c9e1d3ea3c81b27aa9d889fff0e31/rpds_py-0.30.0-cp311-cp311-win_arm64.whl", hash = "sha256:47b0ef6231c58f506ef0b74d44e330405caa8428e770fec25329ed2cb971a229", size = 229069, upload-time = "2025-11-30T20:22:16.577Z" },
+ { url = "https://files.pythonhosted.org/packages/03/e7/98a2f4ac921d82f33e03f3835f5bf3a4a40aa1bfdc57975e74a97b2b4bdd/rpds_py-0.30.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:a161f20d9a43006833cd7068375a94d035714d73a172b681d8881820600abfad", size = 375086, upload-time = "2025-11-30T20:22:17.93Z" },
+ { url = "https://files.pythonhosted.org/packages/4d/a1/bca7fd3d452b272e13335db8d6b0b3ecde0f90ad6f16f3328c6fb150c889/rpds_py-0.30.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:6abc8880d9d036ecaafe709079969f56e876fcf107f7a8e9920ba6d5a3878d05", size = 359053, upload-time = "2025-11-30T20:22:19.297Z" },
+ { url = "https://files.pythonhosted.org/packages/65/1c/ae157e83a6357eceff62ba7e52113e3ec4834a84cfe07fa4b0757a7d105f/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ca28829ae5f5d569bb62a79512c842a03a12576375d5ece7d2cadf8abe96ec28", size = 390763, upload-time = "2025-11-30T20:22:21.661Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/36/eb2eb8515e2ad24c0bd43c3ee9cd74c33f7ca6430755ccdb240fd3144c44/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a1010ed9524c73b94d15919ca4d41d8780980e1765babf85f9a2f90d247153dd", size = 408951, upload-time = "2025-11-30T20:22:23.408Z" },
+ { url = "https://files.pythonhosted.org/packages/d6/65/ad8dc1784a331fabbd740ef6f71ce2198c7ed0890dab595adb9ea2d775a1/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f8d1736cfb49381ba528cd5baa46f82fdc65c06e843dab24dd70b63d09121b3f", size = 514622, upload-time = "2025-11-30T20:22:25.16Z" },
+ { url = "https://files.pythonhosted.org/packages/63/8e/0cfa7ae158e15e143fe03993b5bcd743a59f541f5952e1546b1ac1b5fd45/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d948b135c4693daff7bc2dcfc4ec57237a29bd37e60c2fabf5aff2bbacf3e2f1", size = 414492, upload-time = "2025-11-30T20:22:26.505Z" },
+ { url = "https://files.pythonhosted.org/packages/60/1b/6f8f29f3f995c7ffdde46a626ddccd7c63aefc0efae881dc13b6e5d5bb16/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:47f236970bccb2233267d89173d3ad2703cd36a0e2a6e92d0560d333871a3d23", size = 394080, upload-time = "2025-11-30T20:22:27.934Z" },
+ { url = "https://files.pythonhosted.org/packages/6d/d5/a266341051a7a3ca2f4b750a3aa4abc986378431fc2da508c5034d081b70/rpds_py-0.30.0-cp312-cp312-manylinux_2_31_riscv64.whl", hash = "sha256:2e6ecb5a5bcacf59c3f912155044479af1d0b6681280048b338b28e364aca1f6", size = 408680, upload-time = "2025-11-30T20:22:29.341Z" },
+ { url = "https://files.pythonhosted.org/packages/10/3b/71b725851df9ab7a7a4e33cf36d241933da66040d195a84781f49c50490c/rpds_py-0.30.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:a8fa71a2e078c527c3e9dc9fc5a98c9db40bcc8a92b4e8858e36d329f8684b51", size = 423589, upload-time = "2025-11-30T20:22:31.469Z" },
+ { url = "https://files.pythonhosted.org/packages/00/2b/e59e58c544dc9bd8bd8384ecdb8ea91f6727f0e37a7131baeff8d6f51661/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:73c67f2db7bc334e518d097c6d1e6fed021bbc9b7d678d6cc433478365d1d5f5", size = 573289, upload-time = "2025-11-30T20:22:32.997Z" },
+ { url = "https://files.pythonhosted.org/packages/da/3e/a18e6f5b460893172a7d6a680e86d3b6bc87a54c1f0b03446a3c8c7b588f/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:5ba103fb455be00f3b1c2076c9d4264bfcb037c976167a6047ed82f23153f02e", size = 599737, upload-time = "2025-11-30T20:22:34.419Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/e2/714694e4b87b85a18e2c243614974413c60aa107fd815b8cbc42b873d1d7/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:7cee9c752c0364588353e627da8a7e808a66873672bcb5f52890c33fd965b394", size = 563120, upload-time = "2025-11-30T20:22:35.903Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/ab/d5d5e3bcedb0a77f4f613706b750e50a5a3ba1c15ccd3665ecc636c968fd/rpds_py-0.30.0-cp312-cp312-win32.whl", hash = "sha256:1ab5b83dbcf55acc8b08fc62b796ef672c457b17dbd7820a11d6c52c06839bdf", size = 223782, upload-time = "2025-11-30T20:22:37.271Z" },
+ { url = "https://files.pythonhosted.org/packages/39/3b/f786af9957306fdc38a74cef405b7b93180f481fb48453a114bb6465744a/rpds_py-0.30.0-cp312-cp312-win_amd64.whl", hash = "sha256:a090322ca841abd453d43456ac34db46e8b05fd9b3b4ac0c78bcde8b089f959b", size = 240463, upload-time = "2025-11-30T20:22:39.021Z" },
+ { url = "https://files.pythonhosted.org/packages/f3/d2/b91dc748126c1559042cfe41990deb92c4ee3e2b415f6b5234969ffaf0cc/rpds_py-0.30.0-cp312-cp312-win_arm64.whl", hash = "sha256:669b1805bd639dd2989b281be2cfd951c6121b65e729d9b843e9639ef1fd555e", size = 230868, upload-time = "2025-11-30T20:22:40.493Z" },
+ { url = "https://files.pythonhosted.org/packages/69/71/3f34339ee70521864411f8b6992e7ab13ac30d8e4e3309e07c7361767d91/rpds_py-0.30.0-pp311-pypy311_pp73-macosx_10_12_x86_64.whl", hash = "sha256:c2262bdba0ad4fc6fb5545660673925c2d2a5d9e2e0fb603aad545427be0fc58", size = 372292, upload-time = "2025-11-30T20:24:16.537Z" },
+ { url = "https://files.pythonhosted.org/packages/57/09/f183df9b8f2d66720d2ef71075c59f7e1b336bec7ee4c48f0a2b06857653/rpds_py-0.30.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:ee6af14263f25eedc3bb918a3c04245106a42dfd4f5c2285ea6f997b1fc3f89a", size = 362128, upload-time = "2025-11-30T20:24:18.086Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/68/5c2594e937253457342e078f0cc1ded3dd7b2ad59afdbf2d354869110a02/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3adbb8179ce342d235c31ab8ec511e66c73faa27a47e076ccc92421add53e2bb", size = 391542, upload-time = "2025-11-30T20:24:20.092Z" },
+ { url = "https://files.pythonhosted.org/packages/49/5c/31ef1afd70b4b4fbdb2800249f34c57c64beb687495b10aec0365f53dfc4/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:250fa00e9543ac9b97ac258bd37367ff5256666122c2d0f2bc97577c60a1818c", size = 404004, upload-time = "2025-11-30T20:24:22.231Z" },
+ { url = "https://files.pythonhosted.org/packages/e3/63/0cfbea38d05756f3440ce6534d51a491d26176ac045e2707adc99bb6e60a/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9854cf4f488b3d57b9aaeb105f06d78e5529d3145b1e4a41750167e8c213c6d3", size = 527063, upload-time = "2025-11-30T20:24:24.302Z" },
+ { url = "https://files.pythonhosted.org/packages/42/e6/01e1f72a2456678b0f618fc9a1a13f882061690893c192fcad9f2926553a/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:993914b8e560023bc0a8bf742c5f303551992dcb85e247b1e5c7f4a7d145bda5", size = 413099, upload-time = "2025-11-30T20:24:25.916Z" },
+ { url = "https://files.pythonhosted.org/packages/b8/25/8df56677f209003dcbb180765520c544525e3ef21ea72279c98b9aa7c7fb/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:58edca431fb9b29950807e301826586e5bbf24163677732429770a697ffe6738", size = 392177, upload-time = "2025-11-30T20:24:27.834Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/b4/0a771378c5f16f8115f796d1f437950158679bcd2a7c68cf251cfb00ed5b/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_31_riscv64.whl", hash = "sha256:dea5b552272a944763b34394d04577cf0f9bd013207bc32323b5a89a53cf9c2f", size = 406015, upload-time = "2025-11-30T20:24:29.457Z" },
+ { url = "https://files.pythonhosted.org/packages/36/d8/456dbba0af75049dc6f63ff295a2f92766b9d521fa00de67a2bd6427d57a/rpds_py-0.30.0-pp311-pypy311_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:ba3af48635eb83d03f6c9735dfb21785303e73d22ad03d489e88adae6eab8877", size = 423736, upload-time = "2025-11-30T20:24:31.22Z" },
+ { url = "https://files.pythonhosted.org/packages/13/64/b4d76f227d5c45a7e0b796c674fd81b0a6c4fbd48dc29271857d8219571c/rpds_py-0.30.0-pp311-pypy311_pp73-musllinux_1_2_aarch64.whl", hash = "sha256:dff13836529b921e22f15cb099751209a60009731a68519630a24d61f0b1b30a", size = 573981, upload-time = "2025-11-30T20:24:32.934Z" },
+ { url = "https://files.pythonhosted.org/packages/20/91/092bacadeda3edf92bf743cc96a7be133e13a39cdbfd7b5082e7ab638406/rpds_py-0.30.0-pp311-pypy311_pp73-musllinux_1_2_i686.whl", hash = "sha256:1b151685b23929ab7beec71080a8889d4d6d9fa9a983d213f07121205d48e2c4", size = 599782, upload-time = "2025-11-30T20:24:35.169Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/b7/b95708304cd49b7b6f82fdd039f1748b66ec2b21d6a45180910802f1abf1/rpds_py-0.30.0-pp311-pypy311_pp73-musllinux_1_2_x86_64.whl", hash = "sha256:ac37f9f516c51e5753f27dfdef11a88330f04de2d564be3991384b2f3535d02e", size = 562191, upload-time = "2025-11-30T20:24:36.853Z" },
+]
+
+[[package]]
+name = "ruff-api"
+version = "0.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/06/4b/3959bc9feb67876d70cb252f4b832510d2604fb86e75888a904e3aa86c97/ruff_api-0.1.0.tar.gz", hash = "sha256:6d9a9ebc54b159bb37b6e6980e89493b8f34a4347f9e45c7a937420bc465ceac", size = 30963, upload-time = "2024-10-25T04:51:41.586Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/46/11/88eccc1b2c400482738d291b37ca6165c50306a49aed65841f497386ec03/ruff_api-0.1.0-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:607541be57a2e430ac9bd5bfbab39a2e33515947f2e40f3f1ca8aff6964d04b5", size = 4737011, upload-time = "2024-10-25T04:50:44.542Z" },
+ { url = "https://files.pythonhosted.org/packages/3b/76/8d6795062b15aac71b3df8e31a51ab16f17e6c3b03dfcf95453d8d65d91f/ruff_api-0.1.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:7e1ae962e331477e0c4b6c0ce7adc11081d911223ce1c25c48df231741c58699", size = 4534195, upload-time = "2024-10-25T04:50:46.611Z" },
+ { url = "https://files.pythonhosted.org/packages/8a/e5/9b28d315772034aa7a9058bbcf7f4099182f0a4d3c343c1e74c14e40f1d0/ruff_api-0.1.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3ec868721932df4fd1e7bcd90c5302e59001234a3e2c03ed4f0eaf938cf5b798", size = 5019248, upload-time = "2024-10-25T04:50:48.652Z" },
+ { url = "https://files.pythonhosted.org/packages/bf/f5/258896061aaba8e5f5d2feb03fd5f0f0b73fb369f3a8ab85da5a2ba3302e/ruff_api-0.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:29c70028542b4008557e31bcfea9494da1ecf19e99656692b34ebcaae9d98f43", size = 5233762, upload-time = "2024-10-25T04:50:51.32Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/09/e6ef93c3606807b1207318c3f7726cc7905f6253e9abc34cea32c2e90d8b/ruff_api-0.1.0-cp310-none-win_amd64.whl", hash = "sha256:5b9e1f00cb46edbdbe6f96e685ac4c0f95e202fae6950066c936e34dbddf2290", size = 4481629, upload-time = "2024-10-25T04:50:52.837Z" },
+ { url = "https://files.pythonhosted.org/packages/22/a0/e674c2153ddd6861b11cf54d9be8fb536d0acbcb6324d7376f7a356be80c/ruff_api-0.1.0-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:5a336489eca057bf4c2ef170a6848a4d8f6cbba40c583fad2708f0fc2626af5d", size = 4737030, upload-time = "2024-10-25T04:50:54.738Z" },
+ { url = "https://files.pythonhosted.org/packages/30/db/53036d387a9040244a270a475125293309b5250cce4dbca6d9634311e811/ruff_api-0.1.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:e6d122093ef95d58ebf9ad7ed287c63534eb9c3110f1ea10a1e8bf0cb75824bb", size = 4534484, upload-time = "2024-10-25T04:50:56.411Z" },
+ { url = "https://files.pythonhosted.org/packages/17/ce/53c501b232a946a6d48b655b4e4ceb1aeeb09e5e020252f2d67b35bf327a/ruff_api-0.1.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:647bc419cc56aa4ddc11cd145c9048f9d97527b5dce4e31d06ffef6494a39c65", size = 5019460, upload-time = "2024-10-25T04:50:58.309Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/eb/a545e2c08c65becb8f5f562933208c36998d78dbd403952aca5ddf32829e/ruff_api-0.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f22137f4547ee0644b235b4ba9516a1cd549755fc22ce161c10abc516decefa9", size = 5233883, upload-time = "2024-10-25T04:51:00.267Z" },
+ { url = "https://files.pythonhosted.org/packages/eb/a2/11dbd2b4757061faaa6472f9ea49b888f89c03931d84527da594fdcff17b/ruff_api-0.1.0-cp311-none-win_amd64.whl", hash = "sha256:091fdd14bb256427b2eb72b719cf3df1dad58446ce08053238da1d3e0a64e8bd", size = 4481633, upload-time = "2024-10-25T04:51:02.105Z" },
+ { url = "https://files.pythonhosted.org/packages/bb/ef/dda4aba0c637ea4f108498a4b625fe4dde6e42586be17821bd900c43be44/ruff_api-0.1.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:6a3826d8b88ccd9e3a307de204303d416549c01600e0d0a6d7e083f06ea7626a", size = 4734446, upload-time = "2024-10-25T04:51:04.302Z" },
+ { url = "https://files.pythonhosted.org/packages/61/47/4b3ebbb24bed476136bca34ac51451ed9e5c90d36e96289193f353121ab7/ruff_api-0.1.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:9eeb2183e37d14ef5c24c0413efcc4f8d646c647d42bcde6722a139f5d313ceb", size = 4532360, upload-time = "2024-10-25T04:51:05.898Z" },
+ { url = "https://files.pythonhosted.org/packages/47/91/e04c66fc6b02a0c2cea519f9d417bbf8feea67f1f438ffc2ce5c99fcb4ea/ruff_api-0.1.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a144747e3583cf476f1509f2733969fff3dbf73b08725aa4fdf7a385f3d2238b", size = 5019318, upload-time = "2024-10-25T04:51:07.929Z" },
+ { url = "https://files.pythonhosted.org/packages/19/5d/fa705a70d1a338751df15a1a7b30efab8ac86e89207b4977ff065ae6c01b/ruff_api-0.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4412c2e9fccefe33ff5f8c99749995e9d56a324d2d107c805c8a1d9e5c9e229a", size = 5233200, upload-time = "2024-10-25T04:51:10.266Z" },
+ { url = "https://files.pythonhosted.org/packages/24/2a/22560db538635f98046e0d338ead76d66c5019250060eb8964156c9eef01/ruff_api-0.1.0-cp312-none-win_amd64.whl", hash = "sha256:647456e1d24adf2809b120fb299e2908a59da789462c30eefd917ac8251e7643", size = 4481760, upload-time = "2024-10-25T04:51:12.284Z" },
+]
+
+[[package]]
+name = "safetensors"
+version = "0.7.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/29/9c/6e74567782559a63bd040a236edca26fd71bc7ba88de2ef35d75df3bca5e/safetensors-0.7.0.tar.gz", hash = "sha256:07663963b67e8bd9f0b8ad15bb9163606cd27cc5a1b96235a50d8369803b96b0", size = 200878, upload-time = "2025-11-19T15:18:43.199Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/fa/47/aef6c06649039accf914afef490268e1067ed82be62bcfa5b7e886ad15e8/safetensors-0.7.0-cp38-abi3-macosx_10_12_x86_64.whl", hash = "sha256:c82f4d474cf725255d9e6acf17252991c3c8aac038d6ef363a4bf8be2f6db517", size = 467781, upload-time = "2025-11-19T15:18:35.84Z" },
+ { url = "https://files.pythonhosted.org/packages/e8/00/374c0c068e30cd31f1e1b46b4b5738168ec79e7689ca82ee93ddfea05109/safetensors-0.7.0-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:94fd4858284736bb67a897a41608b5b0c2496c9bdb3bf2af1fa3409127f20d57", size = 447058, upload-time = "2025-11-19T15:18:34.416Z" },
+ { url = "https://files.pythonhosted.org/packages/f1/06/578ffed52c2296f93d7fd2d844cabfa92be51a587c38c8afbb8ae449ca89/safetensors-0.7.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e07d91d0c92a31200f25351f4acb2bc6aff7f48094e13ebb1d0fb995b54b6542", size = 491748, upload-time = "2025-11-19T15:18:09.79Z" },
+ { url = "https://files.pythonhosted.org/packages/ae/33/1debbbb70e4791dde185edb9413d1fe01619255abb64b300157d7f15dddd/safetensors-0.7.0-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:8469155f4cb518bafb4acf4865e8bb9d6804110d2d9bdcaa78564b9fd841e104", size = 503881, upload-time = "2025-11-19T15:18:16.145Z" },
+ { url = "https://files.pythonhosted.org/packages/8e/1c/40c2ca924d60792c3be509833df711b553c60effbd91da6f5284a83f7122/safetensors-0.7.0-cp38-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:54bef08bf00a2bff599982f6b08e8770e09cc012d7bba00783fc7ea38f1fb37d", size = 623463, upload-time = "2025-11-19T15:18:21.11Z" },
+ { url = "https://files.pythonhosted.org/packages/9b/3a/13784a9364bd43b0d61eef4bea2845039bc2030458b16594a1bd787ae26e/safetensors-0.7.0-cp38-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:42cb091236206bb2016d245c377ed383aa7f78691748f3bb6ee1bfa51ae2ce6a", size = 532855, upload-time = "2025-11-19T15:18:25.719Z" },
+ { url = "https://files.pythonhosted.org/packages/a0/60/429e9b1cb3fc651937727befe258ea24122d9663e4d5709a48c9cbfceecb/safetensors-0.7.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dac7252938f0696ddea46f5e855dd3138444e82236e3be475f54929f0c510d48", size = 507152, upload-time = "2025-11-19T15:18:33.023Z" },
+ { url = "https://files.pythonhosted.org/packages/3c/a8/4b45e4e059270d17af60359713ffd83f97900d45a6afa73aaa0d737d48b6/safetensors-0.7.0-cp38-abi3-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:1d060c70284127fa805085d8f10fbd0962792aed71879d00864acda69dbab981", size = 541856, upload-time = "2025-11-19T15:18:31.075Z" },
+ { url = "https://files.pythonhosted.org/packages/06/87/d26d8407c44175d8ae164a95b5a62707fcc445f3c0c56108e37d98070a3d/safetensors-0.7.0-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:cdab83a366799fa730f90a4ebb563e494f28e9e92c4819e556152ad55e43591b", size = 674060, upload-time = "2025-11-19T15:18:37.211Z" },
+ { url = "https://files.pythonhosted.org/packages/11/f5/57644a2ff08dc6325816ba7217e5095f17269dada2554b658442c66aed51/safetensors-0.7.0-cp38-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:672132907fcad9f2aedcb705b2d7b3b93354a2aec1b2f706c4db852abe338f85", size = 771715, upload-time = "2025-11-19T15:18:38.689Z" },
+ { url = "https://files.pythonhosted.org/packages/86/31/17883e13a814bd278ae6e266b13282a01049b0c81341da7fd0e3e71a80a3/safetensors-0.7.0-cp38-abi3-musllinux_1_2_i686.whl", hash = "sha256:5d72abdb8a4d56d4020713724ba81dac065fedb7f3667151c4a637f1d3fb26c0", size = 714377, upload-time = "2025-11-19T15:18:40.162Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/d8/0c8a7dc9b41dcac53c4cbf9df2b9c83e0e0097203de8b37a712b345c0be5/safetensors-0.7.0-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:b0f6d66c1c538d5a94a73aa9ddca8ccc4227e6c9ff555322ea40bdd142391dd4", size = 677368, upload-time = "2025-11-19T15:18:41.627Z" },
+ { url = "https://files.pythonhosted.org/packages/05/e5/cb4b713c8a93469e3c5be7c3f8d77d307e65fe89673e731f5c2bfd0a9237/safetensors-0.7.0-cp38-abi3-win32.whl", hash = "sha256:c74af94bf3ac15ac4d0f2a7c7b4663a15f8c2ab15ed0fc7531ca61d0835eccba", size = 326423, upload-time = "2025-11-19T15:18:45.74Z" },
+ { url = "https://files.pythonhosted.org/packages/5d/e6/ec8471c8072382cb91233ba7267fd931219753bb43814cbc71757bfd4dab/safetensors-0.7.0-cp38-abi3-win_amd64.whl", hash = "sha256:d1239932053f56f3456f32eb9625590cc7582e905021f94636202a864d470755", size = 341380, upload-time = "2025-11-19T15:18:44.427Z" },
+ { url = "https://files.pythonhosted.org/packages/a7/6a/4d08d89a6fcbe905c5ae68b8b34f0791850882fc19782d0d02c65abbdf3b/safetensors-0.7.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f4729811a6640d019a4b7ba8638ee2fd21fa5ca8c7e7bdf0fed62068fcaac737", size = 492430, upload-time = "2025-11-19T15:18:11.884Z" },
+ { url = "https://files.pythonhosted.org/packages/dd/29/59ed8152b30f72c42d00d241e58eaca558ae9dbfa5695206e2e0f54c7063/safetensors-0.7.0-pp310-pypy310_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:12f49080303fa6bb424b362149a12949dfbbf1e06811a88f2307276b0c131afd", size = 503977, upload-time = "2025-11-19T15:18:17.523Z" },
+ { url = "https://files.pythonhosted.org/packages/d3/0b/4811bfec67fa260e791369b16dab105e4bae82686120554cc484064e22b4/safetensors-0.7.0-pp310-pypy310_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:0071bffba4150c2f46cae1432d31995d77acfd9f8db598b5d1a2ce67e8440ad2", size = 623890, upload-time = "2025-11-19T15:18:22.666Z" },
+ { url = "https://files.pythonhosted.org/packages/58/5b/632a58724221ef03d78ab65062e82a1010e1bef8e8e0b9d7c6d7b8044841/safetensors-0.7.0-pp310-pypy310_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:473b32699f4200e69801bf5abf93f1a4ecd432a70984df164fc22ccf39c4a6f3", size = 531885, upload-time = "2025-11-19T15:18:27.146Z" },
+]
+
+[[package]]
+name = "sam3"
+source = { editable = "." }
+dependencies = [
+ { name = "einops" },
+ { name = "ftfy" },
+ { name = "huggingface-hub" },
+ { name = "iopath" },
+ { name = "numpy" },
+ { name = "psutil" },
+ { name = "regex" },
+ { name = "timm" },
+ { name = "torch" },
+ { name = "torchvision" },
+ { name = "tqdm" },
+ { name = "typing-extensions" },
+]
+
+[package.optional-dependencies]
+dev = [
+ { name = "black" },
+ { name = "gitpython" },
+ { name = "numba" },
+ { name = "opencv-python" },
+ { name = "pandas", version = "2.3.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "pandas", version = "3.0.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "pycocotools" },
+ { name = "pytest" },
+ { name = "pytest-cov" },
+ { name = "python-rapidjson" },
+ { name = "ruff-api" },
+ { name = "ufmt" },
+ { name = "usort" },
+ { name = "yt-dlp" },
+]
+notebooks = [
+ { name = "decord" },
+ { name = "einops" },
+ { name = "ipycanvas" },
+ { name = "ipympl" },
+ { name = "ipywidgets" },
+ { name = "jupyter" },
+ { name = "matplotlib" },
+ { name = "notebook" },
+ { name = "opencv-python" },
+ { name = "pycocotools" },
+ { name = "scikit-image", version = "0.25.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "scikit-image", version = "0.26.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "scikit-learn", version = "1.7.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "scikit-learn", version = "1.8.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+]
+train = [
+ { name = "fairscale" },
+ { name = "fvcore" },
+ { name = "hydra-core" },
+ { name = "scikit-image", version = "0.25.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "scikit-image", version = "0.26.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "scikit-learn", version = "1.7.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "scikit-learn", version = "1.8.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "scipy", version = "1.15.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "scipy", version = "1.17.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "submitit" },
+ { name = "tensorboard" },
+ { name = "torchmetrics" },
+ { name = "zstandard" },
+]
+
+[package.dev-dependencies]
+dev = [
+ { name = "black" },
+ { name = "gitpython" },
+ { name = "numba" },
+ { name = "opencv-python" },
+ { name = "pandas", version = "2.3.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "pandas", version = "3.0.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "pycocotools" },
+ { name = "pytest" },
+ { name = "pytest-cov" },
+ { name = "python-rapidjson" },
+ { name = "ruff-api" },
+ { name = "ufmt" },
+ { name = "usort" },
+ { name = "yt-dlp" },
+]
+
+[package.metadata]
+requires-dist = [
+ { name = "black", marker = "extra == 'dev'", specifier = "==24.2.0" },
+ { name = "decord", marker = "extra == 'notebooks'" },
+ { name = "einops" },
+ { name = "einops", marker = "extra == 'notebooks'" },
+ { name = "fairscale", marker = "extra == 'train'" },
+ { name = "ftfy", specifier = "==6.1.1" },
+ { name = "fvcore", marker = "extra == 'train'" },
+ { name = "gitpython", marker = "extra == 'dev'", specifier = "==3.1.31" },
+ { name = "huggingface-hub" },
+ { name = "hydra-core", marker = "extra == 'train'" },
+ { name = "iopath", specifier = ">=0.1.10" },
+ { name = "ipycanvas", marker = "extra == 'notebooks'" },
+ { name = "ipympl", marker = "extra == 'notebooks'" },
+ { name = "ipywidgets", marker = "extra == 'notebooks'" },
+ { name = "jupyter", marker = "extra == 'notebooks'" },
+ { name = "matplotlib", marker = "extra == 'notebooks'" },
+ { name = "notebook", marker = "extra == 'notebooks'" },
+ { name = "numba", marker = "extra == 'dev'" },
+ { name = "numpy", specifier = ">=1.26,<2" },
+ { name = "opencv-python", marker = "extra == 'dev'" },
+ { name = "opencv-python", marker = "extra == 'notebooks'" },
+ { name = "pandas", marker = "extra == 'dev'" },
+ { name = "psutil" },
+ { name = "pycocotools", marker = "extra == 'dev'" },
+ { name = "pycocotools", marker = "extra == 'notebooks'" },
+ { name = "pytest", marker = "extra == 'dev'" },
+ { name = "pytest-cov", marker = "extra == 'dev'" },
+ { name = "python-rapidjson", marker = "extra == 'dev'" },
+ { name = "regex" },
+ { name = "ruff-api", marker = "extra == 'dev'", specifier = "==0.1.0" },
+ { name = "scikit-image", marker = "extra == 'notebooks'" },
+ { name = "scikit-image", marker = "extra == 'train'" },
+ { name = "scikit-learn", marker = "extra == 'notebooks'" },
+ { name = "scikit-learn", marker = "extra == 'train'" },
+ { name = "scipy", marker = "extra == 'train'" },
+ { name = "submitit", marker = "extra == 'train'" },
+ { name = "tensorboard", marker = "extra == 'train'" },
+ { name = "timm", specifier = ">=1.0.17" },
+ { name = "torch", specifier = ">=2.11,<2.12" },
+ { name = "torchmetrics", marker = "extra == 'train'" },
+ { name = "torchvision", specifier = ">=0.26,<0.27" },
+ { name = "tqdm" },
+ { name = "typing-extensions" },
+ { name = "ufmt", marker = "extra == 'dev'", specifier = "==2.8.0" },
+ { name = "usort", marker = "extra == 'dev'", specifier = "==1.0.2" },
+ { name = "yt-dlp", marker = "extra == 'dev'" },
+ { name = "zstandard", marker = "extra == 'train'" },
+]
+provides-extras = ["dev", "notebooks", "train"]
+
+[package.metadata.requires-dev]
+dev = [
+ { name = "black", specifier = "==24.2.0" },
+ { name = "gitpython", specifier = "==3.1.31" },
+ { name = "numba" },
+ { name = "opencv-python" },
+ { name = "pandas" },
+ { name = "pycocotools" },
+ { name = "pytest" },
+ { name = "pytest-cov" },
+ { name = "python-rapidjson" },
+ { name = "ruff-api", specifier = "==0.1.0" },
+ { name = "ufmt", specifier = "==2.8.0" },
+ { name = "usort", specifier = "==1.0.2" },
+ { name = "yt-dlp" },
+]
+
+[[package]]
+name = "scikit-image"
+version = "0.25.2"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version < '3.11' and sys_platform == 'darwin'",
+ "python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",
+]
+dependencies = [
+ { name = "imageio", marker = "python_full_version < '3.11'" },
+ { name = "lazy-loader", marker = "python_full_version < '3.11'" },
+ { name = "networkx", version = "3.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "numpy", marker = "python_full_version < '3.11'" },
+ { name = "packaging", marker = "python_full_version < '3.11'" },
+ { name = "pillow", marker = "python_full_version < '3.11'" },
+ { name = "scipy", version = "1.15.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "tifffile", version = "2025.5.10", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/c7/a8/3c0f256012b93dd2cb6fda9245e9f4bff7dc0486880b248005f15ea2255e/scikit_image-0.25.2.tar.gz", hash = "sha256:e5a37e6cd4d0c018a7a55b9d601357e3382826d3888c10d0213fc63bff977dde", size = 22693594, upload-time = "2025-02-18T18:05:24.538Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/11/cb/016c63f16065c2d333c8ed0337e18a5cdf9bc32d402e4f26b0db362eb0e2/scikit_image-0.25.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:d3278f586793176599df6a4cf48cb6beadae35c31e58dc01a98023af3dc31c78", size = 13988922, upload-time = "2025-02-18T18:04:11.069Z" },
+ { url = "https://files.pythonhosted.org/packages/30/ca/ff4731289cbed63c94a0c9a5b672976603118de78ed21910d9060c82e859/scikit_image-0.25.2-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:5c311069899ce757d7dbf1d03e32acb38bb06153236ae77fcd820fd62044c063", size = 13192698, upload-time = "2025-02-18T18:04:15.362Z" },
+ { url = "https://files.pythonhosted.org/packages/39/6d/a2aadb1be6d8e149199bb9b540ccde9e9622826e1ab42fe01de4c35ab918/scikit_image-0.25.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:be455aa7039a6afa54e84f9e38293733a2622b8c2fb3362b822d459cc5605e99", size = 14153634, upload-time = "2025-02-18T18:04:18.496Z" },
+ { url = "https://files.pythonhosted.org/packages/96/08/916e7d9ee4721031b2f625db54b11d8379bd51707afaa3e5a29aecf10bc4/scikit_image-0.25.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a4c464b90e978d137330be433df4e76d92ad3c5f46a22f159520ce0fdbea8a09", size = 14767545, upload-time = "2025-02-18T18:04:22.556Z" },
+ { url = "https://files.pythonhosted.org/packages/5f/ee/c53a009e3997dda9d285402f19226fbd17b5b3cb215da391c4ed084a1424/scikit_image-0.25.2-cp310-cp310-win_amd64.whl", hash = "sha256:60516257c5a2d2f74387c502aa2f15a0ef3498fbeaa749f730ab18f0a40fd054", size = 12812908, upload-time = "2025-02-18T18:04:26.364Z" },
+ { url = "https://files.pythonhosted.org/packages/c4/97/3051c68b782ee3f1fb7f8f5bb7d535cf8cb92e8aae18fa9c1cdf7e15150d/scikit_image-0.25.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:f4bac9196fb80d37567316581c6060763b0f4893d3aca34a9ede3825bc035b17", size = 14003057, upload-time = "2025-02-18T18:04:30.395Z" },
+ { url = "https://files.pythonhosted.org/packages/19/23/257fc696c562639826065514d551b7b9b969520bd902c3a8e2fcff5b9e17/scikit_image-0.25.2-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:d989d64ff92e0c6c0f2018c7495a5b20e2451839299a018e0e5108b2680f71e0", size = 13180335, upload-time = "2025-02-18T18:04:33.449Z" },
+ { url = "https://files.pythonhosted.org/packages/ef/14/0c4a02cb27ca8b1e836886b9ec7c9149de03053650e9e2ed0625f248dd92/scikit_image-0.25.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b2cfc96b27afe9a05bc92f8c6235321d3a66499995675b27415e0d0c76625173", size = 14144783, upload-time = "2025-02-18T18:04:36.594Z" },
+ { url = "https://files.pythonhosted.org/packages/dd/9b/9fb556463a34d9842491d72a421942c8baff4281025859c84fcdb5e7e602/scikit_image-0.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:24cc986e1f4187a12aa319f777b36008764e856e5013666a4a83f8df083c2641", size = 14785376, upload-time = "2025-02-18T18:04:39.856Z" },
+ { url = "https://files.pythonhosted.org/packages/de/ec/b57c500ee85885df5f2188f8bb70398481393a69de44a00d6f1d055f103c/scikit_image-0.25.2-cp311-cp311-win_amd64.whl", hash = "sha256:b4f6b61fc2db6340696afe3db6b26e0356911529f5f6aee8c322aa5157490c9b", size = 12791698, upload-time = "2025-02-18T18:04:42.868Z" },
+ { url = "https://files.pythonhosted.org/packages/35/8c/5df82881284459f6eec796a5ac2a0a304bb3384eec2e73f35cfdfcfbf20c/scikit_image-0.25.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8db8dd03663112783221bf01ccfc9512d1cc50ac9b5b0fe8f4023967564719fb", size = 13986000, upload-time = "2025-02-18T18:04:47.156Z" },
+ { url = "https://files.pythonhosted.org/packages/ce/e6/93bebe1abcdce9513ffec01d8af02528b4c41fb3c1e46336d70b9ed4ef0d/scikit_image-0.25.2-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:483bd8cc10c3d8a7a37fae36dfa5b21e239bd4ee121d91cad1f81bba10cfb0ed", size = 13235893, upload-time = "2025-02-18T18:04:51.049Z" },
+ { url = "https://files.pythonhosted.org/packages/53/4b/eda616e33f67129e5979a9eb33c710013caa3aa8a921991e6cc0b22cea33/scikit_image-0.25.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9d1e80107bcf2bf1291acfc0bf0425dceb8890abe9f38d8e94e23497cbf7ee0d", size = 14178389, upload-time = "2025-02-18T18:04:54.245Z" },
+ { url = "https://files.pythonhosted.org/packages/6b/b5/b75527c0f9532dd8a93e8e7cd8e62e547b9f207d4c11e24f0006e8646b36/scikit_image-0.25.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a17e17eb8562660cc0d31bb55643a4da996a81944b82c54805c91b3fe66f4824", size = 15003435, upload-time = "2025-02-18T18:04:57.586Z" },
+ { url = "https://files.pythonhosted.org/packages/34/e3/49beb08ebccda3c21e871b607c1cb2f258c3fa0d2f609fed0a5ba741b92d/scikit_image-0.25.2-cp312-cp312-win_amd64.whl", hash = "sha256:bdd2b8c1de0849964dbc54037f36b4e9420157e67e45a8709a80d727f52c7da2", size = 12899474, upload-time = "2025-02-18T18:05:01.166Z" },
+]
+
+[[package]]
+name = "scikit-image"
+version = "0.26.0"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version >= '3.12' and sys_platform == 'darwin'",
+ "python_full_version >= '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version >= '3.12' and sys_platform == 'win32'",
+ "python_full_version >= '3.12' and sys_platform == 'emscripten'",
+ "(python_full_version >= '3.12' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version == '3.11.*' and sys_platform == 'darwin'",
+ "python_full_version == '3.11.*' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version == '3.11.*' and sys_platform == 'win32'",
+ "python_full_version == '3.11.*' and sys_platform == 'emscripten'",
+ "(python_full_version == '3.11.*' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version == '3.11.*' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+]
+dependencies = [
+ { name = "imageio", marker = "python_full_version >= '3.11'" },
+ { name = "lazy-loader", marker = "python_full_version >= '3.11'" },
+ { name = "networkx", version = "3.6.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "numpy", marker = "python_full_version >= '3.11'" },
+ { name = "packaging", marker = "python_full_version >= '3.11'" },
+ { name = "pillow", marker = "python_full_version >= '3.11'" },
+ { name = "scipy", version = "1.17.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "tifffile", version = "2026.1.28", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a1/b4/2528bb43c67d48053a7a649a9666432dc307d66ba02e3a6d5c40f46655df/scikit_image-0.26.0.tar.gz", hash = "sha256:f5f970ab04efad85c24714321fcc91613fcb64ef2a892a13167df2f3e59199fa", size = 22729739, upload-time = "2025-12-20T17:12:21.824Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/76/16/8a407688b607f86f81f8c649bf0d68a2a6d67375f18c2d660aba20f5b648/scikit_image-0.26.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:b1ede33a0fb3731457eaf53af6361e73dd510f449dac437ab54573b26788baf0", size = 12355510, upload-time = "2025-12-20T17:10:31.628Z" },
+ { url = "https://files.pythonhosted.org/packages/6b/f9/7efc088ececb6f6868fd4475e16cfafc11f242ce9ab5fc3557d78b5da0d4/scikit_image-0.26.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7af7aa331c6846bd03fa28b164c18d0c3fd419dbb888fb05e958ac4257a78fdd", size = 12056334, upload-time = "2025-12-20T17:10:34.559Z" },
+ { url = "https://files.pythonhosted.org/packages/9f/1e/bc7fb91fb5ff65ef42346c8b7ee8b09b04eabf89235ab7dbfdfd96cbd1ea/scikit_image-0.26.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9ea6207d9e9d21c3f464efe733121c0504e494dbdc7728649ff3e23c3c5a4953", size = 13297768, upload-time = "2025-12-20T17:10:37.733Z" },
+ { url = "https://files.pythonhosted.org/packages/a5/2a/e71c1a7d90e70da67b88ccc609bd6ae54798d5847369b15d3a8052232f9d/scikit_image-0.26.0-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:74aa5518ccea28121f57a95374581d3b979839adc25bb03f289b1bc9b99c58af", size = 13711217, upload-time = "2025-12-20T17:10:40.935Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/59/9637ee12c23726266b91296791465218973ce1ad3e4c56fc81e4d8e7d6e1/scikit_image-0.26.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d5c244656de905e195a904e36dbc18585e06ecf67d90f0482cbde63d7f9ad59d", size = 14337782, upload-time = "2025-12-20T17:10:43.452Z" },
+ { url = "https://files.pythonhosted.org/packages/e7/5c/a3e1e0860f9294663f540c117e4bf83d55e5b47c281d475cc06227e88411/scikit_image-0.26.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:21a818ee6ca2f2131b9e04d8eb7637b5c18773ebe7b399ad23dcc5afaa226d2d", size = 14805997, upload-time = "2025-12-20T17:10:45.93Z" },
+ { url = "https://files.pythonhosted.org/packages/d3/c6/2eeacf173da041a9e388975f54e5c49df750757fcfc3ee293cdbbae1ea0a/scikit_image-0.26.0-cp311-cp311-win_amd64.whl", hash = "sha256:9490360c8d3f9a7e85c8de87daf7c0c66507960cf4947bb9610d1751928721c7", size = 11878486, upload-time = "2025-12-20T17:10:48.246Z" },
+ { url = "https://files.pythonhosted.org/packages/c3/a4/a852c4949b9058d585e762a66bf7e9a2cd3be4795cd940413dfbfbb0ce79/scikit_image-0.26.0-cp311-cp311-win_arm64.whl", hash = "sha256:0baa0108d2d027f34d748e84e592b78acc23e965a5de0e4bb03cf371de5c0581", size = 11346518, upload-time = "2025-12-20T17:10:50.575Z" },
+ { url = "https://files.pythonhosted.org/packages/99/e8/e13757982264b33a1621628f86b587e9a73a13f5256dad49b19ba7dc9083/scikit_image-0.26.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:d454b93a6fa770ac5ae2d33570f8e7a321bb80d29511ce4b6b78058ebe176e8c", size = 12376452, upload-time = "2025-12-20T17:10:52.796Z" },
+ { url = "https://files.pythonhosted.org/packages/e3/be/f8dd17d0510f9911f9f17ba301f7455328bf13dae416560126d428de9568/scikit_image-0.26.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:3409e89d66eff5734cd2b672d1c48d2759360057e714e1d92a11df82c87cba37", size = 12061567, upload-time = "2025-12-20T17:10:55.207Z" },
+ { url = "https://files.pythonhosted.org/packages/b3/2b/c70120a6880579fb42b91567ad79feb4772f7be72e8d52fec403a3dde0c6/scikit_image-0.26.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4c717490cec9e276afb0438dd165b7c3072d6c416709cc0f9f5a4c1070d23a44", size = 13084214, upload-time = "2025-12-20T17:10:57.468Z" },
+ { url = "https://files.pythonhosted.org/packages/f4/a2/70401a107d6d7466d64b466927e6b96fcefa99d57494b972608e2f8be50f/scikit_image-0.26.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7df650e79031634ac90b11e64a9eedaf5a5e06fcd09bcd03a34be01745744466", size = 13561683, upload-time = "2025-12-20T17:10:59.49Z" },
+ { url = "https://files.pythonhosted.org/packages/13/a5/48bdfd92794c5002d664e0910a349d0a1504671ef5ad358150f21643c79a/scikit_image-0.26.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:cefd85033e66d4ea35b525bb0937d7f42d4cdcfed2d1888e1570d5ce450d3932", size = 14112147, upload-time = "2025-12-20T17:11:02.083Z" },
+ { url = "https://files.pythonhosted.org/packages/ee/b5/ac71694da92f5def5953ca99f18a10fe98eac2dd0a34079389b70b4d0394/scikit_image-0.26.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:3f5bf622d7c0435884e1e141ebbe4b2804e16b2dd23ae4c6183e2ea99233be70", size = 14661625, upload-time = "2025-12-20T17:11:04.528Z" },
+ { url = "https://files.pythonhosted.org/packages/23/4d/a3cc1e96f080e253dad2251bfae7587cf2b7912bcd76fd43fd366ff35a87/scikit_image-0.26.0-cp312-cp312-win_amd64.whl", hash = "sha256:abed017474593cd3056ae0fe948d07d0747b27a085e92df5474f4955dd65aec0", size = 11911059, upload-time = "2025-12-20T17:11:06.61Z" },
+ { url = "https://files.pythonhosted.org/packages/35/8a/d1b8055f584acc937478abf4550d122936f420352422a1a625eef2c605d8/scikit_image-0.26.0-cp312-cp312-win_arm64.whl", hash = "sha256:4d57e39ef67a95d26860c8caf9b14b8fb130f83b34c6656a77f191fa6d1d04d8", size = 11348740, upload-time = "2025-12-20T17:11:09.118Z" },
+]
+
+[[package]]
+name = "scikit-learn"
+version = "1.7.2"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version < '3.11' and sys_platform == 'darwin'",
+ "python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",
+]
+dependencies = [
+ { name = "joblib", marker = "python_full_version < '3.11'" },
+ { name = "numpy", marker = "python_full_version < '3.11'" },
+ { name = "scipy", version = "1.15.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "threadpoolctl", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/98/c2/a7855e41c9d285dfe86dc50b250978105dce513d6e459ea66a6aeb0e1e0c/scikit_learn-1.7.2.tar.gz", hash = "sha256:20e9e49ecd130598f1ca38a1d85090e1a600147b9c02fa6f15d69cb53d968fda", size = 7193136, upload-time = "2025-09-09T08:21:29.075Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ba/3e/daed796fd69cce768b8788401cc464ea90b306fb196ae1ffed0b98182859/scikit_learn-1.7.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:6b33579c10a3081d076ab403df4a4190da4f4432d443521674637677dc91e61f", size = 9336221, upload-time = "2025-09-09T08:20:19.328Z" },
+ { url = "https://files.pythonhosted.org/packages/1c/ce/af9d99533b24c55ff4e18d9b7b4d9919bbc6cd8f22fe7a7be01519a347d5/scikit_learn-1.7.2-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:36749fb62b3d961b1ce4fedf08fa57a1986cd409eff2d783bca5d4b9b5fce51c", size = 8653834, upload-time = "2025-09-09T08:20:22.073Z" },
+ { url = "https://files.pythonhosted.org/packages/58/0e/8c2a03d518fb6bd0b6b0d4b114c63d5f1db01ff0f9925d8eb10960d01c01/scikit_learn-1.7.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7a58814265dfc52b3295b1900cfb5701589d30a8bb026c7540f1e9d3499d5ec8", size = 9660938, upload-time = "2025-09-09T08:20:24.327Z" },
+ { url = "https://files.pythonhosted.org/packages/2b/75/4311605069b5d220e7cf5adabb38535bd96f0079313cdbb04b291479b22a/scikit_learn-1.7.2-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4a847fea807e278f821a0406ca01e387f97653e284ecbd9750e3ee7c90347f18", size = 9477818, upload-time = "2025-09-09T08:20:26.845Z" },
+ { url = "https://files.pythonhosted.org/packages/7f/9b/87961813c34adbca21a6b3f6b2bea344c43b30217a6d24cc437c6147f3e8/scikit_learn-1.7.2-cp310-cp310-win_amd64.whl", hash = "sha256:ca250e6836d10e6f402436d6463d6c0e4d8e0234cfb6a9a47835bd392b852ce5", size = 8886969, upload-time = "2025-09-09T08:20:29.329Z" },
+ { url = "https://files.pythonhosted.org/packages/43/83/564e141eef908a5863a54da8ca342a137f45a0bfb71d1d79704c9894c9d1/scikit_learn-1.7.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:c7509693451651cd7361d30ce4e86a1347493554f172b1c72a39300fa2aea79e", size = 9331967, upload-time = "2025-09-09T08:20:32.421Z" },
+ { url = "https://files.pythonhosted.org/packages/18/d6/ba863a4171ac9d7314c4d3fc251f015704a2caeee41ced89f321c049ed83/scikit_learn-1.7.2-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:0486c8f827c2e7b64837c731c8feff72c0bd2b998067a8a9cbc10643c31f0fe1", size = 8648645, upload-time = "2025-09-09T08:20:34.436Z" },
+ { url = "https://files.pythonhosted.org/packages/ef/0e/97dbca66347b8cf0ea8b529e6bb9367e337ba2e8be0ef5c1a545232abfde/scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:89877e19a80c7b11a2891a27c21c4894fb18e2c2e077815bcade10d34287b20d", size = 9715424, upload-time = "2025-09-09T08:20:36.776Z" },
+ { url = "https://files.pythonhosted.org/packages/f7/32/1f3b22e3207e1d2c883a7e09abb956362e7d1bd2f14458c7de258a26ac15/scikit_learn-1.7.2-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8da8bf89d4d79aaec192d2bda62f9b56ae4e5b4ef93b6a56b5de4977e375c1f1", size = 9509234, upload-time = "2025-09-09T08:20:38.957Z" },
+ { url = "https://files.pythonhosted.org/packages/9f/71/34ddbd21f1da67c7a768146968b4d0220ee6831e4bcbad3e03dd3eae88b6/scikit_learn-1.7.2-cp311-cp311-win_amd64.whl", hash = "sha256:9b7ed8d58725030568523e937c43e56bc01cadb478fc43c042a9aca1dacb3ba1", size = 8894244, upload-time = "2025-09-09T08:20:41.166Z" },
+ { url = "https://files.pythonhosted.org/packages/a7/aa/3996e2196075689afb9fce0410ebdb4a09099d7964d061d7213700204409/scikit_learn-1.7.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8d91a97fa2b706943822398ab943cde71858a50245e31bc71dba62aab1d60a96", size = 9259818, upload-time = "2025-09-09T08:20:43.19Z" },
+ { url = "https://files.pythonhosted.org/packages/43/5d/779320063e88af9c4a7c2cf463ff11c21ac9c8bd730c4a294b0000b666c9/scikit_learn-1.7.2-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:acbc0f5fd2edd3432a22c69bed78e837c70cf896cd7993d71d51ba6708507476", size = 8636997, upload-time = "2025-09-09T08:20:45.468Z" },
+ { url = "https://files.pythonhosted.org/packages/5c/d0/0c577d9325b05594fdd33aa970bf53fb673f051a45496842caee13cfd7fe/scikit_learn-1.7.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:e5bf3d930aee75a65478df91ac1225ff89cd28e9ac7bd1196853a9229b6adb0b", size = 9478381, upload-time = "2025-09-09T08:20:47.982Z" },
+ { url = "https://files.pythonhosted.org/packages/82/70/8bf44b933837ba8494ca0fc9a9ab60f1c13b062ad0197f60a56e2fc4c43e/scikit_learn-1.7.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b4d6e9deed1a47aca9fe2f267ab8e8fe82ee20b4526b2c0cd9e135cea10feb44", size = 9300296, upload-time = "2025-09-09T08:20:50.366Z" },
+ { url = "https://files.pythonhosted.org/packages/c6/99/ed35197a158f1fdc2fe7c3680e9c70d0128f662e1fee4ed495f4b5e13db0/scikit_learn-1.7.2-cp312-cp312-win_amd64.whl", hash = "sha256:6088aa475f0785e01bcf8529f55280a3d7d298679f50c0bb70a2364a82d0b290", size = 8731256, upload-time = "2025-09-09T08:20:52.627Z" },
+]
+
+[[package]]
+name = "scikit-learn"
+version = "1.8.0"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version >= '3.12' and sys_platform == 'darwin'",
+ "python_full_version >= '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version >= '3.12' and sys_platform == 'win32'",
+ "python_full_version >= '3.12' and sys_platform == 'emscripten'",
+ "(python_full_version >= '3.12' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version == '3.11.*' and sys_platform == 'darwin'",
+ "python_full_version == '3.11.*' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version == '3.11.*' and sys_platform == 'win32'",
+ "python_full_version == '3.11.*' and sys_platform == 'emscripten'",
+ "(python_full_version == '3.11.*' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version == '3.11.*' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+]
+dependencies = [
+ { name = "joblib", marker = "python_full_version >= '3.11'" },
+ { name = "numpy", marker = "python_full_version >= '3.11'" },
+ { name = "scipy", version = "1.17.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "threadpoolctl", marker = "python_full_version >= '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/0e/d4/40988bf3b8e34feec1d0e6a051446b1f66225f8529b9309becaeef62b6c4/scikit_learn-1.8.0.tar.gz", hash = "sha256:9bccbb3b40e3de10351f8f5068e105d0f4083b1a65fa07b6634fbc401a6287fd", size = 7335585, upload-time = "2025-12-10T07:08:53.618Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c9/92/53ea2181da8ac6bf27170191028aee7251f8f841f8d3edbfdcaf2008fde9/scikit_learn-1.8.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:146b4d36f800c013d267b29168813f7a03a43ecd2895d04861f1240b564421da", size = 8595835, upload-time = "2025-12-10T07:07:39.385Z" },
+ { url = "https://files.pythonhosted.org/packages/01/18/d154dc1638803adf987910cdd07097d9c526663a55666a97c124d09fb96a/scikit_learn-1.8.0-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:f984ca4b14914e6b4094c5d52a32ea16b49832c03bd17a110f004db3c223e8e1", size = 8080381, upload-time = "2025-12-10T07:07:41.93Z" },
+ { url = "https://files.pythonhosted.org/packages/8a/44/226142fcb7b7101e64fdee5f49dbe6288d4c7af8abf593237b70fca080a4/scikit_learn-1.8.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5e30adb87f0cc81c7690a84f7932dd66be5bac57cfe16b91cb9151683a4a2d3b", size = 8799632, upload-time = "2025-12-10T07:07:43.899Z" },
+ { url = "https://files.pythonhosted.org/packages/36/4d/4a67f30778a45d542bbea5db2dbfa1e9e100bf9ba64aefe34215ba9f11f6/scikit_learn-1.8.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ada8121bcb4dac28d930febc791a69f7cb1673c8495e5eee274190b73a4559c1", size = 9103788, upload-time = "2025-12-10T07:07:45.982Z" },
+ { url = "https://files.pythonhosted.org/packages/89/3c/45c352094cfa60050bcbb967b1faf246b22e93cb459f2f907b600f2ceda5/scikit_learn-1.8.0-cp311-cp311-win_amd64.whl", hash = "sha256:c57b1b610bd1f40ba43970e11ce62821c2e6569e4d74023db19c6b26f246cb3b", size = 8081706, upload-time = "2025-12-10T07:07:48.111Z" },
+ { url = "https://files.pythonhosted.org/packages/3d/46/5416595bb395757f754feb20c3d776553a386b661658fb21b7c814e89efe/scikit_learn-1.8.0-cp311-cp311-win_arm64.whl", hash = "sha256:2838551e011a64e3053ad7618dda9310175f7515f1742fa2d756f7c874c05961", size = 7688451, upload-time = "2025-12-10T07:07:49.873Z" },
+ { url = "https://files.pythonhosted.org/packages/90/74/e6a7cc4b820e95cc38cf36cd74d5aa2b42e8ffc2d21fe5a9a9c45c1c7630/scikit_learn-1.8.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:5fb63362b5a7ddab88e52b6dbb47dac3fd7dafeee740dc6c8d8a446ddedade8e", size = 8548242, upload-time = "2025-12-10T07:07:51.568Z" },
+ { url = "https://files.pythonhosted.org/packages/49/d8/9be608c6024d021041c7f0b3928d4749a706f4e2c3832bbede4fb4f58c95/scikit_learn-1.8.0-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:5025ce924beccb28298246e589c691fe1b8c1c96507e6d27d12c5fadd85bfd76", size = 8079075, upload-time = "2025-12-10T07:07:53.697Z" },
+ { url = "https://files.pythonhosted.org/packages/dd/47/f187b4636ff80cc63f21cd40b7b2d177134acaa10f6bb73746130ee8c2e5/scikit_learn-1.8.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4496bb2cf7a43ce1a2d7524a79e40bc5da45cf598dbf9545b7e8316ccba47bb4", size = 8660492, upload-time = "2025-12-10T07:07:55.574Z" },
+ { url = "https://files.pythonhosted.org/packages/97/74/b7a304feb2b49df9fafa9382d4d09061a96ee9a9449a7cbea7988dda0828/scikit_learn-1.8.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a0bcfe4d0d14aec44921545fd2af2338c7471de9cb701f1da4c9d85906ab847a", size = 8931904, upload-time = "2025-12-10T07:07:57.666Z" },
+ { url = "https://files.pythonhosted.org/packages/9f/c4/0ab22726a04ede56f689476b760f98f8f46607caecff993017ac1b64aa5d/scikit_learn-1.8.0-cp312-cp312-win_amd64.whl", hash = "sha256:35c007dedb2ffe38fe3ee7d201ebac4a2deccd2408e8621d53067733e3c74809", size = 8019359, upload-time = "2025-12-10T07:07:59.838Z" },
+ { url = "https://files.pythonhosted.org/packages/24/90/344a67811cfd561d7335c1b96ca21455e7e472d281c3c279c4d3f2300236/scikit_learn-1.8.0-cp312-cp312-win_arm64.whl", hash = "sha256:8c497fff237d7b4e07e9ef1a640887fa4fb765647f86fbe00f969ff6280ce2bb", size = 7641898, upload-time = "2025-12-10T07:08:01.36Z" },
+]
+
+[[package]]
+name = "scipy"
+version = "1.15.3"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version < '3.11' and sys_platform == 'darwin'",
+ "python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",
+]
+dependencies = [
+ { name = "numpy", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/0f/37/6964b830433e654ec7485e45a00fc9a27cf868d622838f6b6d9c5ec0d532/scipy-1.15.3.tar.gz", hash = "sha256:eae3cf522bc7df64b42cad3925c876e1b0b6c35c1337c93e12c0f366f55b0eaf", size = 59419214, upload-time = "2025-05-08T16:13:05.955Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/78/2f/4966032c5f8cc7e6a60f1b2e0ad686293b9474b65246b0c642e3ef3badd0/scipy-1.15.3-cp310-cp310-macosx_10_13_x86_64.whl", hash = "sha256:a345928c86d535060c9c2b25e71e87c39ab2f22fc96e9636bd74d1dbf9de448c", size = 38702770, upload-time = "2025-05-08T16:04:20.849Z" },
+ { url = "https://files.pythonhosted.org/packages/a0/6e/0c3bf90fae0e910c274db43304ebe25a6b391327f3f10b5dcc638c090795/scipy-1.15.3-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:ad3432cb0f9ed87477a8d97f03b763fd1d57709f1bbde3c9369b1dff5503b253", size = 30094511, upload-time = "2025-05-08T16:04:27.103Z" },
+ { url = "https://files.pythonhosted.org/packages/ea/b1/4deb37252311c1acff7f101f6453f0440794f51b6eacb1aad4459a134081/scipy-1.15.3-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:aef683a9ae6eb00728a542b796f52a5477b78252edede72b8327a886ab63293f", size = 22368151, upload-time = "2025-05-08T16:04:31.731Z" },
+ { url = "https://files.pythonhosted.org/packages/38/7d/f457626e3cd3c29b3a49ca115a304cebb8cc6f31b04678f03b216899d3c6/scipy-1.15.3-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:1c832e1bd78dea67d5c16f786681b28dd695a8cb1fb90af2e27580d3d0967e92", size = 25121732, upload-time = "2025-05-08T16:04:36.596Z" },
+ { url = "https://files.pythonhosted.org/packages/db/0a/92b1de4a7adc7a15dcf5bddc6e191f6f29ee663b30511ce20467ef9b82e4/scipy-1.15.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:263961f658ce2165bbd7b99fa5135195c3a12d9bef045345016b8b50c315cb82", size = 35547617, upload-time = "2025-05-08T16:04:43.546Z" },
+ { url = "https://files.pythonhosted.org/packages/8e/6d/41991e503e51fc1134502694c5fa7a1671501a17ffa12716a4a9151af3df/scipy-1.15.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9e2abc762b0811e09a0d3258abee2d98e0c703eee49464ce0069590846f31d40", size = 37662964, upload-time = "2025-05-08T16:04:49.431Z" },
+ { url = "https://files.pythonhosted.org/packages/25/e1/3df8f83cb15f3500478c889be8fb18700813b95e9e087328230b98d547ff/scipy-1.15.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:ed7284b21a7a0c8f1b6e5977ac05396c0d008b89e05498c8b7e8f4a1423bba0e", size = 37238749, upload-time = "2025-05-08T16:04:55.215Z" },
+ { url = "https://files.pythonhosted.org/packages/93/3e/b3257cf446f2a3533ed7809757039016b74cd6f38271de91682aa844cfc5/scipy-1.15.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:5380741e53df2c566f4d234b100a484b420af85deb39ea35a1cc1be84ff53a5c", size = 40022383, upload-time = "2025-05-08T16:05:01.914Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/84/55bc4881973d3f79b479a5a2e2df61c8c9a04fcb986a213ac9c02cfb659b/scipy-1.15.3-cp310-cp310-win_amd64.whl", hash = "sha256:9d61e97b186a57350f6d6fd72640f9e99d5a4a2b8fbf4b9ee9a841eab327dc13", size = 41259201, upload-time = "2025-05-08T16:05:08.166Z" },
+ { url = "https://files.pythonhosted.org/packages/96/ab/5cc9f80f28f6a7dff646c5756e559823614a42b1939d86dd0ed550470210/scipy-1.15.3-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:993439ce220d25e3696d1b23b233dd010169b62f6456488567e830654ee37a6b", size = 38714255, upload-time = "2025-05-08T16:05:14.596Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/4a/66ba30abe5ad1a3ad15bfb0b59d22174012e8056ff448cb1644deccbfed2/scipy-1.15.3-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:34716e281f181a02341ddeaad584205bd2fd3c242063bd3423d61ac259ca7eba", size = 30111035, upload-time = "2025-05-08T16:05:20.152Z" },
+ { url = "https://files.pythonhosted.org/packages/4b/fa/a7e5b95afd80d24313307f03624acc65801846fa75599034f8ceb9e2cbf6/scipy-1.15.3-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:3b0334816afb8b91dab859281b1b9786934392aa3d527cd847e41bb6f45bee65", size = 22384499, upload-time = "2025-05-08T16:05:24.494Z" },
+ { url = "https://files.pythonhosted.org/packages/17/99/f3aaddccf3588bb4aea70ba35328c204cadd89517a1612ecfda5b2dd9d7a/scipy-1.15.3-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:6db907c7368e3092e24919b5e31c76998b0ce1684d51a90943cb0ed1b4ffd6c1", size = 25152602, upload-time = "2025-05-08T16:05:29.313Z" },
+ { url = "https://files.pythonhosted.org/packages/56/c5/1032cdb565f146109212153339f9cb8b993701e9fe56b1c97699eee12586/scipy-1.15.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:721d6b4ef5dc82ca8968c25b111e307083d7ca9091bc38163fb89243e85e3889", size = 35503415, upload-time = "2025-05-08T16:05:34.699Z" },
+ { url = "https://files.pythonhosted.org/packages/bd/37/89f19c8c05505d0601ed5650156e50eb881ae3918786c8fd7262b4ee66d3/scipy-1.15.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:39cb9c62e471b1bb3750066ecc3a3f3052b37751c7c3dfd0fd7e48900ed52982", size = 37652622, upload-time = "2025-05-08T16:05:40.762Z" },
+ { url = "https://files.pythonhosted.org/packages/7e/31/be59513aa9695519b18e1851bb9e487de66f2d31f835201f1b42f5d4d475/scipy-1.15.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:795c46999bae845966368a3c013e0e00947932d68e235702b5c3f6ea799aa8c9", size = 37244796, upload-time = "2025-05-08T16:05:48.119Z" },
+ { url = "https://files.pythonhosted.org/packages/10/c0/4f5f3eeccc235632aab79b27a74a9130c6c35df358129f7ac8b29f562ac7/scipy-1.15.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:18aaacb735ab38b38db42cb01f6b92a2d0d4b6aabefeb07f02849e47f8fb3594", size = 40047684, upload-time = "2025-05-08T16:05:54.22Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/a7/0ddaf514ce8a8714f6ed243a2b391b41dbb65251affe21ee3077ec45ea9a/scipy-1.15.3-cp311-cp311-win_amd64.whl", hash = "sha256:ae48a786a28412d744c62fd7816a4118ef97e5be0bee968ce8f0a2fba7acf3bb", size = 41246504, upload-time = "2025-05-08T16:06:00.437Z" },
+ { url = "https://files.pythonhosted.org/packages/37/4b/683aa044c4162e10ed7a7ea30527f2cbd92e6999c10a8ed8edb253836e9c/scipy-1.15.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:6ac6310fdbfb7aa6612408bd2f07295bcbd3fda00d2d702178434751fe48e019", size = 38766735, upload-time = "2025-05-08T16:06:06.471Z" },
+ { url = "https://files.pythonhosted.org/packages/7b/7e/f30be3d03de07f25dc0ec926d1681fed5c732d759ac8f51079708c79e680/scipy-1.15.3-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:185cd3d6d05ca4b44a8f1595af87f9c372bb6acf9c808e99aa3e9aa03bd98cf6", size = 30173284, upload-time = "2025-05-08T16:06:11.686Z" },
+ { url = "https://files.pythonhosted.org/packages/07/9c/0ddb0d0abdabe0d181c1793db51f02cd59e4901da6f9f7848e1f96759f0d/scipy-1.15.3-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:05dc6abcd105e1a29f95eada46d4a3f251743cfd7d3ae8ddb4088047f24ea477", size = 22446958, upload-time = "2025-05-08T16:06:15.97Z" },
+ { url = "https://files.pythonhosted.org/packages/af/43/0bce905a965f36c58ff80d8bea33f1f9351b05fad4beaad4eae34699b7a1/scipy-1.15.3-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:06efcba926324df1696931a57a176c80848ccd67ce6ad020c810736bfd58eb1c", size = 25242454, upload-time = "2025-05-08T16:06:20.394Z" },
+ { url = "https://files.pythonhosted.org/packages/56/30/a6f08f84ee5b7b28b4c597aca4cbe545535c39fe911845a96414700b64ba/scipy-1.15.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c05045d8b9bfd807ee1b9f38761993297b10b245f012b11b13b91ba8945f7e45", size = 35210199, upload-time = "2025-05-08T16:06:26.159Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/1f/03f52c282437a168ee2c7c14a1a0d0781a9a4a8962d84ac05c06b4c5b555/scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:271e3713e645149ea5ea3e97b57fdab61ce61333f97cfae392c28ba786f9bb49", size = 37309455, upload-time = "2025-05-08T16:06:32.778Z" },
+ { url = "https://files.pythonhosted.org/packages/89/b1/fbb53137f42c4bf630b1ffdfc2151a62d1d1b903b249f030d2b1c0280af8/scipy-1.15.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:6cfd56fc1a8e53f6e89ba3a7a7251f7396412d655bca2aa5611c8ec9a6784a1e", size = 36885140, upload-time = "2025-05-08T16:06:39.249Z" },
+ { url = "https://files.pythonhosted.org/packages/2e/2e/025e39e339f5090df1ff266d021892694dbb7e63568edcfe43f892fa381d/scipy-1.15.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:0ff17c0bb1cb32952c09217d8d1eed9b53d1463e5f1dd6052c7857f83127d539", size = 39710549, upload-time = "2025-05-08T16:06:45.729Z" },
+ { url = "https://files.pythonhosted.org/packages/e6/eb/3bf6ea8ab7f1503dca3a10df2e4b9c3f6b3316df07f6c0ded94b281c7101/scipy-1.15.3-cp312-cp312-win_amd64.whl", hash = "sha256:52092bc0472cfd17df49ff17e70624345efece4e1a12b23783a1ac59a1b728ed", size = 40966184, upload-time = "2025-05-08T16:06:52.623Z" },
+]
+
+[[package]]
+name = "scipy"
+version = "1.17.0"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version >= '3.12' and sys_platform == 'darwin'",
+ "python_full_version >= '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version >= '3.12' and sys_platform == 'win32'",
+ "python_full_version >= '3.12' and sys_platform == 'emscripten'",
+ "(python_full_version >= '3.12' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version == '3.11.*' and sys_platform == 'darwin'",
+ "python_full_version == '3.11.*' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version == '3.11.*' and sys_platform == 'win32'",
+ "python_full_version == '3.11.*' and sys_platform == 'emscripten'",
+ "(python_full_version == '3.11.*' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version == '3.11.*' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+]
+dependencies = [
+ { name = "numpy", marker = "python_full_version >= '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/56/3e/9cca699f3486ce6bc12ff46dc2031f1ec8eb9ccc9a320fdaf925f1417426/scipy-1.17.0.tar.gz", hash = "sha256:2591060c8e648d8b96439e111ac41fd8342fdeff1876be2e19dea3fe8930454e", size = 30396830, upload-time = "2026-01-10T21:34:23.009Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1e/4b/c89c131aa87cad2b77a54eb0fb94d633a842420fa7e919dc2f922037c3d8/scipy-1.17.0-cp311-cp311-macosx_10_14_x86_64.whl", hash = "sha256:2abd71643797bd8a106dff97894ff7869eeeb0af0f7a5ce02e4227c6a2e9d6fd", size = 31381316, upload-time = "2026-01-10T21:24:33.42Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/5f/a6b38f79a07d74989224d5f11b55267714707582908a5f1ae854cf9a9b84/scipy-1.17.0-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:ef28d815f4d2686503e5f4f00edc387ae58dfd7a2f42e348bb53359538f01558", size = 27966760, upload-time = "2026-01-10T21:24:38.911Z" },
+ { url = "https://files.pythonhosted.org/packages/c1/20/095ad24e031ee8ed3c5975954d816b8e7e2abd731e04f8be573de8740885/scipy-1.17.0-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:272a9f16d6bb4667e8b50d25d71eddcc2158a214df1b566319298de0939d2ab7", size = 20138701, upload-time = "2026-01-10T21:24:43.249Z" },
+ { url = "https://files.pythonhosted.org/packages/89/11/4aad2b3858d0337756f3323f8960755704e530b27eb2a94386c970c32cbe/scipy-1.17.0-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:7204fddcbec2fe6598f1c5fdf027e9f259106d05202a959a9f1aecf036adc9f6", size = 22480574, upload-time = "2026-01-10T21:24:47.266Z" },
+ { url = "https://files.pythonhosted.org/packages/85/bd/f5af70c28c6da2227e510875cadf64879855193a687fb19951f0f44cfd6b/scipy-1.17.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fc02c37a5639ee67d8fb646ffded6d793c06c5622d36b35cfa8fe5ececb8f042", size = 32862414, upload-time = "2026-01-10T21:24:52.566Z" },
+ { url = "https://files.pythonhosted.org/packages/ef/df/df1457c4df3826e908879fe3d76bc5b6e60aae45f4ee42539512438cfd5d/scipy-1.17.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:dac97a27520d66c12a34fd90a4fe65f43766c18c0d6e1c0a80f114d2260080e4", size = 35112380, upload-time = "2026-01-10T21:24:58.433Z" },
+ { url = "https://files.pythonhosted.org/packages/5f/bb/88e2c16bd1dd4de19d80d7c5e238387182993c2fb13b4b8111e3927ad422/scipy-1.17.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:ebb7446a39b3ae0fe8f416a9a3fdc6fba3f11c634f680f16a239c5187bc487c0", size = 34922676, upload-time = "2026-01-10T21:25:04.287Z" },
+ { url = "https://files.pythonhosted.org/packages/02/ba/5120242cc735f71fc002cff0303d536af4405eb265f7c60742851e7ccfe9/scipy-1.17.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:474da16199f6af66601a01546144922ce402cb17362e07d82f5a6cf8f963e449", size = 37507599, upload-time = "2026-01-10T21:25:09.851Z" },
+ { url = "https://files.pythonhosted.org/packages/52/c8/08629657ac6c0da198487ce8cd3de78e02cfde42b7f34117d56a3fe249dc/scipy-1.17.0-cp311-cp311-win_amd64.whl", hash = "sha256:255c0da161bd7b32a6c898e7891509e8a9289f0b1c6c7d96142ee0d2b114c2ea", size = 36380284, upload-time = "2026-01-10T21:25:15.632Z" },
+ { url = "https://files.pythonhosted.org/packages/6c/4a/465f96d42c6f33ad324a40049dfd63269891db9324aa66c4a1c108c6f994/scipy-1.17.0-cp311-cp311-win_arm64.whl", hash = "sha256:85b0ac3ad17fa3be50abd7e69d583d98792d7edc08367e01445a1e2076005379", size = 24370427, upload-time = "2026-01-10T21:25:20.514Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/11/7241a63e73ba5a516f1930ac8d5b44cbbfabd35ac73a2d08ca206df007c4/scipy-1.17.0-cp312-cp312-macosx_10_14_x86_64.whl", hash = "sha256:0d5018a57c24cb1dd828bcf51d7b10e65986d549f52ef5adb6b4d1ded3e32a57", size = 31364580, upload-time = "2026-01-10T21:25:25.717Z" },
+ { url = "https://files.pythonhosted.org/packages/ed/1d/5057f812d4f6adc91a20a2d6f2ebcdb517fdbc87ae3acc5633c9b97c8ba5/scipy-1.17.0-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:88c22af9e5d5a4f9e027e26772cc7b5922fab8bcc839edb3ae33de404feebd9e", size = 27969012, upload-time = "2026-01-10T21:25:30.921Z" },
+ { url = "https://files.pythonhosted.org/packages/e3/21/f6ec556c1e3b6ec4e088da667d9987bb77cc3ab3026511f427dc8451187d/scipy-1.17.0-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:f3cd947f20fe17013d401b64e857c6b2da83cae567adbb75b9dcba865abc66d8", size = 20140691, upload-time = "2026-01-10T21:25:34.802Z" },
+ { url = "https://files.pythonhosted.org/packages/7a/fe/5e5ad04784964ba964a96f16c8d4676aa1b51357199014dce58ab7ec5670/scipy-1.17.0-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:e8c0b331c2c1f531eb51f1b4fc9ba709521a712cce58f1aa627bc007421a5306", size = 22463015, upload-time = "2026-01-10T21:25:39.277Z" },
+ { url = "https://files.pythonhosted.org/packages/4a/69/7c347e857224fcaf32a34a05183b9d8a7aca25f8f2d10b8a698b8388561a/scipy-1.17.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5194c445d0a1c7a6c1a4a4681b6b7c71baad98ff66d96b949097e7513c9d6742", size = 32724197, upload-time = "2026-01-10T21:25:44.084Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/fe/66d73b76d378ba8cc2fe605920c0c75092e3a65ae746e1e767d9d020a75a/scipy-1.17.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9eeb9b5f5997f75507814ed9d298ab23f62cf79f5a3ef90031b1ee2506abdb5b", size = 35009148, upload-time = "2026-01-10T21:25:50.591Z" },
+ { url = "https://files.pythonhosted.org/packages/af/07/07dec27d9dc41c18d8c43c69e9e413431d20c53a0339c388bcf72f353c4b/scipy-1.17.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:40052543f7bbe921df4408f46003d6f01c6af109b9e2c8a66dd1cf6cf57f7d5d", size = 34798766, upload-time = "2026-01-10T21:25:59.41Z" },
+ { url = "https://files.pythonhosted.org/packages/81/61/0470810c8a093cdacd4ba7504b8a218fd49ca070d79eca23a615f5d9a0b0/scipy-1.17.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:0cf46c8013fec9d3694dc572f0b54100c28405d55d3e2cb15e2895b25057996e", size = 37405953, upload-time = "2026-01-10T21:26:07.75Z" },
+ { url = "https://files.pythonhosted.org/packages/92/ce/672ed546f96d5d41ae78c4b9b02006cedd0b3d6f2bf5bb76ea455c320c28/scipy-1.17.0-cp312-cp312-win_amd64.whl", hash = "sha256:0937a0b0d8d593a198cededd4c439a0ea216a3f36653901ea1f3e4be949056f8", size = 36328121, upload-time = "2026-01-10T21:26:16.509Z" },
+ { url = "https://files.pythonhosted.org/packages/9d/21/38165845392cae67b61843a52c6455d47d0cc2a40dd495c89f4362944654/scipy-1.17.0-cp312-cp312-win_arm64.whl", hash = "sha256:f603d8a5518c7426414d1d8f82e253e454471de682ce5e39c29adb0df1efb86b", size = 24314368, upload-time = "2026-01-10T21:26:23.087Z" },
+]
+
+[[package]]
+name = "send2trash"
+version = "2.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/c5/f0/184b4b5f8d00f2a92cf96eec8967a3d550b52cf94362dad1100df9e48d57/send2trash-2.1.0.tar.gz", hash = "sha256:1c72b39f09457db3c05ce1d19158c2cbef4c32b8bedd02c155e49282b7ea7459", size = 17255, upload-time = "2026-01-14T06:27:36.056Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1c/78/504fdd027da3b84ff1aecd9f6957e65f35134534ccc6da8628eb71e76d3f/send2trash-2.1.0-py3-none-any.whl", hash = "sha256:0da2f112e6d6bb22de6aa6daa7e144831a4febf2a87261451c4ad849fe9a873c", size = 17610, upload-time = "2026-01-14T06:27:35.218Z" },
+]
+
+[[package]]
+name = "setuptools"
+version = "80.10.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/76/95/faf61eb8363f26aa7e1d762267a8d602a1b26d4f3a1e758e92cb3cb8b054/setuptools-80.10.2.tar.gz", hash = "sha256:8b0e9d10c784bf7d262c4e5ec5d4ec94127ce206e8738f29a437945fbc219b70", size = 1200343, upload-time = "2026-01-25T22:38:17.252Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/94/b8/f1f62a5e3c0ad2ff1d189590bfa4c46b4f3b6e49cef6f26c6ee4e575394d/setuptools-80.10.2-py3-none-any.whl", hash = "sha256:95b30ddfb717250edb492926c92b5221f7ef3fbcc2b07579bcd4a27da21d0173", size = 1064234, upload-time = "2026-01-25T22:38:15.216Z" },
+]
+
+[[package]]
+name = "shellingham"
+version = "1.5.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/58/15/8b3609fd3830ef7b27b655beb4b4e9c62313a4e8da8c676e142cc210d58e/shellingham-1.5.4.tar.gz", hash = "sha256:8dbca0739d487e5bd35ab3ca4b36e11c4078f3a234bfce294b0a0291363404de", size = 10310, upload-time = "2023-10-24T04:13:40.426Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e0/f9/0595336914c5619e5f28a1fb793285925a8cd4b432c9da0a987836c7f822/shellingham-1.5.4-py2.py3-none-any.whl", hash = "sha256:7ecfff8f2fd72616f7481040475a65b2bf8af90a56c89140852d1120324e8686", size = 9755, upload-time = "2023-10-24T04:13:38.866Z" },
+]
+
+[[package]]
+name = "six"
+version = "1.17.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/94/e7/b2c673351809dca68a0e064b6af791aa332cf192da575fd474ed7d6f16a2/six-1.17.0.tar.gz", hash = "sha256:ff70335d468e7eb6ec65b95b99d3a2836546063f63acc5171de367e834932a81", size = 34031, upload-time = "2024-12-04T17:35:28.174Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/b7/ce/149a00dd41f10bc29e5921b496af8b574d8413afcd5e30dfa0ed46c2cc5e/six-1.17.0-py2.py3-none-any.whl", hash = "sha256:4721f391ed90541fddacab5acf947aa0d3dc7d27b2e1e8eda2be8970586c3274", size = 11050, upload-time = "2024-12-04T17:35:26.475Z" },
+]
+
+[[package]]
+name = "smmap"
+version = "5.0.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/44/cd/a040c4b3119bbe532e5b0732286f805445375489fceaec1f48306068ee3b/smmap-5.0.2.tar.gz", hash = "sha256:26ea65a03958fa0c8a1c7e8c7a58fdc77221b8910f6be2131affade476898ad5", size = 22329, upload-time = "2025-01-02T07:14:40.909Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/04/be/d09147ad1ec7934636ad912901c5fd7667e1c858e19d355237db0d0cd5e4/smmap-5.0.2-py3-none-any.whl", hash = "sha256:b30115f0def7d7531d22a0fb6502488d879e75b260a9db4d0819cfb25403af5e", size = 24303, upload-time = "2025-01-02T07:14:38.724Z" },
+]
+
+[[package]]
+name = "soupsieve"
+version = "2.8.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/7b/ae/2d9c981590ed9999a0d91755b47fc74f74de286b0f5cee14c9269041e6c4/soupsieve-2.8.3.tar.gz", hash = "sha256:3267f1eeea4251fb42728b6dfb746edc9acaffc4a45b27e19450b676586e8349", size = 118627, upload-time = "2026-01-20T04:27:02.457Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/46/2c/1462b1d0a634697ae9e55b3cecdcb64788e8b7d63f54d923fcd0bb140aed/soupsieve-2.8.3-py3-none-any.whl", hash = "sha256:ed64f2ba4eebeab06cc4962affce381647455978ffc1e36bb79a545b91f45a95", size = 37016, upload-time = "2026-01-20T04:27:01.012Z" },
+]
+
+[[package]]
+name = "stack-data"
+version = "0.6.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "asttokens" },
+ { name = "executing" },
+ { name = "pure-eval" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/28/e3/55dcc2cfbc3ca9c29519eb6884dd1415ecb53b0e934862d3559ddcb7e20b/stack_data-0.6.3.tar.gz", hash = "sha256:836a778de4fec4dcd1dcd89ed8abff8a221f58308462e1c4aa2a3cf30148f0b9", size = 44707, upload-time = "2023-09-30T13:58:05.479Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/f1/7b/ce1eafaf1a76852e2ec9b22edecf1daa58175c090266e9f6c64afcd81d91/stack_data-0.6.3-py3-none-any.whl", hash = "sha256:d5558e0c25a4cb0853cddad3d77da9891a08cb85dd9f9f91b9f8cd66e511e695", size = 24521, upload-time = "2023-09-30T13:58:03.53Z" },
+]
+
+[[package]]
+name = "stdlibs"
+version = "2025.10.28"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/e4/83/ac15c4a3c059725dcb5f5d76270b986808cc12d2d7d417ee540d37609e46/stdlibs-2025.10.28.tar.gz", hash = "sha256:18db81f45f7783ddf86b80771e061782c70e2f4a8642843b3c80b42cd774b24f", size = 20108, upload-time = "2025-10-28T22:14:42.308Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/65/de/5fcc806280950b9535d3892c7f1f3477efc4c2f8624ae6c0b2c3baf9a339/stdlibs-2025.10.28-py3-none-any.whl", hash = "sha256:fc25a3608c417c7fecec06736a2671adaceafc9f20c3f536d967e894a998afea", size = 59232, upload-time = "2025-10-28T22:14:40.799Z" },
+]
+
+[[package]]
+name = "submitit"
+version = "1.5.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "cloudpickle" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/47/86/497018fb3b74e71bef45df82762b176e6b3d159f29941c20d2f141ec4096/submitit-1.5.4.tar.gz", hash = "sha256:7100848bd1cdda79c7196e54ee830793ae75fd7adde0c5bef738d72360a07508", size = 81538, upload-time = "2025-12-17T19:20:03.396Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ea/bb/711e1c2ebd18a21202c972dd5d5c8e09a921f2d3560e3a53d6350c808ab7/submitit-1.5.4-py3-none-any.whl", hash = "sha256:c26f3a7c8d4150eaf70b1da71e2023e9e9936c93e8342ed7db910f29158561c5", size = 76043, upload-time = "2025-12-17T19:20:01.941Z" },
+]
+
+[[package]]
+name = "sympy"
+version = "1.14.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "mpmath" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/83/d3/803453b36afefb7c2bb238361cd4ae6125a569b4db67cd9e79846ba2d68c/sympy-1.14.0.tar.gz", hash = "sha256:d3d3fe8df1e5a0b42f0e7bdf50541697dbe7d23746e894990c030e2b05e72517", size = 7793921, upload-time = "2025-04-27T18:05:01.611Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/a2/09/77d55d46fd61b4a135c444fc97158ef34a095e5681d0a6c10b75bf356191/sympy-1.14.0-py3-none-any.whl", hash = "sha256:e091cc3e99d2141a0ba2847328f5479b05d94a6635cb96148ccb3f34671bd8f5", size = 6299353, upload-time = "2025-04-27T18:04:59.103Z" },
+]
+
+[[package]]
+name = "tabulate"
+version = "0.9.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/ec/fe/802052aecb21e3797b8f7902564ab6ea0d60ff8ca23952079064155d1ae1/tabulate-0.9.0.tar.gz", hash = "sha256:0095b12bf5966de529c0feb1fa08671671b3368eec77d7ef7ab114be2c068b3c", size = 81090, upload-time = "2022-10-06T17:21:48.54Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/40/44/4a5f08c96eb108af5cb50b41f76142f0afa346dfa99d5296fe7202a11854/tabulate-0.9.0-py3-none-any.whl", hash = "sha256:024ca478df22e9340661486f85298cff5f6dcdba14f3813e8830015b9ed1948f", size = 35252, upload-time = "2022-10-06T17:21:44.262Z" },
+]
+
+[[package]]
+name = "tensorboard"
+version = "2.20.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "absl-py" },
+ { name = "grpcio" },
+ { name = "markdown" },
+ { name = "numpy" },
+ { name = "packaging" },
+ { name = "pillow" },
+ { name = "protobuf" },
+ { name = "setuptools" },
+ { name = "tensorboard-data-server" },
+ { name = "werkzeug" },
+]
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/9c/d9/a5db55f88f258ac669a92858b70a714bbbd5acd993820b41ec4a96a4d77f/tensorboard-2.20.0-py3-none-any.whl", hash = "sha256:9dc9f978cb84c0723acf9a345d96c184f0293d18f166bb8d59ee098e6cfaaba6", size = 5525680, upload-time = "2025-07-17T19:20:49.638Z" },
+]
+
+[[package]]
+name = "tensorboard-data-server"
+version = "0.7.2"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/7a/13/e503968fefabd4c6b2650af21e110aa8466fe21432cd7c43a84577a89438/tensorboard_data_server-0.7.2-py3-none-any.whl", hash = "sha256:7e0610d205889588983836ec05dc098e80f97b7e7bbff7e994ebb78f578d0ddb", size = 2356, upload-time = "2023-10-23T21:23:32.16Z" },
+ { url = "https://files.pythonhosted.org/packages/b7/85/dabeaf902892922777492e1d253bb7e1264cadce3cea932f7ff599e53fea/tensorboard_data_server-0.7.2-py3-none-macosx_10_9_x86_64.whl", hash = "sha256:9fe5d24221b29625dbc7328b0436ca7fc1c23de4acf4d272f1180856e32f9f60", size = 4823598, upload-time = "2023-10-23T21:23:33.714Z" },
+ { url = "https://files.pythonhosted.org/packages/73/c6/825dab04195756cf8ff2e12698f22513b3db2f64925bdd41671bfb33aaa5/tensorboard_data_server-0.7.2-py3-none-manylinux_2_31_x86_64.whl", hash = "sha256:ef687163c24185ae9754ed5650eb5bc4d84ff257aabdc33f0cc6f74d8ba54530", size = 6590363, upload-time = "2023-10-23T21:23:35.583Z" },
+]
+
+[[package]]
+name = "termcolor"
+version = "3.3.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/46/79/cf31d7a93a8fdc6aa0fbb665be84426a8c5a557d9240b6239e9e11e35fc5/termcolor-3.3.0.tar.gz", hash = "sha256:348871ca648ec6a9a983a13ab626c0acce02f515b9e1983332b17af7979521c5", size = 14434, upload-time = "2025-12-29T12:55:21.882Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/33/d1/8bb87d21e9aeb323cc03034f5eaf2c8f69841e40e4853c2627edf8111ed3/termcolor-3.3.0-py3-none-any.whl", hash = "sha256:cf642efadaf0a8ebbbf4bc7a31cec2f9b5f21a9f726f4ccbb08192c9c26f43a5", size = 7734, upload-time = "2025-12-29T12:55:20.718Z" },
+]
+
+[[package]]
+name = "terminado"
+version = "0.18.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "ptyprocess", marker = "os_name != 'nt'" },
+ { name = "pywinpty", marker = "(os_name == 'nt' and platform_machine != 'aarch64' and sys_platform == 'linux') or (os_name == 'nt' and sys_platform != 'darwin' and sys_platform != 'linux')" },
+ { name = "tornado" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/8a/11/965c6fd8e5cc254f1fe142d547387da17a8ebfd75a3455f637c663fb38a0/terminado-0.18.1.tar.gz", hash = "sha256:de09f2c4b85de4765f7714688fff57d3e75bad1f909b589fde880460c753fd2e", size = 32701, upload-time = "2024-03-12T14:34:39.026Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/6a/9e/2064975477fdc887e47ad42157e214526dcad8f317a948dee17e1659a62f/terminado-0.18.1-py3-none-any.whl", hash = "sha256:a4468e1b37bb318f8a86514f65814e1afc977cf29b3992a4500d9dd305dcceb0", size = 14154, upload-time = "2024-03-12T14:34:36.569Z" },
+]
+
+[[package]]
+name = "threadpoolctl"
+version = "3.6.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/b7/4d/08c89e34946fce2aec4fbb45c9016efd5f4d7f24af8e5d93296e935631d8/threadpoolctl-3.6.0.tar.gz", hash = "sha256:8ab8b4aa3491d812b623328249fab5302a68d2d71745c8a4c719a2fcaba9f44e", size = 21274, upload-time = "2025-03-13T13:49:23.031Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/32/d5/f9a850d79b0851d1d4ef6456097579a9005b31fea68726a4ae5f2d82ddd9/threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb", size = 18638, upload-time = "2025-03-13T13:49:21.846Z" },
+]
+
+[[package]]
+name = "tifffile"
+version = "2025.5.10"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version < '3.11' and sys_platform == 'darwin'",
+ "python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "(python_full_version < '3.11' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.11' and sys_platform != 'darwin' and sys_platform != 'linux')",
+]
+dependencies = [
+ { name = "numpy", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/44/d0/18fed0fc0916578a4463f775b0fbd9c5fed2392152d039df2fb533bfdd5d/tifffile-2025.5.10.tar.gz", hash = "sha256:018335d34283aa3fd8c263bae5c3c2b661ebc45548fde31504016fcae7bf1103", size = 365290, upload-time = "2025-05-10T19:22:34.386Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/5d/06/bd0a6097da704a7a7c34a94cfd771c3ea3c2f405dd214e790d22c93f6be1/tifffile-2025.5.10-py3-none-any.whl", hash = "sha256:e37147123c0542d67bc37ba5cdd67e12ea6fbe6e86c52bee037a9eb6a064e5ad", size = 226533, upload-time = "2025-05-10T19:22:27.279Z" },
+]
+
+[[package]]
+name = "tifffile"
+version = "2026.1.28"
+source = { registry = "https://pypi.org/simple" }
+resolution-markers = [
+ "python_full_version >= '3.12' and sys_platform == 'darwin'",
+ "python_full_version >= '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version >= '3.12' and sys_platform == 'win32'",
+ "python_full_version >= '3.12' and sys_platform == 'emscripten'",
+ "(python_full_version >= '3.12' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+ "python_full_version == '3.11.*' and sys_platform == 'darwin'",
+ "python_full_version == '3.11.*' and platform_machine == 'aarch64' and sys_platform == 'linux'",
+ "python_full_version == '3.11.*' and sys_platform == 'win32'",
+ "python_full_version == '3.11.*' and sys_platform == 'emscripten'",
+ "(python_full_version == '3.11.*' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version == '3.11.*' and sys_platform != 'darwin' and sys_platform != 'emscripten' and sys_platform != 'linux' and sys_platform != 'win32')",
+]
+dependencies = [
+ { name = "numpy", marker = "python_full_version >= '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/94/32/38498d2a1a5d70f33f6c3909bbad48557c9a54b0e33a9307ff06b6d416ba/tifffile-2026.1.28.tar.gz", hash = "sha256:537ae6466a8bb555c336108bb1878d8319d52c9c738041d3349454dea6956e1c", size = 374675, upload-time = "2026-01-29T05:17:24.992Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/09/19/529b28ca338c5a88315e71e672badc85eef89460c248c4164f6ce058f8c7/tifffile-2026.1.28-py3-none-any.whl", hash = "sha256:45b08a19cf603dd99952eff54a61519626a1912e4e2a4d355f05938fe4a6e9fd", size = 233011, upload-time = "2026-01-29T05:17:23.078Z" },
+]
+
+[[package]]
+name = "timm"
+version = "1.0.24"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "huggingface-hub" },
+ { name = "pyyaml" },
+ { name = "safetensors" },
+ { name = "torch" },
+ { name = "torchvision" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/f4/9d/0ea45640be447445c8664ce2b10c74f763b0b0b9ed11620d41a4d4baa10c/timm-1.0.24.tar.gz", hash = "sha256:c7b909f43fe2ef8fe62c505e270cd4f1af230dfbc37f2ee93e3608492b9d9a40", size = 2412239, upload-time = "2026-01-07T00:26:17.541Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/92/dd/c1f5b0890f7b5db661bde0864b41cb0275be76851047e5f7e085fe0b455a/timm-1.0.24-py3-none-any.whl", hash = "sha256:8301ac783410c6ad72c73c49326af6d71a9e4d1558238552796e825c2464913f", size = 2560563, upload-time = "2026-01-07T00:26:13.956Z" },
+]
+
+[[package]]
+name = "tinycss2"
+version = "1.4.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "webencodings" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/7a/fd/7a5ee21fd08ff70d3d33a5781c255cbe779659bd03278feb98b19ee550f4/tinycss2-1.4.0.tar.gz", hash = "sha256:10c0972f6fc0fbee87c3edb76549357415e94548c1ae10ebccdea16fb404a9b7", size = 87085, upload-time = "2024-10-24T14:58:29.895Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e6/34/ebdc18bae6aa14fbee1a08b63c015c72b64868ff7dae68808ab500c492e2/tinycss2-1.4.0-py3-none-any.whl", hash = "sha256:3a49cf47b7675da0b15d0c6e1df8df4ebd96e9394bb905a5775adb0d884c5289", size = 26610, upload-time = "2024-10-24T14:58:28.029Z" },
+]
+
+[[package]]
+name = "toml"
+version = "0.10.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/be/ba/1f744cdc819428fc6b5084ec34d9b30660f6f9daaf70eead706e3203ec3c/toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f", size = 22253, upload-time = "2020-11-01T01:40:22.204Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/44/6f/7120676b6d73228c96e17f1f794d8ab046fc910d781c8d151120c3f1569e/toml-0.10.2-py2.py3-none-any.whl", hash = "sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b", size = 16588, upload-time = "2020-11-01T01:40:20.672Z" },
+]
+
+[[package]]
+name = "tomli"
+version = "2.4.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/82/30/31573e9457673ab10aa432461bee537ce6cef177667deca369efb79df071/tomli-2.4.0.tar.gz", hash = "sha256:aa89c3f6c277dd275d8e243ad24f3b5e701491a860d5121f2cdd399fbb31fc9c", size = 17477, upload-time = "2026-01-11T11:22:38.165Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3c/d9/3dc2289e1f3b32eb19b9785b6a006b28ee99acb37d1d47f78d4c10e28bf8/tomli-2.4.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:b5ef256a3fd497d4973c11bf142e9ed78b150d36f5773f1ca6088c230ffc5867", size = 153663, upload-time = "2026-01-11T11:21:45.27Z" },
+ { url = "https://files.pythonhosted.org/packages/51/32/ef9f6845e6b9ca392cd3f64f9ec185cc6f09f0a2df3db08cbe8809d1d435/tomli-2.4.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:5572e41282d5268eb09a697c89a7bee84fae66511f87533a6f88bd2f7b652da9", size = 148469, upload-time = "2026-01-11T11:21:46.873Z" },
+ { url = "https://files.pythonhosted.org/packages/d6/c2/506e44cce89a8b1b1e047d64bd495c22c9f71f21e05f380f1a950dd9c217/tomli-2.4.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:551e321c6ba03b55676970b47cb1b73f14a0a4dce6a3e1a9458fd6d921d72e95", size = 236039, upload-time = "2026-01-11T11:21:48.503Z" },
+ { url = "https://files.pythonhosted.org/packages/b3/40/e1b65986dbc861b7e986e8ec394598187fa8aee85b1650b01dd925ca0be8/tomli-2.4.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5e3f639a7a8f10069d0e15408c0b96a2a828cfdec6fca05296ebcdcc28ca7c76", size = 243007, upload-time = "2026-01-11T11:21:49.456Z" },
+ { url = "https://files.pythonhosted.org/packages/9c/6f/6e39ce66b58a5b7ae572a0f4352ff40c71e8573633deda43f6a379d56b3e/tomli-2.4.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1b168f2731796b045128c45982d3a4874057626da0e2ef1fdd722848b741361d", size = 240875, upload-time = "2026-01-11T11:21:50.755Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/ad/cb089cb190487caa80204d503c7fd0f4d443f90b95cf4ef5cf5aa0f439b0/tomli-2.4.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:133e93646ec4300d651839d382d63edff11d8978be23da4cc106f5a18b7d0576", size = 246271, upload-time = "2026-01-11T11:21:51.81Z" },
+ { url = "https://files.pythonhosted.org/packages/0b/63/69125220e47fd7a3a27fd0de0c6398c89432fec41bc739823bcc66506af6/tomli-2.4.0-cp311-cp311-win32.whl", hash = "sha256:b6c78bdf37764092d369722d9946cb65b8767bfa4110f902a1b2542d8d173c8a", size = 96770, upload-time = "2026-01-11T11:21:52.647Z" },
+ { url = "https://files.pythonhosted.org/packages/1e/0d/a22bb6c83f83386b0008425a6cd1fa1c14b5f3dd4bad05e98cf3dbbf4a64/tomli-2.4.0-cp311-cp311-win_amd64.whl", hash = "sha256:d3d1654e11d724760cdb37a3d7691f0be9db5fbdaef59c9f532aabf87006dbaa", size = 107626, upload-time = "2026-01-11T11:21:53.459Z" },
+ { url = "https://files.pythonhosted.org/packages/2f/6d/77be674a3485e75cacbf2ddba2b146911477bd887dda9d8c9dfb2f15e871/tomli-2.4.0-cp311-cp311-win_arm64.whl", hash = "sha256:cae9c19ed12d4e8f3ebf46d1a75090e4c0dc16271c5bce1c833ac168f08fb614", size = 94842, upload-time = "2026-01-11T11:21:54.831Z" },
+ { url = "https://files.pythonhosted.org/packages/3c/43/7389a1869f2f26dba52404e1ef13b4784b6b37dac93bac53457e3ff24ca3/tomli-2.4.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:920b1de295e72887bafa3ad9f7a792f811847d57ea6b1215154030cf131f16b1", size = 154894, upload-time = "2026-01-11T11:21:56.07Z" },
+ { url = "https://files.pythonhosted.org/packages/e9/05/2f9bf110b5294132b2edf13fe6ca6ae456204f3d749f623307cbb7a946f2/tomli-2.4.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:7d6d9a4aee98fac3eab4952ad1d73aee87359452d1c086b5ceb43ed02ddb16b8", size = 149053, upload-time = "2026-01-11T11:21:57.467Z" },
+ { url = "https://files.pythonhosted.org/packages/e8/41/1eda3ca1abc6f6154a8db4d714a4d35c4ad90adc0bcf700657291593fbf3/tomli-2.4.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:36b9d05b51e65b254ea6c2585b59d2c4cb91c8a3d91d0ed0f17591a29aaea54a", size = 243481, upload-time = "2026-01-11T11:21:58.661Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/6d/02ff5ab6c8868b41e7d4b987ce2b5f6a51d3335a70aa144edd999e055a01/tomli-2.4.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1c8a885b370751837c029ef9bc014f27d80840e48bac415f3412e6593bbc18c1", size = 251720, upload-time = "2026-01-11T11:22:00.178Z" },
+ { url = "https://files.pythonhosted.org/packages/7b/57/0405c59a909c45d5b6f146107c6d997825aa87568b042042f7a9c0afed34/tomli-2.4.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8768715ffc41f0008abe25d808c20c3d990f42b6e2e58305d5da280ae7d1fa3b", size = 247014, upload-time = "2026-01-11T11:22:01.238Z" },
+ { url = "https://files.pythonhosted.org/packages/2c/0e/2e37568edd944b4165735687cbaf2fe3648129e440c26d02223672ee0630/tomli-2.4.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:7b438885858efd5be02a9a133caf5812b8776ee0c969fea02c45e8e3f296ba51", size = 251820, upload-time = "2026-01-11T11:22:02.727Z" },
+ { url = "https://files.pythonhosted.org/packages/5a/1c/ee3b707fdac82aeeb92d1a113f803cf6d0f37bdca0849cb489553e1f417a/tomli-2.4.0-cp312-cp312-win32.whl", hash = "sha256:0408e3de5ec77cc7f81960c362543cbbd91ef883e3138e81b729fc3eea5b9729", size = 97712, upload-time = "2026-01-11T11:22:03.777Z" },
+ { url = "https://files.pythonhosted.org/packages/69/13/c07a9177d0b3bab7913299b9278845fc6eaaca14a02667c6be0b0a2270c8/tomli-2.4.0-cp312-cp312-win_amd64.whl", hash = "sha256:685306e2cc7da35be4ee914fd34ab801a6acacb061b6a7abca922aaf9ad368da", size = 108296, upload-time = "2026-01-11T11:22:04.86Z" },
+ { url = "https://files.pythonhosted.org/packages/18/27/e267a60bbeeee343bcc279bb9e8fbed0cbe224bc7b2a3dc2975f22809a09/tomli-2.4.0-cp312-cp312-win_arm64.whl", hash = "sha256:5aa48d7c2356055feef06a43611fc401a07337d5b006be13a30f6c58f869e3c3", size = 94553, upload-time = "2026-01-11T11:22:05.854Z" },
+ { url = "https://files.pythonhosted.org/packages/23/d1/136eb2cb77520a31e1f64cbae9d33ec6df0d78bdf4160398e86eec8a8754/tomli-2.4.0-py3-none-any.whl", hash = "sha256:1f776e7d669ebceb01dee46484485f43a4048746235e683bcdffacdf1fb4785a", size = 14477, upload-time = "2026-01-11T11:22:37.446Z" },
+]
+
+[[package]]
+name = "tomlkit"
+version = "0.14.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/c3/af/14b24e41977adb296d6bd1fb59402cf7d60ce364f90c890bd2ec65c43b5a/tomlkit-0.14.0.tar.gz", hash = "sha256:cf00efca415dbd57575befb1f6634c4f42d2d87dbba376128adb42c121b87064", size = 187167, upload-time = "2026-01-13T01:14:53.304Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/b5/11/87d6d29fb5d237229d67973a6c9e06e048f01cf4994dee194ab0ea841814/tomlkit-0.14.0-py3-none-any.whl", hash = "sha256:592064ed85b40fa213469f81ac584f67a4f2992509a7c3ea2d632208623a3680", size = 39310, upload-time = "2026-01-13T01:14:51.965Z" },
+]
+
+[[package]]
+name = "torch"
+version = "2.11.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "cuda-bindings", marker = "sys_platform == 'linux'" },
+ { name = "cuda-toolkit", extra = ["cublas", "cudart", "cufft", "cufile", "cupti", "curand", "cusolver", "cusparse", "nvjitlink", "nvrtc", "nvtx"], marker = "sys_platform == 'linux'" },
+ { name = "filelock" },
+ { name = "fsspec" },
+ { name = "jinja2" },
+ { name = "networkx", version = "3.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+ { name = "networkx", version = "3.6.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+ { name = "nvidia-cudnn-cu13", marker = "sys_platform == 'linux'" },
+ { name = "nvidia-cusparselt-cu13", marker = "sys_platform == 'linux'" },
+ { name = "nvidia-nccl-cu13", marker = "sys_platform == 'linux'" },
+ { name = "nvidia-nvshmem-cu13", marker = "sys_platform == 'linux'" },
+ { name = "setuptools" },
+ { name = "sympy" },
+ { name = "triton", marker = "sys_platform == 'linux'" },
+ { name = "typing-extensions" },
+]
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ac/f2/c1690994afe461aae2d0cac62251e6802a703dec0a6c549c02ecd0de92a9/torch-2.11.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:2c0d7fcfbc0c4e8bb5ebc3907cbc0c6a0da1b8f82b1fc6e14e914fa0b9baf74e", size = 80526521, upload-time = "2026-03-23T18:12:06.86Z" },
+ { url = "https://files.pythonhosted.org/packages/a4/f0/98ae802fa8c09d3149b0c8690741f3f5753c90e779bd28c9613257295945/torch-2.11.0-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:4cf8687f4aec3900f748d553483ef40e0ac38411c3c48d0a86a438f6d7a99b18", size = 419723025, upload-time = "2026-03-23T18:11:43.774Z" },
+ { url = "https://files.pythonhosted.org/packages/f9/1e/18a9b10b4bd34f12d4e561c52b0ae7158707b8193c6cfc0aad2b48167090/torch-2.11.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:1b32ceda909818a03b112006709b02be1877240c31750a8d9c6b7bf5f2d8a6e5", size = 530589207, upload-time = "2026-03-23T18:11:23.756Z" },
+ { url = "https://files.pythonhosted.org/packages/35/40/2d532e8c0e23705be9d1debce5bc37b68d59a39bda7584c26fe9668076fe/torch-2.11.0-cp310-cp310-win_amd64.whl", hash = "sha256:b3c712ae6fb8e7a949051a953fc412fe0a6940337336c3b6f905e905dac5157f", size = 114518313, upload-time = "2026-03-23T18:11:58.281Z" },
+ { url = "https://files.pythonhosted.org/packages/ae/0d/98b410492609e34a155fa8b121b55c7dca229f39636851c3a9ec20edea21/torch-2.11.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7b6a60d48062809f58595509c524b88e6ddec3ebe25833d6462eeab81e5f2ce4", size = 80529712, upload-time = "2026-03-23T18:12:02.608Z" },
+ { url = "https://files.pythonhosted.org/packages/84/03/acea680005f098f79fd70c1d9d5ccc0cb4296ec2af539a0450108232fc0c/torch-2.11.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:d91aac77f24082809d2c5a93f52a5f085032740a1ebc9252a7b052ef5a4fddc6", size = 419718178, upload-time = "2026-03-23T18:10:46.675Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/8b/d7be22fbec9ffee6cff31a39f8750d4b3a65d349a286cf4aec74c2375662/torch-2.11.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:7aa2f9bbc6d4595ba72138026b2074be1233186150e9292865e04b7a63b8c67a", size = 530604548, upload-time = "2026-03-23T18:10:03.569Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/bd/9912d30b68845256aabbb4a40aeefeef3c3b20db5211ccda653544ada4b6/torch-2.11.0-cp311-cp311-win_amd64.whl", hash = "sha256:73e24aaf8f36ab90d95cd1761208b2eb70841c2a9ca1a3f9061b39fc5331b708", size = 114519675, upload-time = "2026-03-23T18:11:52.995Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/8b/69e3008d78e5cee2b30183340cc425081b78afc5eff3d080daab0adda9aa/torch-2.11.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4b5866312ee6e52ea625cd211dcb97d6a2cdc1131a5f15cc0d87eec948f6dd34", size = 80606338, upload-time = "2026-03-23T18:11:34.781Z" },
+ { url = "https://files.pythonhosted.org/packages/13/16/42e5915ebe4868caa6bac83a8ed59db57f12e9a61b7d749d584776ed53d5/torch-2.11.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:f99924682ef0aa6a4ab3b1b76f40dc6e273fca09f367d15a524266db100a723f", size = 419731115, upload-time = "2026-03-23T18:11:06.944Z" },
+ { url = "https://files.pythonhosted.org/packages/1a/c9/82638ef24d7877510f83baf821f5619a61b45568ce21c0a87a91576510aa/torch-2.11.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:0f68f4ac6d95d12e896c3b7a912b5871619542ec54d3649cf48cc1edd4dd2756", size = 530712279, upload-time = "2026-03-23T18:10:31.481Z" },
+ { url = "https://files.pythonhosted.org/packages/1c/ff/6756f1c7ee302f6d202120e0f4f05b432b839908f9071157302cedfc5232/torch-2.11.0-cp312-cp312-win_amd64.whl", hash = "sha256:fbf39280699d1b869f55eac536deceaa1b60bd6788ba74f399cc67e60a5fab10", size = 114556047, upload-time = "2026-03-23T18:10:55.931Z" },
+]
+
+[[package]]
+name = "torchmetrics"
+version = "1.8.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "lightning-utilities" },
+ { name = "numpy" },
+ { name = "packaging" },
+ { name = "torch" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/85/2e/48a887a59ecc4a10ce9e8b35b3e3c5cef29d902c4eac143378526e7485cb/torchmetrics-1.8.2.tar.gz", hash = "sha256:cf64a901036bf107f17a524009eea7781c9c5315d130713aeca5747a686fe7a5", size = 580679, upload-time = "2025-09-03T14:00:54.077Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/02/21/aa0f434434c48490f91b65962b1ce863fdcce63febc166ca9fe9d706c2b6/torchmetrics-1.8.2-py3-none-any.whl", hash = "sha256:08382fd96b923e39e904c4d570f3d49e2cc71ccabd2a94e0f895d1f0dac86242", size = 983161, upload-time = "2025-09-03T14:00:51.921Z" },
+]
+
+[[package]]
+name = "torchvision"
+version = "0.26.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "numpy" },
+ { name = "pillow" },
+ { name = "torch" },
+]
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/74/b4/cdfee31e0402ea035135462cb0ab496e974d56fab6b4e7a1f0cbccb8cd28/torchvision-0.26.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:a06d4772a8e13e772906ed736cc53ec6639e5e60554f8e5fa6ca165aabebc464", size = 1863503, upload-time = "2026-03-23T18:13:01.384Z" },
+ { url = "https://files.pythonhosted.org/packages/e4/74/11fee109841e80ad14e5ca2d80bff6b10eb11b7838ff06f35bfeaa9f7251/torchvision-0.26.0-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:2adfbe438473236191ff077a4a9a0c767436879c89628aa97137e959b0c11a94", size = 7766423, upload-time = "2026-03-23T18:12:56.049Z" },
+ { url = "https://files.pythonhosted.org/packages/5e/00/24d8c7845c3f270153fb81395a5135b2778e2538e81d14c6aea5106c689c/torchvision-0.26.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:b6f9ad1ecc0eab52647298b379ee9426845f8903703e6127973f8f3d049a798b", size = 7518249, upload-time = "2026-03-23T18:12:51.743Z" },
+ { url = "https://files.pythonhosted.org/packages/d7/ed/e53cd7c0da7ae002e5e929c1796ebbe7ec0c700c29f7a0a6696497fb3d8b/torchvision-0.26.0-cp310-cp310-win_amd64.whl", hash = "sha256:f13f12b3791a266de2d599cb8162925261622a037d87fc03132848343cf68f75", size = 3669784, upload-time = "2026-03-23T18:12:49.949Z" },
+ { url = "https://files.pythonhosted.org/packages/b4/bd/d552a2521bade3295b2c6e7a4a0d1022261cab7ca7011f4e2a330dbb3caa/torchvision-0.26.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:55bd6ad4ae77be01ba67a410b05b51f53b0d0ee45f146eb6a0dfb9007e70ab3c", size = 1863499, upload-time = "2026-03-23T18:12:58.696Z" },
+ { url = "https://files.pythonhosted.org/packages/33/bf/21b899792b08cae7a298551c68398a79e333697479ed311b3b067aab4bdc/torchvision-0.26.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:1c55dc8affbcc0eb2060fbabbe996ae9e5839b24bb6419777f17848945a411b1", size = 7767527, upload-time = "2026-03-23T18:12:44.348Z" },
+ { url = "https://files.pythonhosted.org/packages/9a/45/57bbf9e216850d065e66dd31a50f57424b607f1d878ab8956e56a1f4e36b/torchvision-0.26.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:fd10b5f994c210f4f6d6761cf686f82d748554adf486cb0979770c3252868c8f", size = 7519925, upload-time = "2026-03-23T18:12:53.283Z" },
+ { url = "https://files.pythonhosted.org/packages/10/58/ed8f7754299f3e91d6414b6dc09f62b3fa7c6e5d63dfe48d69ab81498a37/torchvision-0.26.0-cp311-cp311-win_amd64.whl", hash = "sha256:de6424b12887ad884f39a0ee446994ae3cd3b6a00a9cafe1bead85a031132af0", size = 3983834, upload-time = "2026-03-23T18:13:00.224Z" },
+ { url = "https://files.pythonhosted.org/packages/ae/e7/56b47cc3b132aea90ccce22bcb8975dec688b002150012acc842846039d0/torchvision-0.26.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:c409e1c3fdebec7a3834465086dbda8bf7680eff79abf7fd2f10c6b59520a7a4", size = 1863502, upload-time = "2026-03-23T18:12:57.326Z" },
+ { url = "https://files.pythonhosted.org/packages/f4/ec/5c31c92c08b65662fe9604a4067ae8232582805949f11ddc042cebe818ed/torchvision-0.26.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:406557718e62fdf10f5706e88d8a5ec000f872da913bf629aab9297622585547", size = 7767944, upload-time = "2026-03-23T18:12:42.805Z" },
+ { url = "https://files.pythonhosted.org/packages/f5/d8/cb6ccda1a1f35a6597645818641701207b3e8e13553e75fce5d86bac74b2/torchvision-0.26.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:d61a5abb6b42a0c0c311996c2ac4b83a94418a97182c83b055a2a4ae985e05aa", size = 7522205, upload-time = "2026-03-23T18:12:54.654Z" },
+ { url = "https://files.pythonhosted.org/packages/1c/a9/c272623a0f735c35f0f6cd6dc74784d4f970e800cf063bb76687895a2ab9/torchvision-0.26.0-cp312-cp312-win_amd64.whl", hash = "sha256:7993c01648e7c61d191b018e84d38fe0825c8fcb2720cd0f37caf7ba14404aa1", size = 4255155, upload-time = "2026-03-23T18:12:32.652Z" },
+]
+
+[[package]]
+name = "tornado"
+version = "6.5.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/37/1d/0a336abf618272d53f62ebe274f712e213f5a03c0b2339575430b8362ef2/tornado-6.5.4.tar.gz", hash = "sha256:a22fa9047405d03260b483980635f0b041989d8bcc9a313f8fe18b411d84b1d7", size = 513632, upload-time = "2025-12-15T19:21:03.836Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ab/a9/e94a9d5224107d7ce3cc1fab8d5dc97f5ea351ccc6322ee4fb661da94e35/tornado-6.5.4-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:d6241c1a16b1c9e4cc28148b1cda97dd1c6cb4fb7068ac1bedc610768dff0ba9", size = 443909, upload-time = "2025-12-15T19:20:48.382Z" },
+ { url = "https://files.pythonhosted.org/packages/db/7e/f7b8d8c4453f305a51f80dbb49014257bb7d28ccb4bbb8dd328ea995ecad/tornado-6.5.4-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:2d50f63dda1d2cac3ae1fa23d254e16b5e38153758470e9956cbc3d813d40843", size = 442163, upload-time = "2025-12-15T19:20:49.791Z" },
+ { url = "https://files.pythonhosted.org/packages/ba/b5/206f82d51e1bfa940ba366a8d2f83904b15942c45a78dd978b599870ab44/tornado-6.5.4-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d1cf66105dc6acb5af613c054955b8137e34a03698aa53272dbda4afe252be17", size = 445746, upload-time = "2025-12-15T19:20:51.491Z" },
+ { url = "https://files.pythonhosted.org/packages/8e/9d/1a3338e0bd30ada6ad4356c13a0a6c35fbc859063fa7eddb309183364ac1/tornado-6.5.4-cp39-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:50ff0a58b0dc97939d29da29cd624da010e7f804746621c78d14b80238669335", size = 445083, upload-time = "2025-12-15T19:20:52.778Z" },
+ { url = "https://files.pythonhosted.org/packages/50/d4/e51d52047e7eb9a582da59f32125d17c0482d065afd5d3bc435ff2120dc5/tornado-6.5.4-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e5fb5e04efa54cf0baabdd10061eb4148e0be137166146fff835745f59ab9f7f", size = 445315, upload-time = "2025-12-15T19:20:53.996Z" },
+ { url = "https://files.pythonhosted.org/packages/27/07/2273972f69ca63dbc139694a3fc4684edec3ea3f9efabf77ed32483b875c/tornado-6.5.4-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9c86b1643b33a4cd415f8d0fe53045f913bf07b4a3ef646b735a6a86047dda84", size = 446003, upload-time = "2025-12-15T19:20:56.101Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/83/41c52e47502bf7260044413b6770d1a48dda2f0246f95ee1384a3cd9c44a/tornado-6.5.4-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:6eb82872335a53dd063a4f10917b3efd28270b56a33db69009606a0312660a6f", size = 445412, upload-time = "2025-12-15T19:20:57.398Z" },
+ { url = "https://files.pythonhosted.org/packages/10/c7/bc96917f06cbee182d44735d4ecde9c432e25b84f4c2086143013e7b9e52/tornado-6.5.4-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:6076d5dda368c9328ff41ab5d9dd3608e695e8225d1cd0fd1e006f05da3635a8", size = 445392, upload-time = "2025-12-15T19:20:58.692Z" },
+ { url = "https://files.pythonhosted.org/packages/0c/1a/d7592328d037d36f2d2462f4bc1fbb383eec9278bc786c1b111cbbd44cfa/tornado-6.5.4-cp39-abi3-win32.whl", hash = "sha256:1768110f2411d5cd281bac0a090f707223ce77fd110424361092859e089b38d1", size = 446481, upload-time = "2025-12-15T19:21:00.008Z" },
+ { url = "https://files.pythonhosted.org/packages/d6/6d/c69be695a0a64fd37a97db12355a035a6d90f79067a3cf936ec2b1dc38cd/tornado-6.5.4-cp39-abi3-win_amd64.whl", hash = "sha256:fa07d31e0cd85c60713f2b995da613588aa03e1303d75705dca6af8babc18ddc", size = 446886, upload-time = "2025-12-15T19:21:01.287Z" },
+ { url = "https://files.pythonhosted.org/packages/50/49/8dc3fd90902f70084bd2cd059d576ddb4f8bb44c2c7c0e33a11422acb17e/tornado-6.5.4-cp39-abi3-win_arm64.whl", hash = "sha256:053e6e16701eb6cbe641f308f4c1a9541f91b6261991160391bfc342e8a551a1", size = 445910, upload-time = "2025-12-15T19:21:02.571Z" },
+]
+
+[[package]]
+name = "tqdm"
+version = "4.67.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "colorama", marker = "sys_platform == 'win32'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/09/a9/6ba95a270c6f1fbcd8dac228323f2777d886cb206987444e4bce66338dd4/tqdm-4.67.3.tar.gz", hash = "sha256:7d825f03f89244ef73f1d4ce193cb1774a8179fd96f31d7e1dcde62092b960bb", size = 169598, upload-time = "2026-02-03T17:35:53.048Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
+]
+
+[[package]]
+name = "trailrunner"
+version = "1.4.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "pathspec" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/4d/93/630e10bacd897daeb9ff5a408f4e7cb0fc2f243e7e3ef00f9e6cf319b11c/trailrunner-1.4.0.tar.gz", hash = "sha256:3fe61e259e6b2e5192f321c265985b7a0dc18497ced62b2da244f08104978398", size = 15836, upload-time = "2023-03-27T07:54:35.515Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/b1/29/21001afea86bac5016c3940b43de3ce4786b0d8337d4ea79bb903c649ce3/trailrunner-1.4.0-py3-none-any.whl", hash = "sha256:a286d39f2723f28d167347f41cf8f232832648709366e722f55cf5545772a48e", size = 11071, upload-time = "2023-03-27T07:54:32.514Z" },
+]
+
+[[package]]
+name = "traitlets"
+version = "5.14.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/eb/79/72064e6a701c2183016abbbfedaba506d81e30e232a68c9f0d6f6fcd1574/traitlets-5.14.3.tar.gz", hash = "sha256:9ed0579d3502c94b4b3732ac120375cda96f923114522847de4b3bb98b96b6b7", size = 161621, upload-time = "2024-04-19T11:11:49.746Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/00/c0/8f5d070730d7836adc9c9b6408dec68c6ced86b304a9b26a14df072a6e8c/traitlets-5.14.3-py3-none-any.whl", hash = "sha256:b74e89e397b1ed28cc831db7aea759ba6640cb3de13090ca145426688ff1ac4f", size = 85359, upload-time = "2024-04-19T11:11:46.763Z" },
+]
+
+[[package]]
+name = "triton"
+version = "3.6.0"
+source = { registry = "https://pypi.org/simple" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/44/ba/b1b04f4b291a3205d95ebd24465de0e5bf010a2df27a4e58a9b5f039d8f2/triton-3.6.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6c723cfb12f6842a0ae94ac307dba7e7a44741d720a40cf0e270ed4a4e3be781", size = 175972180, upload-time = "2026-01-20T16:15:53.664Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/f7/f1c9d3424ab199ac53c2da567b859bcddbb9c9e7154805119f8bd95ec36f/triton-3.6.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a6550fae429e0667e397e5de64b332d1e5695b73650ee75a6146e2e902770bea", size = 188105201, upload-time = "2026-01-20T16:00:29.272Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/2c/96f92f3c60387e14cc45aed49487f3486f89ea27106c1b1376913c62abe4/triton-3.6.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:49df5ef37379c0c2b5c0012286f80174fcf0e073e5ade1ca9a86c36814553651", size = 176081190, upload-time = "2026-01-20T16:16:00.523Z" },
+ { url = "https://files.pythonhosted.org/packages/e0/12/b05ba554d2c623bffa59922b94b0775673de251f468a9609bc9e45de95e9/triton-3.6.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e8e323d608e3a9bfcc2d9efcc90ceefb764a82b99dea12a86d643c72539ad5d3", size = 188214640, upload-time = "2026-01-20T16:00:35.869Z" },
+ { url = "https://files.pythonhosted.org/packages/17/5d/08201db32823bdf77a0e2b9039540080b2e5c23a20706ddba942924ebcd6/triton-3.6.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:374f52c11a711fd062b4bfbb201fd9ac0a5febd28a96fb41b4a0f51dde3157f4", size = 176128243, upload-time = "2026-01-20T16:16:07.857Z" },
+ { url = "https://files.pythonhosted.org/packages/ab/a8/cdf8b3e4c98132f965f88c2313a4b493266832ad47fb52f23d14d4f86bb5/triton-3.6.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:74caf5e34b66d9f3a429af689c1c7128daba1d8208df60e81106b115c00d6fca", size = 188266850, upload-time = "2026-01-20T16:00:43.041Z" },
+]
+
+[[package]]
+name = "typer-slim"
+version = "0.21.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "click" },
+ { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/17/d4/064570dec6358aa9049d4708e4a10407d74c99258f8b2136bb8702303f1a/typer_slim-0.21.1.tar.gz", hash = "sha256:73495dd08c2d0940d611c5a8c04e91c2a0a98600cbd4ee19192255a233b6dbfd", size = 110478, upload-time = "2026-01-06T11:21:11.176Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c8/0a/4aca634faf693e33004796b6cee0ae2e1dba375a800c16ab8d3eff4bb800/typer_slim-0.21.1-py3-none-any.whl", hash = "sha256:6e6c31047f171ac93cc5a973c9e617dbc5ab2bddc4d0a3135dc161b4e2020e0d", size = 47444, upload-time = "2026-01-06T11:21:12.441Z" },
+]
+
+[[package]]
+name = "typing-extensions"
+version = "4.15.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391, upload-time = "2025-08-25T13:49:26.313Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614, upload-time = "2025-08-25T13:49:24.86Z" },
+]
+
+[[package]]
+name = "tzdata"
+version = "2025.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/5e/a7/c202b344c5ca7daf398f3b8a477eeb205cf3b6f32e7ec3a6bac0629ca975/tzdata-2025.3.tar.gz", hash = "sha256:de39c2ca5dc7b0344f2eba86f49d614019d29f060fc4ebc8a417896a620b56a7", size = 196772, upload-time = "2025-12-13T17:45:35.667Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/c7/b0/003792df09decd6849a5e39c28b513c06e84436a54440380862b5aeff25d/tzdata-2025.3-py2.py3-none-any.whl", hash = "sha256:06a47e5700f3081aab02b2e513160914ff0694bce9947d6b76ebd6bf57cfc5d1", size = 348521, upload-time = "2025-12-13T17:45:33.889Z" },
+]
+
+[[package]]
+name = "ufmt"
+version = "2.8.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "black" },
+ { name = "click" },
+ { name = "libcst" },
+ { name = "moreorless" },
+ { name = "tomlkit" },
+ { name = "trailrunner" },
+ { name = "typing-extensions" },
+ { name = "usort" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/18/f8/c25e242a8e12062172dea4117859757a11339bbc39b1a3c7fb6a6de03bb2/ufmt-2.8.0.tar.gz", hash = "sha256:72c9502915497678de9aeab8aa18604890f14f869f7f378dd26e2878bde84f13", size = 24482, upload-time = "2024-10-25T06:21:57.239Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/11/4b/3f1b6f566b6cf70ccc5cba9a638fe4459f1e373c34d74df2e40e41871d70/ufmt-2.8.0-py3-none-any.whl", hash = "sha256:47a690811c576ebd3a0e30d77d43b65c84240e5c1611e5cb4a880bdd7f4507c1", size = 28268, upload-time = "2024-10-25T06:21:55.822Z" },
+]
+
+[[package]]
+name = "uri-template"
+version = "1.3.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/31/c7/0336f2bd0bcbada6ccef7aaa25e443c118a704f828a0620c6fa0207c1b64/uri-template-1.3.0.tar.gz", hash = "sha256:0e00f8eb65e18c7de20d595a14336e9f337ead580c70934141624b6d1ffdacc7", size = 21678, upload-time = "2023-06-21T01:49:05.374Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e7/00/3fca040d7cf8a32776d3d81a00c8ee7457e00f80c649f1e4a863c8321ae9/uri_template-1.3.0-py3-none-any.whl", hash = "sha256:a44a133ea12d44a0c0f06d7d42a52d71282e77e2f937d8abd5655b8d56fc1363", size = 11140, upload-time = "2023-06-21T01:49:03.467Z" },
+]
+
+[[package]]
+name = "urllib3"
+version = "2.6.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/c7/24/5f1b3bdffd70275f6661c76461e25f024d5a38a46f04aaca912426a2b1d3/urllib3-2.6.3.tar.gz", hash = "sha256:1b62b6884944a57dbe321509ab94fd4d3b307075e0c2eae991ac71ee15ad38ed", size = 435556, upload-time = "2026-01-07T16:24:43.925Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/39/08/aaaad47bc4e9dc8c725e68f9d04865dbcb2052843ff09c97b08904852d84/urllib3-2.6.3-py3-none-any.whl", hash = "sha256:bf272323e553dfb2e87d9bfd225ca7b0f467b919d7bbd355436d3fd37cb0acd4", size = 131584, upload-time = "2026-01-07T16:24:42.685Z" },
+]
+
+[[package]]
+name = "usort"
+version = "1.0.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "attrs" },
+ { name = "click" },
+ { name = "libcst" },
+ { name = "moreorless" },
+ { name = "stdlibs" },
+ { name = "toml" },
+ { name = "trailrunner" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/ec/f6/0cf8fd139f3deab8f180a03062eb09d0bcfc380baf0df1ae24f82496577d/usort-1.0.2.tar.gz", hash = "sha256:f0dbdfcf18b117323dff3a03df804957ba3b755c1069d2cf98bee133592bd369", size = 78068, upload-time = "2022-03-07T22:09:08.724Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/1b/9b/2967208157a740d24b5a059bc1371a97863fb4c392a6766fd473cf11fea9/usort-1.0.2-py3-none-any.whl", hash = "sha256:0e7ee0702902d4d54fdd35cbc81f5590df2573db29e72aeb6eddaa9e9d01cef9", size = 23261, upload-time = "2022-03-07T22:09:07.14Z" },
+]
+
+[[package]]
+name = "wcwidth"
+version = "0.5.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/c2/62/a7c072fbfefb2980a00f99ca994279cb9ecf310cb2e6b2a4d2a28fe192b3/wcwidth-0.5.3.tar.gz", hash = "sha256:53123b7af053c74e9fe2e92ac810301f6139e64379031f7124574212fb3b4091", size = 157587, upload-time = "2026-01-31T03:52:10.92Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3c/c1/d73f12f8cdb1891334a2ccf7389eed244d3941e74d80dd220badb937f3fb/wcwidth-0.5.3-py3-none-any.whl", hash = "sha256:d584eff31cd4753e1e5ff6c12e1edfdb324c995713f75d26c29807bb84bf649e", size = 92981, upload-time = "2026-01-31T03:52:09.14Z" },
+]
+
+[[package]]
+name = "webcolors"
+version = "25.10.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/1d/7a/eb316761ec35664ea5174709a68bbd3389de60d4a1ebab8808bfc264ed67/webcolors-25.10.0.tar.gz", hash = "sha256:62abae86504f66d0f6364c2a8520de4a0c47b80c03fc3a5f1815fedbef7c19bf", size = 53491, upload-time = "2025-10-31T07:51:03.977Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/e2/cc/e097523dd85c9cf5d354f78310927f1656c422bd7b2613b2db3e3f9a0f2c/webcolors-25.10.0-py3-none-any.whl", hash = "sha256:032c727334856fc0b968f63daa252a1ac93d33db2f5267756623c210e57a4f1d", size = 14905, upload-time = "2025-10-31T07:51:01.778Z" },
+]
+
+[[package]]
+name = "webencodings"
+version = "0.5.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/0b/02/ae6ceac1baeda530866a85075641cec12989bd8d31af6d5ab4a3e8c92f47/webencodings-0.5.1.tar.gz", hash = "sha256:b36a1c245f2d304965eb4e0a82848379241dc04b865afcc4aab16748587e1923", size = 9721, upload-time = "2017-04-05T20:21:34.189Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/f4/24/2a3e3df732393fed8b3ebf2ec078f05546de641fe1b667ee316ec1dcf3b7/webencodings-0.5.1-py2.py3-none-any.whl", hash = "sha256:a0af1213f3c2226497a97e2b3aa01a7e4bee4f403f95be16fc9acd2947514a78", size = 11774, upload-time = "2017-04-05T20:21:32.581Z" },
+]
+
+[[package]]
+name = "websocket-client"
+version = "1.9.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/2c/41/aa4bf9664e4cda14c3b39865b12251e8e7d239f4cd0e3cc1b6c2ccde25c1/websocket_client-1.9.0.tar.gz", hash = "sha256:9e813624b6eb619999a97dc7958469217c3176312b3a16a4bd1bc7e08a46ec98", size = 70576, upload-time = "2025-10-07T21:16:36.495Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/34/db/b10e48aa8fff7407e67470363eac595018441cf32d5e1001567a7aeba5d2/websocket_client-1.9.0-py3-none-any.whl", hash = "sha256:af248a825037ef591efbf6ed20cc5faa03d3b47b9e5a2230a529eeee1c1fc3ef", size = 82616, upload-time = "2025-10-07T21:16:34.951Z" },
+]
+
+[[package]]
+name = "werkzeug"
+version = "3.1.5"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "markupsafe" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/5a/70/1469ef1d3542ae7c2c7b72bd5e3a4e6ee69d7978fa8a3af05a38eca5becf/werkzeug-3.1.5.tar.gz", hash = "sha256:6a548b0e88955dd07ccb25539d7d0cc97417ee9e179677d22c7041c8f078ce67", size = 864754, upload-time = "2026-01-08T17:49:23.247Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/ad/e4/8d97cca767bcc1be76d16fb76951608305561c6e056811587f36cb1316a8/werkzeug-3.1.5-py3-none-any.whl", hash = "sha256:5111e36e91086ece91f93268bb39b4a35c1e6f1feac762c9c822ded0a4e322dc", size = 225025, upload-time = "2026-01-08T17:49:21.859Z" },
+]
+
+[[package]]
+name = "widgetsnbextension"
+version = "4.0.15"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/bd/f4/c67440c7fb409a71b7404b7aefcd7569a9c0d6bd071299bf4198ae7a5d95/widgetsnbextension-4.0.15.tar.gz", hash = "sha256:de8610639996f1567952d763a5a41af8af37f2575a41f9852a38f947eb82a3b9", size = 1097402, upload-time = "2025-11-01T21:15:55.178Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/3f/0e/fa3b193432cfc60c93b42f3be03365f5f909d2b3ea410295cf36df739e31/widgetsnbextension-4.0.15-py3-none-any.whl", hash = "sha256:8156704e4346a571d9ce73b84bee86a29906c9abfd7223b7228a28899ccf3366", size = 2196503, upload-time = "2025-11-01T21:15:53.565Z" },
+]
+
+[[package]]
+name = "yacs"
+version = "0.1.8"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+ { name = "pyyaml" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/44/3e/4a45cb0738da6565f134c01d82ba291c746551b5bc82e781ec876eb20909/yacs-0.1.8.tar.gz", hash = "sha256:efc4c732942b3103bea904ee89af98bcd27d01f0ac12d8d4d369f1e7a2914384", size = 11100, upload-time = "2020-08-10T16:37:47.755Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/38/4f/fe9a4d472aa867878ce3bb7efb16654c5d63672b86dc0e6e953a67018433/yacs-0.1.8-py3-none-any.whl", hash = "sha256:99f893e30497a4b66842821bac316386f7bd5c4f47ad35c9073ef089aa33af32", size = 14747, upload-time = "2020-08-10T16:37:46.4Z" },
+]
+
+[[package]]
+name = "yt-dlp"
+version = "2026.2.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/16/be/8e099f3f34bac6851490525fb1a8b62d525a95fcb5af082e8c52ba884fb5/yt_dlp-2026.2.4.tar.gz", hash = "sha256:24733ef081116f29d8ee6eae7a48127101e6c56eb7aa228dd604a60654760022", size = 3100305, upload-time = "2026-02-04T00:49:27.043Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/96/38/b17cbeaf6712a4c1b97f7f9ec3a55f3a8ddee678cc88742af47dca0315b7/yt_dlp-2026.2.4-py3-none-any.whl", hash = "sha256:d6ea83257e8127a0097b1d37ee36201f99a292067e4616b2e5d51ab153b3dbb9", size = 3299165, upload-time = "2026-02-04T00:49:25.31Z" },
+]
+
+[[package]]
+name = "zstandard"
+version = "0.25.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/fd/aa/3e0508d5a5dd96529cdc5a97011299056e14c6505b678fd58938792794b1/zstandard-0.25.0.tar.gz", hash = "sha256:7713e1179d162cf5c7906da876ec2ccb9c3a9dcbdffef0cc7f70c3667a205f0b", size = 711513, upload-time = "2025-09-14T22:15:54.002Z" }
+wheels = [
+ { url = "https://files.pythonhosted.org/packages/56/7a/28efd1d371f1acd037ac64ed1c5e2b41514a6cc937dd6ab6a13ab9f0702f/zstandard-0.25.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:e59fdc271772f6686e01e1b3b74537259800f57e24280be3f29c8a0deb1904dd", size = 795256, upload-time = "2025-09-14T22:15:56.415Z" },
+ { url = "https://files.pythonhosted.org/packages/96/34/ef34ef77f1ee38fc8e4f9775217a613b452916e633c4f1d98f31db52c4a5/zstandard-0.25.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:4d441506e9b372386a5271c64125f72d5df6d2a8e8a2a45a0ae09b03cb781ef7", size = 640565, upload-time = "2025-09-14T22:15:58.177Z" },
+ { url = "https://files.pythonhosted.org/packages/9d/1b/4fdb2c12eb58f31f28c4d28e8dc36611dd7205df8452e63f52fb6261d13e/zstandard-0.25.0-cp310-cp310-manylinux2010_i686.manylinux2014_i686.manylinux_2_12_i686.manylinux_2_17_i686.whl", hash = "sha256:ab85470ab54c2cb96e176f40342d9ed41e58ca5733be6a893b730e7af9c40550", size = 5345306, upload-time = "2025-09-14T22:16:00.165Z" },
+ { url = "https://files.pythonhosted.org/packages/73/28/a44bdece01bca027b079f0e00be3b6bd89a4df180071da59a3dd7381665b/zstandard-0.25.0-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:e05ab82ea7753354bb054b92e2f288afb750e6b439ff6ca78af52939ebbc476d", size = 5055561, upload-time = "2025-09-14T22:16:02.22Z" },
+ { url = "https://files.pythonhosted.org/packages/e9/74/68341185a4f32b274e0fc3410d5ad0750497e1acc20bd0f5b5f64ce17785/zstandard-0.25.0-cp310-cp310-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:78228d8a6a1c177a96b94f7e2e8d012c55f9c760761980da16ae7546a15a8e9b", size = 5402214, upload-time = "2025-09-14T22:16:04.109Z" },
+ { url = "https://files.pythonhosted.org/packages/8b/67/f92e64e748fd6aaffe01e2b75a083c0c4fd27abe1c8747fee4555fcee7dd/zstandard-0.25.0-cp310-cp310-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:2b6bd67528ee8b5c5f10255735abc21aa106931f0dbaf297c7be0c886353c3d0", size = 5449703, upload-time = "2025-09-14T22:16:06.312Z" },
+ { url = "https://files.pythonhosted.org/packages/fd/e5/6d36f92a197c3c17729a2125e29c169f460538a7d939a27eaaa6dcfcba8e/zstandard-0.25.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:4b6d83057e713ff235a12e73916b6d356e3084fd3d14ced499d84240f3eecee0", size = 5556583, upload-time = "2025-09-14T22:16:08.457Z" },
+ { url = "https://files.pythonhosted.org/packages/d7/83/41939e60d8d7ebfe2b747be022d0806953799140a702b90ffe214d557638/zstandard-0.25.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:9174f4ed06f790a6869b41cba05b43eeb9a35f8993c4422ab853b705e8112bbd", size = 5045332, upload-time = "2025-09-14T22:16:10.444Z" },
+ { url = "https://files.pythonhosted.org/packages/b3/87/d3ee185e3d1aa0133399893697ae91f221fda79deb61adbe998a7235c43f/zstandard-0.25.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:25f8f3cd45087d089aef5ba3848cd9efe3ad41163d3400862fb42f81a3a46701", size = 5572283, upload-time = "2025-09-14T22:16:12.128Z" },
+ { url = "https://files.pythonhosted.org/packages/0a/1d/58635ae6104df96671076ac7d4ae7816838ce7debd94aecf83e30b7121b0/zstandard-0.25.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:3756b3e9da9b83da1796f8809dd57cb024f838b9eeafde28f3cb472012797ac1", size = 4959754, upload-time = "2025-09-14T22:16:14.225Z" },
+ { url = "https://files.pythonhosted.org/packages/75/d6/57e9cb0a9983e9a229dd8fd2e6e96593ef2aa82a3907188436f22b111ccd/zstandard-0.25.0-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:81dad8d145d8fd981b2962b686b2241d3a1ea07733e76a2f15435dfb7fb60150", size = 5266477, upload-time = "2025-09-14T22:16:16.343Z" },
+ { url = "https://files.pythonhosted.org/packages/d1/a9/ee891e5edf33a6ebce0a028726f0bbd8567effe20fe3d5808c42323e8542/zstandard-0.25.0-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:a5a419712cf88862a45a23def0ae063686db3d324cec7edbe40509d1a79a0aab", size = 5440914, upload-time = "2025-09-14T22:16:18.453Z" },
+ { url = "https://files.pythonhosted.org/packages/58/08/a8522c28c08031a9521f27abc6f78dbdee7312a7463dd2cfc658b813323b/zstandard-0.25.0-cp310-cp310-musllinux_1_2_s390x.whl", hash = "sha256:e7360eae90809efd19b886e59a09dad07da4ca9ba096752e61a2e03c8aca188e", size = 5819847, upload-time = "2025-09-14T22:16:20.559Z" },
+ { url = "https://files.pythonhosted.org/packages/6f/11/4c91411805c3f7b6f31c60e78ce347ca48f6f16d552fc659af6ec3b73202/zstandard-0.25.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:75ffc32a569fb049499e63ce68c743155477610532da1eb38e7f24bf7cd29e74", size = 5363131, upload-time = "2025-09-14T22:16:22.206Z" },
+ { url = "https://files.pythonhosted.org/packages/ef/d6/8c4bd38a3b24c4c7676a7a3d8de85d6ee7a983602a734b9f9cdefb04a5d6/zstandard-0.25.0-cp310-cp310-win32.whl", hash = "sha256:106281ae350e494f4ac8a80470e66d1fe27e497052c8d9c3b95dc4cf1ade81aa", size = 436469, upload-time = "2025-09-14T22:16:25.002Z" },
+ { url = "https://files.pythonhosted.org/packages/93/90/96d50ad417a8ace5f841b3228e93d1bb13e6ad356737f42e2dde30d8bd68/zstandard-0.25.0-cp310-cp310-win_amd64.whl", hash = "sha256:ea9d54cc3d8064260114a0bbf3479fc4a98b21dffc89b3459edd506b69262f6e", size = 506100, upload-time = "2025-09-14T22:16:23.569Z" },
+ { url = "https://files.pythonhosted.org/packages/2a/83/c3ca27c363d104980f1c9cee1101cc8ba724ac8c28a033ede6aab89585b1/zstandard-0.25.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:933b65d7680ea337180733cf9e87293cc5500cc0eb3fc8769f4d3c88d724ec5c", size = 795254, upload-time = "2025-09-14T22:16:26.137Z" },
+ { url = "https://files.pythonhosted.org/packages/ac/4d/e66465c5411a7cf4866aeadc7d108081d8ceba9bc7abe6b14aa21c671ec3/zstandard-0.25.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:a3f79487c687b1fc69f19e487cd949bf3aae653d181dfb5fde3bf6d18894706f", size = 640559, upload-time = "2025-09-14T22:16:27.973Z" },
+ { url = "https://files.pythonhosted.org/packages/12/56/354fe655905f290d3b147b33fe946b0f27e791e4b50a5f004c802cb3eb7b/zstandard-0.25.0-cp311-cp311-manylinux2010_i686.manylinux2014_i686.manylinux_2_12_i686.manylinux_2_17_i686.whl", hash = "sha256:0bbc9a0c65ce0eea3c34a691e3c4b6889f5f3909ba4822ab385fab9057099431", size = 5348020, upload-time = "2025-09-14T22:16:29.523Z" },
+ { url = "https://files.pythonhosted.org/packages/3b/13/2b7ed68bd85e69a2069bcc72141d378f22cae5a0f3b353a2c8f50ef30c1b/zstandard-0.25.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:01582723b3ccd6939ab7b3a78622c573799d5d8737b534b86d0e06ac18dbde4a", size = 5058126, upload-time = "2025-09-14T22:16:31.811Z" },
+ { url = "https://files.pythonhosted.org/packages/c9/dd/fdaf0674f4b10d92cb120ccff58bbb6626bf8368f00ebfd2a41ba4a0dc99/zstandard-0.25.0-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:5f1ad7bf88535edcf30038f6919abe087f606f62c00a87d7e33e7fc57cb69fcc", size = 5405390, upload-time = "2025-09-14T22:16:33.486Z" },
+ { url = "https://files.pythonhosted.org/packages/0f/67/354d1555575bc2490435f90d67ca4dd65238ff2f119f30f72d5cde09c2ad/zstandard-0.25.0-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:06acb75eebeedb77b69048031282737717a63e71e4ae3f77cc0c3b9508320df6", size = 5452914, upload-time = "2025-09-14T22:16:35.277Z" },
+ { url = "https://files.pythonhosted.org/packages/bb/1f/e9cfd801a3f9190bf3e759c422bbfd2247db9d7f3d54a56ecde70137791a/zstandard-0.25.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:9300d02ea7c6506f00e627e287e0492a5eb0371ec1670ae852fefffa6164b072", size = 5559635, upload-time = "2025-09-14T22:16:37.141Z" },
+ { url = "https://files.pythonhosted.org/packages/21/88/5ba550f797ca953a52d708c8e4f380959e7e3280af029e38fbf47b55916e/zstandard-0.25.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:bfd06b1c5584b657a2892a6014c2f4c20e0db0208c159148fa78c65f7e0b0277", size = 5048277, upload-time = "2025-09-14T22:16:38.807Z" },
+ { url = "https://files.pythonhosted.org/packages/46/c0/ca3e533b4fa03112facbe7fbe7779cb1ebec215688e5df576fe5429172e0/zstandard-0.25.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:f373da2c1757bb7f1acaf09369cdc1d51d84131e50d5fa9863982fd626466313", size = 5574377, upload-time = "2025-09-14T22:16:40.523Z" },
+ { url = "https://files.pythonhosted.org/packages/12/9b/3fb626390113f272abd0799fd677ea33d5fc3ec185e62e6be534493c4b60/zstandard-0.25.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:6c0e5a65158a7946e7a7affa6418878ef97ab66636f13353b8502d7ea03c8097", size = 4961493, upload-time = "2025-09-14T22:16:43.3Z" },
+ { url = "https://files.pythonhosted.org/packages/cb/d3/23094a6b6a4b1343b27ae68249daa17ae0651fcfec9ed4de09d14b940285/zstandard-0.25.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:c8e167d5adf59476fa3e37bee730890e389410c354771a62e3c076c86f9f7778", size = 5269018, upload-time = "2025-09-14T22:16:45.292Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/a7/bb5a0c1c0f3f4b5e9d5b55198e39de91e04ba7c205cc46fcb0f95f0383c1/zstandard-0.25.0-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:98750a309eb2f020da61e727de7d7ba3c57c97cf6213f6f6277bb7fb42a8e065", size = 5443672, upload-time = "2025-09-14T22:16:47.076Z" },
+ { url = "https://files.pythonhosted.org/packages/27/22/503347aa08d073993f25109c36c8d9f029c7d5949198050962cb568dfa5e/zstandard-0.25.0-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:22a086cff1b6ceca18a8dd6096ec631e430e93a8e70a9ca5efa7561a00f826fa", size = 5822753, upload-time = "2025-09-14T22:16:49.316Z" },
+ { url = "https://files.pythonhosted.org/packages/e2/be/94267dc6ee64f0f8ba2b2ae7c7a2df934a816baaa7291db9e1aa77394c3c/zstandard-0.25.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:72d35d7aa0bba323965da807a462b0966c91608ef3a48ba761678cb20ce5d8b7", size = 5366047, upload-time = "2025-09-14T22:16:51.328Z" },
+ { url = "https://files.pythonhosted.org/packages/7b/a3/732893eab0a3a7aecff8b99052fecf9f605cf0fb5fb6d0290e36beee47a4/zstandard-0.25.0-cp311-cp311-win32.whl", hash = "sha256:f5aeea11ded7320a84dcdd62a3d95b5186834224a9e55b92ccae35d21a8b63d4", size = 436484, upload-time = "2025-09-14T22:16:55.005Z" },
+ { url = "https://files.pythonhosted.org/packages/43/a3/c6155f5c1cce691cb80dfd38627046e50af3ee9ddc5d0b45b9b063bfb8c9/zstandard-0.25.0-cp311-cp311-win_amd64.whl", hash = "sha256:daab68faadb847063d0c56f361a289c4f268706b598afbf9ad113cbe5c38b6b2", size = 506183, upload-time = "2025-09-14T22:16:52.753Z" },
+ { url = "https://files.pythonhosted.org/packages/8c/3e/8945ab86a0820cc0e0cdbf38086a92868a9172020fdab8a03ac19662b0e5/zstandard-0.25.0-cp311-cp311-win_arm64.whl", hash = "sha256:22a06c5df3751bb7dc67406f5374734ccee8ed37fc5981bf1ad7041831fa1137", size = 462533, upload-time = "2025-09-14T22:16:53.878Z" },
+ { url = "https://files.pythonhosted.org/packages/82/fc/f26eb6ef91ae723a03e16eddb198abcfce2bc5a42e224d44cc8b6765e57e/zstandard-0.25.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:7b3c3a3ab9daa3eed242d6ecceead93aebbb8f5f84318d82cee643e019c4b73b", size = 795738, upload-time = "2025-09-14T22:16:56.237Z" },
+ { url = "https://files.pythonhosted.org/packages/aa/1c/d920d64b22f8dd028a8b90e2d756e431a5d86194caa78e3819c7bf53b4b3/zstandard-0.25.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:913cbd31a400febff93b564a23e17c3ed2d56c064006f54efec210d586171c00", size = 640436, upload-time = "2025-09-14T22:16:57.774Z" },
+ { url = "https://files.pythonhosted.org/packages/53/6c/288c3f0bd9fcfe9ca41e2c2fbfd17b2097f6af57b62a81161941f09afa76/zstandard-0.25.0-cp312-cp312-manylinux2010_i686.manylinux2014_i686.manylinux_2_12_i686.manylinux_2_17_i686.whl", hash = "sha256:011d388c76b11a0c165374ce660ce2c8efa8e5d87f34996aa80f9c0816698b64", size = 5343019, upload-time = "2025-09-14T22:16:59.302Z" },
+ { url = "https://files.pythonhosted.org/packages/1e/15/efef5a2f204a64bdb5571e6161d49f7ef0fffdbca953a615efbec045f60f/zstandard-0.25.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:6dffecc361d079bb48d7caef5d673c88c8988d3d33fb74ab95b7ee6da42652ea", size = 5063012, upload-time = "2025-09-14T22:17:01.156Z" },
+ { url = "https://files.pythonhosted.org/packages/b7/37/a6ce629ffdb43959e92e87ebdaeebb5ac81c944b6a75c9c47e300f85abdf/zstandard-0.25.0-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.whl", hash = "sha256:7149623bba7fdf7e7f24312953bcf73cae103db8cae49f8154dd1eadc8a29ecb", size = 5394148, upload-time = "2025-09-14T22:17:03.091Z" },
+ { url = "https://files.pythonhosted.org/packages/e3/79/2bf870b3abeb5c070fe2d670a5a8d1057a8270f125ef7676d29ea900f496/zstandard-0.25.0-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.whl", hash = "sha256:6a573a35693e03cf1d67799fd01b50ff578515a8aeadd4595d2a7fa9f3ec002a", size = 5451652, upload-time = "2025-09-14T22:17:04.979Z" },
+ { url = "https://files.pythonhosted.org/packages/53/60/7be26e610767316c028a2cbedb9a3beabdbe33e2182c373f71a1c0b88f36/zstandard-0.25.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:5a56ba0db2d244117ed744dfa8f6f5b366e14148e00de44723413b2f3938a902", size = 5546993, upload-time = "2025-09-14T22:17:06.781Z" },
+ { url = "https://files.pythonhosted.org/packages/85/c7/3483ad9ff0662623f3648479b0380d2de5510abf00990468c286c6b04017/zstandard-0.25.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:10ef2a79ab8e2974e2075fb984e5b9806c64134810fac21576f0668e7ea19f8f", size = 5046806, upload-time = "2025-09-14T22:17:08.415Z" },
+ { url = "https://files.pythonhosted.org/packages/08/b3/206883dd25b8d1591a1caa44b54c2aad84badccf2f1de9e2d60a446f9a25/zstandard-0.25.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:aaf21ba8fb76d102b696781bddaa0954b782536446083ae3fdaa6f16b25a1c4b", size = 5576659, upload-time = "2025-09-14T22:17:10.164Z" },
+ { url = "https://files.pythonhosted.org/packages/9d/31/76c0779101453e6c117b0ff22565865c54f48f8bd807df2b00c2c404b8e0/zstandard-0.25.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:1869da9571d5e94a85a5e8d57e4e8807b175c9e4a6294e3b66fa4efb074d90f6", size = 4953933, upload-time = "2025-09-14T22:17:11.857Z" },
+ { url = "https://files.pythonhosted.org/packages/18/e1/97680c664a1bf9a247a280a053d98e251424af51f1b196c6d52f117c9720/zstandard-0.25.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:809c5bcb2c67cd0ed81e9229d227d4ca28f82d0f778fc5fea624a9def3963f91", size = 5268008, upload-time = "2025-09-14T22:17:13.627Z" },
+ { url = "https://files.pythonhosted.org/packages/1e/73/316e4010de585ac798e154e88fd81bb16afc5c5cb1a72eeb16dd37e8024a/zstandard-0.25.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:f27662e4f7dbf9f9c12391cb37b4c4c3cb90ffbd3b1fb9284dadbbb8935fa708", size = 5433517, upload-time = "2025-09-14T22:17:16.103Z" },
+ { url = "https://files.pythonhosted.org/packages/5b/60/dd0f8cfa8129c5a0ce3ea6b7f70be5b33d2618013a161e1ff26c2b39787c/zstandard-0.25.0-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:99c0c846e6e61718715a3c9437ccc625de26593fea60189567f0118dc9db7512", size = 5814292, upload-time = "2025-09-14T22:17:17.827Z" },
+ { url = "https://files.pythonhosted.org/packages/fc/5f/75aafd4b9d11b5407b641b8e41a57864097663699f23e9ad4dbb91dc6bfe/zstandard-0.25.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:474d2596a2dbc241a556e965fb76002c1ce655445e4e3bf38e5477d413165ffa", size = 5360237, upload-time = "2025-09-14T22:17:19.954Z" },
+ { url = "https://files.pythonhosted.org/packages/ff/8d/0309daffea4fcac7981021dbf21cdb2e3427a9e76bafbcdbdf5392ff99a4/zstandard-0.25.0-cp312-cp312-win32.whl", hash = "sha256:23ebc8f17a03133b4426bcc04aabd68f8236eb78c3760f12783385171b0fd8bd", size = 436922, upload-time = "2025-09-14T22:17:24.398Z" },
+ { url = "https://files.pythonhosted.org/packages/79/3b/fa54d9015f945330510cb5d0b0501e8253c127cca7ebe8ba46a965df18c5/zstandard-0.25.0-cp312-cp312-win_amd64.whl", hash = "sha256:ffef5a74088f1e09947aecf91011136665152e0b4b359c42be3373897fb39b01", size = 506276, upload-time = "2025-09-14T22:17:21.429Z" },
+ { url = "https://files.pythonhosted.org/packages/ea/6b/8b51697e5319b1f9ac71087b0af9a40d8a6288ff8025c36486e0c12abcc4/zstandard-0.25.0-cp312-cp312-win_arm64.whl", hash = "sha256:181eb40e0b6a29b3cd2849f825e0fa34397f649170673d385f3598ae17cca2e9", size = 462679, upload-time = "2025-09-14T22:17:23.147Z" },
+]