From 4ed987dd890c5af2df26ec0c69fb69c2c948fed4 Mon Sep 17 00:00:00 2001
From: Innocent Kisoka <76830393+InnocentKisoka@users.noreply.github.com>
Date: Wed, 20 May 2026 23:17:00 +0200
Subject: [PATCH] Add project card for open-vocabulary object tracking

Added a new project card for an open-vocabulary object tracking system, including details about the project and its functionality.
---
 index.html | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/index.html b/index.html
index fbc5691..7ceb546 100644
--- a/index.html
+++ b/index.html
@@ -273,6 +273,78 @@

Smart Event Detection for Highlight Clips

+ + + + + +
+ +
+

Object detection, segmentation, tracking, vision-language models

+

Open-Vocabulary Object Tracking with Grounding DINO, SAM 2 and CLIP

+

+ We present an open-vocabulary object tracking system that enables users to search, segment, and track arbitrary objects in images and videos using natural language queries. +

+ Our pipeline combines Grounding DINO for text-conditioned object detection, CLIP for semantic verification, and SAM 2 for segmentation and temporal tracking. +

+ The system supports interactive querying through a Gradio web interface and demonstrates how modern vision foundation models can be integrated into a unified visual understanding pipeline. +

+ +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +