Binary file added assets/group_W.png
89 changes: 28 additions & 61 deletions index.html
@@ -249,7 +249,7 @@ <h3>Facial Expression Recognition with Hybrid Models</h3>
</div>
</article>

<article class="project-card">
<div class="teaser" role="img" aria-label="Promptable Video Event Finder with Segmentation-Guided Motion Analysis.">
<img src="assets/group_O.png" alt="Highlights Preview" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
<span class="teaser-label" style="z-index:3;">Group O</span>
@@ -263,7 +263,7 @@ <h3>Smart Event Detection for Highlight Clips</h3>
<br><br>
The goal is to combine modern segmentation models (such as SAM) with classical computer vision techniques. Segmentation serves as a strong perception layer, while event detection is driven by motion-based features such as trajectories, velocity, and frequency analysis, along with lightweight reasoning.
The system follows a modular design, consisting of a general perception and feature extraction pipeline combined with task-specific event detection modules.
<br><br>
The system is primarily designed for human action detection (e.g., waving, raising a hand, standing up). As an extension, it can also handle simple sports scenarios, such as tracking a ball moving toward or crossing a goal, demonstrating its ability to generalize to multi-object interactions.
</p>
<label class="project-toggle-label">
@@ -274,11 +274,7 @@ <h3>Smart Event Detection for Highlight Clips</h3>
</div>
</article>

<article class="project-card">
<div class="teaser" role="img" aria-label="Open-vocabulary tracking project.">
<img src="assets/group_X.png" alt="Two segmented puppies in a park" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
<span class="teaser-label" style="z-index:3;">Group X</span>
@@ -290,7 +286,7 @@ <h3>Open-Vocabulary Object Tracking with Grounding DINO, SAM 2 and CLIP</h3>
We present an open-vocabulary object tracking system that enables users to search, segment, and track arbitrary objects in images and videos using natural language queries.
<br><br>
Our pipeline combines Grounding DINO for text-conditioned object detection, CLIP for semantic verification, and SAM 2 for segmentation and temporal tracking.
<br><br>
The system supports interactive querying through a Gradio web interface and demonstrates how modern vision foundation models can be integrated into a unified visual understanding pipeline.
</p>
<label class="project-toggle-label">
@@ -301,51 +297,6 @@ <h3>Open-Vocabulary Object Tracking with Grounding DINO, SAM 2 and CLIP</h3>
</div>
</article>

<article class="project-card">
<div class="teaser" role="img" aria-label="Image retrieval with CLIP.">
<img src="assets/group_Q.png" alt="Image retrieval preview" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
@@ -367,8 +318,7 @@ <h3>Image retrieval with CLIP</h3>
</div>
</article>

<article class="project-card">
<div class="teaser" role="img" aria-label="SfM with Colmap">
<img src="assets/group_I.png" alt="" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
<span class="teaser-label" style="z-index:3;">Group I</span>
@@ -435,8 +385,29 @@ <h3>From Raw Footage to Recipe: Extracting Cooking Steps from Egocentric Video</
</label>
</div>
</article>

<article class="project-card">
<div class="teaser" role="img" aria-label="Real-time whiteboard transcription pipeline.">
<img src="assets/group_W.png" alt="Whiteboard with detected text regions and entity overlays" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
<span class="teaser-label" style="z-index:3;">Group W</span>
</div>
<div class="project-content">
<p class="project-meta">Computer vision, OCR, segmentation, vision-language models, object tracking</p>
<h3>Real-Time Whiteboard Transcription with Temporal Ledger</h3>
<p class="project-abstract">
When a professor is at the board, you have two choices: pay attention or copy. You can't really do both at the same time.
<br><br>
We wanted to eliminate that trade-off. Our system transcribes in real time what the professor writes, so the student is free to just listen and understand.
<br><br>
The pipeline captures the full evolution of whiteboard content across a lecture, including every correction and erasure, and synthesises it into structured Markdown output.
</p>
<label class="project-toggle-label">
<input class="project-toggle" type="checkbox" aria-label="Toggle full project pitch">
<span class="project-toggle-more">Read more</span>
<span class="project-toggle-less">Show less</span>
</label>
</div>
</article>

<article class="project-card add-project-card">
<a href="https://github.com/Computer-Vision-2026/Computer-Vision-2026.github.io/edit/main/index.html" target="_blank" rel="noopener">
@@ -446,11 +417,7 @@ <h3>From Raw Footage to Recipe: Extracting Cooking Steps from Egocentric Video</
</a>
</article>

</div>

</div>
</section>
</main>