Computer-Vision-2026 · nihermann · Jun 1, 2026 · May 31, 2026
diff --git a/assets/group_W.png b/assets/group_W.png
diff --git a/index.html b/index.html
@@ -249,7 +249,7 @@ <h3>Facial Expression Recognition with Hybrid Models</h3>
             </div>
           </article>
 
-         <article class="project-card">
+          <article class="project-card">
             <div class="teaser" role="img" aria-label="Promptable Video Event Finder with Segmentation-Guided Motion Analysis.">
               <img src="assets/group_O.png" alt="Highlights Preview" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
               <span class="teaser-label" style="z-index:3;">Group O</span>
@@ -263,7 +263,7 @@ <h3>Smart Event Detection for Highlight Clips</h3>
                 <br><br>
                 The goal is to combine modern segmentation models (such as SAM) with classical computer vision techniques. Segmentation serves as a strong perception layer, while event detection is driven by motion-based features such as trajectories, velocity, and frequency analysis, along with lightweight reasoning.
                 The system follows a modular design, consisting of a general perception and feature extraction pipeline combined with task-specific event detection modules.
-                 <br><br>
+                <br><br>
                 The system is primarily designed for human action detection (e.g., waving, raising a hand, standing up). As an extension, it can also handle simple sports scenarios, such as tracking a ball moving toward or crossing a goal, demonstrating its ability to generalize to multi-object interactions.
                 </p>
               <label class="project-toggle-label">
@@ -274,11 +274,7 @@ <h3>Smart Event Detection for Highlight Clips</h3>
             </div>
           </article>
 
-
-
-
-
-           <article class="project-card">
+          <article class="project-card">
             <div class="teaser" role="img" aria-label="Open-vocabulary tracking project.">
               <img src="assets/group_X.png" alt="Two segmented puppies in a park" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
               <span class="teaser-label" style="z-index:3;">Group X</span>
@@ -290,7 +286,7 @@ <h3>Open-Vocabulary Object Tracking with Grounding DINO, SAM 2 and CLIP</h3>
                 We present an open-vocabulary object tracking system that enables users to search, segment, and track arbitrary objects in images and videos using natural language queries.
                 <br><br>
                 Our pipeline combines Grounding DINO for text-conditioned object detection, CLIP for semantic verification, and SAM 2 for segmentation and temporal tracking.
-                 <br><br>
+                <br><br>
                 The system supports interactive querying through a Gradio web interface and demonstrates how modern vision foundation models can be integrated into a unified visual understanding pipeline.
                 </p>
               <label class="project-toggle-label">
@@ -301,51 +297,6 @@ <h3>Open-Vocabulary Object Tracking with Grounding DINO, SAM 2 and CLIP</h3>
             </div>
           </article>
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
           <article class="project-card">
             <div class="teaser" role="img" aria-label="Image retrieval with CLIP.">
               <img src="assets/group_Q.png" alt="Image retrieval preview" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
@@ -367,8 +318,7 @@ <h3>Image retrieval with CLIP</h3>
             </div>
           </article>
 
-
-           <article class="project-card">
+          <article class="project-card">
             <div class="teaser" role="img" aria-label="SfM with Colmap">
               <img src="assets/group_I.png" alt="" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
               <span class="teaser-label" style="z-index:3;">Group I</span>
@@ -435,8 +385,29 @@ <h3>From Raw Footage to Recipe: Extracting Cooking Steps from Egocentric Video</
               </label>
             </div>
           </article>
-
-
+
+          <article class="project-card">
+            <div class="teaser" role="img" aria-label="Real-time whiteboard transcription pipeline.">
+              <img src="assets/group_W.png" alt="Whiteboard with detected text regions and entity overlays" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
+              <span class="teaser-label" style="z-index:3;">Group W</span>
+            </div>
+            <div class="project-content">
+              <p class="project-meta">Computer vision, OCR, segmentation, vision-language models, object tracking</p>
+              <h3>Real-Time Whiteboard Transcription with Temporal Ledger</h3>
+              <p class="project-abstract">
+                When a professor is at the board, you have two choices: pay attention or copy. You can't really do both at the same time.
+                <br><br>
+                We wanted to eliminate that trade-off. Our system transcribes what the professor writes in real time, so students are free to just listen and understand.
+                <br><br>
+                The pipeline captures the full evolution of whiteboard content across a lecture, every correction and erasure included, and synthesises it into structured Markdown output.
+              </p>
+              <label class="project-toggle-label">
+                <input class="project-toggle" type="checkbox" aria-label="Toggle full project pitch">
+                <span class="project-toggle-more">Read more</span>
+                <span class="project-toggle-less">Show less</span>
+              </label>
+            </div>
+          </article>
 
           <article class="project-card add-project-card">
             <a href="https://github.com/Computer-Vision-2026/Computer-Vision-2026.github.io/edit/main/index.html" target="_blank" rel="noopener">
@@ -446,11 +417,7 @@ <h3>From Raw Footage to Recipe: Extracting Cooking Steps from Egocentric Video</
             </a>
           </article>
 
-
-
-
         </div>
-
       </div>
     </section>
   </main>