67 changes: 21 additions & 46 deletions index.html
@@ -301,51 +301,6 @@ <h3>Open-Vocabulary Object Tracking with Grounding DINO, SAM 2 and CLIP</h3>
</div>
</article>

<article class="project-card">
<div class="teaser" role="img" aria-label="Image retrieval with CLIP.">
<img src="assets/group_Q.png" alt="Image retrieval preview" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
@@ -435,7 +390,27 @@ <h3>From Raw Footage to Recipe: Extracting Cooking Steps from Egocentric Video</
</label>
</div>
</article>


<article class="project-card">
<div class="teaser" role="img" aria-label="AI image captioning system turning video frames into short action labels.">
<img src="assets/group_V.png" alt="Group V image captioning preview" style="position:absolute; inset:0; width:100%; height:100%; object-fit:cover; z-index:2;">
<span class="teaser-label" style="z-index:3;">Group V</span>
</div>
<div class="project-content">
<p class="project-meta">Video understanding, vision-language models, action captioning</p>
<h3>Action/Event-Focused Captioning: A Three-Model Comparison</h3>
<p class="project-abstract">
This project explores how pretrained image-captioning models can be adapted to produce short action-focused captions for video activity timelines. Instead of generating long descriptive captions, we fine-tune BLIP, ViT-GPT2, and Microsoft GIT on COCO action captions so that the models output compact labels such as “person walking” or “coffee being poured.”
<br><br>
For video inference, frames are sampled over time, captioned by the fine-tuned models, and de-duplicated into a simple activity timeline. The project compares original and fine-tuned models using BLEU-1, BLEU-2, METEOR, and ROUGE-L, and analyzes whether architecture choice still matters after all models are adapted to the same action-caption task.
</p>
<label class="project-toggle-label">
<input class="project-toggle" type="checkbox" aria-label="Toggle full project pitch">
<span class="project-toggle-more">Read more</span>
<span class="project-toggle-less">Show less</span>
</label>
</div>
</article>


<article class="project-card add-project-card">