diff --git a/assets/group_P.png b/assets/group_P.png
new file mode 100644
index 0000000..5c6bc9b
Binary files /dev/null and b/assets/group_P.png differ
diff --git a/index.html b/index.html
index 49419b0..fbc5691 100644
--- a/index.html
+++ b/index.html
@@ -342,7 +342,29 @@

Real time hand gesture detection: from rock paper scissors to sign interpret - +
+ +
+

Action recognition, video understanding, self-supervised embeddings

+

From Raw Footage to Recipe: Extracting Cooking Steps from Egocentric Video

+

+ This project builds a system that watches egocentric cooking videos and automatically extracts the sequence of cooking actions performed, with the goal of reconstructing a recipe from raw footage alone.
+ Because most frames in a cooking video show no cooking action, the pipeline first applies a relevance classifier to filter out background activity, then routes the remaining clips through an RNN-based action classifier that identifies steps such as cutting, peeling, and boiling.
+ Video representations are produced by V-JEPA 2, a self-supervised encoder that represents each video as a sequence of embeddings, one per 64-frame block, and requires no labeled pretraining data.
+ The result is an end-to-end pipeline that turns an unstructured kitchen video into a structured, step-by-step recipe.
+

+ +
+
+ +