
Query Matching for Spatio-Temporal Action Detection with Query-Based Object Detector

Shimon Hori, Toru Tamaki, Nagoya Institute of Technology



This repository provides the source code for reproducing, training, and evaluating the models proposed in the paper "Query Matching for Spatio-Temporal Action Detection with Query-Based Object Detector."

Repository Structure and Experimental Flow

  • conf/ : Hydra configuration files for experiments, datasets, and models.
  • dataset/ : Data loaders, including a WebDataset loader and a sequential loader. The WebDataset loader is used for training and frame-mAP validation; the sequential loader is used for video-mAP validation.
  • models/ : Implements DETR-based models with temporal shift mechanisms for spatio-temporal action detection.
  • analysis/ : Result analysis and visualization tools.
  • main.py : Entry point for training and evaluation.
  • train.py : Training loop implementation.
  • val.py : Validation functions for frame-level and video-level mAP evaluation.
  • videomap.py : Video-level mAP calculation and tube evaluation.
  • visualization.py : Visualization utilities for features, bounding boxes, and metrics.
  • dash_app.py : Interactive visualization dashboard for feature analysis using Plotly Dash.

The basic experimental flow is as follows:

  1. Edit configuration files in conf/ to specify dataset paths and model hyperparameters.
  2. Run main.py with appropriate Hydra configuration overrides.
  3. Results are logged to Comet.ml and saved locally in experiment_logs/.
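Hydra composes the base configuration with dotted command-line overrides such as `run.val_framemap=True`. The following pure-Python sketch imitates that merging on a plain nested dict so the override syntax used throughout this README is easy to follow; the real composition is done by Hydra/OmegaConf, and the key names are taken from the commands below.

```python
# Minimal sketch of dotted overrides (e.g. run.val_framemap=True) applied to
# a nested config dict. Hydra/OmegaConf does this for real; this is only an
# illustration of the override syntax used in this README.

def apply_override(cfg: dict, override: str) -> None:
    """Apply a single 'a.b.c=value' override to a nested dict in place."""
    dotted, raw = override.split("=", 1)
    *parents, leaf = dotted.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    # Minimal literal parsing: booleans and ints, otherwise keep the string.
    if raw in ("True", "False"):
        value = raw == "True"
    elif raw.isdigit():
        value = int(raw)
    else:
        value = raw
    node[leaf] = value

cfg = {"run": {"val_framemap": False, "val_videomap": False}}
for ov in ["run.val_framemap=True", "logging.num_images_to_visualize=10"]:
    apply_override(cfg, ov)

print(cfg["run"]["val_framemap"])                 # True
print(cfg["logging"]["num_images_to_visualize"])  # 10
```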

Dataset Preparation

JHMDB-21

Follow the instructions at https://github.com/open-mmlab/mmaction2/blob/main/tools/data/jhmdb/README.md to download JHMDB.tar.gz.

Place your datasets in the following directory structure:

project_root/
├─ dataset/
│  └─ jhmdb21/
│     ├─ jhmdb21.yml          # Dataset configuration file
│     ├─ JHMDB-GT.pkl         # Ground truth annotations
│     ├─ Frames/              # video frames
│     │  ├─ brush_hair/
│     │  ├─ catch/
│     │  ├─ clap/
│     │  └─ ...               # 21 action classes
│     ├─ train/               # Training shards (WebDataset format)
│     └─ val/                 # Validation shards (WebDataset format)
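A quick sanity check against the layout above can save a failed run. This hedged sketch only verifies that the expected entries from the tree exist; the entry names come from this README, and the helper name is hypothetical.

```python
# Check that the JHMDB-21 layout described above is in place.
from pathlib import Path

def missing_jhmdb_entries(root: str) -> list:
    """Return expected dataset entries that are absent under root."""
    base = Path(root) / "dataset" / "jhmdb21"
    expected = ["jhmdb21.yml", "JHMDB-GT.pkl", "Frames", "train", "val"]
    return [name for name in expected if not (base / name).exists()]

# Empty list when the layout is complete; otherwise lists what is missing.
print(missing_jhmdb_entries("."))
```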

Preparing WebDataset Shards

Convert the dataset to WebDataset format for efficient data loading:

python dataset/make_shards.py --subset train --max_tar_files 50
python dataset/make_shards.py --subset val --max_tar_files 50
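WebDataset shards are plain .tar files in which each sample is a group of files sharing a key (for example, a frame image plus its annotation) stored consecutively. The sketch below builds such a shard with only the standard library to illustrate the format; the file names and annotation fields are illustrative, not the actual output of make_shards.py.

```python
# Illustration of the WebDataset shard layout: one tar archive, samples stored
# as consecutive files that share a key ("<key>.jpg" + "<key>.json").
import io
import json
import tarfile
import tempfile
from pathlib import Path

def write_shard(shard_path, samples):
    """Write (key, jpeg_bytes, annotation_dict) samples into one tar shard."""
    with tarfile.open(shard_path, "w") as tar:
        for key, jpeg, ann in samples:
            for name, payload in ((f"{key}.jpg", jpeg),
                                  (f"{key}.json", json.dumps(ann).encode())):
                info = tarfile.TarInfo(name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))

tmp = Path(tempfile.mkdtemp())
write_shard(tmp / "train-000000.tar",
            [("brush_hair_0001", b"\xff\xd8fake", {"label": "brush_hair"})])

with tarfile.open(tmp / "train-000000.tar") as tar:
    names = tar.getnames()
print(names)  # ['brush_hair_0001.jpg', 'brush_hair_0001.json']
```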

Environment Setup

Create a virtual environment and install dependencies:

python -m venv .venv_stad
source .venv_stad/bin/activate
pip install -r requirements.txt

Configuration

This project uses Hydra for configuration management. Configuration files are located in conf/:

  • conf/config.yaml : Default configuration
  • conf/experiment/ : Experiment-specific configurations
    • train_jhmdb21.yaml : Training configuration for JHMDB-21
    • val_jhmdb21.yaml : Validation configuration for JHMDB-21
  • conf/dataset/ : Dataset-specific configurations
  • conf/model/ : Model-specific configurations

Training

# Training
python main.py +experiment=train_jhmdb21

# Training with custom parameters
python main.py \
  +experiment=train_jhmdb21 \
  run.val_framemap=True \
  run.val_videomap=True \
  logging.num_images_to_visualize=10

# Validation only
python main.py \
  +experiment=val_jhmdb21 \
  run.only_val=True \
  run.resume_from_checkpoint=PATH/TO/YOUR/CHECKPOINT.pt

Validation

Validation Modes

The framework supports two types of validation:

  1. Frame-level mAP (val_framemap): Evaluates object detection performance on individual frames
  2. Video-level mAP (val_videomap): Evaluates spatio-temporal action detection using action tubes
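The geometric core of both metrics can be sketched briefly: frame-level mAP rests on per-box IoU, while video-level mAP scores action tubes, commonly via a spatio-temporal IoU over the box sequence. This is a simplification for intuition, not the exact evaluation code in val.py or videomap.py.

```python
# Per-frame box IoU and a simple tube IoU (mean IoU over shared frames),
# illustrating what frame-level vs. video-level evaluation measures.

def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def tube_iou(tube_a, tube_b):
    """Mean per-frame IoU over frames where both tubes have a box."""
    shared = set(tube_a) & set(tube_b)
    if not shared:
        return 0.0
    return sum(box_iou(tube_a[f], tube_b[f]) for f in shared) / len(shared)

gt = {0: (0, 0, 10, 10), 1: (0, 0, 10, 10)}        # frame -> box
pred = {0: (0, 0, 10, 10), 1: (5, 0, 15, 10)}
print(tube_iou(gt, pred))  # 2/3: IoU 1.0 in frame 0, 1/3 in frame 1
```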

Running Validation

Run validation on a trained model:

# Basic validation with both frame-level and video-level mAP
python main.py \
  +experiment=val_jhmdb21 \
  run.only_val=True \
  run.resume_from_checkpoint=path/to/your/checkpoint.pt \
  run.val_framemap=True \
  run.val_videomap=True

Validation During Training

Validation is automatically performed during training when enabled:

python main.py \
  +experiment=train_jhmdb21 \
  run.val_framemap=True \
  run.val_videomap=True

Visualization

Feature Visualization Dashboard

Run the interactive Dash application to visualize learned features:

# Using default paths
python dash_app.py

# Using custom paths
python dash_app.py \
  --pickle_path ./experiment_logs/jhmdb21/your_experiment/MLP_features_for_dashapp/MLP_features.pickle

This provides:

  • t-SNE visualization of MLP features
  • Interactive hover to view corresponding frames
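The dashboard reads the features from a pickle file. The dict layout below (features, labels, frame paths) is an assumption for illustration only; inspect your own MLP_features.pickle for the real schema.

```python
# Hedged sketch of the feature pickle the dashboard consumes. The key names
# here are assumptions, not the repository's actual schema.
import pickle
import tempfile
from pathlib import Path

payload = {
    "features": [[0.1, 0.2], [0.3, 0.4]],  # per-query MLP features
    "labels": ["brush_hair", "catch"],     # action class per feature
    "frames": ["Frames/brush_hair/0001.png", "Frames/catch/0001.png"],
}
path = Path(tempfile.mkdtemp()) / "MLP_features.pickle"
path.write_bytes(pickle.dumps(payload))

loaded = pickle.loads(path.read_bytes())
print(len(loaded["features"]), loaded["labels"][0])  # 2 brush_hair
```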

Logging

Results are logged to:

  • Comet.ml: Metrics, images, and confusion matrices (configure in .comet.config)
  • Local files: experiment_logs/{dataset_name}/{experiment_name}/
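The local path pattern above can be built programmatically, for example when post-processing runs with the tools in analysis/. The helper name is hypothetical; only the directory pattern comes from this README.

```python
# Build the local log directory following experiment_logs/{dataset_name}/{experiment_name}/.
from pathlib import Path

def local_log_dir(dataset_name: str, experiment_name: str) -> Path:
    """Hypothetical helper: resolve a run's local log directory."""
    return Path("experiment_logs") / dataset_name / experiment_name

print(local_log_dir("jhmdb21", "my_run").as_posix())
# experiment_logs/jhmdb21/my_run
```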

Citation

If you find our work interesting and useful, please consider ⭐ starring the repo and citing our paper:

@INPROCEEDINGS{QueryMatching_GCCE2025,
  author={Hori, Shimon and Tamaki, Toru},
  booktitle={2025 IEEE 14th Global Conference on Consumer Electronics (GCCE)},
  title={Query matching for spatio-temporal action detection with query-based object detector},
  year={2025},
  volume={},
  number={},
  pages={1505-1506},
  keywords={Object detection;Detectors;Feature extraction;Decoding;Consumer electronics;spatio-temporal action detection;object detection;DETR;object query;query matching},
  doi={10.1109/GCCE65946.2025.11275351}}
