Shimon Hori, Toru Tamaki, Nagoya Institute of Technology
This repository provides the source code for reproducing, training, and evaluating the models proposed in the paper "Query Matching for Spatio-Temporal Action Detection with Query-Based Object Detector."
- `conf/`: Hydra configuration files for experiments, datasets, and models.
- `dataset/`: Data loaders, including WebDataset and sequential loaders. The WebDataset loader is used for training and frame-mAP validation; the sequential loader is used for video-mAP validation.
- `models/`: DETR-based models with temporal shift mechanisms for spatio-temporal action detection.
- `analysis/`: Result analysis and visualization tools.
- `main.py`: Entry point for training and evaluation.
- `train.py`: Training loop implementation.
- `val.py`: Validation functions for frame-level and video-level mAP evaluation.
- `videomap.py`: Video-level mAP calculation and tube evaluation.
- `visualization.py`: Visualization utilities for features, bounding boxes, and metrics.
- `dash_app.py`: Interactive dashboard for feature analysis using Plotly Dash.
The basic experimental flow is as follows:
- Edit configuration files in `conf/` to specify dataset paths and model hyperparameters.
- Run `main.py` with appropriate Hydra configuration overrides.
- Results are logged to Comet.ml and saved locally in `experiment_logs/`.
Follow the instructions at https://github.com/open-mmlab/mmaction2/blob/main/tools/data/jhmdb/README.md to download JHMDB.tar.gz.
Place your datasets in the following directory structure:
project_root/
├─ dataset/
│ └─ jhmdb21/
│ ├─ jhmdb21.yml # Dataset configuration file
│ ├─ JHMDB-GT.pkl # Ground truth annotations
│ ├─ Frames/ # Video frames
│ │ ├─ brush_hair/
│ │ ├─ catch/
│ │ ├─ clap/
│ │ └─ ... # 21 action classes
│ ├─ train/ # Training shards (WebDataset format)
│ └─ val/ # Validation shards (WebDataset format)
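The `train/` and `val/` shards follow the WebDataset convention: each shard is a plain tar archive, and files sharing a basename (e.g. `00000.jpg`, `00000.cls`) are grouped into one sample by the reader. A minimal stdlib sketch of writing one such shard (a hypothetical helper, not the repo's `make_shards.py`):

```python
import io
import tarfile

def write_shard(path, samples):
    """Write samples as a WebDataset-style tar shard.

    Each sample is (key, {extension: bytes}); files that share the
    same key prefix inside the tar are grouped into one sample by
    WebDataset readers.
    """
    with tarfile.open(path, "w") as tar:
        for key, files in samples:
            for ext, data in files.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
```

The actual shards produced by `make_shards.py` may use different keys and extensions; this only illustrates the container format.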
Convert the dataset to WebDataset format for efficient data loading:
python dataset/make_shards.py --subset train --max_tar_files 50
python dataset/make_shards.py --subset val --max_tar_files 50

Create a virtual environment and install dependencies:
python -m venv .venv_stad
source .venv_stad/bin/activate
pip install -r requirements.txt

This project uses Hydra for configuration management. Configuration files are located in `conf/`:
- `conf/config.yaml`: Default configuration
- `conf/experiment/`: Experiment-specific configurations
  - `train_jhmdb21.yaml`: Training configuration for JHMDB-21
  - `val_jhmdb21.yaml`: Validation configuration for JHMDB-21
- `conf/dataset/`: Dataset-specific configurations
- `conf/model/`: Model-specific configurations
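Overrides such as `run.only_val=True` address nested keys in the composed config with dotted paths. A rough stdlib illustration of that addressing scheme (not Hydra's actual implementation, which composes configs and coerces types via OmegaConf):

```python
def apply_override(cfg, override):
    """Apply one 'a.b.c=value' style override to a nested dict in place."""
    keypath, _, raw = override.partition("=")
    keys = keypath.split(".")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    # Minimal value parsing: booleans and ints, else keep the string.
    if raw in ("True", "False"):
        value = raw == "True"
    elif raw.lstrip("-").isdigit():
        value = int(raw)
    else:
        value = raw
    node[keys[-1]] = value
    return cfg
```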
# Training
python main.py +experiment=train_jhmdb21
# Training with custom parameters
python main.py \
+experiment=train_jhmdb21 \
run.val_framemap=True \
run.val_videomap=True \
logging.num_images_to_visualize=10
# Validation only
python main.py \
+experiment=val_jhmdb21 \
run.only_val=True \
run.resume_from_checkpoint=PATH/TO/YOUR/CHECKPOINT.pt

The framework supports two types of validation:
- Frame-level mAP (val_framemap): Evaluates object detection performance on individual frames
- Video-level mAP (val_videomap): Evaluates spatio-temporal action detection using action tubes
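Video-level mAP matches predicted action tubes to ground-truth tubes by spatio-temporal IoU. A common formulation (the exact criterion in `videomap.py` may differ) averages the per-frame box IoU over the union of the two tubes' frame ranges:

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def tube_iou(tube_a, tube_b):
    """Spatio-temporal IoU of two tubes ({frame_index: box} dicts):
    mean per-frame box IoU over the union of their frame ranges.
    Frames covered by only one tube contribute zero overlap."""
    frames = set(tube_a) | set(tube_b)
    if not frames:
        return 0.0
    total = sum(
        box_iou(tube_a[f], tube_b[f]) if f in tube_a and f in tube_b else 0.0
        for f in frames
    )
    return total / len(frames)
```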
Run validation on a trained model:
# Basic validation with both frame-level and video-level mAP
python main.py \
+experiment=val_jhmdb21 \
run.only_val=True \
run.resume_from_checkpoint=path/to/your/checkpoint.pt \
run.val_framemap=True \
run.val_videomap=True

Validation is automatically performed during training when enabled:
python main.py \
+experiment=train_jhmdb21 \
run.val_framemap=True \
run.val_videomap=True

Run the interactive Dash application to visualize learned features:
# Using default paths
python dash_app.py
# Using custom paths
python dash_app.py \
--pickle_path ./experiment_logs/jhmdb21/your_experiment/MLP_features_for_dashapp/MLP_features.pickle

This provides:
- t-SNE visualization of MLP features
- Interactive hover to view corresponding frames
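The dashboard consumes a pickle of MLP features whose exact schema is defined by the repo's export code. A hedged stdlib sketch for inspecting such a file (the dict-of-vectors schema below is an assumption for illustration, not the repo's actual format):

```python
import pickle

def summarize_features(pickle_path):
    """Load a feature pickle and report basic statistics.

    Assumes a hypothetical schema: a dict mapping sample id to a
    list-of-floats feature vector; adapt to the actual export format.
    """
    with open(pickle_path, "rb") as f:
        features = pickle.load(f)
    dims = {len(vec) for vec in features.values()}
    return {"num_samples": len(features), "feature_dims": sorted(dims)}
```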
Results are logged to:
- Comet.ml: Metrics, images, and confusion matrices (configure in
.comet.config) - Local files:
experiment_logs/{dataset_name}/{experiment_name}/
If you found our work interesting and useful, please ⭐ the repo and consider citing our paper.
@INPROCEEDINGS{QueryMatching_GCCE2025,
author={Hori, Shimon and Tamaki, Toru},
booktitle={2025 IEEE 14th Global Conference on Consumer Electronics (GCCE)},
title={Query matching for spatio-temporal action detection with query-based object detector},
year={2025},
volume={},
number={},
pages={1505-1506},
keywords={Object detection;Detectors;Feature extraction;Decoding;Consumer electronics;spatio-temporal action detection;object detection;DETR;object query;query matching},
doi={10.1109/GCCE65946.2025.11275351}}
