
Query Matching for Spatio-Temporal Action Detection with Query-Based Object Detector

Shimon Hori, Toru Tamaki, Nagoya Institute of Technology



This repository provides the source code for reproducing, training, and evaluating the models proposed in the paper "Query Matching for Spatio-Temporal Action Detection with Query-Based Object Detector."

Repository Structure and Experimental Flow

  • conf/ : Hydra configuration files for experiments, datasets, and models.
  • dataset/ : Data loaders, including a WebDataset loader and a sequential loader. The WebDataset loader is used for training and frame-mAP validation; the sequential loader is used for video-mAP validation.
  • models/ : Implements DETR-based models with temporal shift mechanisms for spatio-temporal action detection.
  • analysis/ : Result analysis and visualization tools.
  • main.py : Entry point for training and evaluation.
  • train.py : Training loop implementation.
  • val.py : Validation functions for frame-level and video-level mAP evaluation.
  • videomap.py : Video-level mAP calculation and tube evaluation.
  • visualization.py : Visualization utilities for features, bounding boxes, and metrics.
  • dash_app.py : Interactive visualization dashboard for feature analysis using Plotly Dash.

The basic experimental flow is as follows:

  1. Edit configuration files in conf/ to specify dataset paths and model hyperparameters.
  2. Run main.py with appropriate Hydra configuration overrides.
  3. Results are logged to Comet.ml and saved locally in experiment_logs/.
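Hydra composes the base configuration with dotted command-line overrides such as `run.val_framemap=True`. The following pure-Python sketch imitates that merging on a plain nested dict so the override syntax used throughout this README is easy to follow; the real composition is done by Hydra/OmegaConf, and the key names are taken from the commands below.

```python
# Minimal sketch of dotted overrides (e.g. run.val_framemap=True) applied to
# a nested config dict. Hydra/OmegaConf does this for real; this is only an
# illustration of the override syntax used in this README.

def apply_override(cfg: dict, override: str) -> None:
    """Apply a single 'a.b.c=value' override to a nested dict in place."""
    dotted, raw = override.split("=", 1)
    *parents, leaf = dotted.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    # Minimal literal parsing: booleans and ints, otherwise keep the string.
    if raw in ("True", "False"):
        value = raw == "True"
    elif raw.isdigit():
        value = int(raw)
    else:
        value = raw
    node[leaf] = value

cfg = {"run": {"val_framemap": False, "val_videomap": False}}
for ov in ["run.val_framemap=True", "logging.num_images_to_visualize=10"]:
    apply_override(cfg, ov)

print(cfg["run"]["val_framemap"])                 # True
print(cfg["logging"]["num_images_to_visualize"])  # 10
```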

Dataset Preparation

JHMDB-21

Follow the instructions at https://github.com/open-mmlab/mmaction2/blob/main/tools/data/jhmdb/README.md to download JHMDB.tar.gz.

Place your datasets in the following directory structure:

project_root/
├─ dataset/
│  └─ jhmdb21/
│     ├─ jhmdb21.yml          # Dataset configuration file
│     ├─ JHMDB-GT.pkl         # Ground truth annotations
│     ├─ Frames/              # video frames
│     │  ├─ brush_hair/
│     │  ├─ catch/
│     │  ├─ clap/
│     │  └─ ...               # 21 action classes
│     ├─ train/               # Training shards (WebDataset format)
│     └─ val/                 # Validation shards (WebDataset format)
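A quick sanity check against the layout above can save a failed run. This hedged sketch only verifies that the expected entries from the tree exist; the entry names come from this README, and the helper name is hypothetical.

```python
# Check that the JHMDB-21 layout described above is in place.
from pathlib import Path

def missing_jhmdb_entries(root: str) -> list:
    """Return expected dataset entries that are absent under root."""
    base = Path(root) / "dataset" / "jhmdb21"
    expected = ["jhmdb21.yml", "JHMDB-GT.pkl", "Frames", "train", "val"]
    return [name for name in expected if not (base / name).exists()]

# Empty list when the layout is complete; otherwise lists what is missing.
print(missing_jhmdb_entries("."))
```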

Preparing WebDataset Shards

Convert the dataset to WebDataset format for efficient data loading:

python dataset/make_shards.py --subset train --max_tar_files 50
python dataset/make_shards.py --subset val --max_tar_files 50
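WebDataset shards are plain .tar files in which each sample is a group of files sharing a key (for example, a frame image plus its annotation) stored consecutively. The sketch below builds such a shard with only the standard library to illustrate the format; the file names and annotation fields are illustrative, not the actual output of make_shards.py.

```python
# Illustration of the WebDataset shard layout: one tar archive, samples stored
# as consecutive files that share a key ("<key>.jpg" + "<key>.json").
import io
import json
import tarfile
import tempfile
from pathlib import Path

def write_shard(shard_path, samples):
    """Write (key, jpeg_bytes, annotation_dict) samples into one tar shard."""
    with tarfile.open(shard_path, "w") as tar:
        for key, jpeg, ann in samples:
            for name, payload in ((f"{key}.jpg", jpeg),
                                  (f"{key}.json", json.dumps(ann).encode())):
                info = tarfile.TarInfo(name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))

tmp = Path(tempfile.mkdtemp())
write_shard(tmp / "train-000000.tar",
            [("brush_hair_0001", b"\xff\xd8fake", {"label": "brush_hair"})])

with tarfile.open(tmp / "train-000000.tar") as tar:
    names = tar.getnames()
print(names)  # ['brush_hair_0001.jpg', 'brush_hair_0001.json']
```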

Environment Setup

Create a virtual environment and install dependencies:

python -m venv .venv_stad
source .venv_stad/bin/activate
pip install -r requirements.txt

Configuration

This project uses Hydra for configuration management. Configuration files are located in conf/:

  • conf/config.yaml : Default configuration
  • conf/experiment/ : Experiment-specific configurations
    • train_jhmdb21.yaml : Training configuration for JHMDB-21
    • val_jhmdb21.yaml : Validation configuration for JHMDB-21
  • conf/dataset/ : Dataset-specific configurations
  • conf/model/ : Model-specific configurations

Training

# Training
python main.py +experiment=train_jhmdb21

# Training with custom parameters
python main.py \
  +experiment=train_jhmdb21 \
  run.val_framemap=True \
  run.val_videomap=True \
  logging.num_images_to_visualize=10

# Validation only
python main.py \
  +experiment=val_jhmdb21 \
  run.only_val=True \
  run.resume_from_checkpoint=PATH/TO/YOUR/CHECKPOINT.pt

Validation

Validation Modes

The framework supports two types of validation:

  1. Frame-level mAP (val_framemap): Evaluates object detection performance on individual frames
  2. Video-level mAP (val_videomap): Evaluates spatio-temporal action detection using action tubes
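The geometric core of both metrics can be sketched briefly: frame-level mAP rests on per-box IoU, while video-level mAP scores action tubes, commonly via a spatio-temporal IoU over the box sequence. This is a simplification for intuition, not the exact evaluation code in val.py or videomap.py.

```python
# Per-frame box IoU and a simple tube IoU (mean IoU over shared frames),
# illustrating what frame-level vs. video-level evaluation measures.

def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def tube_iou(tube_a, tube_b):
    """Mean per-frame IoU over frames where both tubes have a box."""
    shared = set(tube_a) & set(tube_b)
    if not shared:
        return 0.0
    return sum(box_iou(tube_a[f], tube_b[f]) for f in shared) / len(shared)

gt = {0: (0, 0, 10, 10), 1: (0, 0, 10, 10)}        # frame -> box
pred = {0: (0, 0, 10, 10), 1: (5, 0, 15, 10)}
print(tube_iou(gt, pred))  # 2/3: IoU 1.0 in frame 0, 1/3 in frame 1
```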

Running Validation

Run validation on a trained model:

# Basic validation with both frame-level and video-level mAP
python main.py \
  +experiment=val_jhmdb21 \
  run.only_val=True \
  run.resume_from_checkpoint=path/to/your/checkpoint.pt \
  run.val_framemap=True \
  run.val_videomap=True

Validation During Training

Validation is automatically performed during training when enabled:

python main.py \
  +experiment=train_jhmdb21 \
  run.val_framemap=True \
  run.val_videomap=True

Visualization

Feature Visualization Dashboard

Run the interactive Dash application to visualize learned features:

# Using default paths
python dash_app.py

# Using custom paths
python dash_app.py \
  --pickle_path ./experiment_logs/jhmdb21/your_experiment/MLP_features_for_dashapp/MLP_features.pickle

This provides:

  • t-SNE visualization of MLP features
  • Interactive hover to view corresponding frames
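The dashboard reads the features from a pickle file. The dict layout below (features, labels, frame paths) is an assumption for illustration only; inspect your own MLP_features.pickle for the real schema.

```python
# Hedged sketch of the feature pickle the dashboard consumes. The key names
# here are assumptions, not the repository's actual schema.
import pickle
import tempfile
from pathlib import Path

payload = {
    "features": [[0.1, 0.2], [0.3, 0.4]],  # per-query MLP features
    "labels": ["brush_hair", "catch"],     # action class per feature
    "frames": ["Frames/brush_hair/0001.png", "Frames/catch/0001.png"],
}
path = Path(tempfile.mkdtemp()) / "MLP_features.pickle"
path.write_bytes(pickle.dumps(payload))

loaded = pickle.loads(path.read_bytes())
print(len(loaded["features"]), loaded["labels"][0])  # 2 brush_hair
```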

Logging

Results are logged to:

  • Comet.ml: Metrics, images, and confusion matrices (configure in .comet.config)
  • Local files: experiment_logs/{dataset_name}/{experiment_name}/
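The local path pattern above can be built programmatically, for example when post-processing runs with the tools in analysis/. The helper name is hypothetical; only the directory pattern comes from this README.

```python
# Build the local log directory following experiment_logs/{dataset_name}/{experiment_name}/.
from pathlib import Path

def local_log_dir(dataset_name: str, experiment_name: str) -> Path:
    """Hypothetical helper: resolve a run's local log directory."""
    return Path("experiment_logs") / dataset_name / experiment_name

print(local_log_dir("jhmdb21", "my_run").as_posix())
# experiment_logs/jhmdb21/my_run
```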

Citation

If you find our work interesting and useful, please consider ⭐ starring the repo and citing our paper:

@INPROCEEDINGS{QueryMatching_GCCE2025,
  author={Hori, Shimon and Tamaki, Toru},
  booktitle={2025 IEEE 14th Global Conference on Consumer Electronics (GCCE)},
  title={Query matching for spatio-temporal action detection with query-based object detector},
  year={2025},
  volume={},
  number={},
  pages={1505-1506},
  keywords={Object detection;Detectors;Feature extraction;Decoding;Consumer electronics;spatio-temporal action detection;object detection;DETR;object query;query matching},
  doi={10.1109/GCCE65946.2025.11275351}}
