# Spatio-Temporal Graph Networks for Tactical Football Analysis
```
Project-Name/
├── data/                # Strictly local, ignored by Git
│   ├── raw/             # Raw .mp4 broadcast clips
│   └── processed/       # Extracted 2D coordinate CSVs/JSONs
├── notebooks/           # Jupyter notebooks for prototyping and EDA
├── src/                 # Core source code
│   ├── cv_pipeline/     # YOLO detection, tracking, and homography scripts
│   ├── gnn_model/       # PyTorch/PyG architecture, dataset loaders, training loop
│   ├── visualization/   # OpenCV scripts to project GNN outputs back onto video
│   └── utils/           # Helper functions (math, video formatting)
├── weights/             # Saved model weights (.pt, .pth) - ignored by Git
├── outputs/             # Final rendered .mp4 files with AR overlays
├── tests/               # Unit tests
├── requirements.txt
├── .gitignore
└── README.md
```
Standard Recurrent Neural Networks (RNNs) struggle with sports analytics because the number of players (nodes) in the camera frame is highly dynamic. Players enter and leave the broadcast view, so fixed-size input tensors fail with shape mismatches. Furthermore, naive Graph Neural Networks (GNNs) often suffer from oversmoothing, where repeated message passing blends all player features into a near-uniform average, losing individual tactical intent.
This project implements a custom Spatio-Temporal Graph Neural Network (ST-GNN) designed specifically to handle the physical constraints and dynamic nature of football broadcast tracking.
To handle the dynamic roster, this architecture abandons fixed-size tensors for a custom Dictionary Memory Manager. Hidden GRU states are keyed dynamically by YOLO tracking IDs. As players enter or leave the frame, their sequential memory is instantiated, preserved, or garbage-collected on the fly.
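The manager boils down to a dictionary keyed by tracking ID. The `MemoryManager` class below is a hypothetical sketch of that idea, with NumPy vectors standing in for the real GRU hidden-state tensors:

```python
import numpy as np

class MemoryManager:
    """Hypothetical sketch: maps YOLO track IDs to per-player memory.

    In the real model the values would be torch GRU hidden states;
    NumPy vectors stand in here to keep the example self-contained.
    """

    def __init__(self, hidden_dim=32):
        self.hidden_dim = hidden_dim
        self.states = {}  # track_id -> hidden-state vector

    def get(self, track_id):
        # Instantiate fresh memory for a player entering the frame.
        if track_id not in self.states:
            self.states[track_id] = np.zeros(self.hidden_dim)
        return self.states[track_id]

    def update(self, track_id, new_state):
        # Preserve the evolving sequential memory between frames.
        self.states[track_id] = new_state

    def prune(self, visible_ids):
        # Garbage-collect memory of players who left the broadcast view.
        for tid in list(self.states):
            if tid not in visible_ids:
                del self.states[tid]

mm = MemoryManager(hidden_dim=4)
h7 = mm.get(7)           # player 7 enters: zero-initialised state
mm.update(7, h7 + 1.0)   # state evolves as frames are processed
mm.get(11)               # player 11 enters later
mm.prune({7})            # player 11 leaves; their memory is freed
```

Because the dictionary grows and shrinks with the tracked IDs, no fixed-size tensor ever has to be resized mid-sequence.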
To combat GNN oversmoothing, the model eschews a fully connected topology: players are connected only to teammates and opponents within a dynamic 15-meter radius. This preserves local tactical interactions while preventing global noise from washing out individual momentum vectors.
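The radius-based topology amounts to a pairwise-distance threshold. `radius_edges` below is a hypothetical helper (the real pipeline would more likely use PyG's `radius_graph`) that builds a PyG-style `edge_index` from 2D pitch coordinates:

```python
import numpy as np

def radius_edges(positions, radius=15.0):
    """Connect every pair of players closer than `radius` metres (sketch)."""
    pos = np.asarray(positions, dtype=float)          # (N, 2) pitch coords
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)              # (N, N) pairwise dists
    src, dst = np.nonzero((dist < radius) & (dist > 0))  # drop self-loops
    return np.stack([src, dst])                       # PyG-style (2, E) array

# Four players: two pairs, each pair far from the other.
pos = [[0.0, 0.0], [10.0, 0.0], [60.0, 0.0], [62.0, 0.0]]
edges = radius_edges(pos)  # only the two nearby pairs get (bidirectional) edges
```

Message passing then only mixes features between spatially relevant neighbours, which is what keeps distant players from averaging each other out.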
Trajectory predictions are anchored using residual skip-connections (identity + predicted offset), forcing the network to learn kinetic displacement rather than absolute positioning. Gradient clipping (max_norm=1.0) is enforced during the backward pass to prevent exploding gradients caused by tracking artifacts or camera-panning outliers.
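Both tricks are small in code. The sketch below uses hypothetical helpers with NumPy standing in for torch tensors; in the actual training loop the clipping would be `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)`:

```python
import numpy as np

def residual_prediction(current_pos, predicted_offset):
    """Residual skip-connection: the network outputs a displacement,
    which is added to the identity (the current position)."""
    return current_pos + predicted_offset

def clip_grad_norm(grads, max_norm=1.0):
    """Global-norm clipping, mirroring torch.nn.utils.clip_grad_norm_:
    if the total gradient norm exceeds max_norm, rescale all gradients."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads, total

pos = np.array([52.0, 34.0])                       # current pitch position (m)
nxt = residual_prediction(pos, np.array([0.4, -0.2]))

# A tracking glitch produces a huge gradient; clipping rescales it to norm 1.
clipped, norm = clip_grad_norm([np.array([30.0, 40.0])], max_norm=1.0)
```

The residual form means a network that outputs all-zeros already predicts "player stays put", a far better prior than predicting the origin of the pitch.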
- Deep Learning: PyTorch, PyTorch Geometric (PyG)
- Computer Vision: OpenCV (`cv2`)
- Data Engineering: Pandas, NumPy
- Visualization: Matplotlib, OpenCV picture-in-picture rendering
1. **Build the Graphs.** Converts 2D tracking data into PyTorch Geometric distance graphs.

   ```
   python src/gnn_model/build_graphs.py
   ```

2. **Train the Model.** Runs the training loop on MinMax-scaled spatial data and Z-score-standardized velocities.

   ```
   python src/gnn_model/train.py
   ```

3. **Generate the Broadcast Video Dashboard.** Loads the trained weights into an OpenCV pipeline, stitching AI predictions onto the broadcast video in real time.

   ```
   python src/gnn_model/video_dashboard.py
   ```
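The two normalizations used in training (step 2) are per-feature rescalings. A minimal NumPy sketch, with hypothetical helper names:

```python
import numpy as np

def minmax_scale(x):
    """Scale spatial coordinates into [0, 1] per feature (MinMax)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def zscore(x):
    """Standardize velocities to zero mean, unit variance per feature."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

xy = [[0.0, 0.0], [52.5, 34.0], [105.0, 68.0]]  # positions on a 105x68 m pitch
vel = [[1.0, -2.0], [3.0, 0.0], [5.0, 2.0]]     # per-frame velocities

xy_scaled = minmax_scale(xy)  # positions mapped into the unit square
vel_std = zscore(vel)         # velocities centred and variance-normalised
```

MinMax keeps positions interpretable as fractions of the pitch, while Z-scoring stops large-magnitude velocity spikes from dominating the loss.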