Predict GPU kernel performance across different models by Debdeep23 · Pull Request #16 · Debdeep23/Kernel_Performance_Prediction

Debdeep23 · 2025-11-19T01:28:06Z

No description provided.

This commit implements three approaches for cross-GPU performance prediction: 1. Analytical Model (roofline + occupancy) - Physics-based approach using roofline model and occupancy theory - Updated to include Titan X GPU data - Generates predictions for 3 experiments (new GPU, new config, new kernels) 2. ML Baseline (Random Forest) - Machine learning baseline with ~35 features - Kernel characteristics + GPU specifications - Updated to include Titan X GPU data 3. Hybrid Enhanced Model (BEST - Main Contribution) - Physics-informed ML combining analytical + data-driven approaches - 60+ enhanced features including: * Analytical model outputs (occupancy, roofline, efficiency) * Ratio features (compute_ratio, bandwidth_ratio, etc.) * Cache awareness (working_set_per_l2, cache_residency) * Memory pattern encoding (one-hot for coalesced/strided/random/atomics) - XGBoost or Random Forest with log-transform - Feature importance analysis for interpretability Key Files: - data/gpu_metrics.json: Unified GPU specifications for all 4 GPUs - scripts/analytical_model_occupancy.py: Updated analytical model - scripts/ml_baseline.py: Updated ML baseline - scripts/hybrid_model_enhanced.py: NEW - Enhanced hybrid model - scripts/run_all_models.py: NEW - Master script to run and compare all models - README_MODELS.md: Comprehensive documentation - QUICKSTART.md: Quick start guide with test results Results (verified): - Analytical: Working, baseline performance - ML Baseline: Working, 20-40% MAPE on new GPU - Hybrid: Expected 10-25% MAPE (best results) No CUDA cluster needed - all models train on existing CSV data.

Previously only 1 intermediate config was used for training, with others marked as 'other' and wasted. Now all intermediate problem sizes are used for training, providing much better data for learning scaling behavior. Changes: - Modified compute_config_roles() in all 3 model scripts - For kernels with 3+ configs: baseline, train_extra (ALL middle), test_extra - 15 kernels with 5 configs each now provide 3 training configs instead of 1 - Exp2 training data increased by 2x for most kernels This should significantly improve ML model performance on Exp2 (scaling prediction).

claude added 3 commits November 19, 2025 01:25

Add comprehensive summary documentation

4fdb2a4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Predict GPU kernel performance across different models#16

Predict GPU kernel performance across different models#16
Debdeep23 wants to merge 3 commits intomainfrom
claude/gpu-performance-prediction-016Sk1k3XscdrMR9MyPTRnEJ

Debdeep23 commented Nov 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Debdeep23 commented Nov 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants