Soccer Analytics Engineering - Test Specification Summary

This document provides a comprehensive overview of the testing suite for the Soccer Analytics Engineering project. The suite currently consists of 488 tests ensuring data integrity, quality, and business logic correctness across the DuckDB-based StatsBomb data warehouse.

1. Core Database Schema & Resilience

Ensures the database is correctly structured and the ETL process is robust.

| File | Test Class | Description |
|---|---|---|
| test_schema.py | TestTableExistence | Verifies all required tables (competitions, matches, events, etc.) are present. |
| | TestTableColumns | Validates column names and data types (INTEGER, TEXT, BOOLEAN, etc.) for every table. |
| test_etl_resilience.py | TestJSONValidation | Ensures the ETL process ignores malformed JSON files and handles non-JSON files gracefully. |
| | TestSchemaResilience | Verifies that Primary Keys and Foreign Keys are explicitly defined in the DuckDB schema. |
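
The resilience requirement in TestJSONValidation can be illustrated with a small loader. This is a minimal sketch, not the project's ETL code. The function name `load_valid_json_files` and the choice to return skipped file names are hypothetical. It skips anything that is not a `.json` file and records, rather than raises on, JSON that fails to decode. The test function uses pytest's built-in `tmp_path` fixture.

```python
import json
from pathlib import Path


def load_valid_json_files(directory):
    """Parse every well-formed .json file in `directory` (hypothetical helper).

    Returns (loaded, skipped): `loaded` maps file name -> parsed JSON and
    `skipped` lists .json files that could not be decoded. Files with any
    other extension, and sub-directories, are ignored entirely.
    """
    loaded, skipped = {}, []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".json":
            continue  # not ETL input, e.g. notes.txt or a nested folder
        try:
            loaded[path.name] = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            skipped.append(path.name)  # malformed: record it, keep loading
    return loaded, skipped


def test_malformed_and_non_json_files_are_skipped(tmp_path):
    (tmp_path / "match_1.json").write_text('[{"index": 1}]', encoding="utf-8")
    (tmp_path / "broken.json").write_text('[{"index": ', encoding="utf-8")
    (tmp_path / "notes.txt").write_text("not json", encoding="utf-8")

    loaded, skipped = load_valid_json_files(tmp_path)

    assert list(loaded) == ["match_1.json"]
    assert skipped == ["broken.json"]
```

Returning the skipped names, instead of silently dropping them, lets a test assert exactly which inputs were rejected.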

2. Data Integrity & Referential Consistency

Validates that connections between tables are unbroken and data is consistent.

| File | Test Class | Description |
|---|---|---|
| test_data_integrity.py | TestPrimaryKeys | Confirms all primary keys (e.g., match_id, event_uuid) are unique and non-null. |
| | TestForeignKeys | Ensures every match points to a valid competition, every event to a valid player, etc. |
| | TestDataConsistency | Checks that team and player names are consistent across related tables. |
| test_data_validation.py | TestCrossTableRelationship | Deep validation of relationships like every match having events and valid position lookups. |
| test_detailed_validation.py | TestDetailedMatchValidation | Cross-verifies assist teams and ensures period markers (Half Start/End) are consistent. |
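
Primary-key and foreign-key checks like these usually reduce to two SQL patterns: a `GROUP BY ... HAVING` for duplicate or NULL keys, and a `LEFT JOIN ... IS NULL` anti-join for orphaned references. The sketch below uses the standard-library `sqlite3` module as a stand-in so it runs without DuckDB. The `events`/`players` table layouts and the `player_id` column are assumptions. The queries are plain SQL and should carry over to a DuckDB connection, but that has not been verified against the project's schema.

```python
import sqlite3

# Foreign-key check: events that name a player_id with no row in players.
# Events without a player (e.g. "Half Start") have a NULL player_id and are allowed.
ORPHAN_EVENT_PLAYERS_SQL = """
    SELECT e.event_uuid
    FROM events AS e
    LEFT JOIN players AS p ON p.player_id = e.player_id
    WHERE e.player_id IS NOT NULL AND p.player_id IS NULL
    ORDER BY e.event_uuid
"""

# Primary-key check: event_uuid values that are NULL or occur more than once.
BAD_EVENT_KEYS_SQL = """
    SELECT event_uuid
    FROM events
    GROUP BY event_uuid
    HAVING event_uuid IS NULL OR COUNT(*) > 1
    ORDER BY event_uuid
"""


def build_demo_db():
    """In-memory toy database (hypothetical schema) with one orphan and one duplicate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE players (player_id INTEGER, player_name TEXT);
        CREATE TABLE events (event_uuid TEXT, match_id INTEGER, player_id INTEGER);
        INSERT INTO players VALUES (10, 'Player A');
        INSERT INTO events VALUES
            ('e1', 1, 10),    -- valid reference
            ('e2', 1, NULL),  -- no player involved: allowed
            ('e3', 1, 99),    -- orphan: player 99 does not exist
            ('e3', 1, 10);    -- duplicate event_uuid
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute(ORPHAN_EVENT_PLAYERS_SQL).fetchall())
    print(conn.execute(BAD_EVENT_KEYS_SQL).fetchall())
```

A test then asserts that each query returns an empty list. Excluding NULL `player_id` values from the anti-join matters because many event types legitimately have no player attached.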

3. Data Quality & Geometric Integrity

Checks that the actual data values and derived calculations align with soccer domain rules and physics.

| File | Test Class | Description |
|---|---|---|
| test_data_quality.py | TestCoordinateRanges | Ensures locations are within the 120x80 pitch bounds (with reasonable buffers). |
| | TestXGValues | Validates that Expected Goals (xG) values are strictly between 0 and 1. |
| | TestBooleanFlags | Ensures flags like counterpress default to false rather than NULL. |
| test_geometry.py | TestGeometricConsistency | Trigonometrically verifies pass_length and pass_angle from raw X/Y coordinates. |
| | | Validates that 'Goal' end locations fall within the physical coordinates of the net. |
| test_advanced_events.py | TestSubstitutionEvents | Verifies replacement IDs and outcomes (Tactical, Injury) are correctly captured. |
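
The geometric checks come down to a few lines of trigonometry. This is a sketch under stated assumptions, not the project's test code. It assumes `pass_angle` is `atan2(dy, dx)` in radians in the 120x80 pitch frame, and it assumes the goal posts sit at y = 36 and y = 44 on the x = 120 line with the crossbar at z = 2.67. The buffer and tolerance defaults are hypothetical placeholders for the "reasonable buffers" mentioned above.

```python
import math

PITCH_X, PITCH_Y = 120.0, 80.0        # pitch length and width in pitch units
GOAL_Y_MIN, GOAL_Y_MAX = 36.0, 44.0   # assumed post positions at x = 120
CROSSBAR_Z = 2.67                     # assumed crossbar height


def pass_geometry(start, end):
    """Recompute (length, angle) from raw coordinates; angle = atan2(dy, dx) in radians."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)


def pass_matches_stored(start, end, stored_length, stored_angle, tol=1e-2):
    length, angle = pass_geometry(start, end)
    # Compare angles on the circle so that -pi and +pi count as equal.
    angle_gap = abs(math.remainder(angle - stored_angle, math.tau))
    return abs(length - stored_length) <= tol and angle_gap <= tol


def within_pitch(location, buffer=1.0):
    x, y = location[0], location[1]
    return -buffer <= x <= PITCH_X + buffer and -buffer <= y <= PITCH_Y + buffer


def valid_xg(xg):
    return xg is not None and 0.0 < xg < 1.0  # strictly inside (0, 1)


def goal_end_in_net(end_location, buffer=0.5):
    """True if a 'Goal' shot's end location lies in the goal mouth (z is optional)."""
    x, y = end_location[0], end_location[1]
    z = end_location[2] if len(end_location) > 2 else 0.0
    return (x >= PITCH_X - buffer
            and GOAL_Y_MIN - buffer <= y <= GOAL_Y_MAX + buffer
            and 0.0 <= z <= CROSSBAR_Z + buffer)
```

The wrap-around comparison avoids false failures for backward passes, where one source may report an angle of -π and the recomputation returns +π.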

4. Business Logic & Transformations

Tests the "intelligence" of the ETL pipeline, such as name cleaning and coordinate extraction.

| File | Test Class | Description |
|---|---|---|
| test_business_logic.py | TestPlayerCanonicalization | Validates the mapping of messy names (e.g., "Marta") to their canonical versions. |
| | TestPassRecipientConsistency | Ensures pass recipients in the events table link to valid players. |
| | TestJSONFieldTransformations | Checks that JSON strings are correctly parsed into individual x, y, z columns. |
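
Both transformations can be sketched in plain Python. This is illustrative only. `split_location` assumes locations arrive as JSON arrays of two or three numbers. The alias table in `CANONICAL_NAMES` is a hypothetical stand-in for the project's real mapping. Name matching here uses an accent-, case- and whitespace-insensitive key.

```python
import json
import unicodedata


def split_location(raw):
    """Parse a JSON coordinate string such as "[61.2, 40.1]" into (x, y, z).

    Returns (None, None, None) for a missing value; z is None for 2-D points.
    """
    if raw is None:
        return (None, None, None)
    coords = json.loads(raw)
    if not isinstance(coords, list) or len(coords) not in (2, 3):
        raise ValueError(f"unexpected location payload: {raw!r}")
    x, y = float(coords[0]), float(coords[1])
    z = float(coords[2]) if len(coords) == 3 else None
    return (x, y, z)


# Hypothetical alias table: normalized key -> canonical display name.
CANONICAL_NAMES = {"marta": "Marta Vieira da Silva"}


def name_key(name):
    """Normalize a name: strip accents, casefold, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def canonicalize(name):
    """Map a messy name to its canonical form, or return it trimmed if unknown."""
    return CANONICAL_NAMES.get(name_key(name), name.strip())
```

Keying the lookup on a normalized form means one alias entry also covers variants that differ only in casing, accents or stray spaces.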

5. Temporal & Sequential Consistency

Ensures time flows forward and players don't perform actions when they aren't on the pitch.

| File | Test Class | Description |
|---|---|---|
| test_temporal_consistency.py | TestEventMonotonicity | Verifies that event indexes and timestamps never decrease within a match. |
| | TestPostStateIntegrity | Ensures no events are generated by a player after a Red Card or Substitution. |
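
Both temporal rules can be expressed as passes over events ordered by `index`. In this sketch, timestamps are assumed to be zero-padded `"HH:MM:SS.fff"` strings that restart at each period, so the clock is compared as a `(period, timestamp)` pair. The `removes_player` flag is a hypothetical column marking the red-card or substitution event of the player who leaves the pitch.

```python
from collections import defaultdict


def clock_violations(events):
    """Return (match_id, index) pairs where event order breaks down.

    Within each match, events sorted by `index` must have unique indexes and a
    (period, timestamp) clock that never goes backwards.
    """
    by_match = defaultdict(list)
    for e in events:
        by_match[e["match_id"]].append(e)
    violations = []
    for match_id, rows in by_match.items():
        rows.sort(key=lambda e: e["index"])
        for prev, cur in zip(rows, rows[1:]):
            duplicate = cur["index"] == prev["index"]
            backwards = (cur["period"], cur["timestamp"]) < (prev["period"], prev["timestamp"])
            if duplicate or backwards:
                violations.append((match_id, cur["index"]))
    return violations


def actions_after_removal(events):
    """Return events made by a player after a red card or being substituted off.

    Relies on the hypothetical boolean `removes_player` field described above.
    """
    removed = set()  # (match_id, player_id) pairs no longer on the pitch
    flagged = []
    for e in sorted(events, key=lambda e: (e["match_id"], e["index"])):
        if e["player_id"] is None:
            continue  # team-level events such as "Half Start"
        key = (e["match_id"], e["player_id"])
        if key in removed:
            flagged.append(e)
        elif e.get("removes_player"):
            removed.add(key)
    return flagged
```

Comparing `(period, timestamp)` instead of the raw timestamp avoids flagging the legitimate reset of the clock at the start of each half.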

6. Advanced Tracking (360 & Lineups)

Validates the most complex parts of the dataset: lineups and 360-degree tracking frames.

| File | Test Class | Description |
|---|---|---|
| test_lineup_consistency.py | TestLineupEventConsistency | Ensures players recording events are present in match lineups and checks Starting XI counts. |
| test_lineups.py | TestLineupIntegrity | Verifies complete rosters and team-match links for all lineups. |
| test_threesixty.py | TestThreeSixtyReferential | Links 360 tracking frames and player positions back to specific event UUIDs. |
| | TestThreeSixtyDataQuality | Validates visibility polygons and player "actor" flags. |
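
The lineup and 360 checks are mostly set arithmetic. The sketch below works on plain Python collections that a test would first fetch from the database. All parameter names are hypothetical. `malformed_visible_area` assumes the visibility polygon is stored as a flat `[x1, y1, x2, y2, ...]` list of coordinates.

```python
def lineup_problems(event_player_ids, lineup_player_ids, starting_xi_sizes):
    """Consistency checks for one match.

    event_player_ids: player_ids that appear on events in the match
    lineup_player_ids: player_ids listed in the match lineups
    starting_xi_sizes: {team_id: number of players listed as starters}
    """
    missing_from_lineup = sorted(set(event_player_ids) - set(lineup_player_ids))
    wrong_xi = {team: n for team, n in starting_xi_sizes.items() if n != 11}
    return missing_from_lineup, wrong_xi


def orphan_frames(frame_event_uuids, event_uuids):
    """360 frames whose event_uuid does not exist in the events table."""
    return sorted(set(frame_event_uuids) - set(event_uuids))


def malformed_visible_area(flat_coords):
    """Assumes a flat [x1, y1, x2, y2, ...] list; a polygon needs >= 3 vertices."""
    return len(flat_coords) % 2 != 0 or len(flat_coords) < 6
```

An empty list or dict from each function means the match passes. Non-empty results name exactly which players, frames or teams to investigate.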

7. Statistical Sanity

Performs "macro" checks to confirm that the dataset, taken as a whole, is consistent with real-world football.

| File | Test Class / Function | Description |
|---|---|---|
| test_statistical_sanity.py | TestXGCalibration | Statistical check: high xG shots must result in goals more often than low xG shots. |
| | TestSpatialSanity | Ensures "extreme" errors (like shots taken from one's own penalty area) are flagged. |
| test_detailed_validation.py | test_match_score_summation_strict | Sums all "Goal" and "Own Goal" events and compares against the official match score. |
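
The calibration and score checks can be sketched as follows. The bucket cut-offs (0.1 and 0.3) are hypothetical, not the thresholds the suite uses. The shot dicts (`xg`, `is_goal`) and the `(team_id, kind)` goal tuples are assumed shapes. The score function also assumes an "Own Goal" row carries the team of the player who put the ball into their own net, so it is credited to the opponent.

```python
from collections import Counter


def goal_rate(shots, lo, hi):
    """Share of shots with lo <= xG < hi that were scored (None if the bucket is empty)."""
    bucket = [s for s in shots if lo <= s["xg"] < hi]
    if not bucket:
        return None
    return sum(1 for s in bucket if s["is_goal"]) / len(bucket)


def xg_is_calibrated(shots, low_cut=0.1, high_cut=0.3):
    """High-xG shots (>= high_cut) must convert more often than low-xG shots (< low_cut)."""
    low = goal_rate(shots, 0.0, low_cut)
    high = goal_rate(shots, high_cut, 1.0)
    return low is not None and high is not None and high > low


def score_from_events(goal_events, home_team, away_team):
    """Rebuild (home_goals, away_goals) from (team_id, kind) pairs.

    kind is "Goal" or "Own Goal"; an own goal counts for the opposing team.
    """
    score = Counter()
    for team, kind in goal_events:
        if kind == "Goal":
            score[team] += 1
        elif kind == "Own Goal":
            score[away_team if team == home_team else home_team] += 1
    return score[home_team], score[away_team]
```

A strict test would compare the rebuilt tuple against the stored match score. If penalty-shootout kicks are recorded as events, they would likely need to be filtered out first, because they do not count toward the official score.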

Run all tests using:

```bash
pytest
```