This document provides a comprehensive overview of the testing suite for the Soccer Analytics Engineering project. The suite currently consists of 488 tests ensuring data integrity, quality, and business logic correctness across the DuckDB-based StatsBomb data warehouse.
Ensures the database is correctly structured and the ETL process is robust.
| File | Test Class | Description |
|---|---|---|
| test_schema.py | TestTableExistence | Verifies all required tables (competitions, matches, events, etc.) are present. |
| test_schema.py | TestTableColumns | Validates column names and data types (INTEGER, TEXT, BOOLEAN, etc.) for every table. |
| test_etl_resilience.py | TestJSONValidation | Ensures the ETL process ignores malformed JSON files and handles non-JSON files gracefully. |
| test_etl_resilience.py | TestSchemaResilience | Verifies that Primary Keys and Foreign Keys are explicitly defined in the DuckDB schema. |

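
The JSON-resilience behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the function name `load_valid_json_files` and the skip-by-extension rule are assumptions made for the example.

```python
import json
from pathlib import Path


def load_valid_json_files(directory):
    """Return {filename: parsed_data} for every parseable .json file.

    Hypothetical helper: non-JSON files are skipped by extension, and
    malformed JSON files are skipped instead of aborting the whole load.
    """
    loaded = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix != ".json":
            continue  # not a JSON file: ignore it
        try:
            loaded[path.name] = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # malformed content: skip this file, keep going
    return loaded
```

A resilience test would then point this loader at a directory containing one valid file, one truncated file, and one non-JSON file, and assert that only the valid file is returned.
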
Validates that connections between tables are unbroken and data is consistent.
| File | Test Class | Description |
|---|---|---|
| test_data_integrity.py | TestPrimaryKeys | Confirms all primary keys (e.g., match_id, event_uuid) are unique and non-null. |
| test_data_integrity.py | TestForeignKeys | Ensures every match points to a valid competition, every event to a valid player, etc. |
| test_data_integrity.py | TestDataConsistency | Checks that team and player names are consistent across related tables. |
| test_data_validation.py | TestCrossTableRelationship | Deep validation of relationships, such as every match having events and valid position lookups. |
| test_detailed_validation.py | TestDetailedMatchValidation | Cross-verifies assist teams and ensures period markers (Half Start/End) are consistent. |

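
The primary-key and foreign-key checks above boil down to two SQL patterns: a `GROUP BY ... HAVING` for duplicates or NULLs, and a `LEFT JOIN` anti-join for orphans. The sketch below uses the standard-library `sqlite3` module as a stand-in so it runs without extra dependencies; the project itself uses DuckDB, and the `competition_id` column and demo rows are assumptions for illustration only.

```python
import sqlite3  # stand-in for DuckDB so the sketch needs only the standard library

# Counts match_id values that are duplicated or NULL.
BAD_PRIMARY_KEYS = """
    SELECT COUNT(*) FROM (
        SELECT match_id FROM matches
        GROUP BY match_id
        HAVING COUNT(*) > 1 OR match_id IS NULL
    ) AS bad
"""

# Counts matches whose competition_id has no row in competitions.
ORPHAN_MATCHES = """
    SELECT COUNT(*) FROM matches AS m
    LEFT JOIN competitions AS c ON m.competition_id = c.competition_id
    WHERE c.competition_id IS NULL
"""


def count(conn, sql):
    """Run a COUNT(*) query and return the single integer result."""
    return conn.execute(sql).fetchone()[0]


def build_demo_db():
    """Create an in-memory database with one duplicate key and one orphan."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE competitions (competition_id INTEGER);
        CREATE TABLE matches (match_id INTEGER, competition_id INTEGER);
        INSERT INTO competitions VALUES (11), (43);
        INSERT INTO matches VALUES (1, 11), (2, 43), (2, 43), (3, 99);
    """)
    return conn
```

In a real test, both counts would be asserted to equal zero against the warehouse; the demo database is deliberately broken so each query has something to find.
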
Checks that the actual data values and derived calculations align with soccer domain rules and physics.
| File | Test Class | Description |
|---|---|---|
| test_data_quality.py | TestCoordinateRanges | Ensures locations are within the 120x80 pitch bounds (with reasonable buffers). |
| test_data_quality.py | TestXGValues | Validates that Expected Goals (xG) values are strictly between 0 and 1. |
| test_data_quality.py | TestBooleanFlags | Ensures flags like counterpress default to false rather than NULL. |
| test_geometry.py | TestGeometricConsistency | Trigonometrically verifies pass_length and pass_angle from raw X/Y coordinates. Validates that 'Goal' end locations fall within the physical coordinates of the net. |
| test_advanced_events.py | TestSubstitutionEvents | Verifies replacement IDs and outcomes (Tactical, Injury) are correctly captured. |

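
The geometric check on passes can be illustrated with a short sketch. It assumes that pass_length is the Euclidean distance between start and end locations and that pass_angle is `atan2(dy, dx)` in radians in the data's own coordinate system; the function names and the tolerance of 0.01 are hypothetical choices for the example, not values taken from the test suite.

```python
import math


def pass_geometry(start, end):
    """Return (length, angle_in_radians) for a pass from start to end."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)


def geometry_matches(start, end, pass_length, pass_angle, tol=0.01):
    """Check stored length/angle against values recomputed from coordinates.

    The tolerance is an assumed allowance for rounding in the stored values.
    """
    length, angle = pass_geometry(start, end)
    # Compare angles on the circle so that -pi and +pi count as equal.
    diff = angle - pass_angle
    angle_diff = math.atan2(math.sin(diff), math.cos(diff))
    return abs(length - pass_length) <= tol and abs(angle_diff) <= tol
```

For example, a pass from (60, 40) to (63, 44) forms a 3-4-5 triangle, so its recomputed length is 5.0.
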
Tests the "intelligence" of the ETL pipeline, such as name cleaning and coordinate extraction.
| File | Test Class | Description |
|---|---|---|
| test_business_logic.py | TestPlayerCanonicalization | Validates the mapping of messy names (e.g., "Marta") to their canonical versions. |
| test_business_logic.py | TestPassRecipientConsistency | Ensures pass recipients in the events table link to valid players. |
| test_business_logic.py | TestJSONFieldTransformations | Checks that JSON strings are correctly parsed into individual x, y, z columns. |

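
The JSON-to-columns transformation can be sketched as below. This assumes locations arrive as JSON array strings with two or three numbers, where the third (z) is optional; the helper name `split_location` is hypothetical and the real ETL may handle this differently.

```python
import json


def split_location(raw):
    """Parse a '[x, y]' or '[x, y, z]' JSON string into an (x, y, z) tuple.

    z is None when the array has only two elements; a None input yields
    (None, None, None) so missing locations stay missing.
    """
    if raw is None:
        return (None, None, None)
    coords = json.loads(raw)
    if not isinstance(coords, list) or len(coords) not in (2, 3):
        raise ValueError(f"unexpected location payload: {raw!r}")
    x, y = float(coords[0]), float(coords[1])
    z = float(coords[2]) if len(coords) == 3 else None
    return (x, y, z)
```

A transformation test would compare the x, y, z columns in the warehouse against the output of a parser like this applied to the raw JSON value.
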
Ensures time flows forward and players don't perform actions when they aren't on the pitch.
| File | Test Class | Description |
|---|---|---|
| test_temporal_consistency.py | TestEventMonotonicity | Verifies that event indexes and timestamps are non-decreasing within each match. |
| test_temporal_consistency.py | TestPostStateIntegrity | Ensures no events are generated by a player after a Red Card or Substitution. |

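
A monotonicity check of this kind can be sketched in plain Python. The row layout `(match_id, index, period, seconds)` is an assumption for the example, as is the idea that timestamps restart in each period, which is why the comparison uses the `(period, seconds)` pair rather than seconds alone.

```python
from collections import defaultdict


def find_order_violations(rows):
    """Return (match_id, prev_index, curr_index) for each backwards step in time.

    rows: iterable of (match_id, index, period, seconds) tuples (assumed layout).
    Events are ordered by index within each match; (period, seconds) must
    then be non-decreasing from one event to the next.
    """
    by_match = defaultdict(list)
    for match_id, index, period, seconds in rows:
        by_match[match_id].append((index, period, seconds))

    violations = []
    for match_id, events in by_match.items():
        events.sort()  # order by event index
        for prev, curr in zip(events, events[1:]):
            if (curr[1], curr[2]) < (prev[1], prev[2]):
                violations.append((match_id, prev[0], curr[0]))
    return violations
```

Equal timestamps on consecutive events are allowed, which matches a non-decreasing rather than strictly increasing rule.
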
Validates the most complex parts of the dataset: lineups and 360-degree tracking frames.
| File | Test Class | Description |
|---|---|---|
| test_lineup_consistency.py | TestLineupEventConsistency | Ensures players recording events are present in match lineups and checks Starting XI counts. |
| test_lineups.py | TestLineupIntegrity | Verifies complete rosters and team-match links for all lineups. |
| test_threesixty.py | TestThreeSixtyReferential | Links 360 tracking frames and player positions back to specific event UUIDs. |
| test_threesixty.py | TestThreeSixtyDataQuality | Validates visibility polygons and player "actor" flags. |

Performs "macro" checks to ensure the dataset represents the real world of football.
| File | Test Class | Description |
|---|---|---|
| test_statistical_sanity.py | TestXGCalibration | Statistical check: high xG shots must result in goals more often than low xG shots. |
| test_statistical_sanity.py | TestSpatialSanity | Ensures "extreme" errors (like shots taken from one's own penalty area) are flagged. |
| test_detailed_validation.py | test_match_score_summation_strict | Sums all "Goal" and "Own Goal" events and compares against the official match score. |

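
The xG calibration idea can be shown with a small sketch: split shots at a threshold and compare conversion rates. The threshold of 0.3 and the `(xg, is_goal)` tuple layout are assumptions chosen for the example; the actual test may bucket shots differently.

```python
def goal_rate(shots):
    """Fraction of shots that resulted in a goal."""
    return sum(1 for _, is_goal in shots if is_goal) / len(shots)


def xg_is_calibrated(shots, threshold=0.3):
    """Return True if high-xG shots convert more often than low-xG shots.

    shots: list of (xg, is_goal) tuples; threshold is an assumed cut-off.
    """
    high = [s for s in shots if s[0] >= threshold]
    low = [s for s in shots if s[0] < threshold]
    if not high or not low:
        raise ValueError("need shots on both sides of the threshold")
    return goal_rate(high) > goal_rate(low)
```

On a realistic dataset this comparison is only meaningful with a large number of shots in each bucket.
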
Run all tests using:

    pytest