A comprehensive data analysis and visualization system for global GDP statistics. This is a semester project for Structured Data Analysis (SDA) - Phase 1.
Group Members:
- Umair Amjad
- Muhammad Ali
Table of Contents:
- Project Overview
- Features
- Project Structure
- Requirements
- Installation & Setup
- Usage Guide
- Configuration
- Testing
- Version History
- Troubleshooting
The GDP Analysis System is a Python-based data processing and visualization tool that analyzes global GDP statistics. It loads GDP data from a CSV dataset, processes it according to user-defined configurations, and generates four sequential visualizations:
- Bar Chart - GDP comparison across countries/regions
- Histogram - GDP distribution frequency
- Dot Plot - Individual GDP values with reference lines
- Pie Chart - Market share with final statistics report
- Data Loading: Efficiently loads and parses large GDP datasets
- Data Processing: Supports multiple operations (average, sum, min, max, etc.)
- Regional Filtering: Analyze specific regions or continents
- Year-based Analysis: Query data for any year in the dataset
- Sequential Visualizations: Displays graphs one-by-one with user control
- Automatic Graph Saving: All graphs are saved to the out/ directory
- Cross-Platform Support: Works on Linux, macOS, and Windows
- Comprehensive Logging: Detailed output for debugging and tracking
SDA-Project-2026/
├── main.py # Entry point of the application
├── config.json # Configuration file (region, year, operation)
├── requirements.txt # Python dependencies
├── fix_csv.py # Data cleaning utility
├── README.md # This file
│
├── data/
│ ├── gdp_dataset.csv # Original GDP dataset
│ └── gdp_dataset_fixed.csv # Cleaned dataset (generated by fix_csv.py)
│
├── src/
│ ├── __init__.py # Package initialization
│ ├── loader.py # Data loading module
│ ├── processor.py # Data processing logic
│ └── visualizer.py # Visualization module
│
└── out/
    ├── 01_bar_*.png # Bar chart outputs
    ├── 02_hist_*.png # Histogram outputs
    ├── 03_dot_*.png # Dot plot outputs
    └── 04_pie_*.png # Pie chart outputs
- Python 3.8+
- Dependencies (see requirements.txt):
  - pandas >= 1.3.0
  - matplotlib >= 3.4.0
  - numpy >= 1.21.0
cd /path/to/SDA-Project-2026
pip install -r requirements.txt
Run the data cleaning script to fix the CSV file:
python fix_csv.py
This creates data/gdp_dataset_fixed.csv with cleaned data.
Edit config.json to set your analysis parameters:
{
  "region": "Asia",
  "year": 2020,
  "operation": "average",
  "output": "dashboard"
}
Configuration Options:
- region: Target region/continent (e.g., "Asia", "Europe", "Africa")
- year: Analysis year (e.g., 2020, 2021)
- operation: Calculation type ("average", "sum", "min", "max")
- output: Output mode ("dashboard" for visualizations)
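The options above could be checked before an analysis run. The following is a minimal sketch, not the project's actual code: the helper name `validate_config` and the specific checks are assumptions for illustration. Only the four keys and the four operation names come from this README.

```python
import json

# Operation names documented in this README; anything else is rejected.
ALLOWED_OPERATIONS = {"average", "sum", "min", "max"}
REQUIRED_KEYS = ("region", "year", "operation", "output")


def validate_config(raw_json: str) -> dict:
    """Parse a config.json payload and check the four documented keys.

    Hypothetical helper: main.py may validate its configuration differently.
    """
    config = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing config keys: {', '.join(missing)}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(config["year"], int) or isinstance(config["year"], bool):
        raise ValueError("'year' must be an integer, e.g. 2020")
    if config["operation"] not in ALLOWED_OPERATIONS:
        raise ValueError(f"Unsupported operation: {config['operation']!r}")
    return config


example = '{"region": "Asia", "year": 2020, "operation": "average", "output": "dashboard"}'
config = validate_config(example)
print(config["region"], config["year"], config["operation"])
```

Failing early with a specific message is one possible design choice here; it keeps a typo such as "avg" from surfacing later as a confusing processing error.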
python main.py
- Step 1: Loads configuration from config.json
- Step 2: Reads and validates the GDP dataset
- Step 3: Processes data based on configuration
- Step 4: Displays 4 sequential visualizations
- Each graph displays in your system's default image viewer
- Close the current graph to see the next one
- All graphs are saved to the out/ folder automatically
$ python main.py
---------------------------------------
GDP Analysis System (SDA 2026)
---------------------------------------
Step 1: Loading Configuration...
-> Region: Asia
-> Year: 2020
Step 2: Loading Dataset from 'data/gdp_dataset_fixed.csv'...
-> Data loaded successfully.
Step 3: Processing Data...
-> Calculation (average): 1,523,398,324,074.54
Step 4: Launching Visualizations...
(Graphs will open sequentially. Close one to see the next.)
-> Opening Graph 1/4: Bar Chart...
-> Opening Graph 2/4: Histogram...
-> Opening Graph 3/4: Dot Plot...
-> Opening Graph 4/4: Pie Chart & Final Report...
Example config.json:
{
  "region": "Asia",
  "year": 2020,
  "operation": "average",
  "output": "dashboard"
}
Regions:
- Africa
- Asia
- Europe
- North America
- South America
- Oceania
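Selecting one of the regions above for a given year can be sketched as a simple row filter. This is an illustrative sketch that uses plain dictionaries rather than pandas. The column names `Country`, `Continent`, `Year`, and `GDP` are hypothetical and may not match the actual dataset, and the sample values are made up.

```python
from typing import Dict, List

# Assumed long-format rows; the real dataset layout may differ.
Row = Dict[str, object]


def filter_rows(rows: List[Row], region: str, year: int) -> List[Row]:
    """Keep rows whose region matches (case-insensitively) and whose year matches."""
    wanted = region.strip().lower()
    return [
        row
        for row in rows
        if str(row["Continent"]).strip().lower() == wanted
        and int(row["Year"]) == year
    ]


# Made-up sample rows, for illustration only.
sample = [
    {"Country": "Country A", "Continent": "Asia", "Year": 2020, "GDP": 100.0},
    {"Country": "Country B", "Continent": "Asia", "Year": 2020, "GDP": 250.0},
    {"Country": "Country C", "Continent": "Europe", "Year": 2020, "GDP": 400.0},
    {"Country": "Country A", "Continent": "Asia", "Year": 2019, "GDP": 90.0},
]

asia_2020 = filter_rows(sample, "Asia", 2020)
print([row["Country"] for row in asia_2020])
print(filter_rows(sample, "Atlantis", 2020))  # unknown region yields an empty list
```

An unknown region simply produces an empty result rather than an exception, which is one way to get the graceful behaviour this README describes for invalid regions.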
| Operation | Description |
|---|---|
| average | Mean GDP value |
| sum | Total GDP |
| min | Minimum GDP |
| max | Maximum GDP |
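One way to map these operation names onto calculations is a small dispatch table. This is a sketch under assumptions: `OPERATIONS` and `apply_operation` are hypothetical names, not taken from processor.py, and the input values are made up.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

# Each documented operation name maps to a standard-library function.
OPERATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "average": mean,
    "sum": sum,
    "min": min,
    "max": max,
}


def apply_operation(values: Sequence[float], operation: str) -> float:
    """Apply one of the documented operations to a list of GDP values."""
    if operation not in OPERATIONS:
        raise ValueError(f"Unsupported operation: {operation!r}")
    if not values:
        # mean(), min() and max() all fail on empty input, so report it clearly.
        raise ValueError("No GDP values to aggregate (check region and year)")
    return OPERATIONS[operation](values)


gdp_values = [100.0, 250.0, 400.0]  # made-up numbers, not real GDP figures
for name in OPERATIONS:
    print(name, apply_operation(gdp_values, name))
```

A dispatch table keeps the supported operations in one place, so adding another one (the feature list mentions "etc.") would only need a new dictionary entry.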
Version 1.0.0:
- Date: February 2026
- Status: Initial Release
- Features Tested:
  - Data loading from CSV ✓
  - Basic data processing ✓
  - Single visualization output ✓
Version 1.1.0:
- Date: February 2026
- Status: Sequential Visualization Release
- Features Added:
  - Sequential graph display ✓
  - System image viewer integration ✓
  - User-controlled graph flow (close to advance) ✓
  - Improved error handling ✓
Version 1.2.0:
- Date: February 2026
- Status: Production Ready
- Features:
  - Complete documentation ✓
  - Multiple visualization types ✓
  - Regional filtering ✓
  - Year-based analysis ✓
  - Robust error handling ✓
  - Cross-platform compatibility ✓
python main.py
Expected: All 4 graphs display sequentially
python fix_csv.py
Expected: Creates gdp_dataset_fixed.csv without errors
Edit config.json with different values and run python main.py
- Delete config.json → Should show an error message
- Delete data files → Should show a file-not-found error
- Invalid region → Should filter to empty results gracefully
Solution: Run python fix_csv.py first to generate the cleaned dataset
Solution:
- Ensure you have an image viewer installed (default system viewer)
- Check that X11 display is available on Linux systems
- Graphs are always saved to the out/ folder as a backup
Solution: Install dependencies
pip install -r requirements.txt
Solution: Check if the year exists in the dataset (1960-2024)
Solution: Verify region name matches dataset (check data file for available regions)
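To check which regions and years a dataset actually contains, a short script along these lines may help. It is only a sketch: the column names `Continent` and `Year` are assumptions about the CSV layout, and the inline sample stands in for the real file.

```python
import csv
import io
from typing import Iterable, Set, Tuple


def available_values(lines: Iterable[str]) -> Tuple[Set[str], Set[int]]:
    """Collect the distinct regions and years found in CSV text.

    Assumes hypothetical 'Continent' and 'Year' columns.
    """
    regions: Set[str] = set()
    years: Set[int] = set()
    for row in csv.DictReader(lines):
        regions.add(row["Continent"].strip())
        years.add(int(row["Year"]))
    return regions, years


# Made-up sample standing in for the real dataset.
sample_csv = io.StringIO(
    "Country,Continent,Year,GDP\n"
    "Country A,Asia,2019,100\n"
    "Country A,Asia,2020,110\n"
    "Country B,Europe,2020,90\n"
)

regions, years = available_values(sample_csv)
print(sorted(regions))
print(min(years), max(years))
```

For the real file, the same function could be given an open file handle, for example `open("data/gdp_dataset_fixed.csv", newline="")`, provided the column names are adjusted to match the dataset.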
Bar Chart:
- Displays GDP values as horizontal bars
- Countries sorted by GDP value
- Color: Teal
- Includes grid for readability
Histogram:
- Shows distribution of GDP values across bins
- Default bins: 10
- Color: Sky Blue
- Helps identify GDP concentration patterns
Dot Plot:
- Individual data points with reference lines
- Color: Purple with gray reference lines
- Useful for outlier detection
- Sorted by GDP value
Pie Chart:
- Top 8 countries shown individually
- Remaining countries grouped as "Others"
- Percentage labels for each segment
- Final statistics box showing:
  - Region name
  - Analysis year
  - Operation type
  - Calculated result
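The "top 8 plus Others" grouping used for the pie chart can be sketched as follows. The function names `group_top_n` and `percentages` are hypothetical and the input values are made up. This illustrates the grouping idea only and is not the code in visualizer.py.

```python
from typing import Dict, List, Tuple

Slice = Tuple[str, float]


def group_top_n(gdp_by_country: Dict[str, float], top_n: int = 8) -> List[Slice]:
    """Keep the top_n largest entries and fold the rest into one 'Others' slice."""
    ranked = sorted(gdp_by_country.items(), key=lambda item: item[1], reverse=True)
    head, tail = ranked[:top_n], ranked[top_n:]
    if tail:
        head.append(("Others", sum(value for _, value in tail)))
    return head


def percentages(slices: List[Slice]) -> List[Slice]:
    """Convert slice values to percentage shares, rounded to one decimal place."""
    total = sum(value for _, value in slices)
    if total == 0:
        return [(label, 0.0) for label, _ in slices]
    return [(label, round(100 * value / total, 1)) for label, value in slices]


# Ten made-up countries with values 1.0 through 10.0.
made_up = {f"Country {i}": float(i) for i in range(1, 11)}
slices = group_top_n(made_up)
for label, share in percentages(slices):
    print(f"{label}: {share}%")
```

Grouping the small entries keeps the chart readable when a region has many countries. The resulting labels and values could then be passed to a plotting call such as matplotlib's `pie`.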
src/loader.py: Handles CSV data loading and initial validation
load_data(file_path: str) -> DataFrame
src/processor.py: Processes data based on configuration
process_data(df: DataFrame, config: dict) -> tuple
src/visualizer.py: Creates and displays visualizations
show_dashboard(data: DataFrame, result_value: float, config: dict) -> None
| Version | Date | Status | Changes |
|---|---|---|---|
| 1.0.0 | Feb 2026 | Released | Initial release with basic functionality |
| 1.1.0 | Feb 2026 | Released | Added sequential graph display |
| 1.2.0 | Feb 2026 | Current | Complete documentation and production ready |
- Umair Amjad - Co-Developer
- Muhammad Ali - Co-Developer
This is a semester project for educational purposes.
For issues or questions, please contact the project contributors or refer to the troubleshooting section above.