₿ Stream-Bit: Near Real-Time Bitcoin Data Pipeline
A near real-time data pipeline for Bitcoin that combines Data Engineering, Cloud Computing, and Web Development. The project implements data streaming, data lake storage, advanced analytics, and a responsive web dashboard with live updates.
Demo: Dashboard with real-time synchronization between Current Price, Chart, and Statistics
✅ Real-time Dashboard with Server-Sent Events + Chart.js
✅ Complete AWS Pipeline (Firehose → S3 → Athena)
✅ Intelligent Caching with TTL optimized by query type
✅ Scalable MVC Architecture with clean separation of concerns
✅ Smart Synchronization: updates only on real price changes
Technical Architecture
graph LR
A[CoinGecko API] --> B[Bitcoin Extractor]
B --> C[Firehose Stream]
C --> D[S3 Data Lake]
D --> E[Athena Analytics]
E --> F[Flask API + Cache]
F --> G[Real-time Dashboard]
H[User] --> G
G --> H
src/
├── controllers/                  # Orchestration Layer
│   ├── streaming_controller.py   # Stream pipeline coordination
│   └── web/api_controller.py     # REST API endpoints + SSE
├── models/                       # Data & Config Layer
│   ├── config.py                 # Centralized configuration
│   └── data_schemas.py           # Pydantic data schemas
├── views/                        # Presentation Layer
│   └── web/                      # Web interface
│       ├── templates/            # Jinja2 HTML templates
│       ├── static/js/            # JavaScript (Chart.js + SSE)
│       └── static/css/           # Custom CSS styling
└── services/                     # Business Logic Layer
    ├── extractors/               # Data extraction services
    │   └── bitcoin_extractor.py  # CoinGecko API integration
    ├── loaders/                  # Data loading services
    │   └── firehose_loader.py    # AWS Firehose streaming
    └── web/                      # Web-specific services
        ├── cache_service.py      # TTL cache management
        └── athena_service.py     # AWS Athena queries
# Python 3.8+ (recommended 3.11+)
python --version
# AWS CLI configured (optional - for advanced features)
aws configure list
# Clone repository
git clone <repo-url>
cd stream-bit
# Option 1: uv (recommended - faster)
uv sync
# Option 2: traditional pip
pip install -r requirements.txt
# Copy environment file
cp .env.example .env
# Edit configurations (optional)
# AWS_REGION=us-east-1
# FLASK_DEBUG=True
Web Dashboard (Main)
python app.py --mode web --port 8080
Access:
# Continuous streaming to AWS
python app.py --mode stream
# Single test (demo)
python app.py --mode test
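Stream mode polls CoinGecko continuously, and the feature list mentions retry logic for that extraction. A minimal sketch of retry with exponential backoff is shown here, assuming a caller-supplied `fetch` function; the name `fetch_with_retry`, the attempt count, and the delays are hypothetical and not taken from the project code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or max_attempts is reached.

    The delay doubles after each failure: base_delay, 2x, 4x, ...
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:  # a real extractor would catch narrower errors
            last_error = exc
            # Do not sleep after the final failed attempt
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"giving up after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without waiting.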
Features and Highlights
- Live Chart: Chart.js with smart updates (only on price changes)
- Server-Sent Events: Data streaming with automatic fallback to polling
- Dynamic Statistics: Automatic sync between Current Price, Chart, and Statistics
- Continuous Extraction: CoinGecko API with retry logic and rate limiting
- AWS Streaming: Kinesis Firehose for robust ingestion
- Data Lake: S3 with automatic Hive-style partitioning
- Analytics: AWS Athena + Glue for optimized SQL queries
- Format Conversion: Automatic JSON → Parquet via Firehose
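To illustrate the Server-Sent Events item, here is a small sketch of how one message is laid out on the wire in the `text/event-stream` format. The helper name `format_sse` and the event name `price` are assumptions made for this example; the project's actual endpoint code is not shown in this README.

```python
import json
from typing import Any, Optional


def format_sse(data: Any, event: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one "data:" field is enough
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event
    return "\n".join(lines) + "\n\n"


print(format_sse({"price_brl": 617094.0}, event="price"), end="")
```

In a Flask app, a generator that yields such strings can be returned as `Response(generator, mimetype="text/event-stream")`; the browser side would consume it with `EventSource`.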
Intelligent Architecture
- TTL Cache: TTL tuned per query type
- Synchronization: Coordinated updates between all components
- Performance: Sliding window (150 points) for fluid charts
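The idea of a TTL that differs by query type can be sketched with the standard library alone. This is a hand-rolled illustration, not the project's `cache_service.py`; the class name `TypedTTLCache`, the query type names, and the TTL values are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical TTLs in seconds; the project's real values are not documented here
TTL_BY_TYPE: Dict[str, float] = {"latest": 10.0, "hourly": 300.0, "statistics": 60.0}


class TypedTTLCache:
    """In-memory cache whose entries expire after a TTL chosen by query type."""

    def __init__(
        self,
        ttl_by_type: Dict[str, float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl_by_type = ttl_by_type
        self._clock = clock
        # Maps (query_type, key) to (expiry time, cached value)
        self._store: Dict[Tuple[str, str], Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, query_type: str, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get((query_type, key))
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: recompute and store with this type's TTL
        self.misses += 1
        value = compute()
        self._store[(query_type, key)] = (now + self._ttl_by_type[query_type], value)
        return value
```

Short TTLs suit fast-changing queries such as the latest price, while longer TTLs avoid repeating expensive aggregate queries.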
Quality and Reliability
- Error Handling: Comprehensive error pages and API responses
- Structured Logging: Structured log output (structlog) with appropriate levels
- Monitoring: Health checks and status page with system metrics
| Technology | Version | Purpose |
|------------|---------|---------|
| Python | 3.8+ | Main language |
| Flask | 2.3+ | Web framework + REST API |
| TTLCache | 5.3+ | In-memory cache with TTL |
| Pydantic | 2.0+ | Data validation and schemas |
| asyncio | Built-in | Async operations |
| Service | Functionality |
|---------|---------------|
| AWS Kinesis Firehose | Robust streaming pipeline |
| AWS S3 | Data lake with Hive partitioning |
| AWS Athena | Serverless query engine |
| AWS Glue | Data catalog + format conversion |
| CoinGecko API | Bitcoin data source |
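Hive partitioning means that S3 object keys carry `key=value` path segments, which lets Athena prune partitions by date. The sketch below only illustrates that layout. The base prefix `bitcoin` and the partition keys `year`, `month`, `day`, and `hour` are assumptions; in the pipeline itself the prefix would be produced by Firehose rather than by application code.

```python
from datetime import datetime, timezone


def hive_partition_prefix(ts: datetime, base: str = "bitcoin") -> str:
    """Build an S3 key prefix with key=value path segments (Hive style)."""
    # Normalize to UTC so partitions do not depend on the local time zone.
    # Note: a naive datetime is interpreted as local time by astimezone().
    ts = ts.astimezone(timezone.utc)
    return f"{base}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"


print(hive_partition_prefix(datetime(2025, 9, 16, 10, 30, tzinfo=timezone.utc)))
# bitcoin/year=2025/month=09/day=16/hour=10/
```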
| Technology | Version | Purpose |
|------------|---------|---------|
| Bootstrap | 5.3 | Responsive CSS framework |
| Chart.js | 4.4 | Interactive charts |
| Server-Sent Events | HTML5 | Real-time updates |
| Jinja2 | 3.1+ | Template engine |
| Vanilla JavaScript | ES6+ | DOM manipulation |
Development & Quality
| Tool | Functionality |
|------|---------------|
| structlog | Structured logging |
| mypy | Static type checking |
Use Cases and Examples
For Developers
# Development mode with hot-reload
python app.py --mode web --port 8080 --debug
# Application health check
curl http://localhost:8080/api/health
# Response: {"status": "healthy", "timestamp": "2025-09-16T10:30:00Z"}
# Config page (debug info)
curl http://localhost:8080/config
# Latest Bitcoin price
curl http://localhost:8080/api/bitcoin/latest
# Response: {"price_brl": 617094.0, "timestamp": "2025-09-16T10:30:00Z"}
# Historical hourly data (last 24h)
curl "http://localhost:8080/api/bitcoin/hourly?hours=24"
# Statistics by period
curl "http://localhost:8080/api/bitcoin/statistics?hours=6"
# Response: {"avg": 615000, "min": 610000, "max": 620000, "count": 360}
# Complete status page
curl http://localhost:8080/status
# Cache metrics
curl http://localhost:8080/api/cache/stats
# Response: {"hits": 245, "misses": 12, "hit_rate": 95.3}
# Stream mode (production)
python app.py --mode stream --log-level INFO
# Test mode (validation)
python app.py --mode test
Performance and Metrics
Performance Benchmarks
| Metric | Value | Observation |
|--------|-------|-------------|
| API Latency | <200 ms | TTL cache optimized |
| Stream Throughput | 100+ req/min | CoinGecko rate limits |
| Chart Update | <50 ms | Only real changes (>R$ 0.01) |
| Memory Usage | ~50 MB | TTL cache + sliding window |
Implemented Optimizations
✅ Smart Caching: TTL differentiated by query type
✅ Sliding Window: Maximum 150 points in chart for fluidity
✅ Price Change Detection: Only updates on changes >R$ 0.01
✅ Async Operations: concurrent.futures for I/O operations
✅ Query Optimization: Partition projection in Athena
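The sliding window and price change detection can be combined in a few lines. The following is a sketch of the logic only, written in Python for consistency with the backend; the dashboard presumably applies the same idea in the browser. The class name `PriceWindow` and the method `offer` are hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple

MAX_POINTS = 150        # window size from the Sliding Window item
MIN_CHANGE_BRL = 0.01   # threshold from the Price Change Detection item


class PriceWindow:
    """Keeps the most recent chart points and ignores insignificant changes."""

    def __init__(self, max_points: int = MAX_POINTS, min_change: float = MIN_CHANGE_BRL) -> None:
        # deque(maxlen=...) drops the oldest point automatically when full
        self.points: Deque[Tuple[str, float]] = deque(maxlen=max_points)
        self._min_change = min_change
        self._last_price: Optional[float] = None

    def offer(self, timestamp: str, price: float) -> bool:
        """Append the point and return True only if the price moved enough."""
        if self._last_price is not None and abs(price - self._last_price) <= self._min_change:
            return False
        self._last_price = price
        self.points.append((timestamp, price))
        return True
```

A caller would redraw the chart only when `offer` returns True, which avoids repainting on identical ticks.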
Complete Documentation
Tip: Use python app.py --help to see all available options
Developed with ❤️ to demonstrate competencies in Data Engineering, Cloud Computing, and APIs