A machine learning system for detecting unusual trading patterns and potential fraud in financial markets (equities, crypto, FX). The project covers the full pipeline, from data collection through feature engineering and model training to anomaly detection, and includes a web dashboard with AI-generated insights.
- Modern UI: Beautiful, responsive interface with light/dark theme support
- Interactive Pages: Data Collection, Feature Engineering, Model Training, Anomaly Detection, Analytics
- Real-time Visualization: Interactive charts and graphs with Plotly
- Comprehensive Settings: Customizable system configuration
- AI Anomaly Insights: Intelligent explanations of detected anomalies
- Market Analysis: AI-powered market condition assessment
- Trading Recommendations: AI-generated buy/sell/hold signals
- Risk Assessment: AI evaluation of portfolio risks
- Demo Mode: Test AI features without API keys
- Multi-source Data Collection: Yahoo Finance, Binance, CoinGecko, FX data
- Advanced ML Models: Isolation Forest, Autoencoder, Graph Neural Networks
- Comprehensive Feature Engineering: 50+ technical indicators and financial features
- Model Evaluation: Multiple metrics for unsupervised anomaly detection
- Real-time Detection: Support for streaming data analysis
- Equities: Yahoo Finance API (free)
- Cryptocurrency: Binance API, CoinGecko API (free tiers available)
- Forex: Alpha Vantage, ExchangeRate-API, Fixer.io (free tiers available)
- Isolation Forest: Fast anomaly detection using tree-based isolation
- Autoencoder: Reconstruction-based anomaly detection using neural networks
- Graph Neural Network: Correlation-aware anomaly detection for multiple assets
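
To make the tree-based isolation idea concrete, here is a minimal sketch that uses scikit-learn's `IsolationForest` directly on synthetic data. It does not go through the project's `IsolationForestAnomalyDetector` wrapper, whose internals are not shown in this README. The data, the outlier positions, and the parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): 200 "normal" points around the origin
# plus 5 hand-placed points far away from the bulk.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0], [-8.0, -8.0], [0.0, 10.0]])
X = np.vstack([normal, outliers])

# contamination is the expected fraction of anomalies in the data.
detector = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = detector.fit_predict(X)     # -1 = anomaly, 1 = normal
scores = detector.score_samples(X)   # lower score = more anomalous

flagged = np.flatnonzero(labels == -1)
print("flagged row indices:", flagged)
```

Points that sit far from the bulk of the data are separated by fewer random splits, which is why the five hand-placed rows receive the lowest scores.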
- Clone the repository:
  git clone <repository-url>
  cd anomaly-detection
- Install dependencies:
  pip install -r requirements.txt
- Verify installation:
  python tests/test_pipeline.py

streamlit run dashboard/enhanced_app.py
streamlit run dashboard/app.py
python examples/simple_example.py
python examples/run_analysis.py

- Launch the dashboard:
  streamlit run dashboard/enhanced_app.py
- Navigate through the pages:
  - Data Collection: Gather financial data from multiple sources
  - Feature Engineering: Create technical indicators and features
  - Model Training: Train Isolation Forest and Autoencoder models
  - Anomaly Detection: Detect and analyze anomalies
  - Analytics: Comprehensive data analysis and visualization
  - AI Analysis: AI-powered insights and recommendations
  - Settings: Configure the system
- Test AI features (no API keys needed):
  - Go to "AI Analysis"
  - Enable "Demo Mode" to see AI features in action
  - Explore anomaly insights, market analysis, and trading recommendations
├── data/                          # Data collection and processing
│   ├── collectors/                # API collectors for different data sources
│   │   ├── yahoo_finance_collector.py
│   │   ├── crypto_collector.py
│   │   └── fx_collector.py
│   └── processors/                # Data preprocessing and feature engineering
│       └── feature_engineer.py
├── models/                        # Machine learning models
│   ├── isolation_forest.py
│   ├── autoencoder.py
│   └── gnn_anomaly.py
├── utils/                         # Utility functions
│   ├── model_evaluator.py
│   └── ai_anomaly_analyzer.py     # AI-powered analysis
├── dashboard/                     # Streamlit dashboards
│   ├── app.py                     # Original dashboard
│   ├── enhanced_app.py            # Enhanced dashboard with AI features
│   ├── components.py              # Reusable UI components
│   ├── realtime_dashboard.py      # Real-time monitoring
│   └── ai_components.py           # AI-specific components
├── examples/                      # Example scripts
│   ├── simple_example.py
│   ├── run_analysis.py
│   ├── autoencoder_explanation.py
│   └── model_comparison_explanation.py
├── tests/                         # Unit tests
│   └── test_pipeline.py
├── requirements.txt               # Python dependencies
├── AI_FEATURES_README.md          # AI features documentation
└── README.md                      # This file
from data.collectors.yahoo_finance_collector import YahooFinanceCollector
from data.processors.feature_engineer import FinancialFeatureEngineer
from models.isolation_forest import IsolationForestAnomalyDetector
# Collect data
collector = YahooFinanceCollector()
data = collector.get_stock_data("AAPL", period="1y")
# Engineer features
engineer = FinancialFeatureEngineer()
features = engineer.engineer_all_features(data)
features_df, _, _ = engineer.prepare_for_ml(features)
# Train model
model = IsolationForestAnomalyDetector(contamination=0.1)
model.fit(features_df)
# Detect anomalies
predictions, scores, metadata = model.detect_anomalies(features_df)
print(f"Detected {metadata['n_anomalies']} anomalies")

from models.autoencoder import AutoencoderAnomalyDetector
from models.gnn_anomaly import GNNAnomalyDetector
from utils.model_evaluator import AnomalyDetectionEvaluator
# Train multiple models
models = {
'Isolation Forest': IsolationForestAnomalyDetector(),
'Autoencoder': AutoencoderAnomalyDetector(),
'GNN': GNNAnomalyDetector()
}
# Train and evaluate
evaluator = AnomalyDetectionEvaluator()
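for name, model in models.items():
    model.fit(features_df)
    predictions, scores, metadata = model.detect_anomalies(features_df)
    # y_true: ground-truth anomaly labels aligned with features_df.
    # It is not defined in this snippet; supply your own labels.
    evaluator.evaluate_model(name, y_true, predictions, scores)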
# Compare models
comparison = evaluator.compare_models()
print(comparison)

- Isolation Forest: contamination, n_estimators, max_samples
- Autoencoder: encoding_dim, hidden_dims, epochs, learning_rate
- GNN: model_type, hidden_dim, num_layers, heads
- Price Features: Range, body size, shadows, gaps
- Volume Features: Moving averages, ratios, z-scores
- Technical Indicators: MA, EMA, MACD, RSI, Bollinger Bands
- Returns Features: Simple returns, log returns, volatility
- Time Features: Cyclical encoding, market session indicators
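
As an illustration of the feature families listed above, the following sketch computes a handful of them with pandas. It is not the project's `FinancialFeatureEngineer`. The function name `basic_features`, the column names (`open`, `high`, `low`, `close`, `volume`), the 20-period window, and the synthetic price series are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def basic_features(ohlcv: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Hypothetical helper: a few price, volume, return, and time features."""
    out = pd.DataFrame(index=ohlcv.index)
    # Price features: candle range and body size
    out["range"] = ohlcv["high"] - ohlcv["low"]
    out["body"] = (ohlcv["close"] - ohlcv["open"]).abs()
    # Returns features: simple return, log return, rolling volatility
    out["simple_return"] = ohlcv["close"] / ohlcv["close"].shift(1) - 1.0
    out["log_return"] = np.log(ohlcv["close"]).diff()
    out["volatility"] = out["log_return"].rolling(window).std()
    # Volume feature: rolling z-score
    vol_mean = ohlcv["volume"].rolling(window).mean()
    vol_std = ohlcv["volume"].rolling(window).std()
    out["volume_z"] = (ohlcv["volume"] - vol_mean) / vol_std
    # Technical indicator: Bollinger Bands (moving average +/- 2 std)
    ma = ohlcv["close"].rolling(window).mean()
    sd = ohlcv["close"].rolling(window).std()
    out["bb_upper"] = ma + 2 * sd
    out["bb_lower"] = ma - 2 * sd
    # Time feature: cyclical encoding of the day of week
    dow = ohlcv.index.dayofweek.to_numpy()
    out["dow_sin"] = np.sin(2 * np.pi * dow / 7)
    out["dow_cos"] = np.cos(2 * np.pi * dow / 7)
    return out

# Synthetic daily OHLCV data, for demonstration only.
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=60, freq="D")
close = 100 + np.cumsum(rng.normal(0, 1, size=60))
open_ = close + rng.normal(0, 0.5, size=60)
df = pd.DataFrame({
    "open": open_,
    "high": np.maximum(open_, close) + 0.5,
    "low": np.minimum(open_, close) - 0.5,
    "close": close,
    "volume": rng.integers(1_000, 5_000, size=60),
}, index=idx)

features = basic_features(df)
print(features.tail(3))
```

The sine/cosine pair keeps Sunday and Monday adjacent, which a plain 0 to 6 integer encoding would not. Rolling features are undefined (NaN) until a full window of history is available.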
- Light/Dark Theme: Toggle between themes with persistent settings
- Responsive Design: Works on desktop, tablet, and mobile devices
- Interactive Charts: Plotly-powered visualizations with zoom, pan, and hover
- Real-time Updates: Live data refresh and progress tracking
- Custom Styling: Modern CSS with professional appearance
- Multi-source Support: Yahoo Finance, Binance, CoinGecko, FX APIs
- Interactive Configuration: Symbol selection, time periods, intervals
- Real-time Status: Live collection progress and error handling
- Data Preview: Immediate data validation and preview
- Export Options: Save collected data in multiple formats
- Interactive Controls: Select specific feature types to generate
- Advanced Options: Customizable parameters for technical indicators
- Feature Preview: Real-time preview of generated features
- Progress Tracking: Visual progress bars and status updates
- Data Validation: Automatic handling of missing values and outliers
- Model Selection: Choose between Isolation Forest and Autoencoder
- Parameter Tuning: Interactive sliders and input fields
- Training Progress: Real-time training metrics and visualizations
- Model Comparison: Side-by-side performance comparison
- Save/Load Models: Persistent model storage and retrieval
- Interactive Detection: Configure contamination rates and thresholds
- Real-time Results: Live anomaly detection with instant feedback
- Detailed Analysis: Individual anomaly information and scores
- Visualization: Interactive charts showing anomalies over time
- Export Results: Save detection results in CSV/JSON formats
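
The relationship between a contamination rate and a score threshold can be sketched as follows. This is an illustration, not the dashboard's code: the helper name `flag_by_contamination` is hypothetical, and it assumes scores where higher means more anomalous (some detectors use the opposite sign).

```python
import numpy as np

def flag_by_contamination(scores, contamination=0.1):
    """Flag roughly the top `contamination` fraction of scores as anomalies.

    Assumes higher score = more anomalous.
    """
    scores = np.asarray(scores, dtype=float)
    # The threshold is the (1 - contamination) quantile of the scores.
    threshold = np.quantile(scores, 1.0 - contamination)
    return scores > threshold, threshold

scores = np.array([0.10, 0.20, 0.15, 0.90, 0.12, 0.18, 0.11, 0.95, 0.13, 0.14])
flags, threshold = flag_by_contamination(scores, contamination=0.2)
print("threshold:", threshold)
print("anomalous indices:", np.flatnonzero(flags))
```

Raising the contamination rate lowers the threshold, so more points are flagged. Setting it too high floods the results with false alarms.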
- Price Analysis: Candlestick charts, price distributions, volatility metrics
- Volume Analysis: Volume patterns, correlations, and spikes
- Technical Indicators: Comprehensive technical analysis with charts
- Summary Statistics: Data quality metrics and correlation matrices
- Interactive Tabs: Organized analysis by category
- Demo Mode: Test AI features without API keys
- AI Provider Selection: Easy switching between Demo and OpenAI
- Anomaly Insights: AI-powered explanations of detected anomalies
- Market Analysis: AI assessment of market conditions
- Trading Recommendations: AI-generated buy/sell/hold signals
- Risk Assessment: AI evaluation of portfolio risks
- Appearance Settings: Theme, chart preferences, UI customization
- Data Collection Settings: API configurations, collection limits
- AI Configuration: OpenAI API keys, model selection, parameters
- Analysis Settings: Default parameters, feature selection, model configs
- Reset Options: Restore defaults and clear settings
Run the test suite to verify everything works correctly:
python tests/test_pipeline.py

The tests cover:
- Data collection (with mocked APIs)
- Feature engineering
- Model training and prediction
- Model evaluation
- End-to-end pipeline
- YahooFinanceCollector: Collect stock data from Yahoo Finance
- BinanceCollector: Collect cryptocurrency data from Binance
- CoinGeckoCollector: Collect cryptocurrency data from CoinGecko
- FXCollector: Collect forex data from multiple sources
- IsolationForestAnomalyDetector: Tree-based anomaly detection
- AutoencoderAnomalyDetector: Neural network-based reconstruction
- GNNAnomalyDetector: Graph neural network for correlated assets
- FinancialFeatureEngineer: Comprehensive feature engineering
- AnomalyDetectionEvaluator: Model evaluation and comparison
The system provides multiple evaluation metrics:
- Classification Metrics: Accuracy, Precision, Recall, F1-Score
- Ranking Metrics: ROC-AUC, PR-AUC
- Anomaly-Specific: Anomaly rate, threshold analysis
- Visualization: Time series plots, confusion matrices, score distributions
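
The classification metrics above require ground-truth labels, which unsupervised anomaly detection often lacks. When labels are available, they can be computed as in the sketch below. It uses plain NumPy rather than the project's `AnomalyDetectionEvaluator`; the helper name `binary_metrics` and the toy labels are assumptions, with 1 marking an anomaly and 0 a normal point.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Hypothetical helper: confusion-matrix metrics for 0/1 anomaly labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))    # anomalies correctly flagged
    fp = int(np.sum(~y_true & y_pred))   # normal points flagged by mistake
    fn = int(np.sum(y_true & ~y_pred))   # anomalies that were missed
    tn = int(np.sum(~y_true & ~y_pred))  # normal points left alone
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / y_true.size,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "anomaly_rate": float(y_pred.mean()),
    }

# Toy example: 2 true anomalies among 10 points, one caught, one missed,
# plus one false alarm.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.8 while precision and recall are both 0.5, which shows why accuracy alone is a weak measure when anomalies are rare.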
- Sample Data: Realistic financial anomalies for testing
- AI Insights: Simulated AI explanations and analysis
- Trading Recommendations: Mock buy/sell/hold signals
- Risk Assessment: Sample risk analysis and mitigation strategies
- Perfect for Learning: Understand AI capabilities without setup
- Real AI Analysis: GPT-4 powered anomaly explanations
- Market Intelligence: AI assessment of market conditions
- Trading Signals: AI-generated trading recommendations
- Risk Evaluation: Professional risk assessment
- Free Tier Available: $5 in free credits at platform.openai.com
- Anomaly Insights: Intelligent explanations of why anomalies occurred
- Market Analysis: AI-powered market condition assessment
- Trading Recommendations: Buy/sell/hold signals with confidence levels
- Risk Assessment: Portfolio risk evaluation and mitigation strategies
- Contextual Analysis: AI considers market conditions, volatility, and trends
Combine multiple models for improved detection:
# Train ensemble of models
ensemble_results = {}
for model_name, model in models.items():
    predictions, scores, metadata = model.detect_anomalies(features_df)
    ensemble_results[model_name] = {'predictions': predictions, 'scores': scores}
# Combine results (example: average the per-model predictions; requires `import numpy as np`).
# Threshold the averaged vote to turn it into a majority decision.
combined_predictions = np.mean([r['predictions'] for r in ensemble_results.values()], axis=0)

For streaming data analysis:
# Process new data points
new_data = collector.get_latest_data("AAPL")
new_features = engineer.engineer_all_features(new_data)
new_features_df, _, _ = engineer.prepare_for_ml(new_features)
# Detect anomalies in real-time
predictions, scores, metadata = model.detect_anomalies(new_features_df)

from utils.ai_anomaly_analyzer import AIAnomalyAnalyzer
# Initialize AI analyzer
ai_analyzer = AIAnomalyAnalyzer(openai_api_key="your-key-here")
# Get AI insights for anomalies
insights = ai_analyzer.analyze_anomalies(anomalies, market_context)
recommendations = ai_analyzer.generate_trading_recommendations(anomalies)
risk_assessment = ai_analyzer.assess_risk(anomalies, portfolio_data)

- pandas: Data manipulation and analysis
- numpy: Numerical computing
- scikit-learn: Machine learning algorithms
- torch: PyTorch for neural networks
- plotly: Interactive visualizations
- streamlit: Web dashboard framework
- yfinance: Yahoo Finance API
- ccxt: Cryptocurrency exchange APIs
- requests: HTTP requests for APIs
- openai: OpenAI GPT models
- anthropic: Claude models (optional)
- requests: API communication
pip install -r requirements.txt

# Launch enhanced dashboard
streamlit run dashboard/enhanced_app.py
# Navigate to AI Analysis → Enable Demo Mode
# Explore all features without any setup

# 1. Data Collection: Collect some stock data
# 2. Feature Engineering: Generate technical indicators
# 3. Model Training: Train Isolation Forest and Autoencoder
# 4. Anomaly Detection: Detect anomalies in your data
# 5. Analytics: Explore comprehensive data analysis
# 6. AI Analysis: Get AI-powered insights (Demo Mode)

# Get free OpenAI API key at platform.openai.com
# Go to Settings → AI Configuration
# Enter your API key and select model
# Go to AI Analysis → Select OpenAI → Initialize

- Theme: Light/Dark mode with persistent settings
- Charts: Interactive Plotly visualizations
- Auto-refresh: Real-time data updates
- Export: Multiple data formats (CSV, JSON, Excel)
- Demo Mode: No setup required, sample data
- OpenAI: GPT-4, GPT-3.5-turbo models
- Parameters: Analysis depth, response length, creativity
- Free Tier: $5 in free credits available
- Isolation Forest: Contamination rate, number of estimators
- Autoencoder: Encoding dimension, epochs, learning rate
- Feature Engineering: Technical indicator periods, normalization
- Data Collection: Use appropriate time periods to avoid rate limits
- Feature Engineering: Select only needed features for faster processing
- Model Training: Start with default parameters, then tune as needed
- AI Analysis: Use Demo Mode for testing, OpenAI for production
- Memory Usage: Monitor data size for large datasets
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
MIT License - Free for personal and commercial use.
Note: This system is for educational and research purposes. Always verify results and consider market conditions when making financial decisions.
- Dashboard not loading: Check if port 8501 is available
- Data collection fails: Verify internet connection and API limits
- Model training errors: Ensure data is properly formatted
- AI features not working: Check API keys and internet connection
- Demo Mode: Use for testing without API keys
- Settings Page: Configure system parameters
- Error Messages: Check console for detailed error information
- Documentation: Refer to this README and inline help text
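
For the "Dashboard not loading" case in the troubleshooting list, a quick way to see whether something is already listening on port 8501 (Streamlit's default) is sketched below. It uses only the standard library; the helper name `port_in_use` is hypothetical and not part of this project.

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0

print("port 8501 in use:", port_in_use(8501))
```

If the port is taken by another process, Streamlit can be started on a different one, for example `streamlit run dashboard/enhanced_app.py --server.port 8502`.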