Zero-Setup Market Data Analytics with Python API, CLI, and LLM Integration
PLUTUS is a data analytics framework for the Vietnamese stock market. It offers three ways to access 21GB of historical data (2021-2022): a Python API, command-line tools, and natural language queries through LLM integration.
PLUTUS provides zero-setup access to Vietnamese market data without database installation:
- Rich Dataset: 21GB of tick and daily data from HSX, HNX, UPCOM (2021-2022)
- Zero Setup: Query CSV files directly using DuckDB (no database required)
- High Performance: Optional Parquet optimization for 10-100x faster queries
- Triple Interface: Python API + CLI + LLM integration (MCP)
- AI-Powered: Query data using natural language through Claude, Gemini, or other MCP clients
- Production Ready: 205+ tests, comprehensive documentation
git clone https://github.com/algotradevn/plutus.git
cd plutus
pip install -e .
Set your dataset path (choose one method):
Option 1: Environment Variable (Recommended)
export HERMES_DATA_ROOT=/path/to/hermes-offline-market-data-pre-2023
Option 2: Config File
cp config.cfg.template config.cfg
# Edit config.cfg and set PLUTUS_DATA_ROOT
Python API:
from plutus.datahub import query_historical
# Get 5-minute OHLC bars
ohlc = query_historical(
    ticker_symbol='FPT',
    begin='2021-01-15',
    end='2021-01-16',
    type='ohlc',
    interval='5m'
)
for bar in ohlc:
    print(f"{bar['bar_time']}: O={bar['open']} H={bar['high']} "
          f"L={bar['low']} C={bar['close']}")
CLI:
python -m plutus.datahub \
--ticker FPT \
--begin 2021-01-15 \
--end 2021-01-16 \
--type ohlc \
--interval 5m \
--output fpt.csv
LLM (Natural Language):
> Get me FPT's 5-minute OHLC bars for January 15, 2021
Programmatic access to market data with flexible querying:
Tick Data Queries:
from plutus.datahub import query_historical
# Get tick-level data with field selection
ticks = query_historical(
    ticker_symbol='HPG',
    begin='2021-01-15 09:00:00',
    end='2021-01-15 10:00:00',
    type='tick',
    fields=['matched_price', 'matched_volume', 'bid_price_1', 'ask_price_1']
)
for tick in ticks:
    print(f"{tick['datetime']}: {tick['matched_price']} @ {tick['matched_volume']}")
OHLC Aggregation:
# Generate candlestick bars from tick data
ohlc = query_historical(
    ticker_symbol='VIC',
    begin='2021-01-15',
    end='2021-01-16',
    type='ohlc',
    interval='15m',  # 1m, 5m, 15m, 30m, 1h, 4h, 1d
    include_volume=True
)
Features:
- 40+ data fields (matched price/volume, bid/ask, foreign flows, open interest)
- 7 OHLC intervals (1m, 5m, 15m, 30m, 1h, 4h, 1d)
- Date/datetime range filtering
- Lazy iteration for memory efficiency
- DataFrame conversion via to_dataframe()
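To make the OHLC intervals above concrete, here is a minimal sketch of how ticks can be folded into fixed-width bars. It is not Plutus's implementation (which the README says runs on DuckDB); it only illustrates the bucketing idea in plain Python. The field names `datetime`, `matched_price`, `matched_volume`, and `bar_time` come from the examples above; the `volume` key, the helper names, and the assumption that `datetime` is a `datetime` object are illustrative, and the sample ticks are made up.

```python
from datetime import datetime, timedelta


def bucket_start(ts, minutes):
    """Floor a timestamp to the start of its N-minute bucket (counted from midnight)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_minutes = int((ts - midnight).total_seconds() // 60)
    return midnight + timedelta(minutes=(elapsed_minutes // minutes) * minutes)


def aggregate_ohlc(ticks, minutes=5):
    """Fold time-ordered ticks into one OHLC bar per non-empty bucket."""
    bars = {}
    for tick in ticks:
        key = bucket_start(tick["datetime"], minutes)
        price, volume = tick["matched_price"], tick["matched_volume"]
        bar = bars.get(key)
        if bar is None:
            # The first tick in a bucket sets every price field.
            bars[key] = {"bar_time": key, "open": price, "high": price,
                         "low": price, "close": price, "volume": volume}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last tick seen so far in this bucket
            bar["volume"] += volume
    return [bars[key] for key in sorted(bars)]


# Made-up ticks for illustration only (not real market data).
sample_ticks = [
    {"datetime": datetime(2021, 1, 15, 9, 0, 5), "matched_price": 60.0, "matched_volume": 100},
    {"datetime": datetime(2021, 1, 15, 9, 2, 30), "matched_price": 60.5, "matched_volume": 200},
    {"datetime": datetime(2021, 1, 15, 9, 4, 59), "matched_price": 59.8, "matched_volume": 50},
    {"datetime": datetime(2021, 1, 15, 9, 5, 0), "matched_price": 60.1, "matched_volume": 300},
]

bars = aggregate_ohlc(sample_ticks, minutes=5)
for bar in bars:
    print(bar["bar_time"], bar["open"], bar["high"], bar["low"], bar["close"], bar["volume"])
```

The sketch assumes ticks arrive in time order, so the last tick seen in a bucket is its close. Buckets with no ticks produce no bar.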
Command-line interface for data export and analysis:
# Export tick data to CSV
python -m plutus.datahub \
--ticker FPT \
--begin "2021-01-15 09:00" \
--end "2021-01-15 10:00" \
--type tick \
--fields matched_price,matched_volume \
--output fpt_ticks.csv
# Generate OHLC bars in JSON format
python -m plutus.datahub \
--ticker HPG \
--begin 2021-01-15 \
--end 2021-01-16 \
--type ohlc \
--interval 1m \
--format json \
--output hpg_1m.json
# Get query statistics before execution
python -m plutus.datahub \
--ticker VIC \
--begin 2021-01-01 \
--end 2021-12-31 \
--stats
Output Formats: CSV, JSON, table (terminal)
See the CLI Usage Guide for more command-line examples and workflows.
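Once tick data has been exported to CSV, it can be post-processed with the standard library alone. The sketch below computes a volume-weighted average price (VWAP). It assumes the exported file has a header row whose column names match the names passed to `--fields` (`matched_price`, `matched_volume`), which this README does not confirm, and it uses an in-memory sample with made-up values in place of a real export.

```python
import csv
import io


def vwap(rows):
    """Volume-weighted average price over rows with matched_price and matched_volume."""
    total_value = 0.0
    total_volume = 0.0
    for row in rows:
        price = float(row["matched_price"])
        volume = float(row["matched_volume"])
        total_value += price * volume
        total_volume += volume
    if total_volume == 0:
        raise ValueError("no traded volume in input")
    return total_value / total_volume


# Made-up sample standing in for an exported file such as fpt_ticks.csv.
sample = io.StringIO("matched_price,matched_volume\n60.0,100\n61.0,300\n")
result = vwap(csv.DictReader(sample))
print(f"VWAP: {result:.2f}")
```

To run it on a real export, replace the `io.StringIO` sample with `open("fpt_ticks.csv", newline="")`.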
Access market data through natural language using Claude Desktop, Gemini CLI, or other MCP-compatible LLMs.
Model Context Protocol (MCP) enables LLMs to access external data sources through a standardized interface. Instead of writing code, you query data using natural language.
1. Start MCP Server:
python -m plutus.mcp
2. Configure Your Client:
Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "plutus-datahub": {
      "command": "python",
      "args": ["-m", "plutus.mcp"],
      "env": {
        "HERMES_DATA_ROOT": "/absolute/path/to/dataset"
      }
    }
  }
}
Restart Claude Desktop.
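If the config file already lists other MCP servers, the Plutus entry should be merged into it rather than pasted over the whole file. The following is a small sketch of that merge using only the `json` module. The function name and the `other-server` entry are hypothetical; the `plutus-datahub` entry mirrors the JSON shown above.

```python
import json


def add_plutus_server(config, data_root):
    """Return a copy of a Claude Desktop config with the plutus-datahub entry set."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = updated.setdefault("mcpServers", {})
    servers["plutus-datahub"] = {
        "command": "python",
        "args": ["-m", "plutus.mcp"],
        "env": {"HERMES_DATA_ROOT": data_root},
    }
    return updated


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
merged = add_plutus_server(existing, "/absolute/path/to/dataset")
print(json.dumps(merged, indent=2))
```

In practice you would load the existing file with `json.load`, pass the result through the function, and write it back with `json.dump` after making a backup.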
Claude Code (VS Code)
claude mcp add --transport stdio plutus-datahub python -- -m plutus.mcp
Edit ~/.claude.json to add HERMES_DATA_ROOT.
Gemini CLI (Google)
Install and configure:
npm install -g @google/gemini-cli@latest
gemini auth login
gemini mcp add plutus-datahub python -m plutus.mcp \
-e HERMES_DATA_ROOT=/absolute/path/to/dataset \
--description "Vietnamese market data access"
Test:
gemini
> @plutus-datahub Get FPT's daily OHLC for January 15, 2021
3. Query with Natural Language:
Try these queries in your MCP client:
- Basic Data: "Get FPT's daily OHLC data for January 2021"
- Intraday Analysis: "Show me VIC's 5-minute OHLC bars on Jan 15, 2021 with volume"
- Tick Data: "Get HPG's matched price and volume from 9am to 10am on Jan 15"
- Comparison: "Compare FPT and VIC performance for Q1 2021"
- Technical Analysis: "Calculate RSI and MACD for HPG in January 2021"
- Anomaly Detection: "Find unusual volume spikes for FPT in 2021"
- 4 Tools: query_tick_data, query_ohlc_data, get_available_fields, get_query_statistics
- 4 Resources: Dataset metadata, ticker list, field descriptions, OHLC intervals
- 5 Prompts: Daily trends, volume analysis, ticker comparison, anomaly detection, technical indicators
- Claude Desktop (macOS, Windows)
- Claude Code (VS Code extension)
- Gemini CLI (Terminal, all platforms)
- Custom MCP Clients (Python/TypeScript SDK)
MCP Documentation:
- Quick Start Guide - 5-minute setup
- Client Setup - Detailed configuration for all clients
- Tools Reference - Complete API documentation
- Usage Examples - Real-world query examples
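A request such as the "Find unusual volume spikes" example query ultimately reduces to simple statistics over the returned bars. As a rough illustration of one possible approach (not the method the MCP prompts actually use, which this README does not describe), the sketch below flags bars whose volume z-score exceeds a threshold. The `volume` key and the sample values are assumptions made for the example.

```python
from statistics import mean, pstdev


def volume_spikes(bars, threshold=2.0):
    """Return bars whose volume lies more than `threshold` standard deviations above the mean."""
    volumes = [bar["volume"] for bar in bars]
    if len(volumes) < 2:
        return []
    average = mean(volumes)
    spread = pstdev(volumes)  # population standard deviation
    if spread == 0:
        return []  # all volumes identical, nothing stands out
    return [bar for bar in bars if (bar["volume"] - average) / spread > threshold]


# Made-up daily bars for illustration only (not real market data).
sample_bars = [
    {"bar_time": "day-1", "volume": 100},
    {"bar_time": "day-2", "volume": 110},
    {"bar_time": "day-3", "volume": 90},
    {"bar_time": "day-4", "volume": 105},
    {"bar_time": "day-5", "volume": 95},
    {"bar_time": "day-6", "volume": 100},
    {"bar_time": "day-7", "volume": 1000},
]

spikes = volume_spikes(sample_bars)
for bar in spikes:
    print(bar["bar_time"], bar["volume"])
```

A global z-score is the simplest choice. A rolling window or a median-based measure would be more robust on long series, because a single large outlier inflates both the mean and the standard deviation.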
Plutus requires the hermes-offline-market-data-pre-2023 dataset (~21GB):
- Coverage: 2021-2022 (2 years)
- Exchanges: HSX, HNX, UPCOM
- Data Types: Tick-level intraday + daily aggregations
- Format: CSV files (optionally convert to Parquet for 10-100x faster queries)
Contact ALGOTRADE for dataset access.
Out of the box, Plutus queries CSV files directly (zero setup). For production use:
# Convert to Parquet (10-100x faster, 60% smaller)
python -m plutus.datahub.cli_optimize optimize --data-root /path/to/dataset
Benefits:
- 10-100x faster queries
- 60% smaller storage footprint
- Metadata caching for instant field lookups
See the Performance Guide for Parquet conversion and tuning.
- Python: 3.12 or higher
- Dataset: hermes-offline-market-data-pre-2023 (21GB)
- Dependencies: Automatically installed via pip
- DuckDB (query engine)
- PyArrow (Parquet support)
- FastMCP (MCP server)
- Others (see pyproject.toml)
- Version: 1.0.0 (October 2025)
- Tests: 205/205 passing
- Production Ready: DataHub + MCP Server
Current Features:
- DataHub (Python API + CLI)
- MCP Server (Claude Desktop, Gemini CLI, custom clients)
- Performance optimization (Parquet, metadata cache)
- Trading algorithms (framework in development)
Plutus follows the ALGOTRADE 9-step algorithmic trading process:
1. Define trading hypothesis
2. Data collection (DataHub provides this layer)
3. Data exploration
4. Signal detection
5. Portfolio management
6. Risk management
7. Backtesting
8. Optimization
9. Live trading
The DataHub module (production-ready) handles step 2 with three interfaces:
- Python API for programmatic access
- CLI for data export and batch processing
- MCP Server for LLM integration
Other modules are under development.
- CLI Usage Guide - Command-line examples and workflows
- Performance Optimization - Parquet conversion and tuning
- Python Examples - Ready-to-run Python scripts
- Quick Start - 5-minute setup for Claude/Gemini
- Client Setup - Detailed configuration guide
- Tools Reference - Complete API documentation
- Usage Examples - Query patterns and workflows
- Setup Scripts - Server setup and integration
Error: Dataset not found at: /path/to/dataset
Solution: Set HERMES_DATA_ROOT environment variable or edit config.cfg
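When diagnosing this error, it helps to check which path would actually be picked up. The sketch below is a hypothetical helper, not part of Plutus. It assumes the environment variable takes precedence over the config file and that config.cfg is an INI-style file with section headers containing a PLUTUS_DATA_ROOT key. Neither the precedence nor the file layout is specified in this README.

```python
import configparser
import os
from pathlib import Path


def resolve_data_root(environ=os.environ, config_path="config.cfg"):
    """Return the dataset path from HERMES_DATA_ROOT, else from config.cfg, else None."""
    value = environ.get("HERMES_DATA_ROOT")
    if value:
        return Path(value)
    # interpolation=None keeps '%' characters in paths from being misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)  # a missing file is silently skipped
    for name in parser:  # iterates DEFAULT plus every named section
        section = parser[name]
        if "PLUTUS_DATA_ROOT" in section:  # option lookup is case-insensitive
            return Path(section["PLUTUS_DATA_ROOT"])
    return None


root = resolve_data_root()
if root is None:
    print("No dataset path configured")
elif not root.is_dir():
    print(f"Configured path does not exist: {root}")
else:
    print(f"Dataset root: {root}")
```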
ModuleNotFoundError: No module named 'plutus'
Solution: Install in development mode: pip install -e .
Problem: Queries are slow
Solution: Convert data to Parquet format (see Performance Guide)
Problem: MCP client issues
Solution: See MCP Quick Start for client-specific troubleshooting
This is a research project. For questions or collaboration:
- GitHub Issues: https://github.com/algotradevn/plutus/issues
- Email: andan@algotrade.vn
MIT License - See LICENSE file for details.
Dan (andan@algotrade.vn) ALGOTRADE - Algorithmic Trading Education & Research
Built on the ALGOTRADE 9-step methodology for systematic algorithmic trading development.