An intelligent data analytics agent powered by Google Gemini and the ReAct (Reasoning + Acting) framework.
- 🧠 ReAct Reasoning: Systematic think-act-observe loops for robust analysis
- 💾 Dual Memory System: Short-term conversation + long-term dataset storage
- 🛠️ 5 Powerful Tools: Load, inspect, analyze, and visualize data
- 🔄 Auto Retry Logic: Handles rate limits with exponential backoff
- 📊 LLM-Generated Code: Gemini creates pandas/matplotlib code on the fly
- 🎨 Interactive UI: Chat-like interface in Google Colab
- 🔒 Safe Execution: Sandboxed code runner prevents system access
- Click the "Open in Colab" badge above
- Add your Gemini API key
- Run all cells
- Start analyzing data!
git clone https://github.com/YOUR_USERNAME/react-data-agent.git
cd react-data-agent
pip install -r requirements.txt

from src.agent import DataAnalyticsAgent
import google.generativeai as genai
# Initialize
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")
agent = DataAnalyticsAgent(model)
# Analyze
result = agent.run("Load insurance.csv and show me average premium by region")
print(result["answer"])

from src.interface import setup_datasets_with_ui, ColabAgentInterface
# Load data with UI
setup_datasets_with_ui()
# Start chat interface
interface = ColabAgentInterface(agent)
interface.display()

User Query
     ↓
┌─────────────────────┐
│ DataAnalyticsAgent  │
│ (ReAct Loop)        │
└─────────────────────┘
     ↓
┌─────────────────────┐
│ Reasoning (LLM)     │ ← Google Gemini
│ • Think             │
│ • Plan              │
│ • Decide            │
└─────────────────────┘
     ↓
┌─────────────────────┐
│ Tool Selection      │
│ • load_csv          │
│ • analyze           │
│ • visualize         │
│ • inspect           │
└─────────────────────┘
     ↓
┌─────────────────────┐
│ Code Executor       │ ← Safe sandbox
│ • Run pandas        │
│ • Generate plots    │
└─────────────────────┘
     ↓
Result → Memory → Next Iteration
- "Load sales.csv and show me total revenue by product"
- "What are the top 5 customers by purchase frequency?"
- "Create a bar chart of monthly sales trends"
- "Which regions have the highest average order value?"
- "Show me correlation between age and insurance premium"
The agent uses the ReAct (Reasoning + Acting) pattern:
- THINK: "User wants sales trends. I need to check if data is loaded."
- ACT: Execute the `list_datasets()` tool
- OBSERVE: "No datasets loaded yet"
- THINK: "I need to load the data first"
- ACT: Execute `load_csv(filepath="sales.csv")`
- OBSERVE: "Data loaded successfully with 1000 rows"
- THINK: "Now I can analyze trends"
- ACT: Execute `analyze(query="calculate monthly sales")`
- OBSERVE: "Analysis complete: [results]"
- THINK: "I have the answer"
- ACT: `DONE` with final answer
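The THINK/ACT/OBSERVE trace above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the project's implementation: `react_loop`, `scripted_llm`, and the tool stubs are hypothetical names, and the scripted decision function stands in for a real Gemini call.

```python
from typing import Callable

# Hypothetical tool stubs; real tools would load and analyze actual data.
datasets: dict[str, int] = {}

def list_datasets() -> str:
    return ", ".join(datasets) if datasets else "No datasets loaded yet"

def load_csv(filepath: str) -> str:
    datasets[filepath] = 1000  # pretend the file has 1000 rows
    return f"Data loaded successfully with {datasets[filepath]} rows"

def analyze(query: str) -> str:
    return f"Analysis complete: [results for '{query}']"

TOOLS: dict[str, Callable[..., str]] = {
    "list_datasets": list_datasets,
    "load_csv": load_csv,
    "analyze": analyze,
}

def scripted_llm(history: list[str]) -> tuple[str, dict]:
    """Stand-in for the LLM: choose the next action from the latest observation."""
    if not history:
        return "list_datasets", {}
    if "No datasets" in history[-1]:
        return "load_csv", {"filepath": "sales.csv"}
    if "Data loaded" in history[-1]:
        return "analyze", {"query": "calculate monthly sales"}
    return "DONE", {"answer": history[-1]}

def react_loop(decide, max_iterations: int = 20) -> dict:
    history: list[str] = []
    for step in range(1, max_iterations + 1):
        action, args = decide(history)        # THINK: pick the next action
        if action == "DONE":
            return {"answer": args["answer"], "iterations": step}
        observation = TOOLS[action](**args)   # ACT: run the chosen tool
        history.append(observation)           # OBSERVE: record the result
    return {"answer": None, "iterations": max_iterations}

result = react_loop(scripted_llm)
print(result["answer"])
```

The `max_iterations` cap mirrors the agent's configuration option of the same name and keeps a confused model from looping forever.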
- Orchestrates ReAct loop
- Manages iterations and memory
- Handles retry logic
- LoadCSVTool: Load datasets into memory
- ListDatasetsTool: View available data
- InspectDatasetTool: Examine structure
- AnalyzeTool: Generate pandas code
- VisualizeTool: Create charts
- Short-term: Recent conversation (50 turns)
- Long-term: Loaded datasets (persistent)
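One way to picture the two memory tiers is a bounded queue for conversation turns next to a plain dictionary for datasets. The sketch below is an assumption about the design, not the project's code: `AgentMemory` and its methods are hypothetical, and "persistent" is taken to mean "kept for the life of the session".

```python
from collections import deque

class AgentMemory:
    """Hypothetical two-tier memory: bounded chat history plus a dataset store."""

    def __init__(self, max_turns: int = 50):
        # deque(maxlen=...) drops the oldest turn once the cap is reached
        self.short_term = deque(maxlen=max_turns)
        # datasets stay until explicitly removed
        self.long_term = {}

    def add_turn(self, role: str, content: str) -> None:
        self.short_term.append({"role": role, "content": content})

    def store_dataset(self, name: str, data) -> None:
        self.long_term[name] = data

    def recent(self, n: int = 5) -> list:
        return list(self.short_term)[-n:]

memory = AgentMemory(max_turns=50)
memory.store_dataset("sales", [{"product": "A", "revenue": 10}])
for i in range(60):
    memory.add_turn("user", f"message {i}")

print(len(memory.short_term))           # 50
print(memory.short_term[0]["content"])  # message 10
print(list(memory.long_term))           # ['sales']
```

After 60 turns only the latest 50 remain, while the stored dataset is untouched.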
- Safe code execution
- Sandboxed environment
- Prevents file system access
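As a rough illustration of restricted execution, generated code can be run with a whitelisted set of builtins so that names such as `open` and `__import__` are simply unavailable. The helper `run_restricted` and the `SAFE_BUILTINS` table are hypothetical, and the project's executor may work differently. Restricting builtins alone is not a full security boundary, so untrusted code generally also needs process-level or container-level isolation.

```python
# Whitelist of harmless builtins; open, __import__, eval, and exec are absent.
SAFE_BUILTINS = {
    "len": len, "sum": sum, "min": min, "max": max,
    "range": range, "sorted": sorted, "round": round, "print": print,
}

def run_restricted(code: str, variables: dict) -> dict:
    """Run generated code in a reduced namespace and report the outcome."""
    namespace = {"__builtins__": SAFE_BUILTINS, **variables}
    try:
        exec(code, namespace)
    except Exception as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    # By convention in this sketch, the code leaves its output in `result`.
    return {"ok": True, "result": namespace.get("result")}

ok = run_restricted("result = sum(values) / len(values)", {"values": [10, 20, 30]})
blocked = run_restricted("result = open('/etc/passwd').read()", {})

print(ok)       # {'ok': True, 'result': 20.0}
print(blocked)  # ok is False; the error is a NameError because open is missing
```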
agent = DataAnalyticsAgent(
    gemini_model=model,
    max_iterations=20,  # Max reasoning steps
    verbose=True,       # Print reasoning
    retry_delay=2.0,    # Rate limit backoff
    max_retries=3       # Max retry attempts
)

- Average query time: 10-30 seconds
- Typical iterations: 3-8 steps
- Rate limit handling: Automatic retry with backoff
- Memory usage: ~50MB per 1M row dataset
The agent automatically handles rate limits with exponential backoff. If you still encounter issues:
- Increase `retry_delay` (default: 2.0s)
- Increase `max_retries` (default: 3)
- Wait 60 seconds between complex queries
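To see how `retry_delay` and `max_retries` interact, here is a minimal backoff sketch. It assumes the delay doubles after each failed attempt and that `max_retries` counts retries after the first call; the agent's actual schedule may differ. `RateLimitError` and `call_with_backoff` are hypothetical names, not part of the Gemini SDK.

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the API's rate-limit exception."""

def call_with_backoff(fn, max_retries=3, retry_delay=2.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait retry_delay * 2**attempt, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            sleep(retry_delay * 2 ** attempt)

# Demo: fail twice, then succeed; record the waits instead of sleeping.
calls = {"n": 0}
waits = []

def flaky():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimitError("429")
    return "ok"

outcome = call_with_backoff(flaky, max_retries=3, retry_delay=2.0, sleep=waits.append)
print(outcome, waits)  # ok [2.0, 4.0]
```

Under these assumptions the defaults give waits of 2, 4, and 8 seconds before giving up, so raising `retry_delay` lengthens every wait while raising `max_retries` adds further, longer waits.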
Always load data first:
agent.run("Load mydata.csv as 'data'")
agent.run("Now analyze the data") # Works!

If the LLM generates incorrect code:
- Use `inspect_dataset` to show column names
- Be specific in your query
- Try rephrasing the question
This project is licensed under the MIT License; see the LICENSE file.
- ReAct Paper by Yao et al.
- Google Gemini API
- Anthropic for Claude (inspiration)