An intelligent data analytics agent powered by Google Gemini and the ReAct (Reasoning + Acting) framework
- ReAct Reasoning: Systematic think-act-observe loops for robust analysis
- Dual Memory System: Short-term conversation + long-term dataset storage
- 5 Powerful Tools: Load, list, inspect, analyze, and visualize data
- Auto Retry Logic: Handles rate limits with exponential backoff
- LLM-Generated Code: Gemini creates pandas/matplotlib code on the fly
- Interactive UI: Chat-like interface in Google Colab
- Safe Execution: Sandboxed code runner prevents system access
- Click the "Open in Colab" badge above
- Add your Gemini API key
- Run all cells
- Start analyzing data!
```bash
git clone https://github.com/YOUR_USERNAME/react-data-agent.git
cd react-data-agent
pip install -r requirements.txt
```

```python
import google.generativeai as genai
from src.agent import DataAnalyticsAgent

# Initialize the Gemini model and the agent
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")
agent = DataAnalyticsAgent(model)

# Analyze: run() returns a dict with the final answer under "answer"
result = agent.run("Load insurance.csv and show me average premium by region")
print(result["answer"])
```

```python
from src.interface import setup_datasets_with_ui, ColabAgentInterface

# Load data with the upload UI
setup_datasets_with_ui()

# Start the chat interface (reuses `agent` from the previous example)
interface = ColabAgentInterface(agent)
interface.display()
```

```text
User Query
    ▼
┌─────────────────────┐
│ DataAnalyticsAgent  │
│    (ReAct Loop)     │
└─────────────────────┘
    ▼
┌─────────────────────┐
│ Reasoning (LLM)     │ ← Google Gemini
│  • Think            │
│  • Plan             │
│  • Decide           │
└─────────────────────┘
    ▼
┌─────────────────────┐
│ Tool Selection      │
│  • load_csv         │
│  • analyze          │
│  • visualize        │
│  • inspect          │
└─────────────────────┘
    ▼
┌─────────────────────┐
│ Code Executor       │ ← Safe sandbox
│  • Run pandas       │
│  • Generate plots   │
└─────────────────────┘
    ▼
Result → Memory → Next Iteration
```
- "Load sales.csv and show me total revenue by product"
- "What are the top 5 customers by purchase frequency?"
- "Create a bar chart of monthly sales trends"
- "Which regions have the highest average order value?"
- "Show me correlation between age and insurance premium"
The agent uses the ReAct (Reasoning + Acting) pattern:
- THINK: "User wants sales trends. I need to check if data is loaded."
- ACT: Execute the `list_datasets()` tool
- OBSERVE: "No datasets loaded yet"
- THINK: "I need to load the data first"
- ACT: Execute `load_csv(filepath="sales.csv")`
- OBSERVE: "Data loaded successfully with 1000 rows"
- THINK: "Now I can analyze trends"
- ACT: Execute `analyze(query="calculate monthly sales")`
- OBSERVE: "Analysis complete: [results]"
- THINK: "I have the answer"
- ACT: `DONE` with the final answer
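The think-act-observe trace above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the code in `src/agent`: the `decide` callable stands in for the Gemini call that turns the history into the next step, and `Step`, `react_loop`, and the two toy tools are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    thought: str                 # THINK: the model's reasoning for this step
    action: str                  # tool name to call, or "DONE" to finish
    args: dict = field(default_factory=dict)

def react_loop(decide: Callable[[list], Step],
               tools: dict,
               max_iterations: int = 20) -> tuple:
    """Repeat think -> act -> observe until `decide` returns a DONE step."""
    history = []  # (step, observation) pairs fed back to the model
    for _ in range(max_iterations):
        step = decide(history)                       # THINK: choose the next action
        if step.action == "DONE":
            return step.args.get("answer", ""), history
        tool = tools.get(step.action)
        observation = (tool(**step.args) if tool    # ACT: run the chosen tool
                       else f"Unknown tool: {step.action}")
        history.append((step, observation))          # OBSERVE: record the result
    return "Stopped: max_iterations reached", history

# Hypothetical toy tools and a scripted "model" that replays the trace above
datasets = {}

def list_datasets() -> str:
    return ", ".join(datasets) or "No datasets loaded yet"

def load_csv(filepath: str) -> str:
    datasets[filepath] = [None] * 1000  # placeholder rows instead of a real CSV
    return f"Data loaded successfully with {len(datasets[filepath])} rows"

script = iter([
    Step("Check whether data is loaded", "list_datasets"),
    Step("I need to load the data first", "load_csv", {"filepath": "sales.csv"}),
    Step("I have the answer", "DONE", {"answer": "sales.csv is loaded"}),
])
answer, trace = react_loop(lambda history: next(script),
                           {"list_datasets": list_datasets, "load_csv": load_csv})
print(answer)  # sales.csv is loaded
```

Returning the history alongside the answer keeps every observation available for logging (what `verbose=True` presumably prints) and for the short-term memory.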
`DataAnalyticsAgent`:
- Orchestrates the ReAct loop
- Manages iterations and memory
- Handles retry logic
Tools:
- LoadCSVTool: Load datasets into memory
- ListDatasetsTool: View available data
- InspectDatasetTool: Examine structure
- AnalyzeTool: Generate pandas code
- VisualizeTool: Create charts
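One way such tools can share an interface is a name, a description for the prompt, and a `run()` method returning a text observation. The sketch below assumes that shape; the `Tool` base class and `describe_tools` helper are hypothetical, only two of the five tools are shown, and datasets are plain dicts of lists rather than pandas DataFrames so the example stays dependency-free.

```python
from abc import ABC, abstractmethod

class Tool(ABC):
    """Hypothetical common interface: prompt metadata plus a run() method."""
    name: str = ""
    description: str = ""

    @abstractmethod
    def run(self, **kwargs) -> str:
        """Execute the tool and return a text observation for the LLM."""

class ListDatasetsTool(Tool):
    name = "list_datasets"
    description = "List the names of datasets currently in memory."

    def __init__(self, store: dict):
        self.store = store  # shared long-term memory: dataset name -> columns

    def run(self, **kwargs) -> str:
        return ", ".join(sorted(self.store)) or "No datasets loaded yet"

class InspectDatasetTool(Tool):
    name = "inspect_dataset"
    description = "Show the column names and row count of a dataset."

    def __init__(self, store: dict):
        self.store = store

    def run(self, name: str = "", **kwargs) -> str:
        table = self.store.get(name)
        if table is None:
            return f"Dataset '{name}' not found"
        rows = len(next(iter(table.values()), []))  # length of the first column
        return f"{name}: {rows} rows, columns={list(table)}"

def describe_tools(tools: list) -> str:
    """Render the tool catalogue that would be embedded in the LLM prompt."""
    return "\n".join(f"- {t.name}: {t.description}" for t in tools)

store = {"sales": {"region": ["N", "S"], "revenue": [10.0, 5.0]}}
tools = [ListDatasetsTool(store), InspectDatasetTool(store)]
registry = {t.name: t for t in tools}  # ACT steps are dispatched by tool name
print(registry["inspect_dataset"].run(name="sales"))
```

Returning strings (including "not found" messages) instead of raising lets the model read the failure as an observation and correct itself on the next iteration.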
Memory:
- Short-term: Recent conversation (last 50 turns)
- Long-term: Loaded datasets (persistent)
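A dual memory like this maps naturally onto a bounded `collections.deque` for conversation turns and a dict for datasets. The `AgentMemory` class below is a hypothetical sketch of that idea, assuming a 50-turn limit and session-lifetime dataset storage; it is not taken from the project source.

```python
from collections import deque

class AgentMemory:
    """Hypothetical dual memory: bounded chat history plus a dataset store."""

    def __init__(self, max_turns: int = 50):
        # Short-term: the deque drops the oldest turn once max_turns is reached
        self.short_term: deque = deque(maxlen=max_turns)
        # Long-term: datasets stay until explicitly removed (for the session)
        self.long_term: dict = {}

    def add_turn(self, role: str, content: str) -> None:
        self.short_term.append((role, content))

    def store_dataset(self, name: str, data) -> None:
        self.long_term[name] = data

    def context(self, last_n: int = 10) -> str:
        """Build a prompt snippet: dataset names plus the last few turns."""
        names = ", ".join(self.long_term) or "none"
        turns = list(self.short_term)[-last_n:]
        lines = [f"{role}: {content}" for role, content in turns]
        return "\n".join([f"Loaded datasets: {names}", *lines])

memory = AgentMemory()
for i in range(60):
    memory.add_turn("user", f"message {i}")
memory.store_dataset("sales", {"revenue": [10.0, 5.0]})
print(len(memory.short_term))  # 50: turns 0-9 were evicted
```

Keeping datasets out of the chat history means old turns can be evicted without forcing the user to reload data.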
Code executor:
- Safe code execution
- Sandboxed environment
- Prevents file system access
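A common first layer for running generated code is to reject risky syntax with the `ast` module and execute with a whitelist of builtins, so `open`, `__import__`, and `eval` are simply unavailable. The sketch below assumes that approach and that generated code stores its output in a variable named `result`; the helper names are hypothetical. Note that restrictions at the Python level can be bypassed and are not a real security boundary; truly untrusted code needs OS-level isolation such as a separate process or container.

```python
import ast

# Whitelisted builtins; anything not listed (open, __import__, eval, ...) is unavailable
SAFE_BUILTINS = {
    "len": len, "sum": sum, "min": min, "max": max, "abs": abs,
    "round": round, "sorted": sorted, "range": range,
    "list": list, "dict": dict, "set": set, "zip": zip,
}

def check_code(code: str) -> None:
    """Reject imports and dunder attribute access before execution."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError(f"access to {node.attr} is not allowed")

def run_generated_code(code: str, variables: dict):
    """Run generated code with limited builtins and return its `result` variable."""
    check_code(code)
    namespace = {"__builtins__": SAFE_BUILTINS, **variables}
    exec(compile(code, "<generated>", "exec"), namespace)
    return namespace.get("result")

sales = [{"region": "N", "revenue": 10.0}, {"region": "S", "revenue": 5.0}]
print(run_generated_code("result = sum(r['revenue'] for r in sales)", {"sales": sales}))  # 15.0
```

In a pandas-based agent, the preloaded variables would be the DataFrames from long-term memory instead of plain lists.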
```python
agent = DataAnalyticsAgent(
    gemini_model=model,
    max_iterations=20,   # Max reasoning steps per query
    verbose=True,        # Print the reasoning trace
    retry_delay=2.0,     # Rate-limit backoff delay, in seconds
    max_retries=3,       # Max retry attempts
)
```

- Average query time: 10-30 seconds
- Typical iterations: 3-8 steps
- Rate limit handling: Automatic retry with backoff
- Memory usage: ~50MB per 1M-row dataset (varies with column count and dtypes)
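As a rough sanity check of the ~50MB figure (our own illustration, not how it was measured): fixed-width 64-bit numeric columns take 8 bytes per value, so 1M rows with six such columns need about 48 MB before index and object overhead. The helper below is hypothetical; for a real DataFrame, pandas' `DataFrame.memory_usage(deep=True)` reports the actual footprint, including string columns.

```python
def numeric_footprint_mb(rows: int, columns: int, bytes_per_value: int = 8) -> float:
    """Estimate raw storage for fixed-width numeric columns (e.g. float64, int64).

    Ignores the index and per-object overhead; string (object) columns
    usually cost far more than 8 bytes per value.
    """
    return rows * columns * bytes_per_value / 1_000_000  # decimal megabytes

print(numeric_footprint_mb(1_000_000, 6))  # 48.0
```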
The agent automatically handles rate limits with exponential backoff. If you still encounter issues:
- Increase `retry_delay` (default: 2.0 s)
- Increase `max_retries` (default: 3)
- Wait 60 seconds between complex queries
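Exponential backoff doubles the wait after each failed attempt, so raising `retry_delay` scales every wait and raising `max_retries` adds progressively longer ones. The sketch below assumes `max_retries` counts retries after the first call and adds a small random jitter; `RateLimitError` is a hypothetical stand-in for whatever rate-limit exception the API client actually raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimitError(Exception):
    """Hypothetical stand-in for the API client's rate-limit (HTTP 429) error."""

def call_with_backoff(fn: Callable[[], T], retry_delay: float = 2.0,
                      max_retries: int = 3, jitter: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing waits."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error
            delay = retry_delay * 2 ** attempt  # 2s, 4s, 8s with the defaults
            sleep(delay * (1 + random.uniform(0, jitter)))  # jitter spreads out retries

# Usage with a fake call that is rate-limited twice, then succeeds
recorded: list = []
outcomes = iter([RateLimitError(), RateLimitError(), "ok"])

def flaky_call() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome

result = call_with_backoff(flaky_call, jitter=0.0, sleep=recorded.append)
print(result, recorded)  # ok [2.0, 4.0]
```

Injecting `sleep` keeps the function testable without real waiting; with the defaults, the total worst-case wait before giving up is 2 + 4 + 8 = 14 seconds.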
Always load data first:

```python
agent.run("Load mydata.csv as 'data'")
agent.run("Now analyze the data")  # Works!
```

If the LLM generates incorrect code:
- Use `inspect_dataset` to show column names
- Be specific in your query
- Try rephrasing the question
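Wrong column names are a frequent cause of incorrect generated code. A cheap heuristic is to parse the code and compare string subscripts such as `df["premium"]` against the real columns before running it. The helpers below are hypothetical and assume Python 3.9+ (where `ast.Subscript.slice` is the key expression itself); they miss names passed as function arguments (e.g. `groupby("region")`) and may also flag ordinary dict keys, so treat a hit as a prompt to check `inspect_dataset`, not as proof of a bug.

```python
import ast

def referenced_columns(code: str) -> set:
    """Collect string keys used in subscripts, e.g. df["premium"] or df[["a", "b"]]."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Subscript):
            continue
        key = node.slice  # Python 3.9+: the key expression itself
        keys = key.elts if isinstance(key, (ast.List, ast.Tuple)) else [key]
        for k in keys:
            if isinstance(k, ast.Constant) and isinstance(k.value, str):
                found.add(k.value)
    return found

def unknown_columns(code: str, columns) -> list:
    """Return referenced names that are not real columns, sorted for stable output."""
    return sorted(referenced_columns(code) - set(columns))

generated = 'result = df.groupby("region")["premium"].mean()'
print(unknown_columns(generated, ["age", "region", "charges"]))  # ['premium']
```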
This project is licensed under the MIT License - see LICENSE file.
- ReAct Paper by Yao et al.
- Google Gemini API
- Anthropic for Claude (inspiration)