maticly/DataAnalyticsAgent

🤖 ReAct Data Analytics Agent


An intelligent data analytics agent powered by Google Gemini and the ReAct (Reasoning + Acting) framework

Open In Colab | Python 3.8+


✨ Features

  • 🧠 ReAct Reasoning: Systematic think-act-observe loops for robust analysis
  • 💾 Dual Memory System: Short-term conversation + long-term dataset storage
  • 🛠️ 5 Powerful Tools: Load, list, inspect, analyze, and visualize data
  • 🔄 Auto Retry Logic: Handles rate limits with exponential backoff
  • 📊 LLM-Generated Code: Gemini creates pandas/matplotlib code on the fly
  • 🎨 Interactive UI: Chat-like interface in Google Colab
  • 🔒 Safe Execution: Sandboxed code runner prevents system access

🚀 Quick Start

Option 1: Google Colab (Recommended)

  1. Click the "Open in Colab" badge above
  2. Add your Gemini API key
  3. Run all cells
  4. Start analyzing data!

Option 2: Local Installation

git clone https://github.com/YOUR_USERNAME/react-data-agent.git
cd react-data-agent
pip install -r requirements.txt

💡 Usage Examples

Load and Analyze Data

from src.agent import DataAnalyticsAgent
import google.generativeai as genai

# Initialize
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")
agent = DataAnalyticsAgent(model)

# Analyze
result = agent.run("Load insurance.csv and show me average premium by region")
print(result["answer"])

Interactive Interface

from src.interface import setup_datasets_with_ui, ColabAgentInterface

# Load data with UI
setup_datasets_with_ui()

# Start chat interface
interface = ColabAgentInterface(agent)
interface.display()

🏗️ Architecture

User Query
    ↓
┌─────────────────────┐
│ DataAnalyticsAgent  │
│ (ReAct Loop)        │
└─────────────────────┘
    ↓
┌─────────────────────┐
│ Reasoning (LLM)     │ ← Google Gemini
│ • Think             │
│ • Plan              │
│ • Decide            │
└─────────────────────┘
    ↓
┌─────────────────────┐
│ Tool Selection      │
│ • load_csv          │
│ • analyze           │
│ • visualize         │
│ • inspect           │
└─────────────────────┘
    ↓
┌─────────────────────┐
│ Code Executor       │ ← Safe sandbox
│ • Run pandas        │
│ • Generate plots    │
└─────────────────────┘
    ↓
Result → Memory → Next Iteration

📊 Example Queries

  • "Load sales.csv and show me total revenue by product"
  • "What are the top 5 customers by purchase frequency?"
  • "Create a bar chart of monthly sales trends"
  • "Which regions have the highest average order value?"
  • "Show me correlation between age and insurance premium"

🧠 How It Works

The agent uses the ReAct (Reasoning + Acting) pattern:

  1. THINK: "User wants sales trends. I need to check if data is loaded."
  2. ACT: Execute list_datasets() tool
  3. OBSERVE: "No datasets loaded yet"
  4. THINK: "I need to load the data first"
  5. ACT: Execute load_csv(filepath="sales.csv")
  6. OBSERVE: "Data loaded successfully with 1000 rows"
  7. THINK: "Now I can analyze trends"
  8. ACT: Execute analyze(query="calculate monthly sales")
  9. OBSERVE: "Analysis complete: [results]"
  10. THINK: "I have the answer"
  11. ACT: DONE with final answer
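The think-act-observe cycle above can be sketched in a few lines. This is a minimal illustration, not the code in src/agent.py: the scripted_policy function stands in for the Gemini call, the two tool functions are simplified stand-ins that touch no files, and all names here are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the real tools; they only mimic the steps above.
def list_datasets(state: dict) -> str:
    return ", ".join(state) if state else "No datasets loaded yet"

def load_csv(state: dict, filepath: str) -> str:
    state[filepath] = [{"month": "Jan", "sales": 100}]  # fake rows, no file I/O
    return f"Loaded {filepath}"

TOOLS: Dict[str, Callable[..., str]] = {
    "list_datasets": list_datasets,
    "load_csv": load_csv,
}

def scripted_policy(history: List[Tuple[str, str]]) -> Tuple[str, str, dict]:
    """Stand-in for the LLM: returns (thought, action, arguments)."""
    if not history:
        return "Check whether data is loaded.", "list_datasets", {}
    last_action, last_observation = history[-1]
    if last_action == "list_datasets" and "No datasets" in last_observation:
        return "I need to load the data first.", "load_csv", {"filepath": "sales.csv"}
    return "I have what I need.", "DONE", {}

def run_react(policy, max_iterations: int = 20) -> List[Tuple[str, str]]:
    state: dict = {}
    history: List[Tuple[str, str]] = []
    for _ in range(max_iterations):
        thought, action, args = policy(history)      # THINK
        if action == "DONE":
            break
        observation = TOOLS[action](state, **args)   # ACT
        history.append((action, observation))        # OBSERVE
    return history

trace = run_react(scripted_policy)
for action, observation in trace:
    print(f"ACT: {action} -> OBSERVE: {observation}")
```

The max_iterations cap is what keeps a confused policy from looping forever; the real agent exposes the same idea through its max_iterations setting.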

🛠️ Components

Core Agent (src/agent.py)

  • Orchestrates ReAct loop
  • Manages iterations and memory
  • Handles retry logic

Tools (src/tools/)

  • LoadCSVTool: Load datasets into memory
  • ListDatasetsTool: View available data
  • InspectDatasetTool: Examine structure
  • AnalyzeTool: Generate pandas code
  • VisualizeTool: Create charts
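One plausible way to structure such tools is a small common interface plus a name-based registry that the agent looks actions up in. The sketch below is an assumption about the shape, not the contents of src/tools/: the Tool base class, its run method, and describe_tools are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable

class Tool(ABC):
    """Hypothetical base class: every tool has a name, a description, and run()."""
    name: str
    description: str

    @abstractmethod
    def run(self, **kwargs) -> str:
        ...

class ListDatasetsTool(Tool):
    name = "list_datasets"
    description = "View available data"

    def __init__(self, datasets: Dict[str, list]):
        self.datasets = datasets  # shared store of loaded datasets

    def run(self, **kwargs) -> str:
        if not self.datasets:
            return "No datasets loaded yet"
        return "Available: " + ", ".join(sorted(self.datasets))

def describe_tools(tools: Iterable[Tool]) -> str:
    """Build the tool list that would be shown to the LLM in its prompt."""
    return "\n".join(f"- {t.name}: {t.description}" for t in tools)

datasets: Dict[str, list] = {}
tool = ListDatasetsTool(datasets)
registry = {tool.name: tool}

first = registry["list_datasets"].run()
datasets["sales"] = []
second = registry["list_datasets"].run()
print(describe_tools(registry.values()))
```

Keeping the description next to the implementation means the prompt and the registry cannot drift apart when a tool is added.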

Memory (src/memory.py)

  • Short-term: Recent conversation (50 turns)
  • Long-term: Loaded datasets (persistent)
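The two tiers can be pictured as a bounded queue next to a plain dictionary. The following is a sketch under assumptions: AgentMemory and its method names are hypothetical, and "long-term" here only means the datasets outlive the 50-turn window within one session.

```python
from collections import deque
from typing import Any, Dict, List, Tuple

class AgentMemory:
    """Hypothetical two-tier memory: bounded chat history plus a dataset store."""

    def __init__(self, max_turns: int = 50):
        # deque(maxlen=...) silently drops the oldest turn once it is full.
        self.short_term: deque = deque(maxlen=max_turns)
        self.long_term: Dict[str, Any] = {}

    def add_turn(self, role: str, content: str) -> None:
        self.short_term.append((role, content))

    def store_dataset(self, name: str, data: Any) -> None:
        self.long_term[name] = data

    def recent(self, n: int) -> List[Tuple[str, str]]:
        return list(self.short_term)[-n:]

memory = AgentMemory(max_turns=50)
for i in range(60):
    memory.add_turn("user", f"message {i}")
memory.store_dataset("sales", [{"month": "Jan", "sales": 100}])

print(len(memory.short_term), memory.short_term[0])
```

After 60 turns only the latest 50 remain in short-term memory, while the stored dataset is unaffected.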

Executor (src/executor.py)

  • Safe code execution
  • Sandboxed environment
  • Prevents file system access
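A common lightweight approach to this kind of restriction is to run generated code with a whitelisted set of builtins. The sketch below illustrates that idea only; it is an assumption about the technique, not the code in src/executor.py, and run_restricted and ALLOWED_BUILTINS are hypothetical names. Restricting builtins is not a real security boundary in Python, since determined code can escape through object introspection, so untrusted code still calls for process or container isolation.

```python
import builtins

# Hypothetical whitelist: only harmless builtins are visible to generated code.
ALLOWED_BUILTINS = ("abs", "len", "max", "min", "range", "round", "sorted", "sum")

def run_restricted(code: str, variables: dict) -> dict:
    """Execute code with whitelisted builtins and return the resulting names."""
    safe_builtins = {name: getattr(builtins, name) for name in ALLOWED_BUILTINS}
    namespace = {"__builtins__": safe_builtins, **variables}
    exec(code, namespace)
    namespace.pop("__builtins__")
    return namespace

out = run_restricted("result = sum(values) / len(values)", {"values": [10, 20, 30]})
print(out["result"])

# open() is not in the whitelist, so file access fails with NameError.
try:
    run_restricted("open('secrets.txt')", {})
    blocked = False
except NameError:
    blocked = True
print("file access blocked:", blocked)
```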

🔧 Configuration

agent = DataAnalyticsAgent(
    gemini_model=model,
    max_iterations=20,      # Maximum number of ReAct reasoning steps per query
    verbose=True,           # Print each reasoning step
    retry_delay=2.0,        # Base delay in seconds for rate-limit backoff
    max_retries=3           # Maximum retry attempts after a rate-limit error
)

📈 Performance

  • Average query time: 10-30 seconds
  • Typical iterations: 3-8 steps
  • Rate limit handling: Automatic retry with backoff
  • Memory usage: ~50MB per 1M row dataset

🐛 Troubleshooting

Rate Limit (429) Errors

The agent automatically handles rate limits with exponential backoff. If you still encounter issues:

  • Increase retry_delay (default: 2.0s)
  • Increase max_retries (default: 3)
  • Wait 60 seconds between complex queries
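To see how retry_delay and max_retries interact, here is a minimal sketch of exponential backoff. It is an illustration under assumptions, not the agent's actual retry code: RateLimitError is a hypothetical stand-in for a 429 response, and the sketch assumes the delay doubles after each failed attempt and that max_retries counts retries after the first call.

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for a 429 response from the API."""

def call_with_backoff(fn, retry_delay=2.0, max_retries=3, sleep=time.sleep):
    """Call fn, retrying on RateLimitError with a delay that doubles each time."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error to the caller
            sleep(retry_delay * (2 ** attempt))

# Demo: a call that is rate limited twice and then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return "ok"

waits = []
result = call_with_backoff(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```

Under these assumptions the defaults give waits of 2, 4, and 8 seconds before the error is raised, so raising retry_delay lengthens every wait while raising max_retries adds further, longer waits.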

"Dataset not found" Errors

Always load data first:

agent.run("Load mydata.csv as 'data'")
agent.run("Now analyze the data")  # Works!

Code Generation Issues

If the LLM generates incorrect code:

  • Use inspect_dataset to show column names
  • Be specific in your query
  • Try rephrasing the question

📄 License

This project is licensed under the MIT License; see the LICENSE file.

🙏 Acknowledgments

  • ReAct Paper by Yao et al.
  • Google Gemini API
  • Anthropic for Claude (inspiration)

About

Modular agent for dataset analysis and visualization
