An AI agent that automates web-based information extraction for entities in datasets. This tool uses an LLM to process and structure data from web searches, making information gathering more efficient.
This tool enables users to:
- Upload CSV files or connect Google Sheets for data input
- Select target columns for entity identification
- Configure custom search queries
- Extract structured information using AI
- Export results to CSV or Google Sheets
- Process both single entities and batch data
- CSV file upload with drag-and-drop support
- Google Sheets integration for direct data access
- Real-time data preview and validation
- Column selection for entity identification
- Pre-built query templates for common use cases
- Custom query builder with dynamic placeholders
- Multi-field extraction support
- Batch processing capabilities
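As an illustration of how dynamic placeholders in a query template might be filled from a data row, here is a minimal sketch. The `TEMPLATES` entries, the `{company}` placeholder name, and the `build_query` helper are assumptions made for this example; the project's actual templates and function names may differ.

```python
import re

# Hypothetical templates; the project's real pre-built templates are not shown here.
TEMPLATES = {
    "email": "Find the contact email address of {company}",
    "location": "Where is {company} headquartered?",
}

# Matches placeholders such as {company}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def build_query(template: str, row: dict) -> str:
    """Replace each {placeholder} with the matching value from a data row."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in row]
    if missing:
        raise KeyError(f"No column for placeholder(s): {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)
```

Calling `build_query(TEMPLATES["email"], {"company": "Acme"})` produces one search query per row, so the same template can be reused across a whole batch.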
- Automated web searching via SerpAPI
- AI-powered information parsing using Groq
- Structured data extraction
- Error handling and retry mechanisms
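Structured extraction usually means turning free-form LLM output into fixed fields. The sketch below assumes the model was asked to answer with a JSON object; `parse_extraction` and the field names are hypothetical and only illustrate the idea, not the project's actual parser.

```python
import json


def parse_extraction(raw: str, fields: list[str]) -> dict:
    """Parse the text between the first '{' and the last '}' as JSON
    and keep only the requested fields (missing fields become None)."""
    empty = {field: None for field in fields}
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return empty  # the reply contained no JSON object
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return empty  # malformed JSON is treated as "nothing extracted"
    return {field: data.get(field) for field in fields}
```

Returning `None` for missing fields keeps every result row the same shape, which simplifies later export to CSV or Google Sheets.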
- Interactive results display
- CSV export functionality
- Google Sheets integration with:
  - Public/private access control
  - Real-time updates
  - Formatted output
- Frontend: Streamlit
- Backend: Python 3.9+
- Data Processing: Pandas
- Web Search: SerpAPI
- LLM Integration: Groq API
- Cloud Integration: Google Sheets & Drive APIs
- Python 3.9+
- Google Cloud Platform account (for APIs)
- SerpAPI account
- Groq API account
- Clone the repository:
    git clone https://github.com/yourusername/BreakoutAI_AI_Agent.git
    cd BreakoutAI_AI_Agent
- Create and activate virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables:
    cp .env.example .env
    # Edit .env with your API keys and configuration
Required Environment Variables:
- SERPAPI_KEY: Your SerpAPI key
- GROQ_API_KEY: Your Groq API key
- GOOGLE_CREDENTIALS_FILE: Path to Google credentials JSON
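A quick way to catch a missing key before the app starts is to validate the three variables up front. This is a minimal sketch: `load_settings` is a hypothetical helper, and it only reads variables already present in the process environment (how the `.env` file gets loaded, for example via a dotenv library, is left as an assumption).

```python
import os

REQUIRED_VARS = ("SERPAPI_KEY", "GROQ_API_KEY", "GOOGLE_CREDENTIALS_FILE")


def load_settings(env=None) -> dict:
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.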
    streamlit run app/main.py
- Select "Single Company" mode
- Enter the company name
- Choose or customize search query
- View and export results
- Upload CSV file or connect Google Sheet
- Select entity column
- Configure search parameters
- Monitor processing progress
- Export results
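The batch steps above can be sketched roughly as follows. `process_batch`, its `handler` callback (standing in for the search and extraction step), and the `on_progress` callback are hypothetical names used for illustration; the sketch uses only the standard library rather than the Pandas-based code the app itself relies on.

```python
import csv
import io


def process_batch(csv_text, entity_column, handler, on_progress=None):
    """Run handler(entity) for each row and report progress after every row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if rows and entity_column not in rows[0]:
        raise ValueError(f"Column not found: {entity_column}")
    results = []
    for index, row in enumerate(rows, start=1):
        # Short rows yield None for missing cells, so fall back to "".
        entity = (row[entity_column] or "").strip()
        results.append({"entity": entity, "result": handler(entity)})
        if on_progress:
            on_progress(index, len(rows))
    return results
```

In a Streamlit front end, `on_progress` could update a progress bar, while `handler` would wrap the web search and LLM extraction for one entity.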
- Enable APIs in Google Cloud Console
- Download service account credentials
- Place them in the credentials folder
- Configure environment variables
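Before connecting to Google Sheets, it can help to confirm that the downloaded file really is a service account key. The following sketch is an assumption about how such a check might look; `check_service_account_file` is a hypothetical helper and not part of the project.

```python
import json
from pathlib import Path


def check_service_account_file(path: str) -> str:
    """Return the service account's client_email, or raise if the file looks wrong."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    info = json.loads(file.read_text(encoding="utf-8"))
    # Service account key files carry these two fields.
    if info.get("type") != "service_account" or "client_email" not in info:
        raise ValueError("Not a service account key file")
    return info["client_email"]
```

The returned address is useful in practice: a private sheet generally has to be shared with that service account email before the app can read or write it.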
- CSV and Google Sheets support
- Dynamic query templates
- Web search integration
- LLM-powered extraction
- Results export options
- Single entity processing
- Multi-field extraction
- Rate limiting & error handling
- Progress monitoring
- Data validation
- API keys and credentials are stored in environment variables
- Google credentials are secured in a dedicated directory
- Rate limiting implemented for API calls
- Error handling for failed requests
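Rate limiting for API calls can be as simple as enforcing a minimum gap between requests. The class below is a minimal sketch of that idea, not the project's actual limiter; `MinIntervalLimiter` and its parameters are hypothetical, and the clock and sleep functions are injectable so the behavior can be tested without real waiting.

```python
import time


class MinIntervalLimiter:
    """Ensure at least min_interval seconds pass between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `limiter.wait()` immediately before each search or LLM request keeps a batch run under a provider's request-rate quota, assuming the interval is chosen to match that quota.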
The application includes robust error handling for:
- API rate limits
- Failed searches (with retry mechanisms)
- LLM processing errors
- File upload issues
- Data validation failures
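One common way to handle rate limits and failed searches is to retry with exponential backoff. The sketch below shows that pattern under stated assumptions: `with_retries` is a hypothetical helper, the attempt count and delays are illustrative defaults, and the project's own retry logic may be structured differently.

```python
import time


def with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay, 2x, 4x, ... and try again.

    The exception from the final attempt is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A real implementation would likely catch only the specific exceptions raised by the search or LLM client rather than every `Exception`, so that programming errors surface immediately.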