Simulating Deep Research

This project is a web research tool powered by a Large Language Model (LLM). It enables users to input any research question, then automates web searching, content scraping, and summarization to provide a detailed, comprehensive answer. Built with Python, it integrates APIs such as OpenAI, SerpAPI, and FireCrawl for efficient operation.

Features

- Search: Performs Google searches via SerpAPI using LLM-generated search terms.
- Scrape: Extracts markdown content from websites using FireCrawl.
- Summarize: Generates 500-600 word summaries of scraped content with OpenAI's GPT-4o.
- Iterative Research: Repeats search, scrape, and summarize steps until sufficient data is collected, guided by a decision model.
- Modular Code: Organized into separate Python files for clarity and maintainability.
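
To make the Search feature more concrete, here is a minimal sketch of how a SerpAPI request URL might be built and how result links might be pulled from the JSON response. It is not taken from search.py: the helper names `build_search_url` and `extract_links` are hypothetical, and the sample response is hand-written to resemble the `organic_results` list that SerpAPI returns for Google searches.

```python
from urllib.parse import urlencode

SERPAPI_ENDPOINT = "https://serpapi.com/search"


def build_search_url(query: str, api_key: str, num_results: int = 5) -> str:
    """Build a SerpAPI request URL for a Google search (hypothetical helper)."""
    params = {
        "engine": "google",
        "q": query,
        "num": num_results,
        "api_key": api_key,
    }
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


def extract_links(response: dict) -> list[str]:
    """Return result URLs from a SerpAPI-style response, skipping entries without a link."""
    return [r["link"] for r in response.get("organic_results", []) if "link" in r]


# Hand-written response shaped like SerpAPI's JSON output (illustrative data only).
sample_response = {
    "organic_results": [
        {"title": "Solar news", "link": "https://example.com/solar"},
        {"title": "No link here"},
        {"title": "Wind news", "link": "https://example.com/wind"},
    ]
}
search_url = build_search_url("renewable energy advances", api_key="demo-key")
links = extract_links(sample_response)
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without spending API credits.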

Project Structure

```
📁 simulating-deep-research/
│
├── main.py            # Main script to run the application
├── api_keys.py        # Loads API keys from environment variables
├── search.py          # Handles web searches with SerpAPI
├── scrape.py          # Manages web scraping with FireCrawl
├── summarize.py       # Summarizes content using OpenAI
├── models.py          # Defines Pydantic models for data structure
├── .env               # Stores API keys (not tracked by Git)
├── .gitignore         # Excludes sensitive files from Git
├── requirements.txt   # Lists Python dependencies
└── README.md          # This file
```

Prerequisites

Python 3.11+: Required to run the project.
API Keys:
- OpenAI API Key (https://platform.openai.com/)
- SerpAPI API Key (https://serpapi.com/)
- FireCrawl API Key (https://firecrawl.dev/)

Setup

Clone the Repository:

```
git clone https://github.com/genius-harry/simulating-deep-research.git
cd simulating-deep-research
```

Install Dependencies:

Use pip to install required packages:
```
pip install -r requirements.txt
```

Configure Environment Variables:

Create a .env file in the root directory and add your API keys:

```
OPENAI_API_KEY=your_openai_key_here
SERPAPI_API_KEY=your_serpapi_key_here
FIRECRAWL_API_KEY=your_firecrawl_key_here
```
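
As a rough illustration of how these variables could be consumed, the sketch below reads the three keys from a mapping and fails early if any is missing. It is an assumption about what a module like api_keys.py might do, not its actual contents: `load_api_keys` is a hypothetical name, and the sketch presumes that python-dotenv's `load_dotenv()` has already copied the .env entries into `os.environ`.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "SERPAPI_API_KEY", "FIRECRAWL_API_KEY")


def load_api_keys(env=None) -> dict[str, str]:
    """Return the three API keys, raising if any is missing or empty.

    `env` defaults to os.environ; passing a plain dict is handy for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Failing at startup with the names of the missing keys is usually easier to debug than an authentication error raised midway through a research run.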

Usage

Launch the Tool:

Run the main script:
```
python main.py
```

Input a Research Question:

Enter any question when prompted, e.g.:

Enter your research question: What are the latest advancements in renewable energy?

Review the Results:

The tool will:

- Generate search terms using the LLM.
- Search Google and scrape content from top results.
- Summarize the content iteratively.
- Deliver a final answer (600+ words) with sources.

Sample output:

```
--- Iteration 1 ---
Scraping website: https://example.com
Summarizing content for: https://example.com
Summary (first 500 chars): ...

Final Answer:
[600+ word summary]
Sources:
 - https://example.com/source1
 - https://example.com/source2
```

Optionally, create and activate a virtual environment before installing dependencies (see Setup):

```
python -m venv env
source env/bin/activate  # On Windows use: env\Scripts\activate
```

Troubleshooting

If you encounter issues, check the following:

- Verify that your API keys are correctly set in the .env file.
- Ensure your Python version is 3.11 or higher.
- Check your network connection for API access issues.
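
The first two checks above can be automated with a small diagnostic script. The following is only a sketch and is not part of the repository: `find_setup_problems` is a hypothetical helper that reports an outdated interpreter, unset keys, and keys that still hold the `your_..._here` placeholder values from the .env template.

```python
import os
import sys

REQUIRED_KEYS = ("OPENAI_API_KEY", "SERPAPI_API_KEY", "FIRECRAWL_API_KEY")
MIN_PYTHON = (3, 11)


def find_setup_problems(env=None, version=None) -> list[str]:
    """Return human-readable descriptions of setup problems (empty list if none)."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found; 3.11 or higher is required"
        )
    for name in REQUIRED_KEYS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif value.startswith("your_") and value.endswith("_here"):
            # The template placeholder was never replaced with a real key.
            problems.append(f"{name} still contains the placeholder value")
    return problems


if __name__ == "__main__":
    for problem in find_setup_problems():
        print("Problem:", problem)
```

Run it from the project root after the environment variables have been loaded; no output means none of these particular problems were found.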

Roadmap

Future enhancements:

- Improve error handling and logging.
- Add more configurable options for iterations.
- Enhance the UI/UX for console interactions.

How It Works

- Search Term Creation: The LLM crafts relevant search phrases from the user's question.
- Web Search: SerpAPI retrieves top Google results for each term.
- Content Extraction: FireCrawl scrapes markdown content from websites.
- Summarization: GPT-4o produces 500-600 word summaries per site.
- Iteration: A decision model checks whether enough data has been collected. If not, it refines the search terms and repeats the cycle; otherwise, it compiles the final answer.
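
The cycle described above can be sketched as a bounded loop. This is an illustrative outline rather than the code in main.py: every name here (`ResearchState`, `run_research`, and the callables passed in) is hypothetical, the real steps call the LLM, SerpAPI, and FireCrawl, and the cap of 10 iterations comes from the Notes section.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ITERATIONS = 10  # iteration cap stated in the Notes section


@dataclass
class ResearchState:
    question: str
    summaries: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)


def run_research(
    question: str,
    make_terms: Callable[[ResearchState], list[str]],
    search: Callable[[str], list[str]],
    scrape: Callable[[str], str],
    summarize: Callable[[str], str],
    has_enough: Callable[[ResearchState], bool],
) -> ResearchState:
    """Repeat search, scrape, and summarize until has_enough() or the cap is hit."""
    state = ResearchState(question)
    for _ in range(MAX_ITERATIONS):
        for term in make_terms(state):
            for url in search(term):
                if url in state.sources:
                    continue  # skip pages that were already summarized
                state.summaries.append(summarize(scrape(url)))
                state.sources.append(url)
        if has_enough(state):
            break
    return state


# Stub callables stand in for the LLM, SerpAPI, and FireCrawl.
state = run_research(
    "What is new in renewable energy?",
    make_terms=lambda s: [f"renewable energy {len(s.sources)}"],
    search=lambda term: [f"https://example.com/{term.replace(' ', '-')}"],
    scrape=lambda url: f"content of {url}",
    summarize=lambda text: text.upper(),
    has_enough=lambda s: len(s.sources) >= 3,
)
```

Passing each step in as a callable keeps the loop testable with stubs and mirrors the modular file layout, where search, scrape, and summarize live in separate modules.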

Dependencies

See requirements.txt for details:

- langchain
- openai
- requests
- pydantic
- python-dotenv

Contributing

- Fork the repo.
- Create a branch (git checkout -b feature-branch).
- Commit changes (git commit -m "Add feature").
- Push to your fork (git push origin feature-branch).
- Submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Notes

- Verify that your API keys are active and have adequate usage limits.
- The research loop is limited to 10 iterations to avoid infinite loops.
- Outputs may vary due to updates in OpenAI's GPT-4o model.
