Goal

This project builds a web research assistant that answers questions by searching the internet, evaluating sources, and generating summaries with proper citations. The agent aims to provide accurate information while being transparent about its sources and avoiding hallucinations.

Tech Stack

Frontend: Streamlit for the web interface

Search: SERP API for web search results

AI: Google's Gemini 1.5 Flash for content summarization

Web Scraping: BeautifulSoup for content extraction

Language: Python 3.8+

How to Run

Install dependencies:

pip install -r requirements.txt

or install the packages directly:

pip install streamlit requests beautifulsoup4 google-generativeai

Get API keys:

Sign up for SERP API at serpapi.com, and get a Gemini API key from Google AI Studio.

Set up secrets: Create .streamlit/secrets.toml with:

SERP_API_KEY="Your api key here"
GEMINI_API_KEY="Your api key here"
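Inside the app, these values are typically read through Streamlit's `st.secrets` mapping. The helper below is a minimal sketch, not the app's actual code: `get_api_key` is a hypothetical name, and the fallback to environment variables is an assumption added so the same code can run outside `streamlit run`.

```python
import os


def get_api_key(name: str) -> str:
    """Return an API key from Streamlit secrets, falling back to an env var.

    Hypothetical helper: key names match .streamlit/secrets.toml, but the
    environment-variable fallback is an assumption, not part of the app.
    """
    try:
        import streamlit as st  # available once requirements are installed

        return str(st.secrets[name])
    except Exception:
        # Streamlit not installed, no secrets.toml found, or key missing.
        pass
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"{name} not found in st.secrets or the environment")
    return value
```

For example, `get_api_key("SERP_API_KEY")` returns the SERP API key whether it comes from `secrets.toml` or from an exported `SERP_API_KEY` variable.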

Run the app:

streamlit run web_research_agent.py

Architecture

The system follows a pipeline approach:

Query Processing: The user's question is sent to SERP API as a search query.
Content Extraction: The top 5 result URLs are fetched and their main content is extracted.
Source Evaluation: Each source is scored for quality.
Answer Generation: The LLM (Gemini 1.5 Flash) generates an answer from the query and the extracted content.
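The first two stages could look roughly like the sketch below. It rests on several assumptions: it calls SerpAPI's `search.json` endpoint with the `engine`, `q`, and `api_key` parameters and reads the `link` field of each entry in `organic_results`; it uses the standard library's `html.parser` in place of BeautifulSoup so the example has no third-party dependencies; and it approximates "main content" by skipping script, style, and navigation-like tags. All function and class names are hypothetical, not taken from `web_research_agent.py`.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

SERPAPI_URL = "https://serpapi.com/search.json"
MAX_CHARS = 5000  # per-page limit from the Retrieval Strategy section


def search_urls(query: str, api_key: str, k: int = 5) -> list:
    """Query SerpAPI's Google engine and return up to k organic result links."""
    params = urllib.parse.urlencode({"engine": "google", "q": query, "api_key": api_key})
    with urllib.request.urlopen(f"{SERPAPI_URL}?{params}", timeout=10) as resp:
        data = json.load(resp)
    return top_links(data, k)


def top_links(serp_json: dict, k: int = 5) -> list:
    """Pull the 'link' field from the first k organic results that have one."""
    results = serp_json.get("organic_results", [])
    return [r["link"] for r in results if "link" in r][:k]


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style and nav-like tags."""

    SKIP = {"script", "style", "noscript", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str, max_chars: int = MAX_CHARS) -> str:
    """Return the page's visible text, truncated to max_chars characters."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]
```

Only `search_urls` touches the network; `top_links` and `extract_text` are pure functions, which keeps them easy to test against saved responses and pages.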

Retrieval Strategy

SERP API finds relevant pages using Google's search engine.
The main content of each page is extracted and truncated to 5,000 characters.
Each source is scored on domain reputation and content length.
Duplicate sources from the same domain are removed.
The 3 highest-scoring sources are passed on to answer generation.
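A minimal sketch of the scoring, deduplication, and top-3 selection steps follows. The README does not say how reputation is measured, so the `DOMAIN_BONUS` weights are purely hypothetical, as is the choice to keep the best-scoring page per domain when removing duplicates. The length term saturates at the 5,000-character cap, so truncated pages are not rewarded for length beyond it.

```python
from urllib.parse import urlparse

# Hypothetical reputation weights; the app's real domain list is not documented.
DOMAIN_BONUS = {".gov": 3.0, ".edu": 3.0, "wikipedia.org": 2.0}


def score_source(url: str, text: str) -> float:
    """Score = domain reputation bonus + a content-length term in [0, 1]."""
    host = urlparse(url).netloc.lower()
    reputation = max(
        (bonus for suffix, bonus in DOMAIN_BONUS.items() if host.endswith(suffix)),
        default=0.0,
    )
    length_score = min(len(text), 5000) / 5000  # saturates at the 5000-char cap
    return reputation + length_score


def select_sources(pages: list, k: int = 3) -> list:
    """Keep the best-scoring page per domain, then return the top k overall.

    `pages` is a list of (url, text) pairs.
    """
    best_per_domain = {}
    for url, text in pages:
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]  # treat www.example.com and example.com as one domain
        s = score_source(url, text)
        if domain not in best_per_domain or s > best_per_domain[domain][0]:
            best_per_domain[domain] = (s, url, text)
    ranked = sorted(best_per_domain.values(), key=lambda t: t[0], reverse=True)
    return [(url, text) for _, url, text in ranked[:k]]
```

A static bonus table is the simplest stand-in for "domain reputation"; the user-feedback idea under Future Improvements would replace it with weights learned over time.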

Prompt

The main prompt to Gemini follows this structure:

You are a research assistant. Based on the following sources, provide a concise answer to the query. If the information is not available in the sources, say "I don't know". Always cite your sources using numbers like [1], [2], etc.

Query: {user_question}

Sources:

Answer:
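Assembling this prompt and sending it to Gemini could be sketched as follows. How sources are laid out under "Sources:" is not shown above, so the numbered `[n] url` plus text format here is an assumption chosen to line up with the `[1]`, `[2]` citation instruction; `build_prompt` and `generate_answer` are hypothetical names. The Gemini call uses the `google-generativeai` package's `configure`, `GenerativeModel`, and `generate_content`, imported lazily so the prompt builder works without it.

```python
PROMPT_TEMPLATE = (
    "You are a research assistant. Based on the following sources, provide a concise "
    "answer to the query. If the information is not available in the sources, say "
    "\"I don't know\". Always cite your sources using numbers like [1], [2], etc.\n\n"
    "Query: {query}\n\n"
    "Sources:\n{sources}\n\n"
    "Answer:"
)


def build_prompt(query: str, sources: list) -> str:
    """Number each (url, text) source so the model can cite it as [n]."""
    blocks = [f"[{i}] {url}\n{text}" for i, (url, text) in enumerate(sources, start=1)]
    return PROMPT_TEMPLATE.format(query=query, sources="\n\n".join(blocks))


def generate_answer(prompt: str, api_key: str) -> str:
    """Send the prompt to Gemini 1.5 Flash via google-generativeai."""
    import google.generativeai as genai  # lazy import; requires the package

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text
```

Because the numbers in the prompt match the order of the selected sources, a citation such as `[2]` in the answer can be mapped back to the second URL when rendering the response.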

Future Improvements

Implement proper chunking and vector storage for better context management
Add user feedback to improve source quality assessment over time
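As one possible starting point for the chunking item, the sketch below splits text into fixed-size character windows with overlap, so a sentence cut at one boundary still appears whole in the neighbouring chunk. It is a hypothetical helper with assumed default sizes, and it covers only chunking, not the vector storage step.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list:
    """Split text into windows of `size` chars, each overlapping the previous by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", size=4, overlap=1)` yields `["abcd", "defg", "ghij"]`.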
