
heuristic-solver/web-research-streamlit

Goal

This project builds a web research assistant that answers questions by searching the internet, evaluating sources, and generating summaries with proper citations. The agent aims to provide accurate information while being transparent about its sources and avoiding hallucinations.

Tech Stack

Frontend: Streamlit for the web interface

Search: SERP API for web search results

AI: Google's Gemini 1.5 Flash for content summarization

Web Scraping: BeautifulSoup for content extraction

Language: Python 3.8+

How to Run

Install dependencies:

pip install -r requirements.txt 
or 
pip install streamlit requests beautifulsoup4 google-generativeai

Get API keys:

Sign up for SERP API at serpapi.com

Get a Gemini API key from Google AI Studio

Set up secrets: Create .streamlit/secrets.toml with:

SERP_API_KEY="Your api key here"
GEMINI_API_KEY="Your api key here"
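Once the file exists, the app can check both keys before making any API call. The sketch below is a minimal illustration, not the project's actual code: `load_api_keys` and `REQUIRED_KEYS` are hypothetical names, and the function accepts any mapping so it can be given Streamlit's `st.secrets` in the app or a plain dict in tests.

```python
from typing import Mapping, Tuple

# Key names taken from the secrets.toml example above.
REQUIRED_KEYS = ("SERP_API_KEY", "GEMINI_API_KEY")


def load_api_keys(secrets: Mapping[str, str]) -> Tuple[str, str]:
    """Return (serp_key, gemini_key), or raise KeyError naming what is missing.

    Hypothetical helper: `secrets` can be any mapping, such as st.secrets.
    """
    missing = [
        name
        for name in REQUIRED_KEYS
        if name not in secrets or not str(secrets[name]).strip()
    ]
    if missing:
        raise KeyError("Missing secrets: " + ", ".join(missing))
    return secrets["SERP_API_KEY"], secrets["GEMINI_API_KEY"]
```

Inside the Streamlit app this would be called as `load_api_keys(st.secrets)`, so a missing or empty key fails early with a readable message.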

Run the app:

streamlit run web_research_agent.py

Architecture

The system follows a pipeline approach:

  1. Query Processing: The user's question is sent to SERP API as a search query

  2. Content Extraction: The top 5 URLs are fetched and their content is extracted

  3. Source Evaluation: The quality of each source is scored

  4. Answer Generation: The LLM generates an answer from the query and the extracted content
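The four stages above can be sketched as a single function that wires them together. This is an illustrative outline under assumptions, not the code in `web_research_agent.py`: `run_pipeline`, its parameter names, and the `{"url", "text"}` source shape are hypothetical, and each stage is passed in as a callable so the flow can be tried without network access. Returning "I don't know" when no page yields content is also an assumption, chosen to match the prompt's fallback wording.

```python
from typing import Callable, Dict, List

Source = Dict[str, str]  # hypothetical shape: {"url": ..., "text": ...}


def run_pipeline(
    query: str,
    search: Callable[[str], List[str]],           # stage 1: query -> result URLs
    extract: Callable[[str], str],                # stage 2: URL -> page text
    select: Callable[[List[Source]], List[Source]],  # stage 3: score and filter
    generate: Callable[[str, List[Source]], str],    # stage 4: LLM answer
    max_urls: int = 5,
) -> str:
    """Run search, extraction, source evaluation, and answer generation."""
    sources: List[Source] = []
    for url in search(query)[:max_urls]:
        try:
            text = extract(url)
        except Exception:
            continue  # skip pages that cannot be fetched or parsed
        if text:
            sources.append({"url": url, "text": text})
    if not sources:
        return "I don't know"  # assumed fallback when nothing was extracted
    return generate(query, select(sources))
```

Keeping the stages as separate callables means the search backend or the model can be swapped without touching the orchestration.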

Retrieval Strategy

  1. SERP API finds relevant pages using Google's search engine

  2. The main content of each page is extracted and truncated to 5000 characters

  3. Each source gets a score based on domain reputation and content length

  4. Duplicate sources from the same domain are removed

  5. The 3 highest-scoring sources are used for answer generation
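Steps 2 to 5 can be illustrated with a short scoring and selection routine. The README does not state the actual scoring formula, so everything numeric here is an assumption: the `TRUSTED_SUFFIXES` list, the 0.6/0.4 weights, and the helper names `domain_of`, `score_source`, and `select_sources` are hypothetical stand-ins for whatever the project uses. Only the 5000-character limit and the top-3 cutoff come from the list above.

```python
from typing import Dict, List
from urllib.parse import urlparse

MAX_CHARS = 5000  # per-page content limit from step 2
TOP_K = 3         # number of sources kept in step 5
TRUSTED_SUFFIXES = (".edu", ".gov", ".org")  # assumption: a crude reputation proxy


def domain_of(url: str) -> str:
    """Return the lowercased host without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def score_source(url: str, text: str) -> float:
    """Combine domain reputation and content length into a 0..1 score (assumed weights)."""
    reputation = 1.0 if domain_of(url).endswith(TRUSTED_SUFFIXES) else 0.5
    length = min(len(text), MAX_CHARS) / MAX_CHARS
    return 0.6 * reputation + 0.4 * length


def select_sources(pages: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Truncate, score, keep the best page per domain, and return the top 3."""
    best_per_domain: Dict[str, Dict[str, object]] = {}
    for page in pages:
        text = page["text"][:MAX_CHARS]
        score = score_source(page["url"], text)
        domain = domain_of(page["url"])
        current = best_per_domain.get(domain)
        if current is None or score > current["score"]:
            best_per_domain[domain] = {"url": page["url"], "text": text, "score": score}
    ranked = sorted(best_per_domain.values(), key=lambda s: s["score"], reverse=True)
    return ranked[:TOP_K]
```

Deduplicating before ranking keeps one strong domain from filling all three slots.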

Prompt

The main prompt to Gemini follows this structure:

You are a research assistant. Based on the following sources, provide a concise answer to the query. If the information is not available in the sources, say "I don't know". Always cite your sources using numbers like [1], [2], etc.

Query: {user_question}

Sources:

Answer:
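For illustration, the template above might be filled in as follows. The README shows nothing after "Sources:", so the `{sources}` placeholder and the numbered `[n] url` layout are assumptions made here to match the `[1]`, `[2]` citation style the prompt asks for; `build_prompt` and `PROMPT_TEMPLATE` are hypothetical names.

```python
from typing import Dict, List

# Wording copied from the prompt above; the {sources} slot is an assumption.
PROMPT_TEMPLATE = (
    "You are a research assistant. Based on the following sources, provide a "
    "concise answer to the query. If the information is not available in the "
    'sources, say "I don\'t know". Always cite your sources using numbers '
    "like [1], [2], etc.\n\n"
    "Query: {user_question}\n\n"
    "Sources:\n{sources}\n\n"
    "Answer:"
)


def build_prompt(user_question: str, sources: List[Dict[str, str]]) -> str:
    """Number each source from 1 so the model's citations map back to URLs."""
    numbered = "\n\n".join(
        f"[{i}] {src['url']}\n{src['text']}"
        for i, src in enumerate(sources, start=1)
    )
    return PROMPT_TEMPLATE.format(user_question=user_question, sources=numbered)
```

The resulting string would then be passed to the Gemini model; numbering the sources in the prompt is what lets the citations in the answer be resolved to links in the interface.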

Future Improvements

  1. Implement proper chunking and vector storage for better context management

  2. Add user feedback to improve source quality assessment over time

About

A web research agent based on the Gemini model, with a Streamlit interface.
