ResearchMate AI is a tool-augmented, agentic LLM-powered system built to automate research workflows such as academic paper search, PDF ingestion, summarization, and structured insight extraction. It integrates multiple tools—arXiv search, PDF reading, and intelligent reasoning—within a ReAct-style agent loop to produce high-quality research outputs.
- Searches the latest academic papers from arXiv.org based on any research topic entered by the user.
- Reads and extracts content from selected PDF research papers dynamically.
- Analyzes the paper, identifies research gaps, and proposes new research directions.
- Writes a complete LaTeX-formatted research paper and exports it as a downloadable PDF.
The project demonstrates how modern LLMs can coordinate external tools, stream structured reasoning paths, and produce actionable research output suitable for academic and scientific applications.
Important
You can instruct the agent in detail about exactly what type of research paper you want.
```
ResearchMate-AI/
│
├── ai_researcher.py      # Simple ReAct agent workflow
├── ai_researcher_2.py    # Enhanced LangGraph agent with streaming & memory
├── arxiv_tool.py         # arXiv search integration
├── frontend.py           # Streamlit interface for chat interaction
├── pyproject.toml        # Python dependencies
├── read_pdf.py           # PDF parsing tool using PyPDF2
├── write_pdf.py          # LaTeX → PDF rendering using Tectonic
└── README.md             # Project documentation
```
1️⃣ ai_researcher.py — Base ReAct Agent Implementation
- This file contains the initial, foundational version of the AI research agent using the ReAct (Reasoning + Acting) paradigm. It demonstrates the traditional LangChain workflow in which the model reasons step by step and invokes tools when needed.
- Initializes the model using Google Gemini 2.5 Pro.
- Defines and registers the critical tools: `arxiv_search`, `read_pdf`, and `render_latex_pdf`.
- Creates a ReAct-based agent graph using LangGraph prebuilt utilities.
- Streams model responses sequentially in the console using a generator.
- Handles continuous conversation through a `while True` loop.
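The reason/act cycle described above can be illustrated with a small, framework-free sketch. The `stub_model`, the decision dictionary shapes, and the tool registry below are illustrative stand-ins, not the actual LangChain/LangGraph API:

```python
# Minimal, framework-free sketch of a ReAct (Reason + Act) loop.
# `model` is a hypothetical callable standing in for Gemini 2.5 Pro.

def react_loop(model, tools, question, max_steps=5):
    """Alternate between model reasoning and tool execution."""
    history = [("user", question)]
    for _ in range(max_steps):
        decision = model(history)           # model inspects history, decides next step
        if decision["type"] == "final":     # model is done: return its answer
            return decision["text"]
        # Otherwise the model requested a tool call: execute it and
        # record the observation so the model can reason over it next turn.
        observation = tools[decision["tool"]](decision["input"])
        history.append(("tool", f"{decision['tool']} -> {observation}"))
    return "Step limit reached without a final answer."

# Stub model: request one search, then give a final answer.
def stub_model(history):
    if any(role == "tool" for role, _ in history):
        return {"type": "final", "text": "Found 1 paper on diffusion models."}
    return {"type": "tool", "tool": "arxiv_search", "input": "diffusion models"}

tools = {"arxiv_search": lambda topic: f"1 result for '{topic}'"}
print(react_loop(stub_model, tools, "Survey diffusion models"))
# -> Found 1 paper on diffusion models.
```

In the real file, LangGraph's prebuilt utilities manage this loop and the tool schemas; the sketch only shows the control flow.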
2️⃣ ai_researcher_2.py — Advanced LangGraph Agent with Stateful Workflows
- This is the core intelligence engine of the project, built using LangGraph, enabling multi-step autonomous decision making and memory persistence.
- Defines the State object using `TypedDict` to manage message context.
- Uses conditional routing logic to decide dynamically whether tool invocation is required or whether the response is final.
- Implements bidirectional loop between:
- Agent node → LLM reasoning
- Tools node → external tool execution
- Adds memory checkpointing using `MemorySaver`, enabling persistent conversations, reversible workflows, and reproducibility.
| Capability | Description |
|---|---|
| Dynamic Tool Binding | LLM determines when to call tools automatically |
| Streaming Thought Process | Real-time incremental response construction |
| Stateful Graph | Maintains chat history across interactions |
| Re-entrant Architecture | Routes control back to the agent after each tool call to check whether further tool use is needed |
Note: This file transforms the project from a simple agent into a production-grade autonomous research workflow controller.
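The conditional routing between the agent and tools nodes can be sketched framework-free. The message dictionaries below are simplified assumptions, not the actual LangGraph message types:

```python
from typing import List, TypedDict

# Simplified stand-in for the LangGraph message state described above.
class State(TypedDict):
    messages: List[dict]

def should_continue(state: State) -> str:
    """Conditional edge: route to the tools node if the last model
    message requested tool calls, otherwise end the graph run."""
    last = state["messages"][-1]
    return "tools" if last.get("tool_calls") else "end"

# A reply that requests a tool call routes to the tools node...
asking = State(messages=[{"role": "ai", "tool_calls": [{"name": "arxiv_search"}]}])
print(should_continue(asking))   # -> tools

# ...while a plain final answer ends the loop.
final = State(messages=[{"role": "ai", "content": "Here is your summary."}])
print(should_continue(final))    # -> end
```

In `ai_researcher_2.py`, this decision is registered as a conditional edge on the graph, so the agent and tools nodes form the bidirectional loop described above.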
3️⃣ arxiv_tool.py — Research Paper Retrieval Utility
- This module integrates the system with arXiv.org, the world’s largest open-access scientific research repository.
- Accepts query topics and formulates structured search queries.
- Retrieves metadata such as:
- Title, abstract, authors
- Publication date
- PDF download link
- Formats the results in a structured payload for the agent to evaluate.
Important
Enables automated discovery and exploration of the latest published work, and powers the agent's ability to recommend relevant papers intelligently.
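A query against the public arXiv Atom API can be sketched as follows; the `max_results` default and the `all:` search field are illustrative choices, not necessarily what `arxiv_tool.py` uses:

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(topic: str, max_results: int = 5) -> str:
    """Build a search URL for the public arXiv Atom API."""
    params = {
        "search_query": f"all:{topic}",  # search across all metadata fields
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_arxiv_query("graph neural networks")
print(url)
# The Atom feed at this URL contains each paper's title, abstract,
# authors, publication date, and PDF link; it can be fetched with e.g.
#   import requests; feed = requests.get(url, timeout=30).text
```

Keeping the URL construction pure makes the query logic easy to test without network access.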
4️⃣ frontend.py — Streamlit Chat UI Interface
- The interactive user-facing interface that controls user prompts and visual output.
- Functional Capabilities:
- Provides chat input and streaming assistant responses with progressive updates.
- Displays model messages and maintains internal session history.
- Logs tool usage and backend operations during execution.
- Acts as the gateway client for interacting with the LangGraph agent.
- Why it is critical:
- Converts backend agent workflow into a usable web application.
- Offers production-ready UI suitable for deployments and demos.
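A minimal version of such a chat front end might look like the sketch below. The session-state keys and the stubbed stream are hypothetical stand-ins for the real LangGraph backend; only the Streamlit chat primitives (`st.chat_input`, `st.chat_message`, `st.session_state`) are real API:

```python
def accumulate_stream(chunks):
    """Yield the progressively growing response text, so the UI can
    redraw the assistant message as each chunk arrives."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text

def main():
    import streamlit as st  # imported here so the helper above stays testable

    st.title("ResearchMate AI")
    if "history" not in st.session_state:
        st.session_state.history = []  # [(role, text), ...]

    # Replay prior turns so the conversation persists across reruns.
    for role, text in st.session_state.history:
        st.chat_message(role).write(text)

    if prompt := st.chat_input("Ask about a research topic"):
        st.session_state.history.append(("user", prompt))
        st.chat_message("user").write(prompt)
        placeholder = st.chat_message("assistant").empty()
        reply = ""
        # In the real app these chunks would stream from the LangGraph agent.
        for reply in accumulate_stream(iter(["Searching arXiv...", " Done."])):
            placeholder.write(reply)  # progressive update
        st.session_state.history.append(("assistant", reply))
```

Run a file like this with `streamlit run <file>.py`; `accumulate_stream` is kept separate so the progressive-update logic can be tested without a UI.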
5️⃣ write_pdf.py — LaTeX to PDF Rendering Tool
- Responsible for exporting the final structured research paper.
- Core Capabilities:
- Generates .tex file dynamically using received LaTeX content.
- Invokes Tectonic engine to compile the LaTeX into a PDF.
- Saves time-stamped output into /output directory.
- Why it matters:
- Enables automatic generation of polished, academic-style research papers.
- Supports mathematical typesetting, proof formatting, and equations.
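The rendering step can be sketched roughly as below. The function name, file-naming scheme, and output layout are assumptions; the `tectonic` binary must be on `PATH` for the final compile call to run:

```python
import shutil
import subprocess
import time
from pathlib import Path

def render_latex_pdf(latex: str, outdir: str = "output") -> Path:
    """Write LaTeX source to a time-stamped .tex file and, when the
    Tectonic engine is available, compile it to a PDF alongside it."""
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    tex_path = out / f"paper_{int(time.time())}.tex"
    tex_path.write_text(latex, encoding="utf-8")

    if shutil.which("tectonic"):  # only compile when Tectonic is installed
        subprocess.run(
            ["tectonic", "--outdir", str(out), str(tex_path)],
            check=True,
        )
    return tex_path

doc = r"""\documentclass{article}
\begin{document}
Hello, ResearchMate.
\end{document}
"""
path = render_latex_pdf(doc)
print(path.suffix)  # -> .tex
```

Guarding the compile call behind `shutil.which` keeps the sketch runnable on machines without Tectonic while still showing the full flow.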
- Language: Python (>= 3.11)
- Environment & packaging: uv (for virtualenv + dependency management via pyproject.toml)
- AI Framework: LangChain, LangGraph
- UI Framework: Streamlit
- LLM / Agent: Google Gemini 2.5 Pro
- PDF Tools: PyPDF2, Tectonic
Dependencies (from pyproject.toml):
- `langchain>=0.3.27`
- `langchain-core>=0.3.72`
- `langchain-google-genai>=2.1.9`
- `langgraph>=0.6.3`
- `pypdf2>=3.0.1`
- `python-dotenv>=1.1.1`
- `requests>=2.32.4`
- `streamlit>=1.48.0`
- Python 3.11+ installed on your system.
- `uv` installed (for virtual environment + dependency management).
- Google Gemini API key.
- Tectonic PDF processor installed locally.
All commands below assume you are in the project root: ResearchMate-AI/.
git clone https://github.com/MadtorXD/ResearchMate-AI.git
cd ResearchMate-AI
# Install dependencies and create .venv using uv
uv sync

# Windows / PowerShell: activate the virtual environment
.venv\Scripts\Activate.ps1

If you prefer not to activate the venv manually, you can also run commands through uv directly (e.g. `uv run streamlit run frontend.py`).
Create a .env file in the project root:

GOOGLE_API_KEY="YOUR_API_KEY_HERE"

Then launch the app:

`streamlit run frontend.py`

Important
Security note: Don't commit real keys; use .env or environment variables in production.
Note
If you prefer plain environment variables, export GOOGLE_API_KEY in your shell instead of using a .env file; by default, python-dotenv does not override variables already set in the environment.
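Loading the key can be sketched as follows. The helper name and error message are illustrative, and the dotenv import is optional so the snippet also works with plain environment variables:

```python
import os

def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from a .env file (if python-dotenv is installed)
    or from the process environment."""
    try:
        from dotenv import load_dotenv
        load_dotenv()  # by default, does not override already-set variables
    except ImportError:
        pass  # fall back to plain environment variables
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or export it.")
    return key

# Illustrative only: never hard-code a real key like this.
os.environ.setdefault("GOOGLE_API_KEY", "demo-key-for-illustration")
print(load_api_key())
```

Failing fast with a clear message avoids confusing downstream errors from the Gemini client when the key is missing.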
arXiv.org is the largest open-access academic repository for scientific research across domains such as:
- Computer Science
- Machine Learning & AI
- Physics
- Mathematics
- Economics
- Quantitative Biology
For each matched paper, the search tool returns:
- Paper title, authors, publication date
- PDF download link
Below are the API keys and tokens required for the project to function properly:

| Parameter | Description |
|---|---|
| GOOGLE_API_KEY | Required. Your Google Gemini 2.5 Pro API key |
- Streaming response enables real-time UI updates
- LangGraph decides dynamically when to call tools
- PDF generation supports mathematical LaTeX equations
- Checkpointing ensures persistent conversation state
To further strengthen the reliability, safety, and production readiness of ResearchMate AI, consider implementing the following enhancements:
- Add citation formatting (IEEE / APA).
- Add local PDF upload option.
- Build user authentication & workspace saving.
- Add RAG embeddings for better research contextualization.
The ResearchMate AI project is intended for educational and research purposes only. Generated papers should not be used for unethical publication, plagiarism, or academic misconduct.
