ResearchMate AI is a tool-augmented, agentic LLM-powered system built to automate research workflows such as academic paper search, PDF ingestion, summarization, and structured insight extraction. It integrates multiple tools—arXiv search, PDF reading, and intelligent reasoning—within a ReAct-style agent loop to produce high-quality research outputs.
- Searches the latest academic papers from arXiv.org based on any research topic entered by the user.
- Reads and extracts content from selected PDF research papers dynamically.
- Analyzes the paper, identifies research gaps, and proposes new research directions.
- Writes a complete LaTeX-formatted research paper and exports it as a downloadable PDF.
The project demonstrates how modern LLMs can coordinate external tools, stream structured reasoning paths, and produce actionable research output suitable for academic and scientific applications.
Important
You can instruct the agent in detail about exactly what type of research paper you want.
```
ResearchMate-AI/
│
├── ai_researcher.py      # Simple ReAct agent workflow
├── ai_researcher_2.py    # Enhanced LangGraph agent with streaming & memory
├── arxiv_tool.py         # arXiv search integration
├── frontend.py           # Streamlit interface for chat interaction
├── pyproject.toml        # Python dependencies
├── read_pdf.py           # PDF parsing tool using PyPDF2
├── write_pdf.py          # LaTeX → PDF rendering using Tectonic
└── README.md             # Project documentation
```
1️⃣ ai_researcher.py — Base ReAct Agent Implementation
- This file contains the initial, foundational version of the AI research agent using the ReAct (Reasoning + Acting) paradigm. It demonstrates the traditional LangChain workflow in which the model reasons step by step and invokes tools when needed.
- Initializes the model using Google Gemini 2.5 Pro.
- Defines and registers the critical tools: `arxiv_search`, `read_pdf`, and `render_latex_pdf`.
- Creates a ReAct-based agent graph using LangGraph prebuilt utilities.
- Streams model responses sequentially in the console using a generator.
- Handles continuous conversation through a `while True` loop.
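The reason/act cycle described above can be illustrated with a small, framework-free sketch. The `stub_model`, the decision dictionary shapes, and the tool registry below are illustrative stand-ins, not the actual LangChain/LangGraph API:

```python
# Minimal, framework-free sketch of a ReAct (Reason + Act) loop.
# `model` is a hypothetical callable standing in for Gemini 2.5 Pro.

def react_loop(model, tools, question, max_steps=5):
    """Alternate between model reasoning and tool execution."""
    history = [("user", question)]
    for _ in range(max_steps):
        decision = model(history)           # model inspects history, decides next step
        if decision["type"] == "final":     # model is done: return its answer
            return decision["text"]
        # Otherwise the model requested a tool call: execute it and
        # record the observation so the model can reason over it next turn.
        observation = tools[decision["tool"]](decision["input"])
        history.append(("tool", f"{decision['tool']} -> {observation}"))
    return "Step limit reached without a final answer."

# Stub model: request one search, then give a final answer.
def stub_model(history):
    if any(role == "tool" for role, _ in history):
        return {"type": "final", "text": "Found 1 paper on diffusion models."}
    return {"type": "tool", "tool": "arxiv_search", "input": "diffusion models"}

tools = {"arxiv_search": lambda topic: f"1 result for '{topic}'"}
print(react_loop(stub_model, tools, "Survey diffusion models"))
# -> Found 1 paper on diffusion models.
```

In the real file, LangGraph's prebuilt utilities manage this loop and the tool schemas; the sketch only shows the control flow.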
2️⃣ ai_researcher_2.py — Advanced LangGraph Agent with Stateful Workflows
- This is the core intelligence engine of the project, built using LangGraph, enabling multi-step autonomous decision making and memory persistence.
- Defines the State object using `TypedDict` to manage message context.
- Uses conditional routing logic to decide dynamically whether tool invocation is required or whether the response is final.
- Implements bidirectional loop between:
- Agent node → LLM reasoning
- Tools node → external tool execution
- Adds memory checkpointing using `MemorySaver`, enabling persistent conversations, reversible workflows, and reproducibility.
| Capability | Description |
|---|---|
| Dynamic Tool Binding | LLM determines when to call tools automatically |
| Streaming Thought Process | Real-time incremental response construction |
| Stateful Graph | Maintains chat history across interactions |
| Re-entrant Architecture | Routes control back to the agent after each tool call to check whether further tool use is needed |
Note: This file transforms the project from a simple agent into a production-grade autonomous research workflow controller.
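The conditional routing between the agent and tools nodes can be sketched framework-free. The message dictionaries below are simplified assumptions, not the actual LangGraph message types:

```python
from typing import List, TypedDict

# Simplified stand-in for the LangGraph message state described above.
class State(TypedDict):
    messages: List[dict]

def should_continue(state: State) -> str:
    """Conditional edge: route to the tools node if the last model
    message requested tool calls, otherwise end the graph run."""
    last = state["messages"][-1]
    return "tools" if last.get("tool_calls") else "end"

# A reply that requests a tool call routes to the tools node...
asking = State(messages=[{"role": "ai", "tool_calls": [{"name": "arxiv_search"}]}])
print(should_continue(asking))   # -> tools

# ...while a plain final answer ends the loop.
final = State(messages=[{"role": "ai", "content": "Here is your summary."}])
print(should_continue(final))    # -> end
```

In `ai_researcher_2.py`, this decision is registered as a conditional edge on the graph, so the agent and tools nodes form the bidirectional loop described above.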
3️⃣ arxiv_tool.py — Research Paper Retrieval Utility
- This module integrates the system with arXiv.org, the world’s largest open-access scientific research repository.
- Accepts query topics and formulates structured search queries.
- Retrieves metadata such as:
- Title, abstract, authors
- Publication date
- PDF download link
- Formats the results in a structured payload for the agent to evaluate.
Important
Enables automated discovery and exploration of the latest published work, and powers the agent's ability to recommend relevant papers intelligently.
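A query against the public arXiv Atom API can be sketched as follows; the `max_results` default and the `all:` search field are illustrative choices, not necessarily what `arxiv_tool.py` uses:

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(topic: str, max_results: int = 5) -> str:
    """Build a search URL for the public arXiv Atom API."""
    params = {
        "search_query": f"all:{topic}",  # search across all metadata fields
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_arxiv_query("graph neural networks")
print(url)
# The Atom feed at this URL contains each paper's title, abstract,
# authors, publication date, and PDF link; it can be fetched with e.g.
#   import requests; feed = requests.get(url, timeout=30).text
```

Keeping the URL construction pure makes the query logic easy to test without network access.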
4️⃣ frontend.py — Streamlit Chat UI Interface
- The interactive user-facing interface that controls user prompts and visual output.
- Functional Capabilities:
- Provides chat input and streaming assistant responses with progressive updates.
- Displays model messages and maintains internal session history.
- Logs tool usage and backend operations during execution.
- Acts as the gateway client for interacting with the LangGraph agent.
- Why it is critical:
- Converts backend agent workflow into a usable web application.
- Offers production-ready UI suitable for deployments and demos.
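A minimal version of such a chat front end might look like the sketch below. The session-state keys and the stubbed stream are hypothetical stand-ins for the real LangGraph backend; only the Streamlit chat primitives (`st.chat_input`, `st.chat_message`, `st.session_state`) are real API:

```python
def accumulate_stream(chunks):
    """Yield the progressively growing response text, so the UI can
    redraw the assistant message as each chunk arrives."""
    text = ""
    for chunk in chunks:
        text += chunk
        yield text

def main():
    import streamlit as st  # imported here so the helper above stays testable

    st.title("ResearchMate AI")
    if "history" not in st.session_state:
        st.session_state.history = []  # [(role, text), ...]

    # Replay prior turns so the conversation persists across reruns.
    for role, text in st.session_state.history:
        st.chat_message(role).write(text)

    if prompt := st.chat_input("Ask about a research topic"):
        st.session_state.history.append(("user", prompt))
        st.chat_message("user").write(prompt)
        placeholder = st.chat_message("assistant").empty()
        reply = ""
        # In the real app these chunks would stream from the LangGraph agent.
        for reply in accumulate_stream(iter(["Searching arXiv...", " Done."])):
            placeholder.write(reply)  # progressive update
        st.session_state.history.append(("assistant", reply))
```

Run a file like this with `streamlit run <file>.py`; `accumulate_stream` is kept separate so the progressive-update logic can be tested without a UI.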
5️⃣ write_pdf.py — LaTeX to PDF Rendering Tool
- Responsible for exporting the final structured research paper.
- Core Capabilities:
- Generates .tex file dynamically using received LaTeX content.
- Invokes Tectonic engine to compile the LaTeX into a PDF.
- Saves time-stamped output into /output directory.
- Why it matters:
- Enables automatic generation of polished, academic-style research papers.
- Supports mathematical typesetting, proof formatting, and equations.
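The rendering step can be sketched roughly as below. The function name, file-naming scheme, and output layout are assumptions; the `tectonic` binary must be on `PATH` for the final compile call to run:

```python
import shutil
import subprocess
import time
from pathlib import Path

def render_latex_pdf(latex: str, outdir: str = "output") -> Path:
    """Write LaTeX source to a time-stamped .tex file and, when the
    Tectonic engine is available, compile it to a PDF alongside it."""
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    tex_path = out / f"paper_{int(time.time())}.tex"
    tex_path.write_text(latex, encoding="utf-8")

    if shutil.which("tectonic"):  # only compile when Tectonic is installed
        subprocess.run(
            ["tectonic", "--outdir", str(out), str(tex_path)],
            check=True,
        )
    return tex_path

doc = r"""\documentclass{article}
\begin{document}
Hello, ResearchMate.
\end{document}
"""
path = render_latex_pdf(doc)
print(path.suffix)  # -> .tex
```

Guarding the compile call behind `shutil.which` keeps the sketch runnable on machines without Tectonic while still showing the full flow.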
- Language: Python (>= 3.11)
- Environment & packaging: uv (for virtualenv + dependency management via pyproject.toml)
- AI Framework: LangChain, LangGraph
- UI Framework: Streamlit
- LLM / Agent: Google Gemini 2.5 Pro
- PDF Tools: PyPDF2, Tectonic
Dependencies (from pyproject.toml):
- `langchain>=0.3.27`
- `langchain-core>=0.3.72`
- `langchain-google-genai>=2.1.9`
- `langgraph>=0.6.3`
- `pypdf2>=3.0.1`
- `python-dotenv>=1.1.1`
- `requests>=2.32.4`
- `streamlit>=1.48.0`
- Python 3.11+ installed on your system.
- `uv` installed (for virtual environment + dependency management).
- Google Gemini API key.
- Tectonic PDF processor installed locally.
All commands below assume you are in the project root: ResearchMate-AI/.
git clone https://github.com/MadtorXD/ResearchMate-AI.git
cd ResearchMate-AI
# Install dependencies and create .venv using uv
uv sync

# Windows / PowerShell: activate the virtual environment
.venv\Scripts\Activate.ps1

If you prefer not to activate the venv manually, you can also run commands through uv directly (e.g. `uv run streamlit run frontend.py`).
Create a .env file in the project root:

GOOGLE_API_KEY="YOUR_API_KEY_HERE"

Then launch the app:

`streamlit run frontend.py`

Important
Security note: Don't commit real keys; use .env or environment variables in production.
Note
If you prefer plain environment variables, export GOOGLE_API_KEY in your shell instead of using a .env file; by default, python-dotenv does not override variables already set in the environment.
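Loading the key can be sketched as follows. The helper name and error message are illustrative, and the dotenv import is optional so the snippet also works with plain environment variables:

```python
import os

def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from a .env file (if python-dotenv is installed)
    or from the process environment."""
    try:
        from dotenv import load_dotenv
        load_dotenv()  # by default, does not override already-set variables
    except ImportError:
        pass  # fall back to plain environment variables
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to .env or export it.")
    return key

# Illustrative only: never hard-code a real key like this.
os.environ.setdefault("GOOGLE_API_KEY", "demo-key-for-illustration")
print(load_api_key())
```

Failing fast with a clear message avoids confusing downstream errors from the Gemini client when the key is missing.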
arXiv.org is the largest open-access academic repository for scientific research across domains such as:
- Computer Science
- Machine Learning & AI
- Physics
- Mathematics
- Economics
- Quantitative Biology
For each matched paper, the search tool returns:
- Paper title, authors, publication date
- PDF download link
Below are the API keys and tokens required for the project to function properly:

| Parameter | Description |
|---|---|
| GOOGLE_API_KEY | Required. Your Google Gemini 2.5 Pro API key |
- Streaming response enables real-time UI updates
- LangGraph decides dynamically when to call tools
- PDF generation supports mathematical LaTeX equations
- Checkpointing ensures persistent conversation state
To further strengthen the reliability, safety, and production readiness of ResearchMate AI, consider implementing the following enhancements:
- Add citation formatting (IEEE / APA).
- Add local PDF upload option.
- Build user authentication & workspace saving.
- Add RAG embeddings for better research contextualization.
The ResearchMate AI project is intended for educational and research purposes only. Generated papers should not be used for unethical publication, plagiarism, or academic misconduct.
