A generative AI-powered teaching assistant that retrieves and explains research papers from arXiv, converts complex academic content into beginner-friendly lessons, and provides coding practice through LeetCode integration.
- Intelligent Paper Retrieval: Searches through a curated collection of LLM and AI systems papers
- Semantic Understanding: Uses vector embeddings and FAISS for similarity-based paper matching
- Automated Lesson Generation: Converts research paper sections into beginner-friendly explanations
- PDF Processing: Integrates with GROBID for structured document parsing
- LeetCode Integration: Fetches random coding problems for interview practice
- Conversational Interface: LangGraph-powered agent with memory and summarization
- Advanced Agent System: Routes requests dynamically between paper retrieval and coding practice tools
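To make the similarity-based matching concrete, here is a minimal sketch that ranks papers by cosine similarity between a query vector and stored paper vectors. It deliberately uses plain Python lists and hand-written toy vectors in place of real OpenAI embeddings and a FAISS index. The paper names, vector values, and the `cosine` and `top_k` helpers are illustrative assumptions, not the project's code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=2):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings" (hypothetical values for illustration only).
paper_vectors = {
    "attention-paper": [0.9, 0.1, 0.0],
    "quantization-paper": [0.1, 0.9, 0.2],
    "serving-paper": [0.0, 0.2, 0.9],
}

query_vector = [0.8, 0.2, 0.1]  # stand-in for an embedded user question
results = top_k(query_vector, paper_vectors, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A vector index such as FAISS serves the same purpose as `top_k` here, but avoids scanning every stored vector in pure Python, which matters once the collection grows.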
llm-teaching-assistant/
├── src/                          # Main source code
│   ├── agents/                   # AI agent implementations
│   │   ├── __init__.py
│   │   ├── teaching_agent.py     # LangGraph-powered teaching agent
│   │   └── state_management.py   # Agent state definitions
│   ├── data_fetching/            # Data retrieval and fetching
│   │   ├── __init__.py
│   │   ├── paper_fetcher.py      # Paper metadata and abstract retrieval
│   │   └── leetcode_fetcher.py   # LeetCode problem fetching
│   ├── embeddings/               # Vector embeddings and FAISS operations
│   │   ├── __init__.py
│   │   └── vector_store.py
│   ├── document_processing/      # PDF parsing and lesson generation
│   │   ├── __init__.py
│   │   ├── pdf_processor.py
│   │   └── lesson_generator.py
│   ├── retrieval/                # Paper retrieval and search
│   │   ├── __init__.py
│   │   └── paper_retriever.py    # Advanced retrieval with GROBID
│   └── __init__.py
├── config/                       # Configuration management
│   ├── __init__.py
│   └── settings.py
├── scripts/                      # Setup and example scripts
│   ├── setup_environment.py
│   └── example_usage.py          # Usage examples
├── requirements.txt              # Python dependencies
└── README.md                     # This file
- Clone the repository:
    git clone https://github.com/ahv15/llm-teaching-assistant.git
    cd llm-teaching-assistant
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables:
    export OPENAI_API_KEY="your-openai-api-key"
    export GROBID_URL="http://localhost:8070"  # Optional, defaults to localhost
- Install and start GROBID (for PDF processing):
    # Download GROBID from: https://github.com/kermitt2/grobid
    # Follow their installation instructions
    # Start GROBID service on port 8070
- Initialize the environment:
    python scripts/setup_environment.py
  This will:
    - Fetch paper metadata from the LLMSys repository
    - Download abstracts from arXiv
    - Create vector embeddings
    - Build the FAISS index
- Run the example:
    python scripts/example_usage.py
- Use the Teaching Agent:
    from src.agents.teaching_agent import TeachingAgent
    # Initialize the agent
    agent = TeachingAgent()
    # Ask about research topics
    result = agent.invoke({
        "messages": [
            {"role": "human", "content": "Teach me about transformer optimization techniques"}
        ],
        "context": {}
    }, {"configurable": {"thread_id": "session_1"}})
    print(result["messages"][-1].content)
    # Get coding practice
    coding_result = agent.invoke({
        "messages": [
            {"role": "human", "content": "Give me a LeetCode problem to practice"}
        ],
        "context": {}
    }, {"configurable": {"thread_id": "session_2"}})
    print(coding_result["messages"][-1].content)
- teaching_agent.py: LangGraph-powered conversational agent with integrated tools
- state_management.py: State definitions for the agent system
- paper_fetcher.py: Retrieves paper metadata from the LLMSys repository and abstracts from arXiv
- leetcode_fetcher.py: LeetCode problem fetching and processing
- paper_retriever.py: Advanced retrieval with GROBID integration and lesson generation
- pdf_processor.py: Interfaces with GROBID for PDF parsing and section extraction
- lesson_generator.py: Converts academic sections into beginner-friendly lessons
- vector_store.py: Manages OpenAI embeddings and FAISS vector operations
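As an illustration of the abstract retrieval that paper_fetcher.py is described as doing, the sketch below builds a lookup URL for the public arXiv API and parses abstracts out of an Atom response. It is an assumption about how such a fetcher could work, not the project's implementation: `build_query_url`, `parse_abstracts`, and the inline `SAMPLE_FEED` are hypothetical, and the sample feed is a hand-written placeholder so the example runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(arxiv_ids):
    """Build an arXiv API URL that looks up papers by ID."""
    query = urlencode({"id_list": ",".join(arxiv_ids)}, safe=",")
    return f"{ARXIV_API}?{query}"


def parse_abstracts(atom_xml):
    """Map each entry's title to its whitespace-normalized abstract."""
    root = ET.fromstring(atom_xml)
    abstracts = {}
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
        abstracts[" ".join(title.split())] = " ".join(summary.split())
    return abstracts


# Hand-written placeholder feed in the Atom shape the API returns.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example Paper Title</title>
    <summary>
      A placeholder abstract used only
      to demonstrate parsing.
    </summary>
  </entry>
</feed>"""

url = build_query_url(["1706.03762", "1810.04805"])
abstracts = parse_abstracts(SAMPLE_FEED)
print(url)
print(abstracts)
```

In a real fetcher the feed text would come from an HTTP GET on `url`; keeping the parsing in its own function makes it testable without hitting the network.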
The system can be configured via environment variables or the config/settings.py file:
- OPENAI_API_KEY: Your OpenAI API key (required)
- EMBEDDING_MODEL: Embedding model (default: "text-embedding-3-small")
- CHAT_MODEL: Chat model (default: "gpt-4o")
- GROBID_URL: GROBID service URL (default: "http://localhost:8070")
- FAISS_INDEX_PATH: Path to FAISS index file (default: "summary.faiss")
- URLS_JSON_PATH: Path to URLs JSON file (default: "urls.json")
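One way to read these variables with their documented defaults is sketched below. The `Settings` dataclass and `load_settings` function are hypothetical names chosen for illustration; the actual contents of config/settings.py may be organized differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    embedding_model: str
    chat_model: str
    grobid_url: str
    faiss_index_path: str
    urls_json_path: str


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # The only variable without a default, so fail early and clearly.
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        chat_model=env.get("CHAT_MODEL", "gpt-4o"),
        grobid_url=env.get("GROBID_URL", "http://localhost:8070"),
        faiss_index_path=env.get("FAISS_INDEX_PATH", "summary.faiss"),
        urls_json_path=env.get("URLS_JSON_PATH", "urls.json"),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({
    "OPENAI_API_KEY": "your-openai-api-key",
    "GROBID_URL": "http://grobid.internal:8070",  # hypothetical override
})
print(settings.chat_model, settings.grobid_url)
```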
from src.agents.teaching_agent import TeachingAgent
# Create agent instance
agent = TeachingAgent()
# Research paper learning
result = agent.invoke({
    "messages": [{"role": "human", "content": "Explain BERT architecture"}],
    "context": {"topic": "nlp"}
})
# Coding practice
result = agent.invoke({
    "messages": [{"role": "human", "content": "Give me a medium difficulty coding problem"}],
    "context": {"skill_level": "intermediate"}
})
# LeetCode problem fetching
from src.data_fetching.leetcode_fetcher import get_problem
problem = get_problem.invoke({})
print(f"Problem: {problem['title']}")
print(f"Difficulty: {problem['difficulty']}")
print(f"Statement: {problem['statement']}")
# Paper retrieval
from src.retrieval.paper_retriever import paper_retriever
lesson = paper_retriever.invoke({"query": "attention mechanisms in transformers"})
print(lesson)
from src.data_fetching.paper_fetcher import fetch_llm_sys_papers
from src.embeddings.vector_store import EmbeddingProcessor
# Fetch papers
papers = fetch_llm_sys_papers()
# Create embeddings
processor = EmbeddingProcessor()
embeddings = processor.create_embeddings(["sample text"])
- Python 3.8+
- OpenAI API key
- GROBID service (for PDF processing)
- Required Python packages (see requirements.txt)
Advanced conversation flow management with:
- Memory and conversation summarization
- Dynamic tool routing between paper retrieval and coding practice
- State persistence across conversations
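The memory and persistence ideas in this list can be sketched without any agent framework. The example below keeps a separate history per thread ID and folds older messages into a running summary once the history grows past a threshold. Everything here is a hypothetical stand-in: `ConversationStore`, `MAX_MESSAGES`, and the string-joining `summarize` function are illustrative, and a real agent would call a chat model to summarize and rely on its framework's checkpointing instead.

```python
from collections import defaultdict

MAX_MESSAGES = 4  # hypothetical threshold before older messages are summarized
KEEP_RECENT = 2   # number of most recent messages kept verbatim


def summarize(previous_summary, messages):
    """Placeholder summarizer: joins message text instead of calling a model."""
    joined = "; ".join(m["content"] for m in messages)
    return f"{previous_summary}; {joined}" if previous_summary else joined


class ConversationStore:
    """Keeps an independent summary and message list for each thread ID."""

    def __init__(self):
        self._threads = defaultdict(lambda: {"summary": "", "messages": []})

    def append(self, thread_id, role, content):
        state = self._threads[thread_id]
        state["messages"].append({"role": role, "content": content})
        if len(state["messages"]) > MAX_MESSAGES:
            older = state["messages"][:-KEEP_RECENT]
            state["summary"] = summarize(state["summary"], older)
            state["messages"] = state["messages"][-KEEP_RECENT:]
        return state


store = ConversationStore()
for i in range(1, 6):
    store.append("session_1", "human", f"question {i}")
other = store.append("session_2", "human", "unrelated question")
session = store.append("session_1", "human", "question 6")

print(session["summary"])
print([m["content"] for m in session["messages"]])
```

Keying state by thread ID mirrors the `thread_id` values passed in the usage examples: each session sees only its own history and summary.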
Automated coding problem fetching with:
- API-based problem retrieval from LeetCode
- Selenium-based fallback for dynamic content
- Problem filtering by difficulty (Medium/Hard focus)
- Integration with teaching workflow
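The difficulty filter can be illustrated with a small, offline sketch. It assumes each problem is a dict with the `title`, `difficulty`, and `statement` keys used in the usage example; the `pick_problem` helper and the sample records are hypothetical and do not reflect how leetcode_fetcher.py talks to LeetCode.

```python
import random

ALLOWED_DIFFICULTIES = {"Medium", "Hard"}


def pick_problem(problems, rng=random):
    """Pick a random problem whose difficulty is Medium or Hard."""
    candidates = [p for p in problems if p["difficulty"] in ALLOWED_DIFFICULTIES]
    if not candidates:
        raise ValueError("no Medium or Hard problems available")
    return rng.choice(candidates)


# Placeholder records standing in for fetched problems.
sample_problems = [
    {"title": "Sample Easy Problem", "difficulty": "Easy", "statement": "(placeholder)"},
    {"title": "Sample Medium Problem", "difficulty": "Medium", "statement": "(placeholder)"},
    {"title": "Sample Hard Problem", "difficulty": "Hard", "statement": "(placeholder)"},
]

# A seeded generator makes the choice repeatable in tests.
problem = pick_problem(sample_problems, random.Random(0))
print(f"Problem: {problem['title']}")
print(f"Difficulty: {problem['difficulty']}")
```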
Sophisticated paper retrieval featuring:
- GROBID integration for structured PDF parsing
- Section-by-section lesson generation
- Caching for improved performance
- Batch processing capabilities
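GROBID returns parsed papers as TEI XML, so section-by-section processing starts with pulling headings and paragraphs out of that document. The sketch below shows one way to do this with the standard library and adds a small in-memory cache. The `extract_sections` function, the cache size, and the `SAMPLE_TEI` string are assumptions for illustration; the sample is a hand-written fragment in the TEI shape, not real GROBID output.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


@lru_cache(maxsize=32)
def extract_sections(tei_xml):
    """Return (heading, text) tuples for each top-level body section."""
    root = ET.fromstring(tei_xml)
    sections = []
    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        heading = div.findtext("tei:head", default="Untitled", namespaces=TEI_NS)
        # itertext() also collects text nested in inline tags such as <ref>.
        paragraphs = [
            " ".join("".join(p.itertext()).split())
            for p in div.findall("tei:p", TEI_NS)
        ]
        sections.append((heading.strip(), " ".join(paragraphs)))
    return tuple(sections)


SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head>Introduction</head>
        <p>First paragraph.</p>
        <p>Second <ref>[1]</ref> paragraph.</p>
      </div>
      <div><head>Method</head><p>Method details.</p></div>
    </body>
  </text>
</TEI>"""

sections = extract_sections(SAMPLE_TEI)
sections_again = extract_sections(SAMPLE_TEI)  # served from the cache
for heading, text in sections:
    print(f"{heading}: {text}")
```

Each `(heading, text)` pair could then be handed to the lesson generator on its own, which is what makes per-section output and caching of already-parsed papers straightforward.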
Built specifically for education with:
- Beginner-friendly lesson generation
- Step-by-step explanations with examples
- Smooth transitions between topics
- Comprehensive course-like output
Feel free to submit issues, feature requests, or pull requests to improve the teaching assistant!
This project is open source and available under the MIT License.