LLM Teaching Assistant

A generative AI-powered teaching assistant that retrieves and explains research papers from arXiv, converts complex academic content into beginner-friendly lessons, and provides coding practice through LeetCode integration.

πŸš€ Features

  • Intelligent Paper Retrieval: Searches through a curated collection of LLM and AI systems papers
  • Semantic Understanding: Uses vector embeddings and FAISS for similarity-based paper matching
  • Automated Lesson Generation: Converts research paper sections into beginner-friendly explanations
  • PDF Processing: Integrates with GROBID for structured document parsing
  • LeetCode Integration: Fetches random coding problems for interview practice
  • Conversational Interface: LangGraph-powered agent with conversation memory, summarization, and dynamic tool routing

πŸ“ Project Structure

llm-teaching-assistant/
β”œβ”€β”€ src/                                    # Main source code
β”‚   β”œβ”€β”€ agents/                             # AI agent implementations
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ teaching_agent.py               # LangGraph-powered teaching agent
β”‚   β”‚   └── state_management.py             # Agent state definitions
β”‚   β”œβ”€β”€ data_fetching/                      # Data retrieval and fetching
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ paper_fetcher.py                # Paper metadata and abstract retrieval
β”‚   β”‚   └── leetcode_fetcher.py             # LeetCode problem fetching
β”‚   β”œβ”€β”€ embeddings/                         # Vector embeddings and FAISS operations
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── vector_store.py
β”‚   β”œβ”€β”€ document_processing/                # PDF parsing and lesson generation
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ pdf_processor.py
β”‚   β”‚   └── lesson_generator.py
β”‚   β”œβ”€β”€ retrieval/                          # Paper retrieval and search
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── paper_retriever.py              # Advanced retrieval with GROBID
β”‚   └── __init__.py
β”œβ”€β”€ config/                                 # Configuration management
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── settings.py
β”œβ”€β”€ scripts/                                # Setup and example scripts
β”‚   β”œβ”€β”€ setup_environment.py
β”‚   └── example_usage.py                    # Usage examples
β”œβ”€β”€ requirements.txt                        # Python dependencies
└── README.md                              # This file

πŸ”§ Installation

  1. Clone the repository:

    git clone https://github.com/ahv15/llm-teaching-assistant.git
    cd llm-teaching-assistant
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up environment variables:

    export OPENAI_API_KEY="your-openai-api-key"
    export GROBID_URL="http://localhost:8070"  # Optional, defaults to localhost
  4. Install and start GROBID (for PDF processing):

    # Download GROBID from: https://github.com/kermitt2/grobid
    # Follow their installation instructions
    # Start GROBID service on port 8070
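
One convenient way to run GROBID is via its published Docker image (the image name and tag below are assumptions based on the GROBID project's Docker Hub listing; check the GROBID documentation for the current tag):

```shell
# Run the GROBID service and expose it on port 8070
docker run --rm --init -p 8070:8070 lfoppiano/grobid:0.8.0

# In another terminal, verify the service is alive
curl http://localhost:8070/api/isalive
```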

πŸš€ Quick Start

  1. Initialize the environment:

    python scripts/setup_environment.py

    This will:

    • Fetch paper metadata from the LLMSys repository
    • Download abstracts from arXiv
    • Create vector embeddings
    • Build the FAISS index
  2. Run the example:

    python scripts/example_usage.py
  3. Use the Teaching Agent:

    from src.agents.teaching_agent import TeachingAgent
    
    # Initialize the agent
    agent = TeachingAgent()
    
    # Ask about research topics
    result = agent.invoke({
        "messages": [
            {"role": "human", "content": "Teach me about transformer optimization techniques"}
        ],
        "context": {}
    }, {"configurable": {"thread_id": "session_1"}})
    
    print(result["messages"][-1].content)
    
    # Get coding practice
    coding_result = agent.invoke({
        "messages": [
            {"role": "human", "content": "Give me a LeetCode problem to practice"}
        ],
        "context": {}
    }, {"configurable": {"thread_id": "session_2"}})
    
    print(coding_result["messages"][-1].content)

πŸ“š Core Components

Teaching Agent

  • teaching_agent.py: LangGraph-powered conversational agent with integrated tools
  • state_management.py: State definitions for the agent system

Data Fetching

  • paper_fetcher.py: Retrieves paper metadata from LLMSys repository and abstracts from arXiv
  • leetcode_fetcher.py: LeetCode problem fetching and processing

Paper Processing

  • paper_retriever.py: Advanced retrieval with GROBID integration and lesson generation
  • pdf_processor.py: Interfaces with GROBID for PDF parsing and section extraction
  • lesson_generator.py: Converts academic sections into beginner-friendly lessons

Supporting Components

  • vector_store.py: Manages OpenAI embeddings and FAISS vector operations
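
As an illustration of the similarity search that vector_store.py delegates to FAISS, here is a pure-Python cosine-similarity sketch; the vectors are toy stand-ins, not real OpenAI embeddings, and the paper ids are invented for the example:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k paper ids most similar to the query vector.
    FAISS does the same job, but with optimized index structures."""
    scored = sorted(index.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [paper_id for paper_id, _ in scored[:k]]

# Toy "embeddings" for three abstracts (hypothetical ids)
index = {
    "attention-paper": [0.9, 0.1, 0.0],
    "kv-cache-paper":  [0.7, 0.3, 0.1],
    "rlhf-paper":      [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index))  # ['attention-paper', 'kv-cache-paper']
```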

βš™οΈ Configuration

The system can be configured via environment variables or the config/settings.py file:

  • OPENAI_API_KEY: Your OpenAI API key (required)
  • EMBEDDING_MODEL: Embedding model (default: "text-embedding-3-small")
  • CHAT_MODEL: Chat model (default: "gpt-4o")
  • GROBID_URL: GROBID service URL (default: "http://localhost:8070")
  • FAISS_INDEX_PATH: Path to FAISS index file (default: "summary.faiss")
  • URLS_JSON_PATH: Path to URLs JSON file (default: "urls.json")
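
The environment-driven defaults above could be wired up roughly as follows; this is a sketch of the pattern, and the actual config/settings.py may be structured differently:

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Reads each setting from the environment, falling back to the
    documented default when the variable is unset."""
    openai_api_key: str = field(
        default_factory=lambda: os.environ["OPENAI_API_KEY"])  # required, no default
    embedding_model: str = field(
        default_factory=lambda: os.getenv("EMBEDDING_MODEL", "text-embedding-3-small"))
    chat_model: str = field(
        default_factory=lambda: os.getenv("CHAT_MODEL", "gpt-4o"))
    grobid_url: str = field(
        default_factory=lambda: os.getenv("GROBID_URL", "http://localhost:8070"))
    faiss_index_path: str = field(
        default_factory=lambda: os.getenv("FAISS_INDEX_PATH", "summary.faiss"))
    urls_json_path: str = field(
        default_factory=lambda: os.getenv("URLS_JSON_PATH", "urls.json"))
```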

πŸ“– Usage Examples

Main Teaching Agent

from src.agents.teaching_agent import TeachingAgent

# Create agent instance
agent = TeachingAgent()

# Research paper learning
result = agent.invoke({
    "messages": [{"role": "human", "content": "Explain BERT architecture"}],
    "context": {"topic": "nlp"}
})

# Coding practice
result = agent.invoke({
    "messages": [{"role": "human", "content": "Give me a medium difficulty coding problem"}],
    "context": {"skill_level": "intermediate"}
})

Individual Tool Usage

# LeetCode problem fetching
from src.data_fetching.leetcode_fetcher import get_problem

problem = get_problem.invoke({})
print(f"Problem: {problem['title']}")
print(f"Difficulty: {problem['difficulty']}")
print(f"Statement: {problem['statement']}")

# Paper retrieval
from src.retrieval.paper_retriever import paper_retriever

lesson = paper_retriever.invoke({"query": "attention mechanisms in transformers"})
print(lesson)

Manual Component Usage

from src.data_fetching.paper_fetcher import fetch_llm_sys_papers
from src.embeddings.vector_store import EmbeddingProcessor

# Fetch papers
papers = fetch_llm_sys_papers()

# Create embeddings
processor = EmbeddingProcessor()
embeddings = processor.create_embeddings(["sample text"])

πŸ› οΈ Requirements

  • Python 3.8+
  • OpenAI API key
  • GROBID service (for PDF processing)
  • Required Python packages (see requirements.txt)

🌟 Key Features

LangGraph Integration

Advanced conversation flow management with:

  • Memory and conversation summarization
  • Dynamic tool routing between paper retrieval and coding practice
  • State persistence across conversations
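
The tool-routing idea can be illustrated without LangGraph: a router inspects the latest message and dispatches to either the paper-retrieval tool or the coding-practice tool. The stub functions and keyword list below are placeholders for illustration, not the project's actual APIs:

```python
def retrieve_paper_lesson(query: str) -> str:
    # Placeholder for the FAISS + GROBID retrieval pipeline.
    return f"[lesson about: {query}]"

def fetch_coding_problem(query: str) -> str:
    # Placeholder for the LeetCode fetcher.
    return "[random LeetCode problem]"

CODING_KEYWORDS = ("leetcode", "coding", "interview", "problem")

def route(message: str) -> str:
    """Pick a tool from simple keyword matching; LangGraph expresses the
    same decision as a conditional edge between graph nodes."""
    text = message.lower()
    if any(word in text for word in CODING_KEYWORDS):
        return fetch_coding_problem(message)
    return retrieve_paper_lesson(message)

print(route("Give me a LeetCode problem"))   # [random LeetCode problem]
print(route("Teach me about attention"))     # [lesson about: Teach me about attention]
```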

LeetCode Tools

Automated coding problem fetching with:

  • API-based problem retrieval from LeetCode
  • Selenium-based fallback for dynamic content
  • Problem filtering by difficulty (Medium/Hard focus)
  • Integration with teaching workflow
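
The Medium/Hard difficulty filtering can be sketched like this; the problem records and field names here are illustrative, not the fetcher's real data model:

```python
import random

PROBLEMS = [
    {"title": "Two Sum", "difficulty": "Easy"},
    {"title": "LRU Cache", "difficulty": "Medium"},
    {"title": "Word Ladder", "difficulty": "Hard"},
    {"title": "Course Schedule", "difficulty": "Medium"},
]

def random_problem(allowed=("Medium", "Hard"), rng=random):
    """Pick a random problem whose difficulty is in the allowed set."""
    pool = [p for p in PROBLEMS if p["difficulty"] in allowed]
    return rng.choice(pool)

problem = random_problem()
print(problem["title"], "-", problem["difficulty"])
```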

Advanced Paper Processing

Sophisticated paper retrieval featuring:

  • GROBID integration for structured PDF parsing
  • Section-by-section lesson generation
  • Caching for improved performance
  • Batch processing capabilities
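
The caching bullet can be realized with functools.lru_cache, so repeated queries for the same section skip the expensive GROBID-plus-LLM round trip. generate_lesson below is a stand-in for the real pipeline, and the call counter exists only to make the cache hit visible:

```python
from functools import lru_cache

CALLS = {"count": 0}

@lru_cache(maxsize=128)
def generate_lesson(section_text: str) -> str:
    """Stand-in for parsing a section and asking the LLM for a lesson.
    lru_cache returns the memoized result on repeated inputs."""
    CALLS["count"] += 1
    return f"Lesson: {section_text[:30]}..."

generate_lesson("Attention is computed as softmax(QK^T/sqrt(d))V")
generate_lesson("Attention is computed as softmax(QK^T/sqrt(d))V")
print(CALLS["count"])  # 1 — the second call was served from the cache
```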

Teaching-Focused Design

Built specifically for education with:

  • Beginner-friendly lesson generation
  • Step-by-step explanations with examples
  • Smooth transitions between topics
  • Comprehensive course-like output

🀝 Contributing

Feel free to submit issues, feature requests, or pull requests to improve the teaching assistant!

πŸ“„ License

This project is open source and available under the MIT License.
