- Project Overview
- Architecture
- System Components
- File Structure & Descriptions
- Workflow Diagrams
- Installation & Setup
- Usage Guide
- Technical Specifications
- API Reference
- Development Guidelines
The Learning Agent System is an autonomous AI-powered tutoring platform that provides personalized, structured learning experiences through sequential checkpoint-based progression. Built on LangGraph, it combines intelligent content retrieval, adaptive assessment, and the Feynman Technique to ensure deep conceptual understanding.
- Structured Guidance: Sequential checkpoint-based learning paths
- Flexible Content: Dynamic web search + user-provided materials
- Rigorous Assessment: 70% mastery threshold enforcement
- Adaptive Simplification: Feynman Technique for concept re-teaching
- Mastery-Based Progression: No advancement without demonstrated understanding
- User Interface: Professional web UI and CLI options
- ✅ Multi-checkpoint learning path orchestration
- ✅ Dynamic content retrieval and validation
- ✅ AI-powered question generation (4 questions per checkpoint)
- ✅ Automated answer evaluation with detailed feedback
- ✅ Adaptive concept simplification using Feynman Technique
- ✅ Document upload support (PDF, DOCX, MD, TXT)
- ✅ Custom topic creation
- ✅ LangSmith integration for workflow observability
- ✅ ChromaDB vector storage for semantic search
- ✅ Professional Streamlit web interface
```mermaid
graph TB
    subgraph "User Interface Layer"
        A[Streamlit Web UI<br/>app.py]
        B[CLI Interface<br/>multi_checkpoint.py]
    end
    subgraph "Workflow Orchestration"
        C[LangGraph StateGraph<br/>workflow.py]
        D[Workflow Nodes<br/>workflow_nodes.py]
    end
    subgraph "Intelligence Layer"
        E[LLM Service<br/>Ollama + LangChain]
        F[Feynman Teacher<br/>Adaptive Simplification]
        G[Context Validator<br/>Relevance Scoring]
    end
    subgraph "Content Processing"
        H[Document Processor<br/>PDF/DOCX/MD/TXT]
        I[Context Processor<br/>Chunking + Embedding]
        J[Web Search<br/>DuckDuckGo]
    end
    subgraph "Data Persistence"
        K[ChromaDB<br/>Vector Store]
        L[Custom Topics<br/>JSON Storage]
        M[Session State<br/>LangGraph Checkpointer]
    end
    A --> C
    B --> C
    C --> D
    D --> E
    D --> F
    D --> G
    D --> H
    D --> I
    D --> J
    I --> K
    E --> F
    H --> I
    J --> I
    L --> D
    M --> C
```
| Layer | Technology | Purpose |
|---|---|---|
| Framework | LangGraph 0.2+ | Stateful workflow orchestration |
| LLM | Ollama (llama3.1) | AI reasoning and generation |
| Web UI | Streamlit 1.31+ | Professional web interface |
| Vector Store | ChromaDB 0.4+ | Document embeddings and retrieval |
| Search | DuckDuckGo | Dynamic content discovery |
| Embeddings | SentenceTransformers | Text vectorization |
| Document Processing | PyPDF2, python-docx | File upload support |
| Monitoring | LangSmith | Workflow tracing and debugging |
| Environment | Python 3.8+ | Core runtime |
The core orchestration system manages the complete learning lifecycle through a state machine:
States:
- `initialize` → Define checkpoint and objectives
- `collect_materials` → Gather user notes + web content
- `summarize_materials` → Condense collected materials
- `evaluate_milestone1` → Validate content relevance
- `process_context` → Chunk and embed content
- `generate_questions` → Create assessment questions
- `verify_understanding` → Evaluate learner responses
- `check_threshold` → Compare score vs 70% threshold
- `complete_checkpoint` ✅ / `feynman_teaching` 🔄
Routing Logic:
- Score ≥ 70% → Progress to next checkpoint
- Score < 70% → Feynman re-teaching → Retry questions
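The routing decision above can be sketched as a small conditional function — a minimal illustration, assuming the state is a dict carrying `understanding_score`; the actual `_route_after_threshold_check()` in `workflow.py` may differ:

```python
def route_after_threshold_check(state: dict) -> str:
    """Pick the next workflow node based on the 70% mastery threshold."""
    if state.get("understanding_score", 0.0) >= 0.70:
        return "complete_checkpoint"
    # Below threshold: re-teach with the Feynman Technique before retrying
    return "feynman_teaching"
```

A missing score is treated as 0.0, so an unevaluated state routes to re-teaching rather than silently passing.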
Handles all AI operations through Ollama integration:
- Question generation (4 questions per checkpoint)
- Answer simulation (for testing)
- Answer evaluation and scoring
- Concept extraction
- Simplified explanation generation
Implements adaptive teaching methodology:
- Identifies knowledge gaps from low-scoring answers
- Generates simplified explanations with analogies
- Uses simpler vocabulary and concrete examples
- Manages retry attempts (max 3 retries)
Multi-stage content preparation:
- Collection: User uploads + web search
- Validation: Relevance scoring against checkpoint objectives
- Processing: Text chunking (500-1000 chars)
- Embedding: Vector generation with SentenceTransformers
- Storage: ChromaDB persistence with metadata
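Once chunks are stored, retrieval ranks them by embedding similarity to the query. A minimal pure-Python sketch of that ranking — illustrative only; the project delegates this to ChromaDB:

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec: List[float], chunks: List[Dict], k: int = 3) -> List[Dict]:
    """Return the k stored chunks most similar to the query embedding."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]
```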
Dual interface support:
- Streamlit Web UI: Professional interface with progress tracking, file uploads, custom topics
- CLI: Terminal-based interactive sessions
Project/
├── app.py # Main Streamlit web application (762 lines)
├── requirements.txt # Python dependencies
├── goal.json # Project objectives and design specifications
├── custom_topics.json # User-created learning topics
├── learning_agent.log # Application logs
├── README.md # Quick start guide
├── LICENSE # Project license
├── .env # Environment variables (API keys)
├── .gitignore # Git exclusions
├── .streamlit/ # Streamlit configuration
│ └── config.toml # UI theme settings
├── chroma_db/ # ChromaDB vector database
├── .venv/ # Python virtual environment
└── src/ # Source code modules
Purpose: LangGraph workflow creation and routing logic
Key Functions:
- `create_unified_workflow()` - Creates complete CLI workflow with all nodes
- `create_question_generation_workflow()` - Creates partial workflow for Streamlit (stops before CLI prompts)
- `_route_after_threshold_check()` - Routes to completion or Feynman teaching based on score
- `_route_after_feynman()` - Decides retry or end after teaching
Dependencies: LangGraph, workflow_nodes, models
Purpose: Individual node implementations for each workflow stage
Key Nodes:
- `initialize_node()` - Set up checkpoint context
- `collect_materials_node()` - Gather user materials and web content
- `summarize_materials_node()` - Condense collected materials
- `evaluate_milestone1_node()` - Validate content quality
- `process_context_node()` - Chunk and embed content
- `generate_questions_node()` - Create assessment questions
- `verify_understanding_node()` - Evaluate learner answers
- `check_threshold_node()` - Compare score to 70% threshold
- `complete_checkpoint_node()` - Mark checkpoint as complete
- `feynman_teaching_node()` - Generate simplified explanations
Dependencies: All service modules (LLM, context, document processor)
Purpose: TypedDict definitions and data models
Key Models:
- `LearningAgentState` - Complete workflow state
- `Checkpoint` - Learning milestone definition
- `Material` - Learning content structure
- `ProcessedContext` - Embedded context chunk
- `GeneratedQuestion` - Question with metadata
- `LearnerAnswer` - User response structure
- `VerificationResult` - Evaluation results
Dependencies: Python typing module
Purpose: LLM integration for all AI operations
Key Methods:
- `generate_questions()` - Creates 4 contextual questions
- `simulate_learner_answer()` - Generates test answers
- `evaluate_answer_with_rag()` - Scores answers using context
- `extract_concepts()` - Identifies key concepts from text
- `generate_feynman_explanation()` - Creates simplified explanations
AI Model: Ollama (llama3.1:latest)
Integration: LangChain OllamaLLM
Monitoring: LangSmith tracing decorators
Purpose: Adaptive teaching using Feynman Technique
Key Methods:
- `identify_knowledge_gaps()` - Analyzes incorrect answers
- `generate_simplified_explanation()` - Creates simplified content
- `generate_analogy()` - Produces relevant analogies
- `generate_all_explanations()` - Batch explanation generation
Teaching Strategy:
- Identify questions with score < 0.7
- Extract concepts from weak areas
- Generate simple explanations with analogies
- Present teaching material
- Request retry (max 3 attempts)
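The first step of the strategy — collecting concepts from answers that scored below 0.7 — can be sketched as follows. This is illustrative: the real `identify_knowledge_gaps()` lives in the Feynman teacher module and its result shape may differ.

```python
from typing import Dict, List


def identify_gaps(verification_results: List[Dict], threshold: float = 0.7) -> List[Dict]:
    """Collect concepts from answers scoring below the threshold, weakest first."""
    gaps = []
    for result in verification_results:
        if result["score"] < threshold:
            gaps.append({
                "question_id": result["question_id"],
                "concepts": result.get("expected_concepts", []),
                # How far below the threshold the answer fell
                "severity": round(threshold - result["score"], 2),
            })
    # Teach the weakest concepts first
    return sorted(gaps, key=lambda g: g["severity"], reverse=True)
```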
Purpose: Text chunking and embedding generation
Key Functions:
- `chunk_text()` - Splits text into 500-1000 char chunks
- `generate_embeddings()` - Creates vector embeddings
- `store_in_vector_db()` - Persists to ChromaDB
- `retrieve_relevant_chunks()` - Semantic similarity search
Embedding Model: SentenceTransformers (all-MiniLM-L6-v2)
Chunk Strategy: Recursive character splitting with overlap
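The chunking step can be sketched as a simple sliding window with overlap — a rough approximation of the recursive character splitting the project actually uses, with illustrative function and parameter names:

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size chunks; neighbours share `overlap` characters."""
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size to create overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap preserves context that would otherwise be cut mid-sentence at chunk boundaries, which helps later semantic retrieval.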
Purpose: Extract text from various file formats
Supported Formats:
- PDF files (PyPDF2)
- Word documents (python-docx)
- Markdown files (native parsing)
- Plain text files (UTF-8)
Key Methods:
- `detect_file_type()` - Auto-detect format from extension
- `extract_text_from_pdf()` - PDF text extraction
- `extract_text_from_docx()` - Word document parsing
- `extract_text_from_markdown()` - Markdown parsing
- `process_uploaded_file()` - Unified file processing
Purpose: Validate content relevance to learning objectives
Key Functions:
- `calculate_relevance_score()` - Scores content against requirements
- `validate_context_coverage()` - Checks topic coverage
- `identify_missing_concepts()` - Detects content gaps
Scoring Algorithm: Keyword matching + semantic similarity (0.0-1.0 scale)
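The keyword-matching half of that score can be sketched as the fraction of requirement keywords found in the text — a minimal illustration; the actual implementation in the context validator may tokenize and weight differently:

```python
from typing import List


def keyword_relevance(text: str, requirements: List[str]) -> float:
    """Fraction of requirement keywords present in the text (0.0-1.0)."""
    words = set(text.lower().split())
    keywords = {w for req in requirements for w in req.lower().split()}
    if not keywords:
        return 0.0
    return len(keywords & words) / len(keywords)
```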
Purpose: Dynamic web content retrieval
Search Engine: DuckDuckGo
Key Functions:
- `search_web()` - Execute search queries
- `extract_content()` - Parse search results
- `generate_search_query()` - Optimize query from checkpoint
Rate Limiting: Built-in throttling to respect API limits
Purpose: Collect and manage user inputs
Key Functions:
- `collect_user_answers()` - Gather responses to questions (CLI)
- `display_score_feedback()` - Show evaluation results
- `confirm_progression()` - Get user confirmation
Interface: CLI-based input/output
Purpose: Handle file uploads in Streamlit
Key Functions:
- `get_upload_handler()` - Initialize upload manager
- `process_uploaded_files()` - Handle multiple files
- `validate_file_size()` - Check file constraints
Constraints: Max 10MB per file, specific formats only
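Those constraints can be enforced with a small check like the following — an illustrative sketch, not the module's actual API:

```python
import os

MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".md", ".txt"}


def validate_upload(filename: str, size_bytes: int) -> bool:
    """Accept only whitelisted formats at or under the 10 MB limit."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS and 0 < size_bytes <= MAX_FILE_SIZE
```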
Purpose: Create and manage user-defined learning topics
Storage: JSON file (custom_topics.json)
Key Functions:
- `load_custom_topics()` - Read from JSON
- `add_custom_topic()` - Create new topic
- `create_topic_wizard()` - Streamlit topic creator
Topic Structure:
```json
{
    "id": "unique_id",
    "name": "Topic Name",
    "description": "Auto-generated",
    "checkpoints": []
}
```

Purpose: Predefined learning paths for testing
Included Paths:
- Data Structures & Algorithms
- Machine Learning Fundamentals
- Web Development (React + Node.js)
- System Design
Function: create_learning_paths() - Returns all sample paths
Purpose: Generate learning materials dynamically
Key Functions:
- `generate_checkpoint_materials()` - Create content for checkpoint
- `retrieve_relevant_materials()` - Fetch from vector DB
- `augment_with_web_search()` - Supplement with web content
Purpose: LangSmith monitoring integration
Key Functions:
- `langsmith_config()` - Initialize tracing
- `get_langsmith_callbacks()` - Return callback handlers
- `trace_llm_operation()` - Decorator for operation tracing
Environment Variables:
- `LANGCHAIN_API_KEY`
- `LANGCHAIN_PROJECT`
- `LANGCHAIN_TRACING_V2`
Purpose: CLI entry point for multi-checkpoint sessions
Usage: python -m src.main
Flow: Initialize → Select path → Execute checkpoints → Track progress
Purpose: Multi-checkpoint session orchestration
Key Functions:
- `run_multi_checkpoint_session()` - Execute complete learning path
- `track_checkpoint_progress()` - Monitor completion
- `determine_next_checkpoint()` - Sequential progression
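Sequential progression amounts to picking the first not-yet-completed checkpoint. A minimal sketch, assuming checkpoints carry an `id` field — the real `determine_next_checkpoint()` may differ:

```python
from typing import Dict, List, Optional


def determine_next_checkpoint(path: List[Dict], completed: List[str]) -> Optional[Dict]:
    """Return the first checkpoint not yet completed, or None when the path is done."""
    for checkpoint in path:
        if checkpoint["id"] not in completed:
            return checkpoint
    return None
```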
Purpose: Package initialization and exports
Exports: All major functions and classes for external use
Purpose: Main Streamlit web application
Features:
- Modern responsive UI with custom theme
- File upload with drag-and-drop
- Custom topic creation wizard
- Real-time progress tracking
- Ollama status indicator
- Session state management
- Error handling with user-friendly messages
Key Sections:
- `render_header()` - Title and status bar
- `render_learning_path_selector()` - Path selection UI
- `render_file_upload()` - Document upload interface
- `render_learning_session()` - Active learning interface
- `render_results()` - Score display and progression
- `render_feynman_teaching()` - Adaptive teaching UI
Session State Variables:
- `selected_path` - Current learning path
- `current_checkpoint_index` - Progress tracker
- `learning_state` - Workflow state
- `questions` - Generated questions
- `answers` - User responses
- `stage` - UI stage (path_selection, learning, results, feynman)
Purpose: Streamlit UI theme configuration
Customizations:
- Primary color: #FF4B4B
- Background color: #FFFFFF
- Secondary background: #F0F2F6
- Font: Sans-serif
- Wide mode enabled
Purpose: Python dependency specifications
Categories:
- Core: LangGraph, LangChain, LangChain-Community
- LLM: langchain-ollama, ollama
- Vector DB: ChromaDB, sentence-transformers
- Web UI: Streamlit
- Document Processing: PyPDF2, python-docx
- Utilities: python-dotenv, httpx, nest-asyncio
Installation: pip install -r requirements.txt
```mermaid
flowchart TD
    Start([Start Session]) --> Init[Initialize Checkpoint]
    Init --> Collect[Collect Materials]
    Collect --> Summarize[Summarize Materials]
    Summarize --> Evaluate[Evaluate Relevance]
    Evaluate --> Process[Process Context<br/>Chunk & Embed]
    Process --> Generate[Generate Questions<br/>4 questions]
    Generate --> Verify[Verify Understanding<br/>Collect Answers]
    Verify --> Check{Score >= 70%?}
    Check -->|Yes| Complete[Complete Checkpoint]
    Check -->|No| Feynman[Feynman Teaching<br/>Simplify Concepts]
    Feynman --> Retry{Retry Count < 3?}
    Retry -->|Yes| Generate
    Retry -->|No| End([End Session])
    Complete --> Next{More Checkpoints?}
    Next -->|Yes| Init
    Next -->|No| End
    style Start fill:#90EE90
    style End fill:#FFB6C1
    style Check fill:#FFD700
    style Retry fill:#FFD700
    style Complete fill:#98FB98
    style Feynman fill:#87CEEB
```
```mermaid
flowchart LR
    A[Landing Page] --> B{Select Path Type}
    B -->|Predefined| C[Choose Sample Path]
    B -->|Custom| D[Create Custom Topic]
    C --> E[Upload Materials Optional]
    D --> E
    E --> F[Start Learning Session]
    F --> G[View Questions]
    G --> H[Submit Answers]
    H --> I{Score >= 70%?}
    I -->|Yes| J[View Results<br/>Next Checkpoint]
    I -->|No| K[Feynman Teaching<br/>Retry]
    K --> G
    J --> L{More Checkpoints?}
    L -->|Yes| F
    L -->|No| M[Completion Summary]
    style A fill:#E6F3FF
    style M fill:#D4EDDA
    style K fill:#FFF3CD
```
```mermaid
flowchart TB
    subgraph Input
        A1[User Uploads<br/>PDF/DOCX/MD/TXT]
        A2[Web Search Results<br/>DuckDuckGo]
    end
    subgraph Processing
        B1[Text Extraction]
        B2[Summarization]
        B3[Relevance Validation]
        B4[Text Chunking<br/>500-1000 chars]
        B5[Embedding Generation<br/>SentenceTransformers]
    end
    subgraph Storage
        C1[ChromaDB<br/>Vector Store]
        C2[Metadata Index]
    end
    subgraph Retrieval
        D1[Semantic Search]
        D2[Relevant Chunks]
    end
    A1 --> B1
    A2 --> B1
    B1 --> B2
    B2 --> B3
    B3 --> B4
    B4 --> B5
    B5 --> C1
    B5 --> C2
    C1 --> D1
    C2 --> D1
    D1 --> D2
    style C1 fill:#FFE6CC
    style D2 fill:#D4EDDA
```
```mermaid
sequenceDiagram
    participant L as Learner
    participant S as System
    participant F as Feynman Teacher
    participant LLM as LLM Service
    L->>S: Submit Answers
    S->>S: Evaluate (Score: 45%)
    S->>F: Score < 70%, Trigger Teaching
    F->>F: Identify Knowledge Gaps
    F->>LLM: Request Simplified Explanation
    LLM-->>F: Generate Analogies
    F->>L: Present Simplified Content
    L->>L: Review Teaching Material
    F->>S: Request Retry (Attempt 1/3)
    S->>S: Generate New Questions
    S->>L: Present Questions Again
    L->>S: Submit New Answers
    S->>S: Re-evaluate
    alt Score >= 70%
        S->>L: ✅ Checkpoint Complete
    else Score < 70% and Retries < 3
        S->>F: Continue Teaching
    else Max Retries Reached
        S->>L: ⚠️ Suggest Review & Retry Later
    end
```
```mermaid
flowchart TD
    A[Start] --> B[Receive Context Chunks<br/>& Checkpoint Requirements]
    B --> C[Combine Context<br/>Max 2000 chars]
    C --> D[Create Prompt with<br/>Learning Objectives]
    D --> E[LLM Generation]
    E --> F[Parse Response]
    F --> G{Valid Format?}
    G -->|No| H[Retry with Clearer Prompt]
    H --> E
    G -->|Yes| I[Extract 4 Questions]
    I --> J[Add Metadata<br/>question_id, context_chunks]
    J --> K[Return Questions]
    K --> L[End]
    style E fill:#FFE6CC
    style I fill:#D4EDDA
```
- Python: 3.8 or higher
- Ollama: Installed and running (`ollama serve`)
- Git: For cloning the repository
- Operating System: Windows, macOS, or Linux
```bash
git clone <repository-url>
cd Project
```

```bash
# Windows
python -m venv .venv
.venv\Scripts\activate

# macOS/Linux
python3 -m venv .venv
source .venv/bin/activate
```

```bash
pip install -r requirements.txt
```

Visit ollama.com and install for your OS.

```bash
# Pull the required model
ollama pull llama3.1
```

Create `.env` file in project root:

```bash
# LangSmith (Optional - for monitoring)
LANGCHAIN_API_KEY=your_api_key_here
LANGCHAIN_PROJECT=learning-agent
LANGCHAIN_TRACING_V2=true

# Ollama Configuration
OLLAMA_MODEL=llama3.1:latest
OLLAMA_BASE_URL=http://localhost:11434
```

```bash
# Test Ollama connection
python -c "from src.llm_service import LLMService; print('✓ LLM Service Ready')"

# Test all imports
python -c "from src import *; print('✓ All modules imported successfully')"
```

```bash
streamlit run app.py
```

Access at: http://localhost:8501
```bash
python -m src.multi_checkpoint
```

1. Launch Application

   ```bash
   streamlit run app.py
   ```

2. Select Learning Path
   - Choose from predefined paths (Data Structures, ML, Web Dev, System Design)
   - OR create a custom topic by entering a topic name
3. Upload Materials (Optional)
   - Drag & drop PDF/DOCX/MD/TXT files
   - Max 10MB per file
   - System automatically extracts and processes text
4. Begin Learning
   - Click "Start Learning Session"
   - System gathers context from uploads + web search
   - Generates 4 questions per checkpoint
5. Answer Questions
   - Type answers in the provided text areas
   - Minimum 20 characters per answer
   - Click "Submit Answers" when ready
6. View Results
   - See detailed score (0-100%)
   - Read feedback for each question
   - If score >= 70%: Progress to next checkpoint
   - If score < 70%: Receive Feynman teaching
7. Feynman Teaching (if needed)
   - Review simplified explanations
   - Study analogies and examples
   - Click "I understand, retry questions"
   - Maximum 3 retry attempts
8. Complete Path
   - Progress through all checkpoints
   - Track completion status
   - View final summary
- Click "📝 Create Custom Topic"
- Enter topic name (e.g., "Quantum Computing Basics")
- System auto-generates:
- Unique ID
- Description
- Saves to `custom_topics.json`
```bash
python -m src.multi_checkpoint
```

Interactive Flow:
- Select learning path from menu
- System executes workflow automatically
- Questions displayed in terminal
- Type answers and press Enter
- View scores and feedback
- Automatic progression or Feynman teaching
- Continue until all checkpoints complete
```bash
python -m src.main
```

```python
from src.workflow import create_unified_workflow
from src.models import LearningAgentState
from src.sample_data import create_learning_paths

# Create workflow
workflow = create_unified_workflow()
app = workflow.compile()

# Prepare initial state
paths = create_learning_paths()
checkpoint = paths[0]['checkpoints'][0]
initial_state = {
    "checkpoint": checkpoint,
    "context_chunks": [],
    "questions": [],
    "verification_results": []
}

# Execute workflow
result = app.invoke(initial_state)
print(f"Score: {result.get('understanding_score', 0)}")
```

```python
from src.document_processor import DocumentProcessor

processor = DocumentProcessor()

# Process PDF
text = processor.extract_text_from_pdf("path/to/file.pdf")

# Process Word document
text = processor.extract_text_from_docx("path/to/file.docx")

# Detect format automatically
file_type = processor.detect_file_type("document.pdf")  # Returns 'pdf'
```

```python
from src.llm_service import LLMService
import asyncio

llm_service = LLMService()

# Generate questions
context_chunks = [{"text": "Python is a programming language...", "chunk_id": "1"}]
requirements = ["Understand Python basics", "Know data types"]
questions = asyncio.run(
    llm_service.generate_questions(context_chunks, requirements)
)

# Evaluate answer
score = asyncio.run(
    llm_service.evaluate_answer_with_rag(
        question="What is Python?",
        answer="Python is a high-level programming language",
        context_chunks=context_chunks
    )
)
```

```python
{
    # Checkpoint definition
    "checkpoint": Checkpoint,
    "checkpoint_requirements": List[str],

    # Content collection
    "collected_materials": List[Material],
    "summarized_materials": str,
    "milestone1_score": float,
    "context_is_relevant": bool,

    # Processed context
    "context_chunks": List[ProcessedContext],

    # Question generation
    "questions": List[GeneratedQuestion],

    # User interaction
    "learner_answers": List[LearnerAnswer],

    # Evaluation
    "verification_results": List[VerificationResult],
    "understanding_score": float,
    "meets_threshold": bool,

    # Feynman teaching
    "feynman_explanations": List[Dict],
    "feynman_retry_requested": bool,
    "feynman_retry_count": int,

    # Metadata
    "checkpoint_completed": bool,
    "completion_timestamp": str
}
```

```json
{
    "question_id": "q_001",
    "question": "What is the time complexity of binary search?",
    "context_chunks": ["chunk_001", "chunk_003"],
    "expected_concepts": ["O(log n)", "divide and conquer"]
}
```

- Keyword Matching (40% weight)
  - Extract expected concepts from context
  - Check presence in answer
  - Score: matches / total_concepts
- Semantic Similarity (40% weight)
  - Embed answer and context chunks
  - Calculate cosine similarity
  - Score: max_similarity
- Completeness (20% weight)
  - Answer length >= 50 chars: Full points
  - 20-50 chars: Partial points
  - < 20 chars: Low points

Final Score = (0.4 × keyword_score) + (0.4 × semantic_score) + (0.2 × completeness_score)

- Pass: Score >= 0.70 (70%)
- Fail: Score < 0.70 → Trigger Feynman teaching
- Max Retries: 3 attempts per checkpoint
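The weighted combination and the pass/fail decision can be expressed directly — a minimal sketch with illustrative function names, not the project's actual API:

```python
def final_score(keyword: float, semantic: float, completeness: float) -> float:
    """Combine the three component scores with 40/40/20 weights."""
    return 0.4 * keyword + 0.4 * semantic + 0.2 * completeness


def meets_threshold(score: float, threshold: float = 0.70) -> bool:
    """Pass/fail decision against the 70% mastery threshold."""
    return score >= threshold
```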
```python
# Collection: "learning_context"
{
    "ids": ["chunk_001", "chunk_002", ...],
    "documents": ["Text content...", ...],
    "embeddings": [[0.1, 0.2, ...], ...],
    "metadatas": [
        {
            "checkpoint_id": "ds_001",
            "source": "web_search",
            "timestamp": "2026-01-18T10:30:00"
        },
        ...
    ]
}
```

- Check: HTTP request to `http://localhost:11434`
- Fallback: Display user-friendly error with setup instructions
- Retry: No automatic retry (user must fix)
- Validation: Size limit 10MB, format check
- Error Messages: Specific guidance for each error type
- Recovery: Allow re-upload without session loss
- Timeout: 30 seconds per operation
- Parsing Errors: Retry with clearer prompt
- Max Retries: 3 attempts before fallback
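The timeout-and-retry policy can be sketched with `asyncio.wait_for` — an illustrative pattern, assuming the caller passes an async factory for the LLM call and that `ValueError` stands in for a parse failure:

```python
import asyncio


async def call_llm_with_retry(operation, max_retries: int = 3, timeout: float = 30.0):
    """Run an async LLM call with a per-attempt timeout and bounded retries."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return await asyncio.wait_for(operation(), timeout=timeout)
        except (asyncio.TimeoutError, ValueError) as err:  # ValueError ~ parse failure
            last_error = err
    raise RuntimeError(f"LLM operation failed after {max_retries} attempts") from last_error
```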
Creates complete learning workflow with all nodes.
Returns: StateGraph - Compiled LangGraph workflow
Usage:

```python
workflow = create_unified_workflow()
app = workflow.compile()
result = app.invoke(initial_state)
```

Creates partial workflow for Streamlit (stops before CLI prompts).
Returns: StateGraph - Workflow ending at question generation
Generate assessment questions.
Parameters:
- `context_chunks` (List[ProcessedContext]): Embedded content
- `checkpoint_requirements` (List[str]): Learning objectives
Returns: List[GeneratedQuestion] - 4 questions
Score answer using RAG.
Parameters:
- `question` (str): The question asked
- `answer` (str): Learner's response
- `context_chunks` (List[ProcessedContext]): Reference context
Returns: float - Score between 0.0 and 1.0
Identify concepts needing simplification.
Parameters:
- `verification_results` (List[Dict]): Evaluation results
Returns: List[Dict] - Knowledge gaps with severity
Create Feynman-style explanation.
Parameters:
- `concept` (str): Concept to explain
- `context_chunks` (List[ProcessedContext]): Reference material
Returns: str - Simplified explanation with analogies
Extract text from PDF file.
Parameters:
- `file_path` (str): Path to PDF file
Returns: str - Extracted text content
Raises: Exception if file cannot be read
Score content relevance to objectives.
Parameters:
- `text` (str): Content to evaluate
- `requirements` (List[str]): Learning objectives
Returns: float - Relevance score 0.0-1.0
- One responsibility per file
- Clear naming conventions: `snake_case` for functions, `PascalCase` for classes
- Comprehensive docstrings: Google style
- Type hints: All function signatures
"""
Module description.
Detailed explanation of module purpose and capabilities.
"""
import logging
from typing import List, Dict, Optional
logger = logging.getLogger(__name__)
class ServiceName:
"""Service description."""
def __init__(self):
"""Initialize service."""
pass
def method_name(self, param: str) -> Dict:
"""
Method description.
Args:
param: Parameter description
Returns:
Return value description
Raises:
Exception: When error occurs
"""
logger.info(f"Processing {param}")
return {}# tests/test_llm_service.py
import pytest
from src.llm_service import LLMService
def test_question_generation():
"""Test question generation with sample context."""
llm = LLMService()
context = [{"text": "Python is...", "chunk_id": "1"}]
requirements = ["Understand Python"]
questions = asyncio.run(llm.generate_questions(context, requirements))
assert len(questions) == 4
assert all('question' in q for q in questions)# tests/test_workflow.py
def test_complete_workflow():
"""Test end-to-end workflow execution."""
workflow = create_unified_workflow()
app = workflow.compile()
initial_state = {...}
result = app.invoke(initial_state)
assert 'understanding_score' in result
assert result['checkpoint_completed'] == True- Define in
workflow_nodes.py:
def new_node(state: LearningAgentState) -> LearningAgentState:
"""New node description."""
logger.info("🔧 Executing new node")
# Node logic here
state["new_field"] = "value"
return state- Register in
workflow.py:
workflow.add_node("new_node", new_node)
workflow.add_edge("previous_node", "new_node")- Update
models.py:
class LearningAgentState(TypedDict):
...
new_field: str # Add new field- Update
document_processor.py:
def extract_text_from_new_format(self, file_path: str) -> str:
"""Extract from new format."""
# Implementation
return extracted_text
# Update supported_formats
self.supported_formats.append('.new')- Update
requirements.txt:
new-format-library>=1.0.0- Vector DB: ChromaDB persistent storage
- Session State: Streamlit session_state for UI
- LangGraph: Built-in checkpointing
```python
import asyncio
from typing import List


# Use async for I/O operations
async def process_multiple_files(files: List[str]):
    tasks = [process_file(f) for f in files]
    return await asyncio.gather(*tasks)
```

- DEBUG: Detailed debugging information
- INFO: General workflow progress (✅ ❌ 🔄 emojis)
- WARNING: Unusual situations
- ERROR: Error events

```python
logger.info("✅ Questions generated successfully")
logger.warning("⚠️ Context relevance below optimal")
logger.error("❌ Failed to connect to Ollama")
```

- Never commit `.env` to the repository
- Use python-dotenv for loading
- Validate all API keys before use
- Validate file types: Whitelist only safe formats
- Check file sizes: Enforce limits
- Sanitize filenames: Remove special characters
- Scan content: Validate before processing
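Filename sanitization from the list above can be sketched as follows — an illustrative snippet, not the project's actual implementation:

```python
import os
import re


def sanitize_filename(name: str) -> str:
    """Strip path components and replace unsafe characters with underscores."""
    name = os.path.basename(name)  # defeat path traversal like ../../etc/passwd
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    return name or "upload"
```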
- Input validation: Sanitize user inputs
- Output filtering: Check generated content
- Rate limiting: Prevent abuse
Symptom: WinError 10061 - No connection could be made
Solution:
```bash
# Start Ollama server
ollama serve

# Verify model is pulled
ollama list
ollama pull llama3.1
```

Symptom: `ModuleNotFoundError: No module named 'X'`

Solution:

```bash
# Reinstall dependencies
pip install -r requirements.txt

# Verify installation
pip list | grep langchain
```

Symptom: `sqlite3.OperationalError`

Solution:

```bash
# Delete and reinitialize database
rm -rf chroma_db/
python -c "from src.context_processor import ContextProcessor; ContextProcessor()"
```

Symptom: State not persisting between interactions

Solution:

- Use `st.session_state` for all persistent data
- Initialize with `if 'key' not in st.session_state:`
- Clear with browser refresh or "Clear Cache" button
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/new-feature`
3. Make changes with tests
4. Run tests: `pytest tests/`
5. Commit: `git commit -m "Add new feature"`
6. Push: `git push origin feature/new-feature`
7. Create a Pull Request
- ✅ Code follows style guidelines
- ✅ All tests pass
- ✅ Documentation updated
- ✅ Type hints added
- ✅ Logging implemented
- ✅ Error handling included
This project is licensed under the MIT License. See LICENSE file for details.
- LangChain/LangGraph: Workflow orchestration framework
- Ollama: Local LLM execution
- Streamlit: Web UI framework
- ChromaDB: Vector database
- SentenceTransformers: Embedding models
For questions, issues, or contributions:
- GitHub Issues: Report bugs and request features
- Documentation: This file and inline code comments
- LangSmith: Monitor workflow executions at smith.langchain.com