sgworld123/Path-Generator

Learning Path Generator

An AI-powered backend that takes in multiple types of learning resources — PDFs, GitHub READMEs, YouTube videos, and blog articles — and generates an optimized, ordered curriculum for learning a topic.

Overview

Given a mixed set of sources, the system:

  1. Extracts text from each source (PDF parsing, GitHub README fetching, YouTube transcript extraction, web scraping)
  2. Extracts key concepts using KeyBERT
  3. Generates semantic embeddings using SentenceTransformers
  4. Scores each source's difficulty (1-10) using the Gemini API
  5. Builds a dependency graph between sources based on concept similarity and difficulty
  6. Removes cycles to ensure a valid ordering exists
  7. Generates three candidate learning paths:
    • Easy-first — progressive difficulty
    • Hard-first — challenge-first
    • Balanced — interleaved difficulty
  8. Uses Gemini as an LLM-judge to evaluate all three paths and recommend the best one, with reasoning

Tech Stack

  • Framework: FastAPI (async)
  • NLP: KeyBERT for concept/keyphrase extraction
  • Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
  • LLM: Google Gemini 2.5 Flash — difficulty scoring + path evaluation
  • Algorithms: cosine similarity, cycle detection, topological sort (heap-based and median-selection variants)
  • Source ingestion: BeautifulSoup (web scraping), youtube-transcript-api, GitHub raw README fetching

How It Works

1. Source Ingestion

The API accepts a mix of uploaded files and URLs in a single request. The type of each URL is auto-detected:

  • YouTube links → transcript extraction via youtube-transcript-api
  • GitHub links → raw README fetching
  • Other URLs → article text extraction via BeautifulSoup
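
A minimal sketch of the URL auto-detection, assuming it keys off the hostname. The function name `classify_url` and the exact host list are illustrative and are not taken from the repository.

```python
from urllib.parse import urlparse

# Hosts treated as YouTube in this sketch (an assumption, not the repo's list).
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtu.be"}


def classify_url(url: str) -> str:
    """Return 'youtube', 'github', or 'article' based on the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if host == "github.com":
        return "github"
    # Anything else falls through to generic article scraping.
    return "article"
```

Each label would then select the matching extractor (transcript, README, or article scraper).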

2. Concept Extraction & Embedding

Each source's text is processed through KeyBERT to extract key concepts (1-2 word keyphrases), and SentenceTransformers generates a semantic embedding for the full text.
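
The step above might look roughly like the following sketch. The helper names (`analyze_source`, `phrases_only`) and the `top_n` default are assumptions. The KeyBERT and SentenceTransformers calls are the libraries' public ones, imported lazily so the module loads without downloading a model.

```python
from typing import List, Tuple

MODEL_NAME = "all-MiniLM-L6-v2"


def phrases_only(keywords: List[Tuple[str, float]]) -> List[str]:
    """Drop the relevance scores from KeyBERT's (phrase, score) tuples."""
    return [phrase for phrase, _score in keywords]


def analyze_source(text: str, top_n: int = 10):
    """Return (concepts, embedding) for one source's text."""
    # Lazy imports: loading these libraries and the model is slow.
    from keybert import KeyBERT
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer(MODEL_NAME)
    kw_model = KeyBERT(model=encoder)  # reuse the same encoder for keyphrases
    keywords = kw_model.extract_keywords(
        text,
        keyphrase_ngram_range=(1, 2),  # 1-2 word keyphrases
        stop_words="english",
        top_n=top_n,                   # hypothetical default
    )
    embedding = encoder.encode(text)   # one vector for the whole text
    return phrases_only(keywords), embedding
```

Note that all-MiniLM-L6-v2 truncates long inputs by default, so a long document may need chunking and averaging before its embedding reflects the full text.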

3. Difficulty Scoring

Gemini rates each source's technical difficulty on a 1-10 scale. Scores are cached to avoid redundant API calls.
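
One way the caching could work is sketched below, assuming scores are keyed by a hash of the source text. The class `DifficultyCache` is hypothetical, and the Gemini call is injected as a plain function so the sketch runs without an API key.

```python
import hashlib
from typing import Callable, Dict


class DifficultyCache:
    """Memoize difficulty scores so each distinct text is scored once."""

    def __init__(self, scorer: Callable[[str], int]) -> None:
        self._scorer = scorer            # e.g. a function that calls Gemini
        self._scores: Dict[str, int] = {}

    def score(self, text: str) -> int:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._scores:
            raw = self._scorer(text)
            # Clamp to the documented 1-10 range in case the model strays.
            self._scores[key] = max(1, min(10, int(raw)))
        return self._scores[key]
```

Hashing the text rather than the file name means a re-uploaded file with the same content still hits the cache.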

4. Dependency Graph Construction

A directed graph is built where edges represent "should be learned before" relationships, determined by:

  • Concept similarity (cosine similarity ≥ 0.3)
  • Relative difficulty scores
  • Shared concept overlap (for sources of equal difficulty)
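
The three rules above can be sketched as follows. This is an illustration under assumptions rather than the repository's code: edges point from the easier source to the harder one, and for equal difficulty the source with fewer concepts goes first (a guessed tie-break). The output uses the same adjacency-list shape as the `graph` field of the API response.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_graph(
    embeddings: Dict[str, Sequence[float]],
    difficulty: Dict[str, int],
    concepts: Dict[str, Set[str]],
    threshold: float = 0.3,
) -> Dict[str, List[str]]:
    """Map each source to the sources that should be learned after it."""
    graph: Dict[str, List[str]] = {name: [] for name in embeddings}
    for a, b in combinations(sorted(embeddings), 2):
        if cosine(embeddings[a], embeddings[b]) < threshold:
            continue  # unrelated sources get no edge
        if difficulty[a] != difficulty[b]:
            first, second = (a, b) if difficulty[a] < difficulty[b] else (b, a)
        elif concepts[a] & concepts[b]:
            # Assumed tie-break: fewer concepts first, then name order.
            first, second = sorted((a, b), key=lambda n: (len(concepts[n]), n))
        else:
            continue  # equal difficulty and no shared concepts
        graph[first].append(second)
    return graph
```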

5. Cycle Removal

Any cycles in the dependency graph are detected and broken so a valid topological ordering exists.
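
A common way to do this is a depth-first search that drops every back edge. The README does not say which edges the project removes, so treat the sketch below as one possible approach. It assumes every node appears as a key in the graph.

```python
from typing import Dict, List

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def remove_cycles(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of the graph with every DFS back edge removed."""
    color = {node: WHITE for node in graph}
    acyclic: Dict[str, List[str]] = {node: [] for node in graph}

    def visit(node: str) -> None:
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                continue  # back edge: it would close a cycle, so drop it
            acyclic[node].append(nxt)
            if color[nxt] == WHITE:
                visit(nxt)
        color[node] = BLACK

    for node in graph:
        if color[node] == WHITE:
            visit(node)
    return acyclic
```

Recursion is fine for the handful of sources in a typical request; a very large graph would call for an explicit stack.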

6. Path Generation

Three topological sorts are run over the resulting graph:

  • Easy-first — min-heap ordered by difficulty
  • Hard-first — max-heap ordered by difficulty
  • Balanced — median-selection scheduling
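
The sketch below shows how the three orderings could be produced with Kahn's algorithm. The function names are hypothetical, and the balanced rule (pick the median-difficulty source among those currently unblocked) is a guess at what "median-selection scheduling" means here.

```python
import heapq
from typing import Dict, List


def _indegrees(graph: Dict[str, List[str]]) -> Dict[str, int]:
    indeg = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indeg[t] += 1
    return indeg


def heap_toposort(graph: Dict[str, List[str]], difficulty: Dict[str, int],
                  hardest_first: bool = False) -> List[str]:
    """Kahn's algorithm; the heap picks the easiest (or hardest) ready node."""
    sign = -1 if hardest_first else 1  # negate keys to get a max-heap
    indeg = _indegrees(graph)
    heap = [(sign * difficulty[n], n) for n, d in indeg.items() if d == 0]
    heapq.heapify(heap)
    order: List[str] = []
    while heap:
        _, node = heapq.heappop(heap)
        order.append(node)
        for t in graph[node]:
            indeg[t] -= 1
            if indeg[t] == 0:
                heapq.heappush(heap, (sign * difficulty[t], t))
    return order


def median_toposort(graph: Dict[str, List[str]],
                    difficulty: Dict[str, int]) -> List[str]:
    """Kahn's algorithm; always picks the median-difficulty ready node."""
    indeg = _indegrees(graph)
    ready = [n for n, d in indeg.items() if d == 0]
    order: List[str] = []
    while ready:
        ready.sort(key=lambda n: (difficulty[n], n))
        node = ready.pop(len(ready) // 2)
        order.append(node)
        for t in graph[node]:
            indeg[t] -= 1
            if indeg[t] == 0:
                ready.append(t)
    return order
```

All three variants respect the same dependency edges and differ only in which unblocked source they schedule next.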

7. LLM-Based Path Evaluation

All three paths, along with source metadata, are sent to Gemini, which returns a structured evaluation — pros, cons, best-fit learner type, and a recommended path with justification.
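
A hedged sketch of the judge step: build a prompt from the paths and metadata, then validate the JSON that comes back. The prompt wording and both function names are assumptions. The actual Gemini call is left out so the sketch stays self-contained.

```python
import json
from typing import Any, Dict, List

PATH_NAMES = ("easy_first", "hard_first", "balanced")
FENCE = "`" * 3  # models often wrap JSON replies in a Markdown code fence


def build_prompt(paths: Dict[str, List[str]], sources: Dict[str, Any]) -> str:
    """Assemble an evaluation prompt (wording is illustrative only)."""
    return (
        "Evaluate these three learning paths. Reply with JSON only, giving "
        "pros, cons, best_for and score per path, plus best_path and reason.\n"
        f"Sources: {json.dumps(sources)}\n"
        f"Paths: {json.dumps(paths)}"
    )


def parse_evaluation(reply: str) -> Dict[str, Any]:
    """Parse the model's reply and check that best_path is a known path."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line and everything from the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    if data.get("best_path") not in PATH_NAMES:
        raise ValueError(f"unexpected best_path: {data.get('best_path')!r}")
    return data
```

Validating `best_path` before returning guards against a reply that names a path the backend never generated.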

API

POST /generate-paths/

Accepts multipart form-data with files and/or URLs in any combination.

curl -X POST http://localhost:8000/generate-paths/ \
  -F "files=@intro.txt" \
  -F "files=@advanced.txt" \
  -F "urls=https://github.com/user/repo" \
  -F "urls=https://youtube.com/watch?v=VIDEO_ID" \
  -F "urls=https://some-blog.com/article"

Response shape:

{
  "sources": {
    "source_name": { "difficulty": 1, "concepts": ["..."] }
  },
  "graph": {
    "source_name": ["dependent_source_1", "dependent_source_2"]
  },
  "difficulty_scores": {
    "source_name": 1
  },
  "paths": {
    "easy_first": ["..."],
    "hard_first": ["..."],
    "balanced": ["..."]
  },
  "analysis": {
    "easy_first": { "pros": ["..."], "cons": ["..."], "best_for": "...", "score": 1 },
    "hard_first": { "pros": ["..."], "cons": ["..."], "best_for": "...", "score": 1 },
    "balanced": { "pros": ["..."], "cons": ["..."], "best_for": "...", "score": 1 },
    "best_path": "easy_first",
    "reason": "..."
  }
}
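
As a usage illustration, a client could pull the recommended ordering out of a response with this shape. The helper names are hypothetical; the field names come from the response shape above.

```python
from typing import Any, Dict, List


def recommended_path(response: Dict[str, Any]) -> List[str]:
    """Return the source ordering that the analysis marks as best."""
    best = response["analysis"]["best_path"]
    return response["paths"][best]


def describe(response: Dict[str, Any]) -> List[str]:
    """Format the recommended path as numbered lines with difficulty scores."""
    scores = response["difficulty_scores"]
    return [
        f"{i}. {name} (difficulty {scores[name]})"
        for i, name in enumerate(recommended_path(response), start=1)
    ]
```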

Setup

pip install fastapi uvicorn python-multipart keybert sentence-transformers scikit-learn \
            google-generativeai httpx beautifulsoup4 youtube-transcript-api

# Add your Gemini API key
# genai.configure(api_key="YOUR_API_KEY")

uvicorn main:app --reload

Status

🚧 This is a working backend — the full pipeline (multi-source ingestion, concept extraction, embeddings, dependency graph construction, path generation, and LLM-based evaluation) is functional and tested.

It does not have a frontend yet :( — I'm actively working on this, and within a few weeks there will be a UI to make this usable end-to-end :)


⭐ Show Some Love

Thanks for stopping by and checking this out! If you found it interesting or useful, a star on the repo would mean a lot — it keeps me motivated to keep building 🚀
