anilabhadatta/educative.io_scraper

Educative.io Scraper & Downloader

This tool efficiently scrapes and saves Educative.io courses, paths, projects, and cloud labs for offline use. It extracts course data directly via the Educative API and stores it in a local database for rendering.

Disclaimer: I am not accountable for any inappropriate use of this scraper. I developed it solely for research purposes and take no responsibility for its misuse.

  Repository Version: v4.0.36 (Recommended)
  Master Branch: v4-master

🚀 Getting Started

Prerequisites

  • Python 3.12 or higher
  • Supported OS: Windows / macOS / Linux

Installation

Clone the repository and install dependencies using the automated setup script.

git clone https://github.com/anilabhadatta/educative.io_scraper.git
cd educative.io_scraper

# For Windows:
python setup.py --install
python setup.py --run

# For macOS/Linux:
python3 setup.py --install
python3 setup.py --run

Note: --install creates a virtual environment and installs dependencies. --run starts the scraper GUI.


Recommended GUI Settings
---

🛠️ How to Use (Recommended Workflow)

The scraper is optimized to use the API-JSON-Scraper, which is significantly faster, cleaner, and more reliable than traditional browser automation.

Step 1: Generate the Course URLs Excel File

To quickly get the exact URLs of all available courses, paths, and projects:

  1. Open the Scraper GUI (python EducativeScraper.py).
  2. Select All-Course-Urls-Text-File-Generator as the Scraper Type.
  3. Click Start Scraper.
  4. The script will fetch the latest sitemaps and API data. Once finished, it generates a file named educative_sitemap_analysis_updated.xlsx in your project folder, containing the links organized by category.
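
As a rough illustration of the sitemap part of this step, the sketch below parses a sitemap XML document with Python's standard library and groups the URLs by their first path segment. The sample XML, the function name categorize_sitemap_urls, and the grouping rule are assumptions made for illustration; the scraper's own logic and the layout of the generated Excel file may differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def categorize_sitemap_urls(sitemap_xml: str) -> dict[str, list[str]]:
    """Group every <loc> URL in a sitemap by its first path segment."""
    root = ET.fromstring(sitemap_xml)
    groups: dict[str, list[str]] = defaultdict(list)
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if not url:
            continue
        segments = [s for s in urlparse(url).path.split("/") if s]
        category = segments[0] if segments else "root"
        groups[category].append(url)
    return dict(groups)


# Hypothetical sample data; a real sitemap would be fetched over HTTP.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.educative.io/courses/example-course</loc></url>
  <url><loc>https://www.educative.io/path/example-path</loc></url>
  <url><loc>https://www.educative.io/courses/another-course</loc></url>
</urlset>"""

if __name__ == "__main__":
    for category, urls in categorize_sitemap_urls(SAMPLE).items():
        print(category, len(urls))
```

Grouping by path segment is only one plausible way to categorize links; the spreadsheet produced by the tool remains the authoritative source for the URLs you paste into your download list.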

Step 2: Prepare Your Download List

  1. Create a plain text file (e.g., urls.txt).
  2. Open the generated educative_sitemap_analysis_updated.xlsx file.
  3. Copy the Topic Link URLs for the items you want to download and paste them into your text file.
    • Note: For Projects, just use the main Project Link from the spreadsheet, not the topic link.
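
Before handing the text file to the scraper, it can help to tidy it up. The following is a minimal sketch, assuming one URL per line; the helper name clean_url_list and the filtering rules are hypothetical and not part of the scraper. It strips whitespace, drops blank or non-Educative lines, and removes duplicates while preserving order.

```python
from urllib.parse import urlparse


def clean_url_list(lines: list[str]) -> list[str]:
    """Return unique Educative URLs in their original order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        if parsed.scheme not in ("http", "https"):
            continue  # skip anything that is not a web URL
        if host != "educative.io" and not host.endswith(".educative.io"):
            continue  # skip links to other sites
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


if __name__ == "__main__":
    # Hypothetical input pasted from the spreadsheet.
    raw = [
        "https://www.educative.io/courses/example-course/intro ",
        "",
        "https://www.educative.io/courses/example-course/intro",
        "not a url",
    ]
    print("\n".join(clean_url_list(raw)))
```

To apply it to a real file, read urls.txt with Path("urls.txt").read_text().splitlines() and write the cleaned list back, one URL per line.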

Step 3: Run the API Scraper

  1. In the Scraper GUI, select API-JSON-Scraper as the Scraper Type.
  2. Select the Text File you created in Step 2.
  3. Select a Save Directory where you want the database to be stored.
  4. Click Login Account to authenticate your Educative session. A browser will open. Log in, and once authenticated, close the browser window.
  5. Click Start Scraper. The scraper will use internal APIs to download the courses cleanly into a local database.

Step 4: Scraping Public Content (Answers, Blog, Newsletter)

If you want to download free public content from Educative (e.g., Blog posts, Edpresso Answers, Newsletters):

  1. Ensure your text file contains the desired public URLs (these are also categorized in the Excel file generated in Step 1).
  2. In the Scraper GUI, select Public-Content-Scraper as the Scraper Type.
  3. Select your Text File and Save Directory.
  4. Click Start Scraper. The tool will use clean internal JSON APIs to fetch public pages and store them in the database identically to standard courses.

Step 5: Extract and Download Assets

Once the courses or public content are scraped, use the GUI to run Extract Assets, followed by Download Assets. This step will fetch all embedded images, SVGs, and files referenced in the content.
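
Conceptually, asset extraction means finding every file reference inside the stored content. The sketch below walks a nested JSON structure and collects URLs that end in common asset extensions. The extension list, the function name extract_asset_urls, and the sample payload are assumptions; the scraper's real Extract Assets step may detect assets differently.

```python
from urllib.parse import urlparse

# Hypothetical set of extensions treated as downloadable assets.
ASSET_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".pdf", ".zip")


def extract_asset_urls(node) -> list[str]:
    """Recursively collect unique asset-like URLs from nested JSON data."""
    found: list[str] = []

    def walk(value) -> None:
        if isinstance(value, dict):
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)
        elif isinstance(value, str):
            if not value.startswith(("http://", "https://")):
                return
            # Compare the path only, so query strings do not hide the extension.
            path = urlparse(value).path.lower()
            if path.endswith(ASSET_EXTENSIONS) and value not in found:
                found.append(value)

    walk(node)
    return found


if __name__ == "__main__":
    # Hypothetical topic payload.
    topic = {
        "title": "Intro",
        "components": [
            {"type": "image", "src": "https://www.educative.io/static/diagram.svg"},
            {"type": "text", "body": "No assets here."},
        ],
    }
    print(extract_asset_urls(topic))
```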


📖 Viewing the Courses

To view your newly downloaded courses, you should use the Educative-Viewer V5. The viewer reads the database generated by this scraper and dynamically renders the courses in an interface extremely close to the native Educative.io experience.


⚠️ Important Notes & Tips

  • Overwrite Option: If you enable "Overwrite" to force redownloads, note that scraping will begin exactly from the topic link provided in your text file. Because the links generated in the Excel file are not guaranteed to be the very first lesson of a course, using them with Overwrite may result in only partially overwriting the course from that mid-point onward. To overwrite an entire course, ensure you manually provide its true first topic link.
  • Auto Resume: Automatically restarts the scraper process up to 3 times if it crashes due to errors, reading the log to resume from the exact failed URL.
  • DB Skipping: Independently of Auto Resume, the database tracks the status of each topic. If you restart the scraper, it skips topics that are already downloaded and continues with the remaining ones.
  • Auto Fix URL: Automatically updates your text file to remove URLs that have been fully scraped, preventing unnecessary rescraping.
  • SeleniumBase(uc mode): Runs the browser in an undetected mode to bypass anti-bot challenges (like Cloudflare) during login. Enable this if you get blocked.
  • Retry Failed URLs: Used by the Asset Downloader to specifically retry downloading any images or static assets that failed during previous runs.
  • Proxies: You can configure IP proxies natively in the GUI if needed.
  • Headless Mode: If you do not want to see the browser window during operations, choose the headless option.
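
The interplay of Auto Resume and DB Skipping described above can be sketched as a retry loop that skips finished items. This is a simplified model, not the scraper's implementation: it keeps finished URLs in an in-memory set instead of a database or log file, restarts a loop rather than a process, and the names run_with_auto_resume and scrape_one are hypothetical.

```python
from typing import Callable, Iterable


def run_with_auto_resume(
    urls: Iterable[str],
    scrape_one: Callable[[str], None],
    done: set[str],
    max_restarts: int = 3,
) -> bool:
    """Scrape every URL, restarting after a failure up to max_restarts times.

    URLs already in `done` are skipped, so each restart continues with
    whatever is left. Returns True if every URL finished.
    """
    pending = list(urls)
    restarts = 0
    while True:
        try:
            for url in pending:
                if url in done:
                    continue  # finished in an earlier attempt
                scrape_one(url)
                done.add(url)
            return True
        except Exception:
            if restarts >= max_restarts:
                return False  # give up after the allowed restarts
            restarts += 1


if __name__ == "__main__":
    remaining_failures = {"topic-2": 1}  # hypothetical: fails once, then works

    def flaky_scrape(url: str) -> None:
        if remaining_failures.get(url, 0) > 0:
            remaining_failures[url] -= 1
            raise RuntimeError(f"simulated crash on {url}")

    finished: set[str] = set()
    ok = run_with_auto_resume(["topic-1", "topic-2", "topic-3"], flaky_scrape, finished)
    print(ok, sorted(finished))
```

Because completed URLs are recorded as they finish, a restart never repeats work that already succeeded, which is the same idea behind skipping already-downloaded topics.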

About

Educative.io Course Downloader developed using Python and Selenium. Refer to Readme.md for setup instructions.
