This project contains a Python script to download videos from Instagram and Google Drive links stored in CSV files.
- Downloads videos from Instagram reels and Google Drive links
- Processes CSV files with video IDs and URLs
- Saves downloaded videos to the `data/raw` folder with ID-based filenames
- Comprehensive logging and progress tracking
- Handles rate limiting and error recovery
- Install the required dependencies: `pip install -r requirements.txt`
- Make sure you have Python 3.7+ installed.
To download all videos from the CSV files:

    python scripts/download_links.py

The script will:
- Process both the `datatrain.csv` and `datatest.csv` files
- Download videos from Instagram and Google Drive links
- Save files as `{id}.mp4` in the `data/raw` folder
- Log progress to both console and `download_log.txt`
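
Logging to both the console and `download_log.txt` can be set up with the standard `logging` module. The following is a minimal sketch, not the script's actual code: the function name `build_logger`, the logger name, and the message format are assumptions made for illustration.

```python
import logging
import sys


def build_logger(log_path="download_log.txt"):
    """Return a logger that writes to the console and to log_path."""
    logger = logging.getLogger("download_links")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        handlers = (
            logging.StreamHandler(sys.stdout),                # console output
            logging.FileHandler(log_path, encoding="utf-8"),  # log file output
        )
        for handler in handlers:
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```

Calling `build_logger().info("Downloaded 1.mp4")` would then write the same line to the terminal and append it to the log file.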
The script expects CSV files with the following structure:
- Column A: Video ID (starting from row 2)
- Column B: Video URL (starting from row 2)
Example:
id,video
1,https://www.instagram.com/reel/ABC123/
2,https://drive.google.com/file/d/xyz789/view

- Downloaded videos are saved in the `data/raw/` folder
- Files are named as `{id}.mp4` (e.g., `1.mp4`, `2.mp4`)
- Progress and errors are logged to `download_log.txt`
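
The CSV layout and the `{id}.mp4` naming described above can be illustrated with a short sketch. It uses the standard `csv` module rather than pandas so that it runs without extra dependencies. The helper names `read_rows` and `output_path` are hypothetical and are not taken from the script.

```python
import csv
import io
from pathlib import Path

RAW_DIR = Path("data/raw")  # output folder named in this README


def read_rows(csv_file):
    """Yield (video_id, url) pairs, skipping the header and incomplete rows."""
    reader = csv.reader(csv_file)
    next(reader, None)  # row 1 is the header; data starts at row 2
    for row in reader:
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            yield row[0].strip(), row[1].strip()


def output_path(video_id, raw_dir=RAW_DIR):
    """Map a video ID to its target file, e.g. ID 1 -> data/raw/1.mp4."""
    return raw_dir / f"{video_id}.mp4"


# In-memory stand-in for one of the CSV files, using the example rows above.
sample = io.StringIO(
    "id,video\n"
    "1,https://www.instagram.com/reel/ABC123/\n"
    "2,https://drive.google.com/file/d/xyz789/view\n"
)
for video_id, url in read_rows(sample):
    print(output_path(video_id).as_posix(), url)
```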
- Instagram: Reels and posts (using yt-dlp)
- Google Drive: Direct file downloads
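
One way to decide which downloader a link needs is to inspect its hostname. The sketch below is an assumption about how such routing could work, not a description of the script's own logic. `classify` and `drive_file_id` are hypothetical names, and only the `/file/d/<id>/` link form from the example CSV is handled.

```python
import re
from urllib.parse import urlparse

# Matches the file ID in links of the form /file/d/<id>/view
DRIVE_ID_RE = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def classify(url):
    """Return 'instagram', 'google_drive', or 'unknown' based on the hostname."""
    host = (urlparse(url).hostname or "").lower()
    if host == "instagram.com" or host.endswith(".instagram.com"):
        return "instagram"
    if host == "drive.google.com":
        return "google_drive"
    return "unknown"


def drive_file_id(url):
    """Extract the file ID from a Google Drive share link, or None if absent."""
    match = DRIVE_ID_RE.search(urlparse(url).path)
    return match.group(1) if match else None
```

Links classified as `unknown` could be logged and skipped rather than treated as fatal errors.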
- The script continues downloading even if some files fail
- Failed downloads are logged with detailed error messages
- Duplicate files are skipped automatically
- Rate limiting is implemented to avoid being blocked
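
The behaviours listed above (continue after failures, skip existing files, pause between downloads) fit naturally into one loop. This is a minimal sketch under stated assumptions: `download_all` is a hypothetical name, `fetch` stands in for whatever function actually downloads one URL, and the default `delay` of one second is illustrative.

```python
import time
from pathlib import Path


def download_all(rows, fetch, raw_dir, delay=1.0, sleep=time.sleep, log=print):
    """Download each (video_id, url) pair and return a summary of outcomes."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    summary = {"downloaded": 0, "skipped": 0, "failed": 0}
    for video_id, url in rows:
        target = raw_dir / f"{video_id}.mp4"
        if target.exists():  # duplicate: file already downloaded
            summary["skipped"] += 1
            continue
        try:
            fetch(url, target)
            summary["downloaded"] += 1
        except Exception as exc:  # log the failure and keep going
            log(f"failed {video_id} ({url}): {exc}")
            summary["failed"] += 1
        sleep(delay)  # pause between attempts to reduce the risk of being blocked
    return summary
```

Passing `sleep` and `log` as parameters keeps the loop easy to test without real waiting or network access.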
- Instagram downloads may require authentication for private content
- Google Drive links must be publicly accessible
- The script includes a 1-second delay between downloads to avoid rate limiting
- Large files may take time to download depending on your internet connection
- Instagram download failures: Some Instagram content may be private or require login
- Google Drive access denied: Ensure the Google Drive links are publicly accessible
- Network errors: Check your internet connection and try again
- Permission errors: Ensure you have write permissions to the `data/raw` folder
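
To rule out a permission problem before a long run, the output folder can be checked up front. The snippet below is a small sketch using only the standard library. `can_write` is a hypothetical helper and is not part of the script.

```python
import os
from pathlib import Path


def can_write(folder):
    """Return True if folder exists and the current user may write to it."""
    folder = Path(folder)
    return folder.is_dir() and os.access(folder, os.W_OK)


if not can_write("data/raw"):
    print("data/raw is missing or not writable; create it or fix its permissions")
```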
- `pandas`: CSV file processing
- `requests`: HTTP requests for Google Drive downloads
- `yt-dlp`: Instagram video downloads
- `tqdm`: Progress bars
- `pathlib`: File path handling