Target Scrapers - Python

Production-ready Python scrapers for extracting product category, product data, and product search results from target.com using BeautifulSoup, Playwright, or Selenium.

A collection of production-ready Python scrapers for extracting data from target.com. Build Target product category, product data, and product search scrapers using BeautifulSoup, Playwright, or Selenium, with optional ScrapeOps integration to help with Target's anti-bot measures.

📊 What Data You Can Scrape

These Python scrapers extract data from target.com:

Target Category Listing Pages (product_category) - Extract product listings from category/browse pages with pagination and subcategory navigation
Target Product Pages (product_data) - Extract detailed product information including specifications, pricing, images, reviews, and seller details
Target Search Result Pages (product_search) - Extract search results with product listings, pagination, related searches, and sponsored products

📁 Scraper Structure

Each scraper type in the Target repository follows this structure:

BeautifulSoup/
├── product_data/
│   ├── scraper/
│   │   └── {site}_scraper_product_v1.py
│   ├── example/
│   │   └── product.json
│   └── README.md
├── product_search/
│   ├── scraper/
│   │   └── {site}_scraper_product_search_v1.py
│   ├── example/
│   │   └── product_search.json
│   └── README.md
├── product_category/
│   ├── scraper/
│   │   └── {site}_scraper_product_category_v1.py
│   ├── example/
│   │   └── product_category.json
│   └── README.md
├── reviews/          # Coming soon
└── sellers/          # Coming soon

Each scraper directory contains:

scraper/ - Implementation files
example/ - Sample JSON output files
README.md - Detailed documentation for that scraper

🚀 Features

Multiple Framework Support: Choose from BeautifulSoup, Playwright, or Selenium
Production-Ready: Battle-tested scrapers with error handling and retry logic
Anti-Bot Protection: Optional ScrapeOps support that may help with proxy rotation and request optimization
Comprehensive Data Extraction: Product data, search results, and category listings
JSONL Output Format: Efficient, line-by-line JSON output for easy processing
Well-Documented: Detailed READMEs for each scraper with examples and troubleshooting
Active Maintenance: Regular updates to handle Target's changing HTML structure

📋 Requirements

Python: Python 3.7 or higher
pip: Python package manager
ScrapeOps API Key (optional, for proxy support): Free tier available at https://scrapeops.io/app/register/ai-builder
Virtual Environment (recommended): For dependency isolation

🎯 Quick Start

Choose a framework based on your needs (see comparison below)
Navigate to the framework directory and follow its README for setup
Get your ScrapeOps API key from https://scrapeops.io/app/register/ai-builder

For framework-specific setup and usage, see:

BeautifulSoup/README.md - BeautifulSoup implementation
Playwright/README.md - Playwright implementation
Selenium/README.md - Selenium implementation

📚 Supported Frameworks

Framework	Speed	JavaScript Rendering	Dependencies	Browser Required	Best For
BeautifulSoup	⚡⚡⚡ Very Fast	❌ No	✅ Minimal	❌ None	Static HTML, high volume
Playwright	⚡⚡ Medium	✅ Yes	⚠️ Moderate	✅ Chromium/Firefox/WebKit	Modern JS sites, cross-browser
Selenium	⚡⚡ Medium	✅ Yes	⚠️ Moderate	✅ Chrome/Firefox/Edge	Legacy support, WebDriver

Framework Documentation

BeautifulSoup - BeautifulSoup implementation
Playwright - Playwright implementation
Selenium - Selenium implementation

🛡️ Anti-Bot Protection

All scrapers can integrate with ScrapeOps to help handle Target's anti-bot measures:

Proxy Rotation: May help distribute requests across multiple IP addresses
Request Header Optimization: May optimize headers to reduce detection
Rate Limiting Management: Built-in rate limiting and retry logic

Note: Anti-bot measures vary by site and may change over time. CAPTCHA challenges may occur and cannot be guaranteed to be resolved automatically. Using proxies and browser automation can help reduce blocking, but effectiveness depends on the target site's specific anti-bot measures.

Free Tier Available: ScrapeOps offers a free tier suitable for testing and small-scale scraping.

Get your API key at https://scrapeops.io/app/register/ai-builder
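
As a rough illustration of how a request might be routed through a proxy API, the sketch below only builds the proxied URL and sends nothing. It assumes the ScrapeOps proxy endpoint is `https://proxy.scrapeops.io/v1/` and that it accepts `api_key` and `url` query parameters; confirm both against the ScrapeOps documentation. The function name `build_proxy_url` and the environment variable `SCRAPEOPS_API_KEY` are hypothetical and not taken from this repository.

```python
import os
from urllib.parse import urlencode

# Assumed endpoint; verify against the ScrapeOps documentation before use.
SCRAPEOPS_PROXY_ENDPOINT = "https://proxy.scrapeops.io/v1/"


def build_proxy_url(target_url: str, api_key: str) -> str:
    """Return a URL that asks the proxy endpoint to fetch target_url."""
    if not api_key:
        raise ValueError("A ScrapeOps API key is required")
    # urlencode percent-encodes the target URL so it survives as one parameter.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPEOPS_PROXY_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Hypothetical environment variable name, used for illustration only.
    key = os.environ.get("SCRAPEOPS_API_KEY", "YOUR_API_KEY")
    print(build_proxy_url("https://www.target.com/", key))
```

Keeping the key in an environment variable rather than in source code makes it harder to commit it by accident.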

📦 Output Format

All scrapers output data in JSONL format (one JSON object per line):

Efficient: Each line is a complete JSON object
Streamable: Process line-by-line without loading the entire file
Database-Friendly: Easy to import into databases
Large Dataset Support: Handles millions of records efficiently

Example output file: {site}_com_product_page_scraper_data_20260114_120000.jsonl
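
To make the JSONL format concrete, here is a minimal sketch of appending records and reading them back one line at a time using only the standard library. The helper names `append_jsonl` and `read_jsonl` and the record fields (`name`, `price`) are illustrative assumptions; the scrapers in this repository may structure their output differently.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def append_jsonl(path: Path, records: Iterable[dict]) -> int:
    """Append each record as one JSON object per line; return the count written."""
    count = 0
    with path.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield records one at a time without loading the whole file."""
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


if __name__ == "__main__":
    out = Path("example_output.jsonl")  # hypothetical file name
    append_jsonl(out, [{"name": "Desk Lamp", "price": 19.99}])
    for row in read_jsonl(out):
        print(row["name"], row["price"])
```

Because `read_jsonl` is a generator, memory use stays roughly constant regardless of file size.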

🤔 Choosing the Right Framework

Use BeautifulSoup when:

✅ Pages don't require JavaScript rendering
✅ You need maximum speed and throughput
✅ You want minimal dependencies
✅ You're scraping static HTML content

Use Playwright when:

✅ Pages require JavaScript rendering
✅ You need cross-browser support
✅ You want a modern async/await API
✅ You need to interact with dynamic elements

Use Selenium when:

✅ Pages require JavaScript rendering
✅ You prefer a mature, widely used framework
✅ You need WebDriver protocol support
✅ You want extensive community resources
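
One way to judge whether a page needs JavaScript rendering is to check whether the text you expect already appears in the raw HTML response. The sketch below is a heuristic under that assumption, using only the standard library; `present_in_static_html` is a hypothetical helper, not part of these scrapers. If the expected text is missing from the static HTML, a browser-based framework is likely the better fit.

```python
from html.parser import HTMLParser
from typing import List


class _VisibleText(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def present_in_static_html(html: str, expected_text: str) -> bool:
    """Return True if expected_text appears in the visible static text."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return expected_text.lower() in " ".join(parser.chunks).lower()


if __name__ == "__main__":
    sample = "<html><body><h1>Desk Lamp</h1></body></html>"
    print(present_in_static_html(sample, "desk lamp"))
```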

⚠️ Common Issues & Solutions

Issue: "ModuleNotFoundError: No module named 'bs4'"

Solution: Install the beautifulsoup4 package, which provides the bs4 module:

pip install beautifulsoup4 requests lxml
# Or use a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install beautifulsoup4 requests lxml

Issue: "Playwright browsers not installed"

Solution:

playwright install chromium
# Or install all browsers
playwright install

Issue: "Rate limiting or blocked requests"

Solution:

Verify that your ScrapeOps API key is correct
Check ScrapeOps dashboard for account status
Reduce concurrency settings
Increase delays between requests
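
The last two suggestions can be sketched as a retry helper with exponential backoff and jitter. This is a generic pattern, not the retry logic shipped in these scrapers; `retry_with_backoff` and its parameters are hypothetical names. The `sleep` argument is injectable so the helper can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the last error
            # Exponential backoff (1x, 2x, 4x, ...) capped at max_delay,
            # plus random jitter so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

In practice you would likely narrow `except Exception` to the network or HTTP errors your client raises, so that programming errors fail fast instead of being retried.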

Issue: "Empty output or missing data"

Solution:

Verify that the URL format is correct
Check whether Target has changed its HTML structure
Update selectors if needed
Enable debug logging to see extraction steps
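
As one possible way to surface extraction steps, the sketch below switches a logger to DEBUG and logs whether each field was found. The logger name `target_scraper` and the helpers `configure_logging` and `extract_field` are assumptions for illustration; check each scraper's README for the logging options it actually exposes.

```python
import logging


def configure_logging(debug: bool = False) -> logging.Logger:
    """Return a logger set to DEBUG or INFO with a single stream handler."""
    logger = logging.getLogger("target_scraper")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def extract_field(record: dict, field: str, logger: logging.Logger):
    """Look up a field and log whether it was found."""
    value = record.get(field)
    if value is None:
        logger.debug("Field %r missing; the selector may be outdated", field)
    else:
        logger.debug("Field %r extracted: %r", field, value)
    return value


if __name__ == "__main__":
    log = configure_logging(debug=True)
    extract_field({"title": "Desk Lamp"}, "title", log)
    extract_field({"title": "Desk Lamp"}, "price", log)
```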

🔗 Alternative Implementations

This repository also provides Node.js implementations:

Node.js Cheerio & Axios - Cheerio & Axios implementation
Node.js Playwright - Playwright implementation
Node.js Puppeteer - Puppeteer implementation

📖 Best Practices

Use Virtual Environments: Isolate dependencies per project
Respect Rate Limits: Use appropriate delays and concurrency settings
Monitor ScrapeOps Usage: Track your API usage in the ScrapeOps dashboard
Handle Errors Gracefully: Implement proper error handling and logging
Validate URLs: Ensure URLs are valid Target pages before scraping
Update Selectors Regularly: Target may change its HTML structure
Test Regularly: Test scrapers regularly to catch breaking changes early
Handle Missing Data: Some products may not have all fields; handle null values appropriately
Browser Management: For browser automation, ensure proper cleanup and resource management
Use JSONL Format: Efficient for large datasets and streaming processing
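
Two of the practices above, validating URLs and handling missing data, can be sketched with the standard library alone. The helpers `is_target_url` and `normalize_record` and the field names used are hypothetical; the host check assumes that any `target.com` host or subdomain counts as a valid page, which may be broader than what a given scraper accepts.

```python
from typing import Any, Dict, Iterable
from urllib.parse import urlparse


def is_target_url(url: str) -> bool:
    """Return True for http(s) URLs whose host is target.com or a subdomain."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme in ("http", "https") and (
        host == "target.com" or host.endswith(".target.com")
    )


def normalize_record(raw: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Return a record with every expected field, using None when absent."""
    return {field: raw.get(field) for field in fields}


if __name__ == "__main__":
    print(is_target_url("https://www.target.com/"))
    print(normalize_record({"name": "Desk Lamp"}, ["name", "price"]))
```

Emitting explicit `None` values keeps every output line on the same schema, which simplifies later imports into a database.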

📚 Resources & Documentation

Framework Documentation

BeautifulSoup: BeautifulSoup documentation
Playwright: Playwright documentation
Selenium: Selenium documentation

External Resources

ScrapeOps Documentation: https://scrapeops.io/docs
Python Documentation: https://docs.python.org/

Project Resources

Root README: ../README.md - Overview of all implementations
Framework READMEs: See individual framework directories for specific guides
Scraper READMEs: See individual scraper directories for detailed documentation

⚖️ License

These scrapers are provided as-is for educational and commercial use. Please ensure compliance with Target's Terms of Service and robots.txt when using them.

See LICENSE for full license details.

⚠️ Disclaimer

This software is provided for educational and commercial purposes. Users are responsible for ensuring their use complies with:

Target's Terms of Service
Target's robots.txt
Applicable laws and regulations
Rate limiting and respectful scraping practices

The authors and contributors are not responsible for any misuse of this software.
