Production-ready Python scrapers for extracting product category, product data, and product search results from target.com using BeautifulSoup, Playwright, or Selenium.
A collection of production-ready Python scrapers for extracting data from target.com. It covers Target product category, product data, and product search pages, with one implementation each for BeautifulSoup, Playwright, and Selenium, and optional ScrapeOps integration for anti-bot handling.
These Python scrapers extract data from target.com:
- Target Category Listing Pages (product_category) - Extract product listings from category/browse pages with pagination and subcategory navigation
- Target Product Pages (product_data) - Extract detailed product information including specifications, pricing, images, reviews, and seller details
- Target Search Result Pages (product_search) - Extract search results with product listings, pagination, related searches, and sponsored products
Each scraper type in the Target repository follows this structure:
BeautifulSoup/
├── product_data/
│ ├── scraper/
│ │ └── {site}_scraper_product_v1.py
│ ├── example/
│ │ └── product.json
│ └── README.md
├── product_search/
│ ├── scraper/
│ │ └── {site}_scraper_product_search_v1.py
│ ├── example/
│ │ └── product_search.json
│ └── README.md
├── product_category/
│ ├── scraper/
│ │ └── {site}_scraper_product_category_v1.py
│ ├── example/
│ │ └── product_category.json
│ └── README.md
├── reviews/ # Coming soon
└── sellers/ # Coming soon
Each scraper directory contains:
- scraper/ - Implementation files
- example/ - Sample JSON output files
- README.md - Detailed documentation for that scraper
- Multiple Framework Support: Choose from BeautifulSoup, Playwright, Selenium
- Production-Ready: Battle-tested scrapers with error handling and retry logic
- Anti-Bot Protection: Optional ScrapeOps support that may help with proxy rotation and request optimization
- Comprehensive Data Extraction: Product data, search results, and category listings
- JSONL Output Format: Efficient, line-by-line JSON output for easy processing
- Well-Documented: Detailed READMEs for each scraper with examples and troubleshooting
- Active Maintenance: Regular updates to handle Target's changing HTML structure
- Python: Python 3.7 or higher
- pip: Python package manager
- ScrapeOps API Key: Free tier available at https://scrapeops.io/app/register/ai-builder
- Virtual Environment (recommended): For dependency isolation
- Choose a framework based on your needs (see comparison below)
- Navigate to the framework directory and follow its README for setup
- Get your ScrapeOps API key from https://scrapeops.io/app/register/ai-builder
For framework-specific setup and usage, see:
- BeautifulSoup/README.md - BeautifulSoup implementation
- Playwright/README.md - Playwright implementation
- Selenium/README.md - Selenium implementation
| Framework | Speed | JavaScript | Dependencies | Browser | Best For |
|---|---|---|---|---|---|
| BeautifulSoup | ⚡⚡⚡ Very Fast | ❌ No | ✅ Minimal | ❌ None | Static HTML, high volume |
| Playwright | ⚡⚡ Medium | ✅ Yes | | ✅ Chromium/Firefox/WebKit | Modern JS sites, cross-browser |
| Selenium | ⚡⚡ Medium | ✅ Yes | | ✅ Chrome/Firefox/Edge | Legacy support, WebDriver |
- BeautifulSoup - BeautifulSoup implementation
- Playwright - Playwright implementation
- Selenium - Selenium implementation
All scrapers can integrate with ScrapeOps to help handle Target's anti-bot measures:
- Proxy Rotation: May help distribute requests across multiple IP addresses
- Request Header Optimization: May optimize headers to reduce detection
- Rate Limiting Management: Built-in rate limiting and retry logic
Note: Anti-bot measures vary by site and may change over time. CAPTCHA challenges may occur and cannot be guaranteed to be resolved automatically. Using proxies and browser automation can help reduce blocking, but effectiveness depends on the target site's specific anti-bot measures.
Free Tier Available: ScrapeOps offers a generous free tier perfect for testing and small-scale scraping.
Get your API key at https://scrapeops.io/app/register/ai-builder
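As a rough illustration of what the proxy integration involves, the sketch below wraps a Target URL in a proxy request URL. The endpoint `https://proxy.scrapeops.io/v1/` and the `api_key` and `url` query parameters are assumptions here and should be checked against the ScrapeOps documentation; `build_proxy_url` is a hypothetical helper, not a function from this repository.

```python
from urllib.parse import urlencode

# Assumed proxy endpoint; confirm it in the ScrapeOps documentation before use.
SCRAPEOPS_PROXY_ENDPOINT = "https://proxy.scrapeops.io/v1/"


def build_proxy_url(api_key: str, target_url: str) -> str:
    """Return a URL that asks the proxy endpoint to fetch target_url."""
    if not api_key:
        raise ValueError("A ScrapeOps API key is required")
    # urlencode percent-encodes the page URL so it survives as one parameter.
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPEOPS_PROXY_ENDPOINT}?{query}"
```

The resulting URL can then be passed to an HTTP client, for example `requests.get(build_proxy_url(key, page_url), timeout=60)`.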
All scrapers output data in JSONL format (one JSON object per line):
- Efficient: Each line is a complete JSON object
- Streamable: Process line-by-line without loading entire file
- Database-Friendly: Easy to import into databases
- Large Dataset Support: Handles millions of records efficiently
Example output file: {site}_com_product_page_scraper_data_20260114_120000.jsonl
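The JSONL properties listed above can be sketched with the standard library alone. The helper names (`output_filename`, `write_jsonl`, `read_jsonl`) and the record fields used in the usage note are illustrative assumptions, not the scrapers' actual code; the filename pattern mirrors the example output file shown above.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def output_filename(site: str, now: datetime) -> str:
    """Build a timestamped name following the example output file pattern."""
    return f"{site}_com_product_page_scraper_data_{now:%Y%m%d_%H%M%S}.jsonl"


def write_jsonl(path: Path, records: Iterable[Dict[str, Any]]) -> int:
    """Append records to path, one JSON object per line; return the count."""
    count = 0
    with path.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield records one at a time without loading the whole file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                yield json.loads(line)
```

For example, `write_jsonl(Path(output_filename("target", datetime.now())), records)` appends each scraped record as it arrives, and `for record in read_jsonl(path): ...` processes the file as a stream.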
Choose BeautifulSoup when:
- ✅ Pages don't require JavaScript rendering
- ✅ You need maximum speed and throughput
- ✅ You want minimal dependencies
- ✅ You're scraping static HTML content
Choose Playwright when:
- ✅ Pages require JavaScript rendering
- ✅ You need cross-browser support
- ✅ You want a modern async/await API
- ✅ You need to interact with dynamic elements
Choose Selenium when:
- ✅ Pages require JavaScript rendering
- ✅ You prefer a mature, widely used framework
- ✅ You need WebDriver protocol support
- ✅ You want extensive community resources
Solution:
pip install beautifulsoup4 requests lxml
# Or use virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install beautifulsoup4 requests lxml
Solution:
playwright install chromium
# Or install all browsers
playwright install
Solution:
- Verify ScrapeOps API key is correct
- Check ScrapeOps dashboard for account status
- Reduce concurrency settings
- Increase delays between requests
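One way to increase delays between requests is to retry failed fetches with exponential backoff. The sketch below is a minimal illustration under stated assumptions: `fetch` is any caller-supplied function that raises on failure, and `backoff_delays` and `fetch_with_retry` are hypothetical names, not the retry logic shipped with these scrapers.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays that double on each retry, never exceeding cap seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_with_retry(
    fetch: Callable[[str], T],
    url: str,
    retries: int = 3,
    base: float = 1.0,
    jitter: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url), retrying up to `retries` times with growing delays."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:  # narrow this to your HTTP client's errors
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Random jitter keeps parallel workers from retrying in lockstep.
            sleep(delays[attempt] + random.uniform(0, jitter))
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; in normal use the default `time.sleep` applies.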
Solution:
- Verify URL format is correct
- Check whether Target has changed its HTML structure
- Update selectors if needed
- Enable debug logging to see extraction steps
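To make selector problems visible, extraction can log which selector matched for each field. The sketch below assumes `soup` is a BeautifulSoup object (or anything exposing `select_one`, whose results expose `get_text`); `extract_first`, the logger name, and the selectors in the usage note are hypothetical and not taken from the scrapers in this repository.

```python
import logging
from typing import Any, Optional, Sequence

logger = logging.getLogger("target_scraper")  # hypothetical logger name


def extract_first(soup: Any, field: str, selectors: Sequence[str]) -> Optional[str]:
    """Try CSS selectors in order; return the first non-empty text found."""
    for selector in selectors:
        node = soup.select_one(selector)
        if node is None:
            logger.debug("%s: selector %r matched nothing", field, selector)
            continue
        text = node.get_text(strip=True)
        if text:
            logger.debug("%s: selector %r matched %r", field, selector, text)
            return text
        logger.debug("%s: selector %r matched an empty node", field, selector)
    logger.warning("%s: no selector matched; the page layout may have changed", field)
    return None
```

Calling `logging.basicConfig(level=logging.DEBUG)` once at startup prints each step. A call such as `extract_first(soup, "title", ["h1.product-title", "h1"])` then tries a specific selector first and falls back to a generic one, so updating selectors after a layout change means editing a single list.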
This repository also provides Node.js implementations:
- Node.js Cheerio & Axios - Cheerio & Axios implementation
- Node.js Playwright - Playwright implementation
- Node.js Puppeteer - Puppeteer implementation
- Use Virtual Environments: Isolate dependencies per project
- Respect Rate Limits: Use appropriate delays and concurrency settings
- Monitor ScrapeOps Usage: Track your API usage in the ScrapeOps dashboard
- Handle Errors Gracefully: Implement proper error handling and logging
- Validate URLs: Ensure URLs are valid Target pages before scraping
- Update Selectors Regularly: Target may change its HTML structure
- Test Regularly: Test scrapers regularly to catch breaking changes early
- Handle Missing Data: Some products may not have all fields; handle null values appropriately
- Browser Management: For browser automation, ensure proper cleanup and resource management
- Use JSONL Format: Efficient for large datasets and streaming processing
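Two of the practices above, validating URLs and handling missing data, can be sketched with the standard library. The helper names and the field names in the usage note are illustrative assumptions; the real scrapers may use different checks and schemas.

```python
from typing import Any, Dict, Iterable, Mapping
from urllib.parse import urlparse


def is_target_url(url: str) -> bool:
    """True for http(s) URLs whose host is target.com or a subdomain of it."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    on_target = host == "target.com" or host.endswith(".target.com")
    return parsed.scheme in ("http", "https") and on_target


def normalize_record(raw: Mapping[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Return a record with every expected field; absent or empty becomes None."""
    record = {}
    for name in fields:
        value = raw.get(name)
        record[name] = None if value == "" else value
    return record
```

For instance, `normalize_record({"name": "Lamp"}, ["name", "price", "rating"])` yields a record where `price` and `rating` are `None`, so every output line carries the same keys and downstream code can test for missing values explicitly.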
- BeautifulSoup: BeautifulSoup documentation
- Playwright: Playwright documentation
- Selenium: Selenium documentation
- ScrapeOps Documentation: https://scrapeops.io/docs
- Python Documentation: https://docs.python.org/
- Root README: ../README.md - Overview of all implementations
- Framework READMEs: See individual framework directories for specific guides
- Scraper READMEs: See individual scraper directories for detailed documentation
These scrapers are provided as-is for educational and commercial use. Please ensure compliance with Target's Terms of Service and robots.txt when using them.
See LICENSE for full license details.
This software is provided for educational and commercial purposes. Users are responsible for ensuring their use complies with:
- Target's Terms of Service
- Target's robots.txt
- Applicable laws and regulations
- Rate limiting and respectful scraping practices
The authors and contributors are not responsible for any misuse of this software.