A collection of production-ready Python scrapers for extracting data from target.com using BeautifulSoup. These scrapers provide fast, lightweight HTML parsing without requiring browser automation, making them ideal for scraping static HTML content.
This directory contains three BeautifulSoup scrapers: product_category, product_data, and product_search.
Documentation: product_category/README.md
Quick Start:
cd product_category
pip install beautifulsoup4 requests lxml
python scraper/target_scraper_product_category_v1.py
Documentation: product_data/README.md
Quick Start:
cd product_data
pip install beautifulsoup4 requests lxml
python scraper/target_scraper_product_data_v1.py
Documentation: product_search/README.md
Quick Start:
cd product_search
pip install beautifulsoup4 requests lxml
python scraper/target_scraper_product_search_v1.py
BeautifulSoup is a good choice when:
- ✅ You need fast, lightweight scraping
- ✅ JavaScript rendering is not required
- ✅ You want minimal dependencies
- ✅ You're scraping simple HTML pages
- ✅ You need high throughput without the overhead of running a browser
- ✅ You prefer the Python ecosystem
Consider browser automation (Playwright or Selenium) when:
- ❌ Pages require JavaScript rendering
- ❌ Content is dynamically loaded
- ❌ You need to interact with page elements
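A quick way to decide between the two approaches is to check whether the data you want is present in the raw HTML at all. The sketch below assumes the page has already been fetched without running JavaScript (for example with `requests.get(url).text`); the helper name `needs_browser` and the selectors are hypothetical and not part of the scrapers in this directory.

```python
from bs4 import BeautifulSoup


def needs_browser(raw_html: str, selector: str) -> bool:
    """Return True if `selector` matches no non-empty element in the raw HTML.

    raw_html is the page source as served, before any JavaScript runs.
    If the element is missing or empty there, the content is probably
    rendered client-side and a browser-based scraper is a better fit.
    """
    # "html.parser" ships with Python; "lxml" is faster if installed.
    soup = BeautifulSoup(raw_html, "html.parser")
    element = soup.select_one(selector)
    return element is None or not element.get_text(strip=True)


# Hypothetical pages: one server-rendered, one empty JavaScript app shell.
static_page = "<html><body><h1 class='title'>Desk Lamp</h1></body></html>"
js_shell = "<html><body><div id='root'></div><script src='app.js'></script></body></html>"

print(needs_browser(static_page, "h1.title"))  # False
print(needs_browser(js_shell, "h1.title"))     # True
```

This is only a heuristic: a page can serve some fields statically and load others later, so run the check against each field you need.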
- Python: Python 3.7 or higher
- pip: Python package manager
- ScrapeOps API Key: For anti-bot protection (free tier available)
- Navigate to the specific scraper directory:
cd product_category # or product_data, product_search
- Install dependencies:
pip install beautifulsoup4 requests lxml
- Get your ScrapeOps API key from https://scrapeops.io/app/register/ai-builder
- Update the API key in the scraper file:
API_KEY = 'YOUR-API-KEY'
All scrapers can integrate with ScrapeOps to help handle Target's anti-bot measures:
- Proxy rotation (may help reduce IP blocking)
- Request header optimization (can help reduce detection)
- Rate limiting management
Note: Anti-bot measures vary by site and may change over time. CAPTCHA challenges may occur and cannot be guaranteed to be resolved automatically. Using proxies and browser automation can help reduce blocking, but effectiveness depends on the target site's specific anti-bot measures.
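Rate limiting can also be enforced on the client side, independently of ScrapeOps. Below is a minimal sketch of retrying with exponential backoff on HTTP 429 and 5xx responses. The name `polite_fetch`, the retry count, the base delay, and the set of retryable statuses are illustrative assumptions, not settings taken from these scrapers. The fetch function is injected so the logic runs without network access; with requests you would pass a small wrapper that calls `requests.get(url, timeout=30)` and returns `(response.status_code, response.text)`.

```python
import logging
import time
from typing import Callable, Optional, Tuple

logger = logging.getLogger("scraper")

# Statuses that usually mean "slow down" or a transient server problem.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def polite_fetch(
    url: str,
    fetch: Callable[[str], Tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Fetch `url`, backing off exponentially on retryable failures.

    Returns the body on HTTP 200, or None when a non-retryable status is
    returned or all retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            status, body = fetch(url)
        except OSError as exc:  # requests' ConnectionError/Timeout subclass OSError
            logger.warning("Attempt %d for %s failed: %s", attempt + 1, url, exc)
            status, body = None, ""
        if status == 200:
            return body
        if status is not None and status not in RETRYABLE_STATUSES:
            logger.error("Giving up on %s: HTTP %s", url, status)
            return None
        if attempt < max_retries:
            delay = base_delay * (2 ** attempt)  # 2s, 4s, 8s, ...
            logger.info("Retrying %s in %.1fs (status=%s)", url, delay, status)
            sleep(delay)
    logger.error("Giving up on %s after %d attempts", url, max_retries + 1)
    return None


# Offline demo with canned responses instead of real HTTP calls.
canned = iter([(503, ""), (429, ""), (200, "<html>ok</html>")])
waits = []
print(polite_fetch("https://www.target.com/", lambda u: next(canned), sleep=waits.append), waits)
# <html>ok</html> [2.0, 4.0]
```

Pair this with a fixed pause (for example `time.sleep`) between successive pages so that backoff handles bursts of errors while the steady request rate stays low.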
Free Tier Available: ScrapeOps offers a free tier suitable for testing and small-scale scraping.
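One way to route a request through ScrapeOps is to wrap the target URL in a call to the ScrapeOps Proxy API endpoint (`https://proxy.scrapeops.io/v1/` with `api_key` and `url` query parameters). The sketch below also reads the key from an environment variable instead of hard-coding it. The variable name `SCRAPEOPS_API_KEY` and the helper `scrapeops_url` are assumptions; the scraper files may build their requests differently, so check them before changing anything.

```python
import os
from urllib.parse import urlencode

SCRAPEOPS_ENDPOINT = "https://proxy.scrapeops.io/v1/"

# Hypothetical env var; falls back to the placeholder used in the scraper files.
API_KEY = os.environ.get("SCRAPEOPS_API_KEY", "YOUR-API-KEY")


def scrapeops_url(target_url: str, api_key: str) -> str:
    """Return a URL that asks the ScrapeOps proxy to fetch `target_url`."""
    # urlencode percent-encodes the target URL so its own query string survives.
    return SCRAPEOPS_ENDPOINT + "?" + urlencode({"api_key": api_key, "url": target_url})


proxied = scrapeops_url("https://www.target.com/s?searchTerm=lamp", API_KEY)
# With requests installed: requests.get(proxied, timeout=60)
```

Keeping the key in the environment avoids committing it to version control alongside the scraper code.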
All scrapers output data in JSONL format (one JSON object per line):
- Each line represents one product/result
- Can be written and read incrementally, so large datasets need not fit in memory
- Easy to process line-by-line
- Can be imported into databases or data processing tools
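Because each line is a complete JSON document, an output file can be streamed record by record. Below is a minimal standard-library sketch; `iter_jsonl` is a hypothetical helper that skips blank or malformed lines (for example a truncated last line after an interrupted run) instead of failing.

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed record per non-empty, well-formed line."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"Skipping malformed line {line_no}: {exc}")
            continue
        yield record
```

For example, `with open(path, encoding="utf-8") as f:` followed by `for product in iter_jsonl(f): ...` processes a file without loading it whole. If pandas is installed, `pandas.read_json(path, lines=True)` loads an entire file into a DataFrame instead.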
Example output files:
{site}_com_product_category_page_scraper_data_20260114_120000.jsonl
{site}_com_product_page_scraper_data_20260114_120000.jsonl
{site}_com_product_search_page_scraper_data_20260114_120000.jsonl
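The file names above appear to follow the pattern `<site>_com_<page type>_scraper_data_<YYYYMMDD>_<HHMMSS>.jsonl`. The sketch below reproduces that pattern and appends one record per line. The helpers `output_filename` and `append_record` are hypothetical, not functions taken from the scrapers.

```python
import json
from datetime import datetime
from typing import Optional


def output_filename(site: str, page_type: str, now: Optional[datetime] = None) -> str:
    """Build e.g. 'target_com_product_page_scraper_data_20260114_120000.jsonl'."""
    now = now or datetime.now()
    return f"{site}_com_{page_type}_scraper_data_{now:%Y%m%d_%H%M%S}.jsonl"


def append_record(path: str, record: dict) -> None:
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII product names readable in the file.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Appending as soon as each record is scraped means that a crash partway through a run keeps everything written up to that point.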
This repository provides multiple implementations for different use cases:
- Playwright - Playwright implementation
- Selenium - Selenium implementation
- Cheerio & Axios - Cheerio & Axios implementation
- Playwright - Playwright implementation
- Puppeteer - Puppeteer implementation
BeautifulSoup/
├── product_category/
│ ├── scraper/
│ │ └── target_scraper_product_category_v1.py
│ ├── example/
│ │ └── product_category.json
│ └── README.md
├── product_data/
│ ├── scraper/
│ │ └── target_scraper_product_data_v1.py
│ ├── example/
│ │ └── product_data.json
│ └── README.md
├── product_search/
│ ├── scraper/
│ │ └── target_scraper_product_search_v1.py
│ ├── example/
│ │ └── product_search.json
│ └── README.md
- Respect Rate Limits: Use appropriate delays and concurrency settings
- Monitor ScrapeOps Usage: Track your API usage in the ScrapeOps dashboard
- Handle Errors Gracefully: Implement proper error handling and logging
- Validate URLs: Ensure URLs point to valid Target pages before scraping
- Update Selectors: Target may change its HTML structure; update CSS selectors as needed
- Test Regularly: Test scrapers regularly to catch breaking changes early
- Handle Missing Data: Some products may not have all fields; handle null values appropriately
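Two of the practices above, validating URLs and handling missing data, can be sketched with the standard library alone. The field names (`name`, `price`, `rating`, `url`), the accepted host names, and the price format are assumptions for illustration, not the schema these scrapers actually emit; check the example/ outputs for the real fields.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

# Assumed host names; adjust if the scrapers accept other Target domains.
ALLOWED_HOSTS = {"target.com", "www.target.com"}


def is_target_url(url: str) -> bool:
    """Cheap sanity check before spending a request (and API credits) on a URL."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and parsed.hostname in ALLOWED_HOSTS


def parse_price(raw: Optional[str]) -> Optional[float]:
    """Turn '$1,299.99' into 1299.99; return None if absent or unparseable."""
    if not raw:
        return None
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. ranges like "$10.00 - $20.00" or "See price in cart"


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep a stable schema: every expected key present, missing values as None."""
    return {
        "name": record.get("name") or None,
        "price": parse_price(record.get("price")),
        "rating": record.get("rating"),
        "url": record.get("url"),
    }
```

Emitting `None` rather than dropping keys means downstream tools see the same columns on every line of the JSONL output.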
- ScrapeOps Documentation: https://scrapeops.io/docs
- BeautifulSoup Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
- Example Outputs: See the example/ folders in each scraper directory
These scrapers are provided as-is for educational and commercial use. Please ensure compliance with Target's Terms of Service and robots.txt when using them.