A production-ready Python scraper for extracting product search results from Target.com using Selenium. It efficiently extracts breadcrumbs, pagination, products, and related data from Target search pages.
- What This Scraper Extracts
- Quick Start
- Supported URLs
- Configuration
- Output Schema
- Anti-Bot Protection
- How It Works
- Error Handling & Troubleshooting
- Alternative Implementations
- Products: Complete product listings with:
- Ratings and reviews
- Availability status
- Brand information
- Specifications and features
- Product images
- Pagination: Current page, total pages, results per page, total results
- Breadcrumbs: Navigation path showing category hierarchy
- Search Metadata: Query information, result counts, and search type
- Related Searches: Related search terms and suggestions
- Recommendations: Product recommendations and related items
- Python 3.7 or higher
- pip package manager (for Python) or npm (for Node.js)
- Install required dependencies:
pip install selenium beautifulsoup4 requests
- Get your ScrapeOps API key from https://scrapeops.io/app/register/ai-builder
- Update the API key in the scraper:
API_KEY = "YOUR-API-KEY"  # Replace with your ScrapeOps API key
- Navigate to the scraper directory:
cd python/selenium/product_search
- Edit the URLs in
scraper/target.com_scraper_product_search_v1.py:
if __name__ == "__main__":
urls = [
"https://www.target.com/s?searchTerm=men+shoes",
    ]
- Run the scraper:
python scraper/target.com_scraper_product_search_v1.py
The scraper will generate a timestamped JSONL file (e.g., target_com_product_search_page_scraper_data_20260114_120000.jsonl) containing all extracted data.
See example/product_search.json for a sample of the extracted data structure.
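The generated JSONL file can be consumed line by line. A minimal reader sketch (pass whatever filename the scraper produced):

```python
import json

def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because each line is an independent JSON object, this works on partial files too, e.g. output from a run that was interrupted mid-scrape.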
This scraper supports Target product search page URLs, for example:
https://www.target.com/s?searchTerm=men+shoes
The scraper supports several configuration options. See the scraper code for available parameters.
The scraper can use ScrapeOps for anti-bot protection and request optimization:
API_KEY = "YOUR-API-KEY" # Your ScrapeOps API key
payload = {
"api_key": API_KEY,
"url": url,
"optimize_request": True, # Enables request optimization
}
ScrapeOps Features:
- Proxy rotation (may help reduce IP blocking)
- Request header optimization (can help reduce detection)
- Rate limiting management
- Note: CAPTCHA challenges may occur depending on site behavior and cannot be guaranteed to be resolved automatically
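Sending the payload above through the ScrapeOps proxy might look like the sketch below. This is a minimal illustration, assuming the standard ScrapeOps proxy endpoint (`https://proxy.scrapeops.io/v1/`) and the `requests` library; the helper names are ours, not from the scraper code:

```python
def build_scrapeops_payload(api_key, url):
    """Query parameters for routing a request through ScrapeOps."""
    return {
        "api_key": api_key,
        "url": url,
        "optimize_request": True,  # enables request optimization
    }

def fetch_via_scrapeops(api_key, url, timeout=60):
    """Fetch a page through the ScrapeOps proxy endpoint."""
    import requests  # local import so the sketch loads without requests installed
    resp = requests.get(
        "https://proxy.scrapeops.io/v1/",
        params=build_scrapeops_payload(api_key, url),
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.text
```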
The scraper outputs data in JSONL format (one JSON object per line). Each object contains:
| Field | Type | Description | Example |
|---|---|---|---|
| breadcrumbs | null | Navigation breadcrumb path | null |
| pagination | object | Pagination information | Object with 6 fields |
| products | array | Array of objects with 21 fields each | Array of objects (see example) |
| recommendations | object | Product recommendations | Object with 1 field |
| relatedSearches | array | Related search terms | Array of objects (see example) |
| searchMetadata | object | Search query metadata | Object with 5 fields |
| sponsoredProducts | null | Sponsored product information | null |
See example/product_search.json for a complete example of each field.
Product/Listing Fields:
- products (array): Array of objects with 21 fields each
- sponsoredProducts (null): Sponsored product information
- pagination (object): Pagination information
- breadcrumbs (null): Navigation breadcrumb path
- recommendations (object): Product recommendations
- relatedSearches (array): Related search terms
- searchMetadata (object): Search query metadata
- Rate limiting and IP blocking
- Browser fingerprinting
- CAPTCHA challenges (may occur depending on site behavior)
- JavaScript rendering requirements
- Request pattern analysis
- Proxy Rotation: May help distribute requests across multiple IP addresses
- Request Optimization: May optimize headers and request patterns to reduce detection
- Retry Logic: Built-in retry mechanism with exponential backoff
- Sign up for a free account at https://scrapeops.io/app/register/ai-builder
- Get your API key from the dashboard
- Replace YOUR-API-KEY in the scraper code
- The scraper can use ScrapeOps for requests (if configured)
- Verify the URL format is correct
- Check if the page requires JavaScript rendering
- Ensure your ScrapeOps API key is valid
- Check network connectivity
- Reduce concurrency settings
- Increase delay between requests
- Verify ScrapeOps API key has sufficient credits
- The site may have updated their HTML structure
- Check if selectors need updating
- Review the actual HTML structure of the target page
- Request URLs and responses
- Extraction steps
- Parsing errors
- Retry attempts
- BeautifulSoup - BeautifulSoup implementation (Python)
- Playwright - Playwright implementation (Python)
- Cheerio & Axios - Cheerio & Axios implementation (Node.js)
- Playwright - Playwright implementation (Node.js)
- Puppeteer - Puppeteer implementation (Node.js)
- You need fast, lightweight scraping
- JavaScript rendering is not required
- You want minimal dependencies
- You're scraping simple HTML pages
- Pages require JavaScript rendering
- You need to interact with dynamic content
- You need to handle complex anti-bot measures
- You want to simulate real browser behavior
- Start with minimal concurrency for testing
- Gradually increase based on your ScrapeOps plan limits
- Monitor for rate limiting or blocking
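The recommendations above can be sketched with Python's standard `concurrent.futures`. This is an illustration, not the scraper's actual code; `scrape_one` stands in for whatever per-URL scrape function you use:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_many(urls, scrape_one, max_workers=2):
    """Run scrape_one over urls concurrently, preserving input order.

    Start with a small max_workers and raise it only within your
    ScrapeOps plan limits, watching for rate limiting or blocking.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_one, urls))
```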
- Efficient for large datasets
- Easy to process line-by-line
- Can be imported into databases or data processing tools
- Each line is a complete, valid JSON object
- Products are written to file immediately after extraction
- No need to load entire dataset into memory
- Suitable for scraping large pages
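Incremental writing boils down to appending one JSON line per record as soon as it is extracted, so a crash never loses already-scraped data. A minimal sketch (the helper name is ours):

```python
import json

def append_record(path, record):
    """Append one record as a JSON line; results survive a mid-run crash."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```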
- Respect Rate Limits: Use appropriate delays and concurrency settings
- Monitor ScrapeOps Usage: Track your API usage in the ScrapeOps dashboard
- Handle Errors Gracefully: Implement proper error handling and logging
- Validate URLs: Ensure URLs are valid target pages before scraping
- Update Selectors: Target may change its HTML structure; update selectors as needed
- Test Regularly: Test scrapers regularly to catch breaking changes early
- ScrapeOps Documentation: https://scrapeops.io/docs
- Framework Documentation: See framework-specific documentation
- Example Output: See example/product_search.json for the sample data structure
- Scraper Code: See scraper/target.com_scraper_product_search_v1.py for implementation details
This scraper can integrate with ScrapeOps to help handle Target's anti-bot measures:
Target may employ various anti-scraping measures, including:
The scraper can use ScrapeOps proxy service which may provide:
Note: Anti-bot measures vary by site and may change over time. CAPTCHA challenges may occur and cannot be guaranteed to be resolved automatically. Using proxies and browser automation can help reduce blocking, but effectiveness depends on the target site's specific anti-bot measures.
Free Tier: ScrapeOps offers a generous free tier perfect for testing and small-scale scraping.
The scraper uses Selenium to navigate to target.com pages in a browser, wait for content to load, and extract structured data using CSS selectors and DOM parsing. The extracted data is normalized and saved in JSONL format for efficient processing.
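The navigate / wait / extract / normalize flow described above can be sketched as follows. The CSS selector is an illustrative guess, not verified against Target's markup, and `scrape_search_page` is our name for the flow, not the scraper's actual function:

```python
import json

def normalize_product(title, price_text):
    """Normalize raw card text into a structured record."""
    price = None
    if price_text:
        try:
            price = float(price_text.replace("$", "").replace(",", "").strip())
        except ValueError:
            pass
    return {"title": (title or "").strip(), "price": price}

def scrape_search_page(url, timeout=15):
    """Navigate, wait for product cards, then extract and normalize."""
    # Imports are local so the module loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until at least one product card is present (selector is a guess).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, '[data-test="product-title"]')
            )
        )
        return [
            normalize_product(el.text, None)
            for el in driver.find_elements(
                By.CSS_SELECTOR, '[data-test="product-title"]'
            )
        ]
    finally:
        driver.quit()
```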
1. No Data Extracted
Symptoms: Scraper runs but produces empty output files.
Solutions:
2. Rate Limiting / Blocked Requests
Symptoms: HTTP 429 errors or empty responses.
Solutions:
3. Parsing Errors
Symptoms: Errors in extraction logic or missing fields.
Solutions:
Enable detailed logging:
logging.basicConfig(level=logging.DEBUG)  # Change from INFO to DEBUG
This will show:
The scraper includes retry logic with configurable retry attempts and exponential backoff.
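Exponential backoff of this kind can be sketched in a few lines. This is an illustration of the technique, not the scraper's actual retry code; `fetch` is any callable that raises on failure:

```python
import random
import time

def fetch_with_retries(fetch, url, max_retries=3, base_delay=1.0):
    """Call fetch(url), retrying with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries:
                raise
            # Delays of base_delay, 2x, 4x, ... plus jitter scaled to
            # base_delay so concurrent workers don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```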
This repository provides multiple implementations for scraping Target product search pages:
Use BeautifulSoup/Cheerio when:
Use Playwright or Selenium when:
The scraper supports concurrent requests. See the scraper code for configuration options.
Recommendations:
Data is saved in JSONL format (one JSON object per line):
The scraper processes data incrementally:
This scraper is provided as-is for educational and commercial use. Please ensure compliance with Target's Terms of Service and robots.txt when using this scraper.