
Target Product Search Scraper - Selenium (Python)

A production-ready Python scraper for extracting product search results from Target using Selenium. It efficiently extracts breadcrumbs, pagination, products, and related data from Target search pages.

What This Scraper Extracts

  • Products: Complete product listings with:
    • Ratings and reviews
    • Availability status
    • Brand information
    • Specifications and features
    • Product images
  • Pagination: Current page, total pages, results per page, total results
  • Breadcrumbs: Navigation path showing category hierarchy
  • Search Metadata: Query information, result counts, and search type
  • Related Searches: Related search terms and suggestions
  • Recommendations: Product recommendations and related items

Quick Start

Prerequisites

  • Python 3.7 or higher
  • pip package manager

Installation

  1. Install the required dependencies:

```bash
pip install selenium beautifulsoup4 requests
```

  2. Get your ScrapeOps API key from https://scrapeops.io/app/register/ai-builder

  3. Update the API key in the scraper:

```python
API_KEY = "YOUR-API-KEY"  # Replace with your ScrapeOps API key
```

Running the Scraper

  1. Navigate to the scraper directory:

```bash
cd python/selenium/product_search
```

  2. Edit the URLs in scraper/target.com_scraper_product_search_v1.py:

```python
if __name__ == "__main__":
    urls = [
        "https://www.target.com/s?searchTerm=men+shoes",
    ]
```

  3. Run the scraper:

```bash
python scraper/target.com_scraper_product_search_v1.py
```

The scraper will generate a timestamped JSONL file (e.g., target_com_product_search_page_scraper_data_20260114_120000.jsonl) containing all extracted data.
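Since each line of the output is an independent JSON object, results can be streamed record by record rather than loaded all at once. A minimal sketch (the sample line and its values are made up; in practice you would pass an open handle to the timestamped output file):

```python
import json

def iter_records(lines):
    """Yield one parsed object per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Illustrative single-line sample standing in for the scraper's output file.
sample = ['{"products": [{"title": "Running Shoes", "price": "$39.99"}]}']
for record in iter_records(sample):
    for product in record.get("products") or []:
        print(product["title"], product["price"])  # -> Running Shoes $39.99
```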

Example Output

See example/product_search.json for a sample of the extracted data structure.

Supported URLs

This scraper supports Target product search page URLs:

  • https://www.target.com
  • https://www.target.com/s?searchTerm=men+shoes

Configuration

Scraper Parameters

The scraper supports several configuration options. See the scraper code for available parameters.

ScrapeOps Configuration

The scraper can use ScrapeOps for anti-bot protection and request optimization:

```python
API_KEY = "YOUR-API-KEY"  # Your ScrapeOps API key

payload = {
    "api_key": API_KEY,
    "url": url,
    "optimize_request": True,  # Enables request optimization
}
```
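A hedged sketch of how such a payload is typically sent: the proxy endpoint URL below is an assumption about the ScrapeOps API (verify it against your dashboard), and the timeout is an illustrative default.

```python
import requests

API_KEY = "YOUR-API-KEY"  # Your ScrapeOps API key

def build_payload(url):
    """Query parameters passed to the ScrapeOps proxy."""
    return {"api_key": API_KEY, "url": url, "optimize_request": True}

def fetch_via_scrapeops(url, endpoint="https://proxy.scrapeops.io/v1/"):
    """Fetch a page through the ScrapeOps proxy (endpoint URL assumed)."""
    response = requests.get(endpoint, params=build_payload(url), timeout=120)
    response.raise_for_status()  # Surface HTTP errors (429, 5xx) immediately
    return response.text
```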

ScrapeOps Features:

  • Proxy rotation (may help reduce IP blocking)
  • Request header optimization (can help reduce detection)
  • Rate limiting management
  • Note: CAPTCHA challenges may occur depending on site behavior and cannot be guaranteed to be resolved automatically

Output Schema

The scraper outputs data in JSONL format (one JSON object per line). Each object contains:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| breadcrumbs | null | Navigation breadcrumb path | null |
| pagination | object | Pagination information | Object with 6 fields |
| products | array | Array of objects with 21 fields each | Array of objects (see example) |
| recommendations | object | Product recommendations | Object with 1 field |
| relatedSearches | array | Related search terms | Array of objects (see example) |
| searchMetadata | object | Search query metadata | Object with 5 fields |
| sponsoredProducts | null | Sponsored product information | null |
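For orientation, a single output line might look like the following. The nested field names and all values are illustrative, not taken from the scraper; consult example/product_search.json for the real structure.

```json
{
  "breadcrumbs": null,
  "pagination": {"currentPage": 1, "totalPages": 25, "resultsPerPage": 24, "totalResults": 600},
  "products": [{"title": "Men's Running Shoes", "price": "$39.99", "brand": "Example Brand", "rating": 4.5}],
  "recommendations": {"items": []},
  "relatedSearches": [{"term": "running shoes for men"}],
  "searchMetadata": {"query": "men shoes", "resultCount": 600},
  "sponsoredProducts": null
}
```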

Field Descriptions

See example/product_search.json for a complete example of each field.

Product/Listing Fields:

  • products (array): Array of objects with 21 fields each
  • sponsoredProducts (null): Sponsored product information

Pagination Fields:

  • pagination (object): Pagination information

Navigation Fields:

  • breadcrumbs (null): Navigation breadcrumb path

Search Fields:

  • recommendations (object): Product recommendations
  • relatedSearches (array): Related search terms
  • searchMetadata (object): Search query metadata

Anti-Bot Protection

This scraper can integrate with ScrapeOps to help handle Target's anti-bot measures.

Why ScrapeOps?

Target may employ various anti-scraping measures, including:

  • Rate limiting and IP blocking
  • Browser fingerprinting
  • CAPTCHA challenges (may occur depending on site behavior)
  • JavaScript rendering requirements
  • Request pattern analysis

ScrapeOps Integration

The scraper can use the ScrapeOps proxy service, which may provide:

  1. Proxy Rotation: May help distribute requests across multiple IP addresses
  2. Request Optimization: May optimize headers and request patterns to reduce detection
  3. Retry Logic: Built-in retry mechanism with exponential backoff

Note: Anti-bot measures vary by site and may change over time. CAPTCHA challenges may occur and cannot be guaranteed to be resolved automatically. Using proxies and browser automation can help reduce blocking, but effectiveness depends on the target site's specific anti-bot measures.

Getting Started with ScrapeOps

  1. Sign up for a free account at https://scrapeops.io/app/register/ai-builder
  2. Get your API key from the dashboard
  3. Replace YOUR-API-KEY in the scraper code
  4. The scraper can use ScrapeOps for requests (if configured)

Free Tier: ScrapeOps offers a generous free tier suited to testing and small-scale scraping.

How It Works

The scraper uses Selenium to navigate to target.com pages in a browser, wait for content to load, and extract structured data using CSS selectors and DOM parsing. The extracted data is normalized and saved in JSONL format for efficient processing.

Error Handling & Troubleshooting

1. No Data Extracted

Symptoms: Scraper runs but produces empty output files.

Solutions:

  • Verify the URL format is correct
  • Check if the page requires JavaScript rendering
  • Ensure your ScrapeOps API key is valid
  • Check network connectivity

2. Rate Limiting / Blocked Requests

Symptoms: HTTP 429 errors or empty responses.

Solutions:

  • Reduce concurrency settings
  • Increase the delay between requests
  • Verify your ScrapeOps API key has sufficient credits

3. Parsing Errors

Symptoms: Errors in extraction logic or missing fields.

Solutions:

  • The site may have updated its HTML structure
  • Check whether selectors need updating
  • Review the actual HTML structure of the target page
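One way to make selector drift fail soft is to wrap each lookup so a missing element yields None instead of raising. A minimal sketch with BeautifulSoup; the HTML and selectors are illustrative, not Target's actual markup:

```python
from bs4 import BeautifulSoup

def safe_text(parent, selector):
    """Return stripped text of the first match, or None if the selector misses."""
    el = parent.select_one(selector)
    return el.get_text(strip=True) if el else None

html = "<div class='card'><h3 class='title'>Men's Running Shoes</h3></div>"
card = BeautifulSoup(html, "html.parser")
print(safe_text(card, "h3.title"))    # -> Men's Running Shoes
print(safe_text(card, "span.price"))  # selector misses -> None
```

Extracted records then carry explicit None fields, which makes a site-side HTML change show up as missing values rather than a crashed run.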

Debugging

Enable detailed logging:

```python
logging.basicConfig(level=logging.DEBUG)  # Change from INFO to DEBUG
```

This will show:

  • Request URLs and responses
  • Extraction steps
  • Parsing errors
  • Retry attempts

Retry Logic

The scraper includes retry logic with configurable retry attempts and exponential backoff.
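The exact implementation lives in the scraper code; a generic sketch of retry with exponential backoff looks roughly like this (attempt counts, delays, and jitter are illustrative defaults):

```python
import logging
import random
import time

def fetch_with_retries(fetch, url, max_attempts=3, base_delay=1.0):
    """Call fetch(url), retrying failures with exponential backoff plus jitter.

    `fetch` stands in for whatever request function the scraper uses.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == max_attempts:
                raise  # Out of attempts: propagate the last error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            logging.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            time.sleep(delay)
```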

Alternative Implementations

This repository provides multiple implementations for scraping Target product search pages:

Python Implementations

Node.js Implementations

Choosing the Right Implementation

Use BeautifulSoup/Cheerio when:

  • You need fast, lightweight scraping
  • JavaScript rendering is not required
  • You want minimal dependencies
  • You're scraping simple HTML pages

Use Playwright or Selenium when:

  • Pages require JavaScript rendering
  • You need to interact with dynamic content
  • You need to handle complex anti-bot measures
  • You want to simulate real browser behavior

Performance Considerations

Concurrency

The scraper supports concurrent requests. See the scraper code for configuration options.

Recommendations:

  • Start with minimal concurrency for testing
  • Gradually increase based on your ScrapeOps plan limits
  • Monitor for rate limiting or blocking

Output Format

Data is saved in JSONL format (one JSON object per line):

  • Efficient for large datasets
  • Easy to process line-by-line
  • Can be imported into databases or data processing tools
  • Each line is a complete, valid JSON object
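As one example of the database point, JSONL lines can be loaded into SQLite using only the standard library. The nested field name `query` under searchMetadata is an assumption about the schema; adjust it to match your actual output.

```python
import json
import sqlite3

# An in-memory sample line standing in for the scraper's output file.
lines = ['{"searchMetadata": {"query": "men shoes"}, "products": [{"title": "A"}]}']

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (query TEXT, n_products INTEGER, raw TEXT)")
for line in lines:
    rec = json.loads(line)
    meta = rec.get("searchMetadata") or {}
    # Parameterized INSERT: one summary row per scraped page, raw JSON kept.
    conn.execute(
        "INSERT INTO pages VALUES (?, ?, ?)",
        (meta.get("query"), len(rec.get("products") or []), line),
    )
row = conn.execute("SELECT query, n_products FROM pages").fetchone()
print(row)  # -> ('men shoes', 1)
```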

Memory Usage

The scraper processes data incrementally:

  • Products are written to file immediately after extraction
  • No need to load the entire dataset into memory
  • Suitable for scraping large pages
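The incremental pattern amounts to appending one serialized line per record as soon as it is extracted, roughly:

```python
import json

def append_record(path, record):
    """Append one JSON object as a single JSONL line.

    Opening in append mode per record keeps memory use flat regardless
    of how many pages are scraped.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```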

Best Practices

  1. Respect Rate Limits: Use appropriate delays and concurrency settings
  2. Monitor ScrapeOps Usage: Track your API usage in the ScrapeOps dashboard
  3. Handle Errors Gracefully: Implement proper error handling and logging
  4. Validate URLs: Ensure URLs are valid Target pages before scraping
  5. Update Selectors: Target may change its HTML structure; update selectors as needed
  6. Test Regularly: Test scrapers regularly to catch breaking changes early
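Practice 4 can be sketched with the standard library. This check accepts only the search-page form shown under Supported URLs (/s with a searchTerm parameter), so loosen it if you also feed other Target URLs:

```python
from urllib.parse import parse_qs, urlparse

def is_target_search_url(url):
    """Heuristic check that a URL is a Target product-search page."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and parsed.netloc == "www.target.com"
        and parsed.path.rstrip("/") == "/s"
        and "searchTerm" in parse_qs(parsed.query)
    )

print(is_target_search_url("https://www.target.com/s?searchTerm=men+shoes"))  # True
print(is_target_search_url("https://example.com/s?searchTerm=x"))             # False
```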

Support & Resources

  • ScrapeOps Documentation: https://scrapeops.io/docs
  • Framework Documentation: See framework-specific documentation
  • Example Output: See example/product_search.json for a sample data structure
  • Scraper Code: See scraper/target.com_scraper_product_search_v1.py for implementation details

License

This scraper is provided as-is for educational and commercial use. Please ensure compliance with Target's Terms of Service and robots.txt when using this scraper.