This project scrapes controller product data from the Flipkart website.
- Install the required libraries: pip install pandas requests beautifulsoup4
- Several alternative scripts (models) are provided; choose any one of them to run.
- Run the selected script.
While scraping, the API sometimes returns hidden or extra data that does not appear in the visible HTML. This adds incorrect or unwanted records to the dataset and lowers data quality.
The scraper needs to be refined so that:
- Only data present in the actual HTML page is collected
- The dataset contains clean and valid product information
- Extra or unrelated data is filtered out before saving
The goal is to make the scraper reliable and ensure that the final dataset matches what users actually see on the Flipkart website.
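One possible way to approach the refinement described above is to keep an API record only when its title also appears in the text a browser would display. The sketch below is illustrative, not the project's actual code: the function names (`visible_text`, `filter_to_visible`, `to_dataframe`), the `title` key, and the sample HTML are all assumptions, and the rule for "hidden" (script/style tags, the `hidden` attribute, inline `display:none`) is a simplification that does not cover CSS classes or JavaScript-rendered content. The match is a plain substring check, so a short title could match inside a longer one.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup


def normalize(text):
    """Collapse whitespace and lowercase so comparisons ignore layout."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_hidden(tag):
    """Assumed rule: hidden via the `hidden` attribute or inline display:none."""
    style = (tag.get("style") or "").replace(" ", "").lower()
    return tag.has_attr("hidden") or "display:none" in style


def visible_text(html):
    """Return normalized page text with non-rendered elements removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Elements whose contents are never shown as page text.
    for tag in soup(["script", "style", "noscript", "template"]):
        tag.extract()
    # Elements explicitly hidden in the markup.
    for tag in soup.find_all(is_hidden):
        tag.extract()
    return normalize(soup.get_text(" "))


def filter_to_visible(records, html, key="title"):
    """Keep only records whose `key` value occurs in the visible page text."""
    page = visible_text(html)
    return [r for r in records if r.get(key) and normalize(str(r[key])) in page]


def to_dataframe(records):
    """Build the final dataset and drop exact duplicate rows."""
    return pd.DataFrame(records).drop_duplicates().reset_index(drop=True)


if __name__ == "__main__":
    # Hypothetical page fragment and API payload, for illustration only.
    sample_html = """
    <div><a>Wireless Gamepad A</a><span>1,499</span></div>
    <div style="display: none"><a>Sponsored Gamepad Z</a></div>
    <script>var extra = "Hidden Gamepad Q";</script>
    """
    api_records = [
        {"title": "Wireless Gamepad A", "price": 1499},
        {"title": "Sponsored Gamepad Z", "price": 999},
        {"title": "Hidden Gamepad Q", "price": 500},
    ]
    print(to_dataframe(filter_to_visible(api_records, sample_html)))
```

Filtering before building the DataFrame keeps unwanted rows out of the saved file entirely, instead of cleaning them up after the fact.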