This script scrapes perfume data from Parfumo.com, specifically the main accords and their relative prominence for each perfume.
- Make sure you have Node.js installed
- Install dependencies:
    npm install
- Place your fragrances.csv file in the root directory
- Run the script:
    node scraper.js

The script will generate three files:
- results.json: Contains successfully scraped data with main accords and their sizes
- failed.txt: List of perfumes that failed to scrape, with error messages
- no_accords.txt: List of perfumes where no main accords were found
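
As a rough illustration of how scrape outcomes might be sorted into these three files, here is a minimal sketch. It is not taken from scraper.js: the function name `buildOutputs`, the outcome shape (`{ brand, model, mainAccords }` on success, `{ brand, model, error }` on failure), and the line format of the two text files are all assumptions made for the example.

```javascript
// Hypothetical sketch: split scrape outcomes into the contents of the
// three output files. The outcome shape and the text-file line format
// are assumptions, not the actual behavior of scraper.js.
function buildOutputs(outcomes) {
  const results = [];
  const failed = [];
  const noAccords = [];

  for (const o of outcomes) {
    const label = `${o.brand} - ${o.model}`;
    if (o.error) {
      // The scrape threw or the request failed: keep the error message.
      failed.push(`${label}: ${o.error}`);
    } else if (!o.mainAccords || o.mainAccords.length === 0) {
      // The page loaded but listed no main accords.
      noAccords.push(label);
    } else {
      results.push({ brand: o.brand, model: o.model, mainAccords: o.mainAccords });
    }
  }

  // Map each file name to the text that would be written to it.
  return {
    "results.json": JSON.stringify(results, null, 2),
    "failed.txt": failed.join("\n"),
    "no_accords.txt": noAccords.join("\n"),
  };
}
```

Each entry of the returned object could then be written to disk, for example with `fs.writeFileSync(name, contents)`.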
The results.json file will contain an array of objects with this structure:
    [
      {
        "brand": "Brand Name",
        "model": "Model Name",
        "mainAccords": [
          {
            "name": "Accord Name",
            "size": "large|medium|small"
          }
        ]
      }
    ]

- The script includes a 1-second delay between requests to avoid overwhelming the server
- Some URLs may fail because Parfumo.com formats them differently; these perfumes are listed in failed.txt
- Not all perfumes have main accords listed; those without are listed in no_accords.txt
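
The notes above can be sketched as a sequential loop that pauses between requests and records failures instead of stopping. This is only an illustration under stated assumptions: `sleep`, `scrapeAll`, and the `scrapeOne` callback are hypothetical names, and the real scraper.js may be structured differently.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit perfumes one at a time, pausing between requests.
// `scrapeOne` is a placeholder for whatever function fetches and parses
// a single perfume page and returns its array of main accords.
async function scrapeAll(perfumes, scrapeOne, delayMs = 1000) {
  const outcomes = [];

  for (let i = 0; i < perfumes.length; i++) {
    const { brand, model } = perfumes[i];
    try {
      const mainAccords = await scrapeOne(perfumes[i]);
      // An empty array here means the page had no main accords listed.
      outcomes.push({ brand, model, mainAccords });
    } catch (err) {
      // A failed URL is recorded and the loop moves on to the next perfume.
      outcomes.push({ brand, model, error: err.message });
    }
    // Wait before the next request, but not after the last one.
    if (i < perfumes.length - 1) await sleep(delayMs);
  }

  return outcomes;
}
```

Keeping the requests sequential, rather than firing them in parallel, is what makes the fixed delay meaningful: at most one request is in flight at any time.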