amna-techcorp17/DataMiner-Pro

📊 DataMiner Pro

AI-Powered Web Scraping, Data Cleaning & Excel Report Generation Platform

🚀 Modern AI-Style Analytics & Scraping Dashboard


✨ Features

πŸ” Smart Web Scraping

  • Extract product/listing data from websites
  • Supports static and dynamic pages
  • Multi-page scraping support
  • Auto scraping mode detection

🧹 Intelligent Data Cleaning

  • Remove duplicates
  • Handle missing values
  • Normalize prices and ratings
  • Clean inconsistent formatting
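
The cleaning steps listed above could look roughly like the following with Pandas. This is a minimal sketch, not the project's actual `cleaner.py`: the column names (`title`, `price`, `rating`), the helper names, and the word-to-number rating mapping are all assumptions made for illustration.

```python
import re

import pandas as pd

# Hypothetical mapping for ratings written as words (as on books.toscrape.com).
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_price(value):
    """Return the first number found in a price string, or None."""
    if value is None or pd.isna(value):
        return None
    match = re.search(r"\d+(?:\.\d+)?", str(value).replace(",", ""))
    return float(match.group()) if match else None


def clean_products(df: pd.DataFrame) -> pd.DataFrame:
    """Trim text, normalize price and rating, then drop bad and duplicate rows."""
    out = df.copy()
    out["title"] = out["title"].str.strip()
    out["price"] = out["price"].map(parse_price)
    out["rating"] = out["rating"].map(RATING_WORDS)  # unknown words become NaN
    out = out.dropna(subset=["title", "price"])      # rows unusable without these
    out = out.drop_duplicates(subset=["title", "price"])
    return out.reset_index(drop=True)


raw = pd.DataFrame(
    {
        "title": ["  Book A ", "Book A", "Book B", None],
        "price": ["£51.77", "£51.77", "£1,053.74", "£10.00"],
        "rating": ["Three", "Three", "Five", "One"],
    }
)
cleaned = clean_products(raw)
print(cleaned)
```

Whitespace is trimmed before de-duplication so that `"  Book A "` and `"Book A"` count as the same product. Dropping rows with a missing title or price is one possible policy; filling defaults would be another.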

📊 Analytics Dashboard

  • Product statistics
  • Average pricing insights
  • Rating analysis
  • Category breakdown
  • Duplicate tracking

πŸ“ Export System

  • Export CSV reports
  • Export professional Excel reports
  • Styled spreadsheets
  • Auto formatted columns
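
As an illustration of what a styled Excel export might involve, here is a small sketch using Openpyxl directly. The function name `build_report`, the sheet title, the colors, and the width rule are assumptions, not the project's actual `exporter.py`.

```python
from io import BytesIO

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill
from openpyxl.utils import get_column_letter


def build_report(rows, headers):
    """Return a workbook with a styled header row and sized columns."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Products"
    ws.append(headers)
    for row in rows:
        ws.append(row)

    # Style the header row: bold white text on a dark fill.
    for cell in ws[1]:
        cell.font = Font(bold=True, color="FFFFFF")
        cell.fill = PatternFill("solid", fgColor="1F4E78")

    # Approximate "auto formatted columns": width follows the longest value.
    for idx, column in enumerate(ws.iter_cols(), start=1):
        longest = max(
            (len(str(c.value)) for c in column if c.value is not None), default=0
        )
        ws.column_dimensions[get_column_letter(idx)].width = longest + 2

    ws.freeze_panes = "A2"  # keep the header visible while scrolling
    return wb


wb = build_report(
    rows=[["Book A", 51.77, 3], ["Book B", 1053.74, 5]],
    headers=["Title", "Price", "Rating"],
)
buffer = BytesIO()  # in-memory target; pass a file path to save to disk
wb.save(buffer)
```

Saving to a path under `output/reports/` instead of the in-memory buffer would match the project layout; the exact file name is left open here.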

⚡ Advanced Features

  • Selenium support for JavaScript websites
  • Scheduled scraping support
  • Progress tracking
  • Live scraping logs
  • AI-style insights
  • Responsive dashboard UI


🧠 Workflow

Input Website URL
        ↓
Fetch HTML Content
        ↓
Parse Product Cards
        ↓
Extract Data
        ↓
Clean & Normalize
        ↓
Generate Analytics
        ↓
Export Excel/CSV Reports
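
The "Parse Product Cards" and "Extract Data" steps can be sketched with BeautifulSoup as follows. The selectors mirror the card markup of books.toscrape.com and would need adjusting for other sites; `parse_cards` and the inline sample HTML are hypothetical, and in a real run the HTML would come from the fetch step (for example `requests.get(url).text` or Selenium's `driver.page_source`).

```python
from bs4 import BeautifulSoup

# Inline sample mimicking the card markup used on books.toscrape.com.
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="a.html" title="Book A">Book A</a></h3>
  <p class="price_color">£51.77</p>
</article>
<article class="product_pod">
  <p class="star-rating Five"></p>
  <h3><a href="b.html" title="Book B">Book B</a></h3>
  <p class="price_color">£13.99</p>
</article>
"""


def parse_cards(html):
    """Extract title, raw price text and rating word from each product card."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("article.product_pod"):
        link = card.select_one("h3 a")
        price = card.select_one("p.price_color")
        rating = card.select_one("p.star-rating")
        # The rating is encoded as the second CSS class, e.g. "star-rating Three".
        has_word = rating is not None and len(rating["class"]) > 1
        products.append(
            {
                "title": link["title"] if link else None,
                "price": price.get_text(strip=True) if price else None,
                "rating": rating["class"][1] if has_word else None,
            }
        )
    return products


products = parse_cards(SAMPLE_HTML)
print(products)
```

The values are kept as raw strings on purpose, so that normalization stays in the cleaning step of the workflow.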

Architecture

flowchart TD
    A["User enters URL"] --> B["Scraper: requests or Selenium"]
    B --> C["Parser: product/listing extraction"]
    C --> D["Cleaner: Pandas validation and normalization"]
    D --> E["Exporter: CSV and Excel"]
    D --> F["Dashboard: stats and charts"]
    G["Scheduler"] --> B

πŸ—οΈ Project Structure

dataminer-pro/
β”œβ”€β”€ main.py
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ README.md
β”œβ”€β”€ .env
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ scraper.py
β”‚   β”œβ”€β”€ parser.py
β”‚   β”œβ”€β”€ cleaner.py
β”‚   β”œβ”€β”€ exporter.py
β”‚   β”œβ”€β”€ scheduler.py
β”‚   β”œβ”€β”€ config.py
β”‚   β”œβ”€β”€ logger.py
β”‚   └── utils.py
β”œβ”€β”€ output/
β”‚   β”œβ”€β”€ reports/
β”‚   └── csv/
β”œβ”€β”€ logs/
β”œβ”€β”€ screenshots/
└── assets/
    └── styles.css

🛠️ Tech Stack

| Technology | Purpose |
| --- | --- |
| Python | Core Backend |
| BeautifulSoup | HTML Parsing |
| Selenium | Dynamic Scraping |
| Pandas | Data Cleaning |
| Openpyxl | Excel Reports |
| Streamlit | Dashboard UI |
| APScheduler | Automation |

πŸš€ Installation

1️⃣ Clone Repository

git clone https://github.com/your-username/dataminer-pro.git
cd dataminer-pro

2️⃣ Create Virtual Environment

python -m venv venv

Windows

venv\Scripts\activate

Mac/Linux

source venv/bin/activate

3️⃣ Install Dependencies

pip install -r requirements.txt

▶️ Run Application

streamlit run main.py

🌐 Example Websites

Static Website

https://books.toscrape.com/

Dynamic Website

Use Selenium mode for:

  • Infinite scroll
  • JavaScript-rendered pages
  • Dynamic product loading

📊 Dashboard Features

Analytics Cards

  • Total Products
  • Average Price
  • Top Category
  • Duplicate Count
  • Average Rating
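
The five card values above could be computed from the cleaned table along these lines. This is only a sketch: the function name `summarize`, the column names, and the choice to treat rows with the same title and price as duplicates are assumptions.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Compute the five card values from a non-empty product table."""
    key = ["title", "price"]  # assumed definition of "the same product"
    duplicates = int(df.duplicated(subset=key).sum())
    unique = df.drop_duplicates(subset=key)
    return {
        "total_products": len(unique),
        "average_price": round(float(unique["price"].mean()), 2),
        "top_category": unique["category"].mode().iat[0],
        "duplicate_count": duplicates,
        "average_rating": round(float(unique["rating"].mean()), 2),
    }


df = pd.DataFrame(
    {
        "title": ["Book A", "Book A", "Book B", "Book C"],
        "price": [10.0, 10.0, 20.0, 30.0],
        "category": ["Fiction", "Fiction", "Poetry", "Fiction"],
        "rating": [3, 3, 5, 4],
    }
)
cards = summarize(df)
print(cards)
```

In a Streamlit dashboard each value could then be shown with `st.metric(label, value)`. Averages are taken after removing duplicates so that repeated rows do not skew them.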

Interactive Table

  • Product preview
  • Filtering
  • Sorting
  • Pagination

Export Options

  • CSV Export
  • Excel Export
  • Styled Reports

⚙️ Scraping Modes

| Mode | Description |
| --- | --- |
| Auto | Detect scraping strategy automatically |
| BeautifulSoup | Fast static scraping |
| Selenium | Dynamic JavaScript scraping |
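
The README does not say how Auto mode decides between the two strategies. One plausible heuristic, shown here purely as an assumption, is to fetch the page statically first and fall back to Selenium when the HTML contains scripts but almost no visible text. The names `choose_mode` and `min_text_chars` and the threshold of 200 characters are hypothetical, and the sketch uses only the standard library.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Count visible text characters and <script> tags in an HTML document."""

    SKIPPED = {"script", "style"}  # their contents are not visible text

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.scripts = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def choose_mode(html, min_text_chars=200):
    """Return "selenium" if the page looks JavaScript-rendered, else "beautifulsoup"."""
    counter = _TextCounter()
    counter.feed(html)
    # Assumed rule: scripts present but almost no visible text suggests
    # that the content is rendered client-side.
    if counter.scripts > 0 and counter.text_chars < min_text_chars:
        return "selenium"
    return "beautifulsoup"


static_page = "<html><body>" + "<p>Some product description text.</p>" * 20 + "</body></html>"
dynamic_page = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
print(choose_mode(static_page), choose_mode(dynamic_page))
```

A text-length check like this is cheap but can misjudge very short static pages, so a manual override of the mode remains useful.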

📈 Planned Improvements

  • 🤖 AI-generated data insights
  • 📧 Automated email reports
  • 🕒 Scheduled scraping jobs
  • 🌍 Proxy rotation
  • 🔒 CAPTCHA handling
  • 📊 Interactive visual analytics
  • ☁️ Cloud deployment support

📸 Highlights

This project demonstrates:

✅ Web Scraping
✅ Automation Engineering
✅ Data Cleaning Pipelines
✅ Analytics Dashboard Design
✅ Excel Report Generation
✅ Selenium Automation
✅ BeautifulSoup Parsing
✅ SaaS-style UI Development


🧪 Test Dataset

Recommended demo site:

https://books.toscrape.com/


📦 Requirements

beautifulsoup4
requests
selenium
pandas
openpyxl
streamlit
lxml
webdriver-manager
apscheduler

👩‍💻 Author

Amna Chaudhary

AI & Automation Developer

  • Generative AI
  • Agentic AI Systems
  • Automation Workflows
  • AI SaaS Applications

⭐ Support

If you found this project useful:

⭐ Star the repository
🍴 Fork the project
📢 Share with others


📄 License

This project is licensed under the MIT License.


🔥 DataMiner Pro

Turning raw web data into actionable insights.

About

AI-powered web scraping and Excel report generation platform built with Python, Selenium, BeautifulSoup, Pandas, and Streamlit.
