
vennelavarshini18/QueryClone-Detector

Quora Duplicate Question Pairs Detector

🔍 Overview

This Streamlit web application predicts whether two Quora questions are duplicates, meaning they ask the same thing. The goal is to reduce redundant posts and improve the user experience on Q&A platforms. The app combines Natural Language Processing (NLP) preprocessing with a trained machine learning classifier and returns a prediction for any question pair as soon as it is submitted.

🧠 Features

  • User-Friendly Interface: Enter two questions and instantly see whether the model classifies them as duplicates.
  • Preprocessing Pipeline: Includes text cleaning, stopword removal, and vectorization using Bag of Words (BoW).
  • Machine Learning Model: Uses a trained Logistic Regression classifier to predict whether a question pair is a duplicate.
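The preprocessing steps listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code in helper.py: the stopword set, the cleaning regex, and the function names (`clean`, `tokenize`, `build_vocabulary`, `bag_of_words`) are all hypothetical, and the hand-rolled `bag_of_words` stands in for scikit-learn's `CountVectorizer`, which the project uses.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Assumption: a tiny illustrative stopword set; the project ships its own list in stopwords.pkl.
STOPWORDS: Set[str] = {"a", "an", "the", "is", "are", "what", "how", "to", "of", "in", "i", "do", "can"}


def clean(text: str) -> str:
    """Lowercase, turn every non-alphanumeric character into a space, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


def tokenize(text: str, stopwords: Set[str] = STOPWORDS) -> List[str]:
    """Clean the text, split on whitespace and drop stopwords."""
    return [tok for tok in clean(text).split() if tok not in stopwords]


def build_vocabulary(corpus: List[str]) -> Dict[str, int]:
    """Map every token seen in the corpus to a column index (sorted for determinism)."""
    tokens = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text: str, vocab: Dict[str, int]) -> List[int]:
    """Count each vocabulary token in `text`; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] += count
    return vec
```

For example, a vocabulary built from "How do I learn Python?" and "What is the best way to learn Python?" is `{"best": 0, "learn": 1, "python": 2, "way": 3}`, and "Learn Python, learn Java!" maps to `[0, 2, 1, 0]`. The unseen word "java" is dropped, mirroring how `CountVectorizer.transform` ignores out-of-vocabulary tokens.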

🗂 Project Structure

├── app.py # Main Streamlit application
├── helper.py # Helper functions for preprocessing
├── model.pkl # Trained ML model
├── cv.pkl # CountVectorizer object
├── stopwords.pkl # List of stopwords
├── requirements.txt # Python dependencies
└── README.md # Project documentation
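The `.pkl` extension suggests that model.pkl, cv.pkl and stopwords.pkl are standard Python pickle files. Assuming so, a minimal loader could look like the sketch below. The function name `load_artifacts` and the dictionary keys are hypothetical and are not taken from app.py.

```python
import pickle
import tempfile
from pathlib import Path

# File names come from the project tree; the dict keys are a hypothetical convention.
ARTIFACT_FILES = {"model": "model.pkl", "cv": "cv.pkl", "stopwords": "stopwords.pkl"}


def load_artifacts(directory):
    """Unpickle model.pkl, cv.pkl and stopwords.pkl from `directory`.

    pickle can execute arbitrary code while loading, so only load files you trust.
    """
    base = Path(directory)
    artifacts = {}
    for key, filename in ARTIFACT_FILES.items():
        with open(base / filename, "rb") as fh:
            artifacts[key] = pickle.load(fh)
    return artifacts


if __name__ == "__main__":
    # Demo: write stand-in objects to a temporary directory, then load them back.
    with tempfile.TemporaryDirectory() as tmp:
        stand_ins = {"model": {"weights": [0.5, -0.25]}, "cv": {"learn": 0}, "stopwords": ["the", "a"]}
        for key, filename in ARTIFACT_FILES.items():
            with open(Path(tmp) / filename, "wb") as fh:
                pickle.dump(stand_ins[key], fh)
        print(sorted(load_artifacts(tmp)))  # ['cv', 'model', 'stopwords']
```

Unpickling cv.pkl and model.pkl requires the library that created them (scikit-learn, for `CountVectorizer`) to be installed, ideally at the same version.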

🚀 Getting Started

The author recommends using PowerShell on Windows to run Streamlit apps.

Prerequisites

  • Python 3.7 or higher
  • pip (Python package installer)

Installation & Setup

  1. Clone the repository:
    git clone https://github.com/vennelavarshini18/Quora-Duplicate-Question-Pairs-Detector.git
  2. Navigate to the project directory:
    cd Quora-Duplicate-Question-Pairs-Detector 
  3. Create a virtual environment (optional but recommended):
    python -m venv venv
  4. Activate the virtual environment:
    • Windows:
      venv\Scripts\activate
    • Mac/Linux:
      source venv/bin/activate
  5. Install the required dependencies:
    pip install -r requirements.txt
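After installing, a short script can confirm that the interpreter version and key packages are available in the active environment. The module list below (`streamlit`, plus `sklearn` because cv.pkl holds a scikit-learn `CountVectorizer`) is an assumption; requirements.txt remains the authoritative list. The helper names `missing_modules` and `check_environment` are hypothetical.

```python
import importlib.util
import sys

# Assumption: modules the app needs at runtime; see requirements.txt for the real list.
REQUIRED_MODULES = ["streamlit", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names=REQUIRED_MODULES):
    """Return human-readable problems; an empty list means the setup looks fine."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or higher is required")
    for name in missing_modules(names):
        problems.append(f"module '{name}' is not installed")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks ready.")
```

Run it from inside the activated virtual environment so it inspects the same interpreter that `streamlit run` will use.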

Running the Application

After installation, follow these steps to run the project:

  1. Start the application:
    streamlit run app.py
  2. Open your browser and go to:
    http://localhost:8501/
    
    (Streamlit's default address; if the port differs, use the URL displayed in your terminal)

📈 Model Details

  • Vectorization: Bag of Words (BoW) using CountVectorizer
  • Model: Logistic Regression trained on preprocessed question pairs
  • Evaluation Metric: Accuracy score on a validation set
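Logistic Regression turns a feature vector x into a duplicate probability sigmoid(w · x + b) and predicts "duplicate" when that probability is at least 0.5; accuracy is the fraction of pairs labelled correctly. The sketch below shows that arithmetic in plain Python. How the project combines the two questions' BoW vectors into a single feature vector is not documented here, so the functions accept any numeric feature vector, and the weights in the tests are made up. A trained scikit-learn model would presumably perform the same computation through its `predict_proba` and `predict` methods.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps any real score into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def predict_proba(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """P(duplicate) = sigmoid(w . x + b) for a single feature vector."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5) -> int:
    """Return 1 (duplicate) if the probability reaches the threshold, else 0."""
    return int(predict_proba(features, weights, bias) >= threshold)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Accuracy is easy to read but can be misleading if duplicate and non-duplicate pairs are imbalanced in the validation set, so it is worth comparing against the majority-class baseline.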

🤝 Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
