Quora Duplicate Question Pairs Detector

🔍 Overview

This Streamlit web application predicts whether two questions from Quora are duplicates, aiming to reduce redundancy and enhance user experience on Q&A platforms. By leveraging Natural Language Processing (NLP) techniques and machine learning models, the app provides real-time predictions on question similarity.

🧠 Features

- User-Friendly Interface: Enter two questions and receive instant feedback on their similarity.
- Preprocessing Pipeline: Text cleaning, stopword removal, and vectorization using Bag of Words (BoW).
- Machine Learning Model: A trained Logistic Regression model predicts whether the two questions are duplicates.
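
The cleaning and stopword-removal steps of the preprocessing pipeline can be sketched as follows. This is a minimal illustration, not the actual contents of `helper.py`: the function names `clean_text`, `remove_stopwords`, and `preprocess` are hypothetical, and the small `STOPWORDS` set is a stand-in for the list the app loads from `stopwords.pkl`.

```python
import re

# Hypothetical stopword set for illustration only;
# the app loads its own list from stopwords.pkl.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "what", "how", "do", "i"}


def clean_text(text: str) -> str:
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def remove_stopwords(text: str, stopwords: set) -> str:
    """Drop every whitespace-separated token that appears in the stopword set."""
    return " ".join(tok for tok in text.split() if tok not in stopwords)


def preprocess(question: str, stopwords: set = STOPWORDS) -> str:
    """Apply cleaning first, then stopword removal."""
    return remove_stopwords(clean_text(question), stopwords)


print(preprocess("What is the capital of India?"))  # capital india
```

Cleaning before stopword removal matters here: punctuation attached to a word (for example `India?`) would otherwise prevent an exact match against the stopword set.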

🗂 Project Structure

├── app.py # Main Streamlit application
├── helper.py # Helper functions for preprocessing
├── model.pkl # Trained ML model
├── cv.pkl # CountVectorizer object
├── stopwords.pkl # List of stopwords
├── requirements.txt # Python dependencies
└── README.md # Project documentation
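
The three `.pkl` files above are pickled Python objects, so an app of this shape would typically load them once at startup. The sketch below shows one way to do that with the standard `pickle` module; `load_artifacts` is a hypothetical helper, not a function confirmed to exist in `app.py`. Only unpickle files you trust, and note that pickled scikit-learn objects generally need a compatible scikit-learn version to load.

```python
import pickle
from pathlib import Path


def load_artifacts(directory: str = "."):
    """Load model.pkl, cv.pkl, and stopwords.pkl from the given directory.

    Hypothetical helper: returns a dict keyed by artifact name.
    """
    base = Path(directory)
    artifacts = {}
    for name in ("model", "cv", "stopwords"):
        # Binary mode is required for pickle files.
        with open(base / f"{name}.pkl", "rb") as fh:
            artifacts[name] = pickle.load(fh)
    return artifacts
```

Usage, assuming the files sit next to the script: `artifacts = load_artifacts(".")`, after which `artifacts["cv"]` would hold the fitted vectorizer and `artifacts["model"]` the trained classifier.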

🚀 Getting Started

Recommendation: on Windows, run Streamlit apps from PowerShell.

Prerequisites

- Python 3.7 or higher
- pip (Python package installer)

Installation & Setup

Clone the repository:

git clone https://github.com/vennelavarshini18/Quora-Duplicate-Question-Pairs-Detector.git

Navigate to the project directory:

cd Quora-Duplicate-Question-Pairs-Detector

Create a virtual environment (optional but recommended):
```
python -m venv venv
```
Activate the virtual environment:
- Windows:
```
venv\Scripts\activate
```
- Mac/Linux:
```
source venv/bin/activate
```
Install the required dependencies:
```
pip install -r requirements.txt
```

Running the Application

After installation, follow these steps to run the project:

Start the application:
```
streamlit run app.py
```
Open your browser and go to:
```
http://localhost:8501
```
(or the address displayed in your terminal)

📈 Model Details

- Vectorization: Bag of Words (BoW) using CountVectorizer
- Model: Logistic Regression trained on preprocessed question pairs
- Evaluation Metric: Accuracy score on a validation set
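
To make the Bag of Words step concrete, here is a dependency-free sketch of how two preprocessed questions could become one feature row for the classifier. It is an illustration under assumptions, not the app's code: the app uses scikit-learn's `CountVectorizer`, the function names below are hypothetical, and concatenating the two count vectors is an assumed feature layout that the source does not confirm.

```python
from collections import Counter


def build_vocabulary(questions, max_features=None):
    """Map tokens to column indices, keeping the most frequent ones.

    Frequency ties are broken alphabetically; columns are in alphabetical order.
    """
    counts = Counter(tok for q in questions for tok in q.split())
    ranked = sorted(counts, key=lambda tok: (-counts[tok], tok))
    if max_features is not None:
        ranked = ranked[:max_features]
    return {tok: idx for idx, tok in enumerate(sorted(ranked))}


def bow_vector(question, vocabulary):
    """Count occurrences of each vocabulary token; unknown tokens are ignored."""
    vec = [0] * len(vocabulary)
    for tok in question.split():
        idx = vocabulary.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


def pair_features(q1, q2, vocabulary):
    """Assumed layout: the count vector of q1 followed by the count vector of q2."""
    return bow_vector(q1, vocabulary) + bow_vector(q2, vocabulary)


vocab = build_vocabulary(["learn python fast", "learn java"])
print(vocab)  # {'fast': 0, 'java': 1, 'learn': 2, 'python': 3}
print(pair_features("learn python", "learn java", vocab))  # [0, 0, 1, 1, 0, 1, 1, 0]
```

A row like this is what a Logistic Regression model would consume. Because BoW only counts tokens, word order and synonyms are not captured, which is a known limitation of this representation.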

🤝 Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
