# 📋 Quick Reference Guide

A quick lookup guide for common NLP commands, concepts, and notebook structure.


## 🚀 Quick Commands

### Installation & Setup

```bash
# Clone the repository
git clone https://github.com/intronep666/Natural-Language-Processing.git

# Install dependencies
pip install -r requirements.txt

# Download NLP models (stopwords, wordnet, and the POS tagger are used by later snippets)
python -m spacy download en_core_web_sm
python -c "import nltk; nltk.download(['punkt', 'stopwords', 'wordnet', 'averaged_perceptron_tagger'])"

# Launch Jupyter
jupyter notebook
```
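A quick sanity check that the stack installed correctly (a minimal sketch; assumes the steps above finished without errors):

```python
# Verify core imports and confirm the spaCy model loads
import nltk
import sklearn
import spacy

nlp = spacy.load("en_core_web_sm")   # raises OSError if the model was not downloaded
print("NLTK", nltk.__version__, "| scikit-learn", sklearn.__version__)
print("spaCy pipeline loaded:", nlp.meta["name"])
```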

### Jupyter Shortcuts

| Action | Shortcut |
| --- | --- |
| Run cell | Shift + Enter |
| New cell below | B |
| New cell above | A |
| Delete cell | D, D |
| Save notebook | Ctrl + S |
| Toggle comment | Ctrl + / |

Note: B, A, and D, D are command-mode shortcuts (press Esc first); Ctrl + / works while editing a cell.

## 📚 Practical Notebooks at a Glance

| # | Notebook | Main Topic | Key Skills |
| --- | --- | --- | --- |
| 01 | Comprehensive Pipeline | Full NLP workflow | Tokenization, POS, NER, lemmatization |
| 02 | N-Gram Analysis | Word sequences | Unigrams, bigrams, trigrams, probability |
| 03 | Feature Extraction | TF-IDF | Vectorization, importance weighting |
| 04 | Word Embeddings | Semantic vectors | Word2Vec, GloVe, FastText, BERT |
| 05 | Text Classification | Supervised learning | Naïve Bayes, SVM |
| 06 | K-Means Clustering | Unsupervised learning | Document clustering, similarity |
| 07 | POS Tagging | Grammar analysis | Part-of-speech, syntax |
| 08 | LSTM Sentiment | Neural networks | Sequence models, sentiment |
| 09 | Advanced LSTM | Regularization | Dropout, overfitting prevention |
| 10 | Spam Detection | Real-world app | Bag-of-Words, classification |

## 🔤 NLP Concepts Quick Reference

### Tokenization

```python
from nltk.tokenize import word_tokenize

tokens = word_tokenize("Hello world!")   # requires the NLTK 'punkt' data
# Output: ['Hello', 'world', '!']
```

### Lemmatization

```python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
lemmatizer.lemmatize("running", pos='v')   # pos='v' = treat the word as a verb
# Output: 'run' (the default noun lookup would return 'running' unchanged)
```

### Stemming

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stemmer.stem("running")
# Output: 'run'
```

### Stop Words

```python
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
# Common words to filter out: "the", "a", "is", etc.
```

### TF-IDF

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)   # documents: a list of strings
```

### Word2Vec

```python
from gensim.models import Word2Vec

model = Word2Vec(sentences, vector_size=100, window=5)   # sentences: a list of token lists
vector = model.wv['word']                                # 100-dimensional embedding
```
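Once trained, the model also answers similarity queries (assuming 'word' appears in the training vocabulary):

```python
# Nearest neighbours by cosine similarity in the embedding space
print(model.wv.most_similar('word', topn=5))   # list of (word, similarity) pairs
```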

### POS Tagging

```python
from nltk import pos_tag, word_tokenize

tokens = word_tokenize("I love NLP")
tags = pos_tag(tokens)   # requires the NLTK 'averaged_perceptron_tagger' data
# Output: [('I', 'PRP'), ('love', 'VBP'), ('NLP', 'NNP')]
```

### Named Entity Recognition

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is in California")
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")   # e.g. Apple: ORG, California: GPE
```

### Text Classification

```python
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X_train, y_train)              # X_train: vectorized documents (e.g. TF-IDF)
predictions = model.predict(X_test)
```
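A minimal end-to-end sketch showing where X_train and y_train come from (the texts and labels here are toy placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

texts = ["free prize inside", "meeting at noon", "win cash now", "lunch tomorrow?"]
labels = [1, 0, 1, 0]                    # 1 = spam, 0 = ham (toy data)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)      # sparse TF-IDF matrix
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=42)

model = MultinomialNB().fit(X_train, y_train)
print(model.predict(X_test))
```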

### K-Means Clustering

```python
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=42)   # fixed seed for reproducible clusters
labels = kmeans.fit_predict(X)                   # X: vectorized documents
```

### LSTM Sentiment

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

model = Sequential([
    Embedding(vocab_size, embedding_dim),   # vocab_size/embedding_dim defined elsewhere
    LSTM(64),
    Dense(1, activation='sigmoid')          # sigmoid output for binary sentiment
])
```
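Before training, the model needs compiling and equal-length integer sequences; a minimal sketch (X_train as lists of word indices and y_train as 0/1 labels are assumed to exist):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

X_train_padded = pad_sequences(X_train, maxlen=100)   # pad/truncate to length 100
model.fit(X_train_padded, y_train, epochs=5, batch_size=32, validation_split=0.1)
```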

## 📊 Common NLP Metrics

### Classification

- Accuracy: (TP + TN) / Total
- Precision: TP / (TP + FP) (how many predicted positives are correct)
- Recall: TP / (TP + FN) (how many actual positives are found)
- F1-Score: 2 × (Precision × Recall) / (Precision + Recall)
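All four are one-liners in scikit-learn (a sketch; y_test and predictions follow from the classification example above):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy :", accuracy_score(y_test, predictions))
print("Precision:", precision_score(y_test, predictions))
print("Recall   :", recall_score(y_test, predictions))
print("F1-Score :", f1_score(y_test, predictions))
```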

### Clustering

- Silhouette Score: ranges from -1 to 1 (higher is better)
- Davies-Bouldin Index: lower is better
- Calinski-Harabasz Index: higher is better
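For example, the silhouette score for the K-Means result above (a sketch, assuming X and labels from the clustering example):

```python
from sklearn.metrics import silhouette_score

score = silhouette_score(X, labels)   # closer to 1 = tighter, better-separated clusters
print("Silhouette score:", score)
```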

## 🔧 Common Import Statements

```python
# NLTK
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords
from nltk import pos_tag

# spaCy
import spacy
nlp = spacy.load("en_core_web_sm")

# Data Processing
import pandas as pd
import numpy as np

# Machine Learning
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, classification_report

# Deep Learning
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Word Embeddings
from gensim.models import Word2Vec, FastText
import gensim.downloader as api

# Transformers
from transformers import BertTokenizer, BertModel
```

## 📁 File Organization

```text
Your Local Folder/
├── 01_Comprehensive_NLP_Pipeline_Linguistic_Analysis.ipynb
├── 02_N_Gram_Analysis_Tokenization_Probability.ipynb
├── ... (practicals 3-10)
├── README.md          ← Start here!
├── GETTING_STARTED.md ← Setup guide
├── CONTRIBUTING.md    ← How to contribute
├── CHANGELOG.md       ← What's new
├── LICENSE            ← MIT License
└── requirements.txt   ← Install dependencies
```

## 🎯 Recommended Learning Path

### Beginner (First Time with NLP)

  1. Read: README.md overview
  2. Do: Setup from GETTING_STARTED.md
  3. Run: Practical 01 (Comprehensive Pipeline)
  4. Run: Practical 02 (N-Grams)
  5. Run: Practical 03 (TF-IDF)

### Intermediate (Some ML knowledge)

  1. Run: Practical 04 (Word Embeddings)
  2. Run: Practical 05 (Classification)
  3. Run: Practical 06 (Clustering)
  4. Run: Practical 07 (POS Tagging)

### Advanced (Deep Learning)

  1. Run: Practical 08 (LSTM)
  2. Run: Practical 09 (Advanced LSTM)
  3. Run: Practical 10 (Real-world App)

## 🐛 Troubleshooting Quick Fixes

| Problem | Solution |
| --- | --- |
| `ModuleNotFoundError` | `pip install [package]` |
| NLTK data missing | `nltk.download('[resource]')` |
| spaCy model missing | `python -m spacy download en_core_web_sm` |
| Kernel crashes | Close other apps, reduce batch size |
| Slow training | Use a smaller dataset or GPU acceleration |
| Out of memory | Reduce batch size, use a smaller model |

## 📖 Document Quick Links

| Document | Purpose |
| --- | --- |
| README.md | Full project overview |
| GETTING_STARTED.md | Setup & installation |
| CONTRIBUTING.md | Contribution guidelines |
| CHANGELOG.md | Version history |
| LICENSE | MIT License terms |

## 🌐 External Resources

### Documentation

### Tutorials

### Communities


## ⚡ Pro Tips

1. Use a virtual environment: keep dependencies isolated.
2. Start small: test with small datasets first.
3. Save often: press Ctrl + S in Jupyter frequently.
4. Read the comments: every notebook has detailed comments.
5. Modify the code: don't just run it; change parameters and experiment!
6. Document your work: add notes to the notebooks.
7. Version your models: save trained models for reuse (see the sketch below).
8. Use a GPU: if one is available, it can speed up training 10-100x.
9. Monitor resources: watch RAM/CPU usage during training.
10. Ask for help: create a GitHub issue or contact the maintainer.
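For tip 7, a minimal sketch of saving and reloading a fitted model (assumes the model and vectorizer from the classification example; file names are arbitrary):

```python
import joblib

# Persist the fitted classifier and its vectorizer side by side
joblib.dump(model, 'nb_model.joblib')
joblib.dump(vectorizer, 'tfidf_vectorizer.joblib')

# Later: reload and predict without retraining
model = joblib.load('nb_model.joblib')
vectorizer = joblib.load('tfidf_vectorizer.joblib')
```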

## 📞 Need Help?


Happy Learning! 🚀

Last Updated: November 2025