A quick lookup guide for common NLP commands, concepts, and notebook structure.
```bash
# Clone repository
git clone https://github.com/intronep666/Natural-Language-Processing.git

# Install dependencies
pip install -r requirements.txt

# Download NLP models and data (the notebooks also use NLTK's stopwords, WordNet, and POS tagger)
python -m spacy download en_core_web_sm
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger')"

# Launch Jupyter
jupyter notebook
```
**Jupyter Keyboard Shortcuts**

| Action | Shortcut |
|--------|----------|
| Run cell | Shift + Enter |
| New cell below | B |
| New cell above | A |
| Delete cell | D + D |
| Save notebook | Ctrl + S |
| Toggle comment | Ctrl + / |
## 📚 Practical Notebooks at a Glance

| # | Notebook | Main Topic | Key Skills |
|----|------------------------|-----------------------|-------------------------------------------|
| 01 | Comprehensive Pipeline | Full NLP workflow | Tokenization, POS, NER, Lemmatization |
| 02 | N-Gram Analysis | Word sequences | Unigrams, bigrams, trigrams, probability |
| 03 | Feature Extraction | TF-IDF | Vectorization, importance weighting |
| 04 | Word Embeddings | Semantic vectors | Word2Vec, GloVe, FastText, BERT |
| 05 | Text Classification | Supervised learning | Naïve Bayes, SVM |
| 06 | K-Means Clustering | Unsupervised learning | Document clustering, similarity |
| 07 | POS Tagging | Grammar analysis | Part-of-speech, syntax |
| 08 | LSTM Sentiment | Neural networks | Sequence models, sentiment |
| 09 | Advanced LSTM | Regularization | Dropout, overfitting prevention |
| 10 | Spam Detection | Real-world app | Bag-of-Words, classification |
## 🔤 NLP Concepts Quick Reference
```python
# Tokenization
from nltk.tokenize import word_tokenize

tokens = word_tokenize("Hello world!")
# Output: ['Hello', 'world', '!']
```
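Notebook 02 builds on tokenization with n-grams; a minimal sketch using NLTK's `ngrams` helper on the tokens above:

```python
# N-grams: sliding windows of n consecutive tokens
from nltk.util import ngrams

tokens = ['Hello', 'world', '!']
bigrams = list(ngrams(tokens, 2))
# Output: [('Hello', 'world'), ('world', '!')]
```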
```python
# Lemmatization
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
lemmatizer.lemmatize("running", pos="v")  # pos="v" = treat as a verb; the noun default would return "running" unchanged
# Output: "run"
```
```python
# Stemming
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stemmer.stem("running")
# Output: "run"
```
```python
# Stop-word removal
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
filtered = [w for w in tokens if w.lower() not in stop_words]
# Removes "the", "a", "is", etc.
```
```python
# TF-IDF feature extraction
from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["NLP is fun", "NLP is useful"]  # any list of raw strings
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)
```
```python
# Word2Vec embeddings
from gensim.models import Word2Vec

sentences = [["hello", "world"], ["nlp", "is", "fun"]]  # tokenized corpus
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)  # min_count=1 keeps rare words
vector = model.wv['world']  # 100-dimensional vector
```
```python
# Part-of-speech tagging
from nltk import pos_tag, word_tokenize

tokens = word_tokenize("I love NLP")
tags = pos_tag(tokens)
# Output: [('I', 'PRP'), ('love', 'VBP'), ('NLP', 'NNP')]
```
```python
# Named entity recognition with spaCy
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is in California")
for ent in doc.ents:
    print(f"{ent.text} → {ent.label_}")
# Output (en_core_web_sm): Apple → ORG, California → GPE
```
```python
# Naïve Bayes classification
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X_train, y_train)  # X: vectorized text (e.g. TF-IDF), y: labels
predictions = model.predict(X_test)
```
```python
# K-Means clustering
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=42)  # random_state for reproducibility
labels = kmeans.fit_predict(X)  # X: e.g. a TF-IDF matrix
```
```python
# LSTM sentiment model (vocab_size and embedding_dim depend on your data)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding

model = Sequential([
    Embedding(vocab_size, embedding_dim),
    LSTM(64),
    Dense(1, activation='sigmoid')  # binary sentiment output
])
```
**Classification metrics**

- Accuracy: (TP + TN) / Total
- Precision: TP / (TP + FP), i.e. how many predicted positives are correct
- Recall: TP / (TP + FN), i.e. how many actual positives are found
- F1-Score: 2 × (Precision × Recall) / (Precision + Recall)

**Clustering metrics**

- Silhouette Score: ranges from -1 to 1 (higher is better)
- Davies-Bouldin Index: lower is better
- Calinski-Harabasz Index: higher is better
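These all map directly onto scikit-learn helpers; a minimal sketch, assuming `predictions` comes from the classifier above, `y_test` holds the matching true labels, and `X`/`labels` come from the clustering snippet:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

# Classification metrics (binary labels assumed)
print("Accuracy: ", accuracy_score(y_test, predictions))
print("Precision:", precision_score(y_test, predictions))
print("Recall:   ", recall_score(y_test, predictions))
print("F1-Score: ", f1_score(y_test, predictions))

# Clustering metrics; some expect dense arrays, so convert
# a sparse TF-IDF matrix first (X = X.toarray())
print("Silhouette:       ", silhouette_score(X, labels))
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))
```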
## 🔧 Common Import Statements

```python
# NLTK
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords
from nltk import pos_tag

# spaCy
import spacy
nlp = spacy.load("en_core_web_sm")

# Data Processing
import pandas as pd
import numpy as np

# Machine Learning
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, classification_report

# Deep Learning
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Word Embeddings
from gensim.models import Word2Vec, FastText
import gensim.downloader as api

# Transformers
from transformers import BertTokenizer, BertModel
```
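The `transformers` imports pair with pretrained weights from the Hugging Face hub; a minimal sketch, assuming PyTorch is installed (the weights download on first run):

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Encode one sentence and pull out contextual embeddings
inputs = tokenizer("Hello NLP!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768) for bert-base
```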
Your local folder:

```text
Natural-Language-Processing/
├── 01_Comprehensive_NLP_Pipeline_Linguistic_Analysis.ipynb
├── 02_N_Gram_Analysis_Tokenization_Probability.ipynb
├── ... (practicals 3-10)
├── README.md           ← Start here!
├── GETTING_STARTED.md  ← Setup guide
├── CONTRIBUTING.md     ← How to contribute
├── CHANGELOG.md        ← What's new
├── LICENSE             ← MIT License
└── requirements.txt    ← Install dependencies
```
## 🎯 Recommended Learning Path

### Beginner (First Time with NLP)

1. **Read**: the README.md overview
2. **Do**: the setup from GETTING_STARTED.md
3. **Run**: Practical 01 (Comprehensive Pipeline)
4. **Run**: Practical 02 (N-Grams)
5. **Run**: Practical 03 (TF-IDF)

### Intermediate (Some ML knowledge)

1. **Run**: Practical 04 (Word Embeddings)
2. **Run**: Practical 05 (Classification)
3. **Run**: Practical 06 (Clustering)
4. **Run**: Practical 07 (POS Tagging)
5. **Run**: Practical 08 (LSTM)
6. **Run**: Practical 09 (Advanced LSTM)
7. **Run**: Practical 10 (Real-world App)
## 🐛 Troubleshooting Quick Fixes

| Problem | Solution |
|---------|----------|
| `ModuleNotFoundError` | `pip install [package]` |
| NLTK data missing | `nltk.download('[resource]')` |
| spaCy model missing | `python -m spacy download en_core_web_sm` |
| Kernel crashes | Close other apps, reduce batch size |
| Slow training | Use a smaller dataset or GPU acceleration |
| Out of memory | Reduce batch size, use a smaller model |
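For the performance-related fixes, it helps to confirm whether TensorFlow can actually see a GPU; a quick check:

```python
import tensorflow as tf

# An empty list means training will run on the CPU only
print(tf.config.list_physical_devices('GPU'))
```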
## 💡 Pro Tips

- **Use a virtual environment**: keep dependencies isolated (see the sketch after this list)
- **Start small**: test with small datasets first
- **Save often**: press Ctrl + S in Jupyter frequently
- **Read the comments**: every notebook has detailed comments
- **Modify the code**: don't just run it; change parameters and experiment!
- **Document your work**: add notes to the notebooks
- **Version your models**: save trained models for reuse
- **Use a GPU**: if available, it speeds up training 10-100x
- **Monitor resources**: watch RAM/CPU usage during training
- **Ask for help**: create a GitHub issue or contact the maintainer
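A minimal sketch of the virtual-environment tip, using Python's built-in `venv` (Unix-like shell shown; on Windows, activate with `.venv\Scripts\activate`):

```bash
# Create and activate an isolated environment, then install the repo's dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```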
Happy Learning! 🚀
Last Updated: November 2025