1. Regression and Classification Problems
- Regression: Predicting a continuous output (e.g., predicting house prices).
- Classification: Predicting a categorical output (e.g., spam detection).
2. Simple Linear Regression
- Definition: Models the relationship between a dependent variable $y$ and an independent variable $x$ as a straight line.
- Equation: $y = \beta_0 + \beta_1 x$
- Implementation in Python:
    from sklearn.linear_model import LinearRegression
    # X: feature matrix of shape (n_samples, n_features); y: target vector
    model = LinearRegression()
    model.fit(X, y)
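For intuition about what fitting estimates, the least-squares coefficients for a single feature have a closed form: the slope is the covariance of $x$ and $y$ divided by the variance of $x$, and the intercept makes the line pass through the means. The sketch below is a minimal pure-Python illustration; the helper name `fit_simple_linear` and the data values are made up for this example, and it is not meant to describe how scikit-learn computes the fit internally.

```python
from statistics import mean

def fit_simple_linear(xs, ys):
    """Return (beta_0, beta_1) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_1 = sxy / sxx
    # Intercept: forces the line through the point (x_bar, y_bar).
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1

# Illustrative data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
beta_0, beta_1 = fit_simple_linear(xs, ys)
print(beta_0, beta_1)
```

Because the sample data lie exactly on a line, the recovered intercept and slope match the generating values.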
3. Multiple Linear Regression
- Definition: Extends simple linear regression to multiple independent variables.
- Equation: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_n x_n$
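The same model is often written in matrix form. The notation below is an assumption added for illustration: $X$ is the design matrix with one row per observation and a leading column of ones for the intercept, $\beta$ stacks the coefficients, and $\varepsilon$ is the error term. The closed-form estimate holds when $X^\top X$ is invertible.

```latex
y = X\beta + \varepsilon, \qquad
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2 = (X^\top X)^{-1} X^\top y
```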
4. Ridge Regression
- Definition: Linear regression with L2 regularization to prevent overfitting.
- Equation: $\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$
- Implementation in Python:
    from sklearn.linear_model import Ridge
    model = Ridge(alpha=1.0)  # alpha is the regularization strength (lambda in the equation)
    model.fit(X, y)
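To see how the penalty shrinks coefficients, consider the simplest case of one feature and no intercept, where minimizing $\sum (y_i - \beta x_i)^2 + \lambda \beta^2$ gives $\beta = \sum x_i y_i / (\sum x_i^2 + \lambda)$. The sketch below evaluates that formula for a few penalty values; the function name `ridge_slope` and the data are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a single feature with no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # The penalty lam is added to the denominator, pulling the estimate toward 0.
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x

for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_slope(xs, ys, lam))
```

With `lam = 0` the estimate equals the ordinary least-squares slope; larger values move it toward zero.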
5. Logistic Regression
- Definition: Used for binary classification problems; models the probability of the positive class by passing a linear combination of the features through the sigmoid function.
- Equation: $P(y=1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n)}}$
- Implementation in Python:
    from sklearn.linear_model import LogisticRegression
    model = LogisticRegression()
    model.fit(X, y)
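The equation above can be evaluated directly once coefficients are known. The following sketch computes the predicted probability for one sample and thresholds it at 0.5; the coefficient values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta_0, betas):
    """P(y = 1 | x) for one sample x under the logistic model."""
    z = beta_0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)

# Hypothetical coefficients, chosen only for illustration.
beta_0, betas = -1.0, [2.0, 0.5]
p = predict_proba([0.5, 0.0], beta_0, betas)  # linear score z = -1 + 2*0.5 + 0 = 0
label = int(p >= 0.5)  # threshold at 0.5 to get a class label
print(p, label)
```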
6. K-Nearest Neighbour (KNN)
- Definition: Classifies a data point based on the majority class among its k-nearest neighbors.
- Implementation in Python:
    from sklearn.neighbors import KNeighborsClassifier
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X, y)
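The majority-vote rule is short enough to write out by hand. This is a minimal brute-force sketch assuming Euclidean distance; the function name `knn_predict` and the toy points are made up for illustration.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(train_X, train_y, query=(0.5, 0.5), k=3)
print(prediction)
```

There is no training step beyond storing the data; all the work happens at prediction time.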
7. Naive Bayes Classifier
- Definition: A probabilistic classifier based on Bayes' theorem that assumes the predictors are conditionally independent given the class.
- Implementation in Python:
    from sklearn.naive_bayes import GaussianNB
    model = GaussianNB()
    model.fit(X, y)
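The independence assumption means the class likelihood factorizes into one term per feature, so in log space the terms simply add. The sketch below is a small from-scratch Gaussian version under that assumption; the function names, the variance floor of `1e-9`, and the data are illustrative choices rather than a description of scikit-learn's implementation.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps variances strictly positive (an illustrative choice).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        params[label] = (n / len(X), means, variances)
    return params

def predict_gaussian_nb(params, x):
    """Return the class with the highest log-posterior (up to a constant)."""
    def log_posterior(prior, means, variances):
        # Independence assumption: per-feature log-likelihoods add up.
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                      for xi, m, var in zip(x, means, variances))
        return math.log(prior) + log_lik
    return max(params, key=lambda label: log_posterior(*params[label]))

X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
y = [0, 0, 0, 1, 1, 1]
params = fit_gaussian_nb(X, y)
prediction = predict_gaussian_nb(params, (1.1, 2.1))
print(prediction)
```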
8. Linear Discriminant Analysis (LDA)
- Definition: Finds a linear combination of features that separates two or more classes.
- Implementation in Python:
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    model = LinearDiscriminantAnalysis()
    model.fit(X, y)
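For the two-class case, the "linear combination" can be made concrete with Fisher's criterion. The symbols here are assumptions introduced for illustration: $\mu_0, \mu_1$ are the class means, $S_B$ is the between-class scatter, and $S_W$ is the within-class scatter. The projection direction $w$ maximizes the ratio of between-class to within-class spread:

```latex
J(w) = \frac{w^\top S_B w}{w^\top S_W w}, \qquad
S_B = (\mu_1 - \mu_0)(\mu_1 - \mu_0)^\top, \qquad
S_W = \sum_{k \in \{0,1\}} \sum_{i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^\top, \qquad
w \propto S_W^{-1} (\mu_1 - \mu_0)
```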
9. Support Vector Machine (SVM)
- Definition: Finds the hyperplane that separates the classes in the feature space with the largest margin.
- Implementation in Python:
    from sklearn.svm import SVC
    model = SVC(kernel='linear')
    model.fit(X, y)
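For a linear SVM, a fitted hyperplane is described by a weight vector and a bias. The sketch below assumes such a hyperplane is already given (the values of `w` and `b` are hypothetical) and shows three quantities that follow from it: the signed decision value, the hinge loss for labels in {-1, +1}, and the margin width $2 / \lVert w \rVert$.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hinge_loss(w, b, x, y):
    """Hinge loss for a label y in {-1, +1}; zero once y * score is at least 1."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

# Hypothetical separating hyperplane x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
# Distance between the two margin boundaries (score = +1 and score = -1).
margin_width = 2.0 / math.sqrt(sum(wi * wi for wi in w))

pred = 1 if decision_value(w, b, [3.0, 2.0]) >= 0 else -1  # score = 2.0
loss_inside = hinge_loss(w, b, [2.0, 1.5], 1)               # score = 0.5, inside the margin
print(pred, loss_inside, margin_width)
```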
10. Decision Trees
- Definition: A tree-like model of decisions and their possible consequences; a prediction is made by following feature-based tests from the root to a leaf.
- Implementation in Python:
    from sklearn.tree import DecisionTreeClassifier
    model = DecisionTreeClassifier()
    model.fit(X, y)
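A tree is grown by repeatedly choosing the split that makes the child nodes as pure as possible. One common purity measure is Gini impurity, which is the default `criterion` of `DecisionTreeClassifier`. The sketch below scores candidate thresholds on a single feature; the helper names and the toy data are made up, and a real tree learner would search over all features and recurse.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return sum(len(part) / n * gini(part) for part in (left, right) if part)

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
# Try each observed value as a threshold and keep the purest split.
best_threshold = min(values, key=lambda t: split_impurity(values, labels, t))
print(best_threshold, split_impurity(values, labels, best_threshold))
```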
11. Bias-Variance Trade-off
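- Definition: Trade-off between bias (error from overly simple assumptions, which leads to underfitting the training data) and variance (error from sensitivity to the particular training sample, which leads to overfitting and poor generalization). Reducing one typically increases the other.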
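The trade-off can be stated as a decomposition of expected squared error at a point $x$. The setup is an assumption added for illustration: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the fitted model, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```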
12. Cross-Validation Methods
- Leave-One-Out (LOO) Cross-Validation: Uses a single observation as the validation data and the remaining observations as the training data, repeated once for each observation.
    from sklearn.model_selection import LeaveOneOut
    loo = LeaveOneOut()
    # X and y are assumed to be NumPy arrays so they can be indexed with index arrays
    for train_index, test_index in loo.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
- K-Fold Cross-Validation: Splits the data into k folds of approximately equal size and trains the model k times, each time using a different fold as the validation data.
    from sklearn.model_selection import KFold
    kf = KFold(n_splits=5)
    # X and y are assumed to be NumPy arrays so they can be indexed with index arrays
    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
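To make the splitting logic concrete, here is a minimal sketch that generates train and validation indices by hand for contiguous, unshuffled folds. The function name `k_fold_indices` is hypothetical; when the sample count is not divisible by the number of folds, the first few folds receive one extra sample.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for contiguous, unshuffled folds."""
    # The first n_samples % n_splits folds get one extra sample.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=7, n_splits=3))
# Setting n_splits equal to n_samples gives leave-one-out.
loo_folds = list(k_fold_indices(n_samples=4, n_splits=4))
for train, test in folds:
    print(train, test)
```

Every sample appears in exactly one validation fold, so each observation is used for validation once and for training k - 1 times.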
13. Multi-Layer Perceptron (MLP)
- Definition: A class of feedforward artificial neural network (ANN).
- Implementation in Python:
    from sklearn.neural_network import MLPClassifier
    model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=300)
    model.fit(X, y)
14. Feed-Forward Neural Network
- Definition: Type of neural network where connections between nodes do not form a cycle.
- Structure: Input layer, hidden layers, and output layer.
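The layer structure can be illustrated with a single forward pass: data flow from the input through a hidden layer to the output, and nothing feeds back. This sketch assumes one hidden layer with a ReLU activation and a linear output; the weights are hypothetical values picked to keep the arithmetic simple, not trained parameters.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]

def dense(W, b, v):
    """Affine layer: one output per row of W, computed as row . v + bias."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]

def forward(x, W1, b1, W2, b2):
    """Input -> hidden layer (ReLU) -> output layer, with no feedback connections."""
    hidden = relu(dense(W1, b1, x))
    return dense(W2, b2, hidden)

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]
W2, b2 = [[2.0, 1.0]], [0.5]
output = forward([3.0, 1.0], W1, b1, W2, b2)
print(output)
```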
1. Clustering Algorithms
K-Means/K-Medoid:
- K-Means: Partitions the data into k clusters by minimizing the variance within each cluster.
    from sklearn.cluster import KMeans
    kmeans = KMeans(n_clusters=3)
    kmeans.fit(X)
- K-Medoid: Similar to K-Means but uses actual data points as cluster centers.
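K-Means is usually computed by alternating two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. The sketch below runs that loop (often called Lloyd's algorithm) on one-dimensional data; the function name, the fixed iteration count, and the starting centers are illustrative assumptions.

```python
def k_means_1d(points, centers, n_iter=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else old
                   for c, old in zip(clusters, centers)]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

A K-Medoid variant would change only the update step, replacing the mean with the cluster member that has the smallest total distance to the other members.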
2. Hierarchical Clustering
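- Definition: Builds a hierarchy of clusters using either a bottom-up (agglomerative) approach, which repeatedly merges the closest clusters, or a top-down (divisive) approach, which repeatedly splits them.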
- Types:
- Single-Linkage: The distance between two clusters is defined as the minimum distance between any single data point in the first cluster and any single data point in the second cluster.
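  - Multiple-Linkage: Uses more than one pair of points. It can be complete (the maximum distance between any point in the first cluster and any point in the second cluster) or average (the average distance over all such pairs).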
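The linkage rules differ only in how they summarize the pairwise distances between two clusters. A minimal sketch assuming Euclidean distance, with made-up cluster contents:

```python
from itertools import product
import math

def pairwise(a, b):
    """All Euclidean distances between points of cluster a and cluster b."""
    return [math.dist(p, q) for p, q in product(a, b)]

def single_linkage(a, b):
    return min(pairwise(a, b))  # closest pair

def complete_linkage(a, b):
    return max(pairwise(a, b))  # farthest pair

def average_linkage(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)      # mean over all pairs

cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(cluster_a, cluster_b),
      complete_linkage(cluster_a, cluster_b),
      average_linkage(cluster_a, cluster_b))
```

In agglomerative clustering, the pair of clusters with the smallest linkage distance is merged at each step.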
3. Dimensionality Reduction
Principal Component Analysis (PCA):
- Definition: Technique to reduce the number of features by transforming the original variables into a new set of variables (principal components) that are uncorrelated and capture the most variance in the data.
- Implementation in Python:
    from sklearn.decomposition import PCA
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)
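The first principal component is the direction of greatest variance, which is the leading eigenvector of the covariance matrix of the centered data. The sketch below finds it with power iteration in pure Python. This is a simple stand-in chosen for readability, not the method scikit-learn uses, and it assumes the starting direction is not orthogonal to the leading eigenvector. The function name and the data are illustrative.

```python
import math

def first_principal_component(X, n_iter=100):
    """Leading eigenvector of the sample covariance matrix via power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d) of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] + [0.0] * (d - 1)  # arbitrary starting direction
    for _ in range(n_iter):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v, centered

# Illustrative points lying on the line y = x.
X = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
component, centered = first_principal_component(X)
# Project each centered point onto the component to get 1-D coordinates.
scores = [sum(a * b for a, b in zip(row, component)) for row in centered]
print(component, scores)
```

Since the points lie on a line, one component captures all of the variance and the two-dimensional data reduce to one coordinate per point without loss.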