- Describe the dataset.
- Explain the problem you aim to solve.
- Explore patterns or structure using clustering and dimensionality reduction (e.g., PCA).
- Visualize training data:
  - Plot individual feature distributions (e.g., histograms, density plots).
  - Plot relationships between features and the target variable.
  - Create a correlation matrix.
- Discuss any interesting structures or explain attempts to find them.
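As a starting point for the exploration steps above, here is a minimal sketch using scikit-learn and NumPy. The synthetic data from `make_classification`, the choice of two principal components, and the choice of two clusters are all assumptions made for illustration. Substitute your own dataset and settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project dataset (assumption, for illustration only).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=1, random_state=0
)

# Standardize first so that no single feature dominates PCA or k-means.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components for inspection or plotting.
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_

# Cluster in the reduced space; the number of clusters is a guess to be tuned.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_2d)

# Correlation matrix over all features plus the target (last row and column).
corr = np.corrcoef(np.column_stack([X_scaled, y]), rowvar=False)

print("Explained variance ratio:", explained)
print("Cluster sizes:", np.bincount(cluster_labels))
print("Feature-target correlations:", np.round(corr[-1, :-1], 2))
```

The arrays `X_2d`, `cluster_labels`, and `corr` can then be passed to a plotting library of your choice for scatter plots and a heatmap.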
- Train at least three different models: (1) logistic regression, (2) an SVM, and (3) a neural network.
- Implement the models either from scratch or with existing libraries (e.g., Keras, scikit-learn, TensorFlow).
- Experiment with at least three feature transformations (e.g., polynomial features, PCA, a radial basis function (RBF) kernel).
- Apply different regularization techniques, testing at least six regularization-strength values per model.
- Document all transformations and regularization results.
- Include training accuracy and validation metrics for every model.
- Provide results for varying parameter settings:
  - Classification metrics (e.g., precision, recall).
  - Regression metrics (e.g., MSE, R²).
- Plot and analyze performance metrics (e.g., accuracy, precision, recall, MSE) with different transformations and hyperparameters.
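One possible way to organize the transformation and regularization experiments above is a small grid loop that records metrics for every setting. The sketch below covers logistic regression only and assumes scikit-learn, a synthetic dataset, and illustrative hyperparameters (`degree=2`, `n_components=4`, `gamma=0.1`). `RBFSampler` is used here as an approximate RBF feature map, which is a substitution for a true RBF kernel. The same loop structure could be reused for the SVM and the neural network.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the project dataset (assumption, for illustration only).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Factories so that every run gets a fresh, unfitted transformer.
transform_factories = {
    "polynomial": lambda: PolynomialFeatures(degree=2, include_bias=False),
    "pca": lambda: PCA(n_components=4, random_state=0),
    "rbf": lambda: RBFSampler(gamma=0.1, n_components=100, random_state=0),
}

# Six regularization settings. In scikit-learn, C is the INVERSE of the
# L2 penalty strength, so smaller C means stronger regularization.
C_values = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]

results = []
for name, make_transform in transform_factories.items():
    for C in C_values:
        model = Pipeline([
            ("scale", StandardScaler()),
            ("transform", make_transform()),
            ("clf", LogisticRegression(C=C, max_iter=2000)),
        ])
        model.fit(X_train, y_train)
        val_pred = model.predict(X_val)
        results.append({
            "transform": name,
            "C": C,
            "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
            "val_accuracy": accuracy_score(y_val, val_pred),
            "val_precision": precision_score(y_val, val_pred, zero_division=0),
            "val_recall": recall_score(y_val, val_pred, zero_division=0),
        })

for row in results:
    print(row)
```

Collecting the results as a list of dictionaries makes it straightforward to build a table or plot each metric against `C` for every transformation.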
- Analyze and explain experimental results.
- Include a chart of key findings.
- Discuss how each of the following affects model performance:
  - Feature transformations.
  - Regularization techniques.
  - Other hyperparameters.
- Interpret:
  - Overfitting and underfitting observations.
  - Bias-variance trade-offs.
  - Parameter choices improving generalization.
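To support the overfitting and underfitting discussion, one option is to compare cross-validated training and validation scores across a hyperparameter range. The sketch below uses scikit-learn's `validation_curve` with an RBF-kernel SVM. The synthetic dataset, the `gamma` range, and the fixed `C=1.0` are assumptions for illustration. Low scores on both curves suggest underfitting (high bias), while a large gap between them suggests overfitting (high variance).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the project dataset (assumption, for illustration only).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Six gamma values on a log scale; larger gamma gives a more flexible model.
gammas = np.logspace(-4, 2, 6)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# Rows correspond to gamma values, columns to cross-validation folds.
train_scores, val_scores = validation_curve(
    model, X, y,
    param_name="svc__gamma", param_range=gammas,
    cv=5, scoring="accuracy",
)
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
gap = train_mean - val_mean

for g, tr, va, d in zip(gammas, train_mean, val_mean, gap):
    print(f"gamma={g:.4g}  train={tr:.3f}  val={va:.3f}  gap={d:.3f}")

# The setting with the best mean validation score is a candidate for
# the best-generalizing choice within this range.
best_gamma = gammas[int(np.argmax(val_mean))]
print("Best gamma by validation accuracy:", best_gamma)
```

Plotting `train_mean` and `val_mean` against `gammas` on a log-scaled axis gives the usual validation curve for the write-up.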
- Upload the following to Gradescope:
  - Presentation slides.
  - Project write-up (PDF format).
  - Project code as a Jupyter Notebook (or GitHub link, if necessary).
  - Custom dataset (if used, either upload or provide a GitHub link).