This repository contains Jupyter notebooks for the preprocessing, training, and prediction workflows of machine learning models. Below is an overview of each notebook and its functionality. The dataset is sourced from the Solafune competition, accessible via the following link:
This notebook handles the preprocessing of raw data, preparing it for model training. Key operations include cleaning, transforming, and splitting the dataset.
- Data Cleaning: Removes null values, handles missing data, and standardizes formats.
- Feature Engineering: Includes encoding categorical variables, scaling numerical features, and creating derived features.
- Train-Test Split: Divides the dataset into training and testing subsets.
- Visualization: Provides plots and graphs to understand the data distribution.
- Open the notebook in Jupyter or any compatible environment.
- Update the input data path.
- Run the cells sequentially to generate preprocessed datasets.
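The patch extraction and train-test split steps above might look like the following minimal sketch (patch size, test fraction, and the dummy tile are illustrative placeholders, not values from the repository):

```python
import numpy as np

def make_patches(image, patch_size=256):
    """Cut an (H, W, C) image into non-overlapping square patches."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

def split_train_test(patches, test_fraction=0.2, seed=42):
    """Shuffle patches and split them into train/test subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(patches))
    n_test = int(len(patches) * test_fraction)
    return patches[idx[n_test:]], patches[idx[:n_test]]

# Example: a dummy 1024x1024 RGB tile yields 16 patches of 256x256
tile = np.zeros((1024, 1024, 3), dtype=np.uint8)
patches = make_patches(tile)
train, test = split_train_test(patches)
```

In the actual notebook, the `patchify` and `rasterio` packages listed in the requirements likely handle the tiling and raster I/O.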
This notebook is used to train a machine learning model using the preprocessed data. It includes model definition, training, and evaluation.
- Model Architecture: Defines the model with a ResNet50 encoder and a U-Net decoder.
- Hyperparameter Tuning: Includes adjustable parameters for optimizing model performance.
- Training Pipeline: Executes the training loop and tracks metrics such as loss, accuracy, Jaccard coefficient, and Dice loss.
- Model Evaluation: Evaluates the model on the test dataset and reports metrics such as pixel accuracy and the Jaccard coefficient (IoU).
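The evaluation metrics above can be sketched in plain NumPy for binary masks (a sketch for illustration, not the repository's implementation):

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels where prediction and ground truth agree."""
    return (pred == target).mean()

def jaccard(pred, target, eps=1e-7):
    """Jaccard coefficient (IoU): intersection over union of the masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)

def dice(pred, target, eps=1e-7):
    """Dice coefficient: 2*intersection over the sum of mask areas."""
    inter = np.logical_and(pred, target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

pred   = np.array([[1, 1, 0, 0]])
target = np.array([[1, 0, 1, 0]])
# intersection = 1, union = 3 -> IoU ~ 1/3; Dice = 2/4 = 0.5; accuracy = 0.5
```

Dice loss, tracked during training, is simply `1 - dice(pred, target)` computed on soft (pre-threshold) predictions.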
- Ensure that the preprocessed dataset from `preprocessing.ipynb` is available.
- Open the notebook and configure the training parameters.
- Run the cells to train and save the model.
This notebook uses the trained model to make predictions on new data.
- Model Loading: Loads the trained model from a specified directory.
- Data Input: Accepts new data in the required format.
- Prediction Pipeline: Processes the input data and outputs predictions.
- Visualization: Displays the prediction results in a user-friendly visual format.
- Ensure the trained model file is accessible.
- Provide the input data for predictions.
- Run the cells to generate predictions and visualize the results.
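The final step of the prediction pipeline, converting raw model outputs into a displayable binary mask, can be sketched as follows (the 0.5 threshold is an assumed default, not taken from the notebook):

```python
import numpy as np

def logits_to_mask(logits, threshold=0.5):
    """Convert raw model logits to a binary mask via sigmoid + threshold."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs >= threshold).astype(np.uint8)

# Positive logits map to probabilities above 0.5, i.e. foreground pixels
logits = np.array([[-2.0, 0.1],
                   [ 3.0, -0.5]])
mask = logits_to_mask(logits)
# -> [[0, 1],
#     [1, 0]]
```

The resulting mask can then be overlaid on the input image with Matplotlib or OpenCV for visual inspection.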
- Clone the repository:

  ```
  git clone https://github.com/Bangkit-Capstone-Solafune-C242-FS01/machine-learning
  cd machine-learning
  ```

- Choose a model framework:

  ```
  cd pytorch
  ```

  or

  ```
  cd tensorflow
  ```

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

- Open Jupyter Lab:

  ```
  jupyter lab
  ```
- Navigate to the desired notebook and follow the instructions in the file.
- Python 3.9+
- Jupyter Notebook
- Common Python libraries: NumPy, Scikit-learn, Matplotlib, TensorFlow/PyTorch (adjust based on model framework)
- rasterio
- albumentations
- timm
- segmentation-models
- OpenCV
- patchify
- scikit-image