Create a conda environment with Python 3.12
conda create -n sift python=3.12
conda activate sift
Clone this repo and install the required packages
git clone https://github.com/imbulana/sift.git
cd sift
python3 -m pip install -r requirements.txt
The dataset, intermediate steps, models, and experiment results are stored and tracked in a DVC remote storage on Google Drive.
If you have access to the remote storage, pull it with
dvc pull
Otherwise, set up your own remote storage following the instructions here.
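As a minimal sketch, a Google Drive remote can be configured along these lines (the remote name and folder ID are placeholders, and recent DVC versions need the Google Drive plugin, installable with pip install "dvc[gdrive]"):
# add a default remote pointing at a Google Drive folder (placeholder ID)
dvc remote add -d storage gdrive://<gdrive-folder-id>
# after reproducing the pipeline, push tracked artifacts to the new remote
dvc push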
Then download the dataset. See here for more information about the dataset.
cd data/raw
curl -L -o data.zip \
https://www.kaggle.com/api/v1/datasets/download/clmentbisaillon/fake-and-real-news-dataset
unzip data.zip && rm data.zip
mv True.csv real.csv && mv Fake.csv fake.csv
cd ../..
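Optionally, sanity-check that the renamed CSVs are in place before running the pipeline (a quick sketch; exact sizes and row counts will differ):
ls -lh data/raw
wc -l data/raw/real.csv data/raw/fake.csv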
To reproduce the pipeline defined in dvc.yaml in the current workspace, run
dvc repro
To create a new experiment, modify the hyperparameters in params.yaml and the pipeline in dvc.yaml as required, then run
dvc exp run
To easily compare experiments, install the DVC extension for VS Code.
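Experiments can also be compared from the command line with standard DVC commands:
# tabulate experiments with their parameters and metrics
dvc exp show
# compare the latest experiment against the baseline
dvc exp diff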
To run a series of experiments with different hyperparameters in params.yaml, add them to an experiment queue
dvc exp run -S 'featurize.max_features=5,10' -S 'featurize.ngrams=1,2,3' --queue
Then, run the experiments in parallel locally
dvc queue start -j <number of parallel jobs>
# OR if you want to time the set of experiments
time dvc exp run --run-all -j <number of parallel jobs>
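Queued runs can be monitored while they execute using DVC's queue commands (the experiment name below is a placeholder):
# check the status of queued, running, and finished experiments
dvc queue status
# stream the output of a specific experiment
dvc queue logs <exp-name> -f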
To run a set of experiments on a k8s cluster, add the experiments to the queue as above
dvc exp run -S 'featurize.max_features=5,10' -S 'featurize.ngrams=1,2,3' --queue
Build a Docker image
rm -rf build # remove existing build directory (if any)
mlem build docker_dir --model models/random_forest --server fastapi --target build
docker build build -t mlem-model:latest
Run the Docker container to serve the model with FastAPI
docker run -p 8080:8080 mlem-model:latest
Navigate to http://localhost:8080/docs to see the OpenAPI spec.
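To quickly confirm the server is responding from the terminal, fetch the machine-readable spec that FastAPI serves alongside /docs:
curl http://localhost:8080/openapi.json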
See here for more instructions and other build and serve options.
Install minikube and kubectl following the instructions here and here.
Then start a minikube cluster and point the shell to minikube's Docker daemon
minikube start
eval $(minikube -p minikube docker-env)
Build the Docker image in minikube
docker build build -t mlem-model:latest
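Since the shell is pointed at minikube's Docker daemon, the image can be verified with:
docker images mlem-model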
For isolation, create a k8s namespace named sift-app
kubectl apply -f k8s/local/namespace.yaml
Then create a new context for the namespace and switch to it
kubectl config set-context sift-app --namespace sift-app --cluster=minikube --user minikube
kubectl config use-context sift-app
Run the following to see all contexts and to verify that the current context is sift-app
kubectl config get-contexts
Create the deployment found in k8s/local/deployment.yaml
kubectl apply -f k8s/local/deployment.yaml
Verify that the deployment is successful by running
kubectl get deployments
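The pods behind the deployment can also be inspected (the deployment name below is a placeholder; use the name defined in k8s/local/deployment.yaml):
kubectl get pods
kubectl describe deployment <deployment-name>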
Access the app by creating the service in k8s/local/service.yaml
kubectl apply -f k8s/local/service.yaml
minikube service sift-app-service -n sift-app --url
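For a quick check from the terminal, the URL printed by minikube can be curled directly (a sketch; assumes the service exposes a single port):
curl "$(minikube service sift-app-service -n sift-app --url)/openapi.json"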
When done, clean up all resources in the namespace
kubectl delete namespace sift-app
To delete the context from the config
kubectl config delete-context sift-app
kubectl config use-context minikube # switch to default
Models are versioned within this repository using git tags. However, the model files themselves are stored in the DVC remote storage.
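A specific model version can also be pulled straight from the DVC remote with dvc get (a sketch; the tag name is a placeholder and the output path is arbitrary):
# download the model files for a given registered version
dvc get https://github.com/imbulana/sift models/random_forest --rev <model-tag> -o random_forest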
First, store the repo URL in a shell variable
export REPO=https://github.com/imbulana/sift
To see registered models, run
gto show
To register a new model, run
gto register <path_to_model> --repo $REPO
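For example, to register the random forest model under an explicit version and confirm the registration (the version string here is illustrative):
gto register models/random_forest --repo $REPO --version v1.0.0
# confirm the new registration
gto show --repo $REPO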