GitHub - imbulana/sift: SIFT: Semantically dIscerning Fact from Tale

SIFT: Semantically dIscerning Fact from Tale

Setup

Create a conda environment with Python 3.12

conda create -n sift python=3.12
conda activate sift

Clone this repo and install the required packages

git clone https://github.com/imbulana/sift.git

cd sift
python3 -m pip install -r requirements.txt

Dataset

The dataset, intermediate steps, models, and experiment results are stored and tracked in DVC remote storage on Google Drive.

If you have access to the remote storage, pull it with

dvc pull

Otherwise, set up your own remote storage following the instructions here.

Then download the dataset. See here for more information about the dataset.

cd data/raw
curl -L -o data.zip \
    https://www.kaggle.com/api/v1/datasets/download/clmentbisaillon/fake-and-real-news-dataset

unzip data.zip && rm data.zip
mv True.csv real.csv && mv Fake.csv fake.csv
cd ../..
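
Before running the pipeline, it can help to confirm that the renamed files ended up where the commands above put them. The following is a minimal sketch and not part of the repository: `check_raw_data` is a hypothetical helper name, and the only assumption is that the pipeline reads `real.csv` and `fake.csv` from `data/raw`, as the steps above imply.

```shell
# Hypothetical helper (not in the repo): check that both CSV files
# exist and are non-empty in the given directory (default: data/raw).
check_raw_data() {
    local dir="${1:-data/raw}"
    local f
    for f in real.csv fake.csv; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    echo "ok: $dir contains real.csv and fake.csv"
}
```

From the repository root, `check_raw_data data/raw` returns a non-zero status if either file is absent or empty.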

Experiments

Reproduce Current Workspace

To reproduce the pipeline defined in dvc.yaml in the current workspace, run

dvc repro

To create a new experiment, modify the hyperparameters in params.yaml and the pipeline in dvc.yaml as required, then run

dvc exp run

To compare experiments easily, install the DVC extension for VS Code.

Experiment Queue / Parallel Runs

To run a series of experiments with different hyperparameters in params.yaml, add them to an experiment queue

dvc exp run -S 'featurize.max_features=5,10' -S 'featurize.ngrams=1,2,3' --queue
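
Each `-S` option with comma-separated values defines a sweep, and DVC queues one experiment per combination, so the command above should queue 2 × 3 = 6 experiments. The sketch below only enumerates that grid in plain shell to make the expansion explicit. `list_grid` is a hypothetical helper and does not call DVC.

```shell
# Hypothetical helper: print one line per parameter combination
# that the two -S sweeps above expand to.
list_grid() {
    local mf ng
    for mf in 5 10; do
        for ng in 1 2 3; do
            echo "featurize.max_features=$mf featurize.ngrams=$ng"
        done
    done
}
```

After queuing, `dvc queue status` can be used to check that the expected number of tasks is waiting.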

Then, run the experiments in parallel locally

dvc queue start -j <number of parallel jobs>
# OR if you want to time the set of experiments
time dvc exp run --run-all -j <number of parallel jobs>

Experiments on k8s Cluster (todo)

To run a set of experiments on a k8s cluster, add the experiments to the queue as above

dvc exp run -S 'featurize.max_features=5,10' -S 'featurize.ngrams=1,2,3' --queue

Deployment

Build a docker image

rm -rf build # remove existing build (if any)

mlem build docker_dir --model models/random_forest --server fastapi --target build
docker build build -t mlem-model:latest

Local w/ Docker

Run the docker container to serve the model with FastAPI

docker run -p 8080:8080 mlem-model:latest

Navigate to http://localhost:8080/docs to see the OpenAPI spec.
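
The server may need a few seconds after `docker run` before it answers requests. A small retry loop is one way to wait for it in a script. This is a sketch under assumptions: `retry` is a hypothetical helper, not something provided by MLEM or Docker, and it simply re-runs whatever command it is given.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    local attempts="$1"
    local delay="$2"
    local i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt is coming.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `retry 10 1 curl -fsS -o /dev/null http://localhost:8080/docs` polls the docs page once per second for up to ten attempts.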

See here for more instructions and other build and serve options.

Minikube

Install minikube and kubectl following the instructions here and here.

Then start a minikube cluster and point the shell to minikube's Docker daemon

minikube start
eval $(minikube -p minikube docker-env)

Build the docker image in minikube

docker build build -t mlem-model:latest

For isolation, create a k8s namespace named sift-app

kubectl apply -f k8s/local/namespace.yaml
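
For readers without the repository checked out, a namespace manifest is short enough to generate inline. The sketch below prints a minimal `Namespace` object. `namespace_manifest` is a hypothetical helper, and the actual contents of k8s/local/namespace.yaml may differ (labels or annotations, for instance).

```shell
# Hypothetical helper: print a minimal Kubernetes Namespace manifest.
# The real k8s/local/namespace.yaml in the repo may contain more fields.
namespace_manifest() {
    local name="${1:-sift-app}"
    cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $name
EOF
}
```

The output can be piped to the cluster with `namespace_manifest sift-app | kubectl apply -f -`.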

Then create a new context for the namespace and switch to it

kubectl config set-context sift-app --namespace sift-app --cluster=minikube --user minikube
kubectl config use-context sift-app

Run the following to see all contexts and to verify that the current context is sift-app

kubectl config get-contexts

Create the deployment found in k8s/local/deployment.yaml

kubectl apply -f k8s/local/deployment.yaml

Verify that the deployment is successful by running

kubectl get deployments

Access the app by creating the service in k8s/local/service.yaml

kubectl apply -f k8s/local/service.yaml
minikube service sift-app-service -n sift-app --url

When done, clean up all resources in the namespace

kubectl delete namespace sift-app

To delete the context from the config

kubectl config use-context minikube # switch back to the default context first
kubectl config delete-context sift-app # the context created above

Model Registry

Models are versioned within this repository using git tags. However, the model files themselves are stored in the DVC remote storage.

First, store the repo URL in a shell variable

export REPO=https://github.com/imbulana/sift

To see registered models, run

gto show

To register a new model, run

gto register <path_to_model> --repo $REPO
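
Because registrations are recorded as git tags, they can be inspected with plain git as well. The sketch below assumes GTO's default tag format of `<name>@v<version>` (for example `random_forest@v0.0.1`, a hypothetical tag). `model_name` and `model_version` are illustrative helpers, not part of GTO.

```shell
# Hypothetical helpers: split a registration tag of the assumed form
# <name>@v<version> into its two parts using parameter expansion.
model_name() {
    echo "${1%@v*}"
}

model_version() {
    echo "${1##*@v}"
}
```

Inside a clone, `git tag --list '*@v*'` lists candidate tags to pass to these helpers.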
