dilipwarrier/eval_ai

Steps for running AI eval tests

This directory has promptfoo-based tests for AI models.

The instructions below are for Linux.

Python virtual environment setup

If Python is not already installed, install it together with the venv module, which Debian and Ubuntu package separately, as follows.

sudo apt-get install python3 python3-venv

Create a .venv directory and activate your Python virtual environment as follows.

python3 -m venv .venv
source .venv/bin/activate

The virtual environment is now active, and (.venv) should appear at the left of your command prompt.

Activate the virtual environment in every new shell session before running any Python script in this repository; otherwise the scripts will fail at run time, typically because the installed packages cannot be found.
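As a minimal sketch of how a script could guard against a missing virtual environment, the check below compares sys.prefix with sys.base_prefix, which differ only inside a venv. The function names in_virtualenv and require_virtualenv are hypothetical and are not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the .venv directory while
    # sys.base_prefix still points at the system Python installation.
    return sys.prefix != sys.base_prefix


def require_virtualenv() -> None:
    """Exit with a helpful message when no virtual environment is active."""
    if not in_virtualenv():
        raise SystemExit(
            "No virtual environment is active. Run: source .venv/bin/activate"
        )
```

Calling require_virtualenv() at the top of a script would stop it early with a clear message instead of a later import failure.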

Python package installation

Install all the Python packages from requirements.txt as follows.

python -m pip install --upgrade pip
pip install -r requirements.txt

This will install all the necessary Python packages in your virtual environment. Due to the large number of packages, this will take some time.

Running vllm

Currently, the vllm script only works on a machine with a GPU.

On the GPU machine, activate the virtual environment and then run the following.

python basic_vllm.py

This script will test a Meta LLM with some basic prompts.
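For orientation, a script of this kind might look roughly like the sketch below. It is not the contents of basic_vllm.py: the model name facebook/opt-125m, the prompts, the sampling values, and the function names are all assumptions chosen for illustration. It uses vllm's offline inference interface (LLM, SamplingParams, and generate).

```python
from typing import List, Sequence, Tuple

# Hypothetical prompts, chosen only for illustration.
PROMPTS = (
    "The capital of France is",
    "A virtual environment in Python is",
)


def run_prompts(
    model_name: str = "facebook/opt-125m",  # assumed model, not confirmed by the repo
    prompts: Sequence[str] = PROMPTS,
) -> List[Tuple[str, str]]:
    """Generate one completion per prompt and return (prompt, completion) pairs."""
    # Imported lazily so this file can be loaded on a machine without vllm or a GPU.
    from vllm import LLM, SamplingParams

    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    llm = LLM(model=model_name)
    outputs = llm.generate(list(prompts), sampling)
    return [(out.prompt, out.outputs[0].text) for out in outputs]


def format_result(prompt: str, completion: str) -> str:
    """Render one prompt and its completion for printing."""
    return f"Prompt: {prompt!r}\nCompletion: {completion!r}"
```

On a GPU machine, looping over run_prompts() and printing format_result(prompt, completion) for each pair would show the generated text.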

Promptfoo installation

Install Node.js

We use NVM (Node Version Manager) to handle Node.js versions on Ubuntu WSL.

# Update package list and install curl
sudo apt update && sudo apt install -y curl

# Download and install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash

# Load nvm into the current session
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"

# Install the Long Term Support (LTS) version of Node.js
nvm install --lts

Install promptfoo

Next, we install promptfoo globally for CLI access.

# Install promptfoo globally using npm
npm install -g promptfoo

# Verify the installation
promptfoo --version

Ollama installation

Install Ollama and pull a model. The example below pulls the 8B Llama 3.1 model.

sudo apt-get install zstd
curl -fsSL https://ollama.com/install.sh | sh

# Note that this is the 8B model (5 GB file size)
ollama pull llama3.1:8b
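To confirm that the pull succeeded, one option is to query the local Ollama server, which lists installed models at /api/tags. The sketch below assumes the server is running on its default port 11434; the function names are hypothetical.

```python
import json
import urllib.request
from typing import List

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_names(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def is_model_pulled(
    model: str, url: str = OLLAMA_TAGS_URL, timeout: float = 5.0
) -> bool:
    """Return True when the named model is installed in the local Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return model in model_names(payload)
```

With the server running, is_model_pulled("llama3.1:8b") should return True once the pull above has completed.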

Running tests

Run the tests as follows.

promptfoo eval --config basic_local_llama3_tests.yaml

This will run the tests in the named YAML file.
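If the eval command fails to start, a quick pre-flight check can show which command-line tools from the earlier steps are missing from PATH. This is a small sketch; the tool list and the function name missing_tools are assumptions for illustration.

```python
import shutil
from typing import Iterable, List

# Command-line tools installed in the earlier sections of this README.
REQUIRED_TOOLS = ("node", "npm", "promptfoo", "ollama")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means all four tools are reachable from the current shell.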

Downloading Llama model files and running sample scripts

Go to https://llama.meta.com/llama-downloads/, accept the terms, and select the Llama 3.2 1B and Llama 3.1 8B models.

You will then receive an email with a URL for each model type. Each URL expires after 48 hours, so use it in the following steps before then.

List the available llama models as follows.

llama model list --show-all

Find the model ID for your model (left-most column in the table). Then, download the appropriate model as follows.

llama model download --source meta --model-id llama3.2-1B

When prompted, enter the URL that you received by email. The requested model will then be downloaded to your computer. You can only download a model whose terms you accepted earlier; otherwise, the download fails with an error.

Next, download the Llama 3.1 8B model.

llama model download --source meta --model-id llama3.1-8B

The models are downloaded to ~/.llama.

Then, run the sample llama3 chat completion script as follows.

torchrun --nproc_per_node 1 llama3_sample_completion.py ~/.llama/checkpoints/Llama3.2-1B
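Before launching torchrun, it can help to confirm which checkpoints are actually present. The sketch below lists the subdirectories of ~/.llama/checkpoints, the location used in the command above; the function name list_checkpoints is hypothetical.

```python
from pathlib import Path
from typing import List, Union

# Checkpoint location taken from the torchrun command above.
DEFAULT_CHECKPOINT_ROOT = Path.home() / ".llama" / "checkpoints"


def list_checkpoints(
    root: Union[str, Path] = DEFAULT_CHECKPOINT_ROOT,
) -> List[str]:
    """Return the sorted names of the checkpoint directories under root."""
    root_path = Path(root).expanduser()
    if not root_path.is_dir():
        # Nothing has been downloaded yet, or the root is somewhere else.
        return []
    return sorted(entry.name for entry in root_path.iterdir() if entry.is_dir())
```

After both downloads, list_checkpoints() should include an entry for each model, and an empty list suggests the download step did not complete.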

Installing huggingface cli

Install the Hugging Face CLI (hf) to download the models that vllm will use.

curl -LsSf https://hf.co/cli/install.sh | bash

Next, set up an account on huggingface.co and get an access token.

Then, type the following and enter your access token when asked.

hf auth login

Download the facebook/opt-125m model as follows.

hf download --repo-type model facebook/opt-125m

If it is correctly downloaded, the following command will show you the model in the hf cache.

hf cache scan
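The same check can be done from Python without extra packages, because the Hugging Face hub cache stores each repository in a folder named after its type and ID, such as models--facebook--opt-125m. The sketch below is a simplification: it honors the HF_HUB_CACHE and HF_HOME environment variables and otherwise assumes the default ~/.cache/huggingface location. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional, Union


def hub_cache_dir() -> Path:
    """Return the hub cache directory, honoring HF_HUB_CACHE and HF_HOME."""
    explicit = os.environ.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit).expanduser()
    # Assumed default location when neither variable is set.
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return Path(hf_home).expanduser() / "hub"


def repo_cache_folder(repo_id: str, repo_type: str = "model") -> str:
    """Map a repo ID such as 'facebook/opt-125m' to its cache folder name."""
    return f"{repo_type}s--" + repo_id.replace("/", "--")


def is_cached(repo_id: str, cache_dir: Optional[Union[str, Path]] = None) -> bool:
    """Return True when a cache folder for the repository exists."""
    base = Path(cache_dir) if cache_dir is not None else hub_cache_dir()
    return (base / repo_cache_folder(repo_id)).is_dir()
```

After the download above, is_cached("facebook/opt-125m") should return True. This only checks that the folder exists, not that every file in it is complete.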
