This directory contains promptfoo-based tests for AI models.
The instructions below are for Linux.
If you do not have Python installed already, install it as follows.
sudo apt-get install python3 python3-venv
Create a .venv directory and activate your Python virtual environment as follows.
python3 -m venv .venv
source .venv/bin/activate
The virtual environment is now active; you should see (.venv) appear at the left of your command prompt.
You must activate the virtual environment before running any Python scripts in this repository; otherwise you will encounter run-time errors.
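The activation check above can be scripted; the snippet below is one way to confirm that the shell is now resolving the interpreter from inside .venv.

```shell
# Create the virtual environment and activate it in the current shell.
python3 -m venv .venv
source .venv/bin/activate

# The active interpreter should now live inside .venv.
which python
```

If the printed path does not end in .venv/bin/python, the activation step was skipped or run in a different shell.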
Install all the Python packages from requirements.txt as follows.
python -m pip install --upgrade pip
pip install -r requirements.txt
This will install all the necessary Python packages in your virtual environment. Due to the large number of packages, this will take some time.
Currently, this script only works in a GPU environment. After activating your virtual environment on a GPU machine, run the following.
python basic_vllm.py
This script will test a Meta LLM with some basic prompts.
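As a rough sketch of what a script like basic_vllm.py might contain: the model name, prompts, and sampling settings below are illustrative assumptions rather than the script's actual contents, and running it requires a GPU with vLLM installed.

```python
# Minimal vLLM completion sketch (illustrative model and prompts; requires a GPU).
from vllm import LLM, SamplingParams

prompts = [
    "What is the capital of France?",
    "Explain gradient descent in one sentence.",
]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# Swap in a Meta Llama checkpoint as needed; opt-125m is just a small placeholder.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```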
We use NVM (Node Version Manager) to handle Node.js versions on Ubuntu WSL.
# Update package list and install curl
sudo apt update && sudo apt install -y curl
# Download and install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
# Load nvm into the current session
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"
# Install the Long Term Support (LTS) version of Node.js
nvm install --lts
Next, we install promptfoo globally for CLI access.
# Install promptfoo globally using npm
npm install -g promptfoo
# Verify the installation
promptfoo --version
Install Ollama and pull a model. In the example below, I pulled the Llama 3.1 8B model.
sudo apt-get install zstd
curl -fsSL https://ollama.com/install.sh | sh
# Note that this is the 8B model (5 GB file size)
ollama pull llama3.1:8b
Run the tests as follows.
promptfoo eval --config basic_local_llama3_tests.yaml
This will run the tests in the named YAML file.
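For reference, a minimal config in this style might look like the sketch below. The prompt, variable, and assertion values are illustrative assumptions, not the actual contents of basic_local_llama3_tests.yaml; `ollama:chat:llama3.1:8b` is promptfoo's provider syntax for a local Ollama chat model.

```yaml
# Hypothetical promptfoo config sketch (not the real test file).
prompts:
  - "Answer concisely: {{question}}"

providers:
  - ollama:chat:llama3.1:8b

tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: icontains
        value: "paris"
```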
Go to https://llama.meta.com/llama-downloads/, accept the terms, and select the Llama 3.2 1B and Llama 3.1 8B models.
You will then receive an email with a URL for each model type. Use each URL within 48 hours in the following steps, or it will expire.
List the available llama models as follows.
llama model list --show-all
Find the model ID for your model (left-most column in the table). Then, download the appropriate model as follows.
llama model download --source meta --model-id llama3.2-1B
When prompted, enter the URL that you received by email. The download of the requested model will then begin. Note that you must have accepted the terms earlier for the model you are downloading; otherwise, you will get a download error.
Next, download the llama3.1 model.
llama model download --source meta --model-id llama3.1-8B
The models are downloaded to ~/.llama.
Then, run the sample Llama 3 chat completion script as follows.
torchrun --nproc_per_node 1 llama3_sample_completion.py ~/.llama/checkpoints/Llama3.2-1B
Install the Hugging Face CLI to download the models that vLLM will use.
curl -LsSf https://hf.co/cli/install.sh | bash
Next, set up an account on huggingface.co and create an access token.
Then, type the following and enter your access token when asked.
hf auth login
Download the facebook/opt-125m model as follows.
hf download --repo-type model facebook/opt-125m
If the model downloaded correctly, the following command will show it in the Hugging Face cache.
hf cache scan
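If you prefer checking from Python, the huggingface_hub library exposes the same cache information. This is a sketch assuming huggingface_hub is installed in your environment (it is a dependency of the hf CLI).

```python
# List cached Hugging Face repos and their sizes on disk.
from huggingface_hub import scan_cache_dir

cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(f"{repo.repo_id}: {repo.size_on_disk / 1e6:.1f} MB")
```

The facebook/opt-125m entry should appear in this listing after the download step above.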