```bash
# Install compiler
pip install -i https://pypi.rbln.ai/simple rebel-compiler==0.10.2

# Navigate to model directory
cd huggingface/transformers/text2text-generation/llama/llama3.1-8b

# Install dependencies
pip install -r requirements.txt

# Compile and run
python compile.py && python inference.py
```

Note
The versions pinned above match what this repo was tested with as of 2026-03-27; a newer stable release may already be available. For current versions and install steps, see the RBLN installation guide.
Tip
For models that support configuration presets, use --model_name <preset> to specify model-specific configurations. See each model's README for available presets.
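To illustrate the preset mechanism, here is a minimal sketch of how a `--model_name` flag can map a preset to model-specific settings. The preset names and configuration fields below are hypothetical, chosen for illustration only; the actual presets and options are defined in each model's README and `compile.py`:

```python
# Hypothetical sketch: map a --model_name preset to compile settings.
# PRESETS keys/fields are illustrative, not real rebel-compiler options.
import argparse

PRESETS = {
    "llama3.1-8b": {"batch_size": 1, "max_seq_len": 131072},
    "llama3.1-70b": {"batch_size": 1, "max_seq_len": 131072},
}

def resolve_config(argv):
    """Parse --model_name and return the matching preset dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", required=True, choices=sorted(PRESETS))
    args = parser.parse_args(argv)
    return PRESETS[args.model_name]

if __name__ == "__main__":
    print(resolve_config(["--model_name", "llama3.1-8b"]))
```

Restricting `--model_name` with `choices` makes an unknown preset fail fast with a clear argparse error rather than a downstream compile failure.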
Important
An RBLN portal account is required to install rebel-compiler from PyPI.
Select the ecosystem or API for your AI serving workload on RBLN NPUs.
| Ecosystem | # Models | Key packages |
|---|---|---|
| Hugging Face | 150+ | transformers, diffusers |
| PyTorch | 250+ | torch |
| TensorFlow | 75+ | keras, tensorflow |
C API — C/C++ inference bindings. Install via APT, then build from source.
Compile a model from the Model Zoo, then deploy with:
```bash
# Compile
python compile.py

# Install vLLM-RBLN
pip3 install \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --extra-index-url https://wheels.vllm.ai/0.13.0/cpu \
    vllm-rbln==0.10.2
```

Note
The versions pinned above match what this repo was tested with as of 2026-03-27; a newer stable release may already be available. For current versions and install steps, see the RBLN installation guide.
```python
# Import
from vllm import LLM, SamplingParams

# Load model and generate
llm = LLM(model="Llama-3.1-8B-Instruct")
out = llm.generate(["Hello"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

- vLLM-RBLN — LLM serving on RBLN NPUs
- Triton — Triton Inference Server
- TorchServe — PyTorch model serving
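Beyond the offline `LLM.generate` example above, upstream vLLM also ships an OpenAI-compatible HTTP server. A minimal sketch, assuming the standard `vllm serve` entrypoint behaves the same under vllm-rbln and that the model path matches the one used in the example (check the vLLM-RBLN docs for the supported invocation on your setup):

```bash
# Start an OpenAI-compatible server (standard vLLM entrypoint;
# assumed to work identically under vllm-rbln)
vllm serve Llama-3.1-8B-Instruct --port 8000

# Query it from another shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.1-8B-Instruct", "prompt": "Hello", "max_tokens": 64}'
```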
