A lightweight embedding microservice powered by Alibaba's Qwen3-Embedding models.
# build image
docker build --pull -t qwen3-emb-service .
# run service (cpu)
docker run -p $HOST_PORT:$CONTAINER_PORT -e HF_TOKEN="$HF_TOKEN" qwen3-emb-service
# run service (gpu) (requires NVIDIA container toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
docker run --gpus 1 -p $HOST_PORT:$CONTAINER_PORT -e HF_TOKEN="$HF_TOKEN" qwen3-emb-service
Option #1:
curl -X POST http://0.0.0.0:${HOST_PORT}/embedding/embed \
-H "Content-Type: application/json" \
-d '{"messages": [{"type": "text", "text": "What is RAG?"}]}'
Option #2:
uv run examples/test_call.py
You'll get a JSON response (response.json) with L2-normalized embeddings.
- input: list of strings (or dict)
- output: list of L2-normalized embeddings
- model: Qwen3-Embedding-XB / Qwen3-VL-Embedding-XB
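The curl call above can also be made from Python. The following is a minimal sketch using only the standard library. The helper names (`build_payload`, `post_json`, `embed_texts`, `is_l2_normalized`) are hypothetical and not part of this service. The default path is taken from the curl example; this README also shows `/embeddings/embed`, so adjust it to whatever your deployment exposes. The exact shape of the response is not documented here, so the sketch returns the parsed JSON as-is.

```python
import json
import math
import urllib.request


def build_payload(texts):
    """Wrap plain strings in the {"messages": [...]} body used by the curl example."""
    return {"messages": [{"type": "text", "text": t} for t in texts]}


def post_json(url, payload, timeout=30.0):
    """POST a JSON body and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def embed_texts(base_url, texts, path="/embedding/embed"):
    """Send texts to the service. The path is an assumption; change it if needed."""
    return post_json(base_url.rstrip("/") + path, build_payload(texts))


def is_l2_normalized(vector, tol=1e-3):
    """Return True if the Euclidean norm of the vector is 1 within tol."""
    norm = math.sqrt(sum(x * x for x in vector))
    return abs(norm - 1.0) <= tol
```

For example, `embed_texts("http://localhost:8000", ["What is RAG?"])` would send the same request as Option #1, assuming the service is published on host port 8000. Once you have located an embedding vector in the response, `is_l2_normalized` is a quick sanity check on it.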
POST /embeddings/embed:
{
"messages": [
{"type": "text", "text": "Text document #1"},
{"type": "image_url", "image_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
{"type": "image", "image": "uploads/demo.jpeg"}
]
}
- Images can be passed as a URL. For example: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg
- Upload images into the uploads directory and pass the filename. For example, for the image ./uploads/demo.jpg, pass uploads/demo.jpg in the JSON.
- Images can be passed in base64 format.
See examples/generate_vl_request.py.
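As an illustration of the message types listed above, here is a small sketch that builds a mixed text and image request body. The function names are hypothetical. The text, URL, and uploaded-file shapes mirror the JSON example in this README. For base64 input the sketch only produces the encoded string, because the exact message shape the service expects for it is not shown here; check examples/generate_vl_request.py for that.

```python
import base64


def text_message(text):
    """Text item, as in the README's JSON example."""
    return {"type": "text", "text": text}


def image_url_message(url):
    """Image referenced by URL."""
    return {"type": "image_url", "image_url": url}


def uploaded_image_message(filename):
    """Image previously placed in the uploads directory, e.g. 'demo.jpeg'."""
    return {"type": "image", "image": f"uploads/{filename}"}


def encode_image_base64(raw_bytes):
    """Encode raw image bytes as an ASCII base64 string.

    How this string is wrapped in a message is an open assumption;
    see examples/generate_vl_request.py for the service's format.
    """
    return base64.b64encode(raw_bytes).decode("ascii")


def build_request(messages):
    """Wrap message dicts in the top-level request body."""
    return {"messages": list(messages)}
```

A request equivalent to the JSON example would then be `build_request([text_message("Text document #1"), uploaded_image_message("demo.jpeg")])`.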
Qwen3 Embedding models have a user-defined output size. You can change hidden_size in the config.json of a specific model.
For example, Qwen/Qwen3-VL-Embedding-2B supports user-defined output dimensions ranging from 64 to 2048.
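The config change described above could be scripted roughly as follows. This is a sketch under stated assumptions: `set_output_dim` is a hypothetical helper, it assumes `hidden_size` is a top-level key in the model's config.json (a nested config would need a different lookup), and the default bounds are the 64 to 2048 range quoted for Qwen/Qwen3-VL-Embedding-2B, so pass other bounds for other models.

```python
import json
from pathlib import Path


def set_output_dim(config_path, dim, min_dim=64, max_dim=2048):
    """Set hidden_size in a model's config.json and return the updated config.

    Assumes hidden_size is a top-level key. The default bounds match the
    range this README quotes for Qwen/Qwen3-VL-Embedding-2B.
    """
    if isinstance(dim, bool) or not isinstance(dim, int):
        raise TypeError("dim must be an integer")
    if not min_dim <= dim <= max_dim:
        raise ValueError(f"dim must be between {min_dim} and {max_dim}, got {dim}")

    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["hidden_size"] = dim  # requested embedding size
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Validating the range before touching the file means an out-of-range value leaves config.json unchanged. Keeping a backup of the original file is still a good idea.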
To load the VL model, set VL = True in src/core/settings.py. Use False to load the text-only model.