
feat: OpenAI-compatible API Endpoints for Embedding Models (vLLM)#8483

Merged
yinggeh merged 23 commits into main from yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton
Nov 6, 2025

Conversation

@yinggeh (Contributor) commented Oct 30, 2025

What does the PR do?

  • Enable /v1/embeddings inference request for OpenAI API frontend in vLLM container
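
For context, requests to the new endpoint follow the OpenAI embeddings API schema. A minimal sketch of a request body, assuming a placeholder model name and server address (neither is specified in this PR):

```python
import json

# Placeholder endpoint; the actual host/port depend on how the OpenAI
# frontend is launched.
url = "http://localhost:9000/v1/embeddings"

# Request body following the OpenAI embeddings API schema.
payload = {
    "model": "my_embedding_model",  # placeholder model name
    "input": ["Hello world", "Triton serves embeddings"],
}

body = json.dumps(payload)
print(body)
```

With a server running, this body could be POSTed (e.g. `requests.post(url, json=payload)`); the response follows OpenAI's list-of-embeddings format.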

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • feat

Related PRs:

triton-inference-server/vllm_backend#104

Where should the reviewer start?

Test plan:

  • CI Pipeline ID:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@yinggeh yinggeh self-assigned this Oct 30, 2025
@yinggeh yinggeh added the Enhancement New feature or request label Oct 30, 2025
whoisj
whoisj previously approved these changes Oct 30, 2025
…to yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton
@yinggeh yinggeh changed the base branch from main to r25.10 October 30, 2025 22:59
@yinggeh (Contributor, Author) commented Oct 30, 2025

Rebase to r25.10 to run pipeline.

… yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton
@yinggeh yinggeh changed the base branch from r25.10 to main November 3, 2025 18:12
@yinggeh yinggeh changed the base branch from main to r25.10 November 3, 2025 18:15
@yinggeh yinggeh changed the base branch from r25.10 to main November 3, 2025 18:15
@yinggeh yinggeh requested review from pskiran1 and whoisj November 4, 2025 17:09
@pskiran1 (Member) previously approved these changes Nov 5, 2025 and commented:

LGTM! Thank you for adding this feature.

@whoisj (Contributor) previously approved these changes Nov 5, 2025 and commented:

overall, this LGTM.

left one suggestion, but I approve of these changes as they are.

backend = self.backend

# Request conversion from OpenAI format to backend-specific format
if backend == "vllm":
@whoisj (Contributor) commented on the diff:

wouldn't this be safer as below?

if backend == 'trtllm':
  # do something
elif backend == 'vllm':
  # do something else
else:
  raise ValueError(f'Unknown backend "{backend}" provided.')

@yinggeh (Contributor, Author) replied:

    # Explicitly handle ensembles to avoid any runtime validation errors
    if not backend and model.config()["platform"] == "ensemble":
        backend = "ensemble"
    print(f"Found model: {name=}, {backend=}")
    lora_names = None
    if self.backend == "vllm" or backend == "vllm":
        lora_names = _get_vllm_lora_names(
            self.server.options.model_repository, name, model.version
        )
    metadata = TritonModelMetadata(
        name=name,
        backend=backend,
        model=model,
        tokenizer=self.tokenizer,
        lora_names=lora_names,
        create_time=self.create_time,
        inference_request_converter=self._determine_request_converter(
            backend, RequestKind.GENERATION
        ),
        embedding_request_converter=self._determine_request_converter(
            backend, RequestKind.EMBEDDING
        ),
    )

backend can be ensemble.

@whoisj (Contributor) replied:

makes sense.

when backend == "ensemble" then we hit this code:

        if request_type == RequestKind.GENERATION:
            return _create_trtllm_generate_request
        else:
            return _create_trtllm_embedding_request

is that desirable?

also, adding the switch-like statement future-proofs the function.

@yinggeh (Contributor, Author) replied:

    # Use TRT-LLM format as default for everything else. This could be
    # an ensemble, a python or BLS model, a TRT-LLM backend model, etc.

@yinggeh yinggeh dismissed stale reviews from whoisj and pskiran1 via c022c3e November 5, 2025 22:43
@yinggeh yinggeh requested review from pskiran1 and whoisj November 5, 2025 23:07
@yinggeh yinggeh changed the title from "feat: OpenAI-compatible API Endpoints for Embedding Models" to "feat: OpenAI-compatible API Endpoints for Embedding Models (vLLM)" Nov 6, 2025
@yinggeh yinggeh merged commit f4ae90c into main Nov 6, 2025
3 checks passed
@yinggeh yinggeh deleted the yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton branch November 6, 2025 00:25

Labels: Enhancement (New feature or request)

4 participants