Conversation
Pull request overview
This PR adds explicit model-control support to the OpenAI-compatible FastAPI frontend, enabling runtime model load/unload (EXPLICIT mode) and startup model selection via CLI flags, along with tests and documentation.
Changes:
- Add `--model-control-mode` and `--load-model` CLI options and pass them into the `tritonserver.Server` startup configuration.
- Introduce `/v1/models/{model_name}/load` and `/v1/models/{model_name}/unload` endpoints, plus API-restriction grouping for model management.
- Extend `TritonLLMEngine` with dynamic model load/unload support and add comprehensive model-management tests/docs.
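The two new CLI flags can be sketched with a minimal `argparse` setup. This is a hypothetical reconstruction of how `main.py` might parse them; the actual choices, defaults, and help text in the PR may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Sketch only: flag names come from the PR description, but the exact
    # choices/defaults are assumptions.
    parser = argparse.ArgumentParser(description="OpenAI-compatible frontend")
    parser.add_argument(
        "--model-control-mode",
        choices=["none", "explicit"],
        default="none",
        help="Triton model control mode passed to tritonserver.Server.",
    )
    parser.add_argument(
        "--load-model",
        action="append",
        default=None,
        help="Model to load at startup (repeatable; EXPLICIT mode only).",
    )
    return parser


args = build_parser().parse_args(
    ["--model-control-mode", "explicit", "--load-model", "llama", "--load-model", "gpt2"]
)
print(args.model_control_mode)  # explicit
print(args.load_model)  # ['llama', 'gpt2']
```

Using `action="append"` lets callers repeat `--load-model` once per startup model, which matches the "startup model selection" behavior the PR describes.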
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| python/openai/openai_frontend/main.py | Adds CLI flags and wires model control mode + startup models into Triton server creation. |
| python/openai/openai_frontend/frontend/fastapi_frontend.py | Registers the new model-management router in the FastAPI app. |
| python/openai/openai_frontend/frontend/fastapi/routers/model_management.py | Implements load/unload REST endpoints that call the engine. |
| python/openai/openai_frontend/frontend/fastapi/middleware/api_restriction.py | Adds model-management to restricted API mapping. |
| python/openai/openai_frontend/engine/engine.py | Extends the engine protocol with async load/unload methods. |
| python/openai/openai_frontend/engine/triton_engine.py | Implements load/unload and refreshes model metadata handling. |
| python/openai/tests/utils.py | Adds an explicit-mode server helper and a port parameter to OpenAIServer. |
| python/openai/tests/test_model_management.py | Adds test coverage for none/explicit modes, concurrency, CLI flags, and restricted API behavior. |
| python/openai/README.md | Documents model management modes, CLI usage, endpoints, and restriction options. |
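The engine-protocol change in `engine.py` (async load/unload methods) can be illustrated with a toy `typing.Protocol` sketch. The names below mirror the PR's description, but the class shapes are assumptions, not the actual `TritonLLMEngine` implementation.

```python
import asyncio
from typing import Protocol


class LLMEngine(Protocol):
    # Hypothetical shape of the extended engine protocol; the real
    # TritonLLMEngine dispatches blocking C API calls to a thread pool.
    async def load(self, model_name: str) -> None: ...
    async def unload(self, model_name: str) -> None: ...


class InMemoryEngine:
    """Toy implementation used only to exercise the protocol."""

    def __init__(self) -> None:
        self.loaded: set[str] = set()

    async def load(self, model_name: str) -> None:
        self.loaded.add(model_name)

    async def unload(self, model_name: str) -> None:
        self.loaded.discard(model_name)


engine = InMemoryEngine()


async def main() -> None:
    await engine.load("llama")
    await engine.load("gpt2")
    await engine.unload("llama")


asyncio.run(main())
print(engine.loaded)  # {'gpt2'}
```

Because the router endpoints only depend on the protocol, tests can substitute an in-memory engine like this one without starting a Triton server.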
Comments suppressed due to low confidence (1)
python/openai/tests/utils.py:111
`OpenAIServer.__init__` accepts a `port` parameter and uses it to build URLs, but it never passes `--openai-port` (or otherwise configures the subprocess) to actually start the server on that port. If a caller overrides `port`, health checks and requests will hit the wrong port. Either remove the parameter, or ensure the subprocess is launched with a matching `--openai-port` value when `port` is provided.
```python
def __init__(
    self,
    cli_args: List[str],
    *,
    port: int = 9000,
    env_dict: Optional[Dict[str, str]] = None,
) -> None:
    # TODO: Incorporate caller's cli_args passed to this instance instead
    self.host = "localhost"
    self.port = port

    env = os.environ.copy()
    if env_dict is not None:
        env.update(env_dict)

    this_dir = Path(__file__).resolve().parent
    script_path = this_dir / ".." / "openai_frontend" / "main.py"
    self.proc = subprocess.Popen(
        ["python3", script_path] + cli_args,
        env=env,
```
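One way to address the reviewer's point is to thread the `port` value into the subprocess arguments before launch. The helper below is a hypothetical sketch (not code from the PR): it appends `--openai-port` only when the caller has not already supplied one.

```python
from typing import List


def with_openai_port(cli_args: List[str], port: int) -> List[str]:
    # Hypothetical helper: make sure the subprocess is started on the same
    # port the test harness will probe. If the caller already passed
    # --openai-port explicitly, leave their value untouched.
    if "--openai-port" in cli_args:
        return list(cli_args)
    return list(cli_args) + ["--openai-port", str(port)]


print(with_openai_port(["--model-repository", "/models"], 9100))
# ['--model-repository', '/models', '--openai-port', '9100']
```

`OpenAIServer.__init__` could then build the `Popen` command as `["python3", script_path] + with_openai_port(cli_args, port)`, keeping `self.port` and the actual listening port in sync.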
whoisj left a comment:
Overall, this is an excellent PR. Thanks for your hard work. I've left a couple of comments/questions that are preventing me from immediately approving it.
python/openai/openai_frontend/frontend/fastapi/routers/model_management.py
python/openai/openai_frontend/frontend/fastapi/middleware/api_restriction.py
python/openai/openai_frontend/frontend/fastapi/middleware/api_restriction.py
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
```python
# Blocking C API call dispatched to thread pool. The C API handles
# in-flight request draining and conflict resolution internally.
try:
    await asyncio.to_thread(self._unload_model_sync, model_name)
```
An await under the lock? How long do we expect the lock to be held here?
Depending on how long core takes to load/unload the model.
What does the PR do?
This PR adds explicit model-control support to the OpenAI-compatible FastAPI frontend, enabling runtime model load/unload (EXPLICIT mode) and startup model selection via CLI flags, along with tests and documentation.
Changes:
- Add `--model-control-mode` and `--load-model` CLI options and pass them into the `tritonserver.Server` startup configuration.
- Introduce `/v1/models/{model_name}/load` and `/v1/models/{model_name}/unload` endpoints, plus API-restriction grouping for model management.
- Extend `TritonLLMEngine` with dynamic model load/unload support and add comprehensive model-management tests/docs.

Checklist
PR title: `<commit_type>: <Title>`

Commit Type: check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)