Conversation
Pull request overview
This PR adds explicit model-control support to the OpenAI-compatible FastAPI frontend, enabling runtime model load/unload (EXPLICIT mode) and startup model selection via CLI flags, along with tests and documentation.
Changes:
- Add `--model-control-mode` and `--load-model` CLI options and pass them into the `tritonserver.Server` startup configuration.
- Introduce `/v1/models/{model_name}/load` and `/v1/models/{model_name}/unload` endpoints, plus API-restriction grouping for model management.
- Extend `TritonLLMEngine` with dynamic model load/unload support and add comprehensive model-management tests/docs.
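The two new CLI flags can be sketched with a minimal `argparse` setup. This is a hypothetical reconstruction of how `main.py` might parse them; the actual choices, defaults, and help text in the PR may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Sketch only: flag names come from the PR description, but the exact
    # choices/defaults are assumptions.
    parser = argparse.ArgumentParser(description="OpenAI-compatible frontend")
    parser.add_argument(
        "--model-control-mode",
        choices=["none", "explicit"],
        default="none",
        help="Triton model control mode passed to tritonserver.Server.",
    )
    parser.add_argument(
        "--load-model",
        action="append",
        default=None,
        help="Model to load at startup (repeatable; EXPLICIT mode only).",
    )
    return parser


args = build_parser().parse_args(
    ["--model-control-mode", "explicit", "--load-model", "llama", "--load-model", "gpt2"]
)
print(args.model_control_mode)  # explicit
print(args.load_model)  # ['llama', 'gpt2']
```

Using `action="append"` lets callers repeat `--load-model` once per startup model, which matches the "startup model selection" behavior the PR describes.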
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| python/openai/openai_frontend/main.py | Adds CLI flags and wires model control mode + startup models into Triton server creation. |
| python/openai/openai_frontend/frontend/fastapi_frontend.py | Registers the new model-management router in the FastAPI app. |
| python/openai/openai_frontend/frontend/fastapi/routers/model_management.py | Implements load/unload REST endpoints that call the engine. |
| python/openai/openai_frontend/frontend/fastapi/middleware/api_restriction.py | Adds model-management to restricted API mapping. |
| python/openai/openai_frontend/engine/engine.py | Extends the engine protocol with async load/unload methods. |
| python/openai/openai_frontend/engine/triton_engine.py | Implements load/unload and refreshes model metadata handling. |
| python/openai/tests/utils.py | Adds an explicit-mode server helper and a port parameter to OpenAIServer. |
| python/openai/tests/test_model_management.py | Adds test coverage for none/explicit modes, concurrency, CLI flags, and restricted API behavior. |
| python/openai/README.md | Documents model management modes, CLI usage, endpoints, and restriction options. |
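The engine-protocol change in `engine.py` (async load/unload methods) can be illustrated with a toy `typing.Protocol` sketch. The names below mirror the PR's description, but the class shapes are assumptions, not the actual `TritonLLMEngine` implementation.

```python
import asyncio
from typing import Protocol


class LLMEngine(Protocol):
    # Hypothetical shape of the extended engine protocol; the real
    # TritonLLMEngine dispatches blocking C API calls to a thread pool.
    async def load(self, model_name: str) -> None: ...
    async def unload(self, model_name: str) -> None: ...


class InMemoryEngine:
    """Toy implementation used only to exercise the protocol."""

    def __init__(self) -> None:
        self.loaded: set[str] = set()

    async def load(self, model_name: str) -> None:
        self.loaded.add(model_name)

    async def unload(self, model_name: str) -> None:
        self.loaded.discard(model_name)


engine = InMemoryEngine()


async def main() -> None:
    await engine.load("llama")
    await engine.load("gpt2")
    await engine.unload("llama")


asyncio.run(main())
print(engine.loaded)  # {'gpt2'}
```

Because the router endpoints only depend on the protocol, tests can substitute an in-memory engine like this one without starting a Triton server.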
Comments suppressed due to low confidence (1)
python/openai/tests/utils.py:111
`OpenAIServer.__init__` accepts a `port` parameter and uses it to build URLs, but it never passes `--openai-port` (or otherwise configures the subprocess) to actually start the server on that port. If a caller overrides `port`, health checks and requests will hit the wrong port. Either remove the parameter, or ensure the subprocess is launched with a matching `--openai-port` value when `port` is provided.
```python
def __init__(
    self,
    cli_args: List[str],
    *,
    port: int = 9000,
    env_dict: Optional[Dict[str, str]] = None,
) -> None:
    # TODO: Incorporate caller's cli_args passed to this instance instead
    self.host = "localhost"
    self.port = port

    env = os.environ.copy()
    if env_dict is not None:
        env.update(env_dict)

    this_dir = Path(__file__).resolve().parent
    script_path = this_dir / ".." / "openai_frontend" / "main.py"
    self.proc = subprocess.Popen(
        ["python3", script_path] + cli_args,
        env=env,
```
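One way to address the reviewer's point is to thread the `port` value into the subprocess arguments before launch. The helper below is a hypothetical sketch (not code from the PR): it appends `--openai-port` only when the caller has not already supplied one.

```python
from typing import List


def with_openai_port(cli_args: List[str], port: int) -> List[str]:
    # Hypothetical helper: make sure the subprocess is started on the same
    # port the test harness will probe. If the caller already passed
    # --openai-port explicitly, leave their value untouched.
    if "--openai-port" in cli_args:
        return list(cli_args)
    return list(cli_args) + ["--openai-port", str(port)]


print(with_openai_port(["--model-repository", "/models"], 9100))
# ['--model-repository', '/models', '--openai-port', '9100']
```

`OpenAIServer.__init__` could then build the `Popen` command as `["python3", script_path] + with_openai_port(cli_args, port)`, keeping `self.port` and the actual listening port in sync.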
whoisj left a comment:
Overall, this is an excellent PR. Thanks for your hard work. I've left a couple of comments/questions that are preventing me from immediately approving it.
python/openai/openai_frontend/frontend/fastapi/routers/model_management.py
python/openai/openai_frontend/frontend/fastapi/middleware/api_restriction.py
python/openai/openai_frontend/frontend/fastapi/middleware/api_restriction.py
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
```python
# Blocking C API call dispatched to thread pool. The C API handles
# in-flight request draining and conflict resolution internally.
try:
    await asyncio.to_thread(self._unload_model_sync, model_name)
```
An await under the lock? How long do we expect the lock to be held here?
Depending on how long core takes to load/unload the model.
What does the PR do?
This PR adds explicit model-control support to the OpenAI-compatible FastAPI frontend, enabling runtime model load/unload (EXPLICIT mode) and startup model selection via CLI flags, along with tests and documentation.
Changes:
- Add `--model-control-mode` and `--load-model` CLI options and pass them into the `tritonserver.Server` startup configuration.
- Introduce `/v1/models/{model_name}/load` and `/v1/models/{model_name}/unload` endpoints, plus API-restriction grouping for model management.
- Extend `TritonLLMEngine` with dynamic model load/unload support and add comprehensive model-management tests/docs.

Checklist
PR title: `<commit_type>: <Title>`

Commit Type: check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)