[Feature] Dynamic Runtime Loading and Refresh of MCP Tools and Prompts #543
Thanks for preparing this, @yanand0909. There seems to be an overlap of efforts with @weiqingy #549. You guys may exchange ideas and figure out how to collaborate on this. Concerning the design doc, I had a brief look at the Proposed Solution and Public Interface parts, but not the Implementation Details. A few comments so far:
Hi @yanand0909, thanks for preparing this document. The overall design looks good to me, including runtime discovery and dynamic refresh. I have some comments about the details.
I think maybe we don't need to introduce a new type The difference between The benefit is we don't need to modify the SerDe for resource provider.
Currently the
In #548, we have extracted
Currently, users must register the names of all tools, including MCP tools, when declaring a chat model setup, so scenarios where MCP tools are dynamically added or removed at runtime cannot be supported. Once dynamic refresh of MCP servers is supported, we could also allow registering an MCP server with a chat model setup, rather than requiring every tool to be listed by name.
GitHub Issue: #458
Motivation
Current MCP Resource Discovery
The Flink Agents framework currently discovers MCP tools and prompts at compile time during AgentPlan construction. In AgentPlan.extractJavaMCPServer(), the framework instantiates the MCP server through JavaResourceProvider.provide(), calls listTools() and listPrompts() over the network, and wraps the results in JavaSerializableResourceProvider instances.
The Problem
This compile-time discovery approach has several limitations:
Build-time server dependency: The MCP server must be reachable during AgentPlan construction. If the server is down, the build fails, even though the agent won't run until later.
Static capabilities: Tools and prompts are frozen at compile time. If an MCP server adds, removes, or updates its tools, the agent cannot see the changes without recompilation.
No connection reuse: The MCP server connection is opened for discovery and immediately closed. At runtime, a new connection must be established for tool calls, duplicating work.
No graceful degradation: A transient MCP server outage during compilation is a hard failure with no recovery path.
Proposed Solution
We propose a two-part solution:
Runtime Discovery: Defer MCP tool/prompt discovery from compile time to operator startup (ActionExecutionOperator.open()). This decouples the build from MCP server availability and enables connection reuse.
Dynamic Refresh: Allow MCP tools and prompts to be refreshed at runtime, so agents can adapt to changes in MCP server capabilities without redeployment.
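The runtime-discovery half of the proposal can be pictured with the rough sketch below. It is not the design's actual code: McpConnection, DeferredMcpProvider, and the connector function are hypothetical names introduced only for illustration. The point it tries to show is that the plan-time object holds nothing but an endpoint, while the connection is opened once at operator startup and then reused.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for an open MCP client connection. */
interface McpConnection extends AutoCloseable {
    List<String> listTools();

    @Override
    void close();
}

/** Hypothetical plan-time object: holds only the endpoint, so building it needs no network access. */
final class DeferredMcpProvider {
    private final String endpoint;
    private McpConnection connection;          // opened at operator startup, reused for tool calls
    private List<String> tools = List.of();    // empty until discovery has run

    DeferredMcpProvider(String endpoint) {
        this.endpoint = endpoint;
    }

    /** Intended to be called from the operator's open(): connect once, then discover. */
    void open(Function<String, McpConnection> connector) {
        connection = connector.apply(endpoint);
        tools = List.copyOf(connection.listTools());
    }

    List<String> tools() {
        return tools;
    }

    /** The same connection that performed discovery, available for later tool calls. */
    McpConnection connection() {
        return connection;
    }

    /** Intended to be called from the operator's close(). */
    void close() {
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }
}
```

Constructing DeferredMcpProvider never touches the network, which is the property that would let AgentPlan construction succeed while the MCP server is down.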
Public Interface
MCPServer Builder Additions
New configuration options on the existing MCPServer.Builder:
New Public API on MCPServer
1.2 Modify AgentPlan.extractJavaMCPServer()
File: plan/src/main/java/.../AgentPlan.java (lines 407-452)
Remove all MCP server instantiation, listTools()/listPrompts() calls, and close(). Replace with a simple provider registration:
Impact: AgentPlan construction drops from ~1-2s (MCP round-trip) to <10ms. Builds no longer depend on MCP server availability.
1.3 Runtime Discovery in ActionExecutionOperator
File: runtime/src/main/java/.../operator/ActionExecutionOperator.java
Add MCP resource discovery during open() and cleanup during close():
1.4 AgentPlan Cache Helper
File: plan/src/main/java/.../AgentPlan.java
1.5 Serialization Support
Files: ResourceProviderJsonSerializer.java, ResourceProviderJsonDeserializer.java
Add serialization/deserialization cases for MCPServerResourceProvider that persist only the ResourceDescriptor (no runtime state):
Part 2: Dynamic Refresh
Once runtime discovery is in place, dynamic refresh builds on the same infrastructure. We propose a phased approach with multiple strategies.
2.1 Periodic Polling (Phase 1)
A background daemon thread periodically calls refreshNow():
Flink Integration: ActionExecutionOperator.open() calls startAutoRefresh() on each MCP server after initial discovery. close() calls stopAutoRefresh().
2.3 Thread Safety
Tool reads are frequent (every tool call); refreshes are infrequent. We use StampedLock for optimal read-heavy concurrency:
2.4 Graceful Degradation
On refresh failures, the server falls back to the last known good state:
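One way to picture this fall-back, combined with the StampedLock approach from 2.3, is the minimal sketch below. ToolSnapshotCache and its refreshNow(Supplier) signature are hypothetical names for illustration and are not the proposed MCPServer API; the supplier stands in for a listTools() call against the server.

```java
import java.util.List;
import java.util.concurrent.locks.StampedLock;
import java.util.function.Supplier;

/** Hypothetical holder for the tool names last discovered from one MCP server. */
final class ToolSnapshotCache {
    private final StampedLock lock = new StampedLock();
    private List<String> tools = List.of();   // last known good state

    /** Frequent read path: optimistic read, falling back to a read lock if a write intervened. */
    List<String> tools() {
        long stamp = lock.tryOptimisticRead();
        List<String> snapshot = tools;
        if (!lock.validate(stamp)) {
            stamp = lock.readLock();
            try {
                snapshot = tools;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return snapshot;
    }

    /**
     * Infrequent refresh path. The fetch runs outside the lock so readers are never
     * blocked on the network. On failure the previous snapshot is left untouched
     * and false is returned.
     */
    boolean refreshNow(Supplier<List<String>> fetch) {
        List<String> fresh;
        try {
            fresh = List.copyOf(fetch.get());
        } catch (RuntimeException e) {
            return false;   // keep serving the last known good state
        }
        long stamp = lock.writeLock();
        try {
            tools = fresh;
        } finally {
            lock.unlockWrite(stamp);
        }
        return true;
    }
}
```

Because the snapshot is an immutable list that is swapped as a whole, a reader sees either the old tool set or the new one, never a partially updated mix.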
2.5 Operator-Level vs Task-Manager-Level Refresh
We recommend operator-level refresh for simplicity and isolation. Each ActionExecutionOperator manages the refresh lifecycle of its own MCP servers.
Future Work
Event-Driven Refresh
If MCP server notifications become part of the MCP specification, subscribe to server-sent events for near-instant tool updates:
This would eliminate polling overhead entirely but requires MCP server-side support. The implementation should fall back to polling if the SSE connection drops.
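The fall-back behaviour described above could look roughly like the following. Everything here is an assumption made for illustration: RefreshCoordinator and its three callbacks are hypothetical, and no real MCP client API is implied. The sketch only shows the switching logic, where polling runs on a daemon thread while no push channel is connected and stops once one is.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical coordinator: polls only while no push channel is connected. */
final class RefreshCoordinator implements AutoCloseable {
    private final Runnable refreshNow;
    private final long pollIntervalMillis;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "mcp-refresh");
                t.setDaemon(true);   // never keeps the JVM alive on its own
                return t;
            });
    private ScheduledFuture<?> pollingTask;   // null while polling is stopped

    RefreshCoordinator(Runnable refreshNow, long pollIntervalMillis) {
        this.refreshNow = refreshNow;
        this.pollIntervalMillis = pollIntervalMillis;
    }

    /** Push channel is up: polling is no longer needed. */
    synchronized void onPushChannelConnected() {
        if (pollingTask != null) {
            pollingTask.cancel(false);
            pollingTask = null;
        }
    }

    /** Push channel dropped (or was never available): fall back to polling. */
    synchronized void onPushChannelDropped() {
        if (pollingTask == null) {
            pollingTask = scheduler.scheduleWithFixedDelay(() -> {
                try {
                    refreshNow.run();
                } catch (RuntimeException e) {
                    // Swallow so one failed refresh does not cancel future polls.
                }
            }, pollIntervalMillis, pollIntervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Server announced a change: refresh immediately. */
    void onToolsChangedNotification() {
        refreshNow.run();
    }

    synchronized boolean isPolling() {
        return pollingTask != null;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A caller would invoke onPushChannelDropped() at startup when the server offers no notifications, so polling remains the default until a push channel is confirmed.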
TTL-Based Lazy Refresh
An alternative cache strategy where tools are refreshed lazily on access after a configurable TTL expires. This suits development environments with sporadic tool usage: it avoids background threads, and tools returned on access are never older than the TTL as long as the refresh succeeds.
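A minimal sketch of the TTL idea follows, under stated assumptions: TtlLazyToolCache is a hypothetical class, the supplier stands in for a listTools() call, and the clock is injected only so the behaviour can be checked without sleeping. Keeping the stale value when a refresh fails is a choice made here to match the graceful degradation described in 2.4, not something the proposal specifies for this strategy.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical lazy cache: refreshes on access once the TTL has elapsed. */
final class TtlLazyToolCache {
    private final Supplier<List<String>> fetch;
    private final long ttlNanos;
    private final LongSupplier nanoClock;   // e.g. System::nanoTime in production
    private List<String> tools;             // null until the first access
    private long fetchedAtNanos;

    TtlLazyToolCache(Supplier<List<String>> fetch, long ttlNanos, LongSupplier nanoClock) {
        this.fetch = fetch;
        this.ttlNanos = ttlNanos;
        this.nanoClock = nanoClock;
    }

    /** Returns the cached tools, refetching first if they are missing or expired. */
    synchronized List<String> tools() {
        long now = nanoClock.getAsLong();
        if (tools == null || now - fetchedAtNanos >= ttlNanos) {
            try {
                tools = List.copyOf(fetch.get());
                fetchedAtNanos = now;
            } catch (RuntimeException e) {
                if (tools == null) {
                    throw e;   // nothing to fall back to on the very first access
                }
                // Otherwise keep serving the stale value and retry on the next access.
            }
        }
        return tools;
    }
}
```

No background thread exists in this variant; the cost is that the caller whose access triggers a refresh pays the network round-trip.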
Advanced Resilience
Python MCP Server Support
Extend runtime discovery to Python MCP servers using the existing Python client.
Other Agents approach