
Add offline LLM-assisted tool description optimizer #100

@dgenio

Description

Context / Problem

MCP tool descriptions are authored in isolation — each server writes descriptions for its own tools without awareness of what other tools exist in the ecosystem. This creates several problems for MCP-based agents:

  • Ambiguity between similar tools: "search", "query", "lookup", and "find" appear across different servers with overlapping semantics but no disambiguation.
  • Inconsistent granularity: Some tools have 3-word descriptions, others have 3-paragraph essays — the LLM wastes context window on verbose descriptions while missing important distinctions.
  • Token inefficiency: Descriptions optimized for humans are not optimized for LLM token budgets. An LLM needs just enough to pick the right tool — not a full user guide.
  • No ecosystem awareness: A tool description for "search" is written without knowing that 4 other tools also do "search" variants. The LLM has no disambiguation signal.

An offline service with visibility across all registered tools can rewrite descriptions to be maximally discriminative, concise, and LLM-friendly — something no individual tool author can do.

Proposal

1. ToolDescriptionOptimizer

Create chainweaver/optimizer.py (or extend chainweaver/compiler_llm.py from #28):

from typing import Callable

def optimize_tool_descriptions(
    tools: list[Tool],
    *,
    llm_fn: Callable[[str], str],
    strategy: OptimizationStrategy = OptimizationStrategy.DISCRIMINATIVE,
) -> list[ToolDescriptionProposal]:
    """
    Use an LLM (offline, at build time) to rewrite tool descriptions
    for maximum discriminability within the known tool ecosystem.
    Never called at runtime.
    """

2. Optimization strategies

from enum import Enum

class OptimizationStrategy(str, Enum):
    DISCRIMINATIVE = "discriminative"   # Maximize distinction between similar tools
    CONCISE = "concise"                 # Minimize token count while preserving semantics
    STRUCTURED = "structured"           # Enforce consistent format across all tools

3. Output: rewrite proposals

from dataclasses import dataclass

@dataclass
class ToolDescriptionProposal:
    tool_name: str
    original_description: str
    proposed_description: str
    rationale: str                      # Why this change improves discriminability
    similarity_group: list[str]         # Other tools this was disambiguated against
    token_delta: int                    # Negative = fewer tokens (improvement)
    source: str = "description-optimizer"

4. LLM prompt design

The optimizer builds a prompt containing:

  • All tool names and descriptions in the ecosystem
  • Input/output schema summaries per tool
  • Explicit instruction: "Rewrite descriptions to help an LLM agent choose the correct tool with minimal ambiguity. Focus on what makes each tool different from similar tools."
  • Optional: Example of a good vs bad description from the same ecosystem
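The prompt assembly above can be sketched as follows. This is a minimal illustration, not the project's actual API: the `Tool` dataclass here is a stand-in for whatever tool type chainweaver uses, and `build_optimizer_prompt` is a hypothetical helper name.

```python
from dataclasses import dataclass, field

# Illustrative stand-in for the project's Tool type (assumption, not the real class).
@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict = field(default_factory=dict)

def build_optimizer_prompt(tools: list[Tool]) -> str:
    """Assemble one ecosystem-wide prompt: instruction first, then every
    tool's name, input summary, and current description."""
    lines = [
        "Rewrite descriptions to help an LLM agent choose the correct tool "
        "with minimal ambiguity. Focus on what makes each tool different "
        "from similar tools.",
        "",
        "Tools in the ecosystem:",
    ]
    for tool in tools:
        # Summarize the input schema as a comma-separated parameter list.
        params = ", ".join(tool.input_schema.get("properties", {})) or "none"
        lines.append(f"- {tool.name} (inputs: {params}): {tool.description}")
    return "\n".join(lines)
```

Because every tool appears in the same prompt, the LLM can contrast descriptions directly, which is the whole point of ecosystem-aware optimization.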

5. Batch vs incremental

  • Full batch: optimize_tool_descriptions(all_tools) — rewrite all descriptions together for maximum global coherence.
  • Incremental: optimize_new_tool_description(new_tool, existing_tools) — optimize only the new tool's description given the existing ecosystem, and flag existing tools whose descriptions should be updated.
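The incremental path could look roughly like this. The argument shapes are assumptions for illustration only; the real signature would take the project's tool objects rather than bare strings.

```python
from typing import Callable

def optimize_new_tool_description(
    new_tool_name: str,
    new_tool_description: str,
    existing: dict[str, str],  # existing tool name -> current description
    *,
    llm_fn: Callable[[str], str],
) -> str:
    """Incremental mode sketch: rewrite only the new tool's description,
    with the existing ecosystem supplied as context."""
    context = "\n".join(f"- {name}: {desc}" for name, desc in existing.items())
    prompt = (
        "Existing tools:\n" + context +
        f"\n\nRewrite the description of '{new_tool_name}' so an LLM agent "
        f"can distinguish it from the tools above. "
        f"Current description: {new_tool_description}"
    )
    return llm_fn(prompt).strip()
```

A mock `llm_fn` keeps tests offline, matching the abstracted-LLM guardrail below.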

6. Safety guardrails

  • Banned from runtime: Like compiler_llm.py, this module MUST NOT be imported by executor.py. Enforce via test.
  • No auto-application: Proposals are returned as data objects for human review, never applied automatically.
  • Abstracted LLM call: llm_fn: Callable[[str], str] — no dependency on any LLM provider.
  • Original descriptions preserved: Proposals always include the original description for comparison.
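The runtime import ban can be enforced with a small AST check. A sketch, with illustrative helper names; the real test would read `chainweaver/executor.py` from disk and pass its source to the checker:

```python
import ast

def imported_modules(source: str) -> set[str]:
    """Collect every module name imported by a Python source string."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return modules

def assert_no_optimizer_import(executor_source: str) -> None:
    """Guardrail check: fail if the executor imports the offline optimizer."""
    offenders = {m for m in imported_modules(executor_source) if "optimizer" in m}
    assert not offenders, f"executor must not import: {offenders}"
```

Parsing the AST rather than grepping avoids false positives from comments or strings that merely mention the optimizer.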

Relevant Code Locations

Acceptance Criteria

  • optimize_tool_descriptions() accepts a list of tools and an llm_fn callable
  • Proposals are returned as ToolDescriptionProposal objects with original, proposed, rationale, and similarity group
  • The prompt template includes all tool schemas and descriptions for ecosystem-aware optimization
  • At least 3 strategies: discriminative (default), concise, structured
  • Incremental mode: optimize_new_tool_description() for single-tool optimization against existing ecosystem
  • A test verifies that chainweaver/executor.py does NOT import optimizer
  • llm_fn is a plain Callable[[str], str] — no dependency on any LLM provider
  • token_delta is computed (approximate, e.g., word count × 1.3)
  • At least 5 test cases: mock LLM returns valid proposals, handles similar tool names, handles unique tools (no changes needed), incremental mode, invalid LLM output (graceful error)
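The `token_delta` approximation could look like this. It is only the rough word-count heuristic named in the criteria above, not a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Approximate LLM token count as word count x 1.3 (rough heuristic)."""
    return round(len(text.split()) * 1.3)

def token_delta(original: str, proposed: str) -> int:
    """Negative delta means the proposed description saves tokens."""
    return estimate_tokens(proposed) - estimate_tokens(original)
```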

Out of Scope

  • Runtime description rewriting (explicitly banned)
  • Automatic application of proposals (human review required)
  • Specific LLM provider integrations
  • Description quality scoring against live agent performance

Relationship to Existing Issues

Notes

  • This is a novel differentiator: we are not aware of another orchestration framework that optimizes tool descriptions across the whole ecosystem.
  • The key insight is that discrimination is a property of the set, not the individual tool. You can't write a good "search" description without knowing all the other "search" tools.
  • Consider outputting a "confusion matrix" — pairs of tools most likely to be confused by an LLM based on description similarity.
  • The optimizer could also detect and flag duplicate tools (same capability offered by multiple MCP servers) and suggest consolidation.
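As a rough lexical proxy for the confusion-matrix idea, description similarity can be scored pairwise. Embedding-based similarity would be more faithful to what an LLM confuses, but this dependency-free sketch (with an assumed 0.6 threshold) shows the shape:

```python
from difflib import SequenceMatcher
from itertools import combinations

def confusable_pairs(
    descriptions: dict[str, str], threshold: float = 0.6
) -> list[tuple[str, str, float]]:
    """Rank tool pairs by description similarity; high scores flag pairs
    an LLM is most likely to confuse. Purely lexical, not embedding-based."""
    pairs = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(descriptions.items(), 2):
        score = SequenceMatcher(None, desc_a.lower(), desc_b.lower()).ratio()
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    # Most confusable pairs first.
    return sorted(pairs, key=lambda p: -p[2])
```

The same pairing could also surface duplicate-tool candidates: a near-1.0 score across two servers suggests consolidation.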

Metadata

    Labels

    ai-friendly: Designed for AI-assisted implementation
    area:compiler: Flow compilation and optimization
    area:integrations: External system integrations
    complexity:average: Moderate effort, some design needed
    priority:medium: Important but not blocking
    size:M: Medium effort (1-3 days)
    type:feature: New feature or capability
