Skip to content

fix: add docstrings to semantic search tools to prevent null description#431

Open
dj0nes wants to merge 2 commits intovitali87:mainfrom
dj0nes:fix/lm-studio-tool-null-description
Open

fix: add docstrings to semantic search tools to prevent null description#431
dj0nes wants to merge 2 commits intovitali87:mainfrom
dj0nes:fix/lm-studio-tool-null-description

Conversation

@dj0nes
Copy link

@dj0nes dj0nes commented Mar 7, 2026

Problem

When using cgr with LM Studio as the backend, the agent fails immediately with:

pydantic_ai.exceptions.ModelHTTPError: status_code: 400, model_name: ...,
body: {'message': 'Invalid', 'type': 'invalid_request_error', 'param': 'tools.8.type', 'code': 'invalid_string'}

The root cause is that semantic_search_functions and get_function_source_by_id had no docstrings. pydantic-ai derives the tool description field from the function docstring, so without one it sends "description": null in the tool schema. LM Studio's OpenAI-compatible API layer rejects null descriptions, while the real OpenAI API silently accepts them.

Fix

Add docstrings to both inner tool functions in codebase_rag/tools/semantic_search.py:

  • semantic_search_functions: describes semantic similarity search over the codebase
  • get_function_source_by_id: describes retrieval of source code by graph node ID

This ensures both tools send a valid non-null description string, resolving the 400 error.

Testing

Run cgr start with an LM Studio backend (or any OpenAI-compatible server that strictly validates tool schemas). The agent should initialize and respond without error.

pydantic-ai uses function docstrings as the tool description field.
Without a docstring, it sends null, which LM Studio's OpenAI-compatible
API rejects with: tools.N.type: invalid_string.

Add docstrings to semantic_search_functions and get_function_source_by_id
so both tools have a valid description string.
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue preventing the cgr agent from initializing with LM Studio backends. The problem stemmed from pydantic-ai generating null descriptions for certain tools due to missing docstrings, which LM Studio's API strictly rejects. By adding appropriate docstrings to the affected tool functions, this change ensures that valid tool schemas are always provided, thereby resolving the 400 error and enabling seamless operation with strict API environments.

Highlights

  • Problem Identification: Identified that pydantic-ai generates null descriptions for tools without docstrings, causing LM Studio to reject tool schemas with a 400 error.
  • Root Cause: The semantic_search_functions and get_function_source_by_id functions in codebase_rag/tools/semantic_search.py lacked docstrings.
  • Solution Implemented: Added docstrings to semantic_search_functions and get_function_source_by_id to ensure valid, non-null tool descriptions are generated.
  • Impact: Resolves the 400 error when using cgr with LM Studio or other strict OpenAI-compatible backends, allowing the agent to initialize correctly.
Changelog
  • codebase_rag/tools/semantic_search.py
    • Added a docstring to the semantic_search_functions asynchronous function.
    • Added a docstring to the get_function_source_by_id asynchronous function.
Activity
  • No activity has occurred on this pull request since its creation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR fixes a 400 Invalid error thrown by LM Studio (and other strict OpenAI-compatible backends) by adding docstrings to the two inner tool functions in semantic_search.py so that pydantic-ai sends a non-null description field in the tool schema. While the fix resolves the immediate symptom, it uses the wrong approach for this codebase.

  • Root cause is correct: create_semantic_search_tool and create_get_function_source_tool both call Tool(fn, name=...) without a description= argument, so pydantic-ai falls back to the docstring — which was absent — resulting in "description": null.
  • Fix approach is inconsistent: Every other tool factory in the codebase (file_reader.py, codebase_query.py, file_writer.py, etc.) passes description=td.<CONSTANT> directly to the Tool() constructor. The descriptions for these two tools (td.SEMANTIC_SEARCH and td.GET_FUNCTION_SOURCE) already exist in tool_descriptions.py (lines 51–59) and are even registered in the AGENTIC_TOOLS dict, but are never wired to the Tool() calls.
  • Coding standard violation: The project explicitly forbids docstrings ("No Comments or Docstrings" rule). The correct fix is to add description=td.SEMANTIC_SEARCH and description=td.GET_FUNCTION_SOURCE to the two Tool() constructor calls and remove the docstrings.

Confidence Score: 2/5

  • The fix works at runtime but uses the wrong mechanism and violates project coding standards; should be reworked before merging.
  • The docstring approach does technically resolve the null description issue, but it contradicts the explicit "No Docstrings" coding standard and ignores the pre-existing td.SEMANTIC_SEARCH / td.GET_FUNCTION_SOURCE constants that were designed for exactly this purpose. All other tool factories in the codebase use description=td.<CONSTANT> in the Tool() constructor — this PR is the only inconsistent one. The fix is straightforward but needs to be done the right way.
  • codebase_rag/tools/semantic_search.py — the two Tool() constructor calls at lines 143 and 158 need description=td.SEMANTIC_SEARCH and description=td.GET_FUNCTION_SOURCE added, and the docstrings removed.

Important Files Changed

Filename Overview
codebase_rag/tools/semantic_search.py Adds docstrings to fix null tool descriptions for LM Studio compatibility, but this violates the project's no-docstrings standard and overlooks the existing td.SEMANTIC_SEARCH / td.GET_FUNCTION_SOURCE constants that should be passed to the Tool() constructors instead.

Sequence Diagram

sequenceDiagram
    participant Factory as Tool Factory
    participant PydanticAI as pydantic_ai.Tool
    participant LLM as LM Studio / OpenAI

    Factory->>PydanticAI: Tool(fn, name=...) [no description=]
    Note over PydanticAI: Falls back to fn.__doc__
    alt docstring present (this PR)
        PydanticAI-->>LLM: description = "Search for functions..."
        LLM-->>PydanticAI: 200 OK
    else no docstring (before PR)
        PydanticAI-->>LLM: description = null
        LLM-->>PydanticAI: 400 invalid_string
    end

    Note over Factory,PydanticAI: Correct fix (per codebase pattern)
    Factory->>PydanticAI: Tool(fn, name=..., description=td.SEMANTIC_SEARCH)
    Note over PydanticAI: Uses explicit description, no docstring needed
    PydanticAI-->>LLM: description = "Performs a semantic search..."
    LLM-->>PydanticAI: 200 OK
Loading

Last reviewed commit: 6c63bdd


def create_semantic_search_tool() -> Tool:
async def semantic_search_functions(query: str, top_k: int = 5) -> str:
"""Search for functions and classes in the codebase using semantic similarity."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings violate project standards; use existing td descriptions instead

The project's coding standard explicitly prohibits docstrings ("No Comments or Docstrings — code should be self-documenting"). More importantly, this codebase already has an established pattern for providing tool descriptions: passing description= directly to the Tool() constructor from constants in tool_descriptions.py.

Looking at every other tool in the project — file_reader.py uses description=td.FILE_READER, codebase_query.py uses description=td.CODEBASE_QUERY, etc. — the description kwarg is what actually avoids the null issue. The docstring is a fallback pydantic-ai uses when no description is passed.

Crucially, tool_descriptions.py already defines both td.SEMANTIC_SEARCH (line 51–54) and td.GET_FUNCTION_SOURCE (line 56–59) with richer, more descriptive text. They are even mapped in the AGENTIC_TOOLS dict (lines 156–157) but are never passed to the Tool() constructors, which is the real root cause.

The correct fix is to add the description= kwarg to both Tool() calls:

# create_semantic_search_tool (line 143)
return Tool(semantic_search_functions, name=td.AgenticToolName.SEMANTIC_SEARCH, description=td.SEMANTIC_SEARCH)

# create_get_function_source_tool (line 158)
return Tool(get_function_source_by_id, name=td.AgenticToolName.GET_FUNCTION_SOURCE, description=td.GET_FUNCTION_SOURCE)

This removes the docstrings, uses the existing richer constants, and aligns perfectly with how every other tool in the codebase is constructed.

Rule Used: ## Technical Requirements

Agentic Framework

-... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/tools/semantic_search.py
Line: 123

Comment:
**Docstrings violate project standards; use existing `td` descriptions instead**

The project's coding standard explicitly prohibits docstrings ("No Comments or Docstrings — code should be self-documenting"). More importantly, this codebase already has an established pattern for providing tool descriptions: passing `description=` directly to the `Tool()` constructor from constants in `tool_descriptions.py`.

Looking at every other tool in the project — `file_reader.py` uses `description=td.FILE_READER`, `codebase_query.py` uses `description=td.CODEBASE_QUERY`, etc. — the `description` kwarg is what actually avoids the null issue. The docstring is a fallback pydantic-ai uses when no `description` is passed.

Crucially, `tool_descriptions.py` **already defines** both `td.SEMANTIC_SEARCH` (line 51–54) and `td.GET_FUNCTION_SOURCE` (line 56–59) with richer, more descriptive text. They are even mapped in the `AGENTIC_TOOLS` dict (lines 156–157) but are never passed to the `Tool()` constructors, which is the real root cause.

The correct fix is to add the `description=` kwarg to both `Tool()` calls:

```python
# create_semantic_search_tool (line 143)
return Tool(semantic_search_functions, name=td.AgenticToolName.SEMANTIC_SEARCH, description=td.SEMANTIC_SEARCH)

# create_get_function_source_tool (line 158)
return Tool(get_function_source_by_id, name=td.AgenticToolName.GET_FUNCTION_SOURCE, description=td.GET_FUNCTION_SOURCE)
```

This removes the docstrings, uses the existing richer constants, and aligns perfectly with how every other tool in the codebase is constructed.

**Rule Used:** ## Technical Requirements

### Agentic Framework
-... ([source](https://app.greptile.com/review/custom-context?memory=d4240b05-b763-467a-a6bf-94f73e8b6859))

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly identifies and fixes a compatibility issue with LM Studio by ensuring pydantic-ai receives a description for the semantic search tools. However, the current implementation of adding docstrings conflicts with a project-wide rule that prohibits them, which is enforced by a pre-commit hook. A more robust solution that aligns with the project's conventions is to explicitly pass the tool descriptions to the pydantic_ai.Tool constructor. The necessary descriptions are already defined in codebase_rag/tools/tool_descriptions.py. This approach resolves the bug while adhering to the established coding standards.


def create_semantic_search_tool() -> Tool:
async def semantic_search_functions(query: str, top_k: int = 5) -> str:
"""Search for functions and classes in the codebase using semantic similarity."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

While adding a docstring fixes the issue, it violates the project's rule against using docstrings. A better approach is to explicitly provide the description to the Tool constructor, using the existing description from tool_descriptions.py.

Please remove this docstring and modify the Tool creation at line 143 as follows:

return Tool(
    semantic_search_functions,
    name=td.AgenticToolName.SEMANTIC_SEARCH,
    description=td.SEMANTIC_SEARCH,
)

This change resolves the issue without contravening project standards.

References
  1. Docstrings are not allowed in this project, as enforced by a pre-commit hook.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks


def create_get_function_source_tool() -> Tool:
async def get_function_source_by_id(node_id: int) -> str:
"""Retrieves the source code of a function or class by its graph node ID."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

As with the other tool, this docstring violates the project's 'no docstrings' rule. Please remove it and instead pass the description directly to the Tool constructor at line 158:

return Tool(
    get_function_source_by_id,
    name=td.AgenticToolName.GET_FUNCTION_SOURCE,
    description=td.GET_FUNCTION_SOURCE,
)

This maintains consistency and adheres to the project's coding guidelines.

References
  1. Docstrings are not allowed in this project, as enforced by a pre-commit hook.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Use the existing td.SEMANTIC_SEARCH and td.GET_FUNCTION_SOURCE constants
as explicit description= arguments to the Tool() constructor, consistent
with every other tool factory in the codebase. Remove docstrings added
in the previous commit, which violated the project no-docstrings rule.

This ensures LM Studio and other strict OpenAI-compatible backends
receive a valid non-null description field in the tool schema.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant