fix: add docstrings to semantic search tools to prevent null description#431
fix: add docstrings to semantic search tools to prevent null description#431dj0nes wants to merge 2 commits intovitali87:mainfrom
Conversation
pydantic-ai uses function docstrings as the tool description field. Without a docstring, it sends null, which LM Studio's OpenAI-compatible API rejects with: tools.N.type: invalid_string. Add docstrings to semantic_search_functions and get_function_source_by_id so both tools have a valid description string.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue preventing the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Greptile SummaryThis PR fixes a
Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Factory as Tool Factory
participant PydanticAI as pydantic_ai.Tool
participant LLM as LM Studio / OpenAI
Factory->>PydanticAI: Tool(fn, name=...) [no description=]
Note over PydanticAI: Falls back to fn.__doc__
alt docstring present (this PR)
PydanticAI-->>LLM: description = "Search for functions..."
LLM-->>PydanticAI: 200 OK
else no docstring (before PR)
PydanticAI-->>LLM: description = null
LLM-->>PydanticAI: 400 invalid_string
end
Note over Factory,PydanticAI: Correct fix (per codebase pattern)
Factory->>PydanticAI: Tool(fn, name=..., description=td.SEMANTIC_SEARCH)
Note over PydanticAI: Uses explicit description, no docstring needed
PydanticAI-->>LLM: description = "Performs a semantic search..."
LLM-->>PydanticAI: 200 OK
Last reviewed commit: 6c63bdd |
|
|
||
| def create_semantic_search_tool() -> Tool: | ||
| async def semantic_search_functions(query: str, top_k: int = 5) -> str: | ||
| """Search for functions and classes in the codebase using semantic similarity.""" |
There was a problem hiding this comment.
Docstrings violate project standards; use existing td descriptions instead
The project's coding standard explicitly prohibits docstrings ("No Comments or Docstrings — code should be self-documenting"). More importantly, this codebase already has an established pattern for providing tool descriptions: passing description= directly to the Tool() constructor from constants in tool_descriptions.py.
Looking at every other tool in the project — file_reader.py uses description=td.FILE_READER, codebase_query.py uses description=td.CODEBASE_QUERY, etc. — the description kwarg is what actually avoids the null issue. The docstring is a fallback pydantic-ai uses when no description is passed.
Crucially, tool_descriptions.py already defines both td.SEMANTIC_SEARCH (line 51–54) and td.GET_FUNCTION_SOURCE (line 56–59) with richer, more descriptive text. They are even mapped in the AGENTIC_TOOLS dict (lines 156–157) but are never passed to the Tool() constructors, which is the real root cause.
The correct fix is to add the description= kwarg to both Tool() calls:
# create_semantic_search_tool (line 143)
return Tool(semantic_search_functions, name=td.AgenticToolName.SEMANTIC_SEARCH, description=td.SEMANTIC_SEARCH)
# create_get_function_source_tool (line 158)
return Tool(get_function_source_by_id, name=td.AgenticToolName.GET_FUNCTION_SOURCE, description=td.GET_FUNCTION_SOURCE)This removes the docstrings, uses the existing richer constants, and aligns perfectly with how every other tool in the codebase is constructed.
Rule Used: ## Technical Requirements
Agentic Framework
-... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: codebase_rag/tools/semantic_search.py
Line: 123
Comment:
**Docstrings violate project standards; use existing `td` descriptions instead**
The project's coding standard explicitly prohibits docstrings ("No Comments or Docstrings — code should be self-documenting"). More importantly, this codebase already has an established pattern for providing tool descriptions: passing `description=` directly to the `Tool()` constructor from constants in `tool_descriptions.py`.
Looking at every other tool in the project — `file_reader.py` uses `description=td.FILE_READER`, `codebase_query.py` uses `description=td.CODEBASE_QUERY`, etc. — the `description` kwarg is what actually avoids the null issue. The docstring is a fallback pydantic-ai uses when no `description` is passed.
Crucially, `tool_descriptions.py` **already defines** both `td.SEMANTIC_SEARCH` (line 51–54) and `td.GET_FUNCTION_SOURCE` (line 56–59) with richer, more descriptive text. They are even mapped in the `AGENTIC_TOOLS` dict (lines 156–157) but are never passed to the `Tool()` constructors, which is the real root cause.
The correct fix is to add the `description=` kwarg to both `Tool()` calls:
```python
# create_semantic_search_tool (line 143)
return Tool(semantic_search_functions, name=td.AgenticToolName.SEMANTIC_SEARCH, description=td.SEMANTIC_SEARCH)
# create_get_function_source_tool (line 158)
return Tool(get_function_source_by_id, name=td.AgenticToolName.GET_FUNCTION_SOURCE, description=td.GET_FUNCTION_SOURCE)
```
This removes the docstrings, uses the existing richer constants, and aligns perfectly with how every other tool in the codebase is constructed.
**Rule Used:** ## Technical Requirements
### Agentic Framework
-... ([source](https://app.greptile.com/review/custom-context?memory=d4240b05-b763-467a-a6bf-94f73e8b6859))
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Code Review
This pull request correctly identifies and fixes a compatibility issue with LM Studio by ensuring pydantic-ai receives a description for the semantic search tools. However, the current implementation of adding docstrings conflicts with a project-wide rule that prohibits them, which is enforced by a pre-commit hook. A more robust solution that aligns with the project's conventions is to explicitly pass the tool descriptions to the pydantic_ai.Tool constructor. The necessary descriptions are already defined in codebase_rag/tools/tool_descriptions.py. This approach resolves the bug while adhering to the established coding standards.
|
|
||
| def create_semantic_search_tool() -> Tool: | ||
| async def semantic_search_functions(query: str, top_k: int = 5) -> str: | ||
| """Search for functions and classes in the codebase using semantic similarity.""" |
There was a problem hiding this comment.
While adding a docstring fixes the issue, it violates the project's rule against using docstrings. A better approach is to explicitly provide the description to the Tool constructor, using the existing description from tool_descriptions.py.
Please remove this docstring and modify the Tool creation at line 143 as follows:
return Tool(
semantic_search_functions,
name=td.AgenticToolName.SEMANTIC_SEARCH,
description=td.SEMANTIC_SEARCH,
)This change resolves the issue without contravening project standards.
References
- Docstrings are not allowed in this project, as enforced by a pre-commit hook.
|
|
||
| def create_get_function_source_tool() -> Tool: | ||
| async def get_function_source_by_id(node_id: int) -> str: | ||
| """Retrieves the source code of a function or class by its graph node ID.""" |
There was a problem hiding this comment.
As with the other tool, this docstring violates the project's 'no docstrings' rule. Please remove it and instead pass the description directly to the Tool constructor at line 158:
return Tool(
get_function_source_by_id,
name=td.AgenticToolName.GET_FUNCTION_SOURCE,
description=td.GET_FUNCTION_SOURCE,
)This maintains consistency and adheres to the project's coding guidelines.
References
- Docstrings are not allowed in this project, as enforced by a pre-commit hook.
Use the existing td.SEMANTIC_SEARCH and td.GET_FUNCTION_SOURCE constants as explicit description= arguments to the Tool() constructor, consistent with every other tool factory in the codebase. Remove docstrings added in the previous commit, which violated the project no-docstrings rule. This ensures LM Studio and other strict OpenAI-compatible backends receive a valid non-null description field in the tool schema.
Problem
When using
cgrwith LM Studio as the backend, the agent fails immediately with:The root cause is that
semantic_search_functionsandget_function_source_by_idhad no docstrings. pydantic-ai derives the tooldescriptionfield from the function docstring, so without one it sends"description": nullin the tool schema. LM Studio's OpenAI-compatible API layer rejects null descriptions, while the real OpenAI API silently accepts them.Fix
Add docstrings to both inner tool functions in
codebase_rag/tools/semantic_search.py:semantic_search_functions: describes semantic similarity search over the codebaseget_function_source_by_id: describes retrieval of source code by graph node IDThis ensures both tools send a valid non-null description string, resolving the 400 error.
Testing
Run
cgr startwith an LM Studio backend (or any OpenAI-compatible server that strictly validates tool schemas). The agent should initialize and respond without error.