-
-
Notifications
You must be signed in to change notification settings - Fork 349
feat(mcp/tools): add shell_command, document_analyzer, semantic_search, and ask_agent tools to MCP registry #401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -4,26 +4,37 @@ | |||||||||
| """ | ||||||||||
|
|
||||||||||
| import itertools | ||||||||||
| import sys | ||||||||||
| from collections.abc import Callable | ||||||||||
| from dataclasses import dataclass | ||||||||||
| from pathlib import Path | ||||||||||
| from typing import Any, cast | ||||||||||
|
|
||||||||||
| from loguru import logger | ||||||||||
| from rich.console import Console | ||||||||||
|
|
||||||||||
| from codebase_rag.graph_updater import GraphUpdater | ||||||||||
| from codebase_rag.parser_loader import load_parsers | ||||||||||
| from codebase_rag.services.graph_service import MemgraphIngestor | ||||||||||
| from codebase_rag.services.llm import CypherGenerator | ||||||||||
| from codebase_rag.services.llm import CypherGenerator, create_rag_orchestrator | ||||||||||
| from codebase_rag.tools.code_retrieval import CodeRetriever, create_code_retrieval_tool | ||||||||||
| from codebase_rag.tools.codebase_query import create_query_tool | ||||||||||
| from codebase_rag.tools.directory_lister import ( | ||||||||||
| DirectoryLister, | ||||||||||
| create_directory_lister_tool, | ||||||||||
| ) | ||||||||||
| from codebase_rag.tools.document_analyzer import ( | ||||||||||
| DocumentAnalyzer, | ||||||||||
| create_document_analyzer_tool, | ||||||||||
| ) | ||||||||||
| from codebase_rag.tools.file_editor import FileEditor, create_file_editor_tool | ||||||||||
| from codebase_rag.tools.file_reader import FileReader, create_file_reader_tool | ||||||||||
| from codebase_rag.tools.file_writer import FileWriter, create_file_writer_tool | ||||||||||
| from codebase_rag.tools.semantic_search import ( | ||||||||||
| create_get_function_source_tool, | ||||||||||
| create_semantic_search_tool, | ||||||||||
| ) | ||||||||||
| from codebase_rag.tools.shell_command import ShellCommander, create_shell_command_tool | ||||||||||
|
|
||||||||||
|
|
||||||||||
| @dataclass | ||||||||||
|
|
@@ -66,10 +77,14 @@ def __init__( | |||||||||
| self.file_reader = FileReader(project_root=project_root) | ||||||||||
| self.file_writer = FileWriter(project_root=project_root) | ||||||||||
| self.directory_lister = DirectoryLister(project_root=project_root) | ||||||||||
| self.shell_commander = ShellCommander(project_root=project_root) | ||||||||||
| self.document_analyzer = DocumentAnalyzer(project_root=project_root) | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The |
||||||||||
|
|
||||||||||
| # Create pydantic-ai tools - we'll call the underlying functions directly | ||||||||||
| # Use a Console that outputs to stderr to avoid corrupting JSONRPC on stdout | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Inline comments violate the comment policy The project's comment policy only permits inline comments that are: (1) top-of-file, (2) prefixed with This applies to every inline comment introduced in this PR:
Remove or prefix all of these with Context Used: Rule from Agentic Framework
Prompt To Fix With AIThis is a comment left during a code review.
Path: codebase_rag/mcp/tools.py
Line: 84
Comment:
**Inline comments violate the comment policy**
The project's comment policy only permits inline comments that are: (1) top-of-file, (2) prefixed with `(H)`, or (3) type annotations. None of the newly added inline comments qualify.
This applies to every inline comment introduced in this PR:
- `codebase_rag/mcp/tools.py:84` — `# Use a Console that outputs to stderr to avoid corrupting JSONRPC on stdout`
- `codebase_rag/mcp/tools.py:105` — `# Create RAG orchestrator agent (lazy initialization for testing)`
- `codebase_rag/mcp/tools.py:534` — `# Suppress all logging output during agent execution`
- `codebase_rag/mcp/tools.py:536` — `# Temporarily redirect stdout and stderr to suppress all output`
- `codebase_rag/mcp/tools.py:538` — `# Temporarily disable loguru logging`
- `codebase_rag/mcp/tools.py:541` — `# Run the query using the RAG agent`
- `codebase_rag/mcp/tools.py:544` — `# Re-enable logging`
- `codebase_rag/mcp/tools.py:547` — `# Fail silently without logging or printing error details`
Remove or prefix all of these with `(H)` if the intent is genuinely human-authored explanation.
**Context Used:** Rule from `dashboard` - ## Technical Requirements
### Agentic Framework
- **PydanticAI Only**: This project uses PydanticAI... ([source](https://app.greptile.com/review/custom-context?memory=d4240b05-b763-467a-a6bf-94f73e8b6859))
How can I resolve this? If you propose a fix, please make it concise. |
||||||||||
| stderr_console = Console(file=sys.stderr, width=None, force_terminal=True) | ||||||||||
| self._query_tool = create_query_tool( | ||||||||||
| ingestor=ingestor, cypher_gen=cypher_gen, console=None | ||||||||||
| ingestor=ingestor, cypher_gen=cypher_gen, console=stderr_console | ||||||||||
| ) | ||||||||||
| self._code_tool = create_code_retrieval_tool(code_retriever=self.code_retriever) | ||||||||||
| self._file_editor_tool = create_file_editor_tool(file_editor=self.file_editor) | ||||||||||
|
|
@@ -78,6 +93,17 @@ def __init__( | |||||||||
| self._directory_lister_tool = create_directory_lister_tool( | ||||||||||
| directory_lister=self.directory_lister | ||||||||||
| ) | ||||||||||
| self._shell_command_tool = create_shell_command_tool( | ||||||||||
| shell_commander=self.shell_commander | ||||||||||
| ) | ||||||||||
| self._document_analyzer_tool = create_document_analyzer_tool( | ||||||||||
| self.document_analyzer | ||||||||||
| ) | ||||||||||
| self._semantic_search_tool = create_semantic_search_tool() | ||||||||||
| self._function_source_tool = create_get_function_source_tool() | ||||||||||
|
|
||||||||||
| # Create RAG orchestrator agent (lazy initialization for testing) | ||||||||||
| self._rag_agent: Any = None | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use
Suggested change
The import This also applies to the property and setter signatures on lines 267 and 290. Context Used: Rule from Agentic Framework
Prompt To Fix With AIThis is a comment left during a code review.
Path: codebase_rag/mcp/tools.py
Line: 106
Comment:
**Use `Agent` type instead of `Any`**
`self._rag_agent: Any = None` loses all type information. `create_rag_orchestrator` returns `Agent` (from `pydantic_ai`), so the field and the property should be typed accordingly:
```suggestion
self._rag_agent: Agent | None = None
```
The import `from pydantic_ai import Agent` should be added at the top of the file, and the `rag_agent` property signature updated to `-> Agent:` and the setter to `value: Agent | None`.
This also applies to the property and setter signatures on lines 267 and 290.
**Context Used:** Rule from `dashboard` - ## Technical Requirements
### Agentic Framework
- **PydanticAI Only**: This project uses PydanticAI... ([source](https://app.greptile.com/review/custom-context?memory=d4240b05-b763-467a-a6bf-94f73e8b6859))
How can I resolve this? If you propose a fix, please make it concise. |
||||||||||
|
|
||||||||||
| # Build tool registry - single source of truth for all tool metadata | ||||||||||
| self._tools: dict[str, ToolMetadata] = { | ||||||||||
|
|
@@ -214,8 +240,57 @@ def __init__( | |||||||||
| handler=self.list_directory, | ||||||||||
| returns_json=False, | ||||||||||
| ), | ||||||||||
| "ask_agent": ToolMetadata( | ||||||||||
| name="ask_agent", | ||||||||||
| description="Ask the Code Graph RAG agent a question about the codebase. " | ||||||||||
| "Use this tool for general questions about the codebase, architecture, functionality, and code relationships. " | ||||||||||
| "Examples: 'How is the authentication implemented?', " | ||||||||||
| "'What are the main components of the system?', 'Where is the database connection configured?'", | ||||||||||
| input_schema={ | ||||||||||
| "type": "object", | ||||||||||
| "properties": { | ||||||||||
| "question": { | ||||||||||
| "type": "string", | ||||||||||
| "description": "A question about the codebase, architecture, functionality, and code relationships. " | ||||||||||
| "Examples: 'What functions call UserService.create_user?', " | ||||||||||
| "'How is error handling implemented?', 'What are the main entry points?'", | ||||||||||
| } | ||||||||||
| }, | ||||||||||
| "required": ["question"], | ||||||||||
| }, | ||||||||||
| handler=self.ask_agent, | ||||||||||
| returns_json=True, | ||||||||||
| ), | ||||||||||
| } | ||||||||||
|
|
||||||||||
| @property | ||||||||||
| def rag_agent(self) -> Any: | ||||||||||
| """Lazy-initialize the RAG orchestrator agent on first access. | ||||||||||
|
|
||||||||||
| This allows tests to mock the agent without triggering LLM initialization. | ||||||||||
| """ | ||||||||||
| if self._rag_agent is None: | ||||||||||
| self._rag_agent = create_rag_orchestrator( | ||||||||||
| tools=[ | ||||||||||
| self._query_tool, | ||||||||||
| self._code_tool, | ||||||||||
| self._file_reader_tool, | ||||||||||
| self._file_writer_tool, | ||||||||||
| self._file_editor_tool, | ||||||||||
| self._shell_command_tool, | ||||||||||
| self._directory_lister_tool, | ||||||||||
| self._document_analyzer_tool, | ||||||||||
| self._semantic_search_tool, | ||||||||||
| self._function_source_tool, | ||||||||||
| ] | ||||||||||
|
Comment on lines
+274
to
+285
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The tools=[
self._query_tool,
self._code_tool,
self._file_reader_tool,
self._directory_lister_tool,
self._document_analyzer_tool,
self._semantic_search_tool,
self._function_source_tool,
]References
|
||||||||||
| ) | ||||||||||
| return self._rag_agent | ||||||||||
|
|
||||||||||
| @rag_agent.setter | ||||||||||
| def rag_agent(self, value: Any) -> None: | ||||||||||
| """Allow setting the RAG agent (useful for testing).""" | ||||||||||
| self._rag_agent = value | ||||||||||
|
|
||||||||||
| async def index_repository(self) -> str: | ||||||||||
| """Parse and ingest the repository into the Memgraph knowledge graph. | ||||||||||
|
|
||||||||||
|
|
@@ -439,6 +514,43 @@ async def list_directory(self, directory_path: str = ".") -> str: | |||||||||
| logger.error(f"[MCP] Error listing directory: {e}") | ||||||||||
| return f"Error: {str(e)}" | ||||||||||
|
|
||||||||||
| async def ask_agent(self, question: str) -> dict[str, Any]: | ||||||||||
| """Ask a single question about the codebase and get an answer. | ||||||||||
|
|
||||||||||
| This tool executes the question using the RAG agent and returns the response | ||||||||||
| in a structured format suitable for MCP clients. | ||||||||||
|
|
||||||||||
| Logging is suppressed during execution to prevent token waste in LLM context. | ||||||||||
|
|
||||||||||
| Args: | ||||||||||
| question: The question to ask about the codebase | ||||||||||
|
|
||||||||||
| Returns: | ||||||||||
| Dictionary with 'output' key containing the answer | ||||||||||
| """ | ||||||||||
| import io | ||||||||||
| from contextlib import redirect_stderr, redirect_stdout | ||||||||||
|
Comment on lines
+531
to
+532
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Move imports to module level
Suggested change
The two import lines can be removed from here once they are moved to the module-level import section. Context Used: Rule from Agentic Framework
Prompt To Fix With AIThis is a comment left during a code review.
Path: codebase_rag/mcp/tools.py
Line: 531-532
Comment:
**Move imports to module level**
`import io` and `from contextlib import redirect_stderr, redirect_stdout` are placed inside the function body. Per Python convention (and PEP 8), imports belong at the top of the module. Move both to the top-level import block alongside the other stdlib imports.
```suggestion
# Suppress all logging output during agent execution
try:
```
The two import lines can be removed from here once they are moved to the module-level import section.
**Context Used:** Rule from `dashboard` - ## Technical Requirements
### Agentic Framework
- **PydanticAI Only**: This project uses PydanticAI... ([source](https://app.greptile.com/review/custom-context?memory=d4240b05-b763-467a-a6bf-94f73e8b6859))
How can I resolve this? If you propose a fix, please make it concise. |
||||||||||
|
|
||||||||||
| # Suppress all logging output during agent execution | ||||||||||
| try: | ||||||||||
| # Temporarily redirect stdout and stderr to suppress all output | ||||||||||
| with redirect_stdout(io.StringIO()), redirect_stderr(io.StringIO()): | ||||||||||
| # Temporarily disable loguru logging | ||||||||||
| logger.disable("codebase_rag") | ||||||||||
| try: | ||||||||||
| # Run the query using the RAG agent | ||||||||||
| response = await self.rag_agent.run(question, message_history=[]) | ||||||||||
| return {"output": response.output} | ||||||||||
| finally: | ||||||||||
| # Re-enable logging | ||||||||||
| logger.enable("codebase_rag") | ||||||||||
|
Comment on lines
+537
to
+546
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Global logger suppression is not async-safe
The same applies to A per-handler loguru sink approach or structlog's contextvars-based filtering would be safe for async; the global mutable approach used here is not. Prompt To Fix With AIThis is a comment left during a code review.
Path: codebase_rag/mcp/tools.py
Line: 537-546
Comment:
**Global logger suppression is not async-safe**
`logger.disable("codebase_rag")` modifies loguru's global state, which is shared across all coroutines. In a concurrent async server, if two `ask_agent` calls are in-flight simultaneously, coroutine A's `logger.disable(...)` will suppress logging for coroutine B mid-execution, and coroutine B's `logger.enable(...)` in its `finally` block will re-enable logging while coroutine A still expects it to be disabled.
The same applies to `redirect_stdout(io.StringIO())` and `redirect_stderr(io.StringIO())`: these patch the global `sys.stdout`/`sys.stderr`, which are shared across all concurrently running coroutines. Output from unrelated coroutines running while these redirects are active will be silently dropped into the `StringIO` buffer.
A per-handler loguru sink approach or structlog's contextvars-based filtering would be safe for async; the global mutable approach used here is not.
How can I resolve this? If you propose a fix, please make it concise. |
||||||||||
| except Exception: | ||||||||||
| # Fail silently without logging or printing error details | ||||||||||
| return { | ||||||||||
| "output": "There was an error processing your question", | ||||||||||
| "error": True, | ||||||||||
| } | ||||||||||
|
Comment on lines
+547
to
+552
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Errors swallowed silently after logging is re-enabled By the time the outer This makes production debugging extremely difficult. The exception should at minimum be logged at except Exception as e:
logger.error(f"[MCP] ask_agent failed: {e}", exc_info=True)
return {
"output": "There was an error processing your question",
"error": True,
}Prompt To Fix With AIThis is a comment left during a code review.
Path: codebase_rag/mcp/tools.py
Line: 547-552
Comment:
**Errors swallowed silently after logging is re-enabled**
By the time the outer `except Exception:` block is reached, the `finally` clause on line 544 has already re-enabled logging via `logger.enable("codebase_rag")`. The exception is therefore catchable and loggable, but the block discards it entirely — returning only a generic `"There was an error processing your question"` string with no trace of what went wrong.
This makes production debugging extremely difficult. The exception should at minimum be logged at `logger.error(...)` before returning the fallback response:
```python
except Exception as e:
logger.error(f"[MCP] ask_agent failed: {e}", exc_info=True)
return {
"output": "There was an error processing your question",
"error": True,
}
```
How can I resolve this? If you propose a fix, please make it concise.
Comment on lines
+547
to
+552
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While it's important to return a generic error to the client to avoid leaking implementation details or corrupting the JSON-RPC stream, swallowing the exception without logging it on the server side makes debugging failures nearly impossible. The except Exception as e:
# Log the error for debugging, but return a generic message to the client
# Catching a broad Exception here is pragmatic due to the wide and unpredictable
# variety of exceptions that can be raised by external libraries.
logger.error(f"[MCP] Error in ask_agent: {e}", exc_info=True)
return {
"output": "There was an error processing your question",
"error": True,
}References
|
||||||||||
|
|
||||||||||
| def get_tool_schemas(self) -> list[dict[str, Any]]: | ||||||||||
| """Get MCP tool schemas for all registered tools. | ||||||||||
|
|
||||||||||
|
|
||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
ShellCommandertool (initialized here) allows for arbitrary file access. While it checks if the command name is in an allowlist (e.g.,cat,ls), it does not validate or sanitize the arguments. An attacker can provide absolute paths as arguments to read or list files outside the project root (e.g.,execute_shell_command(command="cat /etc/passwd")).