56 changes: 52 additions & 4 deletions codebase_rag/main.py
@@ -159,11 +159,21 @@ def is_edit_operation_response(response_text: str) -> bool:
     return tool_usage or content_indicators or pattern_match


-def _setup_common_initialization(repo_path: str) -> Path:
-    """Common setup logic for both main and optimize functions."""
+def _setup_common_initialization(repo_path: str, question_mode: bool = False) -> Path:
+    """Common setup logic for both main and optimize functions.
+
+    Args:
+        repo_path: Path to the repository
+        question_mode: If True, suppress INFO/DEBUG/WARNING logs (only show errors and direct output)
+    """
Comment on lines +163 to +168
Contributor

medium

According to the project's general rules, docstrings are not allowed. Please replace this multi-line docstring with a regular comment to adhere to the project's coding standards.

    # Common setup logic for interactive, optimization, and single-query modes.
    # When question_mode is True, logs are suppressed and only errors are sent to stderr.
References
  1. Docstrings are not allowed in this project, as enforced by a pre-commit hook.

Comment on lines +162 to +168
Contributor

docstring added to existing function

Custom instructions state "No Comments or Docstrings" - code should be self-documenting. While the codebase has existing docstrings, consider whether this addition is necessary or if the code is clear without it.

     # Logger initialization
     logger.remove()
-    logger.add(sys.stdout, format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {message}")
+    if question_mode:
+        # In question mode, only show ERROR level logs
+        logger.add(sys.stderr, level="ERROR", format="{message}")
+    else:
+        # In interactive mode, show all logs
+        logger.add(sys.stdout, format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {message}")

     # Temporary directory cleanup
     project_root = Path(repo_path).resolve()
@@ -774,6 +784,29 @@ def _validate_provider_config(role: str, config: Any) -> None:
     return rag_agent


+async def main_async_single_query(
+    repo_path: str, batch_size: int, question: str
+) -> None:
+    """Initializes services and runs a single query in non-interactive mode."""
Contributor

medium

The project's general rules state that docstrings are not allowed. To maintain consistency with the project's standards, please replace this docstring with a regular comment.

    # Initializes services and runs a single query in non-interactive mode.
References
  1. Docstrings are not allowed in this project, as enforced by a pre-commit hook.

Comment on lines +787 to +790
Contributor

docstring added to new function

Custom instructions state "No Comments or Docstrings". While the existing codebase has docstrings, consider if this one is needed or if the function name and signature are self-explanatory.

+    project_root = _setup_common_initialization(repo_path, question_mode=True)
+
+    with MemgraphIngestor(
+        host=settings.MEMGRAPH_HOST,
+        port=settings.MEMGRAPH_PORT,
+        batch_size=batch_size,
+    ) as ingestor:
+        rag_agent = _initialize_services_and_agent(repo_path, ingestor)
+
+        # Handle images in the question
+        question_with_context = _handle_chat_images(question, project_root)
Contributor

security-medium medium

The main_async_single_query function passes the user-supplied question to _handle_chat_images, which identifies absolute file paths ending in image extensions and copies them to a temporary directory without any path validation. This could allow an attacker to read arbitrary files on the system if a file can be given an image extension or if the agent can be manipulated into reading it. Consider validating, inside _handle_chat_images, that every identified file path lies within the allowed project directory.
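
A minimal sketch of the containment check recommended here, assuming the project root is available as a `Path` and Python 3.9+ (for `is_relative_to`). The helper name `is_within_project` is hypothetical and not part of the codebase; how `_handle_chat_images` would call it is not shown in this diff.

```python
from pathlib import Path


def is_within_project(candidate: str, project_root: Path) -> bool:
    # Hypothetical helper: True only if `candidate` resolves to a location
    # under `project_root`.
    # resolve() collapses ".." segments and follows symlinks, so a path such as
    # "<root>/../secrets/key.png" is judged by where it actually points.
    resolved = Path(candidate).resolve()
    return resolved.is_relative_to(project_root.resolve())


# Illustrative paths only; they do not need to exist for resolve() to work.
root = Path("/srv/repo")
inside = is_within_project("/srv/repo/docs/diagram.png", root)
escaped = is_within_project("/srv/repo/../secrets/key.png", root)
```

With a check like this, paths that fail validation could be left in the question as plain text rather than copied, though that policy is a design choice for the maintainers.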


+        # Run the query
+        response = await rag_agent.run(question_with_context, message_history=[])
+
+        # Output response to stdout
+        print(response.output)
Comment on lines +804 to +807
Contributor

security-critical critical

The main_async_single_query function executes the RAG agent with user-supplied input but does not implement any confirmation logic for sensitive operations such as file modifications or shell command execution. Since the agent has access to powerful tools (shell_command_tool, file_writer_tool, etc.), an attacker can use prompt injection via the --ask-agent flag to perform unauthorized actions on the system. In non-interactive mode, the application should either block these operations by default or require an explicit override via the --no-confirm flag. Additionally, ensure that the shell_command tool is included in the EDIT_TOOLS list in config.py to be properly detected by is_edit_operation_response.

Suggested change
-        response = await rag_agent.run(question_with_context, message_history=[])
-        # Output response to stdout
-        print(response.output)
+        response = await rag_agent.run(question_with_context, message_history=[])
+        # Check for edit operations in non-interactive mode
+        if confirm_edits_globally and is_edit_operation_response(response.output):
+            console.print("\n[bold red]Error: The agent attempted to perform file modifications or sensitive operations in non-interactive mode. Use --no-confirm to allow these changes.[/bold red]")
+            return
+        # Output response to stdout
+        print(response.output)
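
The guard in the suggestion above can be sketched in isolation as follows. `EDIT_TOOLS` here is a hypothetical stand-in for the list in config.py, and `looks_like_edit` is a deliberately simplified substitute for `is_edit_operation_response`, which in the real code also combines content indicators and pattern matches. The function name `gate_single_query_output` is likewise an assumption made for illustration.

```python
from typing import Optional

# Hypothetical stand-in for EDIT_TOOLS in config.py; the shell tool is listed
# so that shell usage is treated as a sensitive operation too.
EDIT_TOOLS = ("file_writer_tool", "shell_command_tool")


def looks_like_edit(response_text: str) -> bool:
    # Simplified detector: only checks whether an edit-capable tool is named.
    lowered = response_text.lower()
    return any(tool in lowered for tool in EDIT_TOOLS)


def gate_single_query_output(response_text: str, confirm_edits: bool) -> Optional[str]:
    # Returns the text to print, or None when the response must be withheld
    # because confirmation is required and cannot be given non-interactively.
    if confirm_edits and looks_like_edit(response_text):
        return None
    return response_text


blocked = gate_single_query_output("Ran shell_command_tool: rm -rf build", True)
allowed = gate_single_query_output("Ran shell_command_tool: rm -rf build", False)
plain = gate_single_query_output("The parser lives in parser.py.", True)
```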

Contributor

uses print() instead of logger

Custom instructions prefer loguru over print(). However, this is likely intentional for clean stdout output in non-interactive mode (logs go to stderr). This is a standard CLI pattern for scriptable tools.
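
The stdout/stderr split described here can be sketched as below. The standard-library `logging` module is used as a stand-in for loguru so the example has no third-party dependency, and the helper names `build_error_only_logger` and `run_single_query` are hypothetical.

```python
import io
import logging
import sys


def build_error_only_logger(stream) -> logging.Logger:
    # Hypothetical helper: a logger that writes only ERROR and above to `stream`.
    logger = logging.getLogger("single_query_sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.ERROR)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def run_single_query(answer: str, out=sys.stdout, err=sys.stderr) -> None:
    logger = build_error_only_logger(err)
    logger.info("initializing services")  # suppressed: below ERROR
    logger.error("example failure")       # diagnostics go to the error stream
    print(answer, file=out)               # the answer is the only output on `out`


# Capture both streams to show that they stay separate.
out, err = io.StringIO(), io.StringIO()
run_single_query("the answer", out, err)
```

Keeping the answer alone on stdout means a caller can pipe or capture it without filtering log lines.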



 async def main_async(repo_path: str, batch_size: int) -> None:
     """Initializes services and runs the main application loop."""
     project_root = _setup_common_initialization(repo_path)
@@ -840,6 +873,12 @@ def start(
         min=1,
         help="Number of buffered nodes/relationships before flushing to Memgraph",
     ),
+    ask_agent: str | None = typer.Option(
+        None,
+        "-a",
+        "--ask-agent",
+        help="Run a single query and exit (non-interactive mode). Output is sent to stdout.",
+    ),
 ) -> None:
     """Starts the Codebase RAG CLI."""
     global confirm_edits_globally
@@ -892,7 +931,16 @@ def start(
         return

     try:
-        asyncio.run(main_async(target_repo_path, effective_batch_size))
+        if ask_agent:
+            # Non-interactive mode: run single query and exit
+            asyncio.run(
+                main_async_single_query(
+                    target_repo_path, effective_batch_size, ask_agent
+                )
+            )
+        else:
+            # Interactive mode: run chat loop
+            asyncio.run(main_async(target_repo_path, effective_batch_size))
     except KeyboardInterrupt:
         console.print("\n[bold red]Application terminated by user.[/bold red]")
     except ValueError as e: