Skip to content

Three Windows-only bugs in v2.3.1 (CRLF Databricks detection, UTF-8 gitignore, stale FastMCP test API) #239

@azizur100389

Description

@azizur100389

Summary

While investigating v2.3.1 test output on Windows, I found three distinct bugs that all reproduce on origin/main with no modifications. They are not flakiness — each is a deterministic cross-platform bug with a clear root cause.

All three are currently masked by the 8 failing tests on Windows, which are easy to dismiss as "platform noise" but actually reveal real product bugs.


Bug 1 — get_data_dir() writes non-UTF-8 .gitignore on Windows

File: code_review_graph/incremental.py:128-138 (in get_data_dir())

inner_gitignore.write_text(
    "# Auto-generated by code-review-graph — do not commit database files.\n"
    "# The graph.db contains absolute paths and code structure metadata.\n"
    "*\n"
)

write_text() is called without an encoding= argument. The string literal contains an em-dash (, U+2014). On Windows, Path.write_text uses the system default codepage (cp1252 in most locales), which encodes U+2014 as byte 0x97. When anything later reads this file as UTF-8 — including get_data_dir's own test test_default_uses_repo_subdir at tests/test_incremental.py:194 — it raises:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 38: invalid start byte

Reproducible on: Windows 11 / Python 3.10.10 / v2.3.1, clean checkout of origin/main. No modifications required.

Impact: Any Windows user running code-review-graph build generates a .gitignore that cannot be parsed by standard UTF-8 tooling (including the tool's own tests and any editor set to UTF-8 strict mode). Unix systems are unaffected because they default to UTF-8.

Fix: Add encoding="utf-8" to the write_text() call. (Same fix that get_data_dir's caller _ensure_repo_gitignore at line 170 already applies — it uses encoding="utf-8", but this sibling function was missed.)


Bug 2 — Databricks notebook auto-detection fails on CRLF line endings

File: code_review_graph/parser.py:391-394 (in parse_bytes())

if language == "python" and source.startswith(
    b"# Databricks notebook source\n",
):
    return self._parse_databricks_py_notebook(path, source)

The check hard-codes \n. On Windows, git checkouts default to core.autocrlf=true, which converts the Databricks fixture's line endings to \r\n on checkout. The file on disk starts with b"# Databricks notebook source\r\n", so source.startswith(b"# Databricks notebook source\n") returns False, the file is parsed as regular Python, and all Databricks-specific handling is bypassed:

  • notebook_format metadata is never set on the File node
  • # MAGIC %sql cells are not parsed as SQL (no IMPORTS_FROM edges for table references)
  • # MAGIC %md cells are not skipped
  • Per-cell cell_index metadata is missing from function nodes

Reproducible on: Windows 11 / v2.3.1, clean checkout, confirmed with a single python -c statement reading tests/fixtures/sample_databricks_export.py — the file starts with b'# Databricks notebook source\r\n'.

Impact: Every one of the 4 TestDatabricksPyNotebook tests fails on Windows:

  • test_detects_databricks_header — no notebook_format
  • test_extracts_sql_tables — missing bronze.events, silver.users, etc.
  • test_skips_magic_md_cells — extra function from MD cells that should be skipped
  • test_cell_index_trackingcell_index metadata missing

Fix: Parse the first line robustly, stripping any trailing \r:

first_line_end = source.find(b"\n")
first_line = source[:first_line_end].rstrip(b"\r") if first_line_end != -1 else source
if language == "python" and first_line == b"# Databricks notebook source":
    return self._parse_databricks_py_notebook(path, source)

This matches both \n and \r\n endings.


Bug 3 — Stale FastMCP API in async regression guard test

File: tests/test_main.py:68-71 (in TestLongRunningToolsAreAsync::test_heavy_tools_are_coroutines)

async def test_heavy_tools_are_coroutines(self):
    tools = await crg_main.mcp.get_tools()

FastMCP does not expose a get_tools() method. The available API in fastmcp>=2.14.0 (the version pinned in pyproject.toml) is list_tools(). The test dies at AttributeError: 'FastMCP' object has no attribute 'get_tools'. Did you mean: 'list_tools'?.

This is not a cosmetic issue — this test is the regression guard for #46/#136 (v2.3.1 Windows async fix). The PR description for #231 explicitly says:

There are regression tests (test_heavy_tools_are_coroutines + test_heavy_tool_source_uses_to_thread) that will fail at CI collection time if anyone converts one of the 5 tools back to sync in a future refactor, so we shouldn't hit this class of bug again.

But the test has been failing at runtime on every platform since it was written, which means the regression guard is not actually guarding anything. A future refactor that converts build_or_update_graph_tool back to sync would not be caught by CI because this test is already red.

Reproducible on: every platform with the pinned fastmcp>=2.14.0. Not Windows-specific.

Fix: Replace get_tools() with list_tools(). The rest of the test logic (verifying asyncio.iscoroutinefunction(tool.fn) for the heavy 5 tools) is correct.


Follow-up PR

A single PR will fix all three bugs in one focused commit. Each fix is small, each has a direct regression test, and each is verifiable on unchanged origin/main before and after.

Note: there are 2 additional failing tests on Windows (test_returns_none_without_git, test_falls_back_to_start in TestFindRepoRoot / TestFindProjectRoot) that are true test environmental flakiness — they assume no ancestor of tmp_path has a .git directory, which is false on any Windows user whose home directory contains a git repo (e.g. dotfiles). Those are intentionally not addressed here because the fix requires either a product API change (stop_at parameter) or invasive monkeypatching — worth a separate discussion.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions