Commit 81a4010

refactor(api): split api/workspaces.py 1,407 → 142 LOC into services/ (closes #25) (#31)
* refactor(api): split api/workspaces.py 1,407 → 142 LOC into services/ (closes #25)

The eval flagged api/workspaces.py as a 1,400+ LOC monolith co-locating database access, dict transformation, workspace resolution, project assignment heuristics, CLI session handling, and HTTP response formatting in one file (Section 5.2). The get_workspace_tabs handler alone spanned ~450 lines; _determine_project_for_conversation was 120+ lines of cascading fallbacks. Helpers (_read_json_file) were duplicated across api/workspaces.py and api/composers.py.

This refactor extracts the file into focused service modules and leaves three thin route handlers. Behaviour-preserving: same JSON response shapes, same URL routes, same sort orders. All 178 existing tests pass without modification; a small back-compat shim re-exports _infer_invalid_workspace_aliases, _determine_project_for_conversation, and _infer_workspace_name_from_context from api/workspaces so existing test imports continue to work (production callers should import from services/ directly).

Extraction layout (matches the order in #25's spec for minimal merge-conflict risk against parallel work):

- utils/workspace_descriptor.py: pure helpers; one _read_json_file consolidating the duplicate copies from api/workspaces.py and api/composers.py.
- services/workspace_db.py: _open_global_db, _collect_workspace_entries, _build_composer_id_to_workspace_id, _collect_invalid_workspace_ids.
- services/workspace_resolver.py: _determine_project_for_conversation, _infer_invalid_workspace_aliases, _create_project_name_to_workspace_id_map, _create_workspace_path_to_id_map, _get_workspace_display_name, _infer_workspace_name_from_context, _get_project_from_file_path.
- services/cli_tabs.py: _get_cli_workspace_tabs (the CLI-session branch of the tabs route).
- services/workspace_listing.py: list_workspace_projects (the entire body of GET /api/workspaces).
- services/workspace_tabs.py: assemble_workspace_tabs (the entire body of GET /api/workspaces/<id>/tabs for non-CLI workspaces).

api/workspaces.py is now 142 LOC: three route handlers that parse the request, hand off to the service layer, and JSONify the result.

Verified locally:
- 178 unit tests pass without modification.
- mypy clean on the new modules (no new errors on touched files).
- Live smoke on real Cursor workspaceStorage:
  * GET / -> HTTP 200
  * GET /api/workspaces -> HTTP 200, 2 workspaces
  * GET /api/workspaces/global/tabs -> HTTP 200, 5 tabs
  * GET /api/search?q=test -> HTTP 200, 7 results
  No exceptions, no behaviour change observed.

* review: address CodeRabbit findings on PR #31 (5 fixes + tests)

Five CodeRabbit findings on the api/workspaces.py split, addressed together with regression coverage:

1. services/cli_tabs.py: wrap ``messages_to_bubbles(...)`` in try/except. ``traverse_blobs(...)`` was already guarded; one malformed session from this second call would 500 the whole tabs endpoint instead of being skipped like the traverse failures.
   Test: tests/test_cli_tabs.py mocks one session to throw, verifies the endpoint returns 200 with the failing session skipped and the healthy session intact.

2. services/workspace_db.py: build SQLite ``file:`` URIs via ``Path(...).resolve().as_uri()`` instead of f-string interpolation. The naive form breaks on paths containing spaces or other reserved characters. The resolver code already used the safe form; aligning the db helpers here.
   Test: tests/test_workspace_db_special_paths.py creates a workspace inside ``Cursor User/workspaceStorage/`` (path with a space), verifies the mapping and global-DB open both succeed.

3. services/workspace_listing.py + workspace_tabs.py: accept dict-shaped ``projectLayouts`` entries in addition to JSON-string ones.
   The resolver code already handled both shapes; this brings the listing/tabs paths in line so dict layouts no longer leave ``project_layouts_map`` empty (which silently dropped composers into the "global" bucket).
   Test: tests/test_project_layouts_dict_shape.py runs ``list_workspace_projects`` against both shapes, asserts the composer is routed to its workspace in both.

4. services/workspace_resolver.py: ``_get_project_from_file_path`` now uses ``os.path.commonpath`` instead of ``startswith`` so a workspace rooted at ``/repo/app`` doesn't sibling-match ``/repo/app2/...``. ``ValueError`` from ``commonpath`` (e.g. mismatched drive on Windows) is caught as "not within".
   Test: tests/test_project_path_boundary.py covers the sibling-prefix case, the file-outside-any-workspace case, and the regular inside-workspace case.

5. services/workspace_tabs.py: mark synthetic "Tool Action" bubbles with ``synthetic=True`` and skip them in the response-time pass. Synthetic bubbles get a fresh ``datetime.now()`` timestamp; without the flag they were treated as AI responses from the last user message, inflating ``responseTimeMs`` / ``totalResponseTimeMs`` with values unrelated to model latency. The flag stays internal: the tab serializer only copies type/text/timestamp/metadata to the wire payload.

186 tests pass (was 178; +8 new).

* review: harden malformed-record paths across services (#31, round 2)

Five CodeRabbit findings on PR #31, all the same shape: one bad nested record (missing key, wrong type, transient I/O) was bubbling out to the outermost ``except`` and dropping a whole workspace/tab/composer/CLI project instead of just skipping the bad record.

1. services/cli_tabs.py: guard session.get("session_id") before use. Previously session["session_id"] could KeyError and 500 the whole tabs endpoint.

2. services/workspace_db.py: wrap sqlite3.connect(...) in try/except in _open_global_db.
   A corrupt DB or transient I/O failure now yields (None, path), the same shape as the missing-file branch, instead of raising into the caller. All existing callers branch on ``if not conn:`` so they degrade to the no-global-storage path.

3. services/workspace_listing.py: CLI-section loop now uses s.get("session_id") + s.get("meta") or {}; the original s["session_id"] / s["meta"] would KeyError on a malformed session and drop every CLI project. Also added isinstance(h, dict) to the has_bubbles generator for the same reason.

4. services/workspace_tabs.py + services/workspace_resolver.py: 6 isinstance(..., dict) guards on nested-record loops (header, terminalFiles, attachedFoldersListDirResults, files inside folder, cursorRules, summarizedComposers, plus the two header loops in _determine_project_for_conversation). One bad nested record now skips that record instead of AttributeError-ing and dropping the entire composer.

5. services/workspace_tabs.py: synthetic "Tool Action" bubbles no longer get datetime.now() as their timestamp; they now use to_epoch_ms(diff.timestamp) or to_epoch_ms(diff.createdAt) or max(real_bubble_timestamps). The previous timestamp made every synthetic bubble look newer than the entire transcript and forced it to sort to the end.

Regression coverage:
- tests/test_cli_tabs.py: malformed session missing session_id
- tests/test_workspace_db_special_paths.py: sqlite3.connect raises
- tests/test_workspace_listing_cli.py: CLI listing malformed session
- tests/test_workspace_tabs_malformed_nested.py: non-dict header doesn't drop composer; synthetic bubble uses diff timestamp; synthetic bubble falls back to max real-bubble timestamp

192 tests pass (was 186; +6 new).

* review: three more malformed-record + SQLite-error guards (#31, round 3)

- services/workspace_db.py: isinstance(c, dict) guard in _build_composer_id_to_workspace_id.
  A non-dict entry in ``allComposers`` would AttributeError on ``c.get("composerId")`` and silently discard the entire workspace's composer mapping.
  Test: tests/test_workspace_db_special_paths.py: list with None / string / int / valid dict, only the valid one maps.

- services/workspace_resolver.py: wrap the per-composer ``messageRequestContext:*`` LIKE query in _infer_workspace_name_from_context with try/except sqlite3.Error. A corrupt cursorDiskKV table previously raised through and crashed listing routes that call this helper.
  Test: tests/test_workspace_name_db_errors.py: seeds globalStorage with cursorDiskKV missing, asserts helper returns None without propagating.

- services/workspace_tabs.py: validate the parsed tool-call payload. ``_parse_tool_call`` may return None (or any non-dict); without the guard, ``tool_calls = [None]`` becomes a poison pill that crashes the display-text fallback (``NoneType.get('name')``) and drops the whole composer. Only store when isinstance(tool_call, dict); also guard the display-text fallback site.
  Test: tests/test_workspace_tabs_malformed_nested.py: patches _parse_tool_call to return None, asserts the composer still appears in the tabs response.

195 tests pass (was 192; +3 new).

* review: wrap cursorDiskKV reads in _safe_fetchall (#31, round 4)

CodeRabbit flagged that the five ``global_db.execute(...).fetchall()`` calls in assemble_workspace_tabs could raise sqlite3.Error (e.g. a missing or corrupt cursorDiskKV table) and abort the whole tabs endpoint. Matches the resilience pattern already applied to _infer_workspace_name_from_context in round 3.

Add a local ``_safe_fetchall(query, params=()) -> list`` helper inside assemble_workspace_tabs that catches sqlite3.Error and returns []. All five call sites (bubbleId LIKE, codeBlockDiff LIKE, two messageRequestContext LIKEs, and the composerData fetch) route through it. The route now returns ``{"tabs": []}, 200`` instead of 500-ing when the schema is corrupt.
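The `_safe_fetchall` pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from services/workspace_tabs.py: the `make_safe_fetchall` factory and the in-memory database are assumptions made so the snippet runs on its own, whereas the commit describes a local helper closed over the global DB connection.

```python
import sqlite3


def make_safe_fetchall(conn):
    """Hypothetical factory: returns a fetchall wrapper bound to `conn`."""

    def _safe_fetchall(query, params=()):
        # A missing or corrupt table raises sqlite3.OperationalError,
        # which is a subclass of sqlite3.Error; degrade to "no rows".
        try:
            return conn.execute(query, params).fetchall()
        except sqlite3.Error:
            return []

    return _safe_fetchall


conn = sqlite3.connect(":memory:")
safe_fetchall = make_safe_fetchall(conn)
query = "SELECT key, value FROM cursorDiskKV WHERE key LIKE ?"

# The table does not exist yet, so the raw call would raise.
rows_missing = safe_fetchall(query, ("bubbleId:%",))

conn.execute("CREATE TABLE cursorDiskKV (key TEXT, value TEXT)")
conn.execute("INSERT INTO cursorDiskKV VALUES (?, ?)", ("bubbleId:abc:1", "{}"))
rows_present = safe_fetchall(query, ("bubbleId:%",))
conn.close()
```

Catching only `sqlite3.Error` keeps unrelated bugs (a `TypeError` in the caller, say) visible, which is the same narrowing argument made later in this commit message.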
Regression: tests/test_workspace_tabs_sql_errors.py seeds a real globalStorage/state.vscdb without cursorDiskKV → asserts the response is ({"tabs": []}, 200) and no sqlite3.Error propagates.

196 tests pass (was 195; +1 new).

* review: narrow excepts + drop duplicate queries / synthetic bubbles (#31, round 5)

Eight findings from Brad's review, addressed together. Recurring shape: broad ``except Exception`` blocks were swallowing not just the realistic failure (KeyError on missing dict field, sqlite3.Error on corrupt table, OSError on missing file) but also bugs we'd want to know about. Each catch is now narrowed to the actual failure mode the surrounding code intends to handle, with regression coverage to pin the contract.

1. services/workspace_db.py:_build_composer_id_to_workspace_id: narrow sqlite3.Error catch on the per-workspace state.vscdb open, json.JSONDecodeError / ValueError catch on the row parse. Round-3 work; one site Brad flagged in the same pass.
   Test: tests/test_workspace_db_special_paths.py::TestBuildComposerMappingCorruptDb.

2. services/workspace_resolver.py:_infer_workspace_name_from_context: same narrowing on the local-DB query (lconn.execute fetchone).
   Test: tests/test_workspace_name_db_errors.py::TestLocalQueryErrorSwallowed.

3. services/workspace_listing.py:list_workspace_projects: local _safe_fetchall(query, params=()) helper routes the three cursorDiskKV LIKE queries (composerData, messageRequestContext, bubbleId) through sqlite3.Error → []. Same pattern as workspace_tabs.py round 4.
   Test: tests/test_workspace_listing_sql_errors.py.

4. api/workspaces.py:get_workspace CLI branch: replaced cp["workspace_name"] / cp["last_updated_ms"] / cp["workspace_path"] with .get(); added isinstance(cp, dict) skip to the lookup generator. Brad's "unsafe dict access in CLI branch" finding.
   Test: tests/test_get_workspace_cli_malformed.py (two cases).

5. services/workspace_listing.py CLI section: the projects loop now guards cp non-dict, missing project_id, non-list sessions, and non-dict session entries. Brad's "unsafe dict access in CLI project loop" finding.
   Test: tests/test_workspace_listing_cli.py::TestMalformedCliProjectRecordSkipped.

6. services/cli_tabs.py:_get_cli_workspace_tabs: lookup generator filters isinstance(cp, dict); project field access switched to .get(); session iteration uses isinstance(session, dict). Brad's "unsafe dict access in project lookup" finding on lines 17/22/25.
   Test: tests/test_cli_tabs.py::TestMalformedCliProjectsListLookup.

7. services/workspace_tabs.py: diffs no longer double-represented. Previously each diff was emitted both as ``tab.codeBlockDiffs`` (which the frontend at dashboard/static/js/download.js:128,274 and templates/workspace.html:135 actually reads) AND as a synthetic ``**Tool Action:**`` AI bubble in ``tab.bubbles`` (no consumer). Dropped the synthesis loop + the now-unused ``synthetic`` flag and its filter in the response-time pass. Brad's "diffs double-represented" finding.
   Test: tests/test_workspace_tabs_malformed_nested.py::TestDiffsEmittedOnlyAsCodeBlockDiffs replaces the prior synthetic-timestamp tests.

8. services/workspace_tabs.py: merged the two ``messageRequestContext:%`` LIKE queries (built message_request_context_map and project_layouts_map respectively) into a single pass that fills both maps. Brad's "messageRequestContext queried twice" finding.

9. services/workspace_db.py: narrowed the two remaining broad excepts: _collect_workspace_entries → OSError; _collect_invalid_workspace_ids → (OSError, ValueError, KeyError, TypeError). Brad's "broad silent except Exception: pass" finding.

207 tests pass (was 196; +11 across this round).
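The record-level guards listed for the CLI project loop (items 5 and 6) might look roughly like the sketch below. The function name `iter_valid_sessions` and the sample data are hypothetical, chosen only to illustrate the guard order; the real loops in services/workspace_listing.py and services/cli_tabs.py are not shown in this section and do more work per session.

```python
from typing import Any, Iterator, Tuple


def iter_valid_sessions(cli_projects: Any) -> Iterator[Tuple[str, dict]]:
    """Yield (project_id, session) pairs, skipping malformed records."""
    if not isinstance(cli_projects, list):
        return
    for cp in cli_projects:
        # Guard 1: the project record itself must be a dict.
        if not isinstance(cp, dict):
            continue
        # Guard 2: a project without an id cannot be addressed.
        project_id = cp.get("project_id")
        if not project_id:
            continue
        # Guard 3: sessions must be a list before iterating.
        sessions = cp.get("sessions")
        if not isinstance(sessions, list):
            continue
        for session in sessions:
            # Guard 4: each session must be a dict with a session_id.
            if isinstance(session, dict) and session.get("session_id"):
                yield project_id, session


sample_projects = [
    None,
    "not-a-project",
    {"sessions": []},
    {"project_id": "p1", "sessions": "bad"},
    {"project_id": "p2", "sessions": [None, {"meta": {}}, {"session_id": "s1"}]},
]
valid_pairs = list(iter_valid_sessions(sample_projects))
```

Each guard skips only the offending record, so one malformed entry no longer removes its siblings from the response.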
The recurring heuristic across all four rounds on this PR: every nested iteration site needs ``isinstance(..., dict)``, every DB read needs ``sqlite3.Error`` catch, and broad excepts swallow the bugs we want to know about.

* fix(types): close mypy gate on services/ after post-#35 strict enforcement

Two narrow fixes that resolve all 27 mypy errors exposed by master's stricter typecheck gate (newly enforced when #35 removed continue-on-error):

* services/workspace_tabs.py:195: annotate `bubbles: list[dict[str, Any]] = []` so the iteration variables `b` / `m` get treated as `dict[str, Any]` rather than `dict[str, object]`. That single annotation kills 26 of the 27 errors (all the "object has no attribute X" / "Value of type object is not indexable" cluster around the metadata-aggregation loop).

* services/workspace_resolver.py:84: `# type: ignore[call-overload]` on the `row["value"]` access plus a comment pointing at the underlying cause. sqlite3.Row supports string-key access at runtime when `row_factory = sqlite3.Row` is set (which `_open_global_db` does), but mypy's stdlib stub types Row as `tuple[Any, ...]` which only accepts SupportsIndex.

Verified: mypy → Success, no issues found in 52 source files; ruff → All checks passed; unittest → 207 / 207 OK.
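The runtime behaviour that the `# type: ignore` relies on can be checked in isolation. The snippet below is a small sketch with a made-up in-memory table; it only demonstrates that `sqlite3.Row` accepts both column names and integer positions once `row_factory` is set, and says nothing about what a given mypy version reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # same setting the commit says _open_global_db applies
conn.execute("CREATE TABLE ItemTable (key TEXT, value TEXT)")
conn.execute("INSERT INTO ItemTable VALUES (?, ?)", ("some.key", "payload"))

row = conn.execute("SELECT key, value FROM ItemTable").fetchone()
by_name = row["value"]   # string-key access, supported by sqlite3.Row
by_index = row[1]        # positional access to the same column
column_names = row.keys()
conn.close()
```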
1 parent 849f696 commit 81a4010

18 files changed

Lines changed: 2502 additions & 1328 deletions

api/workspaces.py

Lines changed: 43 additions & 1328 deletions
Large diffs are not rendered by default.

services/__init__.py

Whitespace-only changes.

services/cli_tabs.py

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+from __future__ import annotations
+
+from datetime import datetime
+
+from flask import current_app, jsonify
+
+from utils.cli_chat_reader import list_cli_projects, messages_to_bubbles, traverse_blobs
+from utils.exclusion_rules import build_searchable_text, is_excluded_by_rules
+from utils.workspace_path import get_cli_chats_path
+
+
+def _get_cli_workspace_tabs(workspace_id: str):
+    """Return tabs for a Cursor CLI project (workspace_id starts with "cli:")."""
+    try:
+        project_id = workspace_id[4:]
+        cli_projects = list_cli_projects(get_cli_chats_path())
+        project = next(
+            (
+                cp for cp in cli_projects
+                if isinstance(cp, dict) and cp.get("project_id") == project_id
+            ),
+            None,
+        )
+        if project is None:
+            return jsonify({"error": "CLI project not found"}), 404
+
+        rules = current_app.config.get("EXCLUSION_RULES") or []
+        ws_name = project.get("workspace_name") or project_id[:12]
+        sessions = project.get("sessions") or []
+        if not isinstance(sessions, list):
+            sessions = []
+        tabs = []
+
+        for session in sessions:
+            if not isinstance(session, dict):
+                continue
+            session_id = session.get("session_id")
+            if not session_id:
+                continue
+            meta = session.get("meta") or {}
+            created_ms: int = meta.get("createdAt") or int(datetime.now().timestamp() * 1000)
+            session_name = meta.get("name") or f"Session {session_id[:8]}"
+
+            try:
+                messages = traverse_blobs(session["db_path"])
+            except Exception as e:
+                print(f"CLI: could not read session {session_id}: {e}")
+                continue
+
+            try:
+                bubbles = messages_to_bubbles(messages, created_ms)
+            except Exception as e:
+                print(f"CLI: could not convert session {session_id} to bubbles: {e}")
+                continue
+            if not bubbles:
+                continue
+
+            # Derive title from first user bubble when name is generic
+            title = session_name
+            if not title or title.startswith("New Agent"):
+                for b in bubbles:
+                    if b["type"] == "user" and b.get("text"):
+                        first_lines = [ln for ln in b["text"].split("\n") if ln.strip()]
+                        if first_lines:
+                            title = first_lines[0][:100]
+                            if len(title) == 100:
+                                title += "..."
+                        break
+
+            searchable = build_searchable_text(project_name=ws_name, chat_title=title)
+            if is_excluded_by_rules(rules, searchable):
+                continue
+
+            # Aggregate metadata
+            total_tool_calls = 0
+            tool_breakdown: dict = {}
+            for b in bubbles:
+                tcs = (b.get("metadata") or {}).get("toolCalls") or []
+                total_tool_calls += len(tcs)
+                for tc in tcs:
+                    tn = tc.get("name", "unknown")
+                    tool_breakdown[tn] = tool_breakdown.get(tn, 0) + 1
+
+            tab_meta: dict | None = None
+            if total_tool_calls or tool_breakdown:
+                tab_meta = {"totalToolCalls": total_tool_calls or None}
+                if tool_breakdown:
+                    tab_meta["toolBreakdown"] = tool_breakdown
+
+            tab = {
+                "id": session_id,
+                "title": title,
+                "timestamp": created_ms,
+                "bubbles": [
+                    {
+                        "type": b["type"],
+                        "text": b.get("text", ""),
+                        "timestamp": b.get("timestamp", created_ms),
+                        **({"metadata": b["metadata"]} if b.get("metadata") else {}),
+                    }
+                    for b in bubbles
+                ],
+                "source": "cli",
+            }
+            if tab_meta:
+                tab_meta_clean = {k: v for k, v in tab_meta.items() if v is not None}
+                if tab_meta_clean:
+                    tab["metadata"] = tab_meta_clean
+
+            tabs.append(tab)
+
+        tabs.sort(key=lambda t: t.get("timestamp") or 0, reverse=True)
+        return jsonify({"tabs": tabs})
+
+    except Exception as e:
+        print(f"Failed to get CLI workspace tabs: {e}")
+        return jsonify({"error": "Failed to get CLI workspace tabs"}), 500
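The per-session isolation used in services/cli_tabs.py above (one failing session is logged and skipped, the rest are still returned) can be exercised without Flask or a real CLI chat store. The sketch below is a simplified stand-in: `collect_tabs` and `fake_to_bubbles` are hypothetical names, and the converter is injected so a failure can be simulated, loosely mirroring what the commit message says tests/test_cli_tabs.py does with a mock.

```python
from typing import Any, Callable, List


def collect_tabs(sessions: List[Any], to_bubbles: Callable[[dict], List[dict]]) -> List[dict]:
    """Build one tab per healthy session; skip malformed or failing ones."""
    tabs: List[dict] = []
    for session in sessions:
        if not isinstance(session, dict):
            continue
        session_id = session.get("session_id")
        if not session_id:
            continue
        try:
            bubbles = to_bubbles(session)
        except Exception as exc:  # broad on purpose, as in the per-session guards above
            print(f"skipping session {session_id}: {exc}")
            continue
        if bubbles:
            tabs.append({"id": session_id, "bubbles": bubbles})
    return tabs


def fake_to_bubbles(session: dict) -> List[dict]:
    """Stand-in converter that fails for sessions flagged as broken."""
    if session.get("broken"):
        raise ValueError("malformed blob")
    return [{"type": "user", "text": "hello"}]


sessions = [
    {"session_id": "good"},
    {"session_id": "bad", "broken": True},
    {"meta": {}},   # missing session_id
    None,           # not a dict at all
]
tabs = collect_tabs(sessions, fake_to_bubbles)
```

Only the healthy session produces a tab; the other three records are dropped individually rather than failing the whole call.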

services/workspace_db.py

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+from __future__ import annotations
+
+import json
+import os
+import sqlite3
+from contextlib import closing, contextmanager
+from pathlib import Path
+
+from utils.path_helpers import get_workspace_folder_paths
+from utils.workspace_descriptor import _read_json_file
+
+
+def _collect_workspace_entries(workspace_path: str) -> list[dict]:
+    """Scan workspace directory and return entries with workspace.json."""
+    entries = []
+    try:
+        for name in os.listdir(workspace_path):
+            full = os.path.join(workspace_path, name)
+            if os.path.isdir(full):
+                wj = os.path.join(full, "workspace.json")
+                if os.path.isfile(wj):
+                    entries.append({"name": name, "workspaceJsonPath": wj})
+    except OSError:
+        # workspace_path missing / not readable / not a directory -- return what
+        # we have so far. OSError covers FileNotFoundError, PermissionError,
+        # and NotADirectoryError.
+        pass
+    return entries
+
+
+def _collect_invalid_workspace_ids(workspace_entries: list[dict]) -> set[str]:
+    """Workspace IDs whose descriptors have no resolvable folder paths."""
+    invalid: set[str] = set()
+    for entry in workspace_entries:
+        try:
+            wd = _read_json_file(entry["workspaceJsonPath"])
+            folders = get_workspace_folder_paths(wd)
+            if not folders:
+                invalid.add(entry["name"])
+        except (OSError, ValueError, KeyError, TypeError):
+            # OSError: workspace.json unreadable. ValueError covers
+            # json.JSONDecodeError. KeyError / TypeError: malformed entry
+            # dict. Any of these mean we can't resolve folders → mark invalid,
+            # matching the pre-narrowing behaviour.
+            invalid.add(entry["name"])
+    return invalid
+
+
+def _build_composer_id_to_workspace_id(workspace_path: str, workspace_entries: list) -> dict:
+    """Build mapping: composerId -> workspaceId from per-workspace state.vscdb."""
+    mapping: dict = {}
+    for entry in workspace_entries:
+        db_path = os.path.join(workspace_path, entry["name"], "state.vscdb")
+        if not os.path.isfile(db_path):
+            continue
+        # closing() guarantees .close() on scope exit (issue #17).
+        # Path.as_uri() percent-encodes reserved chars; ``f"file:{path}"``
+        # breaks sqlite URI parsing on paths with spaces, ``#``, etc.
+        db_uri = Path(db_path).resolve().as_uri() + "?mode=ro"
+        row: tuple | None = None
+        try:
+            with closing(sqlite3.connect(db_uri, uri=True)) as conn:
+                row = conn.execute(
+                    "SELECT value FROM ItemTable WHERE [key] = 'composer.composerData'"
+                ).fetchone()
+        except sqlite3.Error:
+            continue
+        if not (row and row[0]):
+            continue
+        try:
+            data = json.loads(row[0])
+        except (json.JSONDecodeError, ValueError):
+            continue
+        all_composers = data.get("allComposers") if isinstance(data, dict) else None
+        if not isinstance(all_composers, list):
+            continue
+        for c in all_composers:
+            if not isinstance(c, dict):
+                continue
+            cid = c.get("composerId")
+            if cid:
+                mapping[cid] = entry["name"]
+    return mapping
+
+
+@contextmanager
+def _open_global_db(workspace_path: str):
+    """Yield (conn, path) for the global-storage SQLite db (read-only); (None, path) if the file is missing."""
+    global_db_path = os.path.join(workspace_path, "..", "globalStorage", "state.vscdb")
+    global_db_path = os.path.normpath(global_db_path)
+    if not os.path.isfile(global_db_path):
+        yield None, global_db_path
+        return
+    db_uri = Path(global_db_path).resolve().as_uri() + "?mode=ro"
+    try:
+        conn = sqlite3.connect(db_uri, uri=True)
+    except sqlite3.Error:
+        yield None, global_db_path
+        return
+    conn.row_factory = sqlite3.Row
+    try:
+        yield conn, global_db_path
+    finally:
+        conn.close()
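The read-only URI construction used by both helpers in services/workspace_db.py can be tried against a throwaway database. The sketch below is illustrative only: the temporary directory, the `Cursor User` and `workspace#1` folder names, and the table contents are assumptions picked to put a space and a `#` into the path, which are the characters the code comment says break a naive `f"file:{path}"` URI.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout with a space and a '#' in the path.
    db_dir = Path(tmp) / "Cursor User" / "workspace#1"
    db_dir.mkdir(parents=True)
    db_path = db_dir / "state.vscdb"

    # Seed the database through an ordinary read-write connection.
    with closing(sqlite3.connect(db_path)) as rw:
        rw.execute("CREATE TABLE ItemTable (key TEXT, value TEXT)")
        rw.execute("INSERT INTO ItemTable VALUES ('k', 'v')")
        rw.commit()

    # as_uri() percent-encodes the reserved characters; mode=ro opens read-only.
    db_uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(db_uri, uri=True)) as ro:
        value = ro.execute("SELECT value FROM ItemTable WHERE key = 'k'").fetchone()[0]
        try:
            ro.execute("INSERT INTO ItemTable VALUES ('k2', 'v2')")
            write_rejected = False
        except sqlite3.OperationalError:
            # SQLite refuses writes on a connection opened with mode=ro.
            write_rejected = True
```

Wrapping each connection in `closing()` matters here for the same reason the diff notes: `sqlite3.Connection` used directly as a context manager only manages the transaction and does not close the connection.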
