Version: codebase-memory-mcp v0.8.1 (Windows, compiled .exe)
Project: real-world repo, 93,315 nodes, 996 Route nodes
Summary
Route extraction classifies many non-route strings as routes. On our repo, 46 of 996 routes are clearly junk, and the empty-source bucket mixes real HTTP endpoints with regex/path string literals.
Repro
Run search_graph(label="Route"), then group the results by source:
| source | count | what they actually are |
|---|---|---|
| decorator | 41 | ✅ real Flask/Django route decorators (correct) |
| "" (empty) | 909 | real API paths from frontend HTTP calls + regex/path literals |
| graphql | 34 | ❌ entire GraphQL query string literals (query { categoryList {...} }) turned into one Route node each |
| infra | 12 | ❌ the entire text of .codex/agents/*.toml files turned into a single Route node |
Empty-source junk samples (method=ANY, degree 0):
/.{2}/g, /<table/i (JS regex literals), /_VBA_PROJECT_CUR/VBA/dir, /::/, /.well-known/openid-configuration.
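For anyone who wants to reproduce the grouping, here is a minimal sketch in Python. It assumes the search_graph results have already been loaded as a list of dicts with a `source` key. That shape, the `path` key, and the `count_by_source` helper are assumptions for illustration, not the tool's documented output format.

```python
from collections import Counter

def count_by_source(routes):
    """Count Route nodes per source value (hypothetical helper)."""
    # Treat a missing or None source the same as the empty string,
    # so everything without a source lands in one bucket.
    return Counter((route.get("source") or "") for route in routes)

# Illustrative sample only; the dict layout is an assumption.
sample = [
    {"path": "/api/items", "source": "decorator"},
    {"path": "/<table/i", "source": ""},
    {"path": "/::/", "source": None},
    {"path": "query { categoryList {...} }", "source": "graphql"},
]

counts = count_by_source(sample)
for source, n in counts.most_common():
    print(repr(source), n)
```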
Minimal example
// a JS regex literal — NOT a route
const re = /<table/i;
# a query string passed to a client — NOT a route
query { products(search: "x") { items { sku } } }
Both currently produce Route nodes.
Expected
- Route extraction should require an actual route declaration (decorator/router registration/framework binding), not "string contains slashes" or "file looks like config".
- GraphQL query literals should not be Route (a GraphQLOperation label would be fine).
- Config/agent files (.codex/*.toml, *.md) should not have their whole contents emitted as a single Route.
- At minimum: don't classify regex literals (/…/[gimsuy]) as routes.
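As a rough illustration of the "at minimum" item, a regex-literal check could look something like the sketch below. This is a heuristic of my own under stated assumptions, not the extractor's actual logic: it only flags strings of the form /body/flags where the flags come from [gimsuy] and the body contains a character that is common in regexes but unusual in URL paths. The function name and the character set are hypothetical.

```python
import re

# "/body/flags" with at least one JS regex flag from the set named in this issue.
_REGEX_LITERAL = re.compile(r"^/(?P<body>.+)/(?P<flags>[gimsuy]+)$")

# Characters that are common in regex bodies but unusual in URL paths (assumed set).
_REGEX_ONLY_CHARS = set("<>{}[]()|^$*+?\\")

def looks_like_regex_literal(text):
    """Return True if text looks like a JS regex literal rather than a route path."""
    match = _REGEX_LITERAL.match(text)
    if match is None:
        return False
    # Require a regex-style character in the body, so a plain path whose last
    # segment happens to be a flag letter (e.g. "/api/g") is not flagged.
    return any(ch in _REGEX_ONLY_CHARS for ch in match.group("body"))

for candidate in ["/.{2}/g", "/<table/i", "/.well-known/openid-configuration", "/::/"]:
    print(candidate, looks_like_regex_literal(candidate))
```

This deliberately errs on the side of keeping routes: flagless literals such as /::/ are not caught, and a templated path like /users/{id}/i would be misflagged, so it is a stopgap rather than a substitute for requiring a real route declaration.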
Workaround (consumer side)
Filter with WHERE r.source='decorator' OR (r.method IS NOT NULL AND r.method <> 'ANY'). This cuts the result from 996 routes to 347 and removes the junk. But it also loses real decorator-less routes inconsistently; a clean extractor is preferable.
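The same filter can be applied client-side once the Route nodes are in memory. The sketch below assumes each node is a dict with `source` and `method` keys; that layout, the `keep_route` helper, and the sample paths are made up for illustration.

```python
def keep_route(route):
    """Mirror of: source='decorator' OR (method IS NOT NULL AND method <> 'ANY')."""
    source = route.get("source")
    method = route.get("method")
    return source == "decorator" or (method is not None and method != "ANY")

# Illustrative nodes only; paths and dict layout are assumptions.
routes = [
    {"path": "/login", "source": "decorator", "method": "ANY"},
    {"path": "/api/items", "source": "", "method": "GET"},
    {"path": "/<table/i", "source": "", "method": "ANY"},
    {"path": "/internal/health", "source": "", "method": None},
]

kept = [r["path"] for r in routes if keep_route(r)]
print(kept)
```

In this sample the last entry stands in for a real route with no decorator and no method recorded, which shows how the filter also drops legitimate routes.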
Related
Distinct from the exclude-config issues (#500/#510) — this is mis-classification of content, not indexing of build artifacts.