How to read, understand, and query the knowledge graph. This document explains what each node means, what its properties tell you, and how everything connects.
The graph represents a fully-analyzed code repository. When you open Neo4j Browser and look at the data, you'll see a hierarchy:
```
Knowledge (the repo)
├── RepoSummary (bird's-eye view of the whole repo)
│   └── LevelSummary (one per depth level)
│       ├── LevelBatch (chunks of files/folders at this level)
│       └── FolderNode (directories at this level)
├── FolderNode (root folders)
│   ├── FolderNode (subfolders)
│   └── FileNode (source files)
├── FileNode (flat access to all files)
└── OrgKeyword (semantic tags linking to files)
```
**Knowledge** — the root node of a repository. Everything in the graph connects back to this.

| Property | Meaning |
|---|---|
| `knowledge_id` | UUID — the primary identifier for this repo's graph |
| `org_id` | Organization that owns this repo |

How to find it:

```cypher
MATCH (k:Knowledge {knowledge_id: "..."}) RETURN k
```
**FileNode** — represents a single source file that was analyzed. This is the most detailed node — it contains structural analysis (classes, functions, imports), semantic analysis (domain concepts, business entities), and AI-generated summaries.

| Property | Type | What it tells you |
|---|---|---|
| `node_id` | string | Same as `relative_path` — unique within a `knowledge_id` |
| `relative_path` | string | Path from repo root, e.g. `"src/services/auth/login.ts"` |
| `name` | string | Just the filename, e.g. `"login.ts"` (auto-extracted from path) |
| `knowledge_id` | string | Which Knowledge this file belongs to |
| `org_id` | string | Organization identifier |
| `repo_name` | string | Repository name |
| `branch_name` | string | Git branch that was analyzed |
| `language` | string | Programming language: `"typescript"`, `"python"`, `"go"`, etc. |
| `purpose` | string | One-line AI summary of what the file does |
| `summary` | string | Detailed multi-section summary (JSON stringified if structured) |
| `section_map` | string | Pipe-separated table of contents: `"L1-25: Imports \| L26-80: AuthService"` |
| `is_big_file` | string | `"isABigFile"` or `"notABigFile"` — whether the file was too large and had to be split into chunks for analysis |
| `classes` | string[] | Class names defined in this file, e.g. `["AuthService", "SessionManager"]` |
| `functions` | string[] | Function names defined in this file, e.g. `["login", "logout"]` |
| `imports_internal` | string[] | Imports from within the repo: `["./config", "../utils/crypto"]` |
| `imports_external` | string[] | Third-party imports: `["express", "jsonwebtoken"]` |
| `contracts_provided` | string[] | APIs/interfaces this file exposes to other files |
| `contracts_consumed` | string[] | APIs/interfaces this file depends on from other files |
| `ontology_concepts` | string[] | Domain concepts: `["authentication", "session_management"]` |
| `business_entities` | string[] | Business objects: `["user", "session", "token"]` |
| `system_capabilities` | string[] | What the system can do: `["user_login", "token_validate"]` |
| `keywords` | string[] | General tags: `["auth", "jwt", "middleware"]` |
| `side_effects` | string[] | Runtime effects: `["writes_session_store", "sends_auth_event"]` |
| `config_dependencies` | string[] | Config the file needs: `["JWT_SECRET", "SESSION_TTL"]` |
| `data_flow_direction` | string | How data moves through this file: "Receives credentials, produces JWT tokens" |

Unique key: `(node_id, knowledge_id)`
**FolderNode** — represents a directory in the repository. Contains aggregated metadata from all files inside it.

| Property | Type | What it tells you |
|---|---|---|
| `node_id` | string | Unique ID: `"{orgId}_{repoName}_L{level}_{path_underscored}"` |
| `relative_path` | string | Directory path from repo root: `"src/services/auth"` |
| `name` | string | Display name: `"L2_folder_auth"` (level prefix + folder name) |
| `knowledge_id` | string | Which Knowledge this belongs to |
| `org_id` | string | Organization identifier |
| `repo_name` | string | Repository name |
| `commit_hash` | string | Git commit hash |
| `level` | number | Depth in hierarchy (bottom-up: 1 = deepest leaf folders, max = root) |
| `purpose` | string | AI-generated summary of what this folder is for |
| `summary` | string | Detailed summary (JSON stringified) |
| `classes` | string[] | All classes found in files under this folder |
| `functions` | string[] | All functions found in files under this folder |
| `imports_internal` | string[] | All internal imports from files under this folder |
| `imports_external` | string[] | All external imports from files under this folder |
| `dependency_graph` | string | Mermaid diagram showing how children depend on each other |
| `subFileIds` | string[] | Relative paths of all files in this folder |
| `subFolderIds` | string[] | Relative paths of all subfolders |

Unique key: `(node_id, knowledge_id)`
Level numbering is bottom-up:

- Level 1 = deepest leaf folders (like `src/services/auth/helpers/`)
- Level 2 = one level above that
- Max level = root-level folders (like `src/`, `config/`, `docs/`)
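The bottom-up numbering can be sketched in a few lines of plain Python (an illustration of the scheme, not the actual ingestion code — `assign_levels` is a hypothetical helper):

```python
def assign_levels(folder_paths):
    """Assign bottom-up levels: the deepest folders get 1, root folders get the max."""
    depths = {p: p.count("/") + 1 for p in folder_paths}  # top-down depth from repo root
    max_depth = max(depths.values())
    # invert: bottom-up level = max_depth - depth + 1
    return {p: max_depth - d + 1 for p, d in depths.items()}

levels = assign_levels(["src", "src/services", "src/services/auth", "config"])
# deepest folder "src/services/auth" -> level 1; root folders "src" and "config" -> level 3
```

Note how two root folders at the same top-down depth ("src", "config") share the maximum level, matching the `[:HAS_ROOT_FOLDER]` convention.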
**RepoSummary** — the bird's-eye view of the entire repository. One per branch.

| Property | Type | What it tells you |
|---|---|---|
| `knowledge_id` | string | Which Knowledge this belongs to |
| `org_id` | string | Organization identifier |
| `repo_name` | string | Repository name |
| `user_name` | string | GitHub owner |
| `branch_name` | string | Which branch this summary describes |
| `commit_hash` | string | Which commit was analyzed |
| `architecture` | string | High-level architecture description |
| `data_flow` | string | How data flows through the system |
| `key_patterns` | string[] | Design patterns used: `["Factory", "Builder", "Strategy"]` |
| `major_subsystems` | string | JSON-encoded array of `{name, responsibility}` objects |
| `max_depth` | number | Deepest nesting level |
| `tree` | string | Full `tree` command output of the repo structure |

Unique key: `(knowledge_id, org_id, branch_name)` — supports multi-branch repos
**LevelSummary** — a semantic overview of one depth level in the hierarchy. Answers "what kind of code lives at this depth?"

| Property | Type | What it tells you |
|---|---|---|
| `knowledge_id` | string | Which Knowledge this belongs to |
| `org_id` | string | Organization |
| `repo_name` | string | Repository |
| `user_name` | string | GitHub owner |
| `branch_name` | string | Git branch |
| `commit_hash` | string | Git commit |
| `level` | number | Which depth level (1-based, same as FolderNode) |
| `name` | string | Display name: `"Level 3 Summary"` |
| `summary` | string | AI summary describing what this level contains |
| `dependency_graph` | string | Concatenated Mermaid diagrams from all batches |
| `tree` | string | Tree-formatted view of all items at this level |

Unique key: `(knowledge_id, org_id, branch_name, level)`
**LevelBatch** — when a level has too many files/folders to summarize at once, it's split into token-limited batches. Each LevelBatch is one such chunk.

| Property | Type | What it tells you |
|---|---|---|
| `node_id` | string | ID format: `"{orgId}_{repoName}_L{level}_{itemType}_batch_{batchNumber}"` |
| `knowledge_id` | string | Which Knowledge |
| `org_id` | string | Organization |
| `repo_name` | string | Repository |
| `branch_name` | string | Git branch |
| `level` | number | Which level this batch belongs to |
| `batch_number` | number | Batch number (1-based) |
| `total_batches` | number | Total batches at this level for this item type |
| `item_type` | string | `"files"` or `"folders"` — what this batch contains |
| `sub_file_ids` | string[] | Relative paths of files in this batch |
| `sub_folder_ids` | string[] | Relative paths of folders in this batch |
| `purpose` | string | AI summary of what the items in this batch do |
| `summary` | string | Detailed summary (JSON stringified) |
| `classes` | string[] | Important classes found in this batch |
| `functions` | string[] | Important functions found in this batch |
| `imports_internal` | string[] | Internal imports used by items in this batch |
| `imports_external` | string[] | External packages used by items in this batch |
| `dependency_graph` | string | Mermaid diagram of cross-item dependencies |

Unique key: `(node_id, knowledge_id)`
**OrgKeyword** — a semantic keyword extracted from files via LLM analysis. OrgKeywords are scoped to an organization and represent domain concepts, business terms, capabilities, etc.

| Property | Type | What it tells you |
|---|---|---|
| `keyword` | string | The keyword itself: `"authentication"`, `"jwt"`, `"user_login"` |
| `semantic_type` | string | Category (see table below) |
| `org_id` | string | Organization scope |
| `total_frequency` | number | How many times this keyword appears across all files |

Unique key: `(keyword, semantic_type, org_id)`
Semantic types — each keyword belongs to one category:

| semantic_type | What it represents | Example keywords |
|---|---|---|
| `keywords` | General tags | `"auth"`, `"middleware"`, `"caching"` |
| `ontology_concepts` | Domain concepts | `"authentication"`, `"payment_processing"` |
| `business_entities` | Business objects | `"user"`, `"order"`, `"invoice"` |
| `system_capabilities` | What the system can do | `"user_login"`, `"order_processing"` |
| `side_effects` | Runtime effects | `"writes_to_disk"`, `"sends_email"` |
| `config_dependencies` | Config keys needed | `"DATABASE_URL"`, `"JWT_SECRET"` |
| `data_flow_direction` | Data movement patterns | `"receives_http_produces_events"` |
| `contracts_provided` | APIs/interfaces exposed | `"authservice"`, `"userapi"` |
| `contracts_consumed` | APIs/interfaces consumed | `"databaseclient"`, `"rediscache"` |
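The `(keyword, semantic_type, org_id)` unique key and the `total_frequency` roll-up can be illustrated with a small aggregation sketch (hypothetical `aggregate_org_keywords` helper; the real pipeline presumably does this during graph writes):

```python
from collections import Counter

def aggregate_org_keywords(file_keywords):
    """file_keywords: (org_id, keyword, semantic_type, frequency) tuples,
    one per file occurrence. Collapses them into OrgKeyword totals keyed
    by the node's unique key (keyword, semantic_type, org_id)."""
    totals = Counter()
    for org, kw, stype, freq in file_keywords:
        totals[(kw, stype, org)] += freq
    return totals

totals = aggregate_org_keywords([
    ("org1", "authentication", "ontology_concepts", 5),  # file A
    ("org1", "authentication", "ontology_concepts", 3),  # file B
    ("org1", "jwt", "keywords", 2),
])
# totals[("authentication", "ontology_concepts", "org1")] == 8
```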
These are small, reusable nodes that appear in multiple contexts:

**Class**

| Property | Meaning |
|---|---|
| `name` | Class name: `"AuthService"`, `"Deployment"` |
| `knowledge_id` | Which Knowledge |
| `org_id` | Organization |
| `repo_name` | Repository |
| `description` | What the class does (extracted from `"ClassName: description"` format) |

Unique key: `(name, org_id, repo_name)` from FileNode context; `(name, knowledge_id, org_id)` from LevelSummary context
**Function**

| Property | Meaning |
|---|---|
| `name` | Function name: `"login"`, `"createDeployment"` |
| `knowledge_id` | Which Knowledge |
| `org_id` | Organization |
| `repo_name` | Repository |
| `description` | What the function does |

Unique key: same pattern as Class
**Import**

| Property | Meaning |
|---|---|
| `path` | Import path: `"express"`, `"./config"` |
| `type` | `"internal"` or `"external"` |
| `org_id` | Organization |
| `repo_name` | Repository |
| `knowledge_id` | Which Knowledge |
**ImportInternal / ImportExternal**

| Property | Meaning |
|---|---|
| `path` | Import path |
| `knowledge_id` | Which Knowledge |
| `org_id` | Organization |
```
Knowledge
│
├──[:HAS_FILE]───────────→ FileNode     (flat access to any file)
├──[:HAS_FOLDER]─────────→ FolderNode   (flat access to any folder)
├──[:HAS_ROOT_FOLDER]────→ FolderNode   (root-level folders only, max level)
├──[:HAS_REPO_SUMMARY]───→ RepoSummary  (one per branch)
└──[:HAS_LEVEL_BATCH]────→ LevelBatch   (flat access to any batch)
```
```
FolderNode (parent)
├──[:CONTAINS_FOLDER]──→ FolderNode (child)   parent directory → subdirectory
└──[:CONTAINS_FILE]────→ FileNode             directory → file inside it
```
The folder hierarchy is computed from relative_path — a FolderNode at "src/services" contains FolderNode "src/services/auth" and FileNode "src/services/index.ts".
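That derivation can be sketched like this (a simplified illustration assuming POSIX-style relative paths; `derive_containment` is a hypothetical helper, not the ingestion code):

```python
import posixpath

def derive_containment(folders, files):
    """Derive (parent, child) pairs — CONTAINS_FOLDER and CONTAINS_FILE —
    purely from relative paths."""
    folder_set = set(folders)
    contains_folder, contains_file = [], []
    for f in folders:
        parent = posixpath.dirname(f)  # "" for root-level folders
        if parent in folder_set:
            contains_folder.append((parent, f))
    for f in files:
        parent = posixpath.dirname(f)
        if parent in folder_set:
            contains_file.append((parent, f))
    return contains_folder, contains_file

cf, cfile = derive_containment(
    ["src", "src/services", "src/services/auth"],
    ["src/services/index.ts"],
)
# cf contains ("src", "src/services") and ("src/services", "src/services/auth")
# cfile contains ("src/services", "src/services/index.ts")
```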
```
RepoSummary
└──[:HAS_LEVEL_SUMMARY]──→ LevelSummary (one per depth level)
    │
    ├──[:HAS_FOLDER]────────────→ FolderNode  (folders at this level)
    ├──[:INCLUDES_FILE_AT_LEVEL]→ FileNode    (files at this level)
    ├──[:HAS_FILE_BATCH]────────→ LevelBatch  (item_type='files')
    ├──[:HAS_FOLDER_BATCH]──────→ LevelBatch  (item_type='folders')
    ├──[:HAS_CLASS]─────────────→ Class
    ├──[:HAS_FUNCTION]──────────→ Function
    ├──[:HAS_INTERNAL_IMPORT]───→ ImportInternal
    └──[:HAS_EXTERNAL_IMPORT]───→ ImportExternal
```
```
LevelBatch
├──[:INCLUDES_FILE]────────→ FileNode    (files in this batch)
├──[:INCLUDES_FOLDER]──────→ FolderNode  (folders in this batch)
├──[:HAS_CLASS]────────────→ Class
├──[:HAS_FUNCTION]─────────→ Function
├──[:HAS_INTERNAL_IMPORT]──→ ImportInternal
└──[:HAS_EXTERNAL_IMPORT]──→ ImportExternal
```
```
FileNode
├──[:DEFINES_CLASS]────────→ Class                     classes defined in this file
├──[:DEFINES_FUNCTION]─────→ Function                  functions defined in this file
├──[:IMPORTS_INTERNAL]─────→ Import (type:"internal")  internal imports
└──[:IMPORTS_EXTERNAL]─────→ Import (type:"external")  third-party imports
```
```
OrgKeyword ──[:APPEARS_IN_FILE]──→ FileNode
```
This is the only relationship with properties:
| Property | Type | Meaning |
|---|---|---|
| `frequency` | number | How many times this keyword appears in this specific file |
| `created_at` | datetime | When this link was first created |
| `updated_at` | datetime | When last updated |
```cypher
// Repo overview: architecture, data flow, patterns
MATCH (k:Knowledge {knowledge_id: $id})-[:HAS_REPO_SUMMARY]->(rs:RepoSummary)
RETURN rs.architecture, rs.data_flow, rs.key_patterns, rs.total_files, rs.total_classes
```

```cypher
// Root folders, largest first
MATCH (k:Knowledge {knowledge_id: $id})-[:HAS_ROOT_FOLDER]->(root:FolderNode)
RETURN root.relative_path, root.purpose, root.total_file_count
ORDER BY root.total_file_count DESC
```

```cypher
// Drill into a subtree, up to 3 levels deep
MATCH (parent:FolderNode {knowledge_id: $id, relative_path: "src"})
      -[:CONTAINS_FOLDER*1..3]->(child:FolderNode)
RETURN child.relative_path, child.level, child.purpose, child.direct_file_count
```

```cypher
// List the files in a folder
MATCH (folder:FolderNode {knowledge_id: $id, relative_path: "src/services/auth"})
      -[:CONTAINS_FILE]->(f:FileNode)
RETURN f.relative_path, f.purpose, f.language
```

```cypher
// Full analysis of a single file
MATCH (f:FileNode {knowledge_id: $id, relative_path: "src/services/auth/login.ts"})
RETURN f.purpose, f.summary, f.classes, f.functions,
       f.ontology_concepts, f.business_entities, f.contracts_provided
```

```cypher
// Files matching a keyword, most frequent first
MATCH (kw:OrgKeyword {keyword: "authentication", org_id: $orgId})
      -[r:APPEARS_IN_FILE]->(f:FileNode)
RETURN f.relative_path, f.purpose, r.frequency
ORDER BY r.frequency DESC
```

```cypher
// Top domain concepts across the org
MATCH (kw:OrgKeyword {semantic_type: "ontology_concepts", org_id: $orgId})
RETURN kw.keyword, kw.total_frequency, kw.file_count
ORDER BY kw.total_frequency DESC LIMIT 20
```

```cypher
// Config keys the codebase depends on
MATCH (kw:OrgKeyword {semantic_type: "config_dependencies", org_id: $orgId})
RETURN kw.keyword, kw.file_count
ORDER BY kw.file_count DESC
```

```cypher
// Files with write/send side effects
MATCH (f:FileNode {knowledge_id: $id})
WHERE ANY(se IN f.side_effects WHERE se CONTAINS "writes" OR se CONTAINS "sends")
RETURN f.relative_path, f.side_effects, f.purpose
```

```cypher
// Files that were too large and got chunked
MATCH (f:FileNode {knowledge_id: $id, is_big_file: "isABigFile"})
RETURN f.relative_path
```

```cypher
// Summary of one depth level
MATCH (ls:LevelSummary {knowledge_id: $id, level: 3})
RETURN ls.summary, ls.folderCount, ls.fileCount, ls.totalTokenCount
```

```cypher
// Batches at a level, in order
MATCH (ls:LevelSummary {knowledge_id: $id, level: 2})
      -[:HAS_FILE_BATCH]->(batch:LevelBatch)
RETURN batch.batch_number, batch.item_count, batch.purpose, batch.classes
ORDER BY batch.batch_number
```

```cypher
// Which file defines a given class
MATCH (f:FileNode)-[:DEFINES_CLASS]->(c:Class {name: "AuthService"})
RETURN f.relative_path, f.purpose
```

```cypher
// Most-used external packages
MATCH (f:FileNode {knowledge_id: $id})-[:IMPORTS_EXTERNAL]->(i:Import)
RETURN i.path, count(f) AS used_by_files
ORDER BY used_by_files DESC LIMIT 20
```

For MCP clients:
`graph_search` is your primary search tool. One call fires 7 parallel channel queries and returns paginated results per channel. Use the decision tree below to reach the target file in 1–2 calls.
A single call returns 7 independent channels, each with its own `{ data, pagination }`:

| Channel | What it searches | Best for |
|---|---|---|
| `purpose` | `FileNode.purpose` + `ontology_concepts` | Natural-language questions ("where is auth handled?") |
| `classes` | `FileNode.classes[]` | Finding a known class (`AuthService`, `UserModel`) |
| `functions` | `FileNode.functions[]` | Finding a known function (`login`, `parseConfig`) |
| `imports` | `FileNode.imports_internal[]` + `imports_external[]` | Finding who uses a library or internal module |
| `keywords` | `OrgKeyword` nodes linked via `APPEARS_IN_FILE` | Domain/concept search (`jwt`, `caching`, `webhook`) |
| `paths` | `FileNode.relative_path` + `FolderNode.relative_path` | Partial path match (`auth`, `middleware`, `utils`) |
| `glob` | Regex match on `relative_path` (only when `glob` is set) | File pattern matching (`**/*.test.ts`, `src/api/**`) |
```
What do you know about the target file?
│
├─ Exact or partial path/filename?
│   → query: the path fragment, read `paths` channel first
│   → or set `glob` param (e.g. "**/*.controller.ts") and read `glob` channel
│
├─ Class or function name?
│   → query: the name, read `classes` or `functions` channel first
│
├─ Library or import it uses?
│   → query: the import name (e.g. "express", "./config"), read `imports` channel
│
├─ Domain concept or business term?
│   → query: the concept (e.g. "authentication session"), read `purpose` + `keywords` channels
│
└─ Vague / exploratory?
    → query: best guess terms, scan ALL channels — the channel with results tells you which dimension matched
```
- Use specific, lowercase terms. The tool splits your query on whitespace and matches each term with `CONTAINS` (case-insensitive). `"auth login"` searches for files where properties contain `"auth"` and/or `"login"`.
- Multi-word queries cast a wider net. Each term is matched independently — more terms = more potential hits but also more noise. Use 1–3 precise terms.
- Narrow with filters before paginating. Use optional params to reduce result sets:
  - `knowledgeId` — scope to a single repo
  - `path` — prefix filter (e.g. `"src/services"` only searches within that subtree)
  - `fileRole` — filter by role if you know it
  - `glob` — structural pattern matching (activates the glob channel)
- Read the highest-scoring channels first. Channels with `totalItems > 0` are the relevant ones. If `classes` returns 2 hits and `purpose` returns 40, the class match is likely more precise.
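The matching semantics described above can be approximated like this (a sketch only — `matches` is a hypothetical function, and OR-across-terms is an assumption based on the "and/or" wording):

```python
def matches(query: str, haystacks: list[str]) -> bool:
    """Split the query on whitespace; each term matches via
    case-insensitive CONTAINS against the file's text properties."""
    terms = query.lower().split()
    text = " ".join(h.lower() for h in haystacks)
    return any(term in text for term in terms)  # assumed OR semantics

# "auth login" hits a file whose purpose mentions either term
print(matches("auth login", ["Handles user login and sessions"]))  # True
print(matches("payment", ["Handles user login and sessions"]))     # False
```

This is why more terms widen the net: each extra term is another independent chance to hit.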
Call 1: `graph_search` — returns path + purpose per hit across all channels.

- If you see the file you need → done. You have its `path` and `purpose`.
- If multiple candidates → read the `purpose` field to disambiguate.
- If too many results → re-call with a narrower `path` prefix or more specific terms.

Call 2 (if needed): `getFileDetails` — pass the path from the search result to get full metadata: `summary`, `section_map`, `classes`, `functions`, imports, contracts, `ontology_concepts`, etc.
| Query type | Read first | Then check |
|---|---|---|
| "Find the UserService class" | `classes` | `purpose` |
| "Where is login handled?" | `purpose` | `functions`, `keywords` |
| "Files using express" | `imports` | `paths` |
| "All test files" | `glob` (`**/*.test.*`) | `paths` |
| "Authentication system" | `keywords` | `purpose` |
| "src/services/auth" | `paths` | — |
| "Payment processing" | `purpose` | `keywords` |
Each channel is independently paginated. Default: `page=1`, `pageSize=20`.

- `pagination.totalItems` — total matches for that channel
- `pagination.hasNextPage` — whether more results exist
- Increment `page` to fetch the next batch (same query, same filters)
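A client-side drain loop over one channel might look like this (a sketch: `fetch_all` and the `fake_search` stub are hypothetical; only the response fields `data`, `pagination.hasNextPage`, and `pagination.totalItems` come from the description above):

```python
def fetch_all(search, query, channel, page_size=20):
    """Drain one channel by incrementing `page` until hasNextPage is false."""
    page, items = 1, []
    while True:
        resp = search(query=query, page=page, pageSize=page_size)
        chan = resp[channel]
        items.extend(chan["data"])
        if not chan["pagination"]["hasNextPage"]:
            return items
        page += 1

# stub standing in for the real tool, serving 45 fake hits
def fake_search(query, page, pageSize):
    hits = [f"file_{i}.ts" for i in range(45)]
    start = (page - 1) * pageSize
    return {"paths": {
        "data": hits[start:start + pageSize],
        "pagination": {"totalItems": 45, "hasNextPage": start + pageSize < 45},
    }}

all_hits = fetch_all(fake_search, "auth", "paths")  # drains pages 1-3
```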
| Goal | Tool sequence |
|---|---|
| Find a file by concept | `graph_search` → done (path + purpose in results) |
| Get full file analysis | `graph_search` → `getFileDetails(path)` |
| Browse folder structure | `getRepoOverview` → `listFolders` → `listFiles` |
| Understand repo architecture | `getRepoOverview` (reads RepoSummary) |
| Cross-repo search | `graph_search` without `knowledgeId` (org-scoped) |
Levels count bottom-up from the deepest folders:
| Level | Meaning | Example for a 9-level repo |
|---|---|---|
| 1 | Deepest leaf folders | src/services/auth/helpers/ |
| 2 | One above deepest | src/services/auth/ |
| ... | ... | ... |
| 9 (max) | Root-level folders | src/, config/, docs/ |
Root folders (the ones connected via `[:HAS_ROOT_FOLDER]`) always have the maximum level number for that repo.
| Relationship | From | To | Properties | Meaning |
|---|---|---|---|---|
| `HAS_FILE` | Knowledge | FileNode | — | Repo contains this file |
| `HAS_FOLDER` | Knowledge | FolderNode | — | Repo contains this folder |
| `HAS_ROOT_FOLDER` | Knowledge | FolderNode | — | Top-level directory |
| `HAS_REPO_SUMMARY` | Knowledge | RepoSummary | — | Repo overview (per branch) |
| `HAS_LEVEL_BATCH` | Knowledge | LevelBatch | — | Flat access to batch |
| `HAS_LEVEL_SUMMARY` | RepoSummary | LevelSummary | — | Summary for one depth level |
| `HAS_FOLDER` | LevelSummary | FolderNode | — | Folders at this level |
| `INCLUDES_FILE_AT_LEVEL` | LevelSummary | FileNode | — | Files at this level |
| `HAS_FILE_BATCH` | LevelSummary | LevelBatch | — | File batch at this level |
| `HAS_FOLDER_BATCH` | LevelSummary | LevelBatch | — | Folder batch at this level |
| `CONTAINS_FOLDER` | FolderNode | FolderNode | — | Parent → child directory |
| `CONTAINS_FILE` | FolderNode | FileNode | — | Directory → file inside it |
| `DEFINES_CLASS` | FileNode | Class | — | File defines this class |
| `DEFINES_FUNCTION` | FileNode | Function | — | File defines this function |
| `IMPORTS_INTERNAL` | FileNode | Import | — | File imports from within repo |
| `IMPORTS_EXTERNAL` | FileNode | Import | — | File imports a third-party package |
| `INCLUDES_FILE` | LevelBatch | FileNode | — | Batch contains this file |
| `INCLUDES_FOLDER` | LevelBatch | FolderNode | — | Batch contains this folder |
| `HAS_CLASS` | LevelSummary/LevelBatch | Class | — | Level/batch references this class |
| `HAS_FUNCTION` | LevelSummary/LevelBatch | Function | — | Level/batch references this function |
| `HAS_INTERNAL_IMPORT` | LevelSummary/LevelBatch | ImportInternal | — | Level/batch uses this internal import |
| `HAS_EXTERNAL_IMPORT` | LevelSummary/LevelBatch | ImportExternal | — | Level/batch uses this external import |
| `APPEARS_IN_FILE` | OrgKeyword | FileNode | `frequency`, `created_at`, `updated_at` | Keyword found in this file N times |
When an MCP client (typically an LLM agent) explores a repo today, the typical flow looks like this:
```
list_knowledge → graph_search → graph_search → graph_search →
getFileDetails → getFileDetails → graph_search (restart) → getFileDetails → ...
```
That's 8-12 tool calls to reach the right files. The agent searches by keywords one at a time, fetches files one by one, sometimes restarts from scratch because earlier results weren't relevant enough. Each graph_search returns results scored only by local text matching (CONTAINS on keywords, class names, etc.) — there's no notion of which files are structurally important in the codebase. So the agent has to do multiple exploratory searches to figure out what matters.
PageRank pre-computes a global importance score for every FileNode based on how connected it is in the graph. Files that are heavily imported, referenced by many folders, or linked to many semantic keywords naturally rank higher.
This means when a graph_search returns 40 results, the agent can immediately see which 3-5 files are the structural hubs — and fetch those first. These hub files typically reference or import the remaining files the agent would have searched for anyway, so the information cascades from a single fetch.
The result: 2-4 tool calls instead of 8-12.
```
getRepoHubs → graph_search (results pre-sorted by importance) →
getFileDetails (on top 2-3 hits — the actual hubs)
```
The edges that matter for file importance:
| Edge Type | Signal |
|---|---|
| `IMPORTS_INTERNAL` | FileA imports FileB → strongest signal (direct code dependency) |
| `APPEARS_IN_FILE` | OrgKeyword → FileNode (high-frequency keywords boost a file) |
| `CONTAINS_FILE` | FolderNode → FileNode (files in root-level folders get a slight boost) |
| `DEFINES_CLASS` | Reverse signal — files defining widely-used classes are important |
| `DEFINES_FUNCTION` | Reverse signal — same logic for widely-called functions |
The key insight: IMPORTS_INTERNAL already forms a directed graph between files. A file imported by many other files is a hub — exactly what PageRank was designed to find.
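To make that intuition concrete, here is a minimal PageRank power iteration over a toy import graph — plain Python standing in for what GDS computes at scale, not the production algorithm:

```python
def pagerank(edges, damping=0.85, iterations=20):
    """edges: (importer, imported) pairs. Rank flows TO the imported file,
    mirroring the REVERSE orientation used in the GDS projection."""
    nodes = {n for e in edges for n in e}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for t in out[n]:
                    new[t] += share
            else:  # dangling node: redistribute its rank evenly
                for t in nodes:
                    new[t] += damping * rank[n] / len(nodes)
        rank = new
    return rank

# config.ts is imported by every other file, so it ends up the top hub
edges = [("a.ts", "config.ts"), ("b.ts", "config.ts"),
         ("c.ts", "config.ts"), ("a.ts", "b.ts")]
ranks = pagerank(edges)
```

A file imported by many others accumulates rank from each of its importers, which is exactly the hub-detection behavior the text describes.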
Run as a batch job whenever a repo is analyzed (or re-analyzed), using Neo4j's Graph Data Science library:

```cypher
// Project the file-imports-file graph.
// (To scope the projection to a single repo, use a Cypher projection
// filtered by knowledge_id — native projections can't filter on string properties.)
CALL gds.graph.project(
  'file-import-graph',
  'FileNode',
  {
    IMPORTS_INTERNAL: {
      type: 'IMPORTS_INTERNAL',
      orientation: 'REVERSE'  // files that ARE imported rank high
    }
  }
)
```

```cypher
// Run PageRank and write the score onto each node
CALL gds.pageRank.write('file-import-graph', {
  maxIterations: 20,
  dampingFactor: 0.85,
  writeProperty: 'pagerank'
})
YIELD nodePropertiesWritten, ranIterations
```

After this, every FileNode has a `pagerank` property — a non-negative float (scores are relative, not bounded to 0–1; normalize per repo if you need a fixed range).
PageRank on imports alone misses files that are conceptually central but don't have many direct importers (config files, type definitions, orchestration files). Add a second score based on keyword connectivity:
```cypher
// Keyword connectivity: how many distinct concepts link to a file, and how strongly
MATCH (f:FileNode {knowledge_id: $kid})
OPTIONAL MATCH (kw:OrgKeyword)-[r:APPEARS_IN_FILE]->(f)
WITH f,
     coalesce(sum(r.frequency), 0) AS keyword_weight,
     count(DISTINCT kw) AS keyword_diversity
SET f.semantic_centrality = keyword_diversity * 0.7 + keyword_weight * 0.3
```

Then combine into a single composite score:

```cypher
// $max_semantic = the repo's maximum semantic_centrality (computed beforehand)
MATCH (f:FileNode {knowledge_id: $kid})
SET f.importance = 0.6 * f.pagerank + 0.4 * (f.semantic_centrality / $max_semantic)
```

The 0.6 / 0.4 weighting means structural connectivity (imports) matters more than semantic tagging, but both contribute.
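The same combination, with the max-normalization made explicit, in plain Python (the weights come from the text; `composite_importance` and the sample values are illustrative):

```python
def composite_importance(files, w_structural=0.6, w_semantic=0.4):
    """files: {path: (pagerank, semantic_centrality)} -> {path: importance}.
    Semantic centrality is min-max normalized by the repo's maximum."""
    max_sem = max(sc for _, sc in files.values()) or 1.0  # guard: all-zero repo
    return {
        path: w_structural * pr + w_semantic * (sc / max_sem)
        for path, (pr, sc) in files.items()
    }

scores = composite_importance({
    "src/auth/login.ts": (0.30, 12.0),  # heavily imported AND keyword-rich
    "src/types.ts":      (0.05, 14.0),  # few importers, conceptually central
    "src/scratch.ts":    (0.02, 1.0),
})
```

Notice how `src/types.ts` still scores well despite a low PageRank: that is the gap the semantic term is meant to cover.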
Every channel now uses the composite importance score as a sort tiebreaker:
```cypher
// Before (current behavior):
MATCH (f:FileNode) WHERE f.purpose CONTAINS $term
RETURN f.relative_path, f.purpose
LIMIT 20
```

```cypher
// After:
MATCH (f:FileNode) WHERE f.purpose CONTAINS $term
RETURN f.relative_path, f.purpose, f.importance, f.pagerank
ORDER BY f.importance DESC
LIMIT 20
```

Now the first 3-5 results in any channel are almost always the files the agent actually needs.
A new MCP tool that returns the top-N files by PageRank for a given knowledge_id:
```cypher
// Top-N structural hubs for a repo
MATCH (f:FileNode {knowledge_id: $kid})
RETURN f.relative_path, f.purpose, f.pagerank,
       f.classes, f.functions, f.imports_internal
ORDER BY f.pagerank DESC
LIMIT 10
```

This gives the agent a "table of contents" of the most structurally important files in one call. From any of these hub files, `imports_internal` already tells you what other files are reachable — so the agent can plan which files to fetch without additional searches.
For multi-repo scenarios, extend this with personalized PageRank. If OrgKeyword nodes are shared across repos (same org_id), they act as bridges:
```
FileA (repo1) <--APPEARS_IN_FILE-- OrgKeyword("authentication") --APPEARS_IN_FILE--> FileB (repo2)
```
Project a bipartite graph of FileNode and OrgKeyword across all repos in the org, then run PageRank on that. Files that share many high-frequency keywords with files in other repos become cross-repo hubs.
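A rough sketch of the bridging idea: score cross-repo file pairs by the keywords they share, weighted by frequency. This is a deliberate simplification of personalized PageRank on the bipartite graph — `cross_repo_bridges` and its scoring rule are illustrative assumptions, not the proposed algorithm itself:

```python
from collections import defaultdict

def cross_repo_bridges(appears_in):
    """appears_in: (keyword, repo, file, frequency) tuples.
    Scores file pairs from DIFFERENT repos by shared keyword weight."""
    by_kw = defaultdict(list)
    for kw, repo, f, freq in appears_in:
        by_kw[kw].append((repo, f, freq))
    scores = defaultdict(float)
    for entries in by_kw.values():
        for r1, f1, q1 in entries:
            for r2, f2, q2 in entries:
                if r1 < r2:  # cross-repo pairs only, counted once
                    scores[(f1, f2)] += min(q1, q2)  # shared weight
    return dict(scores)

bridges = cross_repo_bridges([
    ("authentication", "repo1", "repo1/auth.ts", 9),
    ("authentication", "repo2", "repo2/login.py", 7),
    ("jwt", "repo1", "repo1/auth.ts", 4),
    ("jwt", "repo2", "repo2/login.py", 3),
])
# repo1/auth.ts and repo2/login.py share two keywords -> strongest bridge
```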
| Property | Type | What it tells you |
|---|---|---|
| `pagerank` | float | Import-graph PageRank score (relative). Higher = more imported. |
| `semantic_centrality` | float | Keyword connectivity score. Higher = more concepts linked. |
| `importance` | float | Composite score (`0.6 * pagerank + 0.4 * semantic_centrality`, semantic part normalized) |
```
Starting a new exploration?
│
├─ Want the structural backbone of a repo?
│   → getRepoHubs → read top 5-10 hub files' imports_internal
│   → You now know the critical files AND what they connect to (1 call)
│
├─ Searching for a concept/feature?
│   → graph_search (results now sorted by importance)
│   → Top 3 results are hub files that match your query (1 call)
│   → getFileDetails on the top hit if you need full metadata (2 calls total)
│
└─ Exploring across multiple repos?
    → graph_search without knowledgeId (cross-repo, importance-sorted)
    → Cross-repo hub files surface first thanks to shared OrgKeyword PageRank
```
| Metric | Before PageRank | After PageRank |
|---|---|---|
| Tool calls to find key files | 8-12 | 2-4 |
| Agent restarts / dead ends | Frequent | Rare |
| Cross-repo discovery | Manual | Automatic |
| Result quality (first page) | Noisy | Hub-first |