feat:Hierarchical Knowledge Aggregation for Graphify by ljinshuan · Pull Request #264 · safishamsi/graphify

ljinshuan · 2026-04-12T15:46:51Z

Close #265

Hierarchical Knowledge Aggregation for Graphify

Issue Origin: A single flat knowledge graph cannot represent a project's hierarchical structure. Large codebases need layered knowledge with bottom-up aggregation and intelligent query routing.

1. Problem Statement: Why Hierarchical Knowledge Aggregation?

Graphify originally supported only a single flat knowledge graph: all source files (code, docs, images) were extracted, merged into one graph.json, and served via MCP as a flat graph.

This design causes three core problems in large-scale projects:

Information Overload 🤯: A 500+ file project produces a graph with thousands of nodes. LLM query token budgets are wasted on irrelevant details, and key architectural information is buried.
Lack of Abstraction Levels 🏗️: Microservice architectures are naturally hierarchical (services → domains → system), but flat graphs cannot represent this. The questions "what is the system architecture?" and "how does the auth function call work?" require completely different levels of abstraction.
Inefficient Queries 🐌: Every query searches the entire graph and cannot use the hierarchical structure to narrow its scope.

Core Insight: Knowledge should be layered like a geographic map: bottom layers hold street-level detail, and upper layers give city-level overviews. An upper-layer graph = its own content + a summary of the layer(s) below it, and the layers together form a strict layered DAG.
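The "own content + lower-layer summary" rule can be illustrated in a few lines of Python. This is a hypothetical sketch, not the PR's merge_graphs() in build.py: it assumes graphs are plain dicts with "nodes" (ID → attributes) and "edges" (list of (src, dst) pairs), and that the 'summary:' prefix is applied to node IDs with a "source_layer" provenance attribute. The real function's signature and attribute names may differ.

```python
from typing import Any

# Assumed shape: {"nodes": {node_id: attrs}, "edges": [(src, dst), ...]}
Graph = dict[str, Any]


def merge_with_summary(own: Graph, lower_summary: Graph, lower_layer_id: str) -> Graph:
    """Hypothetical sketch: upper-layer graph = own content + lower-layer summary.

    Summary node IDs get a 'summary:' prefix so they cannot collide with the
    upper layer's own IDs, and each copied node records the layer it came from.
    """
    def tag(node_id: str) -> str:
        return f"summary:{node_id}"

    # Copy the upper layer's own content so the inputs are never mutated.
    nodes = {nid: dict(attrs) for nid, attrs in own["nodes"].items()}
    edges = list(own["edges"])

    for nid, attrs in lower_summary["nodes"].items():
        # Provenance attribute name is an assumption for illustration.
        nodes[tag(nid)] = {**attrs, "source_layer": lower_layer_id}
    for src, dst in lower_summary["edges"]:
        edges.append((tag(src), tag(dst)))

    return {"nodes": nodes, "edges": edges}


docs = {"nodes": {"auth-guide": {"label": "Auth guide"}}, "edges": []}
code_summary = {
    "nodes": {"AuthService": {"label": "AuthService"}, "TokenStore": {"label": "TokenStore"}},
    "edges": [("AuthService", "TokenStore")],
}
merged = merge_with_summary(docs, code_summary, lower_layer_id="code")
print(sorted(merged["nodes"]))
# ['auth-guide', 'summary:AuthService', 'summary:TokenStore']
```

Prefixing IDs keeps the upper layer's own nodes and the imported summary nodes distinguishable, which is what lets a query later trace a summary node back to the layer it was aggregated from.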

Phase 1 - layer-config-foundation:
- Add pyyaml as optional dependency (layers extras group)
- Create layer_config.py: LayerConfig dataclass, load_layers(), DAG validation (duplicate IDs, unknown parents, cycle detection), topological sort, level computation, LayerRegistry
- Add merge_graphs() in build.py with summary: prefix + provenance
- Add aggregate.py stub with strategy='none'

Phase 2 - aggregation-engine:
- Implement topk_filter strategy: top-K nodes by degree, hub exclusion, confidence filtering
- Implement community_collapse strategy: community detection, collapsed nodes with _collapsed/_community_id attrs, bridge edge preservation
- Implement llm_summary strategy: LLM-powered summarization with structured prompt, JSON parsing, fallback to topk_filter
- Implement composite strategy: community_collapse -> llm_summary pipeline
- Update aggregate() dispatcher for all 5 strategies

Phase 3 - query-routing:
- Create query_router.py: QueryRouter with keyword scoring, level-weighted abstraction heuristics, CJK substring matching
- Implement auto-zoom: sparse result drill-down to child layers
- Add layer_info/drill_down MCP tools in serve.py
- Add --layers/--layer/--auto-zoom flags to CLI query command
- Multi-layer mode in serve() with QueryRouter integration

Phase 4 - cli-polish:
- Add graphify layer-info --layers <path> command (table format)
- Add graphify layer-tree --layers <path> command (ASCII tree)
- Add graphify layer-diff <id1> <id2> --layers <path> command
- Add graph_diff() in build.py for structural comparison
- Save aggregation provenance as from_<parent>.json
- Parallel same-depth layer builds with ProcessPoolExecutor
- Auto-detect multi-layer mode in MCP server from output directory
- Update CLI help text with all new commands

Tests: 109 new tests across 8 test files, all passing

Ref: safishamsi#263
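Phase 1's DAG validation, topological sort, and level computation can be done in a single pass of Kahn's algorithm. The following is a hypothetical sketch, not the PR's layer_config.py: LayerSpec and build_order are illustrative names, and it assumes each layer lists the upper layers it feeds under `parents` (so a parent sits one level above its highest child, and level 0 is the most detailed layer). Layers that end up on the same level never depend on each other, which is the property a same-depth parallel build relies on.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class LayerSpec:
    """Hypothetical stand-in for a layer entry; field names are assumptions."""
    id: str
    parents: list[str] = field(default_factory=list)  # upper layers this layer feeds


def build_order(layers: list[LayerSpec]) -> dict[int, list[str]]:
    """Validate the layer DAG and group layer IDs by level (0 = most detailed).

    Raises ValueError on duplicate IDs, unknown parents, or cycles.
    """
    ids = [layer.id for layer in layers]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate layer IDs")

    # pending[x] = number of child layers that must be built before x.
    pending = {layer_id: 0 for layer_id in ids}
    for layer in layers:
        for parent in layer.parents:
            if parent not in pending:
                raise ValueError(f"{layer.id}: unknown parent {parent!r}")
            pending[parent] += 1

    # Kahn's algorithm: start from layers with no children (the bottom layers).
    parents_of = {layer.id: layer.parents for layer in layers}
    level = {layer_id: 0 for layer_id in ids}
    ready = deque(layer_id for layer_id in ids if pending[layer_id] == 0)
    visited = 0
    while ready:
        current = ready.popleft()
        visited += 1
        for parent in parents_of[current]:
            # A parent sits one level above its highest child.
            level[parent] = max(level[parent], level[current] + 1)
            pending[parent] -= 1
            if pending[parent] == 0:
                ready.append(parent)
    if visited != len(ids):
        # Some layers never became ready: they sit on a cycle.
        raise ValueError("cycle detected among layers")

    # Same-level layers are independent, so each group could be handed to a
    # worker pool (e.g. concurrent.futures.ProcessPoolExecutor) in one batch.
    by_level: dict[int, list[str]] = defaultdict(list)
    for layer_id in ids:
        by_level[level[layer_id]].append(layer_id)
    return dict(sorted(by_level.items()))


stack = [LayerSpec("code", ["docs"]), LayerSpec("docs", ["overview"]), LayerSpec("overview")]
print(build_order(stack))  # {0: ['code'], 1: ['docs'], 2: ['overview']}
```

Taking the maximum over children means a layer that receives input both directly from level 0 and via a level-1 layer is still placed at level 2, so it is never built before all of its inputs exist.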
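The three ingredients of the topk_filter strategy (degree ranking, hub exclusion, confidence filtering) compose as shown in this hypothetical sketch. It is not the PR's aggregate.py: it assumes edges are (source, target, confidence) triples with a numeric confidence, and the `min_confidence` and `hub_degree` parameter names are illustrative.

```python
from collections import Counter
from typing import Optional

Edge = tuple[str, str, float]  # (source, target, confidence): assumed shape


def topk_filter(edges: list[Edge], k: int, min_confidence: float = 0.5,
                hub_degree: Optional[int] = None) -> tuple[list[str], list[Edge]]:
    """Hypothetical top-K-by-degree summary of one layer's graph."""
    # 1. Confidence filtering: ignore weak (e.g. guessed) relations entirely.
    trusted = [e for e in edges if e[2] >= min_confidence]

    # 2. Degree is counted on the trusted edges only.
    degree: Counter = Counter()
    for src, dst, _ in trusted:
        degree[src] += 1
        degree[dst] += 1

    # 3. Hub exclusion: nodes above hub_degree (loggers, utils) would crowd out
    #    more informative nodes, so they are dropped before ranking.
    candidates = [n for n in degree if hub_degree is None or degree[n] <= hub_degree]

    # 4. Keep the K highest-degree nodes; ties are broken by ID for determinism.
    kept = sorted(candidates, key=lambda n: (-degree[n], n))[:k]
    kept_set = set(kept)
    kept_edges = [e for e in trusted if e[0] in kept_set and e[1] in kept_set]
    return kept, kept_edges


sample = [
    ("Client", "Transport", 0.9), ("Client", "Auth", 0.8), ("Transport", "Pool", 0.9),
    ("Client", "logger", 1.0), ("Transport", "logger", 1.0),
    ("Pool", "logger", 1.0), ("Auth", "logger", 1.0),
    ("Auth", "Guess", 0.2),  # low confidence: dropped before degrees are counted
]
nodes, kept_edges = topk_filter(sample, k=3, hub_degree=3)
print(nodes)  # ['Client', 'Transport', 'Auth']
```

Without the hub cap, `logger` (degree 4) would take the first slot even though it says little about the architecture, which is the failure mode hub exclusion is meant to prevent.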
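Phase 3's routing idea can be illustrated with a simplified, hypothetical sketch; it is not the PR's query_router.py. The Layer fields, the cue-word lists, the scoring formula, and the `search` callback passed to auto_zoom are all assumptions. Each layer is scored by keyword hits plus a level-weighted bonus for overview-style or detail-style cue words; CJK terms are matched as substrings because CJK text is not space-delimited; and when a layer returns too few results, auto_zoom drills into its child layers breadth-first.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

CJK = re.compile(r"[\u4e00-\u9fff]")
# Illustrative cue words for high- vs. low-abstraction questions (assumed lists).
OVERVIEW_HINTS = {"architecture", "overview", "system", "design", "架构", "整体"}
DETAIL_HINTS = {"function", "call", "implementation", "bug", "函数", "实现"}


@dataclass
class Layer:
    id: str
    level: int                  # 0 = most detailed
    keywords: set[str]          # e.g. terms taken from node labels (assumed)
    children: list[str] = field(default_factory=list)


def _hit(term: str, query: str, words: set[str]) -> bool:
    # CJK text is not space-delimited, so CJK terms use substring matching.
    return term in query if CJK.search(term) else term.lower() in words


def route(query: str, layers: list[Layer]) -> str:
    words = set(re.findall(r"[a-z0-9_]+", query.lower()))
    overview = sum(_hit(t, query, words) for t in OVERVIEW_HINTS)
    detail = sum(_hit(t, query, words) for t in DETAIL_HINTS)
    top = max(layer.level for layer in layers) or 1  # avoid division by zero

    def score(layer: Layer) -> float:
        hits = sum(_hit(t, query, words) for t in layer.keywords)
        height = layer.level / top  # 0.0 at the bottom, 1.0 at the top
        # Level weighting: overview cues pull upward, detail cues pull downward.
        return hits + overview * height + detail * (1 - height)

    # On a tie, prefer the more detailed (lower) layer.
    return max(layers, key=lambda layer: (score(layer), -layer.level)).id


def auto_zoom(layer_id: str, search: Callable[[str], list[str]],
              layers: dict[str, Layer], min_results: int = 3) -> list[str]:
    """If a layer returns too few hits, drill down into its children breadth-first."""
    results = list(search(layer_id))
    queue, seen = list(layers[layer_id].children), {layer_id}
    while len(results) < min_results and queue:
        child = queue.pop(0)
        if child in seen:  # a DAG can reach the same child twice
            continue
        seen.add(child)
        results.extend(search(child))
        queue.extend(layers[child].children)
    return results


stack = [
    Layer("code", 0, {"auth", "login", "token", "认证"}),
    Layer("docs", 1, {"guide", "auth"}, ["code"]),
    Layer("overview", 2, {"services", "gateway"}, ["docs"]),
]
print(route("What is the system architecture?", stack))      # overview
print(route("How does the auth function call work?", stack))  # code
print(route("认证函数怎么实现", stack))                          # code

index = {"overview": ["gateway"], "docs": ["auth guide"],
         "code": ["AuthService.login", "TokenStore.get"]}
print(auto_zoom("overview", lambda lid: index[lid], {l.id: l for l in stack}))
```

Keeping the level bonus additive means a strong keyword match can still pull a query down to a detailed layer even when it contains a generic word such as "system".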

- README_TEAM.md: English documentation for hierarchical knowledge aggregation
- README_TEAM.zh-CN.md: Chinese version
- worked_team/: real-data validation with 3 corpora (example, httpx, mixed-corpus)
- layers.yaml: 3-layer config (Code → Docs → Overview)
- graphify-out/: build output with 5.3x compression from L0 to L2

Ref: safishamsi#263
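Both the layer-diff command and the reported L0-to-L2 compression come down to comparing two layer graphs. The following is a minimal, hypothetical sketch of such a structural comparison, not the PR's graph_diff() in build.py: the dict shape and result keys are assumptions, and reading "compression" as a node-count ratio is also an assumption, since the PR does not say how the 5.3x figure was measured.

```python
def graph_diff(a: dict, b: dict) -> dict:
    """Hypothetical structural diff between two layer graphs.

    Each graph is {"nodes": [node_id, ...], "edges": [(src, dst), ...]};
    edges are treated as directed pairs.
    """
    a_nodes, b_nodes = set(a["nodes"]), set(b["nodes"])
    a_edges = {tuple(e) for e in a["edges"]}
    b_edges = {tuple(e) for e in b["edges"]}
    return {
        "only_in_a_nodes": sorted(a_nodes - b_nodes),
        "only_in_b_nodes": sorted(b_nodes - a_nodes),
        "only_in_a_edges": sorted(a_edges - b_edges),
        "only_in_b_edges": sorted(b_edges - a_edges),
        # Node-count ratio a:b, e.g. how much a lower layer shrinks when summarised.
        "node_ratio": len(a_nodes) / max(len(b_nodes), 1),
    }


lower = {"nodes": ["AuthService", "TokenStore", "Client", "Pool"],
         "edges": [("Client", "Pool"), ("AuthService", "TokenStore")]}
upper = {"nodes": ["AuthService", "Client"], "edges": [("Client", "AuthService")]}
diff = graph_diff(lower, upper)
print(diff["only_in_a_nodes"], diff["node_ratio"])  # ['Pool', 'TokenStore'] 2.0
```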

- Move pyyaml from optional 'layers' extras to core dependencies
- Fix typo: 'graphifyy' -> 'graphify' in ImportError message
- Update error message for missing pyyaml (now a required dep)
- This fixes CI test failures where pyyaml was not installed

Ref: safishamsi#263

…into feature/team_version

- pyproject.toml: keep both pyyaml (ours) and tree-sitter-verilog (theirs)
- __main__.py: keep --layers query routing (ours) + upstream's try/except error handling for graph loading
- aggregate.py: fix god_nodes() key change (edges → degree) from upstream

Ref: safishamsi#263

ljinshuan · 2026-04-16T15:08:25Z

👋 Friendly reminder: this PR has been open for a while, and we've already resolved merge conflicts multiple times to keep it in sync with upstream/v4.

At this point, we'd really appreciate a decision on whether this feature aligns with the project's direction:

✅ If this is something you want, a quick review and merge would be amazing. The feature is ready and has been kept up to date through multiple rounds of conflict resolution.

❌ If this isn't the right fit, no hard feelings at all! Just let us know and we'll gladly close this PR to keep the backlog clean.

Either way, a response would be greatly appreciated so we can stop burning cycles on conflict resolution and move forward. 🙏

Thanks for your time!

#265

lijinshuan and others added 5 commits April 12, 2026 23:14

close safishamsi#265

204f946

Merge branch 'v4' into feature/team_version

a365732

ljinshuan changed the title ~~Feature/team version~~ feat:Hierarchical Knowledge Aggregation for Graphify Apr 12, 2026

lijinshuan added 2 commits April 13, 2026 11:13

rm worked_team dir

95bfca0

Merge branch 'feature/team_version' of github.com:ljinshuan/graphify into feature/team_version

86d1b36

This was referenced Apr 13, 2026

Add hierarchical clustering: L0 topics → L1 communities → L2 nodes #297

Open

feat: Hierarchical Knowledge Aggregation — layered DAG build, multi-strategy summarization, and intelligent query routing #265

Open

ljinshuan wants to merge 8 commits into safishamsi:v4 from ljinshuan:feature/team_version
