feat: Hierarchical Knowledge Aggregation for Graphify #264
Open
ljinshuan wants to merge 8 commits into safishamsi:v4 from
Phase 1 - layer-config-foundation:
- Add pyyaml as optional dependency (layers extras group)
- Create layer_config.py: LayerConfig dataclass, load_layers(), DAG validation (duplicate IDs, unknown parents, cycle detection), topological sort, level computation, LayerRegistry
- Add merge_graphs() in build.py with summary: prefix + provenance
- Add aggregate.py stub with strategy='none'

Phase 2 - aggregation-engine:
- Implement topk_filter strategy: top-K nodes by degree, hub exclusion, confidence filtering
- Implement community_collapse strategy: community detection, collapsed nodes with _collapsed/_community_id attrs, bridge edge preservation
- Implement llm_summary strategy: LLM-powered summarization with structured prompt, JSON parsing, fallback to topk_filter
- Implement composite strategy: community_collapse -> llm_summary pipeline
- Update aggregate() dispatcher for all 5 strategies

Phase 3 - query-routing:
- Create query_router.py: QueryRouter with keyword scoring, level-weighted abstraction heuristics, CJK substring matching
- Implement auto-zoom: sparse result drill-down to child layers
- Add layer_info/drill_down MCP tools in serve.py
- Add --layers/--layer/--auto-zoom flags to CLI query command
- Multi-layer mode in serve() with QueryRouter integration

Phase 4 - cli-polish:
- Add graphify layer-info --layers <path> command (table format)
- Add graphify layer-tree --layers <path> command (ASCII tree)
- Add graphify layer-diff <id1> <id2> --layers <path> command
- Add graph_diff() in build.py for structural comparison
- Save aggregation provenance as from_<parent>.json
- Parallel same-depth layer builds with ProcessPoolExecutor
- Auto-detect multi-layer mode in MCP server from output directory
- Update CLI help text with all new commands

Tests: 109 new tests across 8 test files, all passing

Ref: safishamsi#263
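For readers skimming the Phase 1 items, here is a minimal sketch of what the DAG validation, topological sort, and level computation could look like. It is not the code from layer_config.py: the `LayerConfig` fields shown, the `validate_and_sort` helper, and the convention that `parents` lists the layers a layer aggregates from (so a layer with no parents sits at level 0) are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LayerConfig:
    # Hypothetical fields; the real dataclass in layer_config.py may differ.
    id: str
    parents: list[str] = field(default_factory=list)


def validate_and_sort(layers: list[LayerConfig]) -> tuple[list[str], dict[str, int]]:
    """Return (topological order, level per layer) or raise ValueError."""
    ids = [layer.id for layer in layers]
    duplicates = {i for i in ids if ids.count(i) > 1}
    if duplicates:
        raise ValueError(f"duplicate layer IDs: {sorted(duplicates)}")

    by_id = {layer.id: layer for layer in layers}
    for layer in layers:
        unknown = [p for p in layer.parents if p not in by_id]
        if unknown:
            raise ValueError(f"layer {layer.id!r} has unknown parents: {unknown}")

    # Kahn's algorithm: a layer becomes ready once all its parents are placed.
    pending = {layer.id: len(layer.parents) for layer in layers}
    children: dict[str, list[str]] = {layer.id: [] for layer in layers}
    for layer in layers:
        for parent in layer.parents:
            children[parent].append(layer.id)

    queue = deque(i for i in ids if pending[i] == 0)
    order: list[str] = []
    level: dict[str, int] = {}
    while queue:
        current = queue.popleft()
        order.append(current)
        # Level is one more than the deepest parent; parentless layers get 0.
        level[current] = 1 + max((level[p] for p in by_id[current].parents), default=-1)
        for child in children[current]:
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)

    # Layers left unplaced can only be part of (or downstream of) a cycle.
    if len(order) != len(layers):
        raise ValueError("cycle detected among layers")
    return order, level


layers = [
    LayerConfig("overview", ["docs"]),
    LayerConfig("docs", ["code"]),
    LayerConfig("code"),
]
order, level = validate_and_sort(layers)
print(order)  # ['code', 'docs', 'overview']
print(level)  # {'code': 0, 'docs': 1, 'overview': 2}
```

Layers that share a level have no dependency on each other, which is the property that would allow same-depth builds to run in parallel.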
- README_TEAM.md: English documentation for hierarchical knowledge aggregation
- README_TEAM.zh-CN.md: Chinese version
- worked_team/: real data validation with 3 corpora (example, httpx, mixed-corpus)
- layers.yaml: 3-layer config (Code → Docs → Overview)
- graphify-out/: build output with 5.3x compression from L0 to L2

Ref: safishamsi#263
- Move pyyaml from optional 'layers' extras to core dependencies
- Fix typo: 'graphifyy' -> 'graphify' in ImportError message
- Update error message for missing pyyaml (now a required dep)
- This fixes CI test failures where pyyaml was not installed

Ref: safishamsi#263
added 2 commits
April 13, 2026 11:13
…into feature/team_version
- pyproject.toml: keep both pyyaml (ours) and tree-sitter-verilog (theirs)
- __main__.py: keep --layers query routing (ours) + upstream's try/except error handling for graph loading
- aggregate.py: fix god_nodes() key change (edges → degree) from upstream

Ref: safishamsi#263
Contributor
Author
👋 Friendly reminder: this PR has been sitting here for a while, and we've already resolved merge conflicts multiple times to keep it in sync.

At this point, we'd really appreciate a decision on whether this feature aligns with the project's direction:

✅ If this is something you want: a quick review and merge would be amazing. The feature is ready and has been kept up to date through multiple conflict resolutions.
❌ If this isn't the right fit: no hard feelings at all! Just let us know and we'll gladly close this PR to keep the backlog clean.

Either way, a response would be greatly appreciated so we can stop burning cycles on conflict resolution and move forward. 🙏 Thanks for your time!
Close #265
Hierarchical Knowledge Aggregation for Graphify
1. Problem Statement: Why Hierarchical Knowledge Aggregation?
Graphify originally supported only a single flat knowledge graph: all source files (code, docs, images) are extracted, merged into one graph.json, and served via MCP as a flat graph. This creates three core problems in large-scale projects:
Information Overload 🤯: A 500+ file project produces a graph with thousands of nodes. LLM query token budgets are wasted on irrelevant details, and key architectural information is buried.
Lack of Abstraction Levels 🏗️: Microservice architectures are naturally hierarchical (services → domains → system), but flat graphs cannot represent this. Asking "what is the system architecture?" vs. "how does the auth function call work?" requires completely different abstraction levels.
Inefficient Queries 🐌: Every query searches the entire graph and cannot use the hierarchical structure to narrow its scope.
Core Insight: Knowledge should be layered like geographic maps — bottom layers are street-level detail, upper layers are city-level overviews. Upper layer graph = own content + lower layer summary, forming a strict layered DAG.
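The rule "upper layer graph = own content + lower layer summary" can be illustrated with a small stdlib-only sketch. It is a simplified stand-in for the aggregation described in this PR, not the actual merge_graphs() or topk_filter code: the function names, the edge-list graph representation, and the `source_layer` provenance attribute are hypothetical, and the summary step here is a plain top-K-by-degree filter without hub exclusion or confidence filtering.

```python
def topk_by_degree(edges, k):
    """Keep the k highest-degree nodes and only the edges among them."""
    degree = {}
    for src, dst in edges:
        degree[src] = degree.get(src, 0) + 1
        degree[dst] = degree.get(dst, 0) + 1
    # Sort by descending degree, then by name so ties are deterministic.
    keep = set(sorted(degree, key=lambda n: (-degree[n], n))[:k])
    kept_edges = [(s, d) for s, d in edges if s in keep and d in keep]
    return keep, kept_edges


def build_upper_layer(own_nodes, own_edges, lower_edges, lower_id, k):
    """Upper graph = own content + prefixed summary of the lower layer."""
    summary_nodes, summary_edges = topk_by_degree(lower_edges, k)

    def prefixed(name):
        # The "summary:" prefix keeps summarized nodes apart from own nodes.
        return f"summary:{name}"

    # Hypothetical provenance attribute: which layer a node was summarized from.
    nodes = {n: {"source_layer": None} for n in own_nodes}
    for n in summary_nodes:
        nodes[prefixed(n)] = {"source_layer": lower_id}
    edges = list(own_edges) + [(prefixed(s), prefixed(d)) for s, d in summary_edges]
    return nodes, edges


code_edges = [
    ("auth.login", "db.query"),
    ("auth.logout", "db.query"),
    ("api.handler", "auth.login"),
    ("api.handler", "db.query"),
    ("util.fmt", "api.handler"),
]
nodes, edges = build_upper_layer(
    own_nodes=["Auth guide"],
    own_edges=[],
    lower_edges=code_edges,
    lower_id="code",
    k=3,
)
print(sorted(nodes))
# ['Auth guide', 'summary:api.handler', 'summary:auth.login', 'summary:db.query']
```

Because each upper layer only ever receives a reduced view of the layer beneath it, the graph shrinks at every step up, which is what makes the street-level versus city-level analogy work.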