
feat(codebase): wiki-engine + code knowledge extraction pipeline #54

Open
m0Nst3r873 wants to merge 4 commits into Tencent:main from m0Nst3r873:feat/codebase-extract


@m0Nst3r873 (Contributor)

Summary

Deterministic code extraction pipeline + knowledge graph building, with CLI integration.

Part 1 of 3: each PR is functionally complete and contains no dead code.

  • Vendor wiki-engine modules: interface scanner (HTTP/MQ/RPC, 5 languages), dependency path tracer, code-graph overlay, doc-graph extractor
  • codebase-extract.ts: full pipeline (collect → extract facts → build graph → write evidence pages + overview.md)
  • CLI: teamai codebase --extract [path] with --incremental, --project, --max-files
  • Output: evidence pages, overview.md, graph-index.json, router.md, index.md, gaps/detected.md
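
A minimal sketch of the pipeline shape described above (collect → extract facts → build graph → render overview), assuming toy in-memory types. `SourceFile`, `Fact`, `runPipeline` and the regex-based extractor are hypothetical illustrations, not the actual `codebase-extract.ts` API. The point is only that every stage is a pure function, so the same input always yields the same graph and overview.

```typescript
// Hypothetical types; the real wiki-engine schema (GraphIndex etc.) differs.
interface SourceFile { path: string; content: string }
interface Fact { kind: "component"; name: string; file: string }
interface GraphNode { id: string; type: "file" | "component" }
interface GraphEdge { from: string; to: string; type: "defines" }
interface Graph { nodes: GraphNode[]; edges: GraphEdge[] }

// Stage 1 (collect): keep only TypeScript sources from an in-memory list.
function collect(files: SourceFile[]): SourceFile[] {
  return files.filter((f) => f.path.endsWith(".ts"));
}

// Stage 2 (extract facts): toy regex that treats exported classes as components.
function extractFacts(files: SourceFile[]): Fact[] {
  const facts: Fact[] = [];
  for (const f of files) {
    for (const m of f.content.matchAll(/export class (\w+)/g)) {
      facts.push({ kind: "component", name: m[1], file: f.path });
    }
  }
  return facts;
}

// Stage 3 (build graph): one node per file and per component, linked by "defines".
function buildGraph(facts: Fact[]): Graph {
  const nodes = new Map<string, GraphNode>();
  const edges: GraphEdge[] = [];
  for (const fact of facts) {
    const fileId = `file:${fact.file}`;
    const compId = `component:${fact.name}`;
    nodes.set(fileId, { id: fileId, type: "file" }); // Map keys dedupe repeated files
    nodes.set(compId, { id: compId, type: "component" });
    edges.push({ from: fileId, to: compId, type: "defines" });
  }
  return { nodes: [...nodes.values()], edges };
}

// Stage 4 (render): sorted ids keep the output byte-identical across runs.
function renderOverview(graph: Graph): string {
  const components = graph.nodes
    .filter((n) => n.type === "component")
    .map((n) => n.id)
    .sort();
  return [
    "# Overview",
    "",
    `Nodes: ${graph.nodes.length}, edges: ${graph.edges.length}`,
    "",
    ...components.map((id) => `- ${id}`),
  ].join("\n");
}

function runPipeline(files: SourceFile[]): { graph: Graph; overview: string } {
  const graph = buildGraph(extractFacts(collect(files)));
  return { graph, overview: renderOverview(graph) };
}
```

Sorting before rendering is what makes an AI-free overview reproducible, in the spirit of B16.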

Quality fixes included

| Fix | Description |
|-----|-------------|
| B1 | Unified graph-index path to .indices/ (was .teamwiki/.indices/) |
| B2 | Fixed router.md wiki-links (evidence/code/ prefix) |
| B3 | Added teamwiki to safeIgnore (prevents re-scanning output) |
| B9 | Unified graph schema to GraphIndex (removed incompatible CodeGraphIndex) |
| B13 | Filter third-party npm imports from relation facts |
| B15 | Key file priority sorting in collection |
| B16 | Generate deterministic overview.md without AI |
| B17 | Rename call-chains → dependency-paths (static deps, not runtime calls) |
| B18 | Python extractor: only service-pattern functions as components |
| B19 | Facts deduplication by kind:name |
| B21 | Restrict config extraction to SCREAMING_SNAKE_CASE |
| B22 | API path detection no longer requires /v\d*/ prefix |
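
B13 and B19 are both filters over extracted facts. A minimal sketch, assuming a relation fact stores its import specifier in `name` and that only relative specifiers (`./`, `../`) count as first-party. The `Fact` shape and the helper names are hypothetical. A real filter would also need to handle tsconfig path aliases such as `@/utils`, which this sketch would wrongly drop as third-party.

```typescript
// Hypothetical fact shape; the real extractor types may differ.
interface Fact { kind: string; name: string; file: string }

// B13 (sketch): relative specifiers are first-party; bare specifiers such as
// "react" or "@scope/pkg" resolve through node_modules and are treated as third-party.
function isFirstPartyImport(specifier: string): boolean {
  return specifier.startsWith("./") || specifier.startsWith("../");
}

// B19 (sketch): keep only the first fact seen for each `${kind}:${name}` key.
function dedupeFacts(facts: Fact[]): Fact[] {
  const seen = new Set<string>();
  const out: Fact[] = [];
  for (const fact of facts) {
    const key = `${fact.kind}:${fact.name}`;
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(fact);
  }
  return out;
}

// Drop third-party relations first, then deduplicate what remains.
function cleanFacts(facts: Fact[]): Fact[] {
  const filtered = facts.filter(
    (f) => f.kind !== "relation" || isFirstPartyImport(f.name),
  );
  return dedupeFacts(filtered);
}
```

Keeping the first occurrence (rather than the last) makes the result depend only on input order, which the collector already controls.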

Test plan

  • npx tsc --noEmit — zero errors
  • npx vitest run — 1515 tests passed
  • teamai codebase --extract . produces teamwiki/evidence/code/<project>/

Dependency chain

PR 1 (this) → PR 2 (AI enrichment) → PR 3 (deep-enrich + recall)

Replaces #50. Addresses review feedback: all code is now reachable via the CLI.

jaelgeng and others added 4 commits June 26, 2026 19:31
Vendored from team-wiki by @lurkacai (git.woa.com/lurkacai/team-wiki).
Import paths adjusted for teamai-cli project structure.

Files copied (all pure deterministic, no AI dependency):
- core/graph-index.schema.ts: graph node/edge types, merge, save/load
- core/wiki-protocol.ts: wiki category/confidence types, slugify
- code-knowledge/code-collector.ts: file collection with git-aware filtering
- code-knowledge/code-extractors.ts: multi-language fact extraction dispatch
- code-knowledge/code-graph.ts: build CodeGraphIndex from facts
- code-knowledge/code-incremental.ts: detect changed files via manifest
- code-knowledge/extractors/*: TS/Python/Go/Java/Rust/Config extractors
- interface-scanner.ts: HTTP/MQ/RPC endpoint detection (5 languages)
- call-chain-tracer.ts: 4-layer call chain tracing
- code-graph-overlay.ts: directory-level architecture nodes
- doc-graph-extractor.ts: extract API/config/error nodes from docs
- manifest-schema.ts: V2 manifest types (entrypoints, responsibilities)
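
One way to detect changed files via a manifest, assuming the manifest maps each path to a SHA-256 hash of the file content. The `Manifest` shape and the function names are hypothetical; the real `code-incremental.ts` may key on mtimes, git state, or other fields.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest shape: relative path -> SHA-256 of file content.
type Manifest = Record<string, string>;

interface ChangeSet { added: string[]; modified: string[]; removed: string[] }

function hashContent(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

function buildManifest(files: Map<string, string>): Manifest {
  const manifest: Manifest = {};
  for (const [path, content] of files) manifest[path] = hashContent(content);
  return manifest;
}

// Own-property check, so paths like "constructor" are not confused with
// properties inherited from Object.prototype.
const has = (obj: Manifest, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function diffManifests(previous: Manifest, current: Manifest): ChangeSet {
  const added: string[] = [];
  const modified: string[] = [];
  const removed: string[] = [];
  for (const [path, hash] of Object.entries(current)) {
    if (!has(previous, path)) added.push(path);
    else if (previous[path] !== hash) modified.push(path);
  }
  for (const path of Object.keys(previous)) {
    if (!has(current, path)) removed.push(path);
  }
  // Sort so the change set does not depend on insertion order.
  return { added: added.sort(), modified: modified.sort(), removed: removed.sort() };
}
```

Content hashes avoid re-extracting files whose mtime changed but whose content did not, at the cost of reading every file once per run.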

Wire up vendored modules into the teamai extraction flow:

- adapters/index.ts: unified export layer for all wiki-engine modules
- adapters/templates.ts: router.md + index.md generation templates
- codebase-extract.ts: full extraction pipeline
  collectCode → extractCodeFacts → scanInterfaces → traceCallChains
  → buildEvidencePages (interfaces.md + call-chains.md)
  → buildIndexHubOverlay → mergedGraph → graph-index.json
  → buildModuleSummaries → detectKnowledgeGaps → router/index/hot/gaps
- utils/hook-output.ts: multi-tool Stop hook output formatting
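
A minimal sketch of directory-level overlay nodes, in the spirit of code-graph-overlay.ts and buildIndexHubOverlay: every ancestor directory of a collected file becomes a node, linked to its children by `contains` edges. Node ids, the edge type, and the name `buildDirectoryOverlay` are hypothetical; the real overlay may cap depth or aggregate differently.

```typescript
// Hypothetical overlay types; not the real buildIndexHubOverlay signature.
interface OverlayNode { id: string; type: "directory" | "file" }
interface OverlayEdge { from: string; to: string; type: "contains" }

function buildDirectoryOverlay(filePaths: string[]): { nodes: OverlayNode[]; edges: OverlayEdge[] } {
  const nodes = new Map<string, OverlayNode>();
  const edgeKeys = new Set<string>();
  const edges: OverlayEdge[] = [];

  // Record each parent -> child edge once, even when many files share a directory.
  const addEdge = (from: string, to: string): void => {
    const key = `${from}->${to}`;
    if (edgeKeys.has(key)) return;
    edgeKeys.add(key);
    edges.push({ from, to, type: "contains" });
  };

  // Sorted input gives a deterministic node and edge order.
  for (const filePath of [...filePaths].sort()) {
    const parts = filePath.split("/");
    let parent: string | null = null;
    // Walk every ancestor directory: "src", then "src/api", ...
    for (let i = 1; i < parts.length; i++) {
      const dirId = `dir:${parts.slice(0, i).join("/")}`;
      nodes.set(dirId, { id: dirId, type: "directory" });
      if (parent) addEdge(parent, dirId);
      parent = dirId;
    }
    const fileId = `file:${filePath}`;
    nodes.set(fileId, { id: fileId, type: "file" });
    if (parent) addEdge(parent, fileId); // top-level files have no directory parent
  }
  return { nodes: [...nodes.values()], edges };
}
```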

- interface-scanner: HTTP/MQ/RPC detection across languages (12 tests)
- call-chain-tracer: entry detection, layer classification (8 tests)
- code-graph-overlay: buildIndexHubOverlay node/edge generation (5 tests)
- doc-graph-extractor: structure + entity extraction (8 tests)
- hook-output: formatStopHookOutput multi-tool format (6 tests)

All tests use in-memory data, with no filesystem or network dependencies.

Bug fixes applied:
- B1: unify graph-index path to .indices/ (was .teamwiki/.indices/)
- B2: fix router.md links (evidence/code/ prefix)
- B3: add teamwiki to safeIgnore
- B4: remove stale .teamwiki/evidence check
- B5: use saveGraphIndex() instead of manual writeFile
- B9: unify graph schema to GraphIndex (remove CodeGraphIndex)
- B13: filter third-party npm imports from relation facts
- B15: priority sort: key files first, then shallow dirs
- B16: generate deterministic overview.md
- B17: rename call-chains to dependency-paths (not runtime calls)
- B18: Python extractor: only service-pattern functions as components
- B19: facts deduplication by kind:name
- B21: doc-graph config pattern restricted to SCREAMING_SNAKE_CASE
- B22: API path pattern no longer requires /v\d*/ prefix
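
B21 and B22 describe tightened patterns without publishing them, so the regexes below are illustrative guesses, not the ones shipped in the doc-graph extractor. Requiring at least one underscore in a config key is an assumption chosen so that acronyms such as HTTP or API are not picked up; the API pattern accepts any absolute path after an HTTP verb instead of requiring a `/v1/`-style prefix. The function names are hypothetical.

```typescript
// B21 (sketch): SCREAMING_SNAKE_CASE with at least one underscore.
const CONFIG_KEY = /\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b/g;

// B22 (sketch): HTTP verb followed by any absolute path; no /v\d*/ prefix required.
// Allowed path characters: word chars, '-', '/', ':' and braces for params.
const API_PATH = /\b(GET|POST|PUT|PATCH|DELETE)\s+(\/[\w\-\/:{}]*)/g;

// Unique, sorted config keys found in a document.
function extractConfigKeys(doc: string): string[] {
  return [...new Set(Array.from(doc.matchAll(CONFIG_KEY), (m) => m[0]))].sort();
}

// "VERB /path" strings in document order.
function extractApiPaths(doc: string): string[] {
  return Array.from(doc.matchAll(API_PATH), (m) => `${m[1]} ${m[2]}`);
}
```

`matchAll` works on a copy of the regex, so sharing the global `CONFIG_KEY` and `API_PATH` objects across calls does not leak `lastIndex` state.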

CLI integration:
- Add --extract, --incremental, --project, --max-files to codebase command
- Add extract branch to codebase-cmd.ts
- Add teamwiki/ to .gitignore
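
A sketch of how the new flags could be parsed with Node's built-in `util.parseArgs` (Node 18.3+). This is not necessarily how `codebase-cmd.ts` parses them; `parseExtractArgs` and `ExtractOptions` are hypothetical names, and rejecting a non-positive `--max-files` is an assumed validation rule.

```typescript
import { parseArgs } from "node:util";

// Hypothetical options object for the extract branch.
interface ExtractOptions {
  root: string;
  incremental: boolean;
  project?: string;
  maxFiles?: number;
}

// Returns null when --extract is absent, so other codebase subcommands can run.
function parseExtractArgs(argv: string[]): ExtractOptions | null {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the optional [path] arrives as a positional
    options: {
      extract: { type: "boolean" },
      incremental: { type: "boolean" },
      project: { type: "string" },
      "max-files": { type: "string" },
    },
  });
  if (!values.extract) return null;

  const rawMax = values["max-files"];
  const maxFiles = rawMax === undefined ? undefined : Number.parseInt(rawMax, 10);
  if (maxFiles !== undefined && (!Number.isInteger(maxFiles) || maxFiles <= 0)) {
    throw new Error(`--max-files must be a positive integer, got "${rawMax}"`);
  }
  return {
    root: positionals[0] ?? ".", // default to the current directory
    incremental: values.incremental ?? false,
    project: values.project,
    maxFiles,
  };
}
```

For example, `parseExtractArgs(["--extract", "src", "--max-files", "200"])` yields a root of `src` and a `maxFiles` of 200.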