24 changes: 17 additions & 7 deletions CLAUDE.md
@@ -47,12 +47,9 @@ The pipeline flows: **Location** → **parse** → **RawDataset** (transient)
Abstractions for where source data lives. Implementations: `LocalLocation`, `GitLocation`, `MavenLocation`, `PypiLocation`. Each implements `_make_available_on_localdisk(dst_path)` to download/copy the source to a temp dir. `LocationResolver` (`location_resolver/`) handles relative path resolution when an import's location is relative to its parent.

### Data Ingestion (`model_generators/`, `requirements_indata/`)
`CombinedRawDatasetsGenerator` is the top-level parser. It:
1. Resolves the initial location to a local temp path (`TempDirectoryUtil`)
2. Parses `requirements.yml` → `RequirementsModelGenerator` → `RequirementsData`
3. Recursively follows `imports` (other system URNs) and `implementations` (microservice URNs)
4. For each SYSTEM/MICROSERVICE source also parses: `svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML test results
5. Each parsed `RawDataset` is immediately inserted into the in-memory SQLite database via `DatabasePopulator`
`CombinedRawDatasetsGenerator` is the top-level parser. It runs in two phases:
1. **Import chain** (recursive DFS): resolves location → parses `requirements.yml` → parses all auxiliary files (`svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML) → full insert into SQLite. Follows each node's own `imports:` section recursively. Cycle detection via visited set (`CircularImportError`).
2. **Implementation chain** (recursive): follows `implementations:` sections recursively (library-uses-library model, not system→microservice). Parses all files; inserts **metadata only** for `requirements.yml` — requirement rows are excluded post-parse by `DatabaseFilterProcessor`. Cycle detection via separate visited set (`CircularImplementationError`).
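The two phases can be sketched as a pair of recursive walks. This is a minimal illustration, not the real generator API: the dict-based `graph` shape and all helper names are hypothetical, and phase 2 is shown starting from the initial URN's `implementations:` only.

```python
# Sketch of the two-phase traversal; "graph" maps
# urn -> {"imports": [...], "implementations": [...]} (illustrative shape).
def build_database(initial_urn, graph):
    inserted_full, inserted_meta = [], []
    visited_imports, visited_impls = set(), set()

    def import_chain(urn, chain):
        if urn in visited_imports:
            raise ValueError(f"circular import: {' -> '.join(chain + [urn])}")
        visited_imports.add(urn)
        # Recurse first so imported (ancestor) requirements land before
        # this node's own data -- FK constraints need them to exist.
        for child in graph.get(urn, {}).get("imports", []):
            import_chain(child, chain + [urn])
        inserted_full.append(urn)  # requirements + svcs + mvrs + annotations + tests

    def implementation_chain(urn, chain):
        if urn in visited_impls:
            raise ValueError(f"circular implementation: {' -> '.join(chain + [urn])}")
        visited_impls.add(urn)
        inserted_meta.append(urn)  # metadata only; requirement rows excluded post-parse
        for child in graph.get(urn, {}).get("implementations", []):
            implementation_chain(child, chain + [urn])

    import_chain(initial_urn, [])
    for child in graph.get(initial_urn, {}).get("implementations", []):
        implementation_chain(child, [initial_urn])
    return inserted_full, inserted_meta
```

Note the two independent visited sets: a URN may legitimately appear in both chains without that being a cycle.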

### Storage Layer (`storage/`)
In-memory SQLite is the single source of truth after parsing:
@@ -75,7 +72,7 @@ All domain objects are frozen/plain `@dataclass`s:
- `TestsData` / `TestData` — JUnit XML test results
- `CombinedRawDataset` — flat dict of all raw datasets + parsing graph (used during population and by `SemanticValidator`)

Variants (defined in `requirements.yml` metadata): `SYSTEM`, `MICROSERVICE`, `EXTERNAL`.
The `variant` field in `requirements.yml` metadata is optional and advisory (`system`, `microservice`, `external`). It is NOT a behavioral gate — parsing is presence-based. See `docs/DESIGN.md`.

### Services (`services/`)
Business logic layer querying the database via `RequirementsRepository`:
@@ -145,6 +142,19 @@ diff /tmp/baseline-report-demo.txt /tmp/feature-report-demo.txt

If a diff is expected (e.g. the PR intentionally changes output), note it in the PR description.

## Design Decisions

Key architectural decisions that affect how to read and modify this codebase.
Full rationale in `docs/DESIGN.md`.

- **Traversal is two-phase**: import chain (full insert) then implementation chain (metadata-only). Do not collapse these into a single pass.
- **Implementation chains are recursive**: a library can have its own implementations (lib-a → lib-b → lib-c). Do not treat implementations as leaf nodes.
- **`variant` is not a behavioral gate**: parsing is presence-based. Do not add `if variant == X` guards anywhere in the ingestion pipeline.
- **Implementation-child requirements are excluded post-parse**: `DatabaseFilterProcessor` deletes them via SQL after ingestion. Do not filter at ingest time.
- **Cycle detection covers both chains**: `CircularImportError` for the import chain, `CircularImplementationError` for the implementation chain.
- **FK constraints scope evidence from implementation children**: SVCs/MVRs/annotations referencing out-of-scope requirements are rejected by SQLite FK checks on insert — no explicit filtering needed.
- **Test results need explicit scoping**: no FK (keyed by FQN), so a scope check is required when inserting test results from implementation children.
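The last bullet can be sketched as a simple guard: because test results carry no FK (they are keyed by fully qualified test name), scoping has to be done in code. Function and field names here are hypothetical:

```python
def scoped_test_results(results, in_scope_fqns):
    """Keep only test results whose FQN maps to an in-scope
    requirement/SVC; no FK exists to enforce this at insert time."""
    return [r for r in results if r["fqn"] in in_scope_fqns]
```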

## Key Conventions

- **URN format**: `some:urn:string` — the separator is `:`. `UrnId` is the canonical composite key used throughout indexes.
112 changes: 112 additions & 0 deletions docs/DESIGN.md
@@ -0,0 +1,112 @@
# Design: Graph Traversal and Data Ingestion

Captures architectural decisions for how `CombinedRawDatasetsGenerator` traverses the URN graph
and what data is inserted into SQLite for each node role.

Related code: `src/reqstool/model_generators/combined_raw_datasets_generator.py`,
`src/reqstool/storage/database_filter_processor.py`

---

## The graph

A reqstool graph is a directed graph of URNs connected by two edge types:

- **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions)
- **`implementation`** — "this URN implements my requirements" (downward, toward evidence providers)

Example:

```
A1 (defines requirements)
← imported by B1
← imported by C1 (initial URN — the one being reported on)
← implemented by lib-a
← implemented by lib-b
← implemented by lib-c
```
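For illustration, the same graph can be written as two adjacency maps, one per edge type, with a small walk collecting the transitive evidence providers. This is a sketch using the URN names from the diagram, not the real traversal code:

```python
imports = {          # child -> URNs it imports (upward)
    "C1": ["B1"],
    "B1": ["A1"],
}
implementations = {  # parent -> URNs that implement it (downward)
    "C1": ["lib-a", "lib-b"],
    "lib-b": ["lib-c"],
}

def evidence_providers(urn):
    # Transitive closure over implementation edges (lib-b pulls in lib-c).
    providers = []
    for child in implementations.get(urn, []):
        providers.append(child)
        providers.extend(evidence_providers(child))
    return providers
```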

---

## Two-phase traversal

### Phase 1 — import chain (DFS, recursive)

Traverses `imports:` sections recursively. For each node, all five data types are fully inserted:
`requirements`, `svcs`, `mvrs`, `annotations`, `test_results`.

Order: depth-first so ancestors are inserted before their children. This matters for FK constraints
(SVCs reference requirements that must exist first).

Cycle detection: visited set seeded with the initial URN. `CircularImportError` raised on re-entry.
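A minimal sketch of that check, assuming a plain adjacency map of `imports:` edges. The exception mirrors `CircularImportError` in `src/reqstool/common/exceptions.py`; the walk itself is illustrative:

```python
class CircularImportError(Exception):
    def __init__(self, urn, chain):
        self.urn = urn
        super().__init__(f"Circular import detected: {' -> '.join(chain)} -> {urn}")

def walk_imports(urn, imports, visited=None, chain=()):
    # visited is seeded with the initial URN on the first call
    visited = set() if visited is None else visited
    if urn in visited:
        raise CircularImportError(urn, list(chain))
    visited.add(urn)
    for child in imports.get(urn, []):
        walk_imports(child, imports, visited, chain + (urn,))
```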

### Phase 2 — implementation chain (recursive)

Traverses `implementations:` sections recursively. Think library-uses-library, not
system→microservice. lib-a can have its own implementations (lib-b → lib-c).

For each node:

| File | Action |
|------|--------|
| `requirements.yml` | Parse fully (validation runs); insert **metadata only** — skip `insert_requirement` |
| `svcs.yml` | Insert normally — FK on `req_urn/req_id` rejects rows referencing out-of-scope requirements |
| `mvrs.yml` | Insert normally — FK on `svc_urn/svc_id` rejects rows referencing out-of-scope SVCs |
| `annotations.yml` | Insert normally — FK on `req_urn/req_id` rejects out-of-scope rows |
| test results | Insert with explicit scope check — no FK, keyed by FQN |

Cycle detection: separate visited set. `CircularImplementationError` raised on re-entry.

Note: `imports:` sections of implementation nodes are NOT followed. An implementation's own imports
point to a different requirement scope.
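The FK behavior in the table can be reproduced with a toy schema. Columns are cut down to what the constraint needs; the real tables live in `storage/`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.execute("CREATE TABLE requirements (urn TEXT, id TEXT, PRIMARY KEY (urn, id))")
conn.execute(
    "CREATE TABLE svcs (urn TEXT, id TEXT, req_urn TEXT, req_id TEXT, "
    "FOREIGN KEY (req_urn, req_id) REFERENCES requirements (urn, id))"
)

# Phase 1 inserted an in-scope requirement.
conn.execute("INSERT INTO requirements VALUES ('a1', 'REQ-001')")

# An implementation child's SVC referencing it is accepted...
conn.execute("INSERT INTO svcs VALUES ('lib-a', 'SVC-001', 'a1', 'REQ-001')")

# ...while one referencing an out-of-scope requirement trips the FK check.
rejected = False
try:
    conn.execute("INSERT INTO svcs VALUES ('lib-a', 'SVC-002', 'lib-a', 'REQ-999')")
except sqlite3.IntegrityError:
    rejected = True
```

This is why no explicit filtering is needed for SVCs/MVRs/annotations: the schema does the scoping.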

---

## Post-parse cleanup

After both phases complete, `DatabaseFilterProcessor._remove_implementation_requirements()` deletes
requirement rows for nodes that are only reachable via `implementation` edges:

```sql
DELETE FROM requirements WHERE urn IN (
SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
EXCEPT
SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
EXCEPT
SELECT value FROM metadata WHERE key = 'initial_urn'
)
```

CASCADE handles SVCs/MVRs/annotations that only linked to those deleted requirements.
SVCs/annotations that link to in-scope requirements (from Phase 1) survive.

**Why post-parse and not ingest-time?** ~30 lines in the filter processor vs ~150 lines
restructuring the generator and populator. The result is identical for an ephemeral in-memory DB.
The filter processor already runs a post-parse cleanup pass for user-defined `filters:` blocks —
adding structural cleanup there is consistent.
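The DELETE above can be exercised as a self-contained toy run, with the schema reduced to the columns the query touches (CASCADE omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requirements (urn TEXT, id TEXT);
    CREATE TABLE parsing_graph (parent_urn TEXT, child_urn TEXT, edge_type TEXT);
    CREATE TABLE metadata (key TEXT, value TEXT);

    INSERT INTO metadata VALUES ('initial_urn', 'c1');
    -- c1 imports a1; lib-a is reachable only via an implementation edge
    INSERT INTO parsing_graph VALUES ('c1', 'a1', 'import');
    INSERT INTO parsing_graph VALUES ('c1', 'lib-a', 'implementation');

    INSERT INTO requirements VALUES ('c1', 'REQ-1');
    INSERT INTO requirements VALUES ('a1', 'REQ-2');
    INSERT INTO requirements VALUES ('lib-a', 'REQ-3');

    DELETE FROM requirements WHERE urn IN (
        SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
        EXCEPT
        SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
        EXCEPT
        SELECT value FROM metadata WHERE key = 'initial_urn'
    );
""")
# lib-a's requirement row is gone; in-scope rows survive.
remaining = [row[0] for row in conn.execute("SELECT urn FROM requirements ORDER BY urn")]
```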

---

## Why recursive implementations?

The original design (pre-#324) treated implementations as leaf nodes based on the
system→microservice mental model. This was revised because:

- `variant` is no longer a behavioral gate (see #324)
- A library (`lib-a`) can depend on another library (`lib-b`) which itself has implementations
- All nodes in the implementation subtree can have annotations/tests pointing to in-scope requirements
- Flat traversal silently misses evidence from lib-b, lib-c, etc.

---

## Why `variant` is not a behavioral gate

Pre-#324, `variant: system/microservice/external` controlled which YAML sections were parsed and
which files were read. This was removed because:

- It encoded relationship role as an intrinsic property (a URN is not inherently a "microservice")
- It created a confusing 3×N matrix of allowed/disallowed sections
- It silently ignored files when the variant didn't match, causing hard-to-debug data loss
- Presence-based parsing is simpler, more predictable, and more general

`variant` remains in the schema as optional advisory metadata for display/tooling purposes.
126 changes: 75 additions & 51 deletions docs/modules/ROOT/pages/how_it_works.adoc
@@ -1,81 +1,105 @@
= How it Works

This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts].
This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts]. For detailed architectural decisions, see the link:https://github.com/reqstool/reqstool-client/blob/main/docs/DESIGN.md[DESIGN.md] in the repository.

== Template generation
== Pipeline overview

The CombinedIndexedDatasetGenerator prepares the data provided by the CombinedRawDatasetsGenerator for rendering with the Jinja2 templates, and is used by the ReportCommand and StatusCommand components.
----
Location → parse → RawDataset → INSERT into SQLite → Repository/Services → Command output
----

== Overview of central components
Each command calls `build_database()` which:

1. Parses all sources into the in-memory SQLite database (two-phase traversal, see below)
2. Applies filters (`DatabaseFilterProcessor`) — removes out-of-scope requirements and applies user-defined `filters:` blocks
3. Runs lifecycle validation (warns on DEPRECATED/OBSOLETE references)
4. Commands then query via `RequirementsRepository` and service layer

== Two-phase graph traversal

The requirement graph is a directed graph of URNs connected by two edge types:

* **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions)
* **`implementation`** — "this URN provides evidence for my requirements" (downward, toward code/tests)

`CombinedRawDatasetsGenerator` traverses this graph in two phases:

=== Phase 1 — import chain (recursive DFS, full insert)

Follows `imports:` sections recursively, depth-first. For each node, all five data types are fully inserted into SQLite: requirements, SVCs, MVRs, annotations, and test results. Cycle detection raises `CircularImportError`.

=== Phase 2 — implementation chain (recursive, metadata-only insert)

Below is a breakdown of the central components of reqstool:
Follows `implementations:` sections recursively. Think library-uses-library — lib-a can itself have implementations (lib-b → lib-c), all of which may contribute test evidence for the initial URN's requirements.

For each implementation node, `requirements.yml` is parsed (validation runs) but only the URN metadata is inserted — the requirement rows are excluded. All other files (SVCs, MVRs, annotations, test results) are inserted normally; SQLite FK constraints automatically discard rows that reference requirements outside the current scope. Cycle detection raises `CircularImplementationError`.

NOTE: `imports:` sections of implementation nodes are not followed — an implementation's own imports belong to a different scope.

== The `variant` field

`variant: system/microservice/external` in `requirements.yml` is optional advisory metadata. It is not a behavioral gate — parsing is entirely presence-based. If a file exists, it is read. If a section exists in YAML, it is parsed.
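Presence-based parsing amounts to file-existence checks, along these lines (helper name hypothetical; filenames taken from the docs above):

```python
from pathlib import Path
import tempfile

def discover_files(src: Path):
    """Presence-based: a file is parsed iff it exists; `variant` is never consulted."""
    candidates = ["requirements.yml", "svcs.yml", "mvrs.yml", "annotations.yml"]
    return [name for name in candidates if (src / name).exists()]

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "requirements.yml").write_text("metadata: {}\n")
    (src / "svcs.yml").write_text("cases: {}\n")
    found = discover_files(src)  # only the files actually present
```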

== Overview of central components

[plantuml,format=svg]
....
@startuml
!include <C4/C4_Component>

Component(StatusCommand, "StatusCommand", "Processes status command")
Component(GenerateJsonCommand, "GenerateJsonCommand", "Generates JSON from imported Models")
Component(ReportCommand, "ReportCommand", "Generates reports")
Component(SemanticValidator, "SemanticValidator", "Validates data read from source")
Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Generates imported models")
Component(reqstoolConfig, "reqstoolConfig", "Resolves paths to yaml files")
Component(CombinedIndexedDatasetGenerator, "CombinedIndexedDatasetGenerator", "Prepares data for rendering of Jinja2 templates")
Component(Command, "Command", "Handles user commands")

Rel(Command, StatusCommand, "Uses")
Rel(Command, GenerateJsonCommand, "Uses")
Rel(Command, ReportCommand, "Uses")
Rel(CombinedRawDatasetsGenerator, SemanticValidator, "Depends on")
Rel_Right(CombinedRawDatasetsGenerator, reqstoolConfig, "Uses")
Rel(StatusCommand, CombinedRawDatasetsGenerator, "Uses")
Rel(GenerateJsonCommand, CombinedRawDatasetsGenerator, "Uses")
Rel(ReportCommand, CombinedRawDatasetsGenerator, "Uses")
Rel(ReportCommand, CombinedIndexedDatasetGenerator, "Uses")
Rel(StatusCommand, CombinedIndexedDatasetGenerator, "Uses")

Rel_Down(CombinedRawDatasetsGenerator, CombinedIndexedDatasetGenerator, "Provides data to")
Component(Command, "Command", "Handles user commands (status, report, export)")
Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Two-phase graph traversal and SQLite population")
Component(DatabaseFilterProcessor, "DatabaseFilterProcessor", "Post-parse requirement/SVC filtering")
Component(RequirementsRepository, "RequirementsRepository", "Data access layer over SQLite")
Component(StatisticsService, "StatisticsService", "Computes per-requirement status and totals")
Component(SemanticValidator, "SemanticValidator", "Cross-reference validation")
Component(SQLiteDB, "SQLite (in-memory)", "Single source of truth after parsing")

Rel(Command, CombinedRawDatasetsGenerator, "build_database()")
Rel(CombinedRawDatasetsGenerator, SQLiteDB, "INSERT")
Rel(CombinedRawDatasetsGenerator, SemanticValidator, "validate_post_parsing()")
Rel(DatabaseFilterProcessor, SQLiteDB, "DELETE (filters)")
Rel(RequirementsRepository, SQLiteDB, "SELECT")
Rel(StatisticsService, RequirementsRepository, "queries")
Rel(Command, StatisticsService, "Uses")
Rel(Command, RequirementsRepository, "Uses")

@enduml
....

== Sequence diagram of the program execution
== Sequence diagram

Below is an example to illustrate how reqstool parses data from the initial source.
The sequence below illustrates how reqstool processes the `status` command against an initial source that imports a parent system.

[plantuml,format=svg]
....
@startuml
!include <C4/C4_Sequence>

Person(user, "User", "", "")

Person(user, "User")
Container(reqsTool, "reqstool")

Container_Boundary(b, "Requirement files")
Container_Boundary(b1, "MS-001")
Component(reqs, "Requirements", "Requirements.yml")
Component(svcs, "SVCS", "software_verification_cases.yml")
Component(mvrs, "MVRS", "manual_verification_results.yml")
Component(annot_impls,"Implementations", "requirements_annotations.yml")
Component(annot_tests,"Automated tests", "svcs_annotations.yml")
Boundary_End()
Container_Boundary(b2, "Ext-001")
Component(reqs_ext, "Requirements", "Requirements.yml")
Boundary_End()
Container_Boundary(phase1, "Phase 1 — import chain")
Component(initial_reqs, "initial/requirements.yml")
Component(initial_svcs, "initial/svcs.yml")
Component(parent_reqs, "parent/requirements.yml")
Boundary_End()

Container_Boundary(phase2, "Phase 2 — implementation chain")
Component(impl_reqs, "lib-a/requirements.yml")
Component(impl_svcs, "lib-a/svcs.yml")
Component(impl_tests, "lib-a/test results")
Boundary_End()

Rel(user, reqsTool, "Submit command", "bash")
Rel(reqsTool, reqs, "Reads requirements")
Rel(reqsTool, svcs, "Reads svcs")
Rel(reqsTool, mvrs, "Reads mvrs")
Rel(reqsTool, annot_impls, "Reads impls annotations")
Rel(reqsTool, annot_tests, "Reads test annotations")
Rel(reqsTool, reqsTool, "Create imported model")
Rel(reqsTool, reqs_ext, "Reads imported requirements")
Rel(reqsTool, reqsTool, "Create imported model")
Rel(reqsTool, user, "Returns combined data based on imported")
Rel(user, reqsTool, "reqstool status local -p ./initial")
Rel(reqsTool, initial_reqs, "parse + full insert")
Rel(reqsTool, initial_svcs, "parse + full insert")
Rel(reqsTool, parent_reqs, "parse + full insert (recursive)")
Rel(reqsTool, impl_reqs, "parse + metadata only")
Rel(reqsTool, impl_svcs, "parse + FK-scoped insert")
Rel(reqsTool, impl_tests, "parse + scoped insert")
Rel(reqsTool, reqsTool, "post-parse: delete impl-child requirements")
Rel(reqsTool, user, "status table (exit code = unmet requirements)")

@enduml
....
16 changes: 16 additions & 0 deletions src/reqstool/common/exceptions.py
@@ -7,3 +7,19 @@ class MissingRequirementsFileError(Exception):
def __init__(self, path: str):
self.path = path
super().__init__(f"Missing requirements file: {path}")


class CircularImportError(Exception):
"""Raised when a circular import is detected in the requirements graph."""

def __init__(self, urn: str, chain: list[str]):
self.urn = urn
super().__init__(f"Circular import detected: {' -> '.join(chain)} -> {urn}")


class CircularImplementationError(Exception):
"""Raised when a circular implementation chain is detected in the requirements graph."""

def __init__(self, urn: str, chain: list[str]):
self.urn = urn
super().__init__(f"Circular implementation detected: {' -> '.join(chain)} -> {urn}")
8 changes: 1 addition & 7 deletions src/reqstool/common/utils.py
@@ -16,7 +16,7 @@

from reqstool.common.models.urn_id import UrnId
from reqstool.models.raw_datasets import RawDataset
from reqstool.models.requirements import VARIANTS, RequirementData
from reqstool.models.requirements import RequirementData
from reqstool.models.svcs import SVCData


@@ -131,8 +131,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]:
all_svcs = {}

for model_id, model_info in raw_datasets.items():
if Utils.model_is_external(raw_datasets=model_info):
continue
if model_info.svcs_data is not None:
for svc_id, svc in model_info.svcs_data.cases.items():
if svc_id not in all_svcs:
@@ -144,10 +142,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]:
def flatten_list(list_to_flatten: Iterable) -> List[any]:
return list(chain.from_iterable(list_to_flatten))

@staticmethod
def model_is_external(raw_datasets: RawDataset) -> bool:
return raw_datasets.requirements_data.metadata.variant.value == VARIANTS.EXTERNAL.value

@staticmethod
def string_contains_delimiter(string: str, delimiter: str) -> bool:
return delimiter in string