
Commit 195eaf5

feat: remove variant as behavioral gate, recursive implementations, scoped two-phase traversal (#328)
* feat: remove variant as behavioral gate, make optional, add cycle detection (#324)

  Parsing is now presence-based instead of gated by variant:

  - All sections (imports, implementations, filters, requirements) are parsed unconditionally based on their presence in the YAML
  - variant field is optional in JSON Schema and domain model
  - Cycle detection added to import traversal (CircularImportError)
  - parsing_graph edges now carry edge_type ('import'/'implementation')
  - Filter processor uses edge_type instead of variant to skip implementation edges
  - Removed model_is_external utility and all variant assertions/match blocks

  Signed-off-by: Jimisola Laursen <jimisola@users.noreply.github.com>
  Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>

* test: add circular import test and update design doc (follow-up #324)

  - Add test_circular_import_raises with fixture data (node-a ↔ node-b cycle)
  - Add Q3 answer to PLAN_remove_variants.md (presence-based parsing for all URNs)

  Signed-off-by: Jimisola Laursen <jimisola@users.noreply.github.com>

* feat: scoped two-phase traversal, recursive implementations, post-parse cleanup (closes #73, closes #324)

  - Implementation chains are now traversed recursively (library-uses-library model), replacing the flat leaf-node approach
  - Add CircularImplementationError with cycle detection for implementation chains
  - Add DatabaseFilterProcessor._remove_implementation_requirements() to exclude implementation-child requirement rows via post-parse SQL DELETE + CASCADE
  - Add docs/DESIGN.md capturing traversal architecture and key decisions
  - Update CLAUDE.md with Design Decisions section and corrected architecture description
  - Update docs/modules/ROOT/pages/how_it_works.adoc to reflect SQLite pipeline and two-phase traversal
  - Update docs/PLAN_remove_variants.md: revise Q1/Q2, add Q4-Q6

  Signed-off-by: Jimisola Laursen <jimisola@users.noreply.github.com>

* fix: regenerate requirements_schema.py after variant made optional

  datamodel-codegen no longer needs a RootModel wrapper after the variant field became optional in 5c052a2.

* fix: remove stale .root dereference after Model flattened to BaseModel (#324)

  Commit 889ecbb regenerated requirements_schema.py, collapsing Model(RootModel[Model1]) into a flat Model(BaseModel). The generator was still calling validated.root; replace with validated directly.

  Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>

* chore: remove stale PLAN_remove_variants.md (#324)

  Design decisions are now documented in CLAUDE.md and DESIGN.md.

  Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>

Signed-off-by: Jimisola Laursen <jimisola@users.noreply.github.com>
Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>
1 parent 919649e commit 195eaf5

26 files changed

Lines changed: 470 additions & 198 deletions


CLAUDE.md

Lines changed: 17 additions & 7 deletions
@@ -47,12 +47,9 @@ The pipeline flows: **Location** → **parse** → **RawDataset** (transient)
 Abstractions for where source data lives. Implementations: `LocalLocation`, `GitLocation`, `MavenLocation`, `PypiLocation`. Each implements `_make_available_on_localdisk(dst_path)` to download/copy the source to a temp dir. `LocationResolver` (`location_resolver/`) handles relative path resolution when an import's location is relative to its parent.
 
 ### Data Ingestion (`model_generators/`, `requirements_indata/`)
-`CombinedRawDatasetsGenerator` is the top-level parser. It:
-1. Resolves the initial location to a local temp path (`TempDirectoryUtil`)
-2. Parses `requirements.yml` → `RequirementsModelGenerator` → `RequirementsData`
-3. Recursively follows `imports` (other system URNs) and `implementations` (microservice URNs)
-4. For each SYSTEM/MICROSERVICE source also parses: `svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML test results
-5. Each parsed `RawDataset` is immediately inserted into the in-memory SQLite database via `DatabasePopulator`
+`CombinedRawDatasetsGenerator` is the top-level parser. It runs in two phases:
+1. **Import chain** (recursive DFS): resolves location → parses `requirements.yml` → parses all auxiliary files (`svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML) → full insert into SQLite. Follows each node's own `imports:` section recursively. Cycle detection via visited set (`CircularImportError`).
+2. **Implementation chain** (recursive): follows `implementations:` sections recursively (library-uses-library model, not system→microservice). Parses all files; inserts **metadata only** for `requirements.yml` — requirement rows are excluded post-parse by `DatabaseFilterProcessor`. Cycle detection via separate visited set (`CircularImplementationError`).
 
 ### Storage Layer (`storage/`)
 In-memory SQLite is the single source of truth after parsing:
@@ -75,7 +72,7 @@ All domain objects are frozen/plain `@dataclass`s:
 - `TestsData` / `TestData` — JUnit XML test results
 - `CombinedRawDataset` — flat dict of all raw datasets + parsing graph (used during population and by `SemanticValidator`)
 
-Variants (defined in `requirements.yml` metadata): `SYSTEM`, `MICROSERVICE`, `EXTERNAL`.
+`variant` field in `requirements.yml` metadata is optional advisory metadata (`system`, `microservice`, `external`). It is NOT a behavioral gate — parsing is presence-based. See `docs/DESIGN.md`.
 
 ### Services (`services/`)
 Business logic layer querying the database via `RequirementsRepository`:
@@ -145,6 +142,19 @@ diff /tmp/baseline-report-demo.txt /tmp/feature-report-demo.txt
 
 If a diff is expected (e.g. the PR intentionally changes output), note it in the PR description.
 
+## Design Decisions
+
+Key architectural decisions that affect how to read and modify this codebase.
+Full rationale in `docs/DESIGN.md`.
+
+- **Traversal is two-phase**: import chain (full insert) then implementation chain (metadata-only). Do not collapse these into a single pass.
+- **Implementation chains are recursive**: a library can have its own implementations (lib-a → lib-b → lib-c). Do not treat implementations as leaf nodes.
+- **`variant` is not a behavioral gate**: parsing is presence-based. Do not add `if variant == X` guards anywhere in the ingestion pipeline.
+- **Implementation-child requirements are excluded post-parse**: `DatabaseFilterProcessor` deletes them via SQL after ingestion. Do not filter at ingest time.
+- **Cycle detection covers both chains**: `CircularImportError` for the import chain, `CircularImplementationError` for the implementation chain.
+- **FK constraints scope evidence from implementation children**: SVCs/MVRs/annotations referencing out-of-scope requirements are rejected by SQLite FK checks on insert — no explicit filtering needed.
+- **Test results need explicit scoping**: no FK (keyed by FQN), so a scope check is required when inserting test results from implementation children.
 
 ## Key Conventions
 
 - **URN format**: `some:urn:string` — the separator is `:`. `UrnId` is the canonical composite key used throughout indexes.
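The two-phase contract in the bullets above can be sketched as a small Python program. All names here (`Node`, `build`, the log entries) are hypothetical stand-ins, not the actual reqstool API; the real generator resolves locations and parses YAML where this sketch walks an in-memory tree.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    urn: str
    imports: list["Node"] = field(default_factory=list)
    implementations: list["Node"] = field(default_factory=list)

def build(initial: Node) -> list[tuple[str, str]]:
    """Return the insert log produced by the two-phase traversal."""
    log: list[tuple[str, str]] = []

    def phase1(node: Node, visited: set[str]) -> None:
        # Import chain: recursive DFS, full insert, imports before self
        # (FK targets must exist before the rows that reference them).
        if node.urn in visited:
            raise RuntimeError(f"circular import at {node.urn}")  # CircularImportError in reqstool
        visited.add(node.urn)
        for imported in node.imports:
            phase1(imported, visited)
        log.append((node.urn, "full insert"))

    def phase2(node: Node, visited: set[str]) -> None:
        # Implementation chain: recursive, metadata-only for requirements.yml.
        if node.urn in visited:
            raise RuntimeError(f"circular implementation at {node.urn}")  # CircularImplementationError
        visited.add(node.urn)
        log.append((node.urn, "metadata only"))
        for impl in node.implementations:  # recursive: lib-a -> lib-b -> lib-c
            phase2(impl, visited)
        # node.imports deliberately NOT followed: different requirement scope.

    phase1(initial, set())
    impl_visited = {initial.urn}  # separate visited set for the second phase
    for impl in initial.implementations:
        phase2(impl, impl_visited)
    return log
```

Running it on a graph where `c1` imports `b1` and is implemented by `lib-a` (which itself has `lib-b`) logs `b1` and `c1` as full inserts, then `lib-a` and `lib-b` as metadata-only.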

docs/DESIGN.md

Lines changed: 112 additions & 0 deletions
# Design: Graph Traversal and Data Ingestion

Captures architectural decisions for how `CombinedRawDatasetsGenerator` traverses the URN graph and what data is inserted into SQLite for each node role.

Related code: `src/reqstool/model_generators/combined_raw_datasets_generator.py`,
`src/reqstool/storage/database_filter_processor.py`

---

## The graph

A reqstool graph is a directed graph of URNs connected by two edge types:

- **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions)
- **`implementation`** — "this URN implements my requirements" (downward, toward evidence providers)

Example:

```
A1 (defines requirements)
  ← imported by B1
    ← imported by C1 (initial URN — the one being reported on)
      ← implemented by lib-a
        ← implemented by lib-b
          ← implemented by lib-c
```

---

## Two-phase traversal

### Phase 1 — import chain (DFS, recursive)

Traverses `imports:` sections recursively. For each node, all five data types are fully inserted: `requirements`, `svcs`, `mvrs`, `annotations`, `test_results`.

Order: depth-first so ancestors are inserted before their children. This matters for FK constraints (SVCs reference requirements that must exist first).

Cycle detection: visited set seeded with the initial URN. `CircularImportError` raised on re-entry.

### Phase 2 — implementation chain (recursive)

Traverses `implementations:` sections recursively. Think library-uses-library, not system→microservice. lib-a can have its own implementations (lib-b → lib-c).

For each node:

| File | Action |
|------|--------|
| `requirements.yml` | Parse fully (validation runs); insert **metadata only** — skip `insert_requirement` |
| `svcs.yml` | Insert normally — FK on `req_urn/req_id` rejects rows referencing out-of-scope requirements |
| `mvrs.yml` | Insert normally — FK on `svc_urn/svc_id` rejects rows referencing out-of-scope SVCs |
| `annotations.yml` | Insert normally — FK on `req_urn/req_id` rejects out-of-scope rows |
| test results | Insert with explicit scope check — no FK, keyed by FQN |

Cycle detection: separate visited set. `CircularImplementationError` raised on re-entry.

Note: `imports:` sections of implementation nodes are NOT followed. An implementation's own imports point to a different requirement scope.

---

## Post-parse cleanup

After both phases complete, `DatabaseFilterProcessor._remove_implementation_requirements()` deletes requirement rows for nodes that are only reachable via `implementation` edges:

```sql
DELETE FROM requirements WHERE urn IN (
    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
    EXCEPT
    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
    EXCEPT
    SELECT value FROM metadata WHERE key = 'initial_urn'
)
```

CASCADE handles SVCs/MVRs/annotations that only linked to those deleted requirements. SVCs/annotations that link to in-scope requirements (from Phase 1) survive.

**Why post-parse and not ingest-time?** ~30 lines in the filter processor vs ~150 lines restructuring the generator and populator. The result is identical for an ephemeral in-memory DB. The filter processor already runs a post-parse cleanup pass for user-defined `filters:` blocks — adding structural cleanup there is consistent.

---

## Why recursive implementations?

The original design (pre-#324) treated implementations as leaf nodes based on the system→microservice mental model. This was revised because:

- `variant` is no longer a behavioral gate (see #324)
- A library (`lib-a`) can depend on another library (`lib-b`) which itself has implementations
- All nodes in the implementation subtree can have annotations/tests pointing to in-scope requirements
- Flat traversal silently misses evidence from lib-b, lib-c, etc.

---

## Why `variant` is not a behavioral gate

Pre-#324, `variant: system/microservice/external` controlled which YAML sections were parsed and which files were read. This was removed because:

- It encoded relationship role as an intrinsic property (a URN is not inherently a "microservice")
- It created a confusing 3×N matrix of allowed/disallowed sections
- It silently ignored files when the variant didn't match, causing hard-to-debug data loss
- Presence-based parsing is simpler, more predictable, and more general

`variant` remains in the schema as optional advisory metadata for display/tooling purposes.
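The post-parse cleanup described above can be exercised end-to-end with Python's built-in `sqlite3`. This is a toy schema: apart from `urn`, `edge_type`, and the `initial_urn` metadata key taken from the document, the column names and table shapes are illustrative assumptions, not the real reqstool schema.

```python
import sqlite3

# Toy schema mirroring the tables named in the DELETE statement above.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE
db.executescript("""
CREATE TABLE requirements (urn TEXT, id TEXT, PRIMARY KEY (urn, id));
CREATE TABLE svcs (
    urn TEXT, id TEXT, req_urn TEXT, req_id TEXT,
    FOREIGN KEY (req_urn, req_id)
        REFERENCES requirements (urn, id) ON DELETE CASCADE
);
CREATE TABLE parsing_graph (parent_urn TEXT, child_urn TEXT, edge_type TEXT);
CREATE TABLE metadata (key TEXT, value TEXT);
""")

db.execute("INSERT INTO metadata VALUES ('initial_urn', 'c1')")
db.executemany("INSERT INTO parsing_graph VALUES (?, ?, ?)", [
    ("c1", "b1", "import"),             # imported: requirements stay in scope
    ("c1", "lib-a", "implementation"),  # implementation-only: cleaned up
])
db.executemany("INSERT INTO requirements VALUES (?, ?)", [
    ("c1", "REQ-1"), ("b1", "REQ-2"), ("lib-a", "REQ-X"),
])
# lib-a's SVC references lib-a's own (out-of-scope) requirement.
db.execute("INSERT INTO svcs VALUES ('lib-a', 'SVC-1', 'lib-a', 'REQ-X')")

# The structural cleanup query from the section above, verbatim.
db.execute("""
DELETE FROM requirements WHERE urn IN (
    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
    EXCEPT
    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
    EXCEPT
    SELECT value FROM metadata WHERE key = 'initial_urn'
)
""")

remaining = sorted(u for (u,) in db.execute("SELECT urn FROM requirements"))
svc_count = db.execute("SELECT COUNT(*) FROM svcs").fetchone()[0]
print(remaining, svc_count)  # ['b1', 'c1'] 0 -- CASCADE removed lib-a's SVC
```

Only `lib-a`'s requirement row is deleted (it is reachable solely via an `implementation` edge), and the CASCADE sweeps away the SVC that referenced it.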
docs/modules/ROOT/pages/how_it_works.adoc

Lines changed: 75 additions & 51 deletions

@@ -1,81 +1,105 @@
 = How it Works
 
-This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts].
+This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts]. For detailed architectural decisions, see the link:https://github.com/reqstool/reqstool-client/blob/main/docs/DESIGN.md[DESIGN.md] in the repository.
 
-== Template generation
+== Pipeline overview
 
-The CombinedIndexedDatasetGenerator prepares the data provided from the CombinedRawDatasetsGenerator for rendering with the Jinja2 templates and is used by the ReportCommand and the StatusCommand components.
+----
+Location → parse → RawDataset → INSERT into SQLite → Repository/Services → Command output
+----
 
-== Overview of central components
+Each command calls `build_database()` which:
+
+1. Parses all sources into the in-memory SQLite database (two-phase traversal, see below)
+2. Applies filters (`DatabaseFilterProcessor`) — removes out-of-scope requirements and applies user-defined `filters:` blocks
+3. Runs lifecycle validation (warns on DEPRECATED/OBSOLETE references)
+4. Commands then query via `RequirementsRepository` and service layer
+
+== Two-phase graph traversal
+
+The requirement graph is a directed graph of URNs connected by two edge types:
+
+* **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions)
+* **`implementation`** — "this URN provides evidence for my requirements" (downward, toward code/tests)
+
+`CombinedRawDatasetsGenerator` traverses this graph in two phases:
+
+=== Phase 1 — import chain (recursive DFS, full insert)
+
+Follows `imports:` sections recursively, depth-first. For each node, all five data types are fully inserted into SQLite: requirements, SVCs, MVRs, annotations, and test results. Cycle detection raises `CircularImportError`.
+
+=== Phase 2 — implementation chain (recursive, metadata-only insert)
 
-Below is a breakdown of the central components of reqstool:
+Follows `implementations:` sections recursively. Think library-uses-library — lib-a can itself have implementations (lib-b → lib-c), all of which may contribute test evidence for the initial URN's requirements.
+
+For each implementation node, `requirements.yml` is parsed (validation runs) but only the URN metadata is inserted — the requirement rows are excluded. All other files (SVCs, MVRs, annotations, test results) are inserted normally; SQLite FK constraints automatically discard rows that reference requirements outside the current scope. Cycle detection raises `CircularImplementationError`.
+
+NOTE: `imports:` sections of implementation nodes are not followed — an implementation's own imports belong to a different scope.
+
+== The `variant` field
+
+`variant: system/microservice/external` in `requirements.yml` is optional advisory metadata. It is not a behavioral gate — parsing is entirely presence-based. If a file exists, it is read. If a section exists in YAML, it is parsed.
+
+== Overview of central components
 
 [plantuml,format=svg]
 ....
 @startuml
 !include <C4/C4_Component>
 
-Component(StatusCommand, "StatusCommand", "Processes status command")
-Component(GenerateJsonCommand, "GenerateJsonCommand", "Generates JSON from imported Models")
-Component(ReportCommand, "ReportCommand", "Generates reports")
-Component(SemanticValidator, "SemanticValidator", "Validates data read from source")
-Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Generates imported models")
-Component(reqstoolConfig, "reqstoolConfig", "Resolves paths to yaml files")
-Component(CombinedIndexedDatasetGenerator, "CombinedIndexedDatasetGenerator", "Prepares data for rendering of Jinja2 templates")
-Component(Command, "Command", "Handles user commands")
-
-Rel(Command, StatusCommand, "Uses")
-Rel(Command, GenerateJsonCommand, "Uses")
-Rel(Command, ReportCommand, "Uses")
-Rel(CombinedRawDatasetsGenerator, SemanticValidator, "Depends on")
-Rel_Right(CombinedRawDatasetsGenerator, reqstoolConfig, "Uses")
-Rel(StatusCommand, CombinedRawDatasetsGenerator, "Uses")
-Rel(GenerateJsonCommand, CombinedRawDatasetsGenerator, "Uses")
-Rel(ReportCommand, CombinedRawDatasetsGenerator, "Uses")
-Rel(ReportCommand, CombinedIndexedDatasetGenerator, "Uses")
-Rel(StatusCommand, CombinedIndexedDatasetGenerator, "Uses")
-
-Rel_Down(CombinedRawDatasetsGenerator, CombinedIndexedDatasetGenerator, "Provides data to")
+Component(Command, "Command", "Handles user commands (status, report, export)")
+Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Two-phase graph traversal and SQLite population")
+Component(DatabaseFilterProcessor, "DatabaseFilterProcessor", "Post-parse requirement/SVC filtering")
+Component(RequirementsRepository, "RequirementsRepository", "Data access layer over SQLite")
+Component(StatisticsService, "StatisticsService", "Computes per-requirement status and totals")
+Component(SemanticValidator, "SemanticValidator", "Cross-reference validation")
+Component(SQLiteDB, "SQLite (in-memory)", "Single source of truth after parsing")
+
+Rel(Command, CombinedRawDatasetsGenerator, "build_database()")
+Rel(CombinedRawDatasetsGenerator, SQLiteDB, "INSERT")
+Rel(CombinedRawDatasetsGenerator, SemanticValidator, "validate_post_parsing()")
+Rel(DatabaseFilterProcessor, SQLiteDB, "DELETE (filters)")
+Rel(RequirementsRepository, SQLiteDB, "SELECT")
+Rel(StatisticsService, RequirementsRepository, "queries")
+Rel(Command, StatisticsService, "Uses")
+Rel(Command, RequirementsRepository, "Uses")
 
 @enduml
 ....
 
-== Sequence diagram of the program execution
+== Sequence diagram
 
-Below is an example to illustrate how reqstool parses data from the initial source.
+Below illustrates how reqstool processes the `status` command against an initial source that imports a parent system.
 
 [plantuml,format=svg]
 ....
 @startuml
 !include <C4/C4_Sequence>
 
-Person(user, "User", "", "")
-
+Person(user, "User")
 Container(reqsTool, "reqstool")
 
-Container_Boundary(b, "Requirement files")
-Container_Boundary(b1, "MS-001")
-Component(reqs, "Requirements", "Requirements.yml")
-Component(svcs, "SVCS", "software_verification_cases.yml")
-Component(mvrs, "MVRS", "manual_verification_results.yml")
-Component(annot_impls, "Implementations", "requirements_annotations.yml")
-Component(annot_tests, "Automated tests", "svcs_annotations.yml")
-Boundary_End()
-Container_Boundary(b2, "Ext-001")
-Component(reqs_ext, "Requirements", "Requirements.yml")
-Boundary_End()
+Container_Boundary(phase1, "Phase 1 — import chain")
+Component(initial_reqs, "initial/requirements.yml")
+Component(initial_svcs, "initial/svcs.yml")
+Component(parent_reqs, "parent/requirements.yml")
+Boundary_End()
+
+Container_Boundary(phase2, "Phase 2 — implementation chain")
+Component(impl_reqs, "lib-a/requirements.yml")
+Component(impl_svcs, "lib-a/svcs.yml")
+Component(impl_tests, "lib-a/test results")
 Boundary_End()
 
-Rel(user, reqsTool, "Submit command", "bash")
-Rel(reqsTool, reqs, "Reads requirements")
-Rel(reqsTool, svcs, "Reads svcs")
-Rel(reqsTool, mvrs, "Reads mvrs")
-Rel(reqsTool, annot_impls, "Reads impls annotations")
-Rel(reqsTool, annot_tests, "Reads test annotations")
-Rel(reqsTool, reqsTool, "Create imported model")
-Rel(reqsTool, reqs_ext, "Reads imported requirements")
-Rel(reqsTool, reqsTool, "Create imported model")
-Rel(reqsTool, user, "Returns combined data based on imported")
+Rel(user, reqsTool, "reqstool status local -p ./initial")
+Rel(reqsTool, initial_reqs, "parse + full insert")
+Rel(reqsTool, initial_svcs, "parse + full insert")
+Rel(reqsTool, parent_reqs, "parse + full insert (recursive)")
+Rel(reqsTool, impl_reqs, "parse + metadata only")
+Rel(reqsTool, impl_svcs, "parse + FK-scoped insert")
+Rel(reqsTool, impl_tests, "parse + scoped insert")
+Rel(reqsTool, reqsTool, "post-parse: delete impl-child requirements")
+Rel(reqsTool, user, "status table (exit code = unmet requirements)")
 
 @enduml
 ....
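The "FK constraints automatically discard out-of-scope rows" behavior from Phase 2 can be sketched with the standard `sqlite3` module. The schema is a toy, and catching `IntegrityError` per row is an assumption about the mechanism for illustration, not necessarily reqstool's exact code.

```python
import sqlite3

# Toy schema: an SVC row must reference an existing requirement.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
db.executescript("""
CREATE TABLE requirements (urn TEXT, id TEXT, PRIMARY KEY (urn, id));
CREATE TABLE svcs (
    urn TEXT, id TEXT, req_urn TEXT, req_id TEXT,
    FOREIGN KEY (req_urn, req_id) REFERENCES requirements (urn, id)
);
""")
# Phase 1 inserted the initial URN's requirement; lib-a's own REQ-X was not inserted.
db.execute("INSERT INTO requirements VALUES ('c1', 'REQ-1')")

kept, dropped = 0, 0
for row in [
    ("lib-a", "SVC-1", "c1", "REQ-1"),     # evidence for an in-scope requirement
    ("lib-a", "SVC-2", "lib-a", "REQ-X"),  # references an out-of-scope requirement
]:
    try:
        db.execute("INSERT INTO svcs VALUES (?, ?, ?, ?)", row)
        kept += 1
    except sqlite3.IntegrityError:  # FK violation: row silently discarded
        dropped += 1

print(kept, dropped)  # 1 1
```

The in-scope SVC survives and the out-of-scope one is rejected, with no variant check or explicit filter involved.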

src/reqstool/common/exceptions.py

Lines changed: 16 additions & 0 deletions
@@ -7,3 +7,19 @@ class MissingRequirementsFileError(Exception):
     def __init__(self, path: str):
         self.path = path
         super().__init__(f"Missing requirements file: {path}")
+
+
+class CircularImportError(Exception):
+    """Raised when a circular import is detected in the requirements graph."""
+
+    def __init__(self, urn: str, chain: list[str]):
+        self.urn = urn
+        super().__init__(f"Circular import detected: {' -> '.join(chain)} -> {urn}")
+
+
+class CircularImplementationError(Exception):
+    """Raised when a circular implementation chain is detected in the requirements graph."""
+
+    def __init__(self, urn: str, chain: list[str]):
+        self.urn = urn
+        super().__init__(f"Circular implementation detected: {' -> '.join(chain)} -> {urn}")

src/reqstool/common/utils.py

Lines changed: 1 addition & 7 deletions
@@ -16,7 +16,7 @@
 
 from reqstool.common.models.urn_id import UrnId
 from reqstool.models.raw_datasets import RawDataset
-from reqstool.models.requirements import VARIANTS, RequirementData
+from reqstool.models.requirements import RequirementData
 from reqstool.models.svcs import SVCData
 
 
@@ -131,8 +131,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]:
         all_svcs = {}
 
         for model_id, model_info in raw_datasets.items():
-            if Utils.model_is_external(raw_datasets=model_info):
-                continue
             if model_info.svcs_data is not None:
                 for svc_id, svc in model_info.svcs_data.cases.items():
                     if svc_id not in all_svcs:
@@ -144,10 +142,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]:
     def flatten_list(list_to_flatten: Iterable) -> List[any]:
         return list(chain.from_iterable(list_to_flatten))
 
-    @staticmethod
-    def model_is_external(raw_datasets: RawDataset) -> bool:
-        return raw_datasets.requirements_data.metadata.variant.value == VARIANTS.EXTERNAL.value
-
     @staticmethod
     def string_contains_delimiter(string: str, delimiter: str) -> bool:
         return delimiter in string
