From 5c052a237ba7de205c44f8cdc6ebc96222cf85ad Mon Sep 17 00:00:00 2001 From: Jimisola Laursen Date: Thu, 19 Mar 2026 08:49:59 +0100 Subject: [PATCH 1/6] feat: remove variant as behavioral gate, make optional, add cycle detection (#324) Parsing is now presence-based instead of gated by variant: - All sections (imports, implementations, filters, requirements) are parsed unconditionally based on their presence in the YAML - variant field is optional in JSON Schema and domain model - Cycle detection added to import traversal (CircularImportError) - parsing_graph edges now carry edge_type ('import'/'implementation') - Filter processor uses edge_type instead of variant to skip implementation edges - Removed model_is_external utility and all variant assertions/match blocks Signed-off-by: Jimisola Laursen Signed-off-by: Jimisola Laursen --- docs/PLAN_remove_variants.md | 147 ++++++++++++++++++ src/reqstool/common/exceptions.py | 8 + src/reqstool/common/utils.py | 8 +- .../combined_raw_datasets_generator.py | 93 +++++------ .../requirements_model_generator.py | 24 +-- .../models/generated/requirements_schema.py | 2 +- src/reqstool/models/raw_datasets.py | 4 +- src/reqstool/models/requirements.py | 2 +- .../schemas/v1/requirements.schema.json | 30 ---- src/reqstool/storage/database.py | 14 +- .../storage/database_filter_processor.py | 17 +- src/reqstool/storage/schema.py | 3 +- tests/unit/reqstool/storage/test_database.py | 8 +- .../storage/test_database_filter_processor.py | 2 +- .../storage/test_requirements_repository.py | 2 +- 15 files changed, 231 insertions(+), 133 deletions(-) create mode 100644 docs/PLAN_remove_variants.md diff --git a/docs/PLAN_remove_variants.md b/docs/PLAN_remove_variants.md new file mode 100644 index 00000000..f006a6ef --- /dev/null +++ b/docs/PLAN_remove_variants.md @@ -0,0 +1,147 @@ +# Open Questions: Remove `variant` as Behavioral Gate (#324) + +## Q1: Implementation import traversal + +When an implementation (microservice) is loaded, should 
its own `imports` section also be traversed? + +### Answer: No — implementations are leaf nodes (Option A) + +The traversal is directional from the initial URN's perspective: + +- **Imports = parents (upward)**: "whose requirements do I reference?" → follow recursively +- **Implementations = children (downward)**: "who implements my requirements?" → load annotations/SVCs/tests only + +An implementation's own imports point to requirements *outside* the initial URN's parent chain. +Those are a different scope — relevant only if that other system is parsed as its own initial source. + +``` +initial-source (C1) + imports/ → recurse upward (parents: B1 → A1, A4) + implementations/ → load flat, no sub-traversal + D1 (leaf — provides annotations/SVCs/tests for C1's requirements) + D1's import of C2 is IRRELEVANT from C1's perspective +``` + +--- + +## Discussion Graph + +4 layers × 4 nodes for reasoning about traversal scenarios. + +```mermaid +graph TD + subgraph A["Layer A"] + A1 + A2 + A3 + A4 + end + + subgraph B["Layer B"] + B1 + B2 + B3 + B4 + end + + subgraph C["Layer C"] + C1 + C2 + C3 + C4 + end + + subgraph D["Layer D"] + D1 + D2 + D3 + D4 + end + + A1 --> B1 + A1 --> B2 + A2 --> B2 + A2 --> B3 + A3 --> B3 + A4 --> B1 + + B1 --> C1 + B1 --> C2 + B2 --> C2 + B2 --> C3 + B3 --> C3 + B3 --> C4 + B4 --> C4 + + C1 --> D1 + C2 --> D1 + C2 --> D2 + C3 --> D2 + C3 --> D3 + C4 --> D3 + C4 --> D4 +``` + +Edges are unlabelled — overlay `import` / `implementation` semantics per scenario. + +Notable properties: +- **Shared nodes**: B2 (A1+A2), B3 (A2+A3), C2 (B1+B2), C3 (B2+B3), D1 (C1+C2), D2 (C2+C3), D3 (C3+C4) +- **Isolated paths**: A4→B1 (A4 shares B1 with A1 but has no other children); B4→C4 (B4 only reachable if explicitly listed) +- **Diamond patterns**: A1→B1→C2→D2 and A1→B2→C2→D2 (two paths to D2 via C2) + +### Scenario 1 — A1 is initial, all edges are `import` + +A1 imports B1, B2. B1 imports C1, C2. B2 imports C2, C3. Etc. 
+ +- Traversal reaches: B1, B2, C1, C2, C3, D1, D2, D3 +- C2 is reached twice (via B1 and B2) — visited-set prevents re-parsing +- **Not reached**: A2, A3, A4, B3, B4, C4, D4 + +### Scenario 2 — B2 is initial (microservice as initial source) + +A microservice CAN have imports — currently it calls `__import_systems` on them. +B2 imports C2, C3. Those recursively import D1, D2, D3. + +- Traversal reaches: C2, C3, D1, D2, D3 +- B2 has no implementations listed +- **Not reached**: anything in layer A, B1, B3, B4, C1, C4, D4 + +This scenario confirms microservices import systems and that imports must be followed regardless of who is initial. + +### Scenario 3 — A1 is initial, B4 is an `implementation` + +A1 imports B1, B2 (→ C1-C3, D1-D3 as in Scenario 1). +A1 also lists B4 as an implementation (microservice implementing A1's requirements). + +B4 is loaded as a leaf node: +- B4's annotations/SVCs/tests are checked against A1's (+ parents') requirements +- B4 may also import C4, but that import is irrelevant from A1's perspective +- C4, D4 are not reached — and shouldn't be. They are outside A1's requirement scope. + +### Scenario 4 — A2 is initial, B2/B3 are shared with A1's graph + +If A2 is parsed after A1 in a multi-root scenario: +- A2 imports B2 (already visited), B3 (new) +- B3 imports C3 (already visited), C4 (new) +- C4 imports D3 (already visited), D4 (new) + +Visited-set handles re-entry into already-parsed nodes cleanly. + +### Scenario 5 — Cycle + +Hypothetical: D1 has an import back to A1. + +- Without detection: A1→B1→C1→D1→A1→… infinite +- With visited-set on imports: after A1 is added to visited on first entry, D1→A1 triggers `CircularImportError` +- Same logic applies if the back-edge is via an implementation edge (Q2 scope question) + +--- + +## Q2: Cycle detection scope + +Where should circular import detection trigger? + +### Answer: Import chain only (Option A) + +Cycles can only occur going up the import chain (A imports B imports A). 
+Implementation edges point downward and are not recursed into, so they cannot form cycles. diff --git a/src/reqstool/common/exceptions.py b/src/reqstool/common/exceptions.py index b0662dc3..8f7b8e88 100644 --- a/src/reqstool/common/exceptions.py +++ b/src/reqstool/common/exceptions.py @@ -7,3 +7,11 @@ class MissingRequirementsFileError(Exception): def __init__(self, path: str): self.path = path super().__init__(f"Missing requirements file: {path}") + + +class CircularImportError(Exception): + """Raised when a circular import is detected in the requirements graph.""" + + def __init__(self, urn: str, chain: list[str]): + self.urn = urn + super().__init__(f"Circular import detected: {' -> '.join(chain)} -> {urn}") diff --git a/src/reqstool/common/utils.py b/src/reqstool/common/utils.py index fd2ac8b5..c7502188 100644 --- a/src/reqstool/common/utils.py +++ b/src/reqstool/common/utils.py @@ -16,7 +16,7 @@ from reqstool.common.models.urn_id import UrnId from reqstool.models.raw_datasets import RawDataset -from reqstool.models.requirements import VARIANTS, RequirementData +from reqstool.models.requirements import RequirementData from reqstool.models.svcs import SVCData @@ -131,8 +131,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]: all_svcs = {} for model_id, model_info in raw_datasets.items(): - if Utils.model_is_external(raw_datasets=model_info): - continue if model_info.svcs_data is not None: for svc_id, svc in model_info.svcs_data.cases.items(): if svc_id not in all_svcs: @@ -144,10 +142,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]: def flatten_list(list_to_flatten: Iterable) -> List[any]: return list(chain.from_iterable(list_to_flatten)) - @staticmethod - def model_is_external(raw_datasets: RawDataset) -> bool: - return raw_datasets.requirements_data.metadata.variant.value == VARIANTS.EXTERNAL.value - @staticmethod def string_contains_delimiter(string: str, delimiter: str) -> bool: return 
delimiter in string diff --git a/src/reqstool/model_generators/combined_raw_datasets_generator.py b/src/reqstool/model_generators/combined_raw_datasets_generator.py index 4b9bfda8..c50c17fd 100644 --- a/src/reqstool/model_generators/combined_raw_datasets_generator.py +++ b/src/reqstool/model_generators/combined_raw_datasets_generator.py @@ -2,11 +2,11 @@ import logging from collections import defaultdict -from typing import Dict, List, Optional +from typing import Dict, List, Optional, Set, Tuple from reqstool_python_decorators.decorators.decorators import Requirements -from reqstool.common.exceptions import MissingRequirementsFileError +from reqstool.common.exceptions import CircularImportError, MissingRequirementsFileError from reqstool.common.utils import TempDirectoryUtil, Utils from reqstool.common.validators.semantic_validator import SemanticValidator from reqstool.location_resolver.location_resolver import LocationResolver @@ -20,7 +20,7 @@ from reqstool.models.implementations import ImplementationDataInterface from reqstool.models.mvrs import MVRsData from reqstool.models.raw_datasets import CombinedRawDataset, RawDataset -from reqstool.models.requirements import VARIANTS, RequirementsData +from reqstool.models.requirements import RequirementsData from reqstool.models.svcs import SVCsData from reqstool.models.test_data import TestsData from reqstool.requirements_indata.requirements_indata import RequirementsIndata @@ -36,13 +36,12 @@ def __init__( database: Optional[RequirementsDatabase] = None, ): self.__level: int = 0 - self.__initial_source_type: VARIANTS = None self.__initial_location_handler: LocationResolver = LocationResolver( parent=None, current_unresolved=initial_location ) self.semantic_validator = semantic_validator self._parsing_order: List[str] = [] - self._parsing_graph: Dict[str, List[str]] = defaultdict(list) + self._parsing_graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list) self._database = database self.combined_raw_datasets = 
self.__generate() @@ -135,31 +134,32 @@ def __populate_test_results(self, crd: CombinedRawDataset) -> None: def __populate_parsing_graph(self, crd: CombinedRawDataset) -> None: for parent_urn, children in crd.parsing_graph.items(): - for child_urn in children: - self._database.insert_parsing_graph_edge(parent_urn, child_urn) + for child_urn, edge_type in children: + self._database.insert_parsing_graph_edge(parent_urn, child_urn, edge_type) def __handle_initial_imports(self, raw_datasets: Dict[str, RawDataset], rd: RequirementsData): - match self.__initial_source_type: - case VARIANTS.SYSTEM: - parsed_systems = self.__import_systems(raw_datasets, parent_rd=rd) - parsed_microservices = self.__import_implementations(raw_datasets, implementations=rd.implementations) - self._parsing_graph[rd.metadata.urn].extend(parsed_systems) - self._parsing_graph[rd.metadata.urn].extend(parsed_microservices) - - # add current urn as parent to all microservices - for ms_urn in parsed_microservices: - self._parsing_graph[ms_urn].append(rd.metadata.urn) - - case VARIANTS.MICROSERVICE: - parsed_systems = self.__import_systems(raw_datasets, parent_rd=rd) - self._parsing_graph[rd.metadata.urn].extend(parsed_systems) - case _: - raise RuntimeError("Unsupported initial source system type (this should not happen)") - - def __import_systems(self, raw_datasets: Dict[str, RawDataset], parent_rd: RequirementsData) -> List[str]: - if parent_rd.imports is None: + if rd.imports: + parsed_systems = self.__import_systems(raw_datasets, parent_rd=rd, visited={rd.metadata.urn}) + self._parsing_graph[rd.metadata.urn].extend([(u, "import") for u in parsed_systems]) + + if rd.implementations: + parsed_microservices = self.__import_implementations(raw_datasets, implementations=rd.implementations) + self._parsing_graph[rd.metadata.urn].extend([(u, "implementation") for u in parsed_microservices]) + for ms_urn in parsed_microservices: + self._parsing_graph[ms_urn].append((rd.metadata.urn, "implementation")) + 
+ def __import_systems( + self, + raw_datasets: Dict[str, RawDataset], + parent_rd: RequirementsData, + visited: Optional[Set[str]] = None, + ) -> List[str]: + if not parent_rd.imports: return [] + if visited is None: + visited = set() + self.__level += 1 parsed_urns: List[str] = [] @@ -167,27 +167,23 @@ def __import_systems(self, raw_datasets: Dict[str, RawDataset], parent_rd: Requi current_imported_model = self.__parse_source(current_location_handler=system) current_urn = current_imported_model.requirements_data.metadata.urn + if current_urn in visited: + raise CircularImportError(current_urn, list(visited)) + + visited.add(current_urn) + # add urn to parsing_order_list self._parsing_order.append(current_urn) parsed_urns.append(current_urn) raw_datasets[current_urn] = current_imported_model - assert ( - current_imported_model.requirements_data.metadata.variant == VARIANTS.SYSTEM - or current_imported_model.requirements_data.metadata.variant == VARIANTS.EXTERNAL + # recursively import systems + imported_systems = self.__import_systems( + raw_datasets=raw_datasets, parent_rd=current_imported_model.requirements_data, visited=visited ) - # if current source type is system or external import systems recursively - if ( - current_imported_model.requirements_data.metadata.variant == VARIANTS.SYSTEM - or current_imported_model.requirements_data.metadata.variant == VARIANTS.EXTERNAL - ): - imported_systems = self.__import_systems( - raw_datasets=raw_datasets, parent_rd=current_imported_model.requirements_data - ) - - self._parsing_graph[current_urn].extend(imported_systems) + self._parsing_graph[current_urn].extend([(u, "import") for u in imported_systems]) self.__level -= 1 @@ -243,17 +239,10 @@ def __parse_source(self, current_location_handler: LocationResolver) -> RawDatas else: logging.info(f"{requirements_indata.dst_path}") - if self.__initial_location_handler is current_location_handler: - self.__initial_source_type = rmg.requirements_data.metadata.variant - - if ( - 
rmg.requirements_data.metadata.variant == VARIANTS.SYSTEM - or rmg.requirements_data.metadata.variant == VARIANTS.MICROSERVICE - ): - # parse file sources other than requirements.yml - annotations_data, svcs_data, automated_tests, mvrs_data = self.__parse_source_other( - actual_tmp_path, requirements_indata, rmg - ) + # parse file sources other than requirements.yml + annotations_data, svcs_data, automated_tests, mvrs_data = self.__parse_source_other( + actual_tmp_path, requirements_indata, rmg + ) raw_dataset = RawDataset( requirements_data=rmg.requirements_data, @@ -309,8 +298,4 @@ def __parse_source_other( uri=requirements_indata.requirements_indata_paths.annotations_yml.path, urn=current_urn ).model - # requirement annotations (impls) - only for microservices - if rmg.requirements_data.metadata.variant != VARIANTS.MICROSERVICE: - assert not annotations_data.implementations - return annotations_data, svcs_data, automated_tests, mvrs_data diff --git a/src/reqstool/model_generators/requirements_model_generator.py b/src/reqstool/model_generators/requirements_model_generator.py index 288b318c..5a93199d 100644 --- a/src/reqstool/model_generators/requirements_model_generator.py +++ b/src/reqstool/model_generators/requirements_model_generator.py @@ -99,23 +99,11 @@ def __generate( r_requirements: Dict[str, RequirementData] = {} r_filters: Dict[str, RequirementFilter] = {} - match r_metadata.variant: - case VARIANTS.SYSTEM: - self.prefix_with_urn = False - r_imports = self.__parse_imports(validated.root) - r_filters = self.__parse_requirement_filters(data=data) - r_implementations = self.__parse_implementations(validated.root) - r_requirements = self.__parse_requirements(validated.root, data=data) - case VARIANTS.MICROSERVICE: - self.prefix_with_urn = False - r_imports = self.__parse_imports(validated.root) - r_filters = self.__parse_requirement_filters(data=data) - r_requirements = self.__parse_requirements(validated.root, data=data) - case VARIANTS.EXTERNAL: - 
self.prefix_with_urn = False - r_requirements = self.__parse_requirements(validated.root, data=data) - case _: - raise RuntimeError("Unsupported system type") + self.prefix_with_urn = False + r_imports = self.__parse_imports(validated.root) + r_filters = self.__parse_requirement_filters(data=data) + r_implementations = self.__parse_implementations(validated.root) + r_requirements = self.__parse_requirements(validated.root, data=data) return RequirementsData( metadata=r_metadata, @@ -127,7 +115,7 @@ def __generate( def __parse_metadata(self, model): r_urn: str = model.metadata.urn - r_variant: VARIANTS = VARIANTS(model.metadata.variant.value) + r_variant = VARIANTS(model.metadata.variant.value) if model.metadata.variant else None r_title: str = model.metadata.title r_url: str = model.metadata.url diff --git a/src/reqstool/models/generated/requirements_schema.py b/src/reqstool/models/generated/requirements_schema.py index a786d786..c696cc13 100644 --- a/src/reqstool/models/generated/requirements_schema.py +++ b/src/reqstool/models/generated/requirements_schema.py @@ -33,7 +33,7 @@ class Metadata(BaseModel): """ Unique resource name """ - variant: Variant + variant: Variant | None = None """ Enum of system, microservice, or external """ diff --git a/src/reqstool/models/raw_datasets.py b/src/reqstool/models/raw_datasets.py index 7d1b7117..b3a10cc3 100644 --- a/src/reqstool/models/raw_datasets.py +++ b/src/reqstool/models/raw_datasets.py @@ -1,6 +1,6 @@ # Copyright © LFV -from typing import Dict, List, Optional +from typing import Dict, List, Optional, Tuple from pydantic import BaseModel, ConfigDict, Field @@ -30,5 +30,5 @@ class CombinedRawDataset(BaseModel): initial_model_urn: str urn_parsing_order: List[str] = Field(default_factory=list) - parsing_graph: Dict[str, List[str]] = Field(default_factory=dict) + parsing_graph: Dict[str, List[Tuple[str, str]]] = Field(default_factory=dict) raw_datasets: Dict[str, RawDataset] = Field(default_factory=dict) diff --git 
a/src/reqstool/models/requirements.py b/src/reqstool/models/requirements.py index c46c326b..b1b8b334 100644 --- a/src/reqstool/models/requirements.py +++ b/src/reqstool/models/requirements.py @@ -101,7 +101,7 @@ class MetaData(BaseModel): model_config = ConfigDict(frozen=True) urn: str - variant: VARIANTS + variant: Optional[VARIANTS] = None title: str url: Optional[str] = None diff --git a/src/reqstool/resources/schemas/v1/requirements.schema.json b/src/reqstool/resources/schemas/v1/requirements.schema.json index e406b006..2a3459b6 100644 --- a/src/reqstool/resources/schemas/v1/requirements.schema.json +++ b/src/reqstool/resources/schemas/v1/requirements.schema.json @@ -30,35 +30,6 @@ "required": [ "metadata" ], - "anyOf": [ - { - "if": { - "properties": { - "metadata": { - "properties": { - "variant": { - "anyOf": [ - { - "const": "microservice" - }, - { - "const": "external" - } - ] - } - } - } - } - }, - "then": { - "not": { - "required": [ - "implementations" - ] - } - } - } - ], "$defs": { "metadata": { "type": "object", @@ -89,7 +60,6 @@ }, "required": [ "urn", - "variant", "title" ] }, diff --git a/src/reqstool/storage/database.py b/src/reqstool/storage/database.py index d46ee34c..149b0e07 100644 --- a/src/reqstool/storage/database.py +++ b/src/reqstool/storage/database.py @@ -146,16 +146,22 @@ def insert_test_result(self, urn: str, fqn: str, status: TEST_RUN_STATUS) -> Non (urn, fqn, status.value), ) - def insert_parsing_graph_edge(self, parent_urn: str, child_urn: str) -> None: + def insert_parsing_graph_edge(self, parent_urn: str, child_urn: str, edge_type: str) -> None: self._conn.execute( - "INSERT OR IGNORE INTO parsing_graph (parent_urn, child_urn) VALUES (?, ?)", - (parent_urn, child_urn), + "INSERT OR IGNORE INTO parsing_graph (parent_urn, child_urn, edge_type) VALUES (?, ?, ?)", + (parent_urn, child_urn, edge_type), ) def insert_urn_metadata(self, metadata: MetaData) -> None: self._conn.execute( "INSERT INTO urn_metadata (urn, variant, title, url, 
parse_position) VALUES (?, ?, ?, ?, ?)", - (metadata.urn, metadata.variant.value, metadata.title, metadata.url, self._next_parse_position), + ( + metadata.urn, + metadata.variant.value if metadata.variant else None, + metadata.title, + metadata.url, + self._next_parse_position, + ), ) self._next_parse_position += 1 diff --git a/src/reqstool/storage/database_filter_processor.py b/src/reqstool/storage/database_filter_processor.py index ee1a2d4a..4c1b2267 100644 --- a/src/reqstool/storage/database_filter_processor.py +++ b/src/reqstool/storage/database_filter_processor.py @@ -7,7 +7,6 @@ from reqstool.common.models.urn_id import UrnId from reqstool.filters.id_filters import IDFilters from reqstool.models.raw_datasets import RawDataset -from reqstool.models.requirements import VARIANTS from reqstool.storage.database import RequirementsDatabase from reqstool.storage.el_to_sql_compiler import ELToSQLCompiler @@ -50,8 +49,8 @@ def _process_req_filters_per_urn(self, urn: str) -> tuple[set[UrnId], set[UrnId] kept_imports: set[UrnId] = set() filtered_out_imports: set[UrnId] = set() - for import_urn in self._parsing_graph.get(urn, []): - if self._raw_datasets[import_urn].requirements_data.metadata.variant == VARIANTS.MICROSERVICE: + for import_urn, edge_type in self._parsing_graph.get(urn, []): + if edge_type == "implementation": continue kept_per_import, filtered_per_import = self._process_req_filters_per_urn(import_urn) @@ -101,8 +100,8 @@ def _process_svc_filters_per_urn(self, urn: str) -> tuple[set[UrnId], set[UrnId] kept_imports: set[UrnId] = set() filtered_out_imports: set[UrnId] = set() - for import_urn in self._parsing_graph.get(urn, []): - if self._raw_datasets[import_urn].requirements_data.metadata.variant == VARIANTS.MICROSERVICE: + for import_urn, edge_type in self._parsing_graph.get(urn, []): + if edge_type == "implementation": continue kept_per_import, filtered_per_import = self._process_svc_filters_per_urn(import_urn) @@ -243,13 +242,13 @@ def 
_check_filter_refs(self, id_filter: IDFilters, accessible: set[UrnId]) -> No if uid not in accessible: logger.warning(f"Cannot exclude: {uid} does not exist or is not accessible") - def _load_parsing_graph(self) -> dict[str, list[str]]: - graph: dict[str, list[str]] = {} - rows = self._db.connection.execute("SELECT parent_urn, child_urn FROM parsing_graph").fetchall() + def _load_parsing_graph(self) -> dict[str, list[tuple[str, str]]]: + graph: dict[str, list[tuple[str, str]]] = {} + rows = self._db.connection.execute("SELECT parent_urn, child_urn, edge_type FROM parsing_graph").fetchall() # Initialize all URNs as keys (including leaves with no children) all_urns = {row["urn"] for row in self._db.connection.execute("SELECT urn FROM urn_metadata").fetchall()} for urn in all_urns: graph[urn] = [] for row in rows: - graph.setdefault(row["parent_urn"], []).append(row["child_urn"]) + graph.setdefault(row["parent_urn"], []).append((row["child_urn"], row["edge_type"])) return graph diff --git a/src/reqstool/storage/schema.py b/src/reqstool/storage/schema.py index add89de4..197d4d77 100644 --- a/src/reqstool/storage/schema.py +++ b/src/reqstool/storage/schema.py @@ -114,12 +114,13 @@ CREATE TABLE IF NOT EXISTS parsing_graph ( parent_urn TEXT NOT NULL, child_urn TEXT NOT NULL, + edge_type TEXT NOT NULL CHECK (edge_type IN ('import', 'implementation')), PRIMARY KEY (parent_urn, child_urn) ); CREATE TABLE IF NOT EXISTS urn_metadata ( urn TEXT NOT NULL PRIMARY KEY, - variant TEXT NOT NULL CHECK (variant IN ('system', 'microservice', 'external')), + variant TEXT CHECK (variant IN ('system', 'microservice', 'external')), title TEXT NOT NULL, url TEXT, parse_position INTEGER NOT NULL UNIQUE diff --git a/tests/unit/reqstool/storage/test_database.py b/tests/unit/reqstool/storage/test_database.py index 960c0810..72210d10 100644 --- a/tests/unit/reqstool/storage/test_database.py +++ b/tests/unit/reqstool/storage/test_database.py @@ -254,8 +254,8 @@ def test_insert_test_result(db): 
def test_insert_parsing_graph_edge(db): - db.insert_parsing_graph_edge("sys-001", "ms-001") - db.insert_parsing_graph_edge("sys-001", "ms-002") + db.insert_parsing_graph_edge("sys-001", "ms-001", "implementation") + db.insert_parsing_graph_edge("sys-001", "ms-002", "implementation") rows = db.connection.execute("SELECT child_urn FROM parsing_graph WHERE parent_urn = 'sys-001'").fetchall() children = {row["child_urn"] for row in rows} @@ -263,8 +263,8 @@ def test_insert_parsing_graph_edge(db): def test_insert_duplicate_parsing_graph_edge_ignored(db): - db.insert_parsing_graph_edge("sys-001", "ms-001") - db.insert_parsing_graph_edge("sys-001", "ms-001") + db.insert_parsing_graph_edge("sys-001", "ms-001", "implementation") + db.insert_parsing_graph_edge("sys-001", "ms-001", "implementation") count = db.connection.execute("SELECT COUNT(*) FROM parsing_graph").fetchone()[0] assert count == 1 diff --git a/tests/unit/reqstool/storage/test_database_filter_processor.py b/tests/unit/reqstool/storage/test_database_filter_processor.py index 57c74534..397f4b90 100644 --- a/tests/unit/reqstool/storage/test_database_filter_processor.py +++ b/tests/unit/reqstool/storage/test_database_filter_processor.py @@ -66,7 +66,7 @@ def _setup_db_with_raw_datasets(raw_datasets, parsing_graph, initial_urn): for parent, children in parsing_graph.items(): for child in children: - db.insert_parsing_graph_edge(parent, child) + db.insert_parsing_graph_edge(parent, child, "import") return db diff --git a/tests/unit/reqstool/storage/test_requirements_repository.py b/tests/unit/reqstool/storage/test_requirements_repository.py index 57ae942c..155b4d8e 100644 --- a/tests/unit/reqstool/storage/test_requirements_repository.py +++ b/tests/unit/reqstool/storage/test_requirements_repository.py @@ -99,7 +99,7 @@ def test_get_import_graph(db): _setup_metadata(db, "ms-001") m2 = MetaData(urn="sys-001", variant=VARIANTS.SYSTEM, title="System") db.insert_urn_metadata(m2) - db.insert_parsing_graph_edge("ms-001", 
"sys-001") + db.insert_parsing_graph_edge("ms-001", "sys-001", "import") db.commit() repo = RequirementsRepository(db) From 039f9c40be04d8eacfcb577fa59f81866f147d4b Mon Sep 17 00:00:00 2001 From: Jimisola Laursen Date: Thu, 19 Mar 2026 13:16:16 +0100 Subject: [PATCH 2/6] test: add circular import test and update design doc (follow-up #324) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add test_circular_import_raises with fixture data (node-a ↔ node-b cycle) - Add Q3 answer to PLAN_remove_variants.md (presence-based parsing for all URNs) Signed-off-by: Jimisola Laursen --- docs/PLAN_remove_variants.md | 10 ++++++++++ .../test_circular_import/node-a/requirements.yml | 7 +++++++ .../test_circular_import/node-b/requirements.yml | 7 +++++++ .../test_combined_raw_datasets_generator.py | 14 +++++++++++++- 4 files changed, 37 insertions(+), 1 deletion(-) create mode 100644 tests/resources/test_data/data/local/test_circular_import/node-a/requirements.yml create mode 100644 tests/resources/test_data/data/local/test_circular_import/node-b/requirements.yml diff --git a/docs/PLAN_remove_variants.md b/docs/PLAN_remove_variants.md index f006a6ef..4fb2aa99 100644 --- a/docs/PLAN_remove_variants.md +++ b/docs/PLAN_remove_variants.md @@ -145,3 +145,13 @@ Where should circular import detection trigger? Cycles can only occur going up the import chain (A imports B imports A). Implementation edges point downward and are not recursed into, so they cannot form cycles. + +--- + +## Q3: Should reqs, SVCs, MVRs, annotations, and test results be parsed for ALL URNs? + +**Answer: Yes — presence-based, regardless of role.** + +Current `main` already parses all auxiliary files unconditionally. If a URN provides `svcs.yml`, +it gets parsed whether the URN is the initial source, an import parent, or an implementation child. +The filter processor and reporting layer decide what's relevant for the initial URN's scope. 
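
The presence-based rule in Q3 together with the visited-set behavior from Scenarios 1, 4, and 5 can be sketched as follows. This is an illustrative sketch, not reqstool's actual traversal: `traverse_imports` and the plain-dict graph format are assumptions; only `CircularImportError` mirrors the exception added in `src/reqstool/common/exceptions.py`. Note the sketch distinguishes a back-edge on the current DFS path (a real cycle, Scenario 5) from re-entry into an already-parsed node via a second parent (a diamond, Scenarios 1 and 4), which must be skipped rather than rejected.

```python
class CircularImportError(Exception):
    """Mirrors the exception added to reqstool.common.exceptions."""

    def __init__(self, urn: str, chain: list[str]):
        self.urn = urn
        super().__init__(f"Circular import detected: {' -> '.join(chain)} -> {urn}")


def traverse_imports(urn, graph, visited=None, path=None):
    """DFS over import edges; `graph` maps urn -> list of imported urns."""
    if visited is None:
        visited = set()
    if path is None:
        path = []
    visited.add(urn)
    path.append(urn)
    for child in graph.get(urn, []):
        if child in path:
            # Back-edge into the active chain: genuine cycle (Scenario 5).
            raise CircularImportError(child, list(path))
        if child in visited:
            # Diamond: already parsed via another parent (Scenarios 1/4) — skip.
            continue
        traverse_imports(child, graph, visited, path)
    path.pop()
    return visited


# Diamond (A1 -> B1 -> C2 and A1 -> B2 -> C2): C2 parsed once, no error.
diamond = {"A1": ["B1", "B2"], "B1": ["C2"], "B2": ["C2"]}
print(sorted(traverse_imports("A1", diamond)))  # ['A1', 'B1', 'B2', 'C2']

# Cycle (node-a <-> node-b, as in the test fixture): raises.
try:
    traverse_imports("node-a", {"node-a": ["node-b"], "node-b": ["node-a"]})
except CircularImportError as e:
    print(e)  # Circular import detected: node-a -> node-b -> node-a
```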
diff --git a/tests/resources/test_data/data/local/test_circular_import/node-a/requirements.yml b/tests/resources/test_data/data/local/test_circular_import/node-a/requirements.yml new file mode 100644 index 00000000..79663aa3 --- /dev/null +++ b/tests/resources/test_data/data/local/test_circular_import/node-a/requirements.yml @@ -0,0 +1,7 @@ +metadata: + urn: node-a + title: Node A Requirements + +imports: + local: + - path: ../node-b diff --git a/tests/resources/test_data/data/local/test_circular_import/node-b/requirements.yml b/tests/resources/test_data/data/local/test_circular_import/node-b/requirements.yml new file mode 100644 index 00000000..4342a8a2 --- /dev/null +++ b/tests/resources/test_data/data/local/test_circular_import/node-b/requirements.yml @@ -0,0 +1,7 @@ +metadata: + urn: node-b + title: Node B Requirements + +imports: + local: + - path: ../node-a diff --git a/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py b/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py index f64ede7d..1e7dbc7e 100644 --- a/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py +++ b/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py @@ -3,7 +3,7 @@ import pytest from reqstool_python_decorators.decorators.decorators import SVCs -from reqstool.common.exceptions import MissingRequirementsFileError +from reqstool.common.exceptions import CircularImportError, MissingRequirementsFileError from reqstool.common.validator_error_holder import ValidationErrorHolder from reqstool.common.validators.semantic_validator import SemanticValidator from reqstool.locations.local_location import LocalLocation @@ -84,3 +84,15 @@ def test_missing_requirements_file(local_testdata_resources_rootdir_w_path): semantic_validator=semantic_validator, ) assert "this/path/does/not/have/a/requirements/file" in str(excinfo.value) + + +@SVCs("SVC_020") +def 
test_circular_import_raises(local_testdata_resources_rootdir_w_path): + semantic_validator = SemanticValidator(validation_error_holder=ValidationErrorHolder()) + with pytest.raises(CircularImportError) as excinfo: + combined_raw_datasets_generator.CombinedRawDatasetsGenerator( + initial_location=LocalLocation(path=local_testdata_resources_rootdir_w_path("test_circular_import/node-a")), + semantic_validator=semantic_validator, + ) + assert "node-a" in str(excinfo.value) + assert "Circular import detected" in str(excinfo.value) From 3d37f0fc266a4152da01bf7eb06a6b129a4ed850 Mon Sep 17 00:00:00 2001 From: Jimisola Laursen Date: Thu, 19 Mar 2026 22:48:37 +0100 Subject: [PATCH 3/6] feat: scoped two-phase traversal, recursive implementations, post-parse cleanup (closes #73, closes #324) - Implementation chains are now traversed recursively (library-uses-library model) replacing the flat leaf-node approach - Add CircularImplementationError with cycle detection for implementation chains - Add DatabaseFilterProcessor._remove_implementation_requirements() to exclude implementation-child requirement rows via post-parse SQL DELETE + CASCADE - Add docs/DESIGN.md capturing traversal architecture and key decisions - Update CLAUDE.md with Design Decisions section and corrected architecture description - Update docs/modules/ROOT/pages/how_it_works.adoc to reflect SQLite pipeline and two-phase traversal - Update docs/PLAN_remove_variants.md: revise Q1/Q2, add Q4-Q6 Signed-off-by: Jimisola Laursen --- CLAUDE.md | 24 +++- docs/DESIGN.md | 112 ++++++++++++++++ docs/PLAN_remove_variants.md | 95 ++++++++++--- docs/modules/ROOT/pages/how_it_works.adoc | 126 +++++++++++------- src/reqstool/common/exceptions.py | 8 ++ .../combined_raw_datasets_generator.py | 19 ++- .../storage/database_filter_processor.py | 23 ++++ .../test_circular_impl/lib-a/requirements.yml | 7 + .../test_circular_impl/lib-b/requirements.yml | 7 + .../lib-a/requirements.yml | 15 +++ .../lib-b/requirements.yml | 15 +++ 
.../lib-c/requirements.yml | 11 ++ .../test_recursive_impl/root/requirements.yml | 15 +++ .../test_combined_raw_datasets_generator.py | 37 ++++- 14 files changed, 435 insertions(+), 79 deletions(-) create mode 100644 docs/DESIGN.md create mode 100644 tests/resources/test_data/data/local/test_circular_impl/lib-a/requirements.yml create mode 100644 tests/resources/test_data/data/local/test_circular_impl/lib-b/requirements.yml create mode 100644 tests/resources/test_data/data/local/test_recursive_impl/lib-a/requirements.yml create mode 100644 tests/resources/test_data/data/local/test_recursive_impl/lib-b/requirements.yml create mode 100644 tests/resources/test_data/data/local/test_recursive_impl/lib-c/requirements.yml create mode 100644 tests/resources/test_data/data/local/test_recursive_impl/root/requirements.yml diff --git a/CLAUDE.md b/CLAUDE.md index a5240eba..e9a94e14 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -47,12 +47,9 @@ The pipeline flows: **Location** → **parse** → **RawDataset** (transient) Abstractions for where source data lives. Implementations: `LocalLocation`, `GitLocation`, `MavenLocation`, `PypiLocation`. Each implements `_make_available_on_localdisk(dst_path)` to download/copy the source to a temp dir. `LocationResolver` (`location_resolver/`) handles relative path resolution when an import's location is relative to its parent. ### Data Ingestion (`model_generators/`, `requirements_indata/`) -`CombinedRawDatasetsGenerator` is the top-level parser. It: -1. Resolves the initial location to a local temp path (`TempDirectoryUtil`) -2. Parses `requirements.yml` → `RequirementsModelGenerator` → `RequirementsData` -3. Recursively follows `imports` (other system URNs) and `implementations` (microservice URNs) -4. For each SYSTEM/MICROSERVICE source also parses: `svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML test results -5. 
Each parsed `RawDataset` is immediately inserted into the in-memory SQLite database via `DatabasePopulator` +`CombinedRawDatasetsGenerator` is the top-level parser. It runs in two phases: +1. **Import chain** (recursive DFS): resolves location → parses `requirements.yml` → parses all auxiliary files (`svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML) → full insert into SQLite. Follows each node's own `imports:` section recursively. Cycle detection via visited set (`CircularImportError`). +2. **Implementation chain** (recursive): follows `implementations:` sections recursively (library-uses-library model, not system→microservice). Parses all files; inserts **metadata only** for `requirements.yml` — requirement rows are excluded post-parse by `DatabaseFilterProcessor`. Cycle detection via separate visited set (`CircularImplementationError`). ### Storage Layer (`storage/`) In-memory SQLite is the single source of truth after parsing: @@ -75,7 +72,7 @@ All domain objects are frozen/plain `@dataclass`s: - `TestsData` / `TestData` — JUnit XML test results - `CombinedRawDataset` — flat dict of all raw datasets + parsing graph (used during population and by `SemanticValidator`) -Variants (defined in `requirements.yml` metadata): `SYSTEM`, `MICROSERVICE`, `EXTERNAL`. +`variant` field in `requirements.yml` metadata is optional advisory metadata (`system`, `microservice`, `external`). It is NOT a behavioral gate — parsing is presence-based. See `docs/DESIGN.md`. ### Services (`services/`) Business logic layer querying the database via `RequirementsRepository`: @@ -145,6 +142,19 @@ diff /tmp/baseline-report-demo.txt /tmp/feature-report-demo.txt If a diff is expected (e.g. the PR intentionally changes output), note it in the PR description. +## Design Decisions + +Key architectural decisions that affect how to read and modify this codebase. +Full rationale in `docs/DESIGN.md`. 
+ +- **Traversal is two-phase**: import chain (full insert) then implementation chain (metadata-only). Do not collapse these into a single pass. +- **Implementation chains are recursive**: a library can have its own implementations (lib-a → lib-b → lib-c). Do not treat implementations as leaf nodes. +- **`variant` is not a behavioral gate**: parsing is presence-based. Do not add `if variant == X` guards anywhere in the ingestion pipeline. +- **Implementation-child requirements are excluded post-parse**: `DatabaseFilterProcessor` deletes them via SQL after ingestion. Do not filter at ingest time. +- **Cycle detection covers both chains**: `CircularImportError` for the import chain, `CircularImplementationError` for the implementation chain. +- **FK constraints scope evidence from implementation children**: SVCs/MVRs/annotations referencing out-of-scope requirements are rejected by SQLite FK checks on insert — no explicit filtering needed. +- **Test results need explicit scoping**: no FK (keyed by FQN), so a scope check is required when inserting test results from implementation children. + ## Key Conventions - **URN format**: `some:urn:string` — the separator is `:`. `UrnId` is the canonical composite key used throughout indexes. diff --git a/docs/DESIGN.md b/docs/DESIGN.md new file mode 100644 index 00000000..96e05f99 --- /dev/null +++ b/docs/DESIGN.md @@ -0,0 +1,112 @@ +# Design: Graph Traversal and Data Ingestion + +Captures architectural decisions for how `CombinedRawDatasetsGenerator` traverses the URN graph +and what data is inserted into SQLite for each node role. 
+ +Related code: `src/reqstool/model_generators/combined_raw_datasets_generator.py`, +`src/reqstool/storage/database_filter_processor.py` + +--- + +## The graph + +A reqstool graph is a directed graph of URNs connected by two edge types: + +- **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions) +- **`implementation`** — "this URN implements my requirements" (downward, toward evidence providers) + +Example: + +``` +A1 (defines requirements) + ← imported by B1 + ← imported by C1 (initial URN — the one being reported on) + ← implemented by lib-a + ← implemented by lib-b + ← implemented by lib-c +``` + +--- + +## Two-phase traversal + +### Phase 1 — import chain (DFS, recursive) + +Traverses `imports:` sections recursively. For each node, all five data types are fully inserted: +`requirements`, `svcs`, `mvrs`, `annotations`, `test_results`. + +Order: depth-first so ancestors are inserted before their children. This matters for FK constraints +(SVCs reference requirements that must exist first). + +Cycle detection: visited set seeded with the initial URN. `CircularImportError` raised on re-entry. + +### Phase 2 — implementation chain (recursive) + +Traverses `implementations:` sections recursively. Think library-uses-library, not +system→microservice. lib-a can have its own implementations (lib-b → lib-c). + +For each node: + +| File | Action | +|------|--------| +| `requirements.yml` | Parse fully (validation runs); insert **metadata only** — skip `insert_requirement` | +| `svcs.yml` | Insert normally — FK on `req_urn/req_id` rejects rows referencing out-of-scope requirements | +| `mvrs.yml` | Insert normally — FK on `svc_urn/svc_id` rejects rows referencing out-of-scope SVCs | +| `annotations.yml` | Insert normally — FK on `req_urn/req_id` rejects out-of-scope rows | +| test results | Insert with explicit scope check — no FK, keyed by FQN | + +Cycle detection: separate visited set. 
`CircularImplementationError` raised on re-entry. + +Note: `imports:` sections of implementation nodes are NOT followed. An implementation's own imports +point to a different requirement scope. + +--- + +## Post-parse cleanup + +After both phases complete, `DatabaseFilterProcessor._remove_implementation_requirements()` deletes +requirement rows for nodes that are only reachable via `implementation` edges: + +```sql +DELETE FROM requirements WHERE urn IN ( + SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation' + EXCEPT + SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import' + EXCEPT + SELECT value FROM metadata WHERE key = 'initial_urn' +) +``` + +CASCADE handles SVCs/MVRs/annotations that only linked to those deleted requirements. +SVCs/annotations that link to in-scope requirements (from Phase 1) survive. + +**Why post-parse and not ingest-time?** ~30 lines in the filter processor vs ~150 lines +restructuring the generator and populator. The result is identical for an ephemeral in-memory DB. +The filter processor already runs a post-parse cleanup pass for user-defined `filters:` blocks — +adding structural cleanup there is consistent. + +--- + +## Why recursive implementations? + +The original design (pre-#324) treated implementations as leaf nodes based on the +system→microservice mental model. This was revised because: + +- `variant` is no longer a behavioral gate (see #324) +- A library (`lib-a`) can depend on another library (`lib-b`) which itself has implementations +- All nodes in the implementation subtree can have annotations/tests pointing to in-scope requirements +- Flat traversal silently misses evidence from lib-b, lib-c, etc. + +--- + +## Why `variant` is not a behavioral gate + +Pre-#324, `variant: system/microservice/external` controlled which YAML sections were parsed and +which files were read. 
This was removed because: + +- It encoded relationship role as an intrinsic property (a URN is not inherently a "microservice") +- It created a confusing 3×N matrix of allowed/disallowed sections +- It silently ignored files when the variant didn't match, causing hard-to-debug data loss +- Presence-based parsing is simpler, more predictable, and more general + +`variant` remains in the schema as optional advisory metadata for display/tooling purposes. diff --git a/docs/PLAN_remove_variants.md b/docs/PLAN_remove_variants.md index 4fb2aa99..1a2e3dd6 100644 --- a/docs/PLAN_remove_variants.md +++ b/docs/PLAN_remove_variants.md @@ -1,25 +1,26 @@ # Open Questions: Remove `variant` as Behavioral Gate (#324) -## Q1: Implementation import traversal +## Q1: Implementation traversal depth -When an implementation (microservice) is loaded, should its own `imports` section also be traversed? +Should `implementations:` sections be traversed recursively, or are implementation nodes leaves? -### Answer: No — implementations are leaf nodes (Option A) +### Answer: Recursive (revised — library-uses-library model) -The traversal is directional from the initial URN's perspective: +Initial assumption was "leaf nodes" based on a system→microservice mental model. This was revised +after removing `variant` as a behavioral gate. -- **Imports = parents (upward)**: "whose requirements do I reference?" → follow recursively -- **Implementations = children (downward)**: "who implements my requirements?" → load annotations/SVCs/tests only +Think library-uses-library: lib-a can have its own implementations (lib-b → lib-c). All nodes in +the implementation subtree can have annotations and test results pointing to in-scope requirements. +Flat traversal silently misses their evidence. -An implementation's own imports point to requirements *outside* the initial URN's parent chain. -Those are a different scope — relevant only if that other system is parsed as its own initial source. 
+The one constraint that remains: `imports:` sections of implementation nodes are NOT followed. +An implementation's own imports point to a different requirement scope. ``` initial-source (C1) - imports/ → recurse upward (parents: B1 → A1, A4) - implementations/ → load flat, no sub-traversal - D1 (leaf — provides annotations/SVCs/tests for C1's requirements) - D1's import of C2 is IRRELEVANT from C1's perspective + imports/ → recurse upward (parents: B1 → A1, A4) — full insert + implementations/ → recurse downward (lib-a → lib-b → lib-c) — metadata-only insert + lib-a's own imports: NOT followed (different scope) ``` --- @@ -139,12 +140,17 @@ Hypothetical: D1 has an import back to A1. ## Q2: Cycle detection scope -Where should circular import detection trigger? +Where should circular dependency detection trigger? -### Answer: Import chain only (Option A) +### Answer: Both import and implementation chains (revised) -Cycles can only occur going up the import chain (A imports B imports A). -Implementation edges point downward and are not recursed into, so they cannot form cycles. +Originally: import chain only, since implementations were leaves. + +After revising Q1 to recursive implementations: implementation chains can also cycle +(lib-a → lib-b → lib-a). Both chains need independent visited sets and raise distinct errors: + +- `CircularImportError` — detected in `__import_systems` +- `CircularImplementationError` — detected in `__import_implementations` --- @@ -152,6 +158,57 @@ Implementation edges point downward and are not recursed into, so they cannot fo **Answer: Yes — presence-based, regardless of role.** -Current `main` already parses all auxiliary files unconditionally. If a URN provides `svcs.yml`, -it gets parsed whether the URN is the initial source, an import parent, or an implementation child. -The filter processor and reporting layer decide what's relevant for the initial URN's scope. 
+All auxiliary files are parsed for every node based purely on file presence. The insertion rules +differ by phase (see Q4), but parsing always runs — including validation. + +--- + +## Q4: What data is inserted for implementation children? + +**Answer: Metadata only for requirements; all other files via FK-scoped insert.** + +| File | Import parent | Implementation child | +|------|--------------|----------------------| +| `requirements.yml` | full insert | metadata only (skip `insert_requirement`) | +| `svcs.yml` | full insert | insert — FK rejects rows referencing out-of-scope requirements | +| `mvrs.yml` | full insert | insert — FK rejects rows referencing out-of-scope SVCs | +| `annotations.yml` | full insert | insert — FK rejects rows referencing out-of-scope requirements | +| test results | full insert | insert with explicit scope check (no FK, keyed by FQN) | + +Validation still runs on `requirements.yml` for all nodes. Syntax errors in an implementation +child's file still surface — only the DB insertion is skipped. + +--- + +## Q5: How are implementation-child requirements excluded from the final scope? + +**Answer: Post-parse DELETE in `DatabaseFilterProcessor`.** + +`_remove_implementation_requirements()` runs at the start of `apply_filters()`: + +```sql +DELETE FROM requirements WHERE urn IN ( + SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation' + EXCEPT + SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import' + EXCEPT + SELECT value FROM metadata WHERE key = 'initial_urn' +) +``` + +CASCADE handles SVCs/MVRs/annotations that only linked to those deleted requirements. +Evidence rows linking to in-scope requirements (from Phase 1) survive. + +--- + +## Q6: Why post-parse and not ingest-time? 
+ +**Answer: ~30 lines vs ~150 lines; identical result for an ephemeral in-memory DB.** + +Ingest-time filtering would require restructuring `CombinedRawDatasetsGenerator` into two explicit +phases with scope-aware population logic (~150 lines across 3–4 files). Post-parse is a single SQL +DELETE in the filter processor (~30 lines). The filter processor already runs a post-parse cleanup +pass for user-defined `filters:` blocks — adding structural cleanup there is consistent. + +Since the DB is in-memory and ephemeral (never persisted), the transient presence of +implementation-child requirements has no observable effect beyond the filter step. diff --git a/docs/modules/ROOT/pages/how_it_works.adoc b/docs/modules/ROOT/pages/how_it_works.adoc index 4c92f39f..2eaa117e 100644 --- a/docs/modules/ROOT/pages/how_it_works.adoc +++ b/docs/modules/ROOT/pages/how_it_works.adoc @@ -1,81 +1,105 @@ = How it Works -This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts]. +This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts]. For detailed architectural decisions, see the link:https://github.com/reqstool/reqstool-client/blob/main/docs/DESIGN.md[DESIGN.md] in the repository. -== Template generation +== Pipeline overview -The CombinedIndexedDatasetGenerator prepares the data provided from the CombinedRawDatasetsGenerator for rendering with the Jinja2 templates and is used by the ReportCommand and the StatusCommand components. +---- +Location → parse → RawDataset → INSERT into SQLite → Repository/Services → Command output +---- -== Overview of central components +Each command calls `build_database()` which: + +1. Parses all sources into the in-memory SQLite database (two-phase traversal, see below) +2. 
Applies filters (`DatabaseFilterProcessor`) — removes out-of-scope requirements and applies user-defined `filters:` blocks +3. Runs lifecycle validation (warns on DEPRECATED/OBSOLETE references) +4. Commands then query via `RequirementsRepository` and service layer + +== Two-phase graph traversal + +The requirement graph is a directed graph of URNs connected by two edge types: + +* **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions) +* **`implementation`** — "this URN provides evidence for my requirements" (downward, toward code/tests) + +`CombinedRawDatasetsGenerator` traverses this graph in two phases: + +=== Phase 1 — import chain (recursive DFS, full insert) + +Follows `imports:` sections recursively, depth-first. For each node, all five data types are fully inserted into SQLite: requirements, SVCs, MVRs, annotations, and test results. Cycle detection raises `CircularImportError`. + +=== Phase 2 — implementation chain (recursive, metadata-only insert) -Below is a breakdown of the central components of reqstool: +Follows `implementations:` sections recursively. Think library-uses-library — lib-a can itself have implementations (lib-b → lib-c), all of which may contribute test evidence for the initial URN's requirements. + +For each implementation node, `requirements.yml` is parsed (validation runs) but only the URN metadata is inserted — the requirement rows are excluded. All other files (SVCs, MVRs, annotations, test results) are inserted normally; SQLite FK constraints automatically discard rows that reference requirements outside the current scope. Cycle detection raises `CircularImplementationError`. + +NOTE: `imports:` sections of implementation nodes are not followed — an implementation's own imports belong to a different scope. + +== The `variant` field + +`variant: system/microservice/external` in `requirements.yml` is optional advisory metadata. 
It is not a behavioral gate — parsing is entirely presence-based. If a file exists, it is read. If a section exists in YAML, it is parsed. + +== Overview of central components [plantuml,format=svg] .... @startuml !include -Component(StatusCommand, "StatusCommand", "Processes status command") -Component(GenerateJsonCommand, "GenerateJsonCommand", "Generates JSON from imported Models") -Component(ReportCommand, "ReportCommand", "Generates reports") -Component(SemanticValidator, "SemanticValidator", "Validates data read from source") -Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Generates imported models") -Component(reqstoolConfig, "reqstoolConfig", "Resolves paths to yaml files") -Component(CombinedIndexedDatasetGenerator, "CombinedIndexedDatasetGenerator", "Prepares data for rendering of Jinja2 templates") -Component(Command, "Command", "Handles user commands") - -Rel(Command, StatusCommand, "Uses") -Rel(Command, GenerateJsonCommand, "Uses") -Rel(Command, ReportCommand, "Uses") -Rel(CombinedRawDatasetsGenerator, SemanticValidator, "Depends on") -Rel_Right(CombinedRawDatasetsGenerator, reqstoolConfig, "Uses") -Rel(StatusCommand, CombinedRawDatasetsGenerator, "Uses") -Rel(GenerateJsonCommand, CombinedRawDatasetsGenerator, "Uses") -Rel(ReportCommand, CombinedRawDatasetsGenerator, "Uses") -Rel(ReportCommand, CombinedIndexedDatasetGenerator, "Uses") -Rel(StatusCommand, CombinedIndexedDatasetGenerator, "Uses") - -Rel_Down(CombinedRawDatasetsGenerator, CombinedIndexedDatasetGenerator, "Provides data to") +Component(Command, "Command", "Handles user commands (status, report, export)") +Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Two-phase graph traversal and SQLite population") +Component(DatabaseFilterProcessor, "DatabaseFilterProcessor", "Post-parse requirement/SVC filtering") +Component(RequirementsRepository, "RequirementsRepository", "Data access layer over SQLite") +Component(StatisticsService, 
"StatisticsService", "Computes per-requirement status and totals") +Component(SemanticValidator, "SemanticValidator", "Cross-reference validation") +Component(SQLiteDB, "SQLite (in-memory)", "Single source of truth after parsing") + +Rel(Command, CombinedRawDatasetsGenerator, "build_database()") +Rel(CombinedRawDatasetsGenerator, SQLiteDB, "INSERT") +Rel(CombinedRawDatasetsGenerator, SemanticValidator, "validate_post_parsing()") +Rel(DatabaseFilterProcessor, SQLiteDB, "DELETE (filters)") +Rel(RequirementsRepository, SQLiteDB, "SELECT") +Rel(StatisticsService, RequirementsRepository, "queries") +Rel(Command, StatisticsService, "Uses") +Rel(Command, RequirementsRepository, "Uses") @enduml .... -== Sequence diagram of the program execution +== Sequence diagram -Below is an example to illustrate how reqstool parses data from the initial source. +Below illustrates how reqstool processes the `status` command against an initial source that imports a parent system. [plantuml,format=svg] .... @startuml !include -Person(user, "User", "", "") - +Person(user, "User") Container(reqsTool, "reqstool") -Container_Boundary(b, "Requirement files") - Container_Boundary(b1, "MS-001") - Component(reqs, "Requirements", "Requirements.yml") - Component(svcs, "SVCS", "software_verification_cases.yml") - Component(mvrs, "MVRS", "manual_verification_results.yml") - Component(annot_impls,"Implementations", "requirements_annotations.yml") - Component(annot_tests,"Automated tests", "svcs_annotations.yml") - Boundary_End() - Container_Boundary(b2, "Ext-001") - Component(reqs_ext, "Requirements", "Requirements.yml") - Boundary_End() +Container_Boundary(phase1, "Phase 1 — import chain") + Component(initial_reqs, "initial/requirements.yml") + Component(initial_svcs, "initial/svcs.yml") + Component(parent_reqs, "parent/requirements.yml") +Boundary_End() + +Container_Boundary(phase2, "Phase 2 — implementation chain") + Component(impl_reqs, "lib-a/requirements.yml") + Component(impl_svcs, 
"lib-a/svcs.yml") + Component(impl_tests, "lib-a/test results") Boundary_End() -Rel(user, reqsTool, "Submit command", "bash") -Rel(reqsTool, reqs, "Reads requirements") -Rel(reqsTool, svcs, "Reads svcs") -Rel(reqsTool, mvrs, "Reads mvrs") -Rel(reqsTool, annot_impls, "Reads impls annotations") -Rel(reqsTool, annot_tests, "Reads test annotations") -Rel(reqsTool, reqsTool, "Create imported model") -Rel(reqsTool, reqs_ext, "Reads imported requirements") -Rel(reqsTool, reqsTool, "Create imported model") -Rel(reqsTool, user, "Returns combined data based on imported") +Rel(user, reqsTool, "reqstool status local -p ./initial") +Rel(reqsTool, initial_reqs, "parse + full insert") +Rel(reqsTool, initial_svcs, "parse + full insert") +Rel(reqsTool, parent_reqs, "parse + full insert (recursive)") +Rel(reqsTool, impl_reqs, "parse + metadata only") +Rel(reqsTool, impl_svcs, "parse + FK-scoped insert") +Rel(reqsTool, impl_tests, "parse + scoped insert") +Rel(reqsTool, reqsTool, "post-parse: delete impl-child requirements") +Rel(reqsTool, user, "status table (exit code = unmet requirements)") @enduml .... 
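The scoped-deletion behaviour described in the docs above (post-parse DELETE plus FK CASCADE) can be exercised standalone. The sketch below is illustrative only: the table and column names are simplified assumptions (the real schema lives in `src/reqstool/storage/schema.py`), and only the DELETE statement is taken verbatim from this patch.

```python
import sqlite3

# Simplified stand-in schema; column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite FK enforcement is off by default
conn.executescript("""
CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT);
CREATE TABLE requirements (urn TEXT, id TEXT, PRIMARY KEY (urn, id));
CREATE TABLE svcs (
    urn TEXT, id TEXT, req_urn TEXT, req_id TEXT,
    FOREIGN KEY (req_urn, req_id) REFERENCES requirements (urn, id) ON DELETE CASCADE
);
CREATE TABLE parsing_graph (parent_urn TEXT, child_urn TEXT, edge_type TEXT);
""")

# Graph: 'initial' imports 'parent'; 'lib-a' implements 'initial'.
conn.execute("INSERT INTO metadata VALUES ('initial_urn', 'initial')")
conn.executemany("INSERT INTO parsing_graph VALUES (?, ?, ?)", [
    ("initial", "parent", "import"),
    ("initial", "lib-a", "implementation"),
])
conn.executemany("INSERT INTO requirements VALUES (?, ?)", [
    ("initial", "REQ_1"), ("parent", "REQ_P1"), ("lib-a", "REQ_LA_1"),
])
# lib-a contributes one SVC for an in-scope requirement and one for its own.
conn.executemany("INSERT INTO svcs VALUES (?, ?, ?, ?)", [
    ("lib-a", "SVC_1", "initial", "REQ_1"),   # in scope: must survive
    ("lib-a", "SVC_2", "lib-a", "REQ_LA_1"),  # out of scope: cascades away
])

# The post-parse cleanup query, verbatim from DatabaseFilterProcessor.
conn.execute("""
DELETE FROM requirements WHERE urn IN (
    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
    EXCEPT
    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
    EXCEPT
    SELECT value FROM metadata WHERE key = 'initial_urn'
)
""")
conn.commit()

remaining_reqs = {u for (u,) in conn.execute("SELECT urn FROM requirements")}
remaining_svcs = {s for (s,) in conn.execute("SELECT id FROM svcs")}
print(remaining_reqs)  # lib-a's requirement row is gone; initial and parent remain
print(remaining_svcs)  # SVC_1 survives; SVC_2 was removed by CASCADE
```

Running this confirms the property the DESIGN.md table relies on: deleting the implementation-child requirement cascades away its local SVC, while the SVC targeting an in-scope requirement survives.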
diff --git a/src/reqstool/common/exceptions.py b/src/reqstool/common/exceptions.py index 8f7b8e88..48a294e5 100644 --- a/src/reqstool/common/exceptions.py +++ b/src/reqstool/common/exceptions.py @@ -15,3 +15,11 @@ class CircularImportError(Exception): def __init__(self, urn: str, chain: list[str]): self.urn = urn super().__init__(f"Circular import detected: {' -> '.join(chain)} -> {urn}") + + +class CircularImplementationError(Exception): + """Raised when a circular implementation chain is detected in the requirements graph.""" + + def __init__(self, urn: str, chain: list[str]): + self.urn = urn + super().__init__(f"Circular implementation detected: {' -> '.join(chain)} -> {urn}") diff --git a/src/reqstool/model_generators/combined_raw_datasets_generator.py b/src/reqstool/model_generators/combined_raw_datasets_generator.py index c50c17fd..63b9b04c 100644 --- a/src/reqstool/model_generators/combined_raw_datasets_generator.py +++ b/src/reqstool/model_generators/combined_raw_datasets_generator.py @@ -6,7 +6,7 @@ from reqstool_python_decorators.decorators.decorators import Requirements -from reqstool.common.exceptions import CircularImportError, MissingRequirementsFileError +from reqstool.common.exceptions import CircularImplementationError, CircularImportError, MissingRequirementsFileError from reqstool.common.utils import TempDirectoryUtil, Utils from reqstool.common.validators.semantic_validator import SemanticValidator from reqstool.location_resolver.location_resolver import LocationResolver @@ -193,7 +193,11 @@ def __import_implementations( self, raw_datasets: Dict[str, RawDataset], implementations: List[ImplementationDataInterface], + visited: Optional[Set[str]] = None, ) -> List[str]: + if visited is None: + visited = set() + parsed_urns: List[str] = [] self.__level += 1 @@ -201,12 +205,25 @@ def __import_implementations( parsed_model = self.__parse_source(current_location_handler=implementation) current_urn = parsed_model.requirements_data.metadata.urn + if 
current_urn in visited: + raise CircularImplementationError(current_urn, list(visited)) + + visited.add(current_urn) + # add urn to parsing_order_list self._parsing_order.append(current_urn) parsed_urns.append(current_urn) raw_datasets[current_urn] = parsed_model + # recurse into this implementation's own implementations + sub_impls = parsed_model.requirements_data.implementations + if sub_impls: + sub_urns = self.__import_implementations(raw_datasets, sub_impls, visited) + self._parsing_graph[current_urn].extend([(u, "implementation") for u in sub_urns]) + for sub_urn in sub_urns: + self._parsing_graph[sub_urn].append((current_urn, "implementation")) + self.__level -= 1 return parsed_urns diff --git a/src/reqstool/storage/database_filter_processor.py b/src/reqstool/storage/database_filter_processor.py index 4c1b2267..c9c656f4 100644 --- a/src/reqstool/storage/database_filter_processor.py +++ b/src/reqstool/storage/database_filter_processor.py @@ -20,6 +20,8 @@ def __init__(self, db: RequirementsDatabase, raw_datasets: dict[str, RawDataset] self._parsing_graph = self._load_parsing_graph() def apply_filters(self) -> None: + self._remove_implementation_requirements() + initial_urn = self._db.get_metadata("initial_urn") self._apply_req_filters(initial_urn) @@ -27,6 +29,27 @@ def apply_filters(self) -> None: self._db.set_metadata("filtered", "true") + def _remove_implementation_requirements(self) -> None: + """Delete requirements from nodes only reachable via implementation edges. + + Nodes reachable only via implementation edges are evidence contributors, not + requirement definers. Their requirements are out of scope from the initial URN's + perspective. CASCADE removes SVCs/MVRs/annotations that only linked to those + requirements; evidence rows linking to in-scope requirements survive. 
+ """ + self._db.connection.execute( + """ + DELETE FROM requirements WHERE urn IN ( + SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation' + EXCEPT + SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import' + EXCEPT + SELECT value FROM metadata WHERE key = 'initial_urn' + ) + """ + ) + self._db.connection.commit() + # -- Requirement filters -- def _apply_req_filters(self, initial_urn: str) -> None: diff --git a/tests/resources/test_data/data/local/test_circular_impl/lib-a/requirements.yml b/tests/resources/test_data/data/local/test_circular_impl/lib-a/requirements.yml new file mode 100644 index 00000000..225d33d2 --- /dev/null +++ b/tests/resources/test_data/data/local/test_circular_impl/lib-a/requirements.yml @@ -0,0 +1,7 @@ +metadata: + urn: lib-a + title: Library A + +implementations: + local: + - path: ../lib-b diff --git a/tests/resources/test_data/data/local/test_circular_impl/lib-b/requirements.yml b/tests/resources/test_data/data/local/test_circular_impl/lib-b/requirements.yml new file mode 100644 index 00000000..7983c1c1 --- /dev/null +++ b/tests/resources/test_data/data/local/test_circular_impl/lib-b/requirements.yml @@ -0,0 +1,7 @@ +metadata: + urn: lib-b + title: Library B + +implementations: + local: + - path: ../lib-a diff --git a/tests/resources/test_data/data/local/test_recursive_impl/lib-a/requirements.yml b/tests/resources/test_data/data/local/test_recursive_impl/lib-a/requirements.yml new file mode 100644 index 00000000..11238b8e --- /dev/null +++ b/tests/resources/test_data/data/local/test_recursive_impl/lib-a/requirements.yml @@ -0,0 +1,15 @@ +metadata: + urn: lib-a + title: Library A + +requirements: + - id: REQ_LA_001 + title: Library A requirement + significance: shall + description: A requirement defined in lib-a + categories: [reliability] + revision: 0.0.1 + +implementations: + local: + - path: ../lib-b diff --git a/tests/resources/test_data/data/local/test_recursive_impl/lib-b/requirements.yml 
b/tests/resources/test_data/data/local/test_recursive_impl/lib-b/requirements.yml new file mode 100644 index 00000000..a384d3fa --- /dev/null +++ b/tests/resources/test_data/data/local/test_recursive_impl/lib-b/requirements.yml @@ -0,0 +1,15 @@ +metadata: + urn: lib-b + title: Library B + +requirements: + - id: REQ_LB_001 + title: Library B requirement + significance: shall + description: A requirement defined in lib-b + categories: [reliability] + revision: 0.0.1 + +implementations: + local: + - path: ../lib-c diff --git a/tests/resources/test_data/data/local/test_recursive_impl/lib-c/requirements.yml b/tests/resources/test_data/data/local/test_recursive_impl/lib-c/requirements.yml new file mode 100644 index 00000000..0ffa567c --- /dev/null +++ b/tests/resources/test_data/data/local/test_recursive_impl/lib-c/requirements.yml @@ -0,0 +1,11 @@ +metadata: + urn: lib-c + title: Library C + +requirements: + - id: REQ_LC_001 + title: Library C requirement + significance: shall + description: A requirement defined in lib-c + categories: [reliability] + revision: 0.0.1 diff --git a/tests/resources/test_data/data/local/test_recursive_impl/root/requirements.yml b/tests/resources/test_data/data/local/test_recursive_impl/root/requirements.yml new file mode 100644 index 00000000..68170913 --- /dev/null +++ b/tests/resources/test_data/data/local/test_recursive_impl/root/requirements.yml @@ -0,0 +1,15 @@ +metadata: + urn: root + title: Root + +requirements: + - id: REQ_ROOT_001 + title: Root requirement + significance: shall + description: A requirement defined at the root level + categories: [reliability] + revision: 0.0.1 + +implementations: + local: + - path: ../lib-a diff --git a/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py b/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py index 1e7dbc7e..98b2d31b 100644 --- a/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py +++ 
b/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py @@ -3,7 +3,7 @@ import pytest from reqstool_python_decorators.decorators.decorators import SVCs -from reqstool.common.exceptions import CircularImportError, MissingRequirementsFileError +from reqstool.common.exceptions import CircularImplementationError, CircularImportError, MissingRequirementsFileError from reqstool.common.validator_error_holder import ValidationErrorHolder from reqstool.common.validators.semantic_validator import SemanticValidator from reqstool.locations.local_location import LocalLocation @@ -96,3 +96,38 @@ def test_circular_import_raises(local_testdata_resources_rootdir_w_path): ) assert "node-a" in str(excinfo.value) assert "Circular import detected" in str(excinfo.value) + + +@SVCs("SVC_020") +def test_circular_implementation_raises(local_testdata_resources_rootdir_w_path): + semantic_validator = SemanticValidator(validation_error_holder=ValidationErrorHolder()) + with pytest.raises(CircularImplementationError) as excinfo: + combined_raw_datasets_generator.CombinedRawDatasetsGenerator( + initial_location=LocalLocation(path=local_testdata_resources_rootdir_w_path("test_circular_impl/lib-a")), + semantic_validator=semantic_validator, + ) + assert "lib-a" in str(excinfo.value) + assert "Circular implementation detected" in str(excinfo.value) + + +@SVCs("SVC_001") +def test_implementation_traversal_recursive(local_testdata_resources_rootdir_w_path): + semantic_validator = SemanticValidator(validation_error_holder=ValidationErrorHolder()) + + crd: CombinedRawDataset = combined_raw_datasets_generator.CombinedRawDatasetsGenerator( + initial_location=LocalLocation(path=local_testdata_resources_rootdir_w_path("test_recursive_impl/root")), + semantic_validator=semantic_validator, + ).combined_raw_datasets + + # all four nodes are in raw_datasets (recursive traversal reached lib-b and lib-c) + assert set(crd.raw_datasets.keys()) == {"root", "lib-a", "lib-b", "lib-c"} + + # 
implementation nodes are parsed (requirements present in raw_datasets) + assert len(crd.raw_datasets["lib-a"].requirements_data.requirements) == 1 + assert len(crd.raw_datasets["lib-b"].requirements_data.requirements) == 1 + assert len(crd.raw_datasets["lib-c"].requirements_data.requirements) == 1 + + # implementation edges are tagged correctly in the parsing graph + assert ("lib-a", "implementation") in crd.parsing_graph["root"] + assert ("lib-b", "implementation") in crd.parsing_graph["lib-a"] + assert ("lib-c", "implementation") in crd.parsing_graph["lib-b"] From 889ecbba2d3b99fd97a3ce8cb8b636c00ac42f96 Mon Sep 17 00:00:00 2001 From: Jimisola Laursen Date: Thu, 19 Mar 2026 22:56:07 +0100 Subject: [PATCH 4/6] fix: regenerate requirements_schema.py after variant made optional datamodel-codegen no longer needs a RootModel wrapper after the variant field became optional in 5c052a2. --- src/reqstool/models/generated/requirements_schema.py | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/src/reqstool/models/generated/requirements_schema.py b/src/reqstool/models/generated/requirements_schema.py index c696cc13..0e2f7966 100644 --- a/src/reqstool/models/generated/requirements_schema.py +++ b/src/reqstool/models/generated/requirements_schema.py @@ -278,7 +278,7 @@ class Locations(BaseModel): """ -class Model1(BaseModel): +class Model(BaseModel): model_config = ConfigDict( extra='forbid', ) @@ -299,7 +299,3 @@ class Model1(BaseModel): """ Array of Requirements """ - - -class Model(RootModel[Model1]): - root: Model1 From 722f469f93a3ba6f3ed46160d972632b144be4a2 Mon Sep 17 00:00:00 2001 From: Jimisola Laursen Date: Thu, 19 Mar 2026 23:02:29 +0100 Subject: [PATCH 5/6] fix: remove stale .root dereference after Model flattened to BaseModel (#324) commit 889ecbb regenerated requirements_schema.py, collapsing Model(RootModel[Model1]) into a flat Model(BaseModel). The generator was still calling validated.root; replace with validated directly. 
Signed-off-by: Jimisola Laursen
---
 .../model_generators/requirements_model_generator.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/reqstool/model_generators/requirements_model_generator.py b/src/reqstool/model_generators/requirements_model_generator.py
index 5a93199d..744ae99f 100644
--- a/src/reqstool/model_generators/requirements_model_generator.py
+++ b/src/reqstool/model_generators/requirements_model_generator.py
@@ -92,7 +92,7 @@ def __generate(

         validated = RequirementsPydanticModel.model_validate(data)

-        r_metadata: MetaData = self.__parse_metadata(validated.root)
+        r_metadata: MetaData = self.__parse_metadata(validated)

         r_implementations: List[ImplementationDataInterface] = []
         r_imports: List[ImportDataInterface] = []
@@ -100,10 +100,10 @@
         r_filters: Dict[str, RequirementFilter] = {}
         self.prefix_with_urn = False

-        r_imports = self.__parse_imports(validated.root)
+        r_imports = self.__parse_imports(validated)
         r_filters = self.__parse_requirement_filters(data=data)
-        r_implementations = self.__parse_implementations(validated.root)
-        r_requirements = self.__parse_requirements(validated.root, data=data)
+        r_implementations = self.__parse_implementations(validated)
+        r_requirements = self.__parse_requirements(validated, data=data)

         return RequirementsData(
             metadata=r_metadata,

From d0596f851885834469c84d8d2af2441b9e4fd3e4 Mon Sep 17 00:00:00 2001
From: Jimisola Laursen
Date: Thu, 19 Mar 2026 23:05:07 +0100
Subject: [PATCH 6/6] chore: remove stale PLAN_remove_variants.md (#324)

Design decisions are now documented in CLAUDE.md and DESIGN.md.
Signed-off-by: Jimisola Laursen
---
 docs/PLAN_remove_variants.md | 214 -----------------------------------
 1 file changed, 214 deletions(-)
 delete mode 100644 docs/PLAN_remove_variants.md

diff --git a/docs/PLAN_remove_variants.md b/docs/PLAN_remove_variants.md
deleted file mode 100644
index 1a2e3dd6..00000000
--- a/docs/PLAN_remove_variants.md
+++ /dev/null
@@ -1,214 +0,0 @@
-# Open Questions: Remove `variant` as Behavioral Gate (#324)
-
-## Q1: Implementation traversal depth
-
-Should `implementations:` sections be traversed recursively, or are implementation nodes leaves?
-
-### Answer: Recursive (revised — library-uses-library model)
-
-The initial assumption was "leaf nodes", based on a system→microservice mental model. This was
-revised after removing `variant` as a behavioral gate.
-
-Think library-uses-library: lib-a can have its own implementations (lib-b → lib-c). All nodes in
-the implementation subtree can have annotations and test results pointing to in-scope requirements.
-Flat traversal silently misses their evidence.
-
-The one constraint that remains: `imports:` sections of implementation nodes are NOT followed.
-An implementation's own imports point to a different requirement scope.
-
-```
-initial-source (C1)
-  imports/         → recurse upward (parents: B1 → A1, A4) — full insert
-  implementations/ → recurse downward (lib-a → lib-b → lib-c) — metadata-only insert
-    lib-a's own imports: NOT followed (different scope)
-```
-
----
-
-## Discussion Graph
-
-4 layers × 4 nodes for reasoning about traversal scenarios.
-
-```mermaid
-graph TD
-    subgraph A["Layer A"]
-        A1
-        A2
-        A3
-        A4
-    end
-
-    subgraph B["Layer B"]
-        B1
-        B2
-        B3
-        B4
-    end
-
-    subgraph C["Layer C"]
-        C1
-        C2
-        C3
-        C4
-    end
-
-    subgraph D["Layer D"]
-        D1
-        D2
-        D3
-        D4
-    end
-
-    A1 --> B1
-    A1 --> B2
-    A2 --> B2
-    A2 --> B3
-    A3 --> B3
-    A4 --> B1
-
-    B1 --> C1
-    B1 --> C2
-    B2 --> C2
-    B2 --> C3
-    B3 --> C3
-    B3 --> C4
-    B4 --> C4
-
-    C1 --> D1
-    C2 --> D1
-    C2 --> D2
-    C3 --> D2
-    C3 --> D3
-    C4 --> D3
-    C4 --> D4
-```
-
-Edges are unlabelled — overlay `import` / `implementation` semantics per scenario.
-
-Notable properties:
-- **Shared nodes**: B2 (A1+A2), B3 (A2+A3), C2 (B1+B2), C3 (B2+B3), D1 (C1+C2), D2 (C2+C3), D3 (C3+C4)
-- **Isolated paths**: A4→B1 (A4 shares B1 with A1 but has no other children); B4→C4 (B4 only reachable if explicitly listed)
-- **Diamond patterns**: A1→B1→C2→D2 and A1→B2→C2→D2 (two paths to D2 via C2)
-
-### Scenario 1 — A1 is initial, all edges are `import`
-
-A1 imports B1, B2. B1 imports C1, C2. B2 imports C2, C3. And so on down the graph.
-
-- Traversal reaches: B1, B2, C1, C2, C3, D1, D2, D3
-- C2 is reached twice (via B1 and B2) — the visited set prevents re-parsing
-- **Not reached**: A2, A3, A4, B3, B4, C4, D4
-
-### Scenario 2 — B2 is initial (microservice as initial source)
-
-A microservice CAN have imports — currently it calls `__import_systems` on them.
-B2 imports C2, C3. Those recursively import D1, D2, D3.
-
-- Traversal reaches: C2, C3, D1, D2, D3
-- B2 has no implementations listed
-- **Not reached**: anything in layer A, B1, B3, B4, C1, C4, D4
-
-This scenario confirms that microservices import systems, and that imports must be followed
-regardless of which node is initial.
-
-### Scenario 3 — A1 is initial, B4 is an `implementation`
-
-A1 imports B1, B2 (→ C1-C3, D1-D3 as in Scenario 1).
-A1 also lists B4 as an implementation (microservice implementing A1's requirements).
-B4 is loaded; it lists no implementations of its own, so it is effectively a leaf:
-- B4's annotations/SVCs/tests are checked against A1's (+ parents') requirements
-- B4 may also import C4, but that import is irrelevant from A1's perspective
-- C4, D4 are not reached — and shouldn't be. They are outside A1's requirement scope.
-
-### Scenario 4 — A2 is initial, B2/B3 are shared with A1's graph
-
-If A2 is parsed after A1 in a multi-root scenario:
-- A2 imports B2 (already visited), B3 (new)
-- B3 imports C3 (already visited), C4 (new)
-- C4 imports D3 (already visited), D4 (new)
-
-The visited set handles re-entry into already-parsed nodes cleanly.
-
-### Scenario 5 — Cycle
-
-Hypothetical: D1 has an import back to A1.
-
-- Without detection: A1→B1→C1→D1→A1→… infinite
-- With a visited set on imports: after A1 is added to visited on first entry, D1→A1 triggers `CircularImportError`
-- The same logic applies if the back-edge is an implementation edge (Q2 scope question)
-
----
-
-## Q2: Cycle detection scope
-
-Where should circular dependency detection trigger?
-
-### Answer: Both import and implementation chains (revised)
-
-Originally: the import chain only, since implementations were leaves.
-
-After revising Q1 to recursive implementations: implementation chains can also cycle
-(lib-a → lib-b → lib-a). The two chains need independent visited sets and raise distinct errors:
-
-- `CircularImportError` — detected in `__import_systems`
-- `CircularImplementationError` — detected in `__import_implementations`
-
----
-
-## Q3: Should reqs, SVCs, MVRs, annotations, and test results be parsed for ALL URNs?
-
-**Answer: Yes — presence-based, regardless of role.**
-
-All auxiliary files are parsed for every node based purely on file presence. The insertion rules
-differ by phase (see Q4), but parsing always runs — including validation.
-
----
-
-## Q4: What data is inserted for implementation children?
-
-**Answer: Metadata only for requirements; all other files via FK-scoped insert.**
-
-| File | Import parent | Implementation child |
-|------|---------------|----------------------|
-| `requirements.yml` | full insert | metadata only (skip `insert_requirement`) |
-| `svcs.yml` | full insert | insert — FK rejects rows referencing out-of-scope requirements |
-| `mvrs.yml` | full insert | insert — FK rejects rows referencing out-of-scope SVCs |
-| `annotations.yml` | full insert | insert — FK rejects rows referencing out-of-scope requirements |
-| test results | full insert | insert with explicit scope check (no FK, keyed by FQN) |
-
-Validation still runs on `requirements.yml` for all nodes. Syntax errors in an implementation
-child's file still surface — only the DB insertion is skipped.
-
----
-
-## Q5: How are implementation-child requirements excluded from the final scope?
-
-**Answer: Post-parse DELETE in `DatabaseFilterProcessor`.**
-
-`_remove_implementation_requirements()` runs at the start of `apply_filters()`:
-
-```sql
-DELETE FROM requirements WHERE urn IN (
-    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
-    EXCEPT
-    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
-    EXCEPT
-    SELECT value FROM metadata WHERE key = 'initial_urn'
-)
-```
-
-CASCADE handles SVCs/MVRs/annotations that only linked to those deleted requirements.
-Evidence rows linking to in-scope requirements (from Phase 1) survive.
-
----
-
-## Q6: Why post-parse and not ingest-time?
-
-**Answer: ~30 lines vs ~150 lines; identical result for an ephemeral in-memory DB.**
-
-Ingest-time filtering would require restructuring `CombinedRawDatasetsGenerator` into two explicit
-phases with scope-aware population logic (~150 lines across 3–4 files). Post-parse is a single SQL
-DELETE in the filter processor (~30 lines).
The filter processor already runs a post-parse cleanup
-pass for user-defined `filters:` blocks — adding structural cleanup there is consistent.
-
-Since the DB is in-memory and ephemeral (never persisted), the transient presence of
-implementation-child requirements has no observable effect beyond the filter step.
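For reviewers: the Q5 cleanup from the deleted plan can be exercised against a throwaway in-memory SQLite database. This is a minimal sketch under assumed table and column names (`requirements`, `svcs`, `parsing_graph`, `metadata` with the columns shown); reqstool's real schema lives in `src/reqstool/storage/schema.py` and may differ. Only the DELETE statement itself is taken verbatim from Q5.

```python
# Sketch of the Q5 structural cleanup on an in-memory SQLite DB.
# Table/column names are illustrative assumptions, not reqstool's actual schema.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE to fire
con.executescript("""
CREATE TABLE requirements (urn TEXT, id TEXT, PRIMARY KEY (urn, id));
CREATE TABLE svcs (
    urn TEXT, id TEXT, req_urn TEXT, req_id TEXT,
    FOREIGN KEY (req_urn, req_id) REFERENCES requirements (urn, id) ON DELETE CASCADE
);
CREATE TABLE parsing_graph (parent_urn TEXT, child_urn TEXT, edge_type TEXT);
CREATE TABLE metadata (key TEXT, value TEXT);
""")

# root imports lib-x (import edge) and lists lib-a as an implementation
con.executemany("INSERT INTO parsing_graph VALUES (?, ?, ?)", [
    ("root", "lib-x", "import"),
    ("root", "lib-a", "implementation"),
])
con.execute("INSERT INTO metadata VALUES ('initial_urn', 'root')")
con.executemany("INSERT INTO requirements VALUES (?, ?)", [
    ("root", "REQ_001"),
    ("lib-x", "REQ_010"),
    ("lib-a", "REQ_100"),  # implementation child's own requirement: out of scope
])
# evidence row in lib-a that references lib-a's own out-of-scope requirement
con.execute("INSERT INTO svcs VALUES ('lib-a', 'SVC_100', 'lib-a', 'REQ_100')")

# The DELETE from Q5: drop requirements reachable only via implementation
# edges; CASCADE removes the evidence rows that referenced them.
con.execute("""
DELETE FROM requirements WHERE urn IN (
    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
    EXCEPT
    SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
    EXCEPT
    SELECT value FROM metadata WHERE key = 'initial_urn'
)
""")

remaining = sorted(urn for (urn,) in con.execute("SELECT urn FROM requirements"))
print(remaining)  # ['lib-x', 'root']: lib-a's requirement was removed
print(con.execute("SELECT COUNT(*) FROM svcs").fetchone()[0])  # 0 (cascaded away)
```

The sketch also illustrates why the post-parse approach is cheap (Q6): scope removal is one statement over the finished `parsing_graph`, and the FK cascade does the per-table cleanup that ingest-time filtering would otherwise have to reimplement.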