diff --git a/CLAUDE.md b/CLAUDE.md index a5240eba..e9a94e14 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -47,12 +47,9 @@ The pipeline flows: **Location** → **parse** → **RawDataset** (transient) Abstractions for where source data lives. Implementations: `LocalLocation`, `GitLocation`, `MavenLocation`, `PypiLocation`. Each implements `_make_available_on_localdisk(dst_path)` to download/copy the source to a temp dir. `LocationResolver` (`location_resolver/`) handles relative path resolution when an import's location is relative to its parent. ### Data Ingestion (`model_generators/`, `requirements_indata/`) -`CombinedRawDatasetsGenerator` is the top-level parser. It: -1. Resolves the initial location to a local temp path (`TempDirectoryUtil`) -2. Parses `requirements.yml` → `RequirementsModelGenerator` → `RequirementsData` -3. Recursively follows `imports` (other system URNs) and `implementations` (microservice URNs) -4. For each SYSTEM/MICROSERVICE source also parses: `svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML test results -5. Each parsed `RawDataset` is immediately inserted into the in-memory SQLite database via `DatabasePopulator` +`CombinedRawDatasetsGenerator` is the top-level parser. It runs in two phases: +1. **Import chain** (recursive DFS): resolves location → parses `requirements.yml` → parses all auxiliary files (`svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML) → full insert into SQLite. Follows each node's own `imports:` section recursively. Cycle detection via visited set (`CircularImportError`). +2. **Implementation chain** (recursive): follows `implementations:` sections recursively (library-uses-library model, not system→microservice). Parses all files; inserts **metadata only** for `requirements.yml` — requirement rows are excluded post-parse by `DatabaseFilterProcessor`. Cycle detection via separate visited set (`CircularImplementationError`). 
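The two-phase traversal described above can be sketched in miniature. This is an illustrative sketch only: `Node`, `follow`, and `traverse` are hypothetical stand-ins, not the actual reqstool classes; only the visited-set cycle-detection shape mirrors the description.

```python
from dataclasses import dataclass, field


class CircularImportError(Exception):
    pass


class CircularImplementationError(Exception):
    pass


@dataclass
class Node:
    urn: str
    imports: list["Node"] = field(default_factory=list)
    implementations: list["Node"] = field(default_factory=list)


def follow(node: Node, edge_attr: str, visited: set, order: list, error) -> None:
    # Depth-first walk over a single edge type; re-entering an already
    # visited URN means the chain loops back on itself.
    for child in getattr(node, edge_attr):
        if child.urn in visited:
            raise error(child.urn)
        visited.add(child.urn)
        order.append(child.urn)
        follow(child, edge_attr, visited, order, error)


def traverse(initial: Node) -> list[str]:
    order = [initial.urn]
    # Phase 1: import chain, visited set seeded with the initial URN.
    follow(initial, "imports", {initial.urn}, order, CircularImportError)
    # Phase 2: implementation chain, separate visited set.
    follow(initial, "implementations", set(), order, CircularImplementationError)
    return order


# The A1 <- B1 <- C1 <- lib-a <- lib-b example graph from DESIGN.md:
a1 = Node("A1")
b1 = Node("B1", imports=[a1])
lib_b = Node("lib-b")
lib_a = Node("lib-a", implementations=[lib_b])
c1 = Node("C1", imports=[b1], implementations=[lib_a])
order = traverse(c1)  # ["C1", "B1", "A1", "lib-a", "lib-b"]
```

Note that the import chain is walked to exhaustion before any implementation node is parsed, matching the "do not collapse these into a single pass" rule above.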
### Storage Layer (`storage/`) In-memory SQLite is the single source of truth after parsing: @@ -75,7 +72,7 @@ All domain objects are frozen/plain `@dataclass`s: - `TestsData` / `TestData` — JUnit XML test results - `CombinedRawDataset` — flat dict of all raw datasets + parsing graph (used during population and by `SemanticValidator`) -Variants (defined in `requirements.yml` metadata): `SYSTEM`, `MICROSERVICE`, `EXTERNAL`. +`variant` field in `requirements.yml` metadata is optional advisory metadata (`system`, `microservice`, `external`). It is NOT a behavioral gate — parsing is presence-based. See `docs/DESIGN.md`. ### Services (`services/`) Business logic layer querying the database via `RequirementsRepository`: @@ -145,6 +142,19 @@ diff /tmp/baseline-report-demo.txt /tmp/feature-report-demo.txt If a diff is expected (e.g. the PR intentionally changes output), note it in the PR description. +## Design Decisions + +Key architectural decisions that affect how to read and modify this codebase. +Full rationale in `docs/DESIGN.md`. + +- **Traversal is two-phase**: import chain (full insert) then implementation chain (metadata-only). Do not collapse these into a single pass. +- **Implementation chains are recursive**: a library can have its own implementations (lib-a → lib-b → lib-c). Do not treat implementations as leaf nodes. +- **`variant` is not a behavioral gate**: parsing is presence-based. Do not add `if variant == X` guards anywhere in the ingestion pipeline. +- **Implementation-child requirements are excluded post-parse**: `DatabaseFilterProcessor` deletes them via SQL after ingestion. Do not filter at ingest time. +- **Cycle detection covers both chains**: `CircularImportError` for the import chain, `CircularImplementationError` for the implementation chain. +- **FK constraints scope evidence from implementation children**: SVCs/MVRs/annotations referencing out-of-scope requirements are rejected by SQLite FK checks on insert — no explicit filtering needed. 
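The FK-scoping behavior in the last bullet can be demonstrated with a self-contained `sqlite3` snippet. Table and column names here are simplified assumptions for illustration, not the actual reqstool schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.execute("CREATE TABLE requirements (urn TEXT, id TEXT, PRIMARY KEY (urn, id))")
conn.execute(
    """
    CREATE TABLE svcs (
        urn TEXT, id TEXT, req_urn TEXT, req_id TEXT,
        FOREIGN KEY (req_urn, req_id) REFERENCES requirements (urn, id)
    )
    """
)

# Requirement inserted during the import chain (in scope).
conn.execute("INSERT INTO requirements VALUES ('sys-a', 'REQ-001')")

# Evidence from an implementation child referencing an in-scope requirement: kept.
conn.execute("INSERT INTO svcs VALUES ('lib-a', 'SVC-001', 'sys-a', 'REQ-001')")

# Evidence referencing a requirement that was never inserted: the FK check
# rejects the row, so no explicit scope filtering is needed.
rejected = False
try:
    conn.execute("INSERT INTO svcs VALUES ('lib-a', 'SVC-002', 'other', 'REQ-999')")
except sqlite3.IntegrityError:
    rejected = True
```

After this runs, the `svcs` table holds exactly one row: the in-scope one.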
+- **Test results need explicit scoping**: no FK (keyed by FQN), so a scope check is required when inserting test results from implementation children. + ## Key Conventions - **URN format**: `some:urn:string` — the separator is `:`. `UrnId` is the canonical composite key used throughout indexes. diff --git a/docs/DESIGN.md b/docs/DESIGN.md new file mode 100644 index 00000000..96e05f99 --- /dev/null +++ b/docs/DESIGN.md @@ -0,0 +1,112 @@ +# Design: Graph Traversal and Data Ingestion + +Captures architectural decisions for how `CombinedRawDatasetsGenerator` traverses the URN graph +and what data is inserted into SQLite for each node role. + +Related code: `src/reqstool/model_generators/combined_raw_datasets_generator.py`, +`src/reqstool/storage/database_filter_processor.py` + +--- + +## The graph + +A reqstool graph is a directed graph of URNs connected by two edge types: + +- **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions) +- **`implementation`** — "this URN implements my requirements" (downward, toward evidence providers) + +Example: + +``` +A1 (defines requirements) + ← imported by B1 + ← imported by C1 (initial URN — the one being reported on) + ← implemented by lib-a + ← implemented by lib-b + ← implemented by lib-c +``` + +--- + +## Two-phase traversal + +### Phase 1 — import chain (DFS, recursive) + +Traverses `imports:` sections recursively. For each node, all five data types are fully inserted: +`requirements`, `svcs`, `mvrs`, `annotations`, `test_results`. + +Order: depth-first so ancestors are inserted before their children. This matters for FK constraints +(SVCs reference requirements that must exist first). + +Cycle detection: visited set seeded with the initial URN. `CircularImportError` raised on re-entry. + +### Phase 2 — implementation chain (recursive) + +Traverses `implementations:` sections recursively. Think library-uses-library, not +system→microservice. 
lib-a can have its own implementations (lib-b → lib-c). + +For each node: + +| File | Action | +|------|--------| +| `requirements.yml` | Parse fully (validation runs); insert **metadata only** — skip `insert_requirement` | +| `svcs.yml` | Insert normally — FK on `req_urn/req_id` rejects rows referencing out-of-scope requirements | +| `mvrs.yml` | Insert normally — FK on `svc_urn/svc_id` rejects rows referencing out-of-scope SVCs | +| `annotations.yml` | Insert normally — FK on `req_urn/req_id` rejects out-of-scope rows | +| test results | Insert with explicit scope check — no FK, keyed by FQN | + +Cycle detection: separate visited set. `CircularImplementationError` raised on re-entry. + +Note: `imports:` sections of implementation nodes are NOT followed. An implementation's own imports +point to a different requirement scope. + +--- + +## Post-parse cleanup + +After both phases complete, `DatabaseFilterProcessor._remove_implementation_requirements()` deletes +requirement rows for nodes that are only reachable via `implementation` edges: + +```sql +DELETE FROM requirements WHERE urn IN ( + SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation' + EXCEPT + SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import' + EXCEPT + SELECT value FROM metadata WHERE key = 'initial_urn' +) +``` + +CASCADE handles SVCs/MVRs/annotations that only linked to those deleted requirements. +SVCs/annotations that link to in-scope requirements (from Phase 1) survive. + +**Why post-parse and not ingest-time?** ~30 lines in the filter processor vs ~150 lines +restructuring the generator and populator. The result is identical for an ephemeral in-memory DB. +The filter processor already runs a post-parse cleanup pass for user-defined `filters:` blocks — +adding structural cleanup there is consistent. + +--- + +## Why recursive implementations? 
+ +The original design (pre-#324) treated implementations as leaf nodes based on the +system→microservice mental model. This was revised because: + +- `variant` is no longer a behavioral gate (see #324) +- A library (`lib-a`) can depend on another library (`lib-b`) which itself has implementations +- All nodes in the implementation subtree can have annotations/tests pointing to in-scope requirements +- Flat traversal silently misses evidence from lib-b, lib-c, etc. + +--- + +## Why `variant` is not a behavioral gate + +Pre-#324, `variant: system/microservice/external` controlled which YAML sections were parsed and +which files were read. This was removed because: + +- It encoded relationship role as an intrinsic property (a URN is not inherently a "microservice") +- It created a confusing 3×N matrix of allowed/disallowed sections +- It silently ignored files when the variant didn't match, causing hard-to-debug data loss +- Presence-based parsing is simpler, more predictable, and more general + +`variant` remains in the schema as optional advisory metadata for display/tooling purposes. diff --git a/docs/modules/ROOT/pages/how_it_works.adoc b/docs/modules/ROOT/pages/how_it_works.adoc index 4c92f39f..2eaa117e 100644 --- a/docs/modules/ROOT/pages/how_it_works.adoc +++ b/docs/modules/ROOT/pages/how_it_works.adoc @@ -1,81 +1,105 @@ = How it Works -This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts]. +This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts]. For detailed architectural decisions, see the link:https://github.com/reqstool/reqstool-client/blob/main/docs/DESIGN.md[DESIGN.md] in the repository. 
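Presence-based parsing, as described in the DESIGN.md text above, reduces to a simple existence check per known file. A hypothetical sketch (the file names come from this document; `parse_present` and the raw `read_text` stand in for the real per-file parsers):

```python
import tempfile
from pathlib import Path

# A file is read if and only if it exists on disk; no `variant`
# value gates which files are considered.
KNOWN_FILES = ("requirements.yml", "svcs.yml", "mvrs.yml", "annotations.yml")


def parse_present(src_dir: Path) -> dict[str, str]:
    parsed = {}
    for name in KNOWN_FILES:
        path = src_dir / name
        if path.exists():  # presence is the only gate
            parsed[name] = path.read_text()
    return parsed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "requirements.yml").write_text("metadata:\n  urn: demo\n")
    (root / "svcs.yml").write_text("cases: {}\n")
    result = parse_present(root)  # only the two files that exist are parsed
```

This is what makes the old 3xN variant/section matrix unnecessary: the same loop applies to every node role.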
-== Template generation +== Pipeline overview -The CombinedIndexedDatasetGenerator prepares the data provided from the CombinedRawDatasetsGenerator for rendering with the Jinja2 templates and is used by the ReportCommand and the StatusCommand components. +---- +Location → parse → RawDataset → INSERT into SQLite → Repository/Services → Command output +---- -== Overview of central components +Each command calls `build_database()` which: + +1. Parses all sources into the in-memory SQLite database (two-phase traversal, see below) +2. Applies filters (`DatabaseFilterProcessor`) — removes out-of-scope requirements and applies user-defined `filters:` blocks +3. Runs lifecycle validation (warns on DEPRECATED/OBSOLETE references) +4. Commands then query via `RequirementsRepository` and service layer + +== Two-phase graph traversal + +The requirement graph is a directed graph of URNs connected by two edge types: + +* **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions) +* **`implementation`** — "this URN provides evidence for my requirements" (downward, toward code/tests) + +`CombinedRawDatasetsGenerator` traverses this graph in two phases: + +=== Phase 1 — import chain (recursive DFS, full insert) + +Follows `imports:` sections recursively, depth-first. For each node, all five data types are fully inserted into SQLite: requirements, SVCs, MVRs, annotations, and test results. Cycle detection raises `CircularImportError`. + +=== Phase 2 — implementation chain (recursive, metadata-only insert) -Below is a breakdown of the central components of reqstool: +Follows `implementations:` sections recursively. Think library-uses-library — lib-a can itself have implementations (lib-b → lib-c), all of which may contribute test evidence for the initial URN's requirements. + +For each implementation node, `requirements.yml` is parsed (validation runs) but only the URN metadata is inserted — the requirement rows are excluded. 
All other files (SVCs, MVRs, annotations, test results) are inserted normally; SQLite FK constraints automatically discard rows that reference requirements outside the current scope. Cycle detection raises `CircularImplementationError`. + +NOTE: `imports:` sections of implementation nodes are not followed — an implementation's own imports belong to a different scope. + +== The `variant` field + +`variant: system/microservice/external` in `requirements.yml` is optional advisory metadata. It is not a behavioral gate — parsing is entirely presence-based. If a file exists, it is read. If a section exists in YAML, it is parsed. + +== Overview of central components [plantuml,format=svg] .... @startuml !include -Component(StatusCommand, "StatusCommand", "Processes status command") -Component(GenerateJsonCommand, "GenerateJsonCommand", "Generates JSON from imported Models") -Component(ReportCommand, "ReportCommand", "Generates reports") -Component(SemanticValidator, "SemanticValidator", "Validates data read from source") -Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Generates imported models") -Component(reqstoolConfig, "reqstoolConfig", "Resolves paths to yaml files") -Component(CombinedIndexedDatasetGenerator, "CombinedIndexedDatasetGenerator", "Prepares data for rendering of Jinja2 templates") -Component(Command, "Command", "Handles user commands") - -Rel(Command, StatusCommand, "Uses") -Rel(Command, GenerateJsonCommand, "Uses") -Rel(Command, ReportCommand, "Uses") -Rel(CombinedRawDatasetsGenerator, SemanticValidator, "Depends on") -Rel_Right(CombinedRawDatasetsGenerator, reqstoolConfig, "Uses") -Rel(StatusCommand, CombinedRawDatasetsGenerator, "Uses") -Rel(GenerateJsonCommand, CombinedRawDatasetsGenerator, "Uses") -Rel(ReportCommand, CombinedRawDatasetsGenerator, "Uses") -Rel(ReportCommand, CombinedIndexedDatasetGenerator, "Uses") -Rel(StatusCommand, CombinedIndexedDatasetGenerator, "Uses") - -Rel_Down(CombinedRawDatasetsGenerator, 
CombinedIndexedDatasetGenerator, "Provides data to") +Component(Command, "Command", "Handles user commands (status, report, export)") +Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Two-phase graph traversal and SQLite population") +Component(DatabaseFilterProcessor, "DatabaseFilterProcessor", "Post-parse requirement/SVC filtering") +Component(RequirementsRepository, "RequirementsRepository", "Data access layer over SQLite") +Component(StatisticsService, "StatisticsService", "Computes per-requirement status and totals") +Component(SemanticValidator, "SemanticValidator", "Cross-reference validation") +Component(SQLiteDB, "SQLite (in-memory)", "Single source of truth after parsing") + +Rel(Command, CombinedRawDatasetsGenerator, "build_database()") +Rel(CombinedRawDatasetsGenerator, SQLiteDB, "INSERT") +Rel(CombinedRawDatasetsGenerator, SemanticValidator, "validate_post_parsing()") +Rel(DatabaseFilterProcessor, SQLiteDB, "DELETE (filters)") +Rel(RequirementsRepository, SQLiteDB, "SELECT") +Rel(StatisticsService, RequirementsRepository, "queries") +Rel(Command, StatisticsService, "Uses") +Rel(Command, RequirementsRepository, "Uses") @enduml .... -== Sequence diagram of the program execution +== Sequence diagram -Below is an example to illustrate how reqstool parses data from the initial source. +Below illustrates how reqstool processes the `status` command against an initial source that imports a parent system. [plantuml,format=svg] .... 
@startuml !include -Person(user, "User", "", "") - +Person(user, "User") Container(reqsTool, "reqstool") -Container_Boundary(b, "Requirement files") - Container_Boundary(b1, "MS-001") - Component(reqs, "Requirements", "Requirements.yml") - Component(svcs, "SVCS", "software_verification_cases.yml") - Component(mvrs, "MVRS", "manual_verification_results.yml") - Component(annot_impls,"Implementations", "requirements_annotations.yml") - Component(annot_tests,"Automated tests", "svcs_annotations.yml") - Boundary_End() - Container_Boundary(b2, "Ext-001") - Component(reqs_ext, "Requirements", "Requirements.yml") - Boundary_End() +Container_Boundary(phase1, "Phase 1 — import chain") + Component(initial_reqs, "initial/requirements.yml") + Component(initial_svcs, "initial/svcs.yml") + Component(parent_reqs, "parent/requirements.yml") +Boundary_End() + +Container_Boundary(phase2, "Phase 2 — implementation chain") + Component(impl_reqs, "lib-a/requirements.yml") + Component(impl_svcs, "lib-a/svcs.yml") + Component(impl_tests, "lib-a/test results") Boundary_End() -Rel(user, reqsTool, "Submit command", "bash") -Rel(reqsTool, reqs, "Reads requirements") -Rel(reqsTool, svcs, "Reads svcs") -Rel(reqsTool, mvrs, "Reads mvrs") -Rel(reqsTool, annot_impls, "Reads impls annotations") -Rel(reqsTool, annot_tests, "Reads test annotations") -Rel(reqsTool, reqsTool, "Create imported model") -Rel(reqsTool, reqs_ext, "Reads imported requirements") -Rel(reqsTool, reqsTool, "Create imported model") -Rel(reqsTool, user, "Returns combined data based on imported") +Rel(user, reqsTool, "reqstool status local -p ./initial") +Rel(reqsTool, initial_reqs, "parse + full insert") +Rel(reqsTool, initial_svcs, "parse + full insert") +Rel(reqsTool, parent_reqs, "parse + full insert (recursive)") +Rel(reqsTool, impl_reqs, "parse + metadata only") +Rel(reqsTool, impl_svcs, "parse + FK-scoped insert") +Rel(reqsTool, impl_tests, "parse + scoped insert") +Rel(reqsTool, reqsTool, "post-parse: delete impl-child 
requirements") +Rel(reqsTool, user, "status table (exit code = unmet requirements)") @enduml .... diff --git a/src/reqstool/common/exceptions.py b/src/reqstool/common/exceptions.py index b0662dc3..48a294e5 100644 --- a/src/reqstool/common/exceptions.py +++ b/src/reqstool/common/exceptions.py @@ -7,3 +7,19 @@ class MissingRequirementsFileError(Exception): def __init__(self, path: str): self.path = path super().__init__(f"Missing requirements file: {path}") + + +class CircularImportError(Exception): + """Raised when a circular import is detected in the requirements graph.""" + + def __init__(self, urn: str, chain: list[str]): + self.urn = urn + super().__init__(f"Circular import detected: {' -> '.join(chain)} -> {urn}") + + +class CircularImplementationError(Exception): + """Raised when a circular implementation chain is detected in the requirements graph.""" + + def __init__(self, urn: str, chain: list[str]): + self.urn = urn + super().__init__(f"Circular implementation detected: {' -> '.join(chain)} -> {urn}") diff --git a/src/reqstool/common/utils.py b/src/reqstool/common/utils.py index fd2ac8b5..c7502188 100644 --- a/src/reqstool/common/utils.py +++ b/src/reqstool/common/utils.py @@ -16,7 +16,7 @@ from reqstool.common.models.urn_id import UrnId from reqstool.models.raw_datasets import RawDataset -from reqstool.models.requirements import VARIANTS, RequirementData +from reqstool.models.requirements import RequirementData from reqstool.models.svcs import SVCData @@ -131,8 +131,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]: all_svcs = {} for model_id, model_info in raw_datasets.items(): - if Utils.model_is_external(raw_datasets=model_info): - continue if model_info.svcs_data is not None: for svc_id, svc in model_info.svcs_data.cases.items(): if svc_id not in all_svcs: @@ -144,10 +142,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]: def flatten_list(list_to_flatten: Iterable) -> List[any]: 
return list(chain.from_iterable(list_to_flatten)) - @staticmethod - def model_is_external(raw_datasets: RawDataset) -> bool: - return raw_datasets.requirements_data.metadata.variant.value == VARIANTS.EXTERNAL.value - @staticmethod def string_contains_delimiter(string: str, delimiter: str) -> bool: return delimiter in string diff --git a/src/reqstool/model_generators/combined_raw_datasets_generator.py b/src/reqstool/model_generators/combined_raw_datasets_generator.py index 4b9bfda8..63b9b04c 100644 --- a/src/reqstool/model_generators/combined_raw_datasets_generator.py +++ b/src/reqstool/model_generators/combined_raw_datasets_generator.py @@ -2,11 +2,11 @@ import logging from collections import defaultdict -from typing import Dict, List, Optional +from typing import Dict, List, Optional, Set, Tuple from reqstool_python_decorators.decorators.decorators import Requirements -from reqstool.common.exceptions import MissingRequirementsFileError +from reqstool.common.exceptions import CircularImplementationError, CircularImportError, MissingRequirementsFileError from reqstool.common.utils import TempDirectoryUtil, Utils from reqstool.common.validators.semantic_validator import SemanticValidator from reqstool.location_resolver.location_resolver import LocationResolver @@ -20,7 +20,7 @@ from reqstool.models.implementations import ImplementationDataInterface from reqstool.models.mvrs import MVRsData from reqstool.models.raw_datasets import CombinedRawDataset, RawDataset -from reqstool.models.requirements import VARIANTS, RequirementsData +from reqstool.models.requirements import RequirementsData from reqstool.models.svcs import SVCsData from reqstool.models.test_data import TestsData from reqstool.requirements_indata.requirements_indata import RequirementsIndata @@ -36,13 +36,12 @@ def __init__( database: Optional[RequirementsDatabase] = None, ): self.__level: int = 0 - self.__initial_source_type: VARIANTS = None self.__initial_location_handler: LocationResolver = 
LocationResolver( parent=None, current_unresolved=initial_location ) self.semantic_validator = semantic_validator self._parsing_order: List[str] = [] - self._parsing_graph: Dict[str, List[str]] = defaultdict(list) + self._parsing_graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list) self._database = database self.combined_raw_datasets = self.__generate() @@ -135,31 +134,32 @@ def __populate_test_results(self, crd: CombinedRawDataset) -> None: def __populate_parsing_graph(self, crd: CombinedRawDataset) -> None: for parent_urn, children in crd.parsing_graph.items(): - for child_urn in children: - self._database.insert_parsing_graph_edge(parent_urn, child_urn) + for child_urn, edge_type in children: + self._database.insert_parsing_graph_edge(parent_urn, child_urn, edge_type) def __handle_initial_imports(self, raw_datasets: Dict[str, RawDataset], rd: RequirementsData): - match self.__initial_source_type: - case VARIANTS.SYSTEM: - parsed_systems = self.__import_systems(raw_datasets, parent_rd=rd) - parsed_microservices = self.__import_implementations(raw_datasets, implementations=rd.implementations) - self._parsing_graph[rd.metadata.urn].extend(parsed_systems) - self._parsing_graph[rd.metadata.urn].extend(parsed_microservices) - - # add current urn as parent to all microservices - for ms_urn in parsed_microservices: - self._parsing_graph[ms_urn].append(rd.metadata.urn) - - case VARIANTS.MICROSERVICE: - parsed_systems = self.__import_systems(raw_datasets, parent_rd=rd) - self._parsing_graph[rd.metadata.urn].extend(parsed_systems) - case _: - raise RuntimeError("Unsupported initial source system type (this should not happen)") - - def __import_systems(self, raw_datasets: Dict[str, RawDataset], parent_rd: RequirementsData) -> List[str]: - if parent_rd.imports is None: + if rd.imports: + parsed_systems = self.__import_systems(raw_datasets, parent_rd=rd, visited={rd.metadata.urn}) + self._parsing_graph[rd.metadata.urn].extend([(u, "import") for u in parsed_systems]) + 
+ if rd.implementations: + parsed_microservices = self.__import_implementations(raw_datasets, implementations=rd.implementations) + self._parsing_graph[rd.metadata.urn].extend([(u, "implementation") for u in parsed_microservices]) + for ms_urn in parsed_microservices: + self._parsing_graph[ms_urn].append((rd.metadata.urn, "implementation")) + + def __import_systems( + self, + raw_datasets: Dict[str, RawDataset], + parent_rd: RequirementsData, + visited: Optional[Set[str]] = None, + ) -> List[str]: + if not parent_rd.imports: return [] + if visited is None: + visited = set() + self.__level += 1 parsed_urns: List[str] = [] @@ -167,27 +167,23 @@ def __import_systems(self, raw_datasets: Dict[str, RawDataset], parent_rd: Requi current_imported_model = self.__parse_source(current_location_handler=system) current_urn = current_imported_model.requirements_data.metadata.urn + if current_urn in visited: + raise CircularImportError(current_urn, list(visited)) + + visited.add(current_urn) + # add urn to parsing_order_list self._parsing_order.append(current_urn) parsed_urns.append(current_urn) raw_datasets[current_urn] = current_imported_model - assert ( - current_imported_model.requirements_data.metadata.variant == VARIANTS.SYSTEM - or current_imported_model.requirements_data.metadata.variant == VARIANTS.EXTERNAL + # recursively import systems + imported_systems = self.__import_systems( + raw_datasets=raw_datasets, parent_rd=current_imported_model.requirements_data, visited=visited ) - # if current source type is system or external import systems recursively - if ( - current_imported_model.requirements_data.metadata.variant == VARIANTS.SYSTEM - or current_imported_model.requirements_data.metadata.variant == VARIANTS.EXTERNAL - ): - imported_systems = self.__import_systems( - raw_datasets=raw_datasets, parent_rd=current_imported_model.requirements_data - ) - - self._parsing_graph[current_urn].extend(imported_systems) + self._parsing_graph[current_urn].extend([(u, "import") for 
u in imported_systems]) self.__level -= 1 @@ -197,7 +193,11 @@ def __import_implementations( self, raw_datasets: Dict[str, RawDataset], implementations: List[ImplementationDataInterface], + visited: Optional[Set[str]] = None, ) -> List[str]: + if visited is None: + visited = set() + parsed_urns: List[str] = [] self.__level += 1 @@ -205,12 +205,25 @@ def __import_implementations( parsed_model = self.__parse_source(current_location_handler=implementation) current_urn = parsed_model.requirements_data.metadata.urn + if current_urn in visited: + raise CircularImplementationError(current_urn, list(visited)) + + visited.add(current_urn) + # add urn to parsing_order_list self._parsing_order.append(current_urn) parsed_urns.append(current_urn) raw_datasets[current_urn] = parsed_model + # recurse into this implementation's own implementations + sub_impls = parsed_model.requirements_data.implementations + if sub_impls: + sub_urns = self.__import_implementations(raw_datasets, sub_impls, visited) + self._parsing_graph[current_urn].extend([(u, "implementation") for u in sub_urns]) + for sub_urn in sub_urns: + self._parsing_graph[sub_urn].append((current_urn, "implementation")) + self.__level -= 1 return parsed_urns @@ -243,17 +256,10 @@ def __parse_source(self, current_location_handler: LocationResolver) -> RawDatas else: logging.info(f"{requirements_indata.dst_path}") - if self.__initial_location_handler is current_location_handler: - self.__initial_source_type = rmg.requirements_data.metadata.variant - - if ( - rmg.requirements_data.metadata.variant == VARIANTS.SYSTEM - or rmg.requirements_data.metadata.variant == VARIANTS.MICROSERVICE - ): - # parse file sources other than requirements.yml - annotations_data, svcs_data, automated_tests, mvrs_data = self.__parse_source_other( - actual_tmp_path, requirements_indata, rmg - ) + # parse file sources other than requirements.yml + annotations_data, svcs_data, automated_tests, mvrs_data = self.__parse_source_other( + actual_tmp_path, 
requirements_indata, rmg + ) raw_dataset = RawDataset( requirements_data=rmg.requirements_data, @@ -309,8 +315,4 @@ def __parse_source_other( uri=requirements_indata.requirements_indata_paths.annotations_yml.path, urn=current_urn ).model - # requirement annotations (impls) - only for microservices - if rmg.requirements_data.metadata.variant != VARIANTS.MICROSERVICE: - assert not annotations_data.implementations - return annotations_data, svcs_data, automated_tests, mvrs_data diff --git a/src/reqstool/model_generators/requirements_model_generator.py b/src/reqstool/model_generators/requirements_model_generator.py index 288b318c..744ae99f 100644 --- a/src/reqstool/model_generators/requirements_model_generator.py +++ b/src/reqstool/model_generators/requirements_model_generator.py @@ -92,30 +92,18 @@ def __generate( validated = RequirementsPydanticModel.model_validate(data) - r_metadata: MetaData = self.__parse_metadata(validated.root) + r_metadata: MetaData = self.__parse_metadata(validated) r_implementations: List[ImplementationDataInterface] = [] r_imports: List[ImportDataInterface] = [] r_requirements: Dict[str, RequirementData] = {} r_filters: Dict[str, RequirementFilter] = {} - match r_metadata.variant: - case VARIANTS.SYSTEM: - self.prefix_with_urn = False - r_imports = self.__parse_imports(validated.root) - r_filters = self.__parse_requirement_filters(data=data) - r_implementations = self.__parse_implementations(validated.root) - r_requirements = self.__parse_requirements(validated.root, data=data) - case VARIANTS.MICROSERVICE: - self.prefix_with_urn = False - r_imports = self.__parse_imports(validated.root) - r_filters = self.__parse_requirement_filters(data=data) - r_requirements = self.__parse_requirements(validated.root, data=data) - case VARIANTS.EXTERNAL: - self.prefix_with_urn = False - r_requirements = self.__parse_requirements(validated.root, data=data) - case _: - raise RuntimeError("Unsupported system type") + self.prefix_with_urn = False + r_imports 
= self.__parse_imports(validated) + r_filters = self.__parse_requirement_filters(data=data) + r_implementations = self.__parse_implementations(validated) + r_requirements = self.__parse_requirements(validated, data=data) return RequirementsData( metadata=r_metadata, @@ -127,7 +115,7 @@ def __generate( def __parse_metadata(self, model): r_urn: str = model.metadata.urn - r_variant: VARIANTS = VARIANTS(model.metadata.variant.value) + r_variant = VARIANTS(model.metadata.variant.value) if model.metadata.variant else None r_title: str = model.metadata.title r_url: str = model.metadata.url diff --git a/src/reqstool/models/generated/requirements_schema.py b/src/reqstool/models/generated/requirements_schema.py index a786d786..0e2f7966 100644 --- a/src/reqstool/models/generated/requirements_schema.py +++ b/src/reqstool/models/generated/requirements_schema.py @@ -33,7 +33,7 @@ class Metadata(BaseModel): """ Unique resource name """ - variant: Variant + variant: Variant | None = None """ Enum of system, microservice, or external """ @@ -278,7 +278,7 @@ class Locations(BaseModel): """ -class Model1(BaseModel): +class Model(BaseModel): model_config = ConfigDict( extra='forbid', ) @@ -299,7 +299,3 @@ class Model1(BaseModel): """ Array of Requirements """ - - -class Model(RootModel[Model1]): - root: Model1 diff --git a/src/reqstool/models/raw_datasets.py b/src/reqstool/models/raw_datasets.py index 7d1b7117..b3a10cc3 100644 --- a/src/reqstool/models/raw_datasets.py +++ b/src/reqstool/models/raw_datasets.py @@ -1,6 +1,6 @@ # Copyright © LFV -from typing import Dict, List, Optional +from typing import Dict, List, Optional, Tuple from pydantic import BaseModel, ConfigDict, Field @@ -30,5 +30,5 @@ class CombinedRawDataset(BaseModel): initial_model_urn: str urn_parsing_order: List[str] = Field(default_factory=list) - parsing_graph: Dict[str, List[str]] = Field(default_factory=dict) + parsing_graph: Dict[str, List[Tuple[str, str]]] = Field(default_factory=dict) raw_datasets: Dict[str, 
RawDataset] = Field(default_factory=dict) diff --git a/src/reqstool/models/requirements.py b/src/reqstool/models/requirements.py index c46c326b..b1b8b334 100644 --- a/src/reqstool/models/requirements.py +++ b/src/reqstool/models/requirements.py @@ -101,7 +101,7 @@ class MetaData(BaseModel): model_config = ConfigDict(frozen=True) urn: str - variant: VARIANTS + variant: Optional[VARIANTS] = None title: str url: Optional[str] = None diff --git a/src/reqstool/resources/schemas/v1/requirements.schema.json b/src/reqstool/resources/schemas/v1/requirements.schema.json index e406b006..2a3459b6 100644 --- a/src/reqstool/resources/schemas/v1/requirements.schema.json +++ b/src/reqstool/resources/schemas/v1/requirements.schema.json @@ -30,35 +30,6 @@ "required": [ "metadata" ], - "anyOf": [ - { - "if": { - "properties": { - "metadata": { - "properties": { - "variant": { - "anyOf": [ - { - "const": "microservice" - }, - { - "const": "external" - } - ] - } - } - } - } - }, - "then": { - "not": { - "required": [ - "implementations" - ] - } - } - } - ], "$defs": { "metadata": { "type": "object", @@ -89,7 +60,6 @@ }, "required": [ "urn", - "variant", "title" ] }, diff --git a/src/reqstool/storage/database.py b/src/reqstool/storage/database.py index d46ee34c..149b0e07 100644 --- a/src/reqstool/storage/database.py +++ b/src/reqstool/storage/database.py @@ -146,16 +146,22 @@ def insert_test_result(self, urn: str, fqn: str, status: TEST_RUN_STATUS) -> Non (urn, fqn, status.value), ) - def insert_parsing_graph_edge(self, parent_urn: str, child_urn: str) -> None: + def insert_parsing_graph_edge(self, parent_urn: str, child_urn: str, edge_type: str) -> None: self._conn.execute( - "INSERT OR IGNORE INTO parsing_graph (parent_urn, child_urn) VALUES (?, ?)", - (parent_urn, child_urn), + "INSERT OR IGNORE INTO parsing_graph (parent_urn, child_urn, edge_type) VALUES (?, ?, ?)", + (parent_urn, child_urn, edge_type), ) def insert_urn_metadata(self, metadata: MetaData) -> None: self._conn.execute( 
             "INSERT INTO urn_metadata (urn, variant, title, url, parse_position) VALUES (?, ?, ?, ?, ?)",
-            (metadata.urn, metadata.variant.value, metadata.title, metadata.url, self._next_parse_position),
+            (
+                metadata.urn,
+                metadata.variant.value if metadata.variant else None,
+                metadata.title,
+                metadata.url,
+                self._next_parse_position,
+            ),
         )
         self._next_parse_position += 1
diff --git a/src/reqstool/storage/database_filter_processor.py b/src/reqstool/storage/database_filter_processor.py
index ee1a2d4a..c9c656f4 100644
--- a/src/reqstool/storage/database_filter_processor.py
+++ b/src/reqstool/storage/database_filter_processor.py
@@ -7,7 +7,6 @@
 from reqstool.common.models.urn_id import UrnId
 from reqstool.filters.id_filters import IDFilters
 from reqstool.models.raw_datasets import RawDataset
-from reqstool.models.requirements import VARIANTS
 from reqstool.storage.database import RequirementsDatabase
 from reqstool.storage.el_to_sql_compiler import ELToSQLCompiler

@@ -21,6 +20,8 @@ def __init__(self, db: RequirementsDatabase, raw_datasets: dict[str, RawDataset]
         self._parsing_graph = self._load_parsing_graph()

     def apply_filters(self) -> None:
+        self._remove_implementation_requirements()
+
         initial_urn = self._db.get_metadata("initial_urn")
         self._apply_req_filters(initial_urn)
@@ -28,6 +29,27 @@ def apply_filters(self) -> None:

         self._db.set_metadata("filtered", "true")

+    def _remove_implementation_requirements(self) -> None:
+        """Delete requirements from nodes only reachable via implementation edges.
+
+        Nodes reachable only via implementation edges are evidence contributors, not
+        requirement definers. Their requirements are out of scope from the initial URN's
+        perspective. CASCADE removes SVCs/MVRs/annotations that only linked to those
+        requirements; evidence rows linking to in-scope requirements survive.
+        """
+        self._db.connection.execute(
+            """
+            DELETE FROM requirements WHERE urn IN (
+                SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
+                EXCEPT
+                SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
+                EXCEPT
+                SELECT value FROM metadata WHERE key = 'initial_urn'
+            )
+            """
+        )
+        self._db.connection.commit()
+
     # -- Requirement filters --

     def _apply_req_filters(self, initial_urn: str) -> None:
@@ -50,8 +72,8 @@ def _process_req_filters_per_urn(self, urn: str) -> tuple[set[UrnId], set[UrnId]
         kept_imports: set[UrnId] = set()
         filtered_out_imports: set[UrnId] = set()

-        for import_urn in self._parsing_graph.get(urn, []):
-            if self._raw_datasets[import_urn].requirements_data.metadata.variant == VARIANTS.MICROSERVICE:
+        for import_urn, edge_type in self._parsing_graph.get(urn, []):
+            if edge_type == "implementation":
                 continue

             kept_per_import, filtered_per_import = self._process_req_filters_per_urn(import_urn)
@@ -101,8 +123,8 @@ def _process_svc_filters_per_urn(self, urn: str) -> tuple[set[UrnId], set[UrnId]
         kept_imports: set[UrnId] = set()
         filtered_out_imports: set[UrnId] = set()

-        for import_urn in self._parsing_graph.get(urn, []):
-            if self._raw_datasets[import_urn].requirements_data.metadata.variant == VARIANTS.MICROSERVICE:
+        for import_urn, edge_type in self._parsing_graph.get(urn, []):
+            if edge_type == "implementation":
                 continue

             kept_per_import, filtered_per_import = self._process_svc_filters_per_urn(import_urn)
@@ -243,13 +265,13 @@ def _check_filter_refs(self, id_filter: IDFilters, accessible: set[UrnId]) -> No
             if uid not in accessible:
                 logger.warning(f"Cannot exclude: {uid} does not exist or is not accessible")

-    def _load_parsing_graph(self) -> dict[str, list[str]]:
-        graph: dict[str, list[str]] = {}
-        rows = self._db.connection.execute("SELECT parent_urn, child_urn FROM parsing_graph").fetchall()
+    def _load_parsing_graph(self) -> dict[str, list[tuple[str, str]]]:
+        graph: dict[str, list[tuple[str, str]]] = {}
+        rows = self._db.connection.execute("SELECT parent_urn, child_urn, edge_type FROM parsing_graph").fetchall()

         # Initialize all URNs as keys (including leaves with no children)
         all_urns = {row["urn"] for row in self._db.connection.execute("SELECT urn FROM urn_metadata").fetchall()}
         for urn in all_urns:
             graph[urn] = []

         for row in rows:
-            graph.setdefault(row["parent_urn"], []).append(row["child_urn"])
+            graph.setdefault(row["parent_urn"], []).append((row["child_urn"], row["edge_type"]))

         return graph
diff --git a/src/reqstool/storage/schema.py b/src/reqstool/storage/schema.py
index add89de4..197d4d77 100644
--- a/src/reqstool/storage/schema.py
+++ b/src/reqstool/storage/schema.py
@@ -114,12 +114,13 @@
 CREATE TABLE IF NOT EXISTS parsing_graph (
     parent_urn TEXT NOT NULL,
     child_urn TEXT NOT NULL,
+    edge_type TEXT NOT NULL CHECK (edge_type IN ('import', 'implementation')),
     PRIMARY KEY (parent_urn, child_urn)
 );

 CREATE TABLE IF NOT EXISTS urn_metadata (
     urn TEXT NOT NULL PRIMARY KEY,
-    variant TEXT NOT NULL CHECK (variant IN ('system', 'microservice', 'external')),
+    variant TEXT CHECK (variant IN ('system', 'microservice', 'external')),
     title TEXT NOT NULL,
     url TEXT,
     parse_position INTEGER NOT NULL UNIQUE
diff --git a/tests/resources/test_data/data/local/test_circular_impl/lib-a/requirements.yml b/tests/resources/test_data/data/local/test_circular_impl/lib-a/requirements.yml
new file mode 100644
index 00000000..225d33d2
--- /dev/null
+++ b/tests/resources/test_data/data/local/test_circular_impl/lib-a/requirements.yml
@@ -0,0 +1,7 @@
+metadata:
+  urn: lib-a
+  title: Library A
+
+implementations:
+  local:
+    - path: ../lib-b
diff --git a/tests/resources/test_data/data/local/test_circular_impl/lib-b/requirements.yml b/tests/resources/test_data/data/local/test_circular_impl/lib-b/requirements.yml
new file mode 100644
index 00000000..7983c1c1
--- /dev/null
+++ b/tests/resources/test_data/data/local/test_circular_impl/lib-b/requirements.yml
@@ -0,0 +1,7 @@
+metadata:
+  urn: lib-b
+  title: Library B
+
+implementations:
+  local:
+    - path: ../lib-a
diff --git a/tests/resources/test_data/data/local/test_circular_import/node-a/requirements.yml b/tests/resources/test_data/data/local/test_circular_import/node-a/requirements.yml
new file mode 100644
index 00000000..79663aa3
--- /dev/null
+++ b/tests/resources/test_data/data/local/test_circular_import/node-a/requirements.yml
@@ -0,0 +1,7 @@
+metadata:
+  urn: node-a
+  title: Node A Requirements
+
+imports:
+  local:
+    - path: ../node-b
diff --git a/tests/resources/test_data/data/local/test_circular_import/node-b/requirements.yml b/tests/resources/test_data/data/local/test_circular_import/node-b/requirements.yml
new file mode 100644
index 00000000..4342a8a2
--- /dev/null
+++ b/tests/resources/test_data/data/local/test_circular_import/node-b/requirements.yml
@@ -0,0 +1,7 @@
+metadata:
+  urn: node-b
+  title: Node B Requirements
+
+imports:
+  local:
+    - path: ../node-a
diff --git a/tests/resources/test_data/data/local/test_recursive_impl/lib-a/requirements.yml b/tests/resources/test_data/data/local/test_recursive_impl/lib-a/requirements.yml
new file mode 100644
index 00000000..11238b8e
--- /dev/null
+++ b/tests/resources/test_data/data/local/test_recursive_impl/lib-a/requirements.yml
@@ -0,0 +1,15 @@
+metadata:
+  urn: lib-a
+  title: Library A
+
+requirements:
+  - id: REQ_LA_001
+    title: Library A requirement
+    significance: shall
+    description: A requirement defined in lib-a
+    categories: [reliability]
+    revision: 0.0.1
+
+implementations:
+  local:
+    - path: ../lib-b
diff --git a/tests/resources/test_data/data/local/test_recursive_impl/lib-b/requirements.yml b/tests/resources/test_data/data/local/test_recursive_impl/lib-b/requirements.yml
new file mode 100644
index 00000000..a384d3fa
--- /dev/null
+++ b/tests/resources/test_data/data/local/test_recursive_impl/lib-b/requirements.yml
@@ -0,0 +1,15 @@
+metadata:
+  urn: lib-b
+  title: Library B
+
+requirements:
+  - id: REQ_LB_001
+    title: Library B requirement
+    significance: shall
+    description: A requirement defined in lib-b
+    categories: [reliability]
+    revision: 0.0.1
+
+implementations:
+  local:
+    - path: ../lib-c
diff --git a/tests/resources/test_data/data/local/test_recursive_impl/lib-c/requirements.yml b/tests/resources/test_data/data/local/test_recursive_impl/lib-c/requirements.yml
new file mode 100644
index 00000000..0ffa567c
--- /dev/null
+++ b/tests/resources/test_data/data/local/test_recursive_impl/lib-c/requirements.yml
@@ -0,0 +1,11 @@
+metadata:
+  urn: lib-c
+  title: Library C
+
+requirements:
+  - id: REQ_LC_001
+    title: Library C requirement
+    significance: shall
+    description: A requirement defined in lib-c
+    categories: [reliability]
+    revision: 0.0.1
diff --git a/tests/resources/test_data/data/local/test_recursive_impl/root/requirements.yml b/tests/resources/test_data/data/local/test_recursive_impl/root/requirements.yml
new file mode 100644
index 00000000..68170913
--- /dev/null
+++ b/tests/resources/test_data/data/local/test_recursive_impl/root/requirements.yml
@@ -0,0 +1,15 @@
+metadata:
+  urn: root
+  title: Root
+
+requirements:
+  - id: REQ_ROOT_001
+    title: Root requirement
+    significance: shall
+    description: A requirement defined at the root level
+    categories: [reliability]
+    revision: 0.0.1
+
+implementations:
+  local:
+    - path: ../lib-a
diff --git a/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py b/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py
index f64ede7d..98b2d31b 100644
--- a/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py
+++ b/tests/unit/reqstool/model_generators/test_combined_raw_datasets_generator.py
@@ -3,7 +3,7 @@
 import pytest
 from reqstool_python_decorators.decorators.decorators import SVCs

-from reqstool.common.exceptions import MissingRequirementsFileError
+from reqstool.common.exceptions import CircularImplementationError, CircularImportError, MissingRequirementsFileError
 from reqstool.common.validator_error_holder import ValidationErrorHolder
 from reqstool.common.validators.semantic_validator import SemanticValidator
 from reqstool.locations.local_location import LocalLocation
@@ -84,3 +84,50 @@ def test_missing_requirements_file(local_testdata_resources_rootdir_w_path):
             semantic_validator=semantic_validator,
         )
     assert "this/path/does/not/have/a/requirements/file" in str(excinfo.value)
+
+
+@SVCs("SVC_020")
+def test_circular_import_raises(local_testdata_resources_rootdir_w_path):
+    semantic_validator = SemanticValidator(validation_error_holder=ValidationErrorHolder())
+    with pytest.raises(CircularImportError) as excinfo:
+        combined_raw_datasets_generator.CombinedRawDatasetsGenerator(
+            initial_location=LocalLocation(path=local_testdata_resources_rootdir_w_path("test_circular_import/node-a")),
+            semantic_validator=semantic_validator,
+        )
+    assert "node-a" in str(excinfo.value)
+    assert "Circular import detected" in str(excinfo.value)
+
+
+@SVCs("SVC_020")
+def test_circular_implementation_raises(local_testdata_resources_rootdir_w_path):
+    semantic_validator = SemanticValidator(validation_error_holder=ValidationErrorHolder())
+    with pytest.raises(CircularImplementationError) as excinfo:
+        combined_raw_datasets_generator.CombinedRawDatasetsGenerator(
+            initial_location=LocalLocation(path=local_testdata_resources_rootdir_w_path("test_circular_impl/lib-a")),
+            semantic_validator=semantic_validator,
+        )
+    assert "lib-a" in str(excinfo.value)
+    assert "Circular implementation detected" in str(excinfo.value)
+
+
+@SVCs("SVC_001")
+def test_implementation_traversal_recursive(local_testdata_resources_rootdir_w_path):
+    semantic_validator = SemanticValidator(validation_error_holder=ValidationErrorHolder())
+
+    crd: CombinedRawDataset = combined_raw_datasets_generator.CombinedRawDatasetsGenerator(
+        initial_location=LocalLocation(path=local_testdata_resources_rootdir_w_path("test_recursive_impl/root")),
+        semantic_validator=semantic_validator,
+    ).combined_raw_datasets
+
+    # all four nodes are in raw_datasets (recursive traversal reached lib-b and lib-c)
+    assert set(crd.raw_datasets.keys()) == {"root", "lib-a", "lib-b", "lib-c"}
+
+    # implementation nodes are parsed (requirements present in raw_datasets)
+    assert len(crd.raw_datasets["lib-a"].requirements_data.requirements) == 1
+    assert len(crd.raw_datasets["lib-b"].requirements_data.requirements) == 1
+    assert len(crd.raw_datasets["lib-c"].requirements_data.requirements) == 1
+
+    # implementation edges are tagged correctly in the parsing graph
+    assert ("lib-a", "implementation") in crd.parsing_graph["root"]
+    assert ("lib-b", "implementation") in crd.parsing_graph["lib-a"]
+    assert ("lib-c", "implementation") in crd.parsing_graph["lib-b"]
diff --git a/tests/unit/reqstool/storage/test_database.py b/tests/unit/reqstool/storage/test_database.py
index 960c0810..72210d10 100644
--- a/tests/unit/reqstool/storage/test_database.py
+++ b/tests/unit/reqstool/storage/test_database.py
@@ -254,8 +254,8 @@ def test_insert_test_result(db):

 def test_insert_parsing_graph_edge(db):
-    db.insert_parsing_graph_edge("sys-001", "ms-001")
-    db.insert_parsing_graph_edge("sys-001", "ms-002")
+    db.insert_parsing_graph_edge("sys-001", "ms-001", "implementation")
+    db.insert_parsing_graph_edge("sys-001", "ms-002", "implementation")

     rows = db.connection.execute("SELECT child_urn FROM parsing_graph WHERE parent_urn = 'sys-001'").fetchall()
     children = {row["child_urn"] for row in rows}
@@ -263,8 +263,8 @@ def test_insert_duplicate_parsing_graph_edge_ignored(db):
-    db.insert_parsing_graph_edge("sys-001", "ms-001")
-    db.insert_parsing_graph_edge("sys-001", "ms-001")
+    db.insert_parsing_graph_edge("sys-001", "ms-001", "implementation")
+    db.insert_parsing_graph_edge("sys-001", "ms-001", "implementation")

     count = db.connection.execute("SELECT COUNT(*) FROM parsing_graph").fetchone()[0]
     assert count == 1
diff --git a/tests/unit/reqstool/storage/test_database_filter_processor.py b/tests/unit/reqstool/storage/test_database_filter_processor.py
index 57c74534..397f4b90 100644
--- a/tests/unit/reqstool/storage/test_database_filter_processor.py
+++ b/tests/unit/reqstool/storage/test_database_filter_processor.py
@@ -66,7 +66,7 @@ def _setup_db_with_raw_datasets(raw_datasets, parsing_graph, initial_urn):

     for parent, children in parsing_graph.items():
         for child in children:
-            db.insert_parsing_graph_edge(parent, child)
+            db.insert_parsing_graph_edge(parent, child, "import")

     return db
diff --git a/tests/unit/reqstool/storage/test_requirements_repository.py b/tests/unit/reqstool/storage/test_requirements_repository.py
index 57ae942c..155b4d8e 100644
--- a/tests/unit/reqstool/storage/test_requirements_repository.py
+++ b/tests/unit/reqstool/storage/test_requirements_repository.py
@@ -99,7 +99,7 @@ def test_get_import_graph(db):
     _setup_metadata(db, "ms-001")
     m2 = MetaData(urn="sys-001", variant=VARIANTS.SYSTEM, title="System")
     db.insert_urn_metadata(m2)
-    db.insert_parsing_graph_edge("ms-001", "sys-001")
+    db.insert_parsing_graph_edge("ms-001", "sys-001", "import")
     db.commit()

     repo = RequirementsRepository(db)
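The `_remove_implementation_requirements` hunk above leans on SQLite's `EXCEPT` compound-select to scope the deletion. A minimal, self-contained sketch of that query against a toy schema (table and URN names mirror the diff, but this is not reqstool code — `REQ_NA_001` etc. are made-up rows):

```python
# Demonstrates the EXCEPT-based scoping from _remove_implementation_requirements:
# requirements are deleted for URNs reachable ONLY via 'implementation' edges;
# URNs also reachable via 'import' edges, or the initial URN itself, keep theirs.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parsing_graph (parent_urn TEXT, child_urn TEXT, edge_type TEXT);
    CREATE TABLE metadata (key TEXT, value TEXT);
    CREATE TABLE requirements (urn TEXT, id TEXT);

    INSERT INTO metadata VALUES ('initial_urn', 'root');
    -- root imports node-a; root also has an implementation chain lib-a -> lib-b
    INSERT INTO parsing_graph VALUES ('root', 'node-a', 'import');
    INSERT INTO parsing_graph VALUES ('root', 'lib-a', 'implementation');
    INSERT INTO parsing_graph VALUES ('lib-a', 'lib-b', 'implementation');

    INSERT INTO requirements VALUES ('root', 'REQ_ROOT_001');
    INSERT INTO requirements VALUES ('node-a', 'REQ_NA_001');
    INSERT INTO requirements VALUES ('lib-a', 'REQ_LA_001');
    INSERT INTO requirements VALUES ('lib-b', 'REQ_LB_001');
    """
)

# Same DELETE as the patch: implementation children, minus import children,
# minus the initial URN.
conn.execute(
    """
    DELETE FROM requirements WHERE urn IN (
        SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
        EXCEPT
        SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
        EXCEPT
        SELECT value FROM metadata WHERE key = 'initial_urn'
    )
    """
)

surviving = sorted(urn for (urn,) in conn.execute("SELECT urn FROM requirements"))
print(surviving)  # ['node-a', 'root'] — lib-a/lib-b requirement rows were excluded
```

Note the toy schema omits the `FOREIGN KEY ... ON DELETE CASCADE` columns the docstring refers to; in the real schema the cascade is what removes SVC/MVR/annotation rows tied to the deleted requirements.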
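The new `test_circular_import_raises`/`test_circular_implementation_raises` tests exercise visited-set cycle detection during traversal. An illustrative sketch of that pattern (not the actual `CombinedRawDatasetsGenerator` code — the `walk_imports` helper and its stand-in exception class are hypothetical):

```python
# Hypothetical stand-in for reqstool.common.exceptions.CircularImportError,
# used only to make this sketch self-contained.
class CircularImportError(Exception):
    pass


def walk_imports(urn, graph, path=()):
    """DFS over an imports graph; raise when a URN reappears on the current path.

    `graph` maps urn -> list of imported urns. Tracking the recursion *path*
    (rather than a global visited set) means diamond imports are allowed while
    true cycles still raise.
    """
    if urn in path:
        raise CircularImportError(f"Circular import detected: {urn}")
    order = [urn]
    for child in graph.get(urn, []):
        order.extend(walk_imports(child, graph, path + (urn,)))
    return order


# Acyclic chain traverses fully, mirroring test_circular_import's node layout:
print(walk_imports("root", {"root": ["node-a"], "node-a": ["node-b"]}))
# ['root', 'node-a', 'node-b']

# A lib-a <-> lib-b cycle raises, like the test_circular_impl fixtures:
try:
    walk_imports("lib-a", {"lib-a": ["lib-b"], "lib-b": ["lib-a"]})
except CircularImportError as err:
    print(err)  # Circular import detected: lib-a
```

The real implementation keeps two separate traversals (and two exception types) because the import and implementation chains are walked independently.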