24 changes: 17 additions & 7 deletions CLAUDE.md
@@ -47,12 +47,9 @@ The pipeline flows: **Location** → **parse** → **RawDataset** (transient)
Abstractions for where source data lives. Implementations: `LocalLocation`, `GitLocation`, `MavenLocation`, `PypiLocation`. Each implements `_make_available_on_localdisk(dst_path)` to download/copy the source to a temp dir. `LocationResolver` (`location_resolver/`) handles relative path resolution when an import's location is relative to its parent.

### Data Ingestion (`model_generators/`, `requirements_indata/`)
`CombinedRawDatasetsGenerator` is the top-level parser. It:
1. Resolves the initial location to a local temp path (`TempDirectoryUtil`)
2. Parses `requirements.yml` → `RequirementsModelGenerator` → `RequirementsData`
3. Recursively follows `imports` (other system URNs) and `implementations` (microservice URNs)
4. For each SYSTEM/MICROSERVICE source also parses: `svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML test results
5. Each parsed `RawDataset` is immediately inserted into the in-memory SQLite database via `DatabasePopulator`
`CombinedRawDatasetsGenerator` is the top-level parser. It runs in two phases:
1. **Import chain** (recursive DFS): resolves location → parses `requirements.yml` → parses all auxiliary files (`svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML) → full insert into SQLite. Follows each node's own `imports:` section recursively. Cycle detection via visited set (`CircularImportError`).
2. **Implementation chain** (recursive): follows `implementations:` sections recursively (library-uses-library model, not system→microservice). Parses all files; inserts **metadata only** for `requirements.yml` — requirement rows are excluded post-parse by `DatabaseFilterProcessor`. Cycle detection via separate visited set (`CircularImplementationError`).
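The two phases can be sketched as a pair of recursive walks. This is a minimal illustration, not the real generator API: the dict-based `graph` shape and all helper names are hypothetical, and phase 2 is shown starting from the initial URN's `implementations:` only.

```python
# Sketch of the two-phase traversal; "graph" maps
# urn -> {"imports": [...], "implementations": [...]} (illustrative shape).
def build_database(initial_urn, graph):
    inserted_full, inserted_meta = [], []
    visited_imports, visited_impls = set(), set()

    def import_chain(urn, chain):
        if urn in visited_imports:
            raise ValueError(f"circular import: {' -> '.join(chain + [urn])}")
        visited_imports.add(urn)
        # Recurse first so imported (ancestor) requirements land before
        # this node's own data -- FK constraints need them to exist.
        for child in graph.get(urn, {}).get("imports", []):
            import_chain(child, chain + [urn])
        inserted_full.append(urn)  # requirements + svcs + mvrs + annotations + tests

    def implementation_chain(urn, chain):
        if urn in visited_impls:
            raise ValueError(f"circular implementation: {' -> '.join(chain + [urn])}")
        visited_impls.add(urn)
        inserted_meta.append(urn)  # metadata only; requirement rows excluded post-parse
        for child in graph.get(urn, {}).get("implementations", []):
            implementation_chain(child, chain + [urn])

    import_chain(initial_urn, [])
    for child in graph.get(initial_urn, {}).get("implementations", []):
        implementation_chain(child, [initial_urn])
    return inserted_full, inserted_meta
```

Note the two independent visited sets: a URN may legitimately appear in both chains without that being a cycle.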

### Storage Layer (`storage/`)
In-memory SQLite is the single source of truth after parsing:
@@ -75,7 +72,7 @@ All domain objects are frozen/plain `@dataclass`s:
- `TestsData` / `TestData` — JUnit XML test results
- `CombinedRawDataset` — flat dict of all raw datasets + parsing graph (used during population and by `SemanticValidator`)

Variants (defined in `requirements.yml` metadata): `SYSTEM`, `MICROSERVICE`, `EXTERNAL`.
The `variant` field in `requirements.yml` metadata is optional and advisory (`system`, `microservice`, `external`). It is NOT a behavioral gate — parsing is presence-based. See `docs/DESIGN.md`.

### Services (`services/`)
Business logic layer querying the database via `RequirementsRepository`:
@@ -145,6 +142,19 @@ diff /tmp/baseline-report-demo.txt /tmp/feature-report-demo.txt

If a diff is expected (e.g. the PR intentionally changes output), note it in the PR description.

## Design Decisions

Key architectural decisions that affect how to read and modify this codebase.
Full rationale in `docs/DESIGN.md`.

- **Traversal is two-phase**: import chain (full insert) then implementation chain (metadata-only). Do not collapse these into a single pass.
- **Implementation chains are recursive**: a library can have its own implementations (lib-a → lib-b → lib-c). Do not treat implementations as leaf nodes.
- **`variant` is not a behavioral gate**: parsing is presence-based. Do not add `if variant == X` guards anywhere in the ingestion pipeline.
- **Implementation-child requirements are excluded post-parse**: `DatabaseFilterProcessor` deletes them via SQL after ingestion. Do not filter at ingest time.
- **Cycle detection covers both chains**: `CircularImportError` for the import chain, `CircularImplementationError` for the implementation chain.
- **FK constraints scope evidence from implementation children**: SVCs/MVRs/annotations referencing out-of-scope requirements are rejected by SQLite FK checks on insert — no explicit filtering needed.
- **Test results need explicit scoping**: no FK (keyed by FQN), so a scope check is required when inserting test results from implementation children.
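The last bullet can be sketched as a simple guard: because test results carry no FK (they are keyed by fully qualified test name), scoping has to be done in code. Function and field names here are hypothetical:

```python
def scoped_test_results(results, in_scope_fqns):
    """Keep only test results whose FQN maps to an in-scope
    requirement/SVC; no FK exists to enforce this at insert time."""
    return [r for r in results if r["fqn"] in in_scope_fqns]
```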

## Key Conventions

- **URN format**: `some:urn:string` — the separator is `:`. `UrnId` is the canonical composite key used throughout indexes.
112 changes: 112 additions & 0 deletions docs/DESIGN.md
@@ -0,0 +1,112 @@
# Design: Graph Traversal and Data Ingestion

Captures architectural decisions for how `CombinedRawDatasetsGenerator` traverses the URN graph
and what data is inserted into SQLite for each node role.

Related code: `src/reqstool/model_generators/combined_raw_datasets_generator.py`,
`src/reqstool/storage/database_filter_processor.py`

---

## The graph

A reqstool graph is a directed graph of URNs connected by two edge types:

- **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions)
- **`implementation`** — "this URN implements my requirements" (downward, toward evidence providers)

Example:

```
A1 (defines requirements)
← imported by B1
← imported by C1 (initial URN — the one being reported on)
← implemented by lib-a
← implemented by lib-b
← implemented by lib-c
```
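For illustration, the same graph can be written as two adjacency maps, one per edge type, with a small walk collecting the transitive evidence providers. This is a sketch using the URN names from the diagram, not the real traversal code:

```python
imports = {          # child -> URNs it imports (upward)
    "C1": ["B1"],
    "B1": ["A1"],
}
implementations = {  # parent -> URNs that implement it (downward)
    "C1": ["lib-a", "lib-b"],
    "lib-b": ["lib-c"],
}

def evidence_providers(urn):
    # Transitive closure over implementation edges (lib-b pulls in lib-c).
    providers = []
    for child in implementations.get(urn, []):
        providers.append(child)
        providers.extend(evidence_providers(child))
    return providers
```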

---

## Two-phase traversal

### Phase 1 — import chain (DFS, recursive)

Traverses `imports:` sections recursively. For each node, all five data types are fully inserted:
`requirements`, `svcs`, `mvrs`, `annotations`, `test_results`.

Order: depth-first so ancestors are inserted before their children. This matters for FK constraints
(SVCs reference requirements that must exist first).

Cycle detection: visited set seeded with the initial URN. `CircularImportError` raised on re-entry.
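A minimal sketch of that check, assuming a plain adjacency map of `imports:` edges. The exception mirrors `CircularImportError` in `src/reqstool/common/exceptions.py`; the walk itself is illustrative:

```python
class CircularImportError(Exception):
    def __init__(self, urn, chain):
        self.urn = urn
        super().__init__(f"Circular import detected: {' -> '.join(chain)} -> {urn}")

def walk_imports(urn, imports, visited=None, chain=()):
    # visited is seeded with the initial URN on the first call
    visited = set() if visited is None else visited
    if urn in visited:
        raise CircularImportError(urn, list(chain))
    visited.add(urn)
    for child in imports.get(urn, []):
        walk_imports(child, imports, visited, chain + (urn,))
```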

### Phase 2 — implementation chain (recursive)

Traverses `implementations:` sections recursively. Think library-uses-library, not
system→microservice. lib-a can have its own implementations (lib-b → lib-c).

For each node:

| File | Action |
|------|--------|
| `requirements.yml` | Parse fully (validation runs); insert **metadata only** — skip `insert_requirement` |
| `svcs.yml` | Insert normally — FK on `req_urn/req_id` rejects rows referencing out-of-scope requirements |
| `mvrs.yml` | Insert normally — FK on `svc_urn/svc_id` rejects rows referencing out-of-scope SVCs |
| `annotations.yml` | Insert normally — FK on `req_urn/req_id` rejects out-of-scope rows |
| test results | Insert with explicit scope check — no FK, keyed by FQN |

Cycle detection: separate visited set. `CircularImplementationError` raised on re-entry.

Note: `imports:` sections of implementation nodes are NOT followed. An implementation's own imports
point to a different requirement scope.
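The FK behavior in the table can be reproduced with a toy schema. Columns are cut down to what the constraint needs; the real tables live in `storage/`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.execute("CREATE TABLE requirements (urn TEXT, id TEXT, PRIMARY KEY (urn, id))")
conn.execute(
    "CREATE TABLE svcs (urn TEXT, id TEXT, req_urn TEXT, req_id TEXT, "
    "FOREIGN KEY (req_urn, req_id) REFERENCES requirements (urn, id))"
)

# Phase 1 inserted an in-scope requirement.
conn.execute("INSERT INTO requirements VALUES ('a1', 'REQ-001')")

# An implementation child's SVC referencing it is accepted...
conn.execute("INSERT INTO svcs VALUES ('lib-a', 'SVC-001', 'a1', 'REQ-001')")

# ...while one referencing an out-of-scope requirement trips the FK check.
rejected = False
try:
    conn.execute("INSERT INTO svcs VALUES ('lib-a', 'SVC-002', 'lib-a', 'REQ-999')")
except sqlite3.IntegrityError:
    rejected = True
```

This is why no explicit filtering is needed for SVCs/MVRs/annotations: the schema does the scoping.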

---

## Post-parse cleanup

After both phases complete, `DatabaseFilterProcessor._remove_implementation_requirements()` deletes
requirement rows for nodes that are only reachable via `implementation` edges:

```sql
DELETE FROM requirements WHERE urn IN (
SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
EXCEPT
SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
EXCEPT
SELECT value FROM metadata WHERE key = 'initial_urn'
)
```

CASCADE handles SVCs/MVRs/annotations that only linked to those deleted requirements.
SVCs/annotations that link to in-scope requirements (from Phase 1) survive.

**Why post-parse and not ingest-time?** ~30 lines in the filter processor vs ~150 lines
restructuring the generator and populator. The result is identical for an ephemeral in-memory DB.
The filter processor already runs a post-parse cleanup pass for user-defined `filters:` blocks —
adding structural cleanup there is consistent.
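The DELETE above can be exercised as a self-contained toy run, with the schema reduced to the columns the query touches (CASCADE omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requirements (urn TEXT, id TEXT);
    CREATE TABLE parsing_graph (parent_urn TEXT, child_urn TEXT, edge_type TEXT);
    CREATE TABLE metadata (key TEXT, value TEXT);

    INSERT INTO metadata VALUES ('initial_urn', 'c1');
    -- c1 imports a1; lib-a is reachable only via an implementation edge
    INSERT INTO parsing_graph VALUES ('c1', 'a1', 'import');
    INSERT INTO parsing_graph VALUES ('c1', 'lib-a', 'implementation');

    INSERT INTO requirements VALUES ('c1', 'REQ-1');
    INSERT INTO requirements VALUES ('a1', 'REQ-2');
    INSERT INTO requirements VALUES ('lib-a', 'REQ-3');

    DELETE FROM requirements WHERE urn IN (
        SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'implementation'
        EXCEPT
        SELECT DISTINCT child_urn FROM parsing_graph WHERE edge_type = 'import'
        EXCEPT
        SELECT value FROM metadata WHERE key = 'initial_urn'
    );
""")
# lib-a's requirement row is gone; in-scope rows survive.
remaining = [row[0] for row in conn.execute("SELECT urn FROM requirements ORDER BY urn")]
```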

---

## Why recursive implementations?

The original design (pre-#324) treated implementations as leaf nodes based on the
system→microservice mental model. This was revised because:

- `variant` is no longer a behavioral gate (see #324)
- A library (`lib-a`) can depend on another library (`lib-b`) which itself has implementations
- All nodes in the implementation subtree can have annotations/tests pointing to in-scope requirements
- Flat traversal silently misses evidence from lib-b, lib-c, etc.

---

## Why `variant` is not a behavioral gate

Pre-#324, `variant: system/microservice/external` controlled which YAML sections were parsed and
which files were read. This was removed because:

- It encoded relationship role as an intrinsic property (a URN is not inherently a "microservice")
- It created a confusing 3×N matrix of allowed/disallowed sections
- It silently ignored files when the variant didn't match, causing hard-to-debug data loss
- Presence-based parsing is simpler, more predictable, and more general

`variant` remains in the schema as optional advisory metadata for display/tooling purposes.
126 changes: 75 additions & 51 deletions docs/modules/ROOT/pages/how_it_works.adoc
@@ -1,81 +1,105 @@
= How it Works

This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts].
This page covers the internal architecture of the reqstool client. For general concepts like annotations, parsing, and validation, see xref:reqstool::concepts.adoc[Concepts]. For detailed architectural decisions, see the link:https://github.com/reqstool/reqstool-client/blob/main/docs/DESIGN.md[DESIGN.md] in the repository.

== Template generation
== Pipeline overview

The CombinedIndexedDatasetGenerator prepares the data provided by the CombinedRawDatasetsGenerator for rendering with the Jinja2 templates, and is used by the ReportCommand and StatusCommand components.
----
Location → parse → RawDataset → INSERT into SQLite → Repository/Services → Command output
----

== Overview of central components
Each command calls `build_database()` which:

1. Parses all sources into the in-memory SQLite database (two-phase traversal, see below)
2. Applies filters (`DatabaseFilterProcessor`) — removes out-of-scope requirements and applies user-defined `filters:` blocks
3. Runs lifecycle validation (warns on DEPRECATED/OBSOLETE references)
4. Commands then query via `RequirementsRepository` and service layer

== Two-phase graph traversal

The requirement graph is a directed graph of URNs connected by two edge types:

* **`import`** — "I reference requirements from this URN" (upward, toward requirement definitions)
* **`implementation`** — "this URN provides evidence for my requirements" (downward, toward code/tests)

`CombinedRawDatasetsGenerator` traverses this graph in two phases:

=== Phase 1 — import chain (recursive DFS, full insert)

Follows `imports:` sections recursively, depth-first. For each node, all five data types are fully inserted into SQLite: requirements, SVCs, MVRs, annotations, and test results. Cycle detection raises `CircularImportError`.

=== Phase 2 — implementation chain (recursive, metadata-only insert)

Below is a breakdown of the central components of reqstool:
Follows `implementations:` sections recursively. Think library-uses-library — lib-a can itself have implementations (lib-b → lib-c), all of which may contribute test evidence for the initial URN's requirements.

For each implementation node, `requirements.yml` is parsed (validation runs) but only the URN metadata is inserted — the requirement rows are excluded. All other files (SVCs, MVRs, annotations, test results) are inserted normally; SQLite FK constraints automatically discard rows that reference requirements outside the current scope. Cycle detection raises `CircularImplementationError`.

NOTE: `imports:` sections of implementation nodes are not followed — an implementation's own imports belong to a different scope.

== The `variant` field

`variant: system/microservice/external` in `requirements.yml` is optional advisory metadata. It is not a behavioral gate — parsing is entirely presence-based. If a file exists, it is read. If a section exists in YAML, it is parsed.
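Presence-based parsing amounts to file-existence checks, along these lines (helper name hypothetical; filenames taken from the docs above):

```python
from pathlib import Path
import tempfile

def discover_files(src: Path):
    """Presence-based: a file is parsed iff it exists; `variant` is never consulted."""
    candidates = ["requirements.yml", "svcs.yml", "mvrs.yml", "annotations.yml"]
    return [name for name in candidates if (src / name).exists()]

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "requirements.yml").write_text("metadata: {}\n")
    (src / "svcs.yml").write_text("cases: {}\n")
    found = discover_files(src)  # only the files actually present
```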

== Overview of central components

[plantuml,format=svg]
....
@startuml
!include <C4/C4_Component>

Component(StatusCommand, "StatusCommand", "Processes status command")
Component(GenerateJsonCommand, "GenerateJsonCommand", "Generates JSON from imported Models")
Component(ReportCommand, "ReportCommand", "Generates reports")
Component(SemanticValidator, "SemanticValidator", "Validates data read from source")
Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Generates imported models")
Component(reqstoolConfig, "reqstoolConfig", "Resolves paths to yaml files")
Component(CombinedIndexedDatasetGenerator, "CombinedIndexedDatasetGenerator", "Prepares data for rendering of Jinja2 templates")
Component(Command, "Command", "Handles user commands")

Rel(Command, StatusCommand, "Uses")
Rel(Command, GenerateJsonCommand, "Uses")
Rel(Command, ReportCommand, "Uses")
Rel(CombinedRawDatasetsGenerator, SemanticValidator, "Depends on")
Rel_Right(CombinedRawDatasetsGenerator, reqstoolConfig, "Uses")
Rel(StatusCommand, CombinedRawDatasetsGenerator, "Uses")
Rel(GenerateJsonCommand, CombinedRawDatasetsGenerator, "Uses")
Rel(ReportCommand, CombinedRawDatasetsGenerator, "Uses")
Rel(ReportCommand, CombinedIndexedDatasetGenerator, "Uses")
Rel(StatusCommand, CombinedIndexedDatasetGenerator, "Uses")

Rel_Down(CombinedRawDatasetsGenerator, CombinedIndexedDatasetGenerator, "Provides data to")
Component(Command, "Command", "Handles user commands (status, report, export)")
Component(CombinedRawDatasetsGenerator, "CombinedRawDatasetsGenerator", "Two-phase graph traversal and SQLite population")
Component(DatabaseFilterProcessor, "DatabaseFilterProcessor", "Post-parse requirement/SVC filtering")
Component(RequirementsRepository, "RequirementsRepository", "Data access layer over SQLite")
Component(StatisticsService, "StatisticsService", "Computes per-requirement status and totals")
Component(SemanticValidator, "SemanticValidator", "Cross-reference validation")
Component(SQLiteDB, "SQLite (in-memory)", "Single source of truth after parsing")

Rel(Command, CombinedRawDatasetsGenerator, "build_database()")
Rel(CombinedRawDatasetsGenerator, SQLiteDB, "INSERT")
Rel(CombinedRawDatasetsGenerator, SemanticValidator, "validate_post_parsing()")
Rel(DatabaseFilterProcessor, SQLiteDB, "DELETE (filters)")
Rel(RequirementsRepository, SQLiteDB, "SELECT")
Rel(StatisticsService, RequirementsRepository, "queries")
Rel(Command, StatisticsService, "Uses")
Rel(Command, RequirementsRepository, "Uses")

@enduml
....

== Sequence diagram of the program execution
== Sequence diagram

Below is an example to illustrate how reqstool parses data from the initial source.
The sequence below illustrates how reqstool processes the `status` command against an initial source that imports a parent system.

[plantuml,format=svg]
....
@startuml
!include <C4/C4_Sequence>

Person(user, "User", "", "")

Person(user, "User")
Container(reqsTool, "reqstool")

Container_Boundary(b, "Requirement files")
Container_Boundary(b1, "MS-001")
Component(reqs, "Requirements", "Requirements.yml")
Component(svcs, "SVCS", "software_verification_cases.yml")
Component(mvrs, "MVRS", "manual_verification_results.yml")
Component(annot_impls,"Implementations", "requirements_annotations.yml")
Component(annot_tests,"Automated tests", "svcs_annotations.yml")
Boundary_End()
Container_Boundary(b2, "Ext-001")
Component(reqs_ext, "Requirements", "Requirements.yml")
Boundary_End()
Container_Boundary(phase1, "Phase 1 — import chain")
Component(initial_reqs, "initial/requirements.yml")
Component(initial_svcs, "initial/svcs.yml")
Component(parent_reqs, "parent/requirements.yml")
Boundary_End()

Container_Boundary(phase2, "Phase 2 — implementation chain")
Component(impl_reqs, "lib-a/requirements.yml")
Component(impl_svcs, "lib-a/svcs.yml")
Component(impl_tests, "lib-a/test results")
Boundary_End()

Rel(user, reqsTool, "Submit command", "bash")
Rel(reqsTool, reqs, "Reads requirements")
Rel(reqsTool, svcs, "Reads svcs")
Rel(reqsTool, mvrs, "Reads mvrs")
Rel(reqsTool, annot_impls, "Reads impls annotations")
Rel(reqsTool, annot_tests, "Reads test annotations")
Rel(reqsTool, reqsTool, "Create imported model")
Rel(reqsTool, reqs_ext, "Reads imported requirements")
Rel(reqsTool, reqsTool, "Create imported model")
Rel(reqsTool, user, "Returns combined data based on imported")
Rel(user, reqsTool, "reqstool status local -p ./initial")
Rel(reqsTool, initial_reqs, "parse + full insert")
Rel(reqsTool, initial_svcs, "parse + full insert")
Rel(reqsTool, parent_reqs, "parse + full insert (recursive)")
Rel(reqsTool, impl_reqs, "parse + metadata only")
Rel(reqsTool, impl_svcs, "parse + FK-scoped insert")
Rel(reqsTool, impl_tests, "parse + scoped insert")
Rel(reqsTool, reqsTool, "post-parse: delete impl-child requirements")
Rel(reqsTool, user, "status table (exit code = unmet requirements)")

@enduml
....
16 changes: 16 additions & 0 deletions src/reqstool/common/exceptions.py
@@ -7,3 +7,19 @@ class MissingRequirementsFileError(Exception):
def __init__(self, path: str):
self.path = path
super().__init__(f"Missing requirements file: {path}")


class CircularImportError(Exception):
"""Raised when a circular import is detected in the requirements graph."""

def __init__(self, urn: str, chain: list[str]):
self.urn = urn
super().__init__(f"Circular import detected: {' -> '.join(chain)} -> {urn}")


class CircularImplementationError(Exception):
"""Raised when a circular implementation chain is detected in the requirements graph."""

def __init__(self, urn: str, chain: list[str]):
self.urn = urn
super().__init__(f"Circular implementation detected: {' -> '.join(chain)} -> {urn}")
8 changes: 1 addition & 7 deletions src/reqstool/common/utils.py
@@ -16,7 +16,7 @@

from reqstool.common.models.urn_id import UrnId
from reqstool.models.raw_datasets import RawDataset
from reqstool.models.requirements import VARIANTS, RequirementData
from reqstool.models.requirements import RequirementData
from reqstool.models.svcs import SVCData


@@ -131,8 +131,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]:
all_svcs = {}

for model_id, model_info in raw_datasets.items():
if Utils.model_is_external(raw_datasets=model_info):
continue
if model_info.svcs_data is not None:
for svc_id, svc in model_info.svcs_data.cases.items():
if svc_id not in all_svcs:
@@ -144,10 +142,6 @@ def flatten_all_svcs(raw_datasets: Dict[str, RawDataset]) -> Dict[str, SVCData]:
def flatten_list(list_to_flatten: Iterable) -> List[any]:
return list(chain.from_iterable(list_to_flatten))

@staticmethod
def model_is_external(raw_datasets: RawDataset) -> bool:
return raw_datasets.requirements_data.metadata.variant.value == VARIANTS.EXTERNAL.value

@staticmethod
def string_contains_delimiter(string: str, delimiter: str) -> bool:
return delimiter in string