Commit 919649e

jimisola and Jimisola Laursen authored
feat: add SQLite storage as single source of truth (#321)
* feat: add SQLite storage infrastructure with schema, populator, and security (#313)

Add in-memory SQLite database as foundation for replacing intermediate data structures (CombinedRawDataset, CombinedIndexedDataset). Includes schema with 13 tables, FK indexes, authorizer for security, insert API, and DatabasePopulator that converts RawDataset to SQL rows. Phase 1+2 of the migration plan — pure addition with opt-in DB param on CombinedRawDatasetsGenerator.

* feat: add CHECK constraints for all enum-backed columns

Add SQL CHECK constraints to enforce valid values at the database level for significance, lifecycle_state, implementation, category, verification_type, test status, variant, and element_kind columns.

* feat: add EL-to-SQL compiler and SQL-based filter processor (#313)

Add ELToSQLCompiler that translates Lark expression language parse trees into SQL WHERE clauses with parameterized queries. Add DatabaseFilterProcessor that replicates the recursive DAG-walk filter logic using SQL DELETEs with cascade cleanup of orphaned SVCs and MVRs.

* feat: add service layer, repository, and pipeline for SQLite-backed commands (#313)

Phase 4 of the SQLite storage migration: replace CombinedIndexedDataset with direct database queries through RequirementsRepository and service layer (StatisticsService, ExportService).

- Add RequirementsRepository as the data access layer wrapping RequirementsDatabase
- Add StatisticsService with TestStats/RequirementStatus/TotalStats dataclasses
- Add ExportService for JSON export with --req-ids/--svc-ids filtering
- Add build_database() pipeline helper in storage/pipeline.py
- Rewrite status, export, and report commands to use DB pipeline
- Migrate LifecycleValidator from CombinedIndexedDataset to RequirementsRepository
- Migrate GroupByOrganizor from CombinedIndexedDataset to RequirementsRepository
- Fix multi-pass DB population to satisfy FK constraints across URNs
- Update all affected tests for new interfaces

Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>

* refactor: clean up dead code, improve resource management, and align file names (#313)

- Make build_database() a context manager; update all commands to use `with` blocks
- Replace Utils dict helpers with collections.defaultdict in parsing graph
- Delete unused CombinedIndexedDataset, statistics_container, statistics_generator, indexed_dataset_filter_processor, and 5 dead Utils methods
- Remove empty RequirementsELTransformer/SVCsELTransformer subclasses
- Rename files to match their primary class names (el_compiler → el_to_sql_compiler, filter_processor → database_filter_processor, generic_el → generic_el_transformer)
- Add unit tests for RequirementsRepository, StatisticsService, ExportService, pipeline
- Update CLAUDE.md architecture docs to reflect SQLite pipeline

Signed-off-by: jimisola <jimisola@jimisola.com>

* refactor: collapse status table test columns into single formatted cells (#313)

Replace 13-column status table (5 sub-columns per test group) with 5 columns using compact inline formatting. Each test cell shows positionally-aligned counts (T P F S M) with colored numbers and dim dashes for zeros. Remove merged header complexity.

- Add _format_test_cell() for single-cell test stats rendering
- Use orange for missing, yellow for skipped (was both red)
- Empty cell for not_applicable (was ambiguous dash)
- Color-coded legend
- Delete _build_merged_headers, _parse_col_widths, _replace_header_with_merged, _format_cell

Signed-off-by: jimisola <jimisola@jimisola.com>

* style: apply black formatting

Signed-off-by: jimisola <jimisola@jimisola.com>

* fix: correct missing test/MVR totals and handle dangling FK references (#313)

Fix two bugs:

- TotalStats.missing_automated_tests and missing_manual_tests were never aggregated from per-requirement stats, always reporting 0. Now accumulated from each requirement's TestStats after calculation.
- Dangling FK references (e.g. SVC referencing non-existent requirement) crashed with IntegrityError. Now caught gracefully with warnings, allowing semantic validation to report all errors.

Signed-off-by: jimisola <jimisola@jimisola.com>

* chore: remove stale baselines directory

Regression testing verified directly against main. Baselines were from pre-Pydantic-v2 and are no longer needed.

Signed-off-by: jimisola <jimisola@jimisola.com>

* style: apply black formatting

Signed-off-by: jimisola <jimisola@jimisola.com>

* fix: remove unused variable in test_database_filter_processor

Signed-off-by: jimisola <jimisola@jimisola.com>

* refactor: reduce cyclomatic complexity in service and populator methods

Extract helper methods from ExportService.to_export_dict (C901: 21→<10), StatisticsService._calculate (C901: 17→<10), and DatabasePopulator.populate_from_raw_dataset (C901: 14→<10) to satisfy flake8 C901 complexity threshold.

Signed-off-by: jimisola <jimisola@jimisola.com>

* fix: restore Utils import removed during dead-code cleanup

The top-level `from reqstool.common.utils import Utils` was incorrectly removed in the dead-code cleanup commit, causing NameError when running as an installed package (the conditional import only covers direct exec).

Signed-off-by: jimisola <jimisola@jimisola.com>

* feat: migrate commands to SQLite pipeline and fix missing test/MVR counts (#313)

Replace CRD→CID pipeline in all three commands (status, export, report) with direct DB queries via RequirementsRepository and service layer. Fix StatisticsService undercounting missing automated tests and MVRs by aggregating from per-requirement stats instead of global annotation scan.

Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>

* chore: remove stale TestStatisticsItem pytest exclusion

Signed-off-by: jimisola <jimisola@users.noreply.github.com>

---------

Signed-off-by: Jimisola Laursen <jimisola@jimisola.com>
Signed-off-by: jimisola <jimisola@jimisola.com>
Signed-off-by: jimisola <jimisola@users.noreply.github.com>
Co-authored-by: Jimisola Laursen <jimisola.laursen@resurs.se>
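Several entries in the commit message describe database-level guarantees: CHECK constraints on enum-backed columns, cascade cleanup when filters delete rows, and dangling FK references that are caught with a warning instead of crashing. The following is a minimal sketch of those three ideas using Python's standard `sqlite3` module. The table names, column names, the allowed `significance` values, and the helpers `open_database`, `insert_link`, and `delete_requirement` are all hypothetical and chosen for illustration; they are not taken from the reqstool schema.

```python
import sqlite3
import warnings

# Hypothetical two-table schema (illustrative names, not the real reqstool DDL).
SCHEMA = """
CREATE TABLE requirements (
    urn TEXT NOT NULL,
    id TEXT NOT NULL,
    -- Enum-backed column: invalid values are rejected by the database itself.
    significance TEXT NOT NULL
        CHECK (significance IN ('shall', 'should', 'may')),
    PRIMARY KEY (urn, id)
);
CREATE TABLE svc_requirement_links (
    svc_id TEXT NOT NULL,
    req_urn TEXT NOT NULL,
    req_id TEXT NOT NULL,
    -- Deleting a requirement removes the links that point at it.
    FOREIGN KEY (req_urn, req_id)
        REFERENCES requirements (urn, id) ON DELETE CASCADE
);
"""


def open_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this pragma is set.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def insert_link(conn: sqlite3.Connection, svc_id: str, req_urn: str, req_id: str) -> bool:
    """Insert an SVC-to-requirement link; warn and skip if the target is missing."""
    try:
        conn.execute(
            "INSERT INTO svc_requirement_links VALUES (?, ?, ?)",
            (svc_id, req_urn, req_id),
        )
        return True
    except sqlite3.IntegrityError as exc:
        # Dangling reference: report it and continue so later validation
        # can still collect every error in one run.
        warnings.warn(f"Skipping dangling reference {svc_id} -> {req_urn}:{req_id} ({exc})")
        return False


def delete_requirement(conn: sqlite3.Connection, urn: str, req_id: str) -> None:
    """Filter out one requirement; dependent links disappear via ON DELETE CASCADE."""
    conn.execute("DELETE FROM requirements WHERE urn = ? AND id = ?", (urn, req_id))
```

In this sketch, inserting a requirement whose `significance` is outside the allowed set raises `sqlite3.IntegrityError`, and calling `delete_requirement` leaves no orphaned rows in `svc_requirement_links`. Whether the real schema uses composite keys or the same cascade layout is an assumption.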
1 parent 32f8479 commit 919649e

167 files changed

Lines changed: 4497 additions & 13136 deletions

CLAUDE.md

Lines changed: 22 additions & 10 deletions
@@ -25,7 +25,7 @@ hatch run dev:pytest --cov=reqstool tests/unit
 hatch run dev:pytest --cov=reqstool tests/integration
 
 # Run a single test file
-hatch run dev:pytest tests/unit/reqstool/model_generators/test_combined_indexed_dataset_generator.py
+hatch run dev:pytest tests/unit/reqstool/storage/test_requirements_repository.py
 
 # Run a single test by name
 hatch run dev:pytest -k "test_name"
@@ -41,7 +41,7 @@ pytest markers: `slow`, `integration`, `flaky`. By default `-m "not slow and not
 
 ## Architecture
 
-The pipeline flows: **Location** → **RawDataset** → **CombinedRawDataset** → **CombinedIndexedDataset** → **Command output**.
+The pipeline flows: **Location** → **parse** → **RawDataset** (transient) → **INSERT into SQLite** → **Repository/Services query DB** → **Command output**.
 
 ### Locations (`locations/`)
 Abstractions for where source data lives. Implementations: `LocalLocation`, `GitLocation`, `MavenLocation`, `PypiLocation`. Each implements `_make_available_on_localdisk(dst_path)` to download/copy the source to a temp dir. `LocationResolver` (`location_resolver/`) handles relative path resolution when an import's location is relative to its parent.
@@ -52,7 +52,18 @@ Abstractions for where source data lives. Implementations: `LocalLocation`, `Git
 2. Parses `requirements.yml` → `RequirementsModelGenerator` → `RequirementsData`
 3. Recursively follows `imports` (other system URNs) and `implementations` (microservice URNs)
 4. For each SYSTEM/MICROSERVICE source also parses: `svcs.yml`, `mvrs.yml`, `annotations.yml`, JUnit XML test results
-5. Builds a `CombinedRawDataset` (dict keyed by URN string) plus a DAG (`parsing_graph`)
+5. Each parsed `RawDataset` is immediately inserted into the in-memory SQLite database via `DatabasePopulator`
+
+### Storage Layer (`storage/`)
+In-memory SQLite is the single source of truth after parsing:
+- `schema.py` — DDL constants defining all tables with CHECK constraints and FK cascades
+- `database.py` — `RequirementsDatabase` wrapper (connection, insert API, authorizer, regexp function)
+- `populator.py` — `DatabasePopulator` inserts `RawDataset` contents into the database
+- `pipeline.py` — `build_database()` orchestrates: parse → populate → filter → lifecycle validate
+- `filter_processor.py` — `DatabaseFilterProcessor` applies requirement/SVC filters via SQL DELETEs + CASCADE
+- `el_compiler.py` — compiles Lark EL parse trees into SQL WHERE clauses with parameterized queries
+- `authorizer.py` — SQLite authorizer callback restricting allowed SQL operations
+- `requirements_repository.py` — `RequirementsRepository` data access layer, reconstructs domain objects from DB rows
 
 ### Core Data Model (`models/`)
 All domain objects are frozen/plain `@dataclass`s:
@@ -62,24 +73,25 @@ All domain objects are frozen/plain `@dataclass`s:
 - `MVRsData` / `MVRData` — manual verification results from `mvrs.yml`
 - `AnnotationsData` / `AnnotationData` — from `annotations.yml` (code annotations exported by `reqstool-python-decorators`)
 - `TestsData` / `TestData` — JUnit XML test results
-- `CombinedRawDataset` — flat dict of all raw datasets + parsing graph
-- `CombinedIndexedDataset` — fully resolved, indexed, post-filtered dataset used by commands
+- `CombinedRawDataset` — flat dict of all raw datasets + parsing graph (used during population and by `SemanticValidator`)
 
 Variants (defined in `requirements.yml` metadata): `SYSTEM`, `MICROSERVICE`, `EXTERNAL`.
 
-### Indexing & Filtering (`model_generators/`)
-`CombinedIndexedDatasetGenerator` (`combined_indexed_dataset_generator.py`) takes a `CombinedRawDataset` and produces a `CombinedIndexedDataset` with cross-reference indexes (e.g. `svcs_from_req`, `mvrs_from_svc`). If `_filtered=True`, it delegates filter application to `IndexedDatasetFilterProcessor` (`indexed_dataset_filter_processor.py`), which applies requirement and SVC filters defined in the YAML using the expression language.
+### Services (`services/`)
+Business logic layer querying the database via `RequirementsRepository`:
+- `StatisticsService` — computes per-requirement and total statistics (`TestStats`, `RequirementStatus`, `TotalStats`)
+- `ExportService` — builds export dict conforming to `export_output.schema.json`
 
 ### Expression Language (`expression_languages/`)
-Custom Lark-based DSL for filter expressions in `requirements.yml` / `svcs.yml`. Grammar supports `and`, `or`, `not`, `ids ==`, `ids !=`, and regex matching. `GenericELTransformer[T]` is the base; `RequirementsELTransformer` and `SVCsELTransformer` are thin subclasses.
+Custom Lark-based DSL for filter expressions in `requirements.yml` / `svcs.yml`. Grammar supports `and`, `or`, `not`, `ids ==`, `ids !=`, and regex matching. `GenericELTransformer[T]` is the base; `RequirementsELTransformer` and `SVCsELTransformer` are thin subclasses. `ELToSQLCompiler` compiles parse trees into SQL WHERE clauses.
 
 ### Validation (`common/validators/`)
 - `syntax_validator.py` — JSON Schema validation against `resources/schemas/v1/`
 - `semantic_validator.py` — post-parse cross-reference checks (SVC refs valid reqs, annotations ref valid IDs, MVRs ref valid SVCs)
 - `lifecycle_validator.py` — warns when DEPRECATED/OBSOLETE items are still referenced
 
 ### Commands (`commands/`)
-Four commands, each consuming a `CombinedIndexedDataset`:
+Each command calls `build_database()` then queries via `RequirementsRepository` and services:
 - `report` — Jinja2 template rendering (`common/jinja2.py`) → AsciiDoc or Markdown via `--format`
 - `report-asciidoc` — *deprecated*, use `report --format asciidoc` instead
 - `export` — JSON output with optional `--req-ids` / `--svc-ids` filters (replaces `generate-json`)
@@ -137,7 +149,7 @@ If a diff is expected (e.g. the PR intentionally changes output), note it in the
 
 - **URN format**: `some:urn:string` — the separator is `:`. `UrnId` is the canonical composite key used throughout indexes.
 - **`@Requirements("REQ_xxx")`** decorator from `reqstool-python-decorators` annotates methods that implement a requirement. This is how the tool tracks its own requirement coverage.
-- Data flows are uni-directional: raw parsing → indexing → output. Mutation only happens inside generators before the final `CombinedIndexedDataset` is frozen.
+- Data flows are uni-directional: parsing → SQLite population → filtering (SQL DELETEs) → read-only queries via repository. Commands never mutate the database.
 - `assert` statements are used for invariant checks in the generators (not for user-facing validation).
 - Tests under `tests/unit` use file-based fixtures from `tests/resources/`.
 - After code changes, also verify scenarios in `TEST_MATRIX.md`.
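
The CLAUDE.md changes above mention an authorizer callback, a regexp function, parameterized WHERE clauses, and a database that commands only read from inside a `with` block. The following sketch shows how those pieces can fit together with Python's standard `sqlite3` module. It is an illustration under assumptions, not the reqstool implementation: the helper names `open_read_only_database` and `find_ids_matching`, the single-table layout, and the read-only allow-list are all hypothetical.

```python
import re
import sqlite3
from contextlib import contextmanager
from typing import Iterable, Iterator, List, Tuple

# Assumed allow-list: only what a plain SELECT with a function call needs.
_ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def _read_only_authorizer(action, arg1, arg2, db_name, source):
    """Permit read operations and deny everything else (writes, DDL, pragmas)."""
    return sqlite3.SQLITE_OK if action in _ALLOWED_ACTIONS else sqlite3.SQLITE_DENY


def _regexp(pattern, value):
    """Back the SQL REGEXP operator; SQLite evaluates `X REGEXP Y` as regexp(Y, X)."""
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


@contextmanager
def open_read_only_database(rows: Iterable[Tuple[str, str]]) -> Iterator[sqlite3.Connection]:
    """Populate an in-memory database, lock it down, and close it on exit."""
    conn = sqlite3.connect(":memory:")
    try:
        # Population phase: writes are still permitted here.
        conn.execute("CREATE TABLE requirements (id TEXT PRIMARY KEY, title TEXT)")
        conn.executemany("INSERT INTO requirements VALUES (?, ?)", rows)
        conn.commit()
        # Query phase: register REGEXP, then restrict the connection to reads.
        conn.create_function("REGEXP", 2, _regexp)
        conn.set_authorizer(_read_only_authorizer)
        yield conn
    finally:
        conn.close()


def find_ids_matching(conn: sqlite3.Connection, pattern: str) -> List[str]:
    """Run a parameterized query; the pattern is bound, never interpolated."""
    cursor = conn.execute(
        "SELECT id FROM requirements WHERE id REGEXP ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in cursor]
```

A caller would wrap all queries in `with open_read_only_database(rows) as conn:` so the connection is released even when a command fails. Once the authorizer is installed, a statement such as `DELETE FROM requirements` is refused with a `sqlite3.DatabaseError`, which matches the documented rule that commands never mutate the database. Where exactly the real pipeline installs its authorizer relative to filtering is not stated in the diff and is assumed here.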

baselines/README.md

Lines changed: 0 additions & 137 deletions
This file was deleted.
