Skip to content

Commit 81022e6

Browse files
jimisolaCopilotJimisola Laursen
authored
feat: migrate data models from dataclasses to Pydantic v2 (#306)
* chore: capture regression baselines for Pydantic v2 migration Capture CLI output (status, report adoc/md, export) for all 12 in-repo test datasets + reqstool-demo + filtered exports + generate-json alias. Also save full pytest output (212 passed, 2 skipped, 92% coverage). These baselines will be used to verify zero regressions after each migration phase. Remove before merging to main. Ref: #140 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: correct JSON Schema design issues in 3 schema files Fix 1: common.schema.json — move requirement_ids, svc_ids, mvr_ids from patternProperties to properties. These are fixed field names, not dynamic keys. The regex also had an operator-precedence bug that accidentally accepted malformed keys. Fix 2: annotations.schema.json — move implementations and tests from patternProperties to properties. Same issue as Fix 1. Fix 3: reqstool_config.schema.json — move misplaced additionalProperties:false from inside properties block to the resources sub-object where it belongs. All fixes are non-breaking: every valid YAML is still accepted. Regression verified: 212 tests pass, CLI output identical across all 12 datasets + reqstool-demo. Ref: #140 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat: add Pydantic v2 deps and generate models from JSON Schemas - Add pydantic==2.12.5 to project dependencies - Add datamodel-code-generator==0.54.1 to dev dependencies - Add hatch dev script 'codegen' for reproducible model generation - Generate Pydantic v2 models from all 6 JSON Schemas - All 49 YAML fixtures validate against generated models (100% pass) - All 212 existing tests pass unchanged Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: migrate model generators to use Pydantic model_validate() - MVRsModelGenerator: uses generated MVRsPydanticModel - AnnotationsModelGenerator: uses generated AnnotationsPydanticModel - SVCsModelGenerator: uses generated SVCsPydanticModel - RequirementsModelGenerator: uses generated RequirementsPydanticModel All generators now parse YAML via model_validate() instead of manual dict['key'] access. Internal dataclass models (RequirementData, SVCData, etc.) are preserved — generators convert from Pydantic to dataclass. 212 tests pass, zero regression diffs across all CLI commands. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: replace jsonpickle with stdlib json serialization - Remove jsonpickle dependency and custom handlers (UrnIdHandler, RevisionHandler, JsonEnumHandler) - Add _serialize() function using dataclasses.fields() for recursive serialization of UrnId, Version, Enum, and nested dataclasses - Remove post-processing regex hack for json:// artifacts - Fix: LIFECYCLESTATE and IMPLEMENTATION enums now serialize to clean values instead of leaking jsonpickle internals (_value_, _name_, __objclass__, py/type) - All other JSON fields remain structurally identical 212 tests pass, reports/status output byte-identical to baseline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: convert data model dataclasses to Pydantic BaseModel Convert all @DataClass data models to Pydantic v2 BaseModel: - UrnId: ConfigDict(frozen=True) for hashability - LifecycleData: fix reason default (was type object) - RequirementData/SVCData/MVRData/AnnotationData/TestData: strict Dict[UrnId, X] types - CombinedIndexedDataset/RawDataset: Pydantic with Optional fields - StatisticsContainer: rename _private fields to public (Pydantic PrivateAttr incompatible) - Filters, configs, indata: BaseModel with model_post_init - Replace dataclasses.replace() with model_copy(update=...) - Fix annotations nesting bug: append->extend (List[List[T]] -> List[T]) - Fix type annotations: Dict[str,...] -> Dict[UrnId,...] where UrnId keys used - Fix CombinedRequirementTestItem defaults (was type objects) - Use VersionField (BeforeValidator) for packaging.version.Version coercion - Access model_fields on class not instance: type(obj).model_fields Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: remove unnecessary @DataClass decorators and dead code - Remove @DataClass from implementations.py and imports.py subclasses (empty classes inheriting from LocationResolver, no own fields) - Remove is_dataclass fallback from generate_json.py _serialize() (dead code — CombinedIndexedDataset has no dataclass fields) - Update CONTRIBUTING.md with model generation documentation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: add frozen models, Pydantic serializers, and baselines docs - Add frozen=True to immutable data models: RequirementData, ReferenceData, MetaData, MVRData, AnnotationData, TestData, LifecycleData - Add model_serializer to UrnId for automatic str serialization - Add PlainSerializer to VersionField for {major, minor, patch} output - Add field_serializer for sorted Set serialization - Replace custom _serialize() in generate_json.py with model_dump(mode='json') - Deduplicate VersionField: svcs.py now imports from requirements.py - Add baselines/README.md with regression testing instructions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: migrate location classes from dataclass to Pydantic - Convert LocationInterface, LocationResolver, and all location classes (GitLocation, LocalLocation, MavenLocation, PypiLocation, LocalMavenLocation, LocalPypiLocation) from @DataClass to Pydantic BaseModel - Replace dataclasses.replace() with model_copy(update=...) in LocationResolver - Replace __post_init__ with model_post_init in LocationResolver - Rename _current_unresolved to current_unresolved (Pydantic PrivateAttr) - Make env_token Optional[str] in GitLocation and MavenLocation - Zero @DataClass usage remains in the codebase (except model generators which are infrastructure, not data models) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: rename common/dataclasses/ to common/models/ After the Pydantic v2 migration, this directory contains Pydantic BaseModel classes (UrnId, LifecycleData), not Python dataclasses. Rename for clarity. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * style: fix formatting and remove unused imports Apply black formatting and remove unused imports flagged by flake8: - PrivateAttr, model_validator (group_by.py) - Requirement (annotations_model_generator.py) - field (indexed_dataset_filter_processor.py) - LOCATIONTYPES (requirements_model_generator.py) - PrivateAttr (requirements_indata.py) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * ci: add codegen-check to validate generated models stay in sync - Add codegen-check hatch script (codegen + --check + --disable-timestamp) - Add --disable-timestamp to codegen script to avoid timestamp-only diffs - Add validation step to lint.yml workflow - Regenerate models with current datamodel-code-generator version CI will now fail if JSON Schemas are changed but generated Pydantic models are not regenerated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: count 1 per requirement with implementation instead of summing annotations The TotalStatisticsItem.update() was accumulating all annotation counts instead of counting 1 per requirement that has any implementation, causing the "Implemented" count to exceed total requirements. Also changed the Implementation column from text (Implemented/Missing) to numeric count, and added implementation total to the Total row. Signed-off-by: jimisola <jimisola@jimisola.com> --------- Signed-off-by: jimisola <jimisola@jimisola.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Jimisola Laursen <jimisola.laursen@resurs.se>
1 parent e3c934b commit 81022e6

188 files changed

Lines changed: 11196 additions & 644 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.github/workflows/lint.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,3 +23,5 @@ jobs:
2323
run: pip install check-jsonschema
2424
- name: Validate JSON Schemas
2525
run: check-jsonschema --verbose --schemafile "https://json-schema.org/draft/2020-12/schema" src/reqstool/resources/schemas/*/*
26+
- name: Validate generated Pydantic models are up to date
27+
run: hatch run dev:codegen-check

CONTRIBUTING.md

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,3 +39,18 @@ hatch run dev:black src tests
3939
# Lint with flake8
4040
hatch run dev:flake8
4141
```
42+
43+
## Model Generation
44+
45+
Pydantic data models in `src/reqstool/models/generated/` are auto-generated from the JSON Schemas
46+
in `src/reqstool/resources/schemas/v1/`. **JSON Schema is the source of truth** — never edit the
47+
generated files directly.
48+
49+
To regenerate after modifying a schema:
50+
51+
```bash
52+
hatch run dev:codegen
53+
```
54+
55+
This runs `datamodel-codegen` to produce Pydantic v2 `BaseModel` classes from all schema files.
56+
The generated files should be committed alongside the schema changes.

PLAN_CODE_SMELLS.md

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
# Code Smells — Future Issues
2+
3+
Identified during the Pydantic v2 migration (PR #306). These are **not** related to the migration itself and should be addressed in separate issues/PRs.
4+
5+
---
6+
7+
## High Severity
8+
9+
### Scattered `sys.exit()` in location classes
10+
- **Files:** `maven_location.py:42-53`, `pypi_location.py:39-58`, `command.py:379-402`
11+
- **Problem:** Location classes call `sys.exit(1)` directly instead of raising exceptions. Makes unit testing impossible without mocking `sys.exit()`. Inconsistent error context (`RuntimeError` vs `RequestException`).
12+
- **Fix:** Create exception hierarchy (`LocationException`, `ArtifactNotFoundError`, etc.). Only `command.py` should call `sys.exit()`.
13+
14+
### Mutable singleton `TempDirectoryUtil`
15+
- **Files:** `common/utils.py:346-372`
16+
- **Problem:** Mutable class-level state (`tmpdir`, `count`), lazy init singleton, no cleanup guarantee, thread-unsafe counter. Test pollution risk.
17+
- **Fix:** Replace with dependency-injected `TempDirectoryManager` using context managers for guaranteed cleanup.
18+
19+
---
20+
21+
## Medium Severity
22+
23+
### Duplicated filter parsing logic
24+
- **Files:** `requirements_model_generator.py:250-299`, `svcs_model_generator.py:72-115`
25+
- **Problem:** Two near-identical ~50-line methods parse filters with same dict keys, same nesting, same UrnId conversion. Both have `# NOSONAR` suppression.
26+
- **Fix:** Extract into a shared `FilterParser` utility. Both methods reduce to 3-line calls.
27+
28+
### Monolithic `status.py` with manual table rendering (428 lines)
29+
- **Files:** `commands/status/status.py:78-200+`
30+
- **Problem:** Mixes statistics calculation, manual Unicode box-drawing, and ANSI color handling. 70-line function just for merged headers. Works around `tabulate` library limitations.
31+
- **Fix:** Replace manual table rendering with **Rich** library. Eliminates ~300 lines of string manipulation.
32+
33+
### No unit tests for CLI entry point
34+
- **Files:** `src/reqstool/command.py` (408 lines)
35+
- **Problem:** Main CLI entry point has zero unit tests. Only legacy E2E test exists for deprecated `generate-json`. Contains a `TODO $$$` comment.
36+
- **Fix:** Add `tests/unit/reqstool/test_command.py` covering argument parsing, location routing, error handling, deprecation warnings.
37+
38+
---
39+
40+
### Remaining `@dataclass` in generators/validators
41+
- **Files:** `combined_indexed_dataset_generator.py`, `indexed_dataset_filter_processor.py`, `syntax_validator.py`
42+
- **Problem:** Three infrastructure/service classes still use `@dataclass` after the Pydantic migration, creating inconsistency. They have 100+ internal `_` prefixed field references that Pydantic treats as `PrivateAttr`, so converting requires renaming all fields.
43+
- **Fix:** Convert to `BaseModel`, rename `_` prefixed fields to public names. Large diff but purely cosmetic — these are behavioral classes, not data models.
44+
45+
---
46+
47+
## Low Severity
48+
49+
### Repetitive CLI parser setup
50+
- **Files:** `command.py:115-169`
51+
- **Problem:** Four location types repeat ~8 lines each of parser registration. Deprecated wrappers (`report-asciidoc`, `generate-json`) still exist as code.
52+
- **Fix:** Config-driven builder pattern with a location registry dict.
53+
54+
### Expression language — minor inefficiencies
55+
- **Files:** `expression_languages/generic_el.py`, `requirements_el.py`, `svcs_el.py`
56+
- **Assessment:** Lark is the right tool — 26-line grammar, LALR parser, handles operator precedence and regex. **Not over-engineered.**
57+
- **Minor issues:**
58+
1. `RequirementsELTransformer` and `SVCsELTransformer` are empty `pass` subclasses — no runtime value
59+
2. New transformer instance created per item evaluated — could cache and reuse
60+
3. Regex and nested boolean logic exist in grammar but production YAML only uses simple expressions
61+
- **Fix:** Remove empty subclasses, optimize transformer reuse. Low effort.

baselines/README.md

Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,137 @@
1+
# Regression Baselines
2+
3+
This directory contains CLI output captured from `main` **before** the Pydantic v2 migration.
4+
Use these baselines to verify that code changes don't produce unintended output differences.
5+
6+
**This directory should be removed before merging to `main`.**
7+
8+
## How baselines were captured
9+
10+
All baselines were captured on `main` at commit `784f77b` (the first commit on the feature branch).
11+
12+
### Naming convention
13+
14+
```
15+
<dataset>__<command>.txt — CLI stdout (+ stderr for status)
16+
<dataset>__<command>.exitcode — exit code of the command
17+
```
18+
19+
### Datasets
20+
21+
| Baseline prefix | Fixture path |
22+
|---|---|
23+
| `test_standard_baseline_ms-001` | `tests/resources/test_data/data/local/test_standard/baseline/ms-001` |
24+
| `test_standard_baseline_sys-001` | `tests/resources/test_data/data/local/test_standard/baseline/sys-001` |
25+
| `test_standard_empty_ms_ms-001` | `tests/resources/test_data/data/local/test_standard/empty_ms/ms-001` |
26+
| `test_standard_empty_ms_sys-001` | `tests/resources/test_data/data/local/test_standard/empty_ms/sys-001` |
27+
| `test_basic_baseline_ms-101` | `tests/resources/test_data/data/local/test_basic/baseline/ms-101` |
28+
| `test_basic_lifecycle_ms-101` | `tests/resources/test_data/data/local/test_basic/lifecycle/ms-101` |
29+
| `test_basic_lifecycle_validation_error` | `tests/resources/test_data/data/local/test_basic/lifecycle/validation_error` |
30+
| `test_basic_no_impls_basic_ms-101` | `tests/resources/test_data/data/local/test_basic/no_impls/basic/ms-101` |
31+
| `test_basic_no_impls_with_error_ms-101` | `tests/resources/test_data/data/local/test_basic/no_impls/with_error/ms-101` |
32+
| `test_delete_mvr_ms-001` | `tests/resources/test_data/data/local/test_delete_mvr/ms-001` |
33+
| `test_delete_mvr_sys-001` | `tests/resources/test_data/data/local/test_delete_mvr/sys-001` |
34+
| `test_errors_ms-101` | `tests/resources/test_data/data/local/test_errors/ms-101` |
35+
| `reqstool_demo` | `../reqstool-demo/docs/reqstool` (sibling project, requires `./mvnw verify` first) |
36+
37+
## Running regression tests
38+
39+
### 1. Run pytest
40+
41+
```bash
42+
hatch run dev:pytest --override-ini="log_cli=false" -q
43+
```
44+
45+
### 2. Compare CLI output against baselines
46+
47+
For each in-repo dataset, run all 4 commands and diff against the baseline:
48+
49+
```bash
50+
# Status (strip ANSI color codes)
51+
hatch run dev:python src/reqstool/command.py status local -p <FIXTURE_PATH> 2>&1 \
52+
| sed 's/\x1b\[[0-9;]*m//g' > /tmp/feature-status.txt
53+
diff baselines/<DATASET>__status.txt /tmp/feature-status.txt
54+
55+
# Report (AsciiDoc)
56+
hatch run dev:python src/reqstool/command.py report --format asciidoc local -p <FIXTURE_PATH> \
57+
> /tmp/feature-report-adoc.txt 2>&1
58+
diff baselines/<DATASET>__report_adoc.txt /tmp/feature-report-adoc.txt
59+
60+
# Report (Markdown)
61+
hatch run dev:python src/reqstool/command.py report --format markdown local -p <FIXTURE_PATH> \
62+
> /tmp/feature-report-md.txt 2>&1
63+
diff baselines/<DATASET>__report_md.txt /tmp/feature-report-md.txt
64+
65+
# Export (JSON)
66+
hatch run dev:python src/reqstool/command.py export local -p <FIXTURE_PATH> \
67+
> /tmp/feature-export.txt 2>&1
68+
diff baselines/<DATASET>__export.txt /tmp/feature-export.txt
69+
```
70+
71+
### 3. Quick full regression script
72+
73+
Run all in-repo datasets at once (copy-paste friendly):
74+
75+
```bash
76+
cd /path/to/reqstool-client
77+
78+
# Key datasets to check
79+
for ds in \
80+
"test_standard_baseline_ms-001 tests/resources/test_data/data/local/test_standard/baseline/ms-001" \
81+
"test_standard_baseline_sys-001 tests/resources/test_data/data/local/test_standard/baseline/sys-001" \
82+
"test_basic_baseline_ms-101 tests/resources/test_data/data/local/test_basic/baseline/ms-101"; do
83+
84+
name=$(echo $ds | cut -d' ' -f1)
85+
path=$(echo $ds | cut -d' ' -f2)
86+
87+
echo "=== $name ==="
88+
89+
hatch run dev:python src/reqstool/command.py status local -p $path 2>&1 \
90+
| sed 's/\x1b\[[0-9;]*m//g' > /tmp/f.txt
91+
diff -q baselines/${name}__status.txt /tmp/f.txt && echo " status: OK" || echo " status: DIFF"
92+
93+
hatch run dev:python src/reqstool/command.py report --format asciidoc local -p $path > /tmp/f.txt 2>&1
94+
diff -q baselines/${name}__report_adoc.txt /tmp/f.txt && echo " report_adoc: OK" || echo " report_adoc: DIFF"
95+
96+
hatch run dev:python src/reqstool/command.py report --format markdown local -p $path > /tmp/f.txt 2>&1
97+
diff -q baselines/${name}__report_md.txt /tmp/f.txt && echo " report_md: OK" || echo " report_md: DIFF"
98+
99+
hatch run dev:python src/reqstool/command.py export local -p $path > /tmp/f.txt 2>&1
100+
diff -q baselines/${name}__export.txt /tmp/f.txt && echo " export: OK" || echo " export: DIFF"
101+
done
102+
```
103+
104+
### 4. reqstool-demo regression
105+
106+
Requires the sibling `../reqstool-demo` project with Maven artifacts built (`./mvnw verify`):
107+
108+
```bash
109+
hatch run dev:python src/reqstool/command.py status local -p ../reqstool-demo/docs/reqstool 2>&1 \
110+
| sed 's/\x1b\[[0-9;]*m//g' > /tmp/f.txt
111+
diff baselines/reqstool_demo__status.txt /tmp/f.txt
112+
113+
hatch run dev:python src/reqstool/command.py report --format asciidoc local -p ../reqstool-demo/docs/reqstool \
114+
> /tmp/f.txt 2>&1
115+
diff baselines/reqstool_demo__report_adoc.txt /tmp/f.txt
116+
117+
hatch run dev:python src/reqstool/command.py report --format markdown local -p ../reqstool-demo/docs/reqstool \
118+
> /tmp/f.txt 2>&1
119+
diff baselines/reqstool_demo__report_md.txt /tmp/f.txt
120+
121+
hatch run dev:python src/reqstool/command.py export local -p ../reqstool-demo/docs/reqstool \
122+
> /tmp/f.txt 2>&1
123+
diff baselines/reqstool_demo__export.txt /tmp/f.txt
124+
```
125+
126+
## Known expected diffs from Pydantic v2 migration
127+
128+
1. **Export JSON — enum serialization**: Enums now serialize as clean values (`"effective"`)
129+
instead of verbose jsonpickle format (`{"_value_": "effective", "_name_": "EFFECTIVE", ...}`).
130+
This affects all datasets' `__export.txt` files.
131+
132+
2. **Standard dataset — implementation counts**: The annotations nesting bug fix
133+
(`List[List[AnnotationData]]``List[AnnotationData]`) changes implementation counts
134+
in `test_standard_*` status and report outputs where requirements have multiple annotations.
135+
Basic datasets are unaffected (1 annotation per requirement).
136+
137+
If a diff is expected due to an intentional change, note it in the PR description.

0 commit comments

Comments
 (0)